Google PDF Anomalies

Discussion in 'Google' started by T0PS3O, Oct 27, 2004.

  1. #1
    As reported by Leo...

    ... PDF SERPs have gone tits up.

    Try searching for 'instructions' (one would expect a million instruction manuals) and set it to PDF only.

    http://www.google.com/search?num=10&hl=en&as_qdr=all&q=instructions+filetype:pdf&btnG=Search&meta=

    Notice there are just 3 results, yet the header says "Results 1 - 3 of about 4,640,000 for instructions filetype: pdf".

    Now if you mess around in the address bar, changing the number of results (num) to 30, 50, 90 etc., you will get a few more. But still under 10 results....
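
    A minimal sketch of the address-bar experiment described above, assuming the `num` parameter and the `filetype:` operator behave as they do in the URL posted in this thread. The helper name `pdf_search_url` is made up for illustration, and the `btnG` and `meta` parameters from the original URL are left out.

```python
from urllib.parse import urlencode

BASE = "http://www.google.com/search"


def pdf_search_url(terms: str, num: int = 10) -> str:
    """Build a search URL limited to PDF files with `num` results per page."""
    params = {
        "num": num,          # results per page, the value edited in the address bar
        "hl": "en",
        "as_qdr": "all",
        "q": f"{terms} filetype:pdf",
    }
    # safe=":" keeps the colon in "filetype:pdf" readable instead of "%3A"
    return f"{BASE}?{urlencode(params, safe=':')}"


# Print one URL per page size tried in the post
for num in (10, 30, 50, 90):
    print(pdf_search_url("instructions", num))
```

    Opening each printed URL and counting the listed results would repeat the test by hand; the sketch only builds the URLs and does not fetch anything.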

    The Gooooooogle NEXT >> buttons are gone too...

    Very strange... Or is it just Leo, Jan and me seeing this?
     
    T0PS3O, Oct 27, 2004 IP
  2. minstrel

    minstrel Illustrious Member

    Messages:
    15,082
    Likes Received:
    1,243
    Best Answers:
    0
    Trophy Points:
    480
    #2
    minstrel, Oct 27, 2004 IP
  3. Jan

    Jan Peon

    Messages:
    129
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #3
    Well, nobody said ALL pdf's are gone. Just a lot of them, maybe most of them. For more than a day now.

    E.g., a search on adobe filetype: pdf gives only 7 (!) results.

    :(
     
    Jan, Oct 28, 2004 IP
  4. T0PS3O

    T0PS3O Feel Good PLC

    Messages:
    13,219
    Likes Received:
    777
    Best Answers:
    0
    Trophy Points:
    0
    #4
    I thought it deserved its own thread since it was off-topic in the other one. And it isn't really a duplicate, since I had a closer look and posted more details.

    It is very strange indeed. Who wants to try and e-mail G?

    Minstrel, your post is here http://forums.digitalpoint.com/showpost.php?p=48281&postcount=27 and you prove exactly our point.

    See in your own links how just 3 to 7 results show up. No next page, no more results. But it does say 1-3 out of 4,000,000,whatever results.
     
    T0PS3O, Oct 28, 2004 IP
  5. minstrel

    minstrel Illustrious Member

    #5
    In my post in the other thread, those examples weren't an exhaustive list -- they were simply about 5 minutes work with a few test phrases, all of which produced some PDF files in the SERPs. That doesn't look unusual for me -- in the kinds of searches I do, I don't usually get a ton of PDFs returned. That was my point: it looks normal to me.
     
    minstrel, Oct 28, 2004 IP
  6. T0PS3O

    T0PS3O Feel Good PLC

    #6
    But the pages I showed, and even the ones you linked to, show exactly that it is far from normal.

    Why isn't there the next page link at the bottom?

    Why does it show 3 results only out of 4 million?

    If that looks normal to you (OK, G has been behaving a bit strangely) I'd recommend you get your eyes checked.
     
    T0PS3O, Oct 28, 2004 IP
  7. minstrel

    minstrel Illustrious Member

    #7
    Thanks -- I did recently get my eyes checked. But I don't think it's necessary to be insulting simply because I'm not seeing the problem you seem to see.

    No SERP contains a "next" link unless there are more than 10 (or whatever you have set as your default for the search) results to list.
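
    That rule can be sketched as follows. This is only an illustration of the logic as stated here, not Google's actual implementation, and `has_next_link` and its parameters are hypothetical names. Which count applies (the "of about 4,640,000" estimate or the handful of results actually listed) is the open question in this thread.

```python
def has_next_link(total_results: int, start: int = 0, per_page: int = 10) -> bool:
    """True if results remain beyond the page that begins at offset `start`."""
    return start + per_page < total_results


print(has_next_link(3))          # 3 listable results, 10 per page
print(has_next_link(4_640_000))  # the estimated count from the results header
```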

    Did you see this sample search from my other post? http://www.google.com/search?hl=en&...2coff=1&as_qdr=all&q=psychopathy+filetype:pdf -- that one has a whole lot of "next" pages.

    And that is, as I said earlier, just 4 quick searches, not exhaustive research. I'm sure it would not be difficult to find other examples if I wanted to invest more than a few minutes to do it.

    I don't think this proves your point at all -- I think it demonstrates the opposite, actually. As for the "But it does say 1-3 out of 4,000,000,whatever results" part, isn't that just telling you there are 4,000,000 results of which 3 are PDF files? That number may be a little low for some searches (it would be for my example above) but it would be entirely normal for others.

    In my experience, the majority of PDF files on the net contain larger manuals or research papers, etc., that don't easily fit into a manageable HTML page, or documents that require special formatting, such as an application form. I would not therefore expect the SERPs to contain very many of them in most cases.

    That's what I meant by "it looks normal to me"...
     
    minstrel, Oct 28, 2004 IP
  8. T0PS3O

    T0PS3O Feel Good PLC

    #8
    I didn't mean to be insulting at all. I apologize if it came across like that.

    I was just surprised to see that you thought all was fine whilst the 2 links I clicked from your examples just confirmed my findings.

    The ones limited to PDF files only seem to have this error in common.
     
    T0PS3O, Oct 28, 2004 IP
  9. minstrel

    minstrel Illustrious Member

    #9
    minstrel, Oct 28, 2004 IP
  10. Jan

    Jan Peon

    #10
    Let's try to summarize:

    * Two persons in this forum (me and Leo) have reported losing all their pdf files from the G index;
    * A number of the highest ranking pdf files on the web (from Adobe) also seem to have disappeared from the G index;
    * Searches for "filetype: pdf" (without the space) or just "pdf" tend to return an unusually limited number of .pdf hits;
    * For some pdf files that are still found, the toolbar shows they are not ranked or indexed;
    * Some people (Minstrel c.s.) do not notice anything unusual;
    * Just 4 persons have discussed this topic in the last 36 hours; this could be because others - like Minstrel - do not notice anything unusual, or they may not be very interested or experienced because they don't use pdf files; or whatever.

    It seems to me a lot (or even most) pdf files have suddenly disappeared from the G index. What further tests could be done to clarify the situation?

    ----------

    Added example

    A Google help page links to a pdf file. The pdf file is called The Anatomy of a Search Engine and is written by Sergey Brin and Lawrence Page. It is linked from a PR 10 page but is not in the G index.

    Help page:

    http://www.google.com/help/features.html

    Under File Types we find the referenced article: The Anatomy of a Search Engine. View as HTML option does not work.

    URL of the pdf:

    http://www-db.stanford.edu/pub/papers/google.pdf

    Toolbar shows article is not indexed and pdf is not found in the G index...

    :D
     
    Jan, Oct 28, 2004 IP
  11. Jan

    Jan Peon

    #11
    Jan, Oct 30, 2004 IP
  12. minstrel

    minstrel Illustrious Member

    #12
    How odd. Especially since, even in the posts you reference (as well as at least one of the examples I posted), it doesn't seem to affect ALL pdf searches.
     
    minstrel, Oct 30, 2004 IP
  13. leo

    leo Peon

    Messages:
    174
    Likes Received:
    2
    Best Answers:
    0
    Trophy Points:
    0
    #13
    I wouldn't say they have disappeared. I had a BL update recently (but AFTER having sent my very first mail regarding this matter) and the total number of pages was still the same, which I take to mean that both the normal html pages and all my pdf pages are still indexed by G!.

    Still, not a single one of these pdf pages is shown by the site:xxx-command, and if I set G! to show pdf ONLY, I get ZERO results for the same command. Rather strange indeed, isn't it?

    Otherwise I completely agree with what was said by Jan.
     
    leo, Oct 31, 2004 IP
  14. Jan

    Jan Peon

    #14
    Hi Leo,

    Could you elaborate a bit?

    Most of us had a BL update very recently. I had another one a few days earlier which may have been caused by the loss of pdf files, as I reported then. But since I first noticed the loss of pdf files, nothing seems to have changed, and I do not have any indication that they may still be there somehow.

    So, could you explain your observation in more detail? What search did you do, what result did you get, and what conclusion do you draw from it? If your conclusion is based on the number of backlinks reported, I would be very sceptical about it.
     
    Jan, Oct 31, 2004 IP
  15. minstrel

    minstrel Illustrious Member

    #15
    Again, Jan, I do agree this is all very curious, but I wonder if you are focusing on the wrong questions here: Maybe it's not a PDF filetype issue at all but something else which is manifesting itself in PDF searches?

    Any attempts to figure out what is going on have to explain not only the absences you are seeing but the fact that searches like the one I listed above -- http://www.google.com/search?hl=en&...2coff=1&as_qdr=all&q=psychopathy+filetype:pdf -- show

    The "display as HTML" feature does work on those listings. But what is especially curious about the output for that URL is that results 1-10 are displayed, along with the usual navigation information for pages 1-10.

    However, when you click on either "Next page" or any of the numbers 1 to 10, you get a second page with only 4 listings and the notice

    and the navigation links now say

    There are no "view as HTML" options on that second page, by the way. The other odd thing is that although page 1 lists results 1-10, you get that second abbreviated page whether you click on "Next page" or any of the numbers in the Google string, even though clicking on the digit 5 for example adds "&start=40" to the search string.

    Clicking on "repeat the search with the omitted results included" generates the same results -- and the second page again is abbreviated as above -- except this time the search string has "&filter=0" appended.
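
    The two parameters mentioned above can be read back out of a search URL with a short sketch like this one. It assumes 10 results per page, so that the digit 5 corresponds to "&start=40", and treats "&filter=0" as the marker for the omitted-results view. The function names and the example URL are made up for illustration; the URL is not the truncated one quoted in this thread.

```python
from urllib.parse import urlsplit, parse_qs


def start_offset(page: int, per_page: int = 10) -> int:
    """Offset of the first result on a 1-based page number."""
    return (page - 1) * per_page


def describe(url: str, per_page: int = 10) -> dict:
    """Pull the paging-related parameters out of a search URL."""
    query = parse_qs(urlsplit(url).query)
    start = int(query.get("start", ["0"])[0])
    return {
        "terms": query.get("q", [""])[0],
        "start": start,
        "page": start // per_page + 1,
        # filter=0 is what the "omitted results included" link appends
        "omitted_included": query.get("filter", ["1"])[0] == "0",
    }


# Illustrative URL, not the truncated one from the thread
example = "http://www.google.com/search?hl=en&q=psychopathy+filetype:pdf&start=40&filter=0"
print(start_offset(5))
print(describe(example))
```

    Comparing the page number implied by "start" with the page actually displayed is one way to record the mismatch described in this post.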

    My preliminary conclusion is not that PDF files aren't in the index, but rather that something odd is happening with how those results are displayed...
     
    minstrel, Oct 31, 2004 IP
  16. leo

    leo Peon

    #16
    Sorry about the confusion I may have caused. What I meant is the following:

    Two days or so after discovering that the site:xxx-command doesn't yield any of my pdfs I had a BL update as seen in the backlink table of the Keyword Tracker, which at the same time gives the total number of pages (i.e. html and pdfs altogether) - and there was no change in the total number of pages at the time of the BL update.

    From this I conclude that G! still indexes all my pages, i.e. html and pdf. Does that explain what you are asking?
     
    leo, Oct 31, 2004 IP
  17. Jan

    Jan Peon

    #17
    Minstrel, interesting indeed. Looks like a bug to me; its nature is open for speculation.

    Leo, do you know what query your Keyword Tracker uses to report the total number of pages? Or where can it be found?

    In theory it is possible that the missing pdf files are still indexed in G, and the software simply does not show them. But so far I have not found a way to make them visible, or to prove their presence. They simply don't show up in any searches as they used to. The "site:" query does not show them. The toolbar - a separate mechanism - says they are not indexed. The "View as HTML" option does not work for them. Until there is further evidence, I feel that my earlier statement is quite justified:

    It seems to me a lot (or even most) pdf files have suddenly disappeared from the G index.

    Can anyone demonstrate the presence (in the G index) of the Google pdf I mentioned before? - http://www-db.stanford.edu/pub/papers/google.pdf
     
    Jan, Oct 31, 2004 IP
  18. leo

    leo Peon

    #18
    No idea. I am using the Digital Point Keyword Tracker, of course, so most probably one of the KT gurus will know. If necessary I could supply the settings I have chosen.
     
    leo, Nov 1, 2004 IP
  19. leo

    leo Peon

    #19
    Good news: this morning all PDFs are back again! :D
     
    leo, Nov 1, 2004 IP
  20. T0PS3O

    T0PS3O Feel Good PLC

    #20
    Good, it took them a while though!
     
    T0PS3O, Nov 2, 2004 IP