Hello. I just noticed today that a PDF on my site is displaying PR7 in the Google toolbar: http://www.nesfiles.com/NES/Zelda/Zelda.pdf The home page is PR5, so this really amazed me. Most PDFs are PR2 or so. I immediately did a link: search on Google to see who was linking to me, and how many. I got this: http://www.google.com/search?hl=en&q=link:www.nesfiles.com/NES/Zelda/Zelda.pdf Only 5 links, but from two PR7s and one PR8. Funny thing is, I can't find a link to my site on any of those-- not even in Google's cached versions. I can't understand why a page on a site at that high of a level would link to me, either. Did Google make a mistake? I'm kinda disappointed this PR7 is a PDF file, and not a regular HTML page-- meaning I can't take advantage of it for PR transfer to the pages I need more traffic on (right?)... I thought about replacing it with an HTML file that links to the original PDF and some other pages... But then, chances are, I'll lose the PR on the next update anyway... Thoughts? Theories? Thanks. -- Derek
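For anyone wanting to double-check the "I can't find a link on any of those pages" part without eyeballing the source, here is a rough sketch in Python that pulls every href out of a page's HTML and reports the ones pointing at the PDF. The helper names (LinkCollector, links_to) and the sample HTML are made up for illustration; in practice you would feed it the saved source of each page Google reports.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute target of every <a href=...> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def links_to(html, base_url, target):
    """Return the links in `html` whose URL contains `target` (case-insensitive)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [link for link in collector.links if target.lower() in link.lower()]


# Hypothetical page source; a real check would use the HTML of each reported backlink page.
sample = (
    '<p><a href="http://www.nesfiles.com/NES/Zelda/Zelda.pdf">manual</a> '
    '<a href="/about">about</a></p>'
)
matches = links_to(sample, "http://example.com/", "nesfiles.com/NES/Zelda/Zelda.pdf")
print(matches)
```

An empty list for every reported page would back up the suspicion that the links are not in the visible HTML, though it says nothing about links inside scripts, frames, or an older version of the page.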
That's what I was thinking, too. I wonder if it parses links. I'd have to change the PDF file, of course... Wonder how hard that is-- I originally created it with a tool to make PDF from images... -- Derek
Download a trial version of Acrobat. Just add a link to your header and footer. Including download and install, I'd say you are talking about a 30-minute project, tops. I'll do it for you if you give me a link in there.
You underestimate me here. OT: If it can receive PR, I'd imagine it can transfer it away again as well. Otherwise the entire model would fail. With those 5 links (and possibly, even likely, more unreported ones) it's not impossible to get PR7. You could 301 the PDF to the homepage so the PR isn't lost but transferred entirely. Then put the PDF up again somewhere else and link to it, to be nice to those looking for it. Not sure how long Google will like that strategy, or how quickly they'll assume the PDF is gone and mark those as dead links...
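To make the 301 idea concrete, here is a minimal sketch using Python's built-in http.server. It is only a stand-in: the real site most likely runs on a different web server, where the same thing would be a one-line redirect rule. The REDIRECTS table, the resolve helper, and the choice of http://www.nesfiles.com/ as the target are assumptions for illustration.

```python
from http.server import BaseHTTPRequestHandler

# Assumed mapping: old PDF path -> page that should receive its visitors and links.
REDIRECTS = {"/NES/Zelda/Zelda.pdf": "http://www.nesfiles.com/"}


def resolve(path):
    """Return (status, headers) for a request path."""
    target = REDIRECTS.get(path)
    if target is None:
        return 404, {}
    # 301 = Moved Permanently; the Location header names the new URL.
    return 301, {"Location": target}


class RedirectHandler(BaseHTTPRequestHandler):
    """Answers GET and HEAD with a permanent redirect for known paths."""

    def do_GET(self):
        status, headers = resolve(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()

    do_HEAD = do_GET


print(resolve("/NES/Zelda/Zelda.pdf"))
```

To try it locally you could serve it with HTTPServer(("127.0.0.1", 8000), RedirectHandler).serve_forever() and request the PDF path. Whether Google carries the PR across such a redirect is exactly the open question in this thread; the sketch only shows the mechanics.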
On the off chance that you would take me up on that... http://www.agilelive.com/Zelda.pdf Here you go!
Well, my 2 cents: use Acrobat to insert the desired (relevant) links and keep the page live as it is. Why fix it if it ain't broke? Google does parse PDFs. T0PS3O's advice is good... but a little risky for me. The big question is: is this a flawed PR value, or is there something especially important about this page? I suspect the latter, as all pages are scans (unspiderable text). I see nothing overly interesting from an SEO perspective. I would enjoy the ride and try to pass the PR on ASAP.
That's exactly what I just did. Just a simple link back to my main page. We'll see if it does anything-- while hopeful, I don't realistically expect to get anything out of it, and I expect the PR on the PDF to drop to 2 or 3 on the next update-- particularly since I can NOT find, on those pages, any of the links that Google reported... I wish there was an easy way to convert all those PDFs to spiderable text... It sure would help SEO value... Thanks for your input, everyone. -- Derek
Yes, there is a way to convert this. Try OCR scanning. It will take a bit of time, but I think it will be worth it. Good luck.
The problem is, there are 300+ of these manuals, most 15-20 pages, but some 80+ pages. It took many, many, many hours just to scan them. Then crop them nicely, then resize them, stamp them, and make them into PDFs... I can't imagine how many hours it'd take to do that again with OCR... If there was some sort of tool that could OCR a PDF, matching fonts, handling busy backgrounds, etc., that'd be awesome-- but I think that's just a dream... -- Derek
Well, I use a program called Pagis Pro 3.0. This program allows you to open multiple scanned pages and OCR them. If your resolution is decent to begin with, the accuracy of the OCR is good. Best case scenario: open all scans and run OCR. In minutes you will have text. Worst case scenario: you may have to rescan if the resolution is not high enough. You say it will take a long time... and it may. The bigger question is... will it be worth it? That question you'll have to answer. You sound well informed about SEO, and that's what this comes down to.