OK, I'm in deep trouble. To make a long story short, I noticed that my traffic was dropping like a rock. I did a site: check and almost fainted when I saw that my 6,000 indexed pages had dropped to 500.

This is the problem: I am using a gateway. When a user goes to my site and clicks on an image, he is first sent to a gateway, where he needs to download a small app, and is then forwarded to the image. But Googlebot got trapped in the gateway, so it thought all my pages were the same (that is, it thought all my images were the gateway page) and started to drop them because of duplicate content. So I have lost thousands of pages that had my images indexed.

The page calls the gateway from an include. And gateway.php had this on it:

Now, if I get rid of that include and just use that JavaScript code on my page, will the spider be able to continue to the image page? I have read that spiders don't follow JavaScript, so it shouldn't get stuck anymore, am I right? So, if I make that simple change and use this:

Will the bot be able to crawl the pages like it did before I put up the gateway? Please help, I am trying to fix this horrible mess. Thanks
I don't totally understand the problem, but I think your suggested revision would not work. The include is server side, so the revision would produce the exact same client-side code that the bot reads.
I need this to be crystal clear so people can help me out, so let me know what you don't understand. The problem is more or less like this: you go to my site and see an image you like. You click on it to go to the download page, where you can download that image. But before that, you are taken to a gateway page that says "in order to be able to download that image you need to install our website's application". Once you install that application, you are forwarded to the image and can download it. Every image has the gateway in front of it, and the spider gets stuck and can't index the download pages.

You say the revision would produce the exact same client-side code. Why? Can the bot read JavaScript?
I'm not sure in this case; I know Google has said it can interpret some JavaScript. But that wasn't my point. What I meant was that the actual HTML document produced (which is what Googlebot will read) would be identical in both cases, since all you've done is place the identical code directly in the specific file instead of calling it via an include.
If you remove the javascript part, does the page still load normally? If yes, then you could cloak the page.
Could you modify your code so that the program installation part is bypassed if you can identify the visitor as a search engine (SE)? Maybe not the best solution, but it's what came to mind. It's similar to not starting a PHP session if the visitor is a bot.
Yes, if I remove the JavaScript code, clicking on an image takes you straight to the download page. I already lost over 4,000 pages; the last thing I need is to get banned for cloaking and end up with zero traffic. I was hoping there was another way.
Can you post the actual JS file, or at least the important parts of it? I think that would make the problem clearer.
I assume you are talking about the gateway page, but since it is provided by the advertising network I'm affiliated with, I don't think I can go around posting it on forums. I have PM'd you the details; check it out and let me know if you figure out a solution. Thanks!
OK, you could probably solve this easily if you could rewrite the JS file yourself, but since it's hosted on your affiliate's server, you can't. My only suggestion would be to do something like this:

- Create a JS file on your server, called something like gateway.js. It should contain a function that uses the document.write method to write in the HTML call (a script tag with the src attribute) to the JS file on your affiliate's server. The function should take two arguments, one for the val field and one for the productid field, and use these in the affiliate JS URL.
- The top of your download page should include only the JS file on your server (which will in turn include the affiliate JS file). It will need to pass the val and productid parameters to the function in your JS file; the correct values can be filled in server side via PHP.
- Disallow the JS file on your server in your robots.txt.

The idea is that the page will act normally for regular users. But as long as all bots respect the robots.txt file, they will never call the JS file on your affiliate's site, and they should load the rest of the page as normal.
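A rough sketch of the buffer file described in the steps above, assuming the affiliate URL has the form quoted later in this thread (http://somesite.com/gateway.aspx?productid=...&val=...). The names gateway.js, buildGatewayTag and writeGateway are hypothetical, and the real affiliate script may need other parameters.

```javascript
// gateway.js: hypothetical buffer file served from YOUR domain.
// Assumption: the affiliate gateway URL looks like the one in this thread.
const GATEWAY_BASE = "http://somesite.com/gateway.aspx";

// Build the <script> tag that pulls in the affiliate's gateway script.
// Both values are URL-encoded so unusual characters can't break the URL.
function buildGatewayTag(productId, val) {
  const src = GATEWAY_BASE +
    "?productid=" + encodeURIComponent(productId) +
    "&val=" + encodeURIComponent(val);
  // "<\/script>" keeps this string from closing an enclosing inline <script>.
  return '<script type="text/javascript" src="' + src + '"><\/script>';
}

// Must be called while the download page is still being parsed;
// document.write only inserts markup in place during the initial load.
function writeGateway(productId, val) {
  document.write(buildGatewayTag(productId, val));
}

// Hypothetical usage on the download page (numbers filled in by PHP):
//   <script type="text/javascript" src="/gateway.js"></script>
//   <script type="text/javascript">
//     if (typeof writeGateway === "function") { writeGateway(3333, 3333); }
//   </script>
```

The typeof check matters for crawlers: a bot that obeys robots.txt never fetches gateway.js, so writeGateway is undefined for it, and the check keeps the inline call from throwing an error. Assuming the file sits at the site root, the robots.txt entry would be "User-agent: *" followed by "Disallow: /gateway.js".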
What I did was move the JavaScript code right to the end of the page. So now you are sent to the download page, and a second later the gateway jumps in. Since the download page IS now loaded before you are forwarded to the gateway page, I think the spider will be able to index it.

About your robots.txt suggestion: can't I do that with the code I already have there? Why do I need to change the code and then disallow it? And how do you even do that? I know how to block spiders from pages, but how do you block them from JavaScript files?
No. The main JS file is hosted on the affiliate's site, and a robots.txt file only applies to the site it is served from, so adding that file to your robots.txt will not work. My suggestion puts a buffer between the affiliate JS file and the user/spider. If you exclude the buffer JS file (on your server) in your robots.txt, crawlers that respect robots.txt will never attempt to load the affiliate JS file.
Nullbit, is there some way I can delay a JS file from loading? For example, in this case

<script language="javascript" src="http://somesite.com/gateway.aspx?productid=3333&val=3333"></script>

is what brings up the gateway. How can I make this code wait 2 seconds before executing?
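You can do it via PHP:

// ...code that outputs the download page, including the closing </html> tag...
if (ob_get_level() > 0) { ob_flush(); } // empty PHP's own output buffer, if one is active
flush();                                // send the page generated so far to the browser
sleep(2);                               // pause for two seconds
include("gateway.php");                 // then output the gateway script

It's far from a perfect solution. It will only work if the closing </html> tag has been output before sleep() is called; otherwise some browsers might not render the page until after the include("gateway.php"); line has run. Your web server or the browser may also buffer output regardless of flush().

It can be done with JavaScript as well, but because the JavaScript is in an external file that you can't modify, it gets slightly complex.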
What I meant is how to make this code

<script language="javascript" src="http://somesite.com/gateway.aspx?productid=3333&val=3333"></script>

wait for 5 seconds before loading.
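The JavaScript route mentioned in the replies could look roughly like this. Instead of a static script tag, the page waits for the browser's load event, waits a further delay (5000 ms here), and only then adds the affiliate script to the page. The helper names injectScript and loadScriptLater are hypothetical. window and document are passed in as parameters so the logic can be tried outside a browser. One caveat: a script inserted this way runs asynchronously, and browsers generally ignore document.write calls from such scripts. If the affiliate's gateway script relies on document.write, this approach may not work, and without seeing that script it's impossible to say for sure.

```javascript
// Create <script src="..."> and append it to the page body.
function injectScript(doc, src) {
  const script = doc.createElement("script");
  script.src = src;
  doc.body.appendChild(script);
  return script;
}

// Wait until the page has fully loaded, then wait delayMs more, then inject.
function loadScriptLater(win, doc, src, delayMs) {
  function schedule() {
    win.setTimeout(function () { injectScript(doc, src); }, delayMs);
  }
  if (doc.readyState === "complete") {
    schedule(); // the load event has already fired
  } else {
    win.addEventListener("load", schedule);
  }
}

// Hypothetical usage at the bottom of the download page (5-second delay):
// loadScriptLater(window, document,
//   "http://somesite.com/gateway.aspx?productid=3333&val=3333", 5000);
```

Waiting for the load event first means the delay is counted from the moment the download page has finished rendering, rather than from the moment the script tag is parsed.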