Deep trouble, need some advice from experts

Discussion in 'PHP' started by fryman, Apr 14, 2005.

  1. #1
    Ok, I'm in deep trouble. To make a long story short, I noticed that my traffic was dropping like a rock. I did a site: check and almost fainted when I saw that my 6,000 indexed pages had dropped to 500.

    This is the problem: I am using a gateway. When a user goes to my site and clicks on an image, he is first sent to a gateway page where he needs to download a small app, and then he is forwarded to the image. But Googlebot got trapped in the gateway, so it thought that all my pages were the same (meaning it saw the gateway page in place of every image page) and started to drop them as duplicate content.

    So, I have lost thousands of pages that had my images indexed.

    The page calls the gateway from an include
    And gateway.php had this on it:

    Now, if I get rid of that include and just use that javascript code on my page, will the spider be able to continue to the image page? I have read that spiders don't follow javascript, so it shouldn't get stuck anymore, am I right?

    So, if I do that simple change and use this:

    Will the bot be able to crawl the pages like it was before I put up the gateway?

    Please help, I am trying to fix this horrible mess. :(

    Thanks
     
    fryman, Apr 14, 2005 IP
  2. nullbit

    nullbit Peon

    Messages:
    489
    Likes Received:
    19
    Best Answers:
    0
    Trophy Points:
    0
    #2
    I don't totally understand the problem, but I think your suggested revision would not work. The include is server side, so the revision would produce exactly the same client-side code that the bot reads.
     
    nullbit, Apr 14, 2005 IP
  3. fryman

    fryman Kiss my rep

    Messages:
    9,604
    Likes Received:
    777
    Best Answers:
    0
    Trophy Points:
    370
    #3
    I need this to be crystal clear so people can help me out, so let me know what you don't understand. The problem is more or less like this:

    You go to my site and see an image you like. You click on it to go to the download page where you can download that image. But before that, you are taken to a gateway page that says "in order to be able to download that image you need to install our website's application". Once you install that application, you are forwarded to the image and you can download it.

    Every image has the gateway in front of it, and the spider is stuck and can't index the download pages.

    You say
    Why? Can the bot read javascript?
     
    fryman, Apr 14, 2005 IP
  4. nullbit

    #4
    I'm not sure in this case. I know Google have said they can interpret some JavaScript.

    But that wasn't my point. What I meant was that the actual HTML document produced (which is what Googlebot will read) would seem to be identical in both cases, since all you have done is place the identical code directly in the specific file instead of calling it via an include.
     
    nullbit, Apr 14, 2005 IP
  5. fryman

    #5
    Any idea, then, how I can solve this?
     
    fryman, Apr 14, 2005 IP
  6. nullbit

    #6
    If you remove the javascript part, does the page still load normally? If yes, then you could cloak the page.
     
    nullbit, Apr 14, 2005 IP
  7. rvarcher

    rvarcher Peon

    Messages:
    69
    Likes Received:
    4
    Best Answers:
    0
    Trophy Points:
    0
    #7
    Could you modify your code so that the program installation step is bypassed when you can identify the visitor as a search engine (SE)? Maybe not the best solution, but it's what came to mind. It is similar to not starting a PHP session if the visitor is a bot.
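
    The bypass described above could be sketched roughly as follows. JavaScript is used only for illustration (on this site the check would live in the PHP that emits the gateway). The function names and the list of crawler tokens are assumptions, not part of the original code. Note that showing bots a different page than users is the cloaking approach discussed in this thread.

```javascript
// Illustrative, non-exhaustive list of crawler User-Agent tokens (an assumption).
var BOT_PATTERN = /googlebot|slurp|msnbot/i;

// Return true when the User-Agent string looks like a known crawler.
function isSearchEngineBot(userAgent) {
  return typeof userAgent === "string" && BOT_PATTERN.test(userAgent);
}

// Decide whether the gateway script should be emitted for this request.
function shouldShowGateway(userAgent) {
  return !isSearchEngineBot(userAgent);
}
```

    The server would call shouldShowGateway with the request's User-Agent header and only output the gateway script tag when it returns true.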
     
    rvarcher, Apr 14, 2005 IP
  8. nullbit

    #8
    Um, that's what I just said, "cloak the page"
     
    nullbit, Apr 14, 2005 IP
  9. rvarcher

    #9
    You must have posted while I was composing it.
     
    rvarcher, Apr 14, 2005 IP
  10. nullbit

    #10
    Heh, never mind, you explained it better than me anyhow.
     
    nullbit, Apr 14, 2005 IP
  11. rvarcher

    #11
    I'm just glad someone was thinking along the same lines I was. At least I'm not totally off base.
     
    rvarcher, Apr 14, 2005 IP
  12. fryman

    #12
    Yes, if I remove the JavaScript code, then when you click on an image you go straight to the download page.

    I already lost over 4,000 pages; the last thing I need is to get banned and end up with 0 traffic due to cloaking :D

    I was hoping there was another way.
     
    fryman, Apr 14, 2005 IP
  13. nullbit

    #13
    Can you post the actual JS file, or at least the important parts of it? I think that would make the problem clearer.
     
    nullbit, Apr 14, 2005 IP
  14. fryman

    #14
    I assume you are talking about the gateway page, but since it is provided by the advertising network I'm affiliated with, I don't think I can go around posting it on forums. I have PM'd you the details; check them out and let me know if you figure out a solution.

    Thanks!
     
    fryman, Apr 14, 2005 IP
  15. nullbit

    #15
    OK, you could probably solve this easily if you could rewrite the JS file yourself, but since it's hosted on your affiliate's server, you can't.

    My only suggestion would be to do something like this:

    - Create a JS file on your server, called something like gateway.js. The file should contain a function that uses the document.write method to write the HTML call (a script tag with a src attribute) to the JS file on your affiliate's server. The function should take two arguments, one for the val field and one for the productid field, and use them in the affiliate JS URL.

    - The top of your download page should include only the JS file on your server (which will in turn include the affiliate JS file). The page will need to pass the val and productid parameters to the function in your JS file; the correct values can be filled in server side via PHP.

    - Disallow the JS file on your server in your robots.txt.

    The idea is that the page will act normally for regular users. But as long as bots respect the robots.txt file, they will never call the JS file on your affiliate's site, and they should load the rest of the page as normal.
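
    A minimal sketch of what such a gateway.js might look like. The function names are made up for illustration, and the affiliate URL is assumed to be the gateway.aspx address quoted elsewhere in this thread; the real file name and parameters may differ.

```javascript
// gateway.js (hypothetical file hosted on your own server).
// The matching robots.txt entry on the same server would be along the lines of:
//   User-agent: *
//   Disallow: /gateway.js

// Assumed affiliate endpoint; replace with the real one.
var AFFILIATE_URL = "http://somesite.com/gateway.aspx";

// Build the <script> tag that pulls in the affiliate's JS file.
function buildGatewayTag(productId, val) {
  var src = AFFILIATE_URL +
    "?productid=" + encodeURIComponent(productId) +
    "&val=" + encodeURIComponent(val);
  // "<\/script>" avoids closing an inline script block early if this code is inlined.
  return '<script language="javascript" src="' + src + '"><\/script>';
}

// Write the tag into the page while the page is still being parsed.
function loadGateway(productId, val) {
  document.write(buildGatewayTag(productId, val));
}
```

    The download page would then load /gateway.js with an ordinary script tag and call loadGateway with the productid and val values that PHP prints into the page. A crawler that honors robots.txt never fetches gateway.js, so it never reaches the affiliate file.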
     
    nullbit, Apr 14, 2005 IP
  16. fryman

    #16
    What I did was move the javascript code right to the end of the page. So what is happening now is that you are sent to the download page, and a second later the gateway jumps in.
    However, the download page IS getting loaded now before being forwarded to the gateway page, so I think the spider will be able to index it now.


    You say
    Can't I do that with the code I already have there? Why do I need to change the code and then disallow it? And how do you even do that? I know how to block spiders from pages, but how do you block them from JavaScript code?
     
    fryman, Apr 14, 2005 IP
  17. nullbit

    #17
    No. The main JS file is hosted on the affiliate site, so adding it to your robots.txt will not work.

    My suggestion puts a buffer between the affiliate JS file, and the user/spider. By excluding the buffer JS file (on your server) in your robots.txt, the crawlers will never attempt to load the affiliate JS file.
     
    nullbit, Apr 14, 2005 IP
  18. fryman

    #18
    Nullbit, is there some way I can delay a JS file from loading? For example, in this case, the following tag is what brings up the gateway:

    <script language="javascript" src="http://somesite.com/gateway.aspx?productid=3333&val=3333"></script>

    How can I make this code wait 2 seconds before executing?
     
    fryman, Apr 14, 2005 IP
  19. nullbit

    #19
    You can do it via PHP:

    
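    // ... code that outputs the download page ...
    flush();                  // push the output generated so far to the browser
    sleep(2);                 // pause for two seconds
    include("gateway.php");   // then emit the gateway script
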
    It's far from a perfect solution. It will only work if the closing </html> tag has been output before sleep() is called; otherwise the page might not render in some browsers until after the include("gateway.php"); line runs.

    It can be done with JavaScript as well, but because the JavaScript is in an external file that you can't modify, it gets slightly complex.
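
    For the JavaScript route, one rough sketch is to inject the script tag from a timer. The helper name and its parameters are made up for illustration, and the URL is the one quoted earlier in this thread. One caveat: if the affiliate's file relies on document.write, browsers generally ignore such calls from a script injected after the page has loaded, so this may not bring up the gateway at all.

```javascript
// Hypothetical helper: append an external script to the page after a delay.
// `schedule` defaults to setTimeout and is a parameter only to ease testing.
function loadScriptAfterDelay(doc, src, delayMs, schedule) {
  schedule = schedule || setTimeout;
  return schedule(function () {
    var script = doc.createElement("script");
    script.src = src;
    doc.body.appendChild(script);
  }, delayMs);
}

// In a browser, place this near the end of the body and wait 2 seconds.
if (typeof document !== "undefined") {
  loadScriptAfterDelay(
    document,
    "http://somesite.com/gateway.aspx?productid=3333&val=3333",
    2000
  );
}
```

    Changing 2000 to 5000 gives a five-second delay.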
     
    nullbit, Apr 14, 2005 IP
  20. fryman

    #20
    What I meant is how to make this code wait 5 seconds before loading:

    <script language="javascript" src="http://somesite.com/gateway.aspx?productid=3333&val=3333"></script>
     
    fryman, Apr 15, 2005 IP