Issues with query and Sitemap

Discussion in 'Google Sitemaps' started by swapshop, Jul 8, 2006.

  1. #1
    Can we get some advice on how we are building our sitemaps?

    Our mod_rewrite rules in .htaccess:

    RewriteEngine on
    #RewriteRule ^~(.*)$ /profile.php?profile=$1 [L]
    RewriteRule ^([a-zA-Z0-9]*).html detail.php?siteid=$1

    Our site URLs look like this:

    http://www.swapshop.co.nz/classifieds2/detail.php?siteid=1295

    We want to drop the ?siteid=1295 query string and serve the page as 1295.html.

    Our script creates four files.
    Rewritten:
    http://www.swapshop.co.nz/sitemap.txt
    http://www.swapshop.co.nz/sitemap.xml
    Original:
    http://www.swapshop.co.nz/sitemap2.txt
    http://www.swapshop.co.nz/sitemap2.xml

    With mod_rewrite working, this URL

    http://www.swapshop.co.nz/classifieds2/detail.php?siteid=1295

    should also be reachable as

    http://www.swapshop.co.nz/classifieds2/1295.html

    which it is. Example here: www.swapshop.co.nz/classifieds2/1295.html

    So at the moment we have the following sitemaps:

    sitemap.txt - rewritten URLs, text file
    sitemap.xml - rewritten URLs, XML file
    sitemap2.txt - original URLs, text file
    sitemap2.xml - original URLs, XML file

    We currently submit sitemap.xml (the rewritten XML file).

    Can someone check this to make sure we have it right? I.e. is the format OK? Google was complaining about a blank line at the start, which has now been removed.

    sitemap.xml is the complete list of current adverts, as this is a classifieds site.
    We had an issue where the host reset our .htaccess to the standard one and we lost a lot of hits from Google. Google Sitemaps reports this issue.

    Question: is it better to submit a txt or an XML file?

    The idea of the mod_rewrite from detail.php?siteid=666 to 666.html is to remove the query string. We will get it to retrieve the advert category instead of the advert number once we work this issue out.

    Our issue: checking the web stats, I see a huge number of direct hits to detail.php without the query string.

    See here: http://www.swapshop.co.nz/classifieds2/detail.php

    This page just tells us that no query was given. It only had contact email details, so if an engine hit this URL it would get lost. We have now put a link to the site there to allow the engines to continue.

    Is it possible to put the sitemap here, whether sitemap.xml or an HTML map?

    Other things we do are:

    RewriteRule ([a-zA-Z0-9]*)\.htm$ http://www.swapshop.co.nz/classifieds2/detail.php?siteid=$1

    RewriteCond %{HTTP_HOST} ^swapshop\.co.nz
    RewriteRule ^(.*)$ http://www.swapshop.co.nz/$1 [R=301,L]

    robots.txt blocks Google, and only Google, from detail.php; Google gets the info from the submitted sitemap instead.

    detail.php and index.php strip sessions out:

    <?
    // Use this to start a session only if the UA is *not* a search engine
    $searchengines = array("Google", "Fast", "Slurp", "Ink", "Atomz", "Scooter", "Crawler", "MSNbot", "Poodle", "Genius");
    $is_search_engine = 0;
    foreach ($searchengines as $key => $val) {
        //if(strstr("$HTTP_USER_AGENT", $val)) {
        if (strstr($_SERVER['HTTP_USER_AGENT'], $val)) {
            $is_search_engine++;
        }
    }
    if ($is_search_engine == 0) {
        // visitor is not a search engine - start the session
        ini_set("session.save_handler", "files");
        session_start();
        // You can put anything else in here that needs to be hidden from search engines
    } else {
        // visitor is a search engine - put anything you want only a search engine to see in here
    }

    Thanks in advance
     
    swapshop, Jul 8, 2006
  2. MaxPowers

    #2
    This is a confusing post to follow, but the one thing I see is that you are rewriting your URLs to hide the ugly version and replace it with a prettier version...

    In your mod_rewrite rules, you keep using [L] which forces the client (browser, Google) to change the URL... This works the way you intend when switching your domain from the non-www. version to the www. version, but it's nearly pointless to 'hide' ugly URLs then add the [L] to the rewrite rules... it sets the URL back to the ugly version. This may explain why Google has visited the ugly versioned URL so often.

    Perhaps a better way to get around your session IDs is to send them a sitemap without the session IDs in the URLs. It looks as if your script at the end is set up to show different content to search engines than to human visitors, which may get you blacklisted. Your script is checking against 'Google', but some variations of the 'googlebot' user agent use a lower-case g and your script will miss them.

    AutoMapIt lets you set up session IDs as an 'ignore' parameter... The URL will be listed in your sitemap, but the session ID will be removed from your query string.

    Forgive me if I misunderstood what you are asking, but I was confused reading this at times.
     
    MaxPowers, Jul 13, 2006
  3. Cryogenius

    #3
    I've done something similar on my web site. The trick is to make all the old URLs completely disappear from the site. Do this by redirecting any requests for detail.php to the rewritten URLs:

    RewriteRule ^detail.php?siteid=([a-zA-Z0-9]*) http://www_domain_com/path/$1 [R=301]
    
    The 301 redirect is important to tell the bots that the page has moved permanently. They will gradually get the hint. You could return 404 error for the requests for detail.php without a siteid.
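
    One caveat for anyone copying the rule above: a RewriteRule pattern is matched against the URL path only, so it never sees the ?siteid=... part. The query string has to be tested in a RewriteCond instead. A minimal sketch, assuming the .htaccess sits in /classifieds2/ and the pretty URLs end in .html as in the first post (both are assumptions, adjust to your own layout):

    ```apache
    RewriteEngine On

    # Assumption: this .htaccess lives in /classifieds2/.

    # External 301, but only when the client itself asked for
    # detail.php?siteid=... THE_REQUEST holds the original request line,
    # so the internal rewrite further down cannot re-trigger this rule
    # and cause a redirect loop. The trailing "?" drops the old query string.
    RewriteCond %{THE_REQUEST} \s/classifieds2/detail\.php\?siteid=([a-zA-Z0-9]+)[\s&]
    RewriteRule ^detail\.php$ /classifieds2/%1.html? [R=301,L]

    # Optional: answer 404 for detail.php requested without any siteid.
    RewriteCond %{QUERY_STRING} !(^|&)siteid=
    RewriteRule ^detail\.php$ - [R=404,L]

    # Internal rewrite: serve the pretty URL from the real script.
    RewriteRule ^([a-zA-Z0-9]+)\.html$ detail.php?siteid=$1 [L]
    ```

    With something like this in place, a request for /classifieds2/detail.php?siteid=1295 should come back as a 301 to /classifieds2/1295.html, which is then mapped back to the script internally.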

    I would dump your old sitemaps as well as your txt ones.

    Hope this helps...

    Cryo.
     
    Cryogenius, Jul 21, 2006
  4. MaxPowers

    #4
    I was wondering where this thread went... never mind my last comments, I was completely wrong about the [L] versus the [R=301]. That was a drunken post that has ached in the back of my head since I left it :)

    Sure wish the edit button on that post still worked...
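
    For anyone reading later, a rough sketch of the difference between the two flags, loosely based on the rules from the first post. It assumes both rules sit in the same .htaccess as detail.php, which the thread does not confirm:

    ```apache
    RewriteEngine On

    # [R=301] sends the client a redirect, so the address bar changes.
    # Here: non-www host -> www host, keeping the requested path.
    RewriteCond %{HTTP_HOST} ^swapshop\.co\.nz$ [NC]
    RewriteRule ^ http://www.swapshop.co.nz%{REQUEST_URI} [R=301,L]

    # [L] on its own only ends rule processing for this pass. The rewrite
    # stays internal and the client keeps seeing 1295.html.
    RewriteRule ^([a-zA-Z0-9]+)\.html$ detail.php?siteid=$1 [L]
    ```

    So [L] by itself never exposes the ugly URL; only a rule carrying [R] tells the browser or the bot to go somewhere else.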
     
    MaxPowers, Jul 22, 2006