Get your directory fully indexed (tutorial + php code)

Discussion in 'Directories' started by ErectADirectory, Feb 14, 2007.

  1. #1
    Are you tired of not having your entire directory indexed? Me too! I just fixed the problem and want to offer you my solution. It's all about rss/xml feeds.

    I was looking at my stats the other day and noticed something weird. I have an rss.php page (root of the domain) that was my 2nd most popular page with 1284 views in 14 days. The odd part is that it is not linked anywhere on my site.

    I do have 2 rss/xml feeds published on my directory: one for newly accepted sites and the other for the most recently updated pages. Both of these feeds live in their own subdirectories (/pages/rss.php and /sites/rss.php). These 2 pages only had 24 and 26 views in the same 14 days, and they are linked in the footer of every page!

    So what happened? The huge number of hits told me it was all robots (the logs confirmed it). The obvious answer is that robots/spiders love feeds because they are so link- and content-rich.
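If you want to check your own logs the same way, here is a minimal sketch that counts hits on a feed URL in an Apache "combined" format access log and splits them into robots and everyone else. It assumes the combined log format and uses a crude user-agent heuristic (any agent containing "bot", "slurp", "spider" or "crawl"). The function names are hypothetical, and user agents can be faked, so treat the numbers as an estimate.

```php
<?php
// Sketch: count hits on one path in an Apache "combined" access log,
// split into robots and other visitors by a crude user-agent check.
// ASSUMPTION: the log uses the combined format; count_feed_hits() and
// is_probably_robot() are hypothetical helpers, not part of any script.

function is_probably_robot(string $userAgent): bool
{
    // User agents can be spoofed, so this is only an estimate.
    return (bool) preg_match('/bot|slurp|spider|crawl/i', $userAgent);
}

function count_feed_hits(iterable $logLines, string $path): array
{
    // host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "agent"
    $pattern = '/^\S+ \S+ \S+ \[[^\]]*\] "[A-Z]+ (\S+)[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"/';
    $counts = ["robots" => 0, "others" => 0];

    foreach ($logLines as $line) {
        if (!preg_match($pattern, $line, $m)) {
            continue; // skip lines that are not in combined format
        }
        // Ignore the query string so /rss.php?x=1 still counts as /rss.php.
        $requested = explode("?", $m[1], 2)[0];
        if ($requested !== $path) {
            continue;
        }
        $key = is_probably_robot($m[2]) ? "robots" : "others";
        $counts[$key]++;
    }
    return $counts;
}

// Usage (log path is an example; yours may differ):
//   print_r(count_feed_hits(new SplFileObject("/var/log/apache2/access.log"), "/rss.php"));
```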

    Let's work with what is happening and use it to our advantage. Spiders are what tell a search engine which pages to index and store in its database for the SERPs. You see, if the pages of your directory never get spidered, they will never get indexed, and the links (and content) inside your directory will be worthless. Most spiders start at the home page and follow links, but never get to those really deep pages. Today you are going to fix this!!!!

    Step 1: Creating an rss / xml feed

    Bear with the code; it is a necessary evil, as we are creating an rss/xml feed. What we are doing below is querying our database for the most recently approved listings. For each new record, we do not put the listed site's URL in the XML; instead we list the page of our directory where that listing is located. This way the spider visits our inner page and finds their link there. Get it?

    <?php
    // Send the XML content type before any other output.
    // (header() trims trailing newlines, so the old "\n\n" was never needed.)
    header("Content-Type: text/xml; charset=UTF-8");

    // Need pointers on the below code?
    // Visit the PHP section of DP.
    // There is a good chance they will be answered there.

    // Connect to db. ASSUMPTION: db_connect.php now creates a PDO connection
    // in $pdo; the old mysql_* functions were removed in PHP 7.
    require_once "db_connect.php";

    // Escape text for XML. htmlentities() can emit HTML-only entities such as
    // &eacute; that XML does not define, so use htmlspecialchars() instead.
    function xml_escape($text)
    {
        return htmlspecialchars((string) $text, ENT_QUOTES | ENT_XML1, "UTF-8");
    }

    // table_of_links = the table where you store your links
    //   in openld it's 'openld_links'
    // approved = sites you have reviewed and approved, not just links submitted
    //   in openld it's 'active'
    // table_of_categories = the table where you store your categories
    //   in openld it's 'openld_categories'
    // One JOIN fetches each link's category title, replacing the extra
    // query the old version ran inside the loop for every link.
    $sql = "SELECT l.title, l.description, l.category_id, c.title AS category_name
            FROM table_of_links AS l
            JOIN table_of_categories AS c ON c.id = l.category_id
            WHERE l.approved = 1
            ORDER BY l.date_submitted DESC
            LIMIT 50";
    $rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);

    // Echo the XML declaration so it is the very first output (a stray
    // blank line before it makes the feed invalid) and so a short_open_tag
    // setting cannot mistake it for PHP.
    echo '<?xml version="1.0" encoding="UTF-8"?>', "\n";
    ?>
    <rss version="2.0">
    <channel>

    <title>Directory Name - Your directory slogan goes here</title>
    <description>An original description of your site goes here</description>
    <link>http://www.mydirectory.com/</link>

    <?php foreach ($rows as $row):
        // Link to OUR category page, not the listed site, so the spider
        // lands on the inner directory page and finds their link there.
        // Adjust this to match your own script's category URL scheme.
        $page_url = "http://www.mydirectory.com/TOP/" . (int) $row["category_id"]
                  . "/" . rawurlencode((string) $row["category_name"]) . "/";
    ?>
    <item>
    <title><?= xml_escape($row["title"]) ?></title>
    <description><?= xml_escape($row["description"]) ?></description>
    <link><?= xml_escape($page_url) ?></link>
    <guid><?= xml_escape($page_url) ?></guid>
    </item>
    <?php endforeach; ?>

    </channel>

    </rss>
    As I mentioned in the script, please do not post questions here about how to make this work. I briefly commented the code, but if you need more explanation, do a search or post your questions in the PHP section of DP.

    Once you get the feed working, save it as rss.php in your root directory. I would suggest linking to it in your footer, but as my story shows, that might not be necessary.

    There are better ways to force indexing of an entire web site, such as storing the user-agent in a db and then feeding the spiders exactly what you want them to eat, but that is way beyond the scope of this post. Besides, listing the category page right after a new link gets approved has a side benefit: that page just changed (a link was added), so the spider SHOULD visit it again.

    I just implemented this about an hour ago. Before that, rss.php did list the recently accepted sites, but it showed their URL rather than my URL where their listing is located. Since I just put the code on my site, I'll give some stats here so that you can see how well it works. My directory was launched a few weeks ago and currently has 43 pages indexed in Google, 48 in MSN and 29 in Yahoo.

    Enjoy! Comments and green appreciated!
     
    ErectADirectory, Feb 14, 2007
  2. britishguy

    Thanks for this information. I will get our techie to look at it all and we will do an evaluation.

    Good OP :)

    britishguy, Feb 14, 2007
  3. rajun

    Thanks. I will try it. :)
     
    rajun, Feb 14, 2007
  4. TheBest

    Good tutorial. I believe the above code will work with the PHPLD script.

    TheBest, Feb 14, 2007
  5. ErectADirectory

    A little update a few hours later: 102 pages indexed by Google, 49 by MSN and still 29 by Yahoo. I'll check again later, as it might be a blip from a different datacenter. I cannot believe it worked this well; perhaps I have some caching issues with my SEOquake plugin for Firefox.

    Googlebot has been to my site recently, but most of the action has been from msnbot and Yahoo! Slurp. These guys have been all over the place today. Let's see how long it will take to get 100+ pages indexed on MSN or Y! For statistical purposes, I have 131 categories in my directory and about 100 more pages, for a total of around 250 pages.

    By the way, I don't know if anyone picked up on this, but this rss/xml feed will not escort the bots to all of your pages, just the ones that have recently had a link added. It will take some tweaking to get a full index, but probably nothing more than switching "ORDER BY date_submitted DESC" to "ORDER BY date_submitted ASC" for a few days (keep in mind that with LIMIT 50 this only lists the categories of the 50 oldest approved links, so you may need to raise the limit too).
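As an alternative to flipping the sort order by hand, here is a minimal sketch that steps through every category page one slice at a time, so that over several days the feed walks the bots through the whole directory. It assumes the tutorial's placeholder table_of_categories with an integer id column; feed_offset_for_day() and current_day_number() are hypothetical helpers, and the slice size of 50 just mirrors the original LIMIT.

```php
<?php
// Sketch: rotate the feed through ALL category pages, one slice per day.
// ASSUMPTIONS: table_of_categories has an integer "id" column (the
// tutorial's placeholder names); both helper names are hypothetical.

// Offset of today's slice. Each day starts where the previous slice ended
// and wraps around (modulo the category count) past the last category.
function feed_offset_for_day(int $totalCategories, int $pageSize, int $dayNumber): int
{
    if ($totalCategories <= 0 || $pageSize <= 0) {
        return 0;
    }
    return ($dayNumber * $pageSize) % $totalCategories;
}

// Whole days since the Unix epoch (UTC), so the slice changes once a day.
function current_day_number(): int
{
    return intdiv(time(), 86400);
}

// Fetching today's slice with your own PDO connection might look like:
//   $total  = (int) $pdo->query("SELECT COUNT(*) FROM table_of_categories")->fetchColumn();
//   $offset = feed_offset_for_day($total, 50, current_day_number());
//   $stmt   = $pdo->prepare("SELECT id, title FROM table_of_categories
//                            ORDER BY id LIMIT 50 OFFSET :offset");
//   $stmt->bindValue(":offset", $offset, PDO::PARAM_INT);
//   $stmt->execute();
```

With 131 categories and 50 items per day, days 0 to 3 give offsets 0, 50, 100 and 19, so every category turns up within three days (the day-2 slice is short, holding only the last 31).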
     
    ErectADirectory, Feb 14, 2007
  6. Obelia

    Good suggestion, ErectADirectory.

    By the way, I'm seeing your RSS feed displayed as plain text. I'm not sure whether that affects the validity of the feed. You may need to put this at the top of the page:

    <?php
    header("Content-type: text/xml");
    ?>

    Obelia, Feb 14, 2007
  7. ErectADirectory

    Obelia,

    Thanks for the green and the comment; I edited the code above and added your revision. You made me look into feed validation and, wouldn't you know it, it didn't validate. That does not mean the feed didn't get read, just that it was not "grammatically correct". The feed is XML, not HTML, so the page should send it with -- header("Content-type: text/xml"); -- Thanks again!
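One quick way to catch the "grammatically incorrect" problems before a validator does is to check that the feed is at least well-formed XML with the basic RSS 2.0 channel elements. Below is a minimal sketch using PHP's built-in SimpleXML and libxml functions. check_feed() is a hypothetical helper, and it is far less thorough than a real feed validator.

```php
<?php
// Sketch: basic sanity check of an RSS 2.0 feed string.
// check_feed() is a hypothetical helper; it only tests well-formedness and
// the presence of the required channel elements, not the full RSS spec.

function check_feed(string $xml): array
{
    $problems = [];

    // Collect libxml errors instead of printing warnings.
    libxml_use_internal_errors(true);
    libxml_clear_errors();
    $doc = simplexml_load_string($xml);

    if ($doc === false) {
        foreach (libxml_get_errors() as $error) {
            $problems[] = "Line {$error->line}: " . trim($error->message);
        }
        libxml_clear_errors();
        return $problems;
    }

    if ($doc->getName() !== "rss") {
        $problems[] = "Root element is not <rss>";
    }
    if (!isset($doc->channel)) {
        $problems[] = "Missing <channel>";
        return $problems;
    }
    // RSS 2.0 requires title, link and description inside <channel>.
    foreach (["title", "link", "description"] as $required) {
        if (!isset($doc->channel->{$required})) {
            $problems[] = "Missing <channel><$required>";
        }
    }
    return $problems;
}

// Usage: print_r(check_feed(file_get_contents("http://www.mydirectory.com/rss.php")));
```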

    My next issue to address is that I have multiple guids (links) that point to the same page. This is bound to happen in a directory, since you can approve more than one link on each category page. Since we are referencing our URL (not theirs), this will happen every time 2 people get approved in the same category.

    This could be addressed in the code but why? The whole purpose of this is to get bots to follow our links to find the pages of our directory. The code accomplishes this. KISS

    This rss/xml feed is also useful if you are on a shared server and are not allowed to respond to submissions via email. Just point submitters to your feed and say "If it's important that you know when you are accepted, just bookmark the category page or subscribe to the rss feed".

    ErectADirectory, Feb 14, 2007
  8. ! Ask !

    Nice tip, I don’t have problems indexing my deep pages, but I will try it.
     
    ! Ask !, Feb 14, 2007
  9. Obelia

    You need a "group by" clause. Something like "group by category_id", whatever you call the unique identifier for your category. That should fix the dupe issue.
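Obelia's GROUP BY fix works, with one catch worth knowing: a bare "GROUP BY category_id" on top of "SELECT *" leaves MySQL free to pick any row per group, and MySQL 5.7.5 and later reject that query by default (the ONLY_FULL_GROUP_BY mode). A sketch of a stricter version groups on the category and orders by its newest approval, MAX(date_submitted); the same de-duplication is also shown in plain PHP for rows you have already fetched. Table and column names are the tutorial's placeholders, and latest_per_category() is a hypothetical helper.

```php
<?php
// Sketch: one feed item per category page, newest approval first.
// Table/column names are the tutorial's placeholders; adjust to your script.
const LATEST_CATEGORIES_SQL = "
    SELECT c.id, c.title, MAX(l.date_submitted) AS last_added
    FROM table_of_links AS l
    JOIN table_of_categories AS c ON c.id = l.category_id
    WHERE l.approved = 1
    GROUP BY c.id, c.title
    ORDER BY last_added DESC
    LIMIT 50";

// The same de-duplication in PHP, for rows already sorted newest first.
// latest_per_category() is a hypothetical helper, not part of openld/phpld.
function latest_per_category(array $rows, int $limit = 50): array
{
    $seen = [];
    $unique = [];
    foreach ($rows as $row) {
        $id = $row["category_id"];
        if (isset($seen[$id])) {
            continue; // this category page is already in the feed
        }
        $seen[$id] = true;
        $unique[] = $row;
        if (count($unique) >= $limit) {
            break;
        }
    }
    return $unique;
}
```

Either way, each category page appears once, so every `<guid>` in the feed is unique.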
     
    Obelia, Feb 14, 2007
  10. an0n

    My question is: can you obtain mid-level PR on deep pages even if they are not getting indexed? Hrmmmmm.

    an0n, Feb 14, 2007
  11. ErectADirectory

    So far this has been a totally successful experiment!!!

    It has been 4 days since I started this thread. I will now post an update here and attempt to explain how effective this has been for getting my site indexed.

    Google index = 158 pages (+56)
    MSN index = 117 pages (+68)
    Y! index = 174 pages (+145)

    Not bad for just a few days. I have added a tool, for my own benefit, to the left bar of my pages that shows the time and page of the last visit from each of the 3 bots above. Slurp visits the rss feed I created daily and comes around about 3x as often as the others. I guess it makes sense that Yahoo's rate of indexing is much faster than the other 2.
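The code for that left-bar tool isn't posted here, but a minimal sketch of the idea could look like this: recognise the three crawlers by a user-agent substring and remember the time and page of each one's last visit in a small JSON file. The file path, both function names and the matching substrings are assumptions; a real site might prefer a database table, and since user agents can be spoofed, the results are only indicative.

```php
<?php
// Sketch: remember when each major crawler last visited, and which page.
// ASSUMPTIONS: the substrings below, the JSON file path and both
// function names are illustrative, not taken from the original tool.

const BOT_SIGNATURES = [
    "Googlebot" => "googlebot",
    "msnbot"    => "msnbot",
    "Slurp"     => "slurp",
];

// Return the crawler name for a user-agent string, or null for anything else.
function detect_bot(string $userAgent): ?string
{
    foreach (BOT_SIGNATURES as $name => $needle) {
        if (stripos($userAgent, $needle) !== false) {
            return $name;
        }
    }
    return null;
}

// Record this request if it came from a known crawler.
function record_bot_visit(string $file, string $userAgent, string $page, int $time): void
{
    $bot = detect_bot($userAgent);
    if ($bot === null) {
        return;
    }
    $visits = is_file($file) ? json_decode((string) file_get_contents($file), true) : [];
    if (!is_array($visits)) {
        $visits = []; // start over if the file is empty or corrupt
    }
    $visits[$bot] = ["time" => $time, "page" => $page];
    file_put_contents($file, json_encode($visits), LOCK_EX);
}

// Usage on each page (hypothetical path):
//   record_bot_visit("/tmp/bot_visits.json", $_SERVER["HTTP_USER_AGENT"] ?? "",
//                    $_SERVER["REQUEST_URI"] ?? "", time());
```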

    For those of you looking to duplicate this in your directories, do not expect the same results as above. I am at an advantage because each of my pages serves up fresh content regularly. The more you change your pages, the more often bots visit; they like fresh content. The point of this experiment was not to get the bots visiting more frequently but to escort them to the correct pages via our rss feed (the ones we just added a new link to).

    I dig the rhetorical question, an0n, but for the benefit of our readers: no. Even if you could, what would be the point? For the Google juice to be passed on to the links, search engines must find and index the pages in our directories. If bots don't index our site, they will not index our links either, so the links will count for absolutely nothing.

    For those of you looking to buy links from directories, let me give you a piece of advice. If you are buying your link for SEO benefits, check the PR and whether your category's page has been indexed before buying. I have seen some very big players in our industry that have PR0 on every directory page because they point all their links at the home page, then put the directory in a different section of their site with no link from the home page. This passes no PR on to the directory.

    As an example of what you should look for I will point to An0n's Directory Dump. Follow along on his site if you desire: home page (PR5) --> Internet (PR5) --> Directories (PR4) --> SEO Directory (PR3). Do you see how the PR floats downstream? This is how a directory should flow, not a PR7 homepage with no link to a PR0 directory. This offers no value to the links contained.

    I'll check back to this thread in a few days with another update.

    Until then
     
    ErectADirectory, Feb 18, 2007
  12. songchai

    I love you for this solution. Great, really great for newcomers to the directory business.

    songchai, Feb 18, 2007
  13. wwws

    Doesn't phpLD come with RSS, so all it takes is to activate it? Is that the same as what you are trying to implement? Thanks!

    wwws, Feb 18, 2007
  14. ErectADirectory

    I honestly have no idea; I've never used phpLinkDirectory. My dir is a hacked openld which, as far as I remember, had no rss feed included. It would not surprise me if this was a feature available in most paid versions of directory scripts, especially as a plug-in.

    Are you looking to implement this with your phpld directory? If so, just post the directory fields that apply and I'll show you how to implement it.

    ErectADirectory, Feb 18, 2007
  15. ruby

    OK, I have coded up the RSS feed for my directory, but at the bottom I needed to use the htmlspecialchars function to get it to generate without errors. Works like a charm! Interested to see if it makes a difference!!
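Why htmlspecialchars helps: XML predefines only five entities (&lt; &gt; &amp; &apos; &quot;), but htmlentities() also turns characters like é into HTML-only named entities such as &eacute;, which an XML parser rejects as undefined. A short comparison, assuming UTF-8 text; the sample title is made up.

```php
<?php
// Compare the two escapers on a sample listing title (UTF-8 assumed).
$title = "Joe's Café & Bar";

// htmlentities(): fine for HTML, but &eacute; is not defined in XML.
$asHtml = htmlentities($title, ENT_QUOTES, "UTF-8");

// htmlspecialchars() with ENT_XML1 escapes only what XML needs
// (& < > " '), leaving é as a plain UTF-8 character.
$asXml = htmlspecialchars($title, ENT_QUOTES | ENT_XML1, "UTF-8");

echo $asHtml, "\n"; // Joe&#039;s Caf&eacute; &amp; Bar
echo $asXml, "\n";  // Joe&apos;s Café &amp; Bar
```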
     
    ruby, Feb 19, 2007
  16. ErectADirectory

    Excellent, a couple of questions for our readers. How long did it take you to implement? What was your flavor of directory script (openld, phpld, lynx, esyndicate, homegrown, etc.)?

    I would love for you to post your results here, as it would help give credibility to this little experiment.

    ErectADirectory, Feb 19, 2007
  17. agnivo007

    I'd rather be indexed naturally than through some sort of forced indexing... and so far natural indexing has been quite smooth for me.

    agnivo007, Feb 19, 2007
  18. ErectADirectory

    While I see many, many benefits of natural link building I see absolutely none for natural indexing, except laziness. Perhaps you can educate me and show me what benefits I am missing.

    Aside from just getting your site indexed quickly, creating this RSS / XML file has another side benefit. This feed will always contain your most recent link additions at the top of the feed so spiders will be sure to index these page changes quickly. This might not mean much to you, but I am sure your advertisers will love it because this means they get credit for their backlink very quickly.
     
    ErectADirectory, Feb 19, 2007
  19. ErectADirectory

    Today marks exactly 1 week since the start of this thread, so I will post my 7-day numbers, then let this thread rest in peace and hopefully inspire someone down the road.

    Google index = 301 (+258)
    MSN index = 111 (+63)
    Y! index = 211 (+182)

    Not bad for a week's work. Now I am going to put the feed on cruise control and move on to something that is actually worthwhile.

    ErectADirectory, Feb 21, 2007
  20. m1t0s1s

    Isn't the PageRank damping factor something like 0.85?
    For example, if you've got a PageRank 4 page with only one link, you multiply 4 by 0.85, and that gives you the approximate PageRank the link will pass on?
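For reference, here is the formula from Google's original PageRank paper: d is the damping factor (the paper usually sets it to 0.85), T_1 ... T_n are the pages linking to page A, and C(T_i) is the number of outbound links on T_i. Two consequences for the question above. First, the 0.85 applies to the underlying PageRank value, not to the 0 to 10 toolbar number, which is widely believed to be roughly logarithmic (Google never published the mapping). Second, a page's vote is split across all of its outbound links.

```latex
PR(A) = (1 - d) + d \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}

% contribution of one linking page T to A:
d \cdot \frac{PR(T)}{C(T)} = 0.85\,PR(T) \quad (C(T) = 1), \qquad 0.085\,PR(T) \quad (C(T) = 10)
```

So multiplying by 0.85 is only right for the raw value of a page with a single outbound link; with more links, each one passes a proportionally smaller share.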
     
    m1t0s1s, Feb 24, 2007