Solution to Getting your PHPBB Forum Spidered - Part 1

Discussion in 'PHP' started by misohoni, Jul 4, 2004.

  1. #1
    I've been doing a bit of searching and testing, and found:

    - A list of bots to add so that your site is spidered effectively.
    - A way of stopping session IDs.
    - A way of creating pages which can be easily spidered.



    OK, let's do this step by step (I recommend backing up all files first):


    CHECK IF YOUR SITE HAS SESSIONS

    Go to http://www.tools.summitmedia.co.uk/spider/ to check how your site is spidered. You should see the session IDs there, and perhaps some unlinkable pages.


    STOPPING SIDs (guests won't be able to post unless registered, but I can't find a better way of stopping SIDs)

    #
    #-----[ OPEN ]------------------------------------------
    #
    includes/sessions.php

    #
    #-----[ FIND ]------------------------------------------
    #
    $SID = 'sid=' . $session_id;

    #
    #-----[ REPLACE WITH ]------------------------------------------
    #
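    if ( $userdata['session_user_id'] != ANONYMOUS )
    {
        // Logged-in users keep their session ID in the URL
        $SID = 'sid=' . $session_id;
    }
    else
    {
        // Guests (and therefore spiders) get clean URLs
        $SID = '';
    }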
    #---[EOM]----
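
A rough sketch of what this change does to the generated links. The helper names `append_sid_sketch` and `sid_for_user` are made up for illustration and are not phpBB's own functions, and the guest ID of -1 is an assumption standing in for the ANONYMOUS constant.

```php
<?php
// Hypothetical stand-in for a URL helper; not phpBB's real implementation.
function append_sid_sketch($url, $sid)
{
    // Nothing to append for guests (and therefore for spiders).
    if ($sid === '') {
        return $url;
    }
    // Use '?' for the first query parameter, '&' for later ones.
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . $sid;
}

// Mirrors the replacement above: only logged-in users get a SID.
// $anonymousId = -1 is an assumed value for the ANONYMOUS constant.
function sid_for_user($userId, $sessionId, $anonymousId = -1)
{
    return ($userId != $anonymousId) ? 'sid=' . $sessionId : '';
}

// Guest: the link stays clean.
echo append_sid_sketch('viewtopic.php?t=42', sid_for_user(-1, 'abc123')), "\n";
// Logged-in user: the session ID is appended.
echo append_sid_sketch('viewtopic.php?t=42', sid_for_user(2, 'abc123')), "\n";
```

With the guest ID the first link comes out unchanged, which is the form a spider would see and index.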


    GETTING THE SITE SPIDERED

    See the post above and add these spiders to the list in sessions.php:

    //
    // robots array all in lower case (feel free to add more robots)
    //
    $seRobots = array(
    'almaden.ibm.com',
    'appie 1.1',
    'architext',
    'ask jeeves',
    'asterias2.0',
    'augurfind',
    'baiduspider',
    'bannana_bot',
    'bdcindexer',
    'crawler',
    'crawler@fast',
    'docomo',
    'fast-webcrawler',
    'fluffy the spider',
    'frooglebot',
    'geobot',
    'googlebot',
    'gulliver',
    'henrythemiragorobot',
    'ia_archiver',
    'infoseek',
    'kit_fireball',
    'lachesis',
    'lycos_spider',
    'mantraagent',
    'mercator',
    'moget/1.0',
    'muscatferret',
    'nationaldirectory-webspider',
    'naverrobot',
    'ncsa beta',
    'netresearchserver',
    'ng/1.0',
    'osis-project',
    'polybot',
    'pompos',
    'scooter',
    'seventwentyfour',
    'sidewinder',
    'sleek spider',
    'slurp/si',
    'slurp@inktomi.com',
    'steeler/1.3',
    'szukacz',
    't-h-you-n-d-e-r-s-t-o-n-e',
    'teoma',
    'turnitinbot',
    'ultraseek',
    'vagabondo',
    'voilabot',
    'w3c_validator',
    'zao/0',
    'zyborg/1.0'
    );
     
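
A minimal sketch of how a list like this can be checked against a visitor's User-Agent string. This is not the rest of the mod's code; `is_search_robot` is a hypothetical helper, and the short `$seRobots` list here is only a sample of the full array.

```php
<?php
// Hypothetical helper: true if the User-Agent contains any robot name.
function is_search_robot($userAgent, array $robots)
{
    // The robot names are all lower case, so lower-case the agent too.
    $agent = strtolower($userAgent);
    foreach ($robots as $robot) {
        if ($robot !== '' && strpos($agent, $robot) !== false) {
            return true;
        }
    }
    return false;
}

// Shortened list for illustration; use the full array in practice.
$seRobots = array('googlebot', 'slurp/si', 'teoma');

$bot = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
var_dump(is_search_robot($bot, $seRobots));
var_dump(is_search_robot('Mozilla/4.0 (compatible; MSIE 6.0)', $seRobots));
```

Because this is a plain substring match, a generic entry such as 'crawler' will match any agent containing that word, so it is worth keeping the list specific.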
    misohoni, Jul 4, 2004 IP
  2. misohoni

    misohoni Notable Member

    Messages:
    1,717
    Likes Received:
    32
    Best Answers:
    0
    Trophy Points:
    200
    #2
    This is Part 2:

    CREATE A ROBOTS.TXT FILE

    This file stops spiders from accessing certain areas of your forum. Create a simple robots.txt file and save it in the root of your site. My forums were held in "forums/"; you should change this to your own directory name. The robots.txt file should contain:

    User-agent: *
    Disallow: /forums/admin/
    Disallow: /forums/attach_mod/
    Disallow: /forums/db/
    Disallow: /forums/files/
    Disallow: /forums/images/
    Disallow: /forums/includes/
    Disallow: /forums/language/
    Disallow: /forums/templates/
    Disallow: /forums/common.php
    Disallow: /forums/config.php
    Disallow: /forums/glance_config.php
    Disallow: /forums/groupcp.php
    Disallow: /forums/memberlist.php
    Disallow: /forums/modcp.php
    Disallow: /forums/posting.php
    Disallow: /forums/printview.php
    Disallow: /forums/privmsg.php
    Disallow: /forums/profile.php
    Disallow: /forums/ranks.php
    Disallow: /forums/search.php
    Disallow: /forums/statistics.php
    Disallow: /forums/tellafriend.php
    Disallow: /forums/viewonline.php
    Disallow: /forums/sutra*.html$
    Disallow: /forums/ptopic*.html$
    Disallow: /forums/ntopic*.html$
    Disallow: /forums/ftopic*asc*.html$
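
Since the directory name has to be changed on every line, here is a small sketch that builds the rules for any forum folder. `build_robots_txt` is a hypothetical helper written for this example, and the three paths passed in are just a sample of the list above. Each Disallow path is written from the site root, so it starts with "/".

```php
<?php
// Hypothetical helper: prefix each path with the forum directory.
function build_robots_txt($forumDir, array $paths)
{
    // Normalise 'forums', '/forums' and 'forums/' to '/forums/'.
    $dir = trim($forumDir, '/');
    $prefix = ($dir === '') ? '/' : '/' . $dir . '/';

    $lines = array('User-agent: *');
    foreach ($paths as $path) {
        $lines[] = 'Disallow: ' . $prefix . ltrim($path, '/');
    }
    return implode("\n", $lines) . "\n";
}

// Sample of the paths listed above; pass the full list in practice.
echo build_robots_txt('forums', array('admin/', 'includes/', 'search.php'));
```

Passing an empty directory name produces rules for a forum installed at the site root.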
     
    misohoni, Jul 4, 2004 IP
  3. misohoni

    misohoni Notable Member

    Messages:
    1,717
    Likes Received:
    32
    Best Answers:
    0
    Trophy Points:
    200
    #3
    I can't insert the next lines of code since this forum limits me (you don't get this in PHPBB, haha).

    If you want the rest of the code, email me...
     
    misohoni, Jul 4, 2004 IP
  4. alsenor

    alsenor Peon

    Messages:
    63
    Likes Received:
    0
    Best Answers:
    0
    Trophy Points:
    0
    #4
    It looks like something very useful, but I would need instructions on what to do with it!
     
    alsenor, Jul 4, 2004 IP
  5. misohoni

    misohoni Notable Member

    #5
    Hmm, I don't understand your question. My instructions tell you what to do with it! You need to open the relevant PHPBB files and edit them.
     
    misohoni, Jul 5, 2004 IP
  6. Nitin M

    Nitin M White/Gray/Black Hat

    Messages:
    640
    Likes Received:
    93
    Best Answers:
    0
    Trophy Points:
    0
    #6
    Here are some more links to SEO'ing phpbb:

    http://www.computerbb.org/about580.html
    http://www.able2know.com/forums/about15132.html

    There are lots of different levels you can take it to. I think on my boards at seopark ( http://www.seopark.com/forums ) I have taken it as far as it can go...

    1) limit front-end to only links and content that we want the spiders to follow when user is not logged in.

    2) modified links to be dates on main forum page if user is not logged in.

    3) make all links that are spiderable appear as static using isapirewrite

    I played with just about all the mods listed above and then customized based on my own requirements. If you get stuck or need help, you can PM me.
     
    Nitin M, Jul 5, 2004 IP
  7. alsenor

    alsenor Peon

    #7
    You know that it is hard for a person at any level of expertise to imagine that his directions are anything but totally clear. However, we do know better, don't we?
    Attempting to go step by step, I went to summitmedia and entered my URL. I got a few pointers of value, and fixed them. Then I came back for the next step, and have no idea what to do with it! You see, to start with one has to understand the terms (stopping SIDS? What are SIDS? Do I have those, do I need to stop them, and where are they?).
    At that point, as so often, the instruction session is over, because you speak a foreign language.
    Sorry, but this is no criticism. I know exactly how it comes about, because I do it to people now and then. You know exactly when you have lost them, by the glazed look in their eyes.
    Like generation gaps, there are IT gaps, or whatever we want to call this.
     
    alsenor, Jul 5, 2004 IP
  8. debunked

    debunked Prominent Member

    Messages:
    7,298
    Likes Received:
    416
    Best Answers:
    0
    Trophy Points:
    310
    #8
    SIDs = session IDs
     
    debunked, Jul 5, 2004 IP
  9. alsenor

    alsenor Peon

    #9
    Thanks - that sounds reasonable, but are the instructions clearer now?
     
    alsenor, Jul 5, 2004 IP
  10. stephfoster

    stephfoster Well-Known Member

    Messages:
    567
    Likes Received:
    17
    Best Answers:
    0
    Trophy Points:
    138
    #10
    It's step by step. If your host doesn't have a way for you to edit files directly from your file manager, you'll need to use FTP to download to your machine, open a text editor, make the changes and upload the updated file. To make the changes, you need to open your sessions.php file in the includes directory, find the appropriate text and replace it as shown.

    I've done this kind of modification to my forums, and even though I could edit directly through my file manager, I found it simpler to drop the text into NotePad and use the find function to locate the areas that need to be changed. There can be a lot of information in these files, and finding the right place to make the changes can cause a headache.
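
If you would rather script the find-and-replace than do it by hand, a minimal sketch could look like this. `replace_in_file` is a hypothetical helper, not part of phpBB, and the demo works on a temporary file rather than a real sessions.php. It keeps a .bak copy, in line with the advice to back up files first.

```php
<?php
// Hypothetical helper: back up a file, then replace one exact snippet in it.
function replace_in_file($path, $find, $replace)
{
    $source = file_get_contents($path);
    if ($source === false || strpos($source, $find) === false) {
        return false; // unreadable file, or the FIND text is not present
    }
    if (!copy($path, $path . '.bak')) {
        return false; // refuse to edit without a backup
    }
    $updated = str_replace($find, $replace, $source);
    return file_put_contents($path, $updated) !== false;
}

// Demo on a temporary file standing in for includes/sessions.php.
$file = tempnam(sys_get_temp_dir(), 'sid');
file_put_contents($file, "\$SID = 'sid=' . \$session_id;\n");

var_dump(replace_in_file($file, "\$SID = 'sid=' . \$session_id;", "\$SID = '';"));
echo file_get_contents($file);

// Clean up the demo files.
unlink($file);
unlink($file . '.bak');
```

The function returns false without touching the file when the FIND text is missing, which is a quick way to notice that your copy of the file differs from the one the instructions were written for.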
     
    stephfoster, Jul 5, 2004 IP
  11. misohoni

    misohoni Notable Member

    #11
    Alsenor, my instructions weren't written for a complete novice, but for users with some working knowledge of how HTML works. Editing a PHPBB forum is pretty difficult!

    I edited the files using Dreamweaver and did a "Find and Replace".

    If you have any problems with PHPBB, I'd recommend taking a look at www.phpbb.com/forums/ or www.phpbbhacks.com
     
    misohoni, Jul 5, 2004 IP
  12. alsenor

    alsenor Peon

    #12
    Well, a complete web design novice I am not, but I have not done any php work yet. Have to find time to go through Kevin Yank's book first, which I've planned for a long time already.

    My forums are asp & access based, and I did a lot of mods in it: http://www.ggholiday.com/bg/FORUMS/default.asp

    My Game and Adult sites are mostly htm and asp pages:
    The Battle Group: www.ggholiday.com/bg/
    Adult: www.erotical.list4.us/
    (Sorry, I am not allowed to post live links yet)

    All the same, although editing files on my servers is no problem, I still have no idea what you are getting at - in principle. Session ID is something I don't know about.
    Indulge me, please!
     
    alsenor, Jul 5, 2004 IP
  13. misohoni

    misohoni Notable Member

    #13
    So you haven't even got PHPBB installed?
     
    misohoni, Jul 5, 2004 IP
  14. alsenor

    alsenor Peon

    #14
    PHP and MySQL are installed, as per Kevin's tutorial.
    But I haven't had time yet to go into the juicier parts of the book.
     
    alsenor, Jul 6, 2004 IP
  15. Help Desk

    Help Desk Well-Known Member

    Messages:
    1,365
    Likes Received:
    25
    Best Answers:
    0
    Trophy Points:
    180
    #15
    phpBB should really just add this to the default installation.
     
    Help Desk, Jul 6, 2004 IP
  16. alsenor

    alsenor Peon

    #16
    You mean they should install automatically as per Kevin Yank's instructions? That may be taking it a step too far, since there are many variations of users.
     
    alsenor, Jul 6, 2004 IP
  17. Help Desk

    Help Desk Well-Known Member

    #17
    The phpBB dev crew should either make the default install spiderable or add a flag/checkbox for it in the administration panel. If you have a forum that you don't want spidered, you should use the appropriate robots.txt file.
     
    Help Desk, Jul 6, 2004 IP
  18. alsenor

    alsenor Peon

    #18
    I am much more ignorant about this subject than you realize. What is phpBB?
    A bulletin board, I suppose. I am only familiar with Snitz, which mine is based on.
    http://www.ggholiday.com/bg/FORUMS/default.asp
     
    alsenor, Jul 6, 2004 IP
  19. stephfoster

    stephfoster Well-Known Member

    #19
    OK, there's the problem. The instructions are meant for phpBB. If you want to get other kinds of forums spidered, you'd have to find out how to remove session IDs for them. I don't know anything about the kind of forums you have installed, so I can't help you there.

    Now, for other kinds of php files, removing session ids is only something you need be concerned with if you're doing something that uses them. If not, don't worry about it.
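
For plain PHP scripts that use PHP's built-in sessions (not phpBB's own session code), a minimal sketch of keeping the session ID out of URLs is to turn off `session.use_trans_sid` and accept IDs from cookies only. This assumes your host lets scripts change these settings at runtime; otherwise they would need to be set in php.ini.

```php
<?php
// These calls must run before session_start() and before any output is sent.
ini_set('session.use_trans_sid', '0');    // do not rewrite links to carry the session ID
ini_set('session.use_only_cookies', '1'); // accept the session ID from cookies only

session_start();
```

The trade-off is the same as with the phpBB change above: visitors who refuse cookies will not keep a session between pages.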
     
    stephfoster, Jul 6, 2004 IP
  20. alsenor

    alsenor Peon

    #20
    Snitz boards are a fine piece of work, but based on asp.
    They also have an excellent support group: http://forum.snitz.com/forum/default.asp

    Since I plan to work with php soon, I might as well find out now about phpBB - where can I get it?

    BTW, I think this board here is a fine design as well!
     
    alsenor, Jul 6, 2004 IP