Building a crawler

Discussion in 'Google Sitemaps' started by Personaltrainer, May 10, 2007.

  1. Personaltrainer
    Hi,
    We are in the process of building a customised site crawler, and we have been quite successful so far. But I have a question for the expert coders: is it possible to fetch the last-modified date of a page from anywhere, and if so, how is it done?
     
    Personaltrainer, May 10, 2007
  2. infoway

    infoway Guest

    Please let me know your budget by PM and we can discuss it. We have already completed a crawler development project.
     
    infoway, May 23, 2007
  3. MaxPowers

    MaxPowers Well-Known Member

    The server may give it up in the Last-Modified HTTP response header... other than that, servers are built securely enough that they don't release this info unless configured to do so.

    Apache does it, IIS does it (for static files)... PHP usually breaks it, since the pages are technically created 'on the fly'. You would need to incorporate code like I have on AutoMapIt that adds these headers to the pages (http://www.automapit.com/servhead.zip), but that is up to the webmaster to add to their site. From the spider's perspective, it's all about what comes back in the HTTP headers. Either you get it or you don't; it depends on the server setup and other factors.
     
    MaxPowers, Jun 6, 2007
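The thread shows no code (the contents of servhead.zip are not reproduced here), so the following is only a minimal Python sketch of both sides of the exchange MaxPowers describes, not his implementation. On the server side, a page generated on the fly sets Last-Modified itself; on the crawler side, a HEAD request reads that header if the server chose to send it. The names `make_app`, `parse_last_modified` and `fetch_last_modified`, and the idea of taking the date from the modification time of whatever file the page is built from (`content_path`), are hypothetical illustrations. Only the standard-library calls (`urllib.request`, `email.utils.formatdate`, `email.utils.parsedate_to_datetime`, `os.path.getmtime`) are real APIs.

```python
import os
import urllib.request
from datetime import datetime
from email.utils import formatdate, parsedate_to_datetime
from typing import Callable, Iterable, List, Optional, Tuple


# --- Server side: a page built "on the fly" that still sends Last-Modified ---

def make_app(content_path: str) -> Callable:
    """Return a minimal WSGI app whose Last-Modified header comes from the
    modification time of content_path (hypothetical: the file the page is built from)."""

    def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
        mtime = os.path.getmtime(content_path)  # when the underlying content last changed
        body = b"<html><body>Generated on the fly</body></html>"
        headers: List[Tuple[str, str]] = [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
            # formatdate(..., usegmt=True) yields an HTTP-date,
            # e.g. 'Thu, 01 Jan 1970 00:00:00 GMT'.
            ("Last-Modified", formatdate(mtime, usegmt=True)),
        ]
        start_response("200 OK", headers)
        return [body]

    return app


# --- Crawler side: read Last-Modified if the server sent it ---

def parse_last_modified(value: Optional[str]) -> Optional[datetime]:
    """Turn an HTTP-date string into a datetime; None if missing or malformed."""
    if not value:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Malformed date: treat it the same as a missing header.
        return None


def fetch_last_modified(url: str, timeout: float = 10.0) -> Optional[datetime]:
    """Send a HEAD request (headers only, no body) and return Last-Modified, or None."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_last_modified(response.headers.get("Last-Modified"))
```

To try it locally, one could serve `make_app("page_template.html")` (a hypothetical file name) with `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` and call `fetch_last_modified("http://127.0.0.1:8000/")` from the crawler. A `None` result is the "you don't get it" case from the post above: the server simply did not send the header.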