MSN not crawling site much.

Discussion in 'All Other Search Engines' started by MBenkert, Dec 3, 2006.

  1. #1
    Hi,

    In the past, MSN was the search engine that brought me the most traffic. With my new site, it is spidering the least.

    Is there any way to get MSN to crawl more? Do they accept sitemaps?

    I have submitted a sitemap to Google and Yahoo, and this has helped enormously.

    Thanks so much!

    Mindy
     
    MBenkert, Dec 3, 2006
  2. jitendra

    jitendra Peon

    #2
    No, MSN is also crawling the site.
     
    jitendra, Dec 4, 2006
  3. hhheng

    hhheng Banned

    #3
    If you have some budget, try submitting to the LookSmart PPC directory, which was previously MSN's data source. Otherwise, get links to your site placed on other sites and wait.
     
    hhheng, Dec 4, 2006
  4. 010081

    010081 Banned

    #4
    I have the same problem: MSN is too lazy to crawl my site.
     
    010081, Dec 4, 2006
  5. khasmoth

    khasmoth Well-Known Member

    #5
    It's because MSN is still upgrading its search algorithm (aka Live.com).
    I think this issue has been around for two months now.

    Too bad for people who rely on MSN for their traffic.
     
    khasmoth, Dec 4, 2006
  6. jitendra

    jitendra Peon

    #6
    Site Owner Help
    MSN Search Web Crawler and Site Indexing
    Control which pages of your website are indexed
    Use robots.txt to control access to your website or part of the server
    Restrict indexing and link crawling within your website
    Use metadata tags to control page indexing and link crawling
    Limit crawl frequency
    The MSN Search web crawler, MSNBot, lets website owners control which pages MSN Search indexes and how often MSNBot accesses their website.

    You can prevent MSNBot and other standards-compliant crawlers from crawling a server or collecting information and links from specific pages on your website by using a robots.txt file and/or meta tags.

    Note

    If other sites link to your site, your site's URL and any text you include in HTML anchor tags may still be added to our index. However, your site content is not added to the index.


    Use the robots.txt file to control access to your website or part of the server
    To control how and when your website is crawled, create a robots.txt file in the top-level (root) directory of your website. In the robots.txt file, you can specify which web crawlers to allow or block. Note that while MSNBot complies with the standards for robots.txt, not all web crawlers comply.



    To conform to the Robots Exclusion Standard, MSNBot searches for robots.txt. When you create the file, make sure that the file is named robots.txt. Crawling and indexing restrictions may not work correctly if you name the file robot.txt.


    Each time MSNBot crawls your website, it looks in your web server's root directory for a robots.txt file. If the file exists, MSNBot checks to see if MSNBot is an allowed user agent, and if any crawling or indexing restrictions have been set.
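    Because MSNBot only looks for robots.txt in the root directory of the host, a copy placed in a subfolder would not be found. As a rough illustration (Python, standard library only; robots_url() is a made-up helper name), this is how a crawler can derive the one location it checks from any page URL:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the host serving page_url.

    Hypothetical helper: keeps the scheme and host (including any port)
    and replaces the path, query and fragment with /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://www.example.com/blog/2006/12/post.html?page=2"))
```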

    To set which web crawlers can access your website, use the syntax in the table below for your robots.txt file. MSN Search also includes image searching provided by Picsearch. If you do not want your images indexed, you can block the Picsearch crawler, Psbot, as described in the following table.

    Text strings in the robots.txt file are not case-sensitive.

    Allow all robots full access and prevent "file not found: robots.txt" errors:
        Create an empty robots.txt file.

    Allow all robots complete access:
        User-agent: *
        Disallow:

    Allow only MSNBot access:
        User-agent: msnbot
        Disallow:
        User-agent: *
        Disallow: /

    Exclude all robots from the entire server:
        User-agent: *
        Disallow: /

    Exclude only MSNBot:
        User-agent: msnbot
        Disallow: /

    Exclude only Psbot (Picsearch):
        User-agent: psbot
        Disallow: /
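    One way to test rules like these before uploading them is Python's standard urllib.robotparser module, which implements the basic exclusion standard. The sketch below runs two of the examples above against a few user agents; allowed() and the sample paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Two rule sets copied from the examples above.
ONLY_MSNBOT = """\
User-agent: msnbot
Disallow:

User-agent: *
Disallow: /
"""

EXCLUDE_PSBOT = """\
User-agent: psbot
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given rules.

    allowed() is a hypothetical helper; RobotFileParser.parse() and
    can_fetch() are the standard-library calls doing the real work.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for name, rules in [("only msnbot", ONLY_MSNBOT), ("exclude psbot", EXCLUDE_PSBOT)]:
        for agent in ["msnbot/0.3", "Googlebot/2.1", "psbot"]:
            print(name, agent, allowed(rules, agent, "/page.html"))
```

    Note that an empty robots.txt parses to no rules at all, which is why it allows every robot.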

    Restrict indexing and link crawling within your website
    You can block MSNBot from crawling specific file types linked to your website by specifying MSNBot as the user-agent for a Disallow tag that specifies the file types to exclude.

    Restrict MSNBot from indexing specific file types (the "$" is required):
        User-agent: msnbot
        Disallow: /*.[file extension]$

    Example:
        User-agent: msnbot
        Disallow: /*.PDF$
        Disallow: /*.jpeg$
        Disallow: /*.exe$
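    Python's urllib.robotparser does not understand the "*" and "$" wildcards, so here is a minimal sketch of how such a rule can be matched against a URL path. The helper names are hypothetical. The case-insensitive default follows the MSN page's statement that robots.txt strings are not case-sensitive; many other crawlers treat paths as case-sensitive, so treat that default as an assumption. The path is matched together with its query string, so a trailing "$" does not match /report.pdf?page=2.

```python
import re
from typing import Iterable


def rule_to_regex(rule: str, ignore_case: bool = True) -> "re.Pattern[str]":
    """Translate a Disallow path such as '/*.PDF$' into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the match
    at the end of the path. Without '$' the rule is a prefix match.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape everything literally, then turn each escaped '*' into '.*'.
    body = re.escape(rule).replace(r"\*", ".*")
    if anchored:
        body += r"\Z"
    return re.compile(body, re.IGNORECASE if ignore_case else 0)


def is_disallowed(path: str, rules: Iterable[str], ignore_case: bool = True) -> bool:
    """Return True if any Disallow rule matches from the start of path."""
    return any(rule_to_regex(r, ignore_case).match(path) for r in rules)


MSNBOT_RULES = ["/*.PDF$", "/*.jpeg$", "/*.exe$"]

if __name__ == "__main__":
    for p in ["/docs/report.pdf", "/docs/report.pdf?page=2", "/setup.exe", "/index.html"]:
        print(p, is_disallowed(p, MSNBOT_RULES))
```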

    Note

    For more information about robots.txt files, go to the Web Robots pages.

    Use metadata tags to control page indexing and link crawling
    You can allow MSNBot to crawl your website and still restrict access to specific web pages and documents by using the noindex and nofollow meta tags within the page code. The noindex tag allows the web page to be retrieved by MSNBot, but blocks indexing of its content. The nofollow tag blocks the web crawler from following links in the web page that go to other web pages or documents. Note that not all web crawling robots obey these tags.

    In the tag syntax examples below, the name msnbot applies a restriction to MSNBot only, while the name robots applies it to all robots. You can use each directive alone or combine both into a single meta tag.

    Add the tag for each task to the page header:

    Restrict MSNBot from indexing a page:
        <META NAME="msnbot" CONTENT="noindex" />

    Restrict all robots from indexing a page:
        <META NAME="robots" CONTENT="noindex" />

    Restrict MSNBot from following links on a page:
        <META NAME="msnbot" CONTENT="nofollow" />

    Restrict all robots from following links on a page:
        <META NAME="robots" CONTENT="nofollow" />

    Block MSNBot from both indexing and following links:
        <META NAME="msnbot" CONTENT="noindex,nofollow" />

    Prevent MSNBot from caching a page:
        <META NAME="msnbot" CONTENT="nocache" />
        or
        <META NAME="msnbot" CONTENT="noarchive" />
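    To check which of these directives a page actually carries, a small parser built on Python's standard html.parser module is enough. This is a hypothetical sketch, not MSNBot's own logic: it only collects the name/content pairs for robots and msnbot, and it treats nocache and noarchive as equivalent, as the list above does.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="<bot>">."""

    def __init__(self, bot_name: str = "msnbot") -> None:
        super().__init__()
        self.targets = {"robots", bot_name.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <META NAME=...>
        # and <meta name=...> are treated alike. Self-closing tags land here too.
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() not in self.targets:
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str, bot_name: str = "msnbot") -> set:
    parser = RobotsMetaParser(bot_name)
    parser.feed(html)
    parser.close()
    return parser.directives


def may_index(directives: set) -> bool:
    return "noindex" not in directives


def may_cache(directives: set) -> bool:
    # The list above gives "nocache" and "noarchive" as alternatives.
    return not ({"nocache", "noarchive"} & directives)


if __name__ == "__main__":
    page = '<html><head><META NAME="msnbot" CONTENT="noindex,nofollow" /></head></html>'
    found = robots_directives(page)
    print(sorted(found), may_index(found), may_cache(found))
```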

    Limit crawl frequency
    If you occasionally get high traffic from MSNBot, you can specify a crawl delay parameter in the robots.txt file to specify how often, in seconds, MSNBot can access your website. To do this, add this syntax to your robots.txt file:

        User-agent: msnbot
        Crawl-delay: 120
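    A crawler that honors this setting reads the value and spaces its requests accordingly. The Python sketch below reads Crawl-delay with the standard RobotFileParser.crawl_delay() method (Python 3.6 and later); PoliteScheduler is a hypothetical helper that only shows the pacing arithmetic, not a description of how MSNBot schedules itself.

```python
import time
from typing import Callable, Optional
from urllib.robotparser import RobotFileParser

# The rule shown above.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 120
"""


def crawl_delay_for(robots_txt: str, user_agent: str) -> Optional[float]:
    """Return the Crawl-delay (seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)  # available since Python 3.6
    return None if delay is None else float(delay)


class PoliteScheduler:
    """Hypothetical helper: keeps requests to one host `delay` seconds apart."""

    def __init__(self, delay: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.delay = delay
        self.clock = clock
        self.last_fetch: Optional[float] = None

    def seconds_until_next(self) -> float:
        """How long to wait before the next request is allowed."""
        if self.last_fetch is None:
            return 0.0
        return max(0.0, self.last_fetch + self.delay - self.clock())

    def record_fetch(self) -> None:
        self.last_fetch = self.clock()


if __name__ == "__main__":
    print(crawl_delay_for(ROBOTS_TXT, "msnbot/0.3"))     # 120.0
    print(crawl_delay_for(ROBOTS_TXT, "Googlebot/2.1"))  # None
```

    Injecting the clock keeps the pacing logic testable without actually sleeping for two minutes.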
    When you contact us about an issue, include the following information so that we can help you more quickly:

    The target website address that MSNBot put in the robots.txt file
    The date range when the issue occurred
    The access logs
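    Before sending access logs, it can help to pull out just the MSNBot requests and see how fast they arrive. The Python sketch below assumes an Apache-style "combined" log format (an assumption; adjust the regex for your server), and msnbot_hits() and intervals() are made-up helper names. It trusts the user-agent string, which any client can fake.

```python
import re
from datetime import datetime
from typing import Iterable, List, Tuple

# Apache/NCSA "combined" log format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

Hit = Tuple[str, datetime, str, int]  # (host, time, request, status)


def msnbot_hits(lines: Iterable[str]) -> List[Hit]:
    """Keep only log lines whose user-agent string contains 'msnbot'."""
    hits: List[Hit] = []
    for line in lines:
        m = COMBINED.match(line)
        if m is None or "msnbot" not in m.group("agent").lower():
            continue  # unparseable line or some other client
        when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        hits.append((m.group("host"), when, m.group("request"), int(m.group("status"))))
    return hits


def intervals(hits: List[Hit]) -> List[float]:
    """Seconds between consecutive MSNBot requests, in log order."""
    times = [when for _, when, _, _ in hits]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


# Made-up sample lines; the IP and msnbot user agent come from the article quoted later in this thread.
SAMPLE_LOG = [
    '65.54.188.86 - - [05/Dec/2006:07:37:12 -0800] "GET /old-page.html HTTP/1.0" 404 512 "-" '
    '"msnbot/0.3 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.10 - - [05/Dec/2006:07:37:14 -0800] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/4.0"',
    '65.54.188.86 - - [05/Dec/2006:07:37:16 -0800] "GET /older-page.html HTTP/1.0" 404 512 "-" '
    '"msnbot/0.3 (+http://search.msn.com/msnbot.htm)"',
]

if __name__ == "__main__":
    hits = msnbot_hits(SAMPLE_LOG)
    print(len(hits), intervals(hits))
```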
     
    jitendra, Dec 5, 2006
  7. Claudek

    Claudek Well-Known Member

    #7
    I have found on some sites that if MSNBot detects media such as videos on your site, the MSN media bot comes to visit like mad. From what I've experienced, it does appear to get you into MSN slightly faster.
     
    Claudek, Dec 5, 2006
  8. jitendra

    jitendra Peon

    #8
    Google and MSN not crawling new site

    --------------------------------------------------------------------------------

    We have re-styled and updated our website and uploaded the new database (CMS) to the old URL. All search engines bar Google and MSN list the website. The website has been up now for 7 weeks and I can't for the life of me think why they won't crawl it. We use WebTrends and it shows that the site has been visited by Google but not crawled.

    Google still shows the old listing and the cached copy taken on 3 October.

    --------------------------------------------------------------------------------
     
    jitendra, Dec 5, 2006
  9. PinotNoir

    PinotNoir Peon

    #9
    Yeah, I have noticed that Yahoo has been really on the ball these days. MSN used to be constantly crawling.
     
    PinotNoir, Dec 5, 2006
  10. Bison

    Bison Active Member

    #10
    MSN seems to have a problem with spidering deep. It just does not want to do it. I made some site maps, but they have not yielded any results yet. It's been about a month now. Still waiting.

    Bison
     
    Bison, Dec 6, 2006
  11. jitendra

    jitendra Peon

    #11
    Microsoft Crawling Google Results For New Search Engine?
    Jason Dowdell, Expert Author
    Published: 2004-11-11





    I was questioned today by a developer who was watching a particular IP address scan his site. The IP was 65.54.188.86 and is registered to Microsoft Corp., One Microsoft Way, Redmond, Washington 98052. This visitor was not sending the header information normally associated with a crawler, such as a robot name or other identifying info in the User-Agent header, or even a browser name.


    Is MSN Crawling Google?


    Is Microsoft "using" Google's search results to populate their index?

    The behavior it demonstrated made it look like a crawler, especially since it was spidering URLs that no longer existed (search engine spiders crawl site segments at regular intervals and often come back when an initial crawl left URLs uncrawled), and it was doing so at a rate of one page every 3 to 5 seconds. The visitor arrived at 7:37 am and was still on the site at 12:00 pm.

    Correction: the data was there after all. Here's the crawler info: msnbot/0.3 (+http://search.msn.com/msnbot.htm)

    Here's the kicker

    So now you're saying: so what, big deal? But this really is a big deal, not only because the URLs this visitor requested no longer exist, but because the only place these URLs can be found is in Google's search results for site:www.sitename.com. A similar query on MSN Search doesn't show the URLs at all, even on the beta version of Microsoft's new search engine. Yet within just hours of the visitor's exit from the site, the same search on Microsoft's new search engine shows all of the URLs in question fully indexed.

    My Theory On This Mysterious Microsoft Crawler

    The old MSN required a fee to be crawled by its spider. But a few months back, MSN dropped the fee and said it was going to begin crawling the entire web free of charge. However, that's no easy task. So I believe MSN is using the results from Google, and possibly even Yahoo, to get all of the pages those engines have indexed on sites that have a relatively low page count in the current MSN search engine.

    First off, that's the fastest way to get the relevant pages from a website. Sure, they could just go to the site directly and start crawling, but in doing so they're going to get tons of duplicate URLs and URLs that look different but point to the same content. Crawling Google's results will reduce the bandwidth needed to some extent, but it will not completely take care of the duplicate content issue their spider will encounter.

    Secondly, crawling Google's results can act as a qualitative measure for their new search engine. By creating a baseline number of pages per site when the new Microsoft Search launches and running a comparison at regular intervals over the next six months, they'll be able to determine internally whether their engine is finding and indexing the same links, and as many links, as Google. Call it competitive analysis or whatever you want.

    So Microsoft's Screen Scraping?

    Obviously my conclusion should be taken with a grain of salt, but it's a definite possibility. Microsoft very well could be screen scraping Google (or maybe even using their API, LOL) and crawling the URLs it finds. It makes sense as a business case, but I wonder whether there are any legal issues there. I doubt it. It's like putting garbage out on the curb: once it's out there, it's fair game. But I bet Google's lawyers would have more to say than that on the case.

    Has anyone out there seen similar behavior on their own sites? Please comment with your qualitative/objective data if so.
     
    jitendra, Dec 8, 2006
  12. jitendra

    jitendra Peon

    #12
    MSN is working on a Web crawler, known as MSNBot, that it will likely make part of its next MSN search engine.


    A growing number of Web sites are reporting sightings of the MSNBot as it crawls their Web sites.


    "Looks like the competition will shortly be h(ea)ting up in the search engine world," says one poster, Martin Belam. "Microsoft have started sending their own MSNBot out scouring the web to build a new index for MSN search."


    Microsoft has created a Web page, "MSN Search Prototype Web Crawler", to answer a number of questions about its bot.


    "MSNBOT is still in the development and research stage," notes Microsoft on its site.


    "This crawl is a prototype work. MSNBOT is not currently indexing for the MSN Search Engine, so your site may or may not show up in MSN Search results today," according to the site. "Although we have not set a date, it is our intention to eventually integrate the crawled contents into MSN Search results."


    A number of industry watchers have been expecting Microsoft to build its own bot since Yahoo purchased Inktomi, one of Microsoft's OEM search partners, earlier this year.


    "With Yahoo grabbing Inktomi from out under MSN, I suspect that they're deciding to build/buy their own Internet crawler," said one search-engine consultant. "And I'm curious as to whether they view paid inclusion as a business model they're interested in."


    The current MSN Search engine does not include its own built-in spider, or bot. Instead, MSN has relied on LookSmart and Inktomi for its directory and search results. MSN also works with Overture on paid placements and sponsorships for its site. MSN has denied talk that it plans to dump Overture as its paid-search partner and get into that business itself.


    When asked for comment on the timing of MSNBot, an MSN spokeswoman had this to say: "While MSN does not have any specific timelines or plans to discuss at this time, they are committed to delivering the most relevant search results for consumers. Today, MSN Search takes an approach that utilizes both internal technology, as well as the technology of third-party companies, including Looksmart, Inktomi, Overture and Girafa. MSN is strongly committed to continuing to improve their search experience through developing technology internally and continuing to work with partners."


    But Microsoft has made no bones about the fact that it wants to "out-google Google" — both with its next-generation MSN Search engine, as well as with new search technology that it is building into Longhorn Windows client and other future Microsoft product releases.


    Yusuf Mehdi, Corporate Vice President, MSN Personal Services and Business, recently told attendees of the Goldman Sachs Internet conference that Microsoft is focusing more on the algorithmic part of search than on paid placements.


    "What Google has done in terms of doing a great end-user experience, and what's going on with paid click, has led us to basically go back and redouble our efforts," Mehdi said. "So, one of the things that we are doing now is really looking at this algorithmic search area, and we are investing a lot to go and build what we expect and hope will be the best-in-class search service in the near future."
     
    jitendra, Dec 8, 2006