Tool to see all URLs of a Site?

Discussion in 'Search Engine Optimization' started by SEOThomas, May 6, 2010.

  1. #1
    I am working on cleaning up the URLs at my company. We have literally thousands of URLs out there. Is there a tool that will show all of them? Yahoo Site Explorer only shows 1k but I would like to have a gigantic list.
     
    SEOThomas, May 6, 2010
  2. jaredg

    jaredg Peon

    #2
    I don't know of any tools that do this directly (although they exist), but you might want to get your hands on a "link verifier" that crawls your pages and checks the integrity of your links. I used one years ago that spit out all the URLs it crawled while it did this.

    Good luck!
     
    jaredg, May 6, 2010
  3. TheGoogleGurus

    TheGoogleGurus Guest

    #3
    Have you tried typing site:yourdomain.com into Google? This will show you which pages they have in their index.
     
    TheGoogleGurus, May 6, 2010
  4. haritash

    haritash Guest

    #4
    site:yourdomain.com. But with this command we only get the pages that Google has indexed.
     
    haritash, May 6, 2010
  5. PhilipSEO

    PhilipSEO Notable Member

    #5
    It will not show all of them, not for a site of this size. The site: operator is hugely buggy and unreliable; forget it for large sites. Plus, there will be pages that Google has not indexed.

    What you need is a tool that will start from your home page, follow all the links to all the pages, and report the URLs. Luckily, there is such a tool, and it's free: Xenu's Link Sleuth. You can download it here:
    http://home.snafu.de/tilman/xenulink.html

    I don't know if it will find orphaned pages (ones that are not linked from the site).
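
For anyone curious what "start from the home page and follow all the links" looks like in practice, here is a rough Python sketch. It is not how Xenu is implemented, just a minimal breadth-first crawl under a few assumptions: only <a href> links are followed, only pages on the same host are visited, and the names LinkParser, fetch_html and crawl are made up for this example. A crawl like this one can only reach pages that are linked from somewhere, so it will not see orphaned pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download a page; return its HTML, or None on errors or non-HTML content."""
    try:
        with urlopen(url, timeout=10) as response:
            if "html" not in response.headers.get("Content-Type", ""):
                return None
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return None


def crawl(start_url, fetch=fetch_html, max_pages=10000):
    """Breadth-first crawl that stays on the start URL's host.

    max_pages caps the number of fetch attempts so the crawl always ends.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        page = queue.popleft()
        html = fetch(page)
        fetched += 1
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Make the link absolute and drop any #fragment.
            url = urldefrag(urljoin(page, href)).url
            if urlparse(url).netloc == host and url not in seen:
                seen.add(url)
                queue.append(url)
    return sorted(seen)
```

Calling crawl("http://www.yourdomain.com/") (a placeholder domain) returns a sorted list of every URL the crawl reached. Because the host has to match exactly, the www. and non-www. versions of a site are crawled as separate sites.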

    Since you are cleaning up URLs, I suggest you don't simply delete any of them. Use 301 redirects to other pages (to save your link juice and PR). Don't use any other redirects: only 301s are SEO-friendly.
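
As an illustration of keeping the 301s organized, here is a small Python sketch that turns an old-to-new URL map into redirect rules. It assumes an Apache server using the mod_alias "Redirect 301" directive, which the original post does not mention; other servers need a different rule format. The function name redirect_rules and all URLs are hypothetical.

```python
from urllib.parse import urlparse


def redirect_rules(redirect_map):
    """Turn {old_url: new_url} into one 'Redirect 301' line per old URL.

    Assumption: Apache mod_alias syntax. Only the path of the old URL is
    used, so old URLs that differ only by query string are not handled here.
    """
    rules = []
    for old_url, new_url in sorted(redirect_map.items()):
        old_path = urlparse(old_url).path or "/"
        rules.append(f"Redirect 301 {old_path} {new_url}")
    return rules


# Hypothetical URLs, for illustration only.
redirect_map = {
    "http://www.example.com/old/products.html": "http://www.example.com/products/",
    "http://www.example.com/old/about.html": "http://www.example.com/about/",
}
print("\n".join(redirect_rules(redirect_map)))
```

Keeping the map in one place makes it easier to review which old URL points where before anything goes live.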

    Also recall that URLs with www. and without www. are different URLs.

    I hope this helps!
     
    PhilipSEO, May 6, 2010
  6. SEOThomas

    SEOThomas Peon

    #6
    Thanks very much... and I do plan on 301-redirecting a bunch of the junk URLs to the more top-level domains. This is a mass cleanup of sorts.
     
    SEOThomas, May 7, 2010
  7. asghar.paracha

    asghar.paracha Well-Known Member

    #7
    Does this tool work for dynamic websites?
     
    asghar.paracha, May 7, 2010
  8. growclicks

    growclicks Peon

    #8
    Google "Link Gopher"... it is a Firefox plugin, very handy.
     
    growclicks, May 8, 2010
  9. social-media

    social-media Member

    #9
    You will want to use several sources and consolidate the data into a master list. Download Xenu's Link Sleuth for free. Enter your home page URL under Check URL and watch it run. Not only will it crawl your site and itemize all URLs, it will include all image URLs, JavaScript URLs, CSS URLs, etc. It will show you the value of the <title> of every page. It will show you the return status (200, 404, 301, 302, etc.), so it's very useful for detecting your redirects and broken links. Once it's finished, you can export all of the data to a tab-separated file that you can load into Excel to work with.
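
If you would rather script this step than work in Excel, a tab-separated export can be grouped by status code with a few lines of Python. This is only a sketch: the column names "Address" and "Status-Code" are assumptions, so check the header row of your own export and pass the right names. The function name urls_by_status is made up for this example.

```python
import csv
from collections import defaultdict


def urls_by_status(tsv_path, url_column="Address", status_column="Status-Code"):
    """Group the URLs in a tab-separated crawl export by HTTP status code.

    The default column names are assumptions; adjust them to match the
    header row of the actual export file.
    """
    groups = defaultdict(list)
    with open(tsv_path, newline="", encoding="utf-8", errors="replace") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            url = (row.get(url_column) or "").strip()
            status = (row.get(status_column) or "").strip()
            if url:
                groups[status].append(url)
    return dict(groups)
```

With the result in hand, groups.get("404", []) would list the broken links and groups.get("301", []) the existing redirects.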

    Use the site: command at all of the major engines and dump the results into Excel.

    If you have a web analytics package like Omniture Discover, then query your analytics to get a list of all unique URLs requested on your server(s) over the last X months (however far back you have data).
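
If there is no analytics package to query, the raw web server access logs can give a similar list. Note that this swaps in a different technique from the one described above. The sketch below assumes the logs are in the Common or Combined Log Format used by default on many servers; the function name requested_paths is hypothetical.

```python
import re

# Matches the quoted request in a Common/Combined Log Format line,
# e.g. "GET /products?id=7 HTTP/1.1", and captures the requested path.
REQUEST = re.compile(r'"(?:GET|HEAD|POST) (\S+) HTTP/[\d.]+"')


def requested_paths(log_lines):
    """Return the sorted unique paths (query strings included) found in log lines."""
    paths = set()
    for line in log_lines:
        match = REQUEST.search(line)
        if match:
            paths.add(match.group(1))
    return sorted(paths)
```

An open log file can be passed in directly, since iterating over a file yields its lines. The result includes requests that returned errors, so it is worth filtering before merging it into the master list.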

    Get a copy of your entire folder structure from your web server (assuming you're not using a CMS) and copy it to your hard drive. Go through it folder by folder to look for pages that can still be requested but might not be linked to from your existing site. I did this for a huge commercial PR7 site when we were doing a redesign and actually found 5 other very old versions of the site that were still on their servers, still accessible, and still indexed by the search engines because they still had inbound links. I redirected them all to the pages on the new site that most closely resembled the corresponding pages on the old versions of the site.
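
Going through the folders by hand gets tedious on a big site, so here is a rough Python sketch of the same idea. It assumes a plain static site where each file under the web root is served at the matching URL path (no CMS, no URL rewriting). The names files_as_urls and orphan_candidates are made up, and crawled_urls stands for whatever list of linked URLs you already have.

```python
import os


def files_as_urls(web_root, base_url):
    """Map every file under web_root to the URL it would be served at.

    Assumption: files map one-to-one onto URL paths below base_url.
    """
    urls = set()
    for folder, _subfolders, filenames in os.walk(web_root):
        for filename in filenames:
            relative = os.path.relpath(os.path.join(folder, filename), web_root)
            urls.add(base_url.rstrip("/") + "/" + relative.replace(os.sep, "/"))
    return urls


def orphan_candidates(web_root, base_url, crawled_urls):
    """Files on disk whose URL never showed up in the list of crawled URLs."""
    return sorted(files_as_urls(web_root, base_url) - set(crawled_urls))
```

Anything this returns is only a candidate: it still needs a manual check to see whether the file can really be requested and whether it has inbound links worth redirecting.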

    Get all of this data into Excel, sort by URL, and write a little macro to eliminate duplicates.

    PS: If your URLs sometimes contain query string parameters, then you'll want to know about those as well. Treat each version of a URL with a different combination of query string parameters as a different URL/page, because that is how the search engines will see them (and oftentimes they render different content on a site).
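
The sort-and-dedupe step does not have to be an Excel macro. Here is a minimal Python sketch of the same consolidation, with a made-up function name (master_list) and placeholder URLs. It compares URLs as exact strings on purpose, so versions that differ by query string or by www. stay separate entries.

```python
def master_list(*sources):
    """Merge URL lists from several sources into one sorted, duplicate-free list.

    URLs are compared as exact strings (after trimming whitespace), so
    query-string variants and www./non-www. variants remain distinct.
    """
    unique = set()
    for source in sources:
        for url in source:
            url = url.strip()
            if url:
                unique.add(url)
    return sorted(unique)


# Placeholder data standing in for a crawl export and a search engine dump.
from_crawl = ["http://www.example.com/", "http://www.example.com/page?id=1"]
from_search = ["http://example.com/", "http://www.example.com/page?id=2",
               "http://www.example.com/ "]
for url in master_list(from_crawl, from_search):
    print(url)
```

The trailing space on the last placeholder URL is deliberate: it shows that stray whitespace from copy and paste does not create a fake duplicate.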
     
    social-media, May 8, 2010