I'd like to be able to plug in my site and have a tool find errors, like 404s, but most importantly tell me HOW it got to the link — which page the bad link is on — so I can track these errors down.
You can always set up a custom 404 page. I do this and it saves a lot of visitors. Here's a guide: http://www.thesitewizard.com/archive/custom404.shtml
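If your site runs on Apache, the usual way to wire up a custom 404 page is an ErrorDocument directive in your .htaccess (or server config). A minimal sketch — the page path /errors/404.html is just an example, use whatever your custom page is called:

```apache
# Serve a friendly custom page instead of the default Apache 404.
# /errors/404.html is an illustrative path; point it at your own page.
ErrorDocument 404 /errors/404.html
```

Use a site-root-relative path (starting with /) rather than a full URL, so the visitor still gets a real 404 status code instead of a redirect.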
To do a quick check by hand, go to the Start button and choose the Run command, then type cmd. When the command window pops up, type ping followed by the site name (e.g. ping example.com) and you'll see whether the name resolves and the server answers, or whether you've entered the name of the website wrongly.
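The same first step that ping performs — checking whether the name resolves at all — can be scripted if you have a list of hostnames to check. A quick sketch in Python (the hostnames in the demo are placeholders):

```python
import socket

def resolves(hostname: str) -> bool:
    """Return True if DNS (or the hosts file) can resolve the hostname."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        # Name lookup failed: typo, dead domain, or no DNS available.
        return False

if __name__ == "__main__":
    # "localhost" should always resolve; the .invalid TLD is reserved
    # (RFC 2606) and is guaranteed never to resolve.
    for name in ("localhost", "no-such-host.invalid"):
        print(name, "->", resolves(name))
```

Note this only tells you the name exists, not that any particular page on the site does — for that you still need a spider or the server logs.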
I'm trying to flush out the errors on my site. I'm getting 404 errors on my site map and Google is crawling some of them, but I can't figure out where the crawler GOES to reach those URLs — i.e. which page the broken link is sitting on.
Any GET that results in an error should leave a line in the error log, e.g. /var/log/apache2/error.log. It will look something like this:

```
[Sat Dec 13 23:04:48 2008] [error] [client 192.168.1.47] File does not exist: /home/gt/public_html/some.html, referer: http://koko/~gt/test.html
```

From there you can extract the bad link address and the page it is on.

You could also spider your site. Use the utility wget — see wget for Windows, or use your Linux package manager. From the command line, enter:

```
$ wget --spider -r http://mysite.com/
```

I made a local test file for demo purposes. There are two links, one good, one not:

```
gt@aretha:~$ wget --spider -r http://koko/~gt/test.html
Spider mode enabled. Check if remote file exists.
--2008-12-13 23:34:47--  http://koko/~gt/test.html
Resolving koko... 192.168.1.10
Connecting to koko|192.168.1.10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 722 [text/html]
Remote file exists and could contain links to other resources -- retrieving.

--2008-12-13 23:34:47--  http://koko/~gt/test.html
Reusing existing connection to koko:80.
HTTP request sent, awaiting response... 200 OK
Length: 722 [text/html]
Saving to: `koko/~gt/test.html'

100%[======================================>] 722         --.-K/s   in 0s

2008-12-13 23:34:47 (93.9 MB/s) - `koko/~gt/test.html' saved [722/722]

Loading robots.txt; please ignore errors.
--2008-12-13 23:34:47--  http://koko/robots.txt
Reusing existing connection to koko:80.
HTTP request sent, awaiting response... 404 Not Found
2008-12-13 23:34:47 ERROR 404: Not Found.

Removing koko/~gt/test.html.

Spider mode enabled. Check if remote file exists.
--2008-12-13 23:34:47--  http://koko/~gt/some.html
Reusing existing connection to koko:80.
HTTP request sent, awaiting response... 404 Not Found
Remote file does not exist -- broken link!!!

Spider mode enabled. Check if remote file exists.
--2008-12-13 23:34:47--  http://koko/~gt/new.html
Connecting to koko|192.168.1.10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 463 [text/html]
Remote file exists and could contain links to other resources -- retrieving.

--2008-12-13 23:34:47--  http://koko/~gt/new.html
Reusing existing connection to koko:80.
HTTP request sent, awaiting response... 200 OK
Length: 463 [text/html]
Saving to: `koko/~gt/new.html'

100%[======================================>] 463         --.-K/s   in 0s

2008-12-13 23:34:47 (131 MB/s) - `koko/~gt/new.html' saved [463/463]

Removing koko/~gt/new.html.

Found 1 broken link.

http://koko/~gt/some.html

FINISHED --2008-12-13 23:34:47--
Downloaded: 2 files, 1.2K in 0s (105 MB/s)
gt@aretha:~$
```

I don't know how Google uses the sitemap.xml. Assuming your sitemap looks something like this:

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://gtwebdev.com/</loc>
    <priority>1.0</priority>
  </url>
  ...
</urlset>
```

make a working copy of the unzipped XML file, then run a couple of find/replace operations so that each <loc> line,

```
<loc>http://gtwebdev.com/</loc>
```

looks like this:

```
<a href="http://gtwebdev.com/">xxx</a>
```

Then run wget again with different options, pointing -i at the working copy you just edited (sitemap-links.html here, or whatever you named it) rather than at the original XML — --force-html tells wget to treat the input file as HTML and follow the href links in it:

```
$ wget --spider --force-html -i sitemap-links.html
```

cheers,

gary
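If the error log is large, grepping out the broken-link/referer pairs by eye gets tedious. A short script can do it — a sketch in Python, assuming the Apache 2.2-style "File does not exist: ..., referer: ..." lines shown above:

```python
import re

# Matches Apache error-log lines like:
# [Sat Dec 13 23:04:48 2008] [error] [client 192.168.1.47] File does not
#   exist: /home/gt/public_html/some.html, referer: http://koko/~gt/test.html
PATTERN = re.compile(r"File does not exist: (?P<path>[^,]+), referer: (?P<referer>\S+)")

def broken_links(log_lines):
    """Yield (missing_path, referring_page) pairs from error-log lines."""
    for line in log_lines:
        m = PATTERN.search(line)
        if m:
            yield m.group("path"), m.group("referer")

if __name__ == "__main__":
    # Inline sample for demo; in practice: open("/var/log/apache2/error.log")
    sample = [
        "[Sat Dec 13 23:04:48 2008] [error] [client 192.168.1.47] "
        "File does not exist: /home/gt/public_html/some.html, "
        "referer: http://koko/~gt/test.html",
    ]
    for path, referer in broken_links(sample):
        print(f"{path} is linked from {referer}")
```

Each pair tells you exactly what the OP asked for: the missing file, and the page that links to it.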
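The loc-to-href rewrite can also be done with a tiny script instead of hand-run find/replace — a sketch in Python using the standard library's XML parser (file names and output layout are just examples):

```python
import xml.etree.ElementTree as ET

# Sitemaps live in this XML namespace, so element lookups must include it.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_to_html(xml_text: str) -> str:
    """Turn a sitemap's <loc> URLs into a plain HTML page of links,
    suitable for crawling with: wget --spider --force-html -i <file>"""
    root = ET.fromstring(xml_text)
    links = [
        f'<a href="{loc.text.strip()}">link</a>'
        for loc in root.iter(NS + "loc")
    ]
    return "<html><body>\n" + "\n".join(links) + "\n</body></html>"

if __name__ == "__main__":
    # Inline demo; in practice read sitemap.xml and write the result
    # out to a working copy for wget to crawl.
    demo = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://gtwebdev.com/</loc><priority>1.0</priority></url>
</urlset>"""
    print(sitemap_to_html(demo))
```

Parsing the XML properly sidesteps the edge cases a raw find/replace can trip over (extra whitespace inside <loc>, the other child tags like <priority>).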