link checker

Posted on June 8, 2006

I was looking around for an easy-to-use, no-fuss command-line tool to check the links on a web site. First I tried [wget][]:

    wget -o wget.log -nv -r -p

The resulting `wget.log` contains all the links that were followed. It’s easy to spot the errors, but there is no obvious way to get hold of the referrer, i.e. the page that contains the broken link.
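
Spotting the errors can be scripted. The following is only a sketch: it assumes that `wget -nv` logs a failed fetch as the URL followed by a colon on one line, with an `ERROR <code>: <reason>` line directly after it, which may not hold for every wget version. The function name `wget_errors` is made up for illustration.

```shell
# wget_errors LOGFILE  (hypothetical helper name)
# Prints "URL<TAB>status" for every fetch that wget logged as an error.
# Assumed log shape (may differ between wget versions):
#   http://example.com/missing.html:
#   2006-06-08 10:00:02 ERROR 404: Not Found.
wget_errors() {
    awk '
        # An ERROR line whose previous line ended in ":" closes one failure.
        / ERROR [0-9]+:/ && prev ~ /:$/ {
            url = substr(prev, 1, length(prev) - 1)  # drop the trailing colon
            status = $0
            sub(/^.* ERROR /, "", status)            # keep e.g. "404: Not Found."
            print url "\t" status
        }
        { prev = $0 }
    ' "$1"
}
```

Running `wget_errors wget.log` prints one tab-separated URL and status per failure. The referrer is still missing from this output.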

Next was [linkchecker][]:

    linkchecker -t3 --no-warnings -Fblacklist/blacklist.out http:// > linkchecker.log

This produces a list of broken links in `blacklist.out`. That file contains no referrer information, but one can get hold of it by cross-referencing the full log in `linkchecker.log`. That is not entirely trivial though; it’s certainly beyond `grep`. More significantly, linkchecker seems to run *forever*, checking the same links over and over again. I gave up after it had run for an hour and checked 100,000 links on a site that contains no more than a few hundred actual links.
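
The cross-referencing might be sketched with `awk` rather than `grep`. This is a sketch under assumptions, not a tested recipe for any particular linkchecker release: it assumes the text log consists of records with `Parent URL`, `Real URL` and `Result` lines, that failures have a `Result` starting with `Error:`, and that the failing records appear in the log. It reads the referrers straight from `linkchecker.log` instead of starting from `blacklist.out`, and `linkchecker_referrers` is a hypothetical name.

```shell
# linkchecker_referrers LOGFILE  (hypothetical helper name)
# Prints "broken URL<TAB>referring page" once per distinct pair.
# Assumed record shape in the text log (may differ between versions):
#   Parent URL http://example.com/a.html, line 10, col 5
#   Real URL   http://example.com/gone.html
#   Result     Error: 404 Not Found
linkchecker_referrers() {
    awk '
        /^Parent URL/ { parent = $3; sub(/,$/, "", parent) }  # strip trailing comma
        /^Real URL/   { real = $3 }
        /^Result/ {
            # A Result line ends the record; report it only if it failed.
            if ($2 == "Error:") print real "\t" parent
            parent = ""; real = ""
        }
    ' "$1" | sort -u   # collapse links that were checked more than once
}
```

Running `linkchecker_referrers linkchecker.log` prints each broken URL next to the page that links to it, separated by a tab.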

Finally, I tried [linklint][]:

    linklint -error -warn -xref -forward -out linklint.out -net -http -host /@

This completed in a few minutes and produced a nice report in `linklint.out`. The report contains a summary of the kinds of links, files and errors found, a per-referrer breakdown of all broken links, and a list of all moved URLs referenced by the site. This is pretty much exactly what I was after!

All three tools are available as Debian packages. linklint development seems to have stopped a few years ago, yet it was the best of the bunch for what I was trying to achieve. YMMV.

[wget]: http://www.gnu.org/software/wget/
[linkchecker]: http://linkchecker.sourceforge.net/
[linklint]: http://www.linklint.org/

Comment

  1. nosebreaker.com says:

    Linklint is available at http://www.linklint.org
