I was looking around for an easy-to-use, no-fuss command line tool to check the links on a web site. First I tried [wget]:
```
wget -o wget.log -nv -r -p
```
The resulting `wget.log` contains all the links that were followed. It’s easy to spot the errors but there is no obvious way to get hold of the referrer.
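For spotting the errors, a plain `grep` over the log is usually enough. The sketch below is illustrative only: the sample log lines mimic the shape of `wget -nv` output (the exact format varies between wget versions, so treat it as an assumption), and the `grep` simply pulls out the error lines with one line of context.

```shell
# Illustrative only: these lines mimic the assumed shape of wget -nv output.
cat > wget.log <<'EOF'
2024-01-01 12:00:00 URL:http://example.com/ [1234] -> "index.html" [1]
http://example.com/missing.html:
2024-01-01 12:00:01 ERROR 404: Not Found.
EOF

# Show each error line plus the line before it (which names the failing URL).
grep -B1 'ERROR' wget.log
```

This finds the broken URLs quickly, but, as noted above, the page that *linked* to them is not in the output.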
Next was [linkchecker]:
```
linkchecker -t3 --no-warnings -Fblacklist/blacklist.out http://
```
This produces a list of broken links in `blacklist.out`. There is no referrer information in it, but one can recover the referrer by cross-referencing the full log in `linkchecker.log`. That is not entirely trivial, though; it is certainly beyond `grep`. More significantly, linkchecker seems to run *forever*, checking the same links over and over again; I gave up after it had spent an hour checking 100,000 links on a site that contains no more than a few hundred actual links.
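The cross-referencing is beyond `grep` because the URL and its referrer sit on different lines of the log, but awk's paragraph mode handles it in one pass. A minimal sketch, assuming linkchecker's text log groups each checked URL into a blank-line-separated block with `URL`, `Parent URL` and `Result` lines (the sample data below is made up to match that assumed format):

```shell
# Illustrative only: a fake linkchecker.log in the assumed block format.
cat > linkchecker.log <<'EOF'
URL        `http://example.com/missing.html'
Parent URL http://example.com/index.html, line 12, col 3
Result     Error: 404 Not Found

URL        `http://example.com/ok.html'
Parent URL http://example.com/index.html, line 20, col 3
Result     Valid: 200 OK
EOF

# RS="" puts awk in paragraph mode, so each log block is one record and
# each line is one field; print the URL and its parent for every error.
awk 'BEGIN { RS=""; FS="\n" }
     /Result +Error/ { print $1; print $2 }' linkchecker.log
```

That recovers the referrer, but it does nothing about the runtime problem.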
Finally, I tried [linklint]:
```
linklint -error -warn -xref -forward -out linklint.out -net -http -host
```
This completed in a few minutes and produced a nice report in `linklint.out`. The report contains a summary of the kinds of links, files and errors found, a per-referrer breakdown of all broken links, and a list of all moved URLs referenced by the site. This is pretty much exactly what I was after!
All three tools are available as Debian packages. linklint development seems to have stopped a few years ago, yet it was the best of the bunch for what I was trying to achieve. YMMV.