Check your site for broken links with a crawler or spider (useful on Linux)

http://www.pc-freak.net/blog/checking-your-website-for-broken-links-on-linux-with-linkchecker-and-htcheck-how-to-find-broken-links-on-your-website/
http://htcheck.sourceforge.net

Pros (the "Spider" or "Crawler"):
- HTTP/1.1 compliant, with persistent connections and cookie support
- HTTP Basic authentication supported
- HTTP proxy support (Basic authentication included)
- Crawl customisable through many configuration attributes, which let the user limit the digging by URL pattern matching and by distance ("hops") from the first URL
- MySQL databases created directly by the spider
- MySQL connections through user or general option files as defined by the database system (/etc/my.cnf or ~/.my.cnf)

Cons (for htcheck, but not for linkchecker): no support for JavaScript, nor for other protocols such as HTTPS, FTP, NNTP and local files.
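
To make the idea of limiting a crawl by URL pattern and by "hops" more concrete, here is a minimal Python sketch of a breadth-first broken-link check. It is not how htcheck or linkchecker are implemented; the names `find_broken_links`, `LinkExtractor`, `fetch` and `fake_fetch`, and the `example.test` pages, are all hypothetical and exist only for this illustration. The `fetch` callable is assumed to return a `(status_code, html_text)` pair, so the example runs against an in-memory site instead of the network.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_broken_links(start_url, fetch, max_hops=2, url_pattern=r".*"):
    """Breadth-first crawl; return (broken_url, referrer) pairs.

    fetch(url) is an assumed callable returning (status_code, html_text).
    Pages that are max_hops away from start_url are still checked,
    but their own links are not followed.
    """
    allowed = re.compile(url_pattern)
    seen = {start_url}
    queue = deque([(start_url, None, 0)])
    broken = []
    while queue:
        url, referrer, hops = queue.popleft()
        status, body = fetch(url)
        if status >= 400:
            broken.append((url, referrer))
            continue
        if hops >= max_hops:
            continue
        parser = LinkExtractor()
        parser.feed(body)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            target = urldefrag(urljoin(url, href)).url
            if target not in seen and allowed.match(target):
                seen.add(target)
                queue.append((target, url, hops + 1))
    return broken


# A made-up in-memory site standing in for real HTTP requests.
SITE = {
    "http://example.test/": (200, '<a href="/a">a</a> <a href="/missing">x</a>'),
    "http://example.test/a": (200, '<a href="/b">b</a>'),
    "http://example.test/b": (200, '<a href="/deep-missing">y</a>'),
}


def fake_fetch(url):
    """Return the stored page, or a 404 for anything unknown."""
    return SITE.get(url, (404, ""))


if __name__ == "__main__":
    for url, referrer in find_broken_links("http://example.test/", fake_fetch, max_hops=3):
        print(f"broken: {url} (linked from {referrer})")
```

With a small `max_hops` the crawl stops before it reaches the deeper broken link, which is the trade-off the "hops" setting controls. To try this against a real site, `fake_fetch` could be swapped for a function built on `urllib.request`, which is left out here to keep the sketch self-contained.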