
#1 G0rman

Simple website monitor

Posted 31 August 2012 - 11:24 AM

Description: This is a small program that demonstrates how to extract information from a website. The extracted content is hashed, so only a short digest has to be saved instead of the page itself. That is a useful trick if you only care whether /something/ has changed, and not /what/ has changed. If you want to learn more about scraping and how to write a more complex script, I wrote a short tutorial on scraping recently.
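The hashing trick can be tried on its own before touching a real website. This is only a minimal sketch, assuming openssl is installed; the helper name content_hash is made up for illustration and is not part of the script.

```shell
# Hypothetical helper: print only the MD5 digest of whatever arrives on stdin.
content_hash() {
	# Depending on the OpenSSL version the output is either the bare digest
	# or something like "(stdin)= <digest>"; awk keeps just the last field.
	openssl md5 | awk '{ print $NF }'
}

old=$(printf 'first version of the page\n' | content_hash)
new=$(printf 'second version of the page\n' | content_hash)

# Comparing two short digests is enough to tell that something changed.
if [ "$old" != "$new" ]; then
	echo "content changed"
else
	echo "content unchanged"
fi
```

Only the digest needs to be stored between runs, no matter how large the page is.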

You need the following programs to run this script: bash, cat, echo, wget, awk, openssl, mv

If you have bash but don't have cat, echo, or mv, then that is pretty impressive, but you will definitely have trouble running this script!
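If you are not sure whether everything is installed, a quick check along these lines may help. It is a sketch only; the function name missing_tools is invented for this example, and it relies on the shell's standard command -v lookup.

```shell
# Hypothetical helper: print the name of every listed program that is not found.
missing_tools() {
	for tool in "$@"; do
		# command -v succeeds for builtins (such as echo) and for programs on PATH.
		command -v "$tool" >/dev/null 2>&1 || echo "$tool"
	done
}

missing=$(missing_tools bash cat echo wget awk openssl mv)
if [ -n "$missing" ]; then
	# $missing is left unquoted on purpose so the names end up on one line.
	echo "missing programs:" $missing
else
	echo "all required programs found"
fi
```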

$ wget dl.dropbox.com/u/13649456/code/websitechecker
$ chmod +x websitechecker
$ ./websitechecker --help
usage: websitechecker url [start] [end]
Checks if there have been any changes to the content.
If the optional start and end parameters are used then it checks only between those strings.
example usage: websitechecker hyperboleandahalf.blogspot.com.au "blog-posts hfeed" "blog-pager"
$ ./websitechecker hyperboleandahalf.blogspot.com.au "blog-posts hfeed" "blog-pager"
$ (crontab -l; echo '0 0 * * * ~/websitechecker hyperboleandahalf.blogspot.com.au "blog-posts hfeed" "blog-pager" >> ~/.hyperboleUpdates') | crontab -
$ echo 'tail -n1 ~/.hyperboleUpdates' >> ~/.bashrc
$ echo "Now cron will check hyperboleandahalf for updates every day, and when I open a new bash it will read out the update status for me!"
Now cron will check hyperboleandahalf for updates every day, and when I open a new bash it will read out the update status for me!

Checks if a website has new content since the last check. Can check the entire website, or only a snippet.

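#!/bin/bash

# Show the usage text unless called with exactly 1 or 3 arguments, or when asked for help.
if { [ "$#" -ne 1 ] && [ "$#" -ne 3 ]; } || [ "$1" = "--help" ] ; then
	echo "usage: websitechecker url [start] [end]"
	echo "Checks if there have been any changes to the content."
	echo "If the optional start and end parameters are used then it checks only between those strings."
	echo 'example usage: websitechecker hyperboleandahalf.blogspot.com.au "blog-posts hfeed" "blog-pager"'
	exit
fi

tmpfile=/tmp/hash
url=$1
# Replace slashes in the url so the hash file name stays a single file in the home directory.
hashfile="$HOME/.${url//\//_}.hash"

# Hash the whole page, or only the lines between the start and end strings.
if [ "$#" -eq 1 ] ; then
	wget -q -O - "$url" | openssl md5 > "$tmpfile"
else
	# The strings are passed in with -v, because inside single quotes awk never sees the shell's $2 and $3.
	# index() matches them as plain text rather than as regular expressions.
	wget -q -O - "$url" | awk -v start="$2" -v end="$3" 'index($0, start), index($0, end)' | openssl md5 > "$tmpfile"
fi

# Compare against the hash saved by the previous run, if there is one.
if [ -e "$hashfile" ] ; then
	# Quoted, because newer openssl versions print "(stdin)= <hash>", which is two words.
	if [ "$(cat "$hashfile")" != "$(cat "$tmpfile")" ] ; then
		echo "$url has new content."
	else
		echo "$url has no new content."
	fi
else
	echo "hash file for $url created! Next time you run this script it will check for updates."
fi
mv "$tmpfile" "$hashfile"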
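
The start/end filtering is built on awk's range pattern, which prints everything from a line matching the first condition through the next line matching the second. The following is a small sketch of that idea on canned input, assuming plain-text matching with index(); the helper name between and the sample HTML lines are made up for illustration.

```shell
# Hypothetical helper: print the lines from the first one containing $1
# through the next one containing $2 (both treated as literal text).
between() {
	awk -v start="$1" -v end="$2" 'index($0, start), index($0, end)'
}

# Stand-in for a downloaded page, one tag per line.
printf '%s\n' \
	'<div id="header">' \
	'<div class="blog-posts hfeed">' \
	'<p>post text</p>' \
	'<div class="blog-pager">' \
	'<div id="footer">' |
	between "blog-posts hfeed" "blog-pager"
```

Only the three middle lines are printed, so changes in the header or footer of the page would not affect the hash.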


