Hi,

Does anyone have an idea how to program a web crawler with the following functions? (Rough sketches of a possible approach follow after the list.)
1) Extract all the links from a web page, recursively extract links from each extracted link down to depth level 3, and store them in a database.
2) Sort the stored entries by file type (.html, .pdf, .ps, .txt, etc.).
3) Extract meta information such as the title, abstract, and URLs for at least two file types, e.g. .pdf and .html.
4) Calculate the MD5 or SHA-1 hash for every distinct entry in the database.
5) The database system used should be MySQL.
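
A rough sketch covering points 1, 2, 4, and 5, assuming Python with the requests, beautifulsoup4, and mysql-connector-python packages. The MySQL credentials, the database name crawlerdb, the start URL, and the pages table (with a UNIQUE key on url) are all placeholders for illustration, not a finished implementation:

# Rough crawler sketch (Python). Assumes: requests, beautifulsoup4, and
# mysql-connector-python are installed, and a MySQL table
#   pages(url, file_type, md5, depth) with a UNIQUE key on url
# already exists. Credentials and the start URL are placeholders.
import hashlib
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup
import mysql.connector

EXTENSIONS = ('.html', '.pdf', '.ps', '.txt')

def classify(url):
    # Decide the file type from the URL's path extension.
    path = urlparse(url).path.lower()
    for ext in EXTENSIONS:
        if path.endswith(ext):
            return ext
    return '.html'  # treat extensionless pages as HTML

def crawl(start_url, db, max_depth=3):
    # Breadth-first crawl up to max_depth; each distinct URL is hashed
    # and inserted once (INSERT IGNORE relies on the UNIQUE key on url).
    seen = set()
    frontier = [(start_url, 0)]
    cur = db.cursor()
    while frontier:
        url, depth = frontier.pop(0)
        if url in seen or depth > max_depth:
            continue
        seen.add(url)
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            continue
        digest = hashlib.md5(resp.content).hexdigest()  # or hashlib.sha1
        cur.execute(
            "INSERT IGNORE INTO pages (url, file_type, md5, depth) "
            "VALUES (%s, %s, %s, %s)",
            (url, classify(url), digest, depth))
        db.commit()
        # Only HTML responses are parsed for further links.
        if 'text/html' in resp.headers.get('Content-Type', ''):
            soup = BeautifulSoup(resp.text, 'html.parser')
            for a in soup.find_all('a', href=True):
                frontier.append((urljoin(url, a['href']), depth + 1))

if __name__ == '__main__':
    # Placeholder credentials -- adjust for your MySQL setup.
    db = mysql.connector.connect(host='localhost', user='crawler',
                                 password='secret', database='crawlerdb')
    crawl('http://example.com/', db)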
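
For point 3, a sketch of metadata extraction for the two file types, assuming beautifulsoup4 for HTML and the PyPDF2 3.x PdfReader API for PDF. Since PDFs rarely carry an explicit abstract, the document-information "subject" field is used as a stand-in, and the HTML "abstract" is taken from the meta description tag:

# Metadata extraction sketch for .html and .pdf URLs.
# Assumes beautifulsoup4 and PyPDF2 (3.x) are installed.
import io

import requests
from bs4 import BeautifulSoup
from PyPDF2 import PdfReader

def html_metadata(url):
    # Title from <title>, "abstract" from the meta description tag.
    soup = BeautifulSoup(requests.get(url, timeout=10).text, 'html.parser')
    title = soup.title.string.strip() if soup.title and soup.title.string else None
    desc = soup.find('meta', attrs={'name': 'description'})
    abstract = desc.get('content', '').strip() if desc else None
    return title, abstract

def pdf_metadata(url):
    # Title and subject from the PDF document-information dictionary;
    # the subject field stands in for an abstract.
    reader = PdfReader(io.BytesIO(requests.get(url, timeout=10).content))
    info = reader.metadata
    return (info.title, info.subject) if info else (None, None)

The extracted values could then be written into the same MySQL pages table, keyed on the URL.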