Hi,

Does anyone have an idea how to program a web crawler with the following functions? (Rough sketches of a possible approach follow after the list.)
1) Extract all the links from a web page, recursively extract links from each extracted link down to depth level 3, and store them in a database.
2) Sort the stored entries by file type (.html, .pdf, .ps, .txt, etc.).
3) Extract meta information such as the title, abstract, and URLs for at least two file types, e.g. .pdf and .html.
4) Calculate the MD5 or SHA-1 hash for every distinct entry in the database.
5) The database system used should be MySQL.
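
A rough sketch covering points 1, 2, 4, and 5, assuming Python with the requests, beautifulsoup4, and mysql-connector-python packages. The MySQL credentials, the database name crawlerdb, the start URL, and the pages table (with a UNIQUE key on url) are all placeholders for illustration, not a finished implementation:

# Rough crawler sketch (Python). Assumes: requests, beautifulsoup4, and
# mysql-connector-python are installed, and a MySQL table
#   pages(url, file_type, md5, depth) with a UNIQUE key on url
# already exists. Credentials and the start URL are placeholders.
import hashlib
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup
import mysql.connector

EXTENSIONS = ('.html', '.pdf', '.ps', '.txt')

def classify(url):
    # Decide the file type from the URL's path extension.
    path = urlparse(url).path.lower()
    for ext in EXTENSIONS:
        if path.endswith(ext):
            return ext
    return '.html'  # treat extensionless pages as HTML

def crawl(start_url, db, max_depth=3):
    # Breadth-first crawl up to max_depth; each distinct URL is hashed
    # and inserted once (INSERT IGNORE relies on the UNIQUE key on url).
    seen = set()
    frontier = [(start_url, 0)]
    cur = db.cursor()
    while frontier:
        url, depth = frontier.pop(0)
        if url in seen or depth > max_depth:
            continue
        seen.add(url)
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            continue
        digest = hashlib.md5(resp.content).hexdigest()  # or hashlib.sha1
        cur.execute(
            "INSERT IGNORE INTO pages (url, file_type, md5, depth) "
            "VALUES (%s, %s, %s, %s)",
            (url, classify(url), digest, depth))
        db.commit()
        # Only HTML responses are parsed for further links.
        if 'text/html' in resp.headers.get('Content-Type', ''):
            soup = BeautifulSoup(resp.text, 'html.parser')
            for a in soup.find_all('a', href=True):
                frontier.append((urljoin(url, a['href']), depth + 1))

if __name__ == '__main__':
    # Placeholder credentials -- adjust for your MySQL setup.
    db = mysql.connector.connect(host='localhost', user='crawler',
                                 password='secret', database='crawlerdb')
    crawl('http://example.com/', db)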
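
For point 3, a sketch of metadata extraction for the two file types, assuming beautifulsoup4 for HTML and the PyPDF2 3.x PdfReader API for PDF. Since PDFs rarely carry an explicit abstract, the document-information "subject" field is used as a stand-in, and the HTML "abstract" is taken from the meta description tag:

# Metadata extraction sketch for .html and .pdf URLs.
# Assumes beautifulsoup4 and PyPDF2 (3.x) are installed.
import io

import requests
from bs4 import BeautifulSoup
from PyPDF2 import PdfReader

def html_metadata(url):
    # Title from <title>, "abstract" from the meta description tag.
    soup = BeautifulSoup(requests.get(url, timeout=10).text, 'html.parser')
    title = soup.title.string.strip() if soup.title and soup.title.string else None
    desc = soup.find('meta', attrs={'name': 'description'})
    abstract = desc.get('content', '').strip() if desc else None
    return title, abstract

def pdf_metadata(url):
    # Title and subject from the PDF document-information dictionary;
    # the subject field stands in for an abstract.
    reader = PdfReader(io.BytesIO(requests.get(url, timeout=10).content))
    info = reader.metadata
    return (info.title, info.subject) if info else (None, None)

The extracted values could then be written into the same MySQL pages table, keyed on the URL.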