oscaruser@programmer.net
2006-May-16 07:48 UTC
[Xapian-discuss] [Omindex] How to associate a web URL in search results based on a document stored as a local file?
Folks,

I download a web page and want to add it to the index. I am using omindex (as below). When I search for the document, I see in the search results that the hyperlink URL points to a file (e.g. http://www.mysite.com/shoe/tennis_shoe/tennis_shoe.html).

What I want to be able to do is download the HTML file, save it, and have it appear with a link back to the original web URL. How can I do this? I have been looking at modifying omega.cc, but am just getting started with the source code. I thought perhaps there is a better way or tool to use.

Thanks,
OSC

  ~/xapian/bin/omindex --db /var/data/omega/data/default --url 'http://www.mysite.com/shoe/tennis_shoe' /tmp/shoe/tennis_shoe.html
Olly Betts
2006-May-16 09:47 UTC
[Xapian-discuss] [Omindex] How to associate a web URL in search results based on a document stored as a local file?
On Mon, May 15, 2006 at 03:43:21PM -0800, oscaruser@programmer.net wrote:
> I download a web page and want to add it to the index. I am using
> omindex (as below). When I search for the document, I see in search
> results that the hyper text link URL is to a file (e.g.
> http://www.mysite.com/shoe/tennis_shoe/tennis_shoe.html).

That's what you asked for when you said:

  --url 'http://www.mysite.com/shoe/tennis_shoe'

> What I want to be able to do is download the HTML file, save it, have
> it appear with a link back to the original web URL.

You mean like Google's "cached copy"?

After indexing, save each page using a filename which can be derived from the URL (e.g. MD5SUM of the URL). Then you can write a simple template page in PHP or similar (called cached.php, say) which takes a parameter "url" and loads the text of the cached page and displays it with a link to the url.

Then you can just set the link in the omegascript query template to be something like:

  <a href="$html{cached.php?url=$field{url}}">$field{title}</a>

Cheers,
Olly
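[Editorial sketch] The "filename derived from the URL" idea above can be illustrated in a few lines. This is only a minimal sketch, written in Python rather than the PHP mentioned in the message; the `cache` directory, the `.html` suffix, and the function names `cache_path`, `save_page`, and `load_page` are all hypothetical and not part of Omega.

```python
import hashlib
from pathlib import Path

# Hypothetical location for cached copies; not an Omega setting.
CACHE_DIR = Path("cache")


def cache_path(url: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Map a URL to a stable filename using the MD5 hex digest of the URL."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.html"


def save_page(url: str, html: bytes, cache_dir: Path = CACHE_DIR) -> Path:
    """Store the downloaded page under the name derived from its URL."""
    path = cache_path(url, cache_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(html)
    return path


def load_page(url: str, cache_dir: Path = CACHE_DIR) -> bytes:
    """Read back the cached copy for a URL (what a cached.php-style page would do)."""
    return cache_path(url, cache_dir).read_bytes()
```

Because the filename depends only on the URL, the page that serves the cached copy needs nothing beyond the `url` parameter to find the file, and it can print that same URL as the link back to the original.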
oscaruser@programmer.net
2006-May-16 23:13 UTC
[Xapian-discuss] [Omindex] How to associate a web URL in search results based on a document stored as a local file?
How can I have it display only the specified URL (e.g. http://www.mysite.com/shoe/tennis_shoe)? It is appending "/tennis_shoe.html", which represents the local file, and that is not what I want.

Thanks
oscaruser@programmer.net
2006-May-17 01:41 UTC
[Xapian-discuss] [Omindex] How to associate a web URL in search results based on a document stored as a local file?
FYI, resolved using the following change. Note that I assume only a single file in the directory.

Thanks

  oscar@delta:~/xapian/omega-0.9.6$ diff omindex.cc ../orig/omega-0.9.6/omindex.cc
  399,400c399
  < //string record = "url=" + baseurl + url + "\nsample=" + sample;
  < string record = "url=" + baseurl + "\nsample=" + sample;
  ---
  > string record = "url=" + baseurl + url + "\nsample=" + sample;
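[Editorial sketch] The single-file caveat in the change above can be seen by modelling just the string concatenation from the diff. This is a sketch in Python, not omindex code: `make_record` is a hypothetical helper, the second filename and the sample text are made up, and it assumes the per-file `url` part starts with "/" as the reported search result suggests.

```python
def make_record(baseurl: str, url: str, sample: str, patched: bool) -> str:
    """Mimic the record line from the diff: the patched form drops the per-file part."""
    link = baseurl if patched else baseurl + url
    return "url=" + link + "\nsample=" + sample


baseurl = "http://www.mysite.com/shoe/tennis_shoe"

# Two files in the same indexed directory (the second name is hypothetical).
for name in ("/tennis_shoe.html", "/other.html"):
    original = make_record(baseurl, name, "sample text", patched=False)
    changed = make_record(baseurl, name, "sample text", patched=True)
    print(original.splitlines()[0], "->", changed.splitlines()[0])
```

With the patched form, both files end up with the same `url=` field, so the result links are only meaningful when each indexed directory holds one file and is given its own `--url`.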