Hi,

I'm currently reviewing my originally proposed API design, and I have added two new fields (idField, stemmer) to the xapian_index() function. As my next task, I'm planning to determine the output data structure and format of the xapian_search() function. Afterwards I will return to xapian_index() and review the format of the valueSlots parameter.

An outline of the 'simple indexing' functionality:

    xapian_index(dbpath="", datapath="", idField=c(0), indexFields=NULL, stemmer="", valueSlots=NULL, ...)

dbpath: Path to a Xapian database
datapath: Path to a data source
idField: Column number of a column in the data frame whose row value will be used as a unique identifier
indexFields: A list of character vectors, each containing a field name and a prefix
stemmer: Language stemmer

The xapian_index() function can be used to index the content of a data frame.

Convert the data frame (df) to a CSV file (skip this step if the data source is already a CSV file):

    >> write.csv(df, "location/of/data.csv")
    >> f1 <- c("Title", "S")
    >> f2 <- c("Description", "XD")
    >> fields <- list(f1, f2)
    >> idField <- c(0)
    >> xapian_index("path/to/database", "location/of/data.csv", idField=c(0), indexFields=fields, stemmer="en")

For indexing multiple data frames of similar format:

    >> dataLoc <- c("path1", "path2", "path3", ...)
    >> for (dataSource in dataLoc) {
           xapian_index("path/to/database", dataSource, idField=c(0), indexFields=fields, stemmer="en")
       }

Best regards,
Amanda
On Sat, Apr 30, 2016 at 08:32:54PM +0530, Amanda Jayanetti wrote:
> dbpath: Path to a xapian database
> datapath: Path to a data source

Hi Amanda. A couple of questions:

Is there a reason you're using a stored CSV rather than doing it from a data frame directly? That would avoid having to read from a foreign format, write to CSV and then read again in order to index it.

Also, do R users expect to use numeric indexing into their data, or name indexing? Or would it be better to support both?

J

--
James Aylett, occasional trouble-maker
xapian.org
> Is there a reason you're using a stored CSV rather than doing it from
> a data frame directly? That would avoid having to read from a foreign
> format, write to CSV and then read again in order to index it.

No. The input data structure to xapian_index() will indeed be a data frame. Even a stored CSV can be conveniently converted to a data frame. There are a few further modifications to the xapian_index() function that are not indicated in the example, and I will provide a complete draft of the function after reviewing them further.

> Also, do R users expect to use numeric indexing into their data, or
> name indexing? Or would it be better to support both?

Is there a specific way of implementing numeric indexing with Xapian? In R, numeric indexing can be used to extract the content of a data frame (and other R data structures). Since xapian_search() will return a data frame, its elements can be conveniently extracted using various functions in R.

Best regards,
Amanda
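For reference, the two indexing styles raised in this thread can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame, its column names, and the helper resolve_column() are hypothetical and are not part of the proposed xapian_index() API or of any existing package. The sketch shows that a column can be addressed by position or by name, that a single argument could accept either form, and that a stored CSV can be read back into a data frame.

```r
# Small example data frame; the column names are illustrative only.
df <- data.frame(
  id = c(101, 102, 103),
  Title = c("First", "Second", "Third"),
  Description = c("alpha", "beta", "gamma"),
  stringsAsFactors = FALSE
)

# Numeric indexing: columns are addressed by 1-based position.
titles_by_position <- df[[2]]

# Name indexing: the same column addressed by its name.
titles_by_name <- df[["Title"]]

# Hypothetical helper (an assumption for this sketch): accept either a
# column number or a column name and return the column position.
resolve_column <- function(df, field) {
  stopifnot(length(field) == 1)
  if (is.character(field)) {
    pos <- match(field, names(df))
    if (is.na(pos)) stop("no column named '", field, "'")
    return(pos)
  }
  if (is.numeric(field) && field >= 1 && field <= ncol(df)) {
    return(as.integer(field))
  }
  stop("field must be a column name or a valid column number")
}

id_pos_from_name <- resolve_column(df, "id")
id_pos_from_number <- resolve_column(df, 1)

# A stored CSV can be converted back to a data frame with read.csv().
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
df_from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
```

Note that R positions start at 1, so a helper like this would reject 0 as a column number; whether idField should follow R's convention is a design choice left open in the thread.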