Hello!

I am trying to get specific fields from an XML document and I am totally
puzzled. I hope someone can help me.

library(XML)
# URL
URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11877539,11822933,11871444&retmode=xml&rettype=citation"
# download an XML file
tmp <- xmlTreeParse(URL, isURL = TRUE)
tmp <- xmlRoot(tmp)

Now I want to extract only the node 'pubdate' and its children, but I don't
know how to do that without digging into the structure of the XML file.
The problem is that the structure can differ between records, so a
hardcoded set of list indices, i.e. tmp[[i]][[j]]..., doesn't help me.

I've read about xmlEventParse, but I don't understand the handlers part
well enough to get anything usable from it. Here is something not very
usable ;)

PubDate <- function(x, ...)
{
  print(x)
}
xmlEventParse(URL, isURL = TRUE,
              handlers = list(PubDate = PubDate),
              addContext = FALSE)

Thanks in advance!

Lep pozdrav / With regards,
    Gregor Gorjanc

----------------------------------------------------------------------
University of Ljubljana
Biotechnical Faculty       URI: http://www.bfro.uni-lj.si/MR/ggorjan
Zootechnical Department    mail: gregor.gorjanc <at> bfro.uni-lj.si
Groblje 3                  tel: +386 (0)1 72 17 861
SI-1230 Domzale            fax: +386 (0)1 72 17 888
Slovenia, Europe
----------------------------------------------------------------------
"One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try." Sophocles ~ 450 B.C.

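
One possible way to pull out a node wherever it sits in the tree, offered as a sketch rather than a definitive answer: parse into internal nodes and select with an XPath expression, so no list indices are hardcoded. The inline document below is a made-up stand-in (its element names only imitate a PubMed record), and it assumes the element is spelled 'PubDate'; for the real data, the parsed result of the URL above would take the place of 'doc'.

```r
library(XML)

# Made-up stand-in for the downloaded file; only the element names
# imitate a PubMed record, the values are illustrative.
txt <- '<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><Article><Journal><JournalIssue>
    <PubDate><Year>2002</Year><Month>Mar</Month></PubDate>
  </JournalIssue></Journal></Article></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><Article><Journal><JournalIssue>
    <PubDate><Year>2001</Year><Month>Dec</Month><Day>15</Day></PubDate>
  </JournalIssue></Journal></Article></MedlineCitation></PubmedArticle>
</PubmedArticleSet>'

# Parse into an internal (C-level) document so XPath can be used
doc <- xmlParse(txt, asText = TRUE)

# "//PubDate" matches every PubDate element at any depth
nodes <- getNodeSet(doc, "//PubDate")

# For each match, collect the text of its children (Year, Month, ...)
dates <- lapply(nodes, function(node) xmlSApply(node, xmlValue))
print(dates)
```

Because the XPath expression matches on the element name rather than its position, records that carry extra or missing children (a Day field, for instance) are handled without changing the code.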
Gregor,

I'm not answering your question directly, but have you looked at the
Bioconductor package "annotate"? I bet it does much of what you are trying
to do.

http://www.bioconductor.org/repository/release1.5/package/html/index.html

List of functions:

http://www.bioconductor.org/repository/release1.5/package/html/descrips/annotateDesc.html

Sean

----- Original Message -----
From: "Gorjanc Gregor" <Gregor.Gorjanc at bfro.uni-lj.si>
To: <r-help at stat.math.ethz.ch>
Sent: Sunday, May 08, 2005 12:29 PM
Subject: [R] Extract just some fields from XML

-----Original Message-----
From: Sean Davis [mailto:sdavis2 at mail.nih.gov]
Sent: Mon 2005-05-09 02:38
To: Gorjanc Gregor; r-help at stat.math.ethz.ch
Subject: Re: [R] Extract just some fields from XML

>I'm not answering your question directly, but have you looked at the
>bioconductor package "annotate"? I bet it does much of what you are trying
>to do....

Sean,

thank you for this. I'm aware of the functions in 'annotate' and I also
use them in my work.

Lep pozdrav / With regards,
    Gregor Gorjanc