thr3ads.net - R help - [R] Extract just some fields from XML [May 2005]

Gorjanc Gregor

2005-May-08 16:29 UTC

[R] Extract just some fields from XML

Hello!

I am trying to get specific fields from an XML document and I am totally
puzzled. I hope someone can help me.

library(XML)  # provides xmlTreeParse() and xmlRoot()

# URL
URL <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11877539,11822933,11871444&retmode=xml&rettype=citation"
# download an XML file
tmp <- xmlTreeParse(URL, isURL = TRUE)
tmp <- xmlRoot(tmp)

Now I want to extract only the 'PubDate' node and its children, but I
don't know how to do that without digging into the structure of the XML
file. The problem is that the structure can differ, so a hardcoded set
of list indices, i.e. tmp[[i]][[j]]..., doesn't help me.

I've read about xmlEventParse, but I don't understand the handlers part
well enough to get anything usable from it. Here is something not
very usable ;)

  PubDate <- function(x, ...)
  {
    print(x)
  }
  xmlEventParse(URL, isURL = TRUE,
                handlers=list(PubDate=PubDate),
                addContext = FALSE)

Thanks in advance!

Lep pozdrav / With regards,
    Gregor Gorjanc

----------------------------------------------------------------------
University of Ljubljana
Biotechnical Faculty        URI: http://www.bfro.uni-lj.si/MR/ggorjan
Zootechnical Department     mail: gregor.gorjanc <at> bfro.uni-lj.si
Groblje 3                   tel: +386 (0)1 72 17 861
SI-1230 Domzale             fax: +386 (0)1 72 17 888
Slovenia, Europe
----------------------------------------------------------------------
"One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try." Sophocles ~ 450 B.C.

Sean Davis

2005-May-09 00:38 UTC

Gregor,

I'm not answering your question directly, but have you looked at the
Bioconductor package "annotate"?  I bet it does much of what you are
trying to do....

http://www.bioconductor.org/repository/release1.5/package/html/index.html

List of functions:

http://www.bioconductor.org/repository/release1.5/package/html/descrips/annotateDesc.html

Sean

Gorjanc Gregor

2005-May-09 05:28 UTC

Sean,

Thank you for this. I'm aware of the functions in 'annotate' and I also
use them in my work.

Lep pozdrav / With regards,
    Gregor Gorjanc

----------------------------------------------------------------------
University of Ljubljana
Biotechnical Faculty        URI: http://www.bfro.uni-lj.si/MR/ggorjan
Zootechnical Department     mail: gregor.gorjanc <at> bfro.uni-lj.si
Groblje 3                   tel: +386 (0)1 72 17 861
SI-1230 Domzale             fax: +386 (0)1 72 17 888
Slovenia, Europe
----------------------------------------------------------------------
"One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try." Sophocles ~ 450 B.C.
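
The thread leaves the original question open, so here is a minimal sketch of one way to pull out 'PubDate' nodes without hardcoding list indices. It assumes a version of the CRAN 'XML' package that provides xmlParse(), getNodeSet() and xpathSApply(). The inline document is a made-up stand-in for the efetch response, and the names xml_text, pub_dates, dates and years are only illustrative. The XPath expression "//PubDate" matches that element at any depth, so the code should not depend on where the node sits in the tree.

```r
library(XML)  # assumes the CRAN 'XML' package is installed

# Inline stand-in for the efetch response. The element names mimic the
# PubMed layout, but the content is invented for illustration.
xml_text <- '<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <Journal>
          <JournalIssue>
            <PubDate><Year>2002</Year><Month>Mar</Month></PubDate>
          </JournalIssue>
        </Journal>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <Journal>
          <JournalIssue>
            <PubDate><Year>2001</Year></PubDate>
          </JournalIssue>
        </Journal>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>'

# Parse into an internal (C-level) document so XPath queries can be used
doc <- xmlParse(xml_text, asText = TRUE)

# "//PubDate" selects every PubDate element, whatever its depth
pub_dates <- getNodeSet(doc, "//PubDate")

# Convert each PubDate node into a named character vector of its children
dates <- lapply(pub_dates, function(node) xmlSApply(node, xmlValue))

# A single child can also be extracted directly
years <- xpathSApply(doc, "//PubDate/Year", xmlValue)

print(dates)
print(years)
```

For the real query, the same calls should apply to a document parsed from the URL instead of the inline string, provided the response actually contains 'PubDate' elements.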
