Hi,
I have a rather complex XML document that I am attempting to parse based on
attributes:
<Manifest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<!-- eName : name of the element.
eValue : value of the element. -->
<OutputFilePath>D:\CN_data\Agilent\Results\</OutputFilePath>
<FilesList>
<File>
<Characteristic Type="File" eName="FileTypeId"
eValue="10"/>
<Characteristic Type="File"
eName="FilePath"
eValue="D:\CN_data\Agilent\TCGA-06-0875-01A-01D-0387-02_US23502331_251469343372_S01_CGH-v4_10_Apr08.txt"/>
<Characteristic Type ="Patient"
eName="PatientReference" eValue="TCGA-06-0875-01A"/>
<Characteristic Type ="Patient"
eName="SampleType" eValue="TUMOR"/>
<Characteristic Type ="Patient"
eName="SampleMarker" eValue="cy3"/>
<Characteristic Type ="Patient"
eName="PatientDateOfBirth" eValue="080808"/>
<Characteristic Type ="Patient"
eName="PatientGender" eValue="M"/>
<Characteristic Type ="Patient"
eName="PatientSampleConcentration" eValue="20mg"/>
</File>
<File>
<Characteristic Type="File" eName="FileTypeId"
eValue="10"/>
<Characteristic Type="File"
eName="FilePath"
eValue="D:\CN_data\Agilent\TCGA-06-0875-01A-01D-0387-02_US23502331_251469343372_S02_CGH-v4_10_Apr08.txt"/>
<Characteristic Type ="Patient"
eName="PatientReference" eValue="TCGA-06-0875-02A"/>
<Characteristic Type ="Patient"
eName="SampleType" eValue="TUMOR"/>
<Characteristic Type ="Patient"
eName="SampleMarker" eValue="cy3"/>
<Characteristic Type ="Patient"
eName="PatientDateOfBirth" eValue="080808"/>
<Characteristic Type ="Patient"
eName="PatientGender" eValue="M"/>
<Characteristic Type ="Patient"
eName="PatientSampleConcentration" eValue="20mg"/>
</File>
<File>
<Characteristic Type="File"
eName="FileTypeId" eValue="20"/>
<Characteristic Type="File"
eName="FilePath"
eValue="D:\CN_data\Agilent\TCGA-06-0875-10A-01D-0387-02_US23502331_251469342195_S01_CGH-v4_10_Apr08.txt"/>
<Characteristic Type ="Patient"
eName="PatientReference" eValue="TCGA-06-0875-10A"/>
<Characteristic Type ="Patient"
eName="SampleType" eValue="NORMAL"/>
<Characteristic Type ="Patient"
eName="SampleMarker" eValue="cy3"/>
<Characteristic Type ="Patient"
eName="PatientDateOfBirth" eValue="080808"/>
<Characteristic Type ="Patient"
eName="PatientGender" eValue="M"/>
<Characteristic Type ="Patient"
eName="PatientSampleConcentration" eValue="20mg"/>
</File>
</FilesList>
</Manifest>
My requirement is to access eValues at each <File> node based on
FileTypeId. For example:
How can I get the eValue of eName="PatientReference" for all
Type="Patient" Characteristics in each <File> that contains
<Characteristic Type="File" eName="FileTypeId" eValue="10"/>?
i.e. "TCGA-06-0875-01A" and "TCGA-06-0875-02A"
For the life of me, I cannot get this to work!
Thanks,
-Aaron
Duncan Temple Lang
2009-Feb-09 19:33 UTC
[R] XML package- accessing nodes based on attributes
XPath is your friend here.
getNodeSet(mf,
'//Characteristic[@Type="File" and
@eName="FileTypeId"
and @eValue="10"]/parent::File
/Characteristic[@Type="Patient"
and @eName="PatientReference"]/@eValue')
I have broken the XPath expression across lines to make it
more legible. Here mf is assumed to be the parsed document, e.g. from
xmlTreeParse(file, useInternalNodes = TRUE).
The basic idea is to first find only the Characteristic nodes whose
Type, eName and eValue attributes are "File", "FileTypeId" and "10".
Then go back up to the parent <File> element and
extract the other Characteristic that you want, here the eValue of the
one with Type="Patient" and eName="PatientReference".
You can then call unlist if you want a character vector.
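For anyone who wants to try this end to end, here is a minimal sketch of the same idea. It assumes a trimmed-down, well-formed copy of the manifest held in a string (the name manifest and the shortened XML are made up for illustration), and it uses xpathSApply with xmlGetAttr rather than selecting /@eValue and calling unlist, which is just a variation on the approach above.

```r
library(XML)

# Hypothetical, shortened copy of the manifest from the question.
# Only the Characteristics needed for the query are kept.
manifest <- '<Manifest>
  <FilesList>
    <File>
      <Characteristic Type="File" eName="FileTypeId" eValue="10"/>
      <Characteristic Type="Patient" eName="PatientReference" eValue="TCGA-06-0875-01A"/>
    </File>
    <File>
      <Characteristic Type="File" eName="FileTypeId" eValue="10"/>
      <Characteristic Type="Patient" eName="PatientReference" eValue="TCGA-06-0875-02A"/>
    </File>
    <File>
      <Characteristic Type="File" eName="FileTypeId" eValue="20"/>
      <Characteristic Type="Patient" eName="PatientReference" eValue="TCGA-06-0875-10A"/>
    </File>
  </FilesList>
</Manifest>'

# Parse the string into an internal document that XPath can query.
mf <- xmlParse(manifest, asText = TRUE)

# Same three steps as in the XPath above: match the FileTypeId = 10
# Characteristic, step up to its <File>, then pick the PatientReference node.
xp <- paste0(
  '//Characteristic[@Type="File" and @eName="FileTypeId" and @eValue="10"]',
  '/parent::File',
  '/Characteristic[@Type="Patient" and @eName="PatientReference"]'
)

# Pull the eValue attribute off each matching node.
refs <- xpathSApply(mf, xp, xmlGetAttr, "eValue")
print(refs)
```

With the sample above this should give the two references for the FileTypeId 10 files and leave out the one for FileTypeId 20. Changing "10" in the first predicate selects a different group of files.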
D.