Hi,

I have a rather complex XML document that I am attempting to parse based on attributes:

<Manifest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- eName : name of the element.
       eValue : value of the element. -->
  <OutputFilePath>D:\CN_data\Agilent\Results\</OutputFilePath>
  <FilesList>
    <File>
      <Characteristic Type="File" eName="FileTypeId" eValue="10"/>
      <Characteristic Type="File" eName="FilePath" eValue="D:\CN_data\Agilent\TCGA-06-0875-01A-01D-0387-02_US23502331_251469343372_S01_CGH-v4_10_Apr08.txt"/>
      <Characteristic Type="Patient" eName="PatientReference" eValue="TCGA-06-0875-01A"/>
      <Characteristic Type="Patient" eName="SampleType" eValue="TUMOR"/>
      <Characteristic Type="Patient" eName="SampleMarker" eValue="cy3"/>
      <Characteristic Type="Patient" eName="PatientDateOfBirth" eValue="080808"/>
      <Characteristic Type="Patient" eName="PatientGender" eValue="M"/>
      <Characteristic Type="Patient" eName="PatientSampleConcentration" eValue="20mg"/>
    </File>
    <File>
      <Characteristic Type="File" eName="FileTypeId" eValue="10"/>
      <Characteristic Type="File" eName="FilePath" eValue="D:\CN_data\Agilent\TCGA-06-0875-01A-01D-0387-02_US23502331_251469343372_S02_CGH-v4_10_Apr08.txt"/>
      <Characteristic Type="Patient" eName="PatientReference" eValue="TCGA-06-0875-02A"/>
      <Characteristic Type="Patient" eName="SampleType" eValue="TUMOR"/>
      <Characteristic Type="Patient" eName="SampleMarker" eValue="cy3"/>
      <Characteristic Type="Patient" eName="PatientDateOfBirth" eValue="080808"/>
      <Characteristic Type="Patient" eName="PatientGender" eValue="M"/>
      <Characteristic Type="Patient" eName="PatientSampleConcentration" eValue="20mg"/>
    </File>
    <File>
      <Characteristic Type="File" eName="FileTypeId" eValue="20"/>
      <Characteristic Type="File" eName="FilePath" eValue="D:\CN_data\Agilent\TCGA-06-0875-10A-01D-0387-02_US23502331_251469342195_S01_CGH-v4_10_Apr08.txt"/>
      <Characteristic Type="Patient" eName="PatientReference" eValue="TCGA-06-0875-10A"/>
      <Characteristic Type="Patient" eName="SampleType" eValue="NORMAL"/>
      <Characteristic Type="Patient" eName="SampleMarker" eValue="cy3"/>
      <Characteristic Type="Patient" eName="PatientDateOfBirth" eValue="080808"/>
      <Characteristic Type="Patient" eName="PatientGender" eValue="M"/>
      <Characteristic Type="Patient" eName="PatientSampleConcentration" eValue="20mg"/>
    </File>
  </FilesList>
</Manifest>

My requirement is to access eValues at each <File> node based on FileTypeId. For example:

How can I get the eValue of eName="PatientReference" for all Type="Patient", where the <File> also contains <Characteristic Type="File" eName="FileTypeId" eValue="10"/>?

i.e. "TCGA-06-0875-01A" and "TCGA-06-0875-02A"

For the life of me, I cannot get this to work!

Thanks,
-Aaron
Duncan Temple Lang
2009-Feb-09 19:33 UTC
[R] XML package- accessing nodes based on attributes
XPath is your friend here.

  getNodeSet(mf,
    '//Characteristic[@Type="File" and @eName="FileTypeId" and @eValue="10"]/parent::File
     /Characteristic[@Type="Patient" and @eName="PatientReference"]/@eValue')

I have broken the XPath expression across lines to try to format it more legibly.

The basic idea is to first find the Characteristic nodes whose Type, eName and eValue attributes are "File", "FileTypeId" and "10", i.e. the ones that mark a <File> as having the required FileTypeId. Then go back up to the parent <File> element and, from there, extract the other Characteristic that you want. You can then call unlist() on the result if you want a character vector.

D.

Skewes, Aaron wrote:
> My requirement is to access eValues at each <File> node based on FileTypeId. For example:
>
> How can I get the eValue of eName="PatientReference" for all Type="Patient", where the <Characteristic Type="File" eName="FileTypeId" eValue="10"/>?
>
> i.e. "TCGA-06-0875-01A" and "TCGA-06-0875-02A"
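
For anyone who wants to try the XPath from the reply end to end, here is a minimal sketch. It makes a few assumptions that are not in the thread: mf is taken to be the result of xmlParse() from the XML package, the manifest is cut down to the attributes the query actually touches (the FilePath entries are left out to avoid escaping backslashes in an R string), and the closing </FilesList> and </Manifest> tags are supplied so the text parses. The second expression, xp_alt, is an alternative formulation of my own, not something from the reply.

```r
library(XML)

# Cut-down copy of the manifest from the question (assumed closing tags,
# FilePath characteristics omitted).
manifest <- '<Manifest>
  <FilesList>
    <File>
      <Characteristic Type="File" eName="FileTypeId" eValue="10"/>
      <Characteristic Type="Patient" eName="PatientReference" eValue="TCGA-06-0875-01A"/>
      <Characteristic Type="Patient" eName="SampleType" eValue="TUMOR"/>
    </File>
    <File>
      <Characteristic Type="File" eName="FileTypeId" eValue="10"/>
      <Characteristic Type="Patient" eName="PatientReference" eValue="TCGA-06-0875-02A"/>
      <Characteristic Type="Patient" eName="SampleType" eValue="TUMOR"/>
    </File>
    <File>
      <Characteristic Type="File" eName="FileTypeId" eValue="20"/>
      <Characteristic Type="Patient" eName="PatientReference" eValue="TCGA-06-0875-10A"/>
      <Characteristic Type="Patient" eName="SampleType" eValue="NORMAL"/>
    </File>
  </FilesList>
</Manifest>'

# Parse from a string; for a file on disk, pass the file name instead.
mf <- xmlParse(manifest, asText = TRUE)

# Same expression as in the reply, assembled from its three steps:
# 1. the Characteristic that marks a File with FileTypeId 10,
# 2. back up to the enclosing File,
# 3. down to the PatientReference Characteristic and its eValue attribute.
xp <- paste0(
  '//Characteristic[@Type="File" and @eName="FileTypeId" and @eValue="10"]',
  '/parent::File',
  '/Characteristic[@Type="Patient" and @eName="PatientReference"]/@eValue')

# getNodeSet() returns a list; unlist() flattens it and unname() drops
# the attribute names so a plain character vector is left.
refs <- unname(unlist(getNodeSet(mf, xp)))

# Alternative (not from the reply): put the FileTypeId test in a predicate
# on File itself, so no step back up to the parent is needed.
xp_alt <- paste0(
  '//File[Characteristic[@Type="File" and @eName="FileTypeId" and @eValue="10"]]',
  '/Characteristic[@Type="Patient" and @eName="PatientReference"]/@eValue')
refs_alt <- unname(unlist(getNodeSet(mf, xp_alt)))

# The question asks for TCGA-06-0875-01A and TCGA-06-0875-02A.
print(refs)
print(refs_alt)
```

Changing the "10" in either expression to "20" should select the PatientReference of the remaining <File> instead; swapping "PatientReference" for another eName such as "SampleType" pulls out a different characteristic of the same files.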