Hadley,
Thank you. I am able to get the xml_ns_strip() function to work with my file
directly, so I will likely be able to reach my immediate goal.
However, I still have had no success with understanding the namespace problem. I
am not able to use read_xml() on the object I generated for the reproducible
example, which is simply a character vector of length 4 holding the contents of
the XML file as produced by readLines(). I then used dput() to define the
structure. The resulting structure apparently is not to the liking of
read_xml(). I have reproduced the necessary code here for your convenience.
The error is below.
##
library(xml2)
library(stringr)
with_ns_xml <- c("<?xml version=\"1.0\" ?>",
                 "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
                 "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
                 "</WorkSet>")
## Without the str_c() collapse it also complains of a vector of length > 1.
read_xml(str_c(with_ns_xml, collapse = TRUE))
## Produces the following error message.
Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html, :
  Start tag expected, '<' not found [4]
I have similar issues with xml2::xml_find_all
xml_find_all(str_c(with_ns_xml, collapse = TRUE), "/WorkSet//Description")
## Produces the following error message.
Error in UseMethod("xml_find_all") :
  no applicable method for 'xml_find_all' applied to an object of class "character"
R. Mark Sharp, Ph.D.
msharp at TxBiomed.org
> On Jan 31, 2017, at 4:27 PM, Hadley Wickham <h.wickham at gmail.com> wrote:
>
> See the last example in ?xml2::xml_find_all or use xml2::xml_ns_strip()
>
> Hadley
>
> On Tue, Jan 31, 2017 at 9:43 AM, Mark Sharp <msharp at txbiomed.org> wrote:
>> I am trying to read a series of XML files that use a namespace and I have failed, thus far, to discover the proper syntax. I have a reproducible example below. I have two XML character strings defined: one without a namespace and one with. I show that I can successfully extract the node using the XML string without the namespace and fail when using the XML string with the namespace.
>>
>> Mark
>> PS I am having the same problem with the xml2 package and am hoping understanding one will help with the other.
>>
>> ##
>> library(XML)
>> ## The first XML text (no_ns_xml) does not have a namespace defined
>> no_ns_xml <- c("<?xml version=\"1.0\" ?>", "<WorkSet>",
>>                "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
>>                "</WorkSet>")
>> l_no_ns_xml <- xmlTreeParse(no_ns_xml, asText = TRUE, getDTD = FALSE,
>>                             useInternalNodes = TRUE)
>> ## The node is found
>> getNodeSet(l_no_ns_xml, "/WorkSet//Description")
>>
>> ## The second XML text (with_ns_xml) has a namespace defined
>> with_ns_xml <- c("<?xml version=\"1.0\" ?>",
>>                  "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
>>                  "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
>>                  "</WorkSet>")
>>
>> l_with_ns_xml <- xmlTreeParse(with_ns_xml, asText = TRUE, getDTD = FALSE,
>>                               useInternalNodes = TRUE)
>> ## The node is not found
>> getNodeSet(l_with_ns_xml, "/WorkSet//Description")
>> ## I attempt to provide the namespace, but fail.
>> ns <- "http://labkey.org/etl/xml"
>> names(ns)[1] <- "xmlns"
>> getNodeSet(l_with_ns_xml, "/WorkSet//Description", namespaces = ns)
>>
>> R. Mark Sharp, Ph.D.
>> Director of Data Science Core
>> Southwest National Primate Research Center
>> Texas Biomedical Research Institute
>> P.O. Box 760549
>> San Antonio, TX 78245-0549
>> Telephone: (210)258-9476
>> e-mail: msharp at TxBiomed.org
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> http://hadley.nz
Hadley Wickham
2017-Jan-31 23:52 UTC
[R] Failure to understand namespaces in XML::getNodeSet
I think you want
x <- read_xml('<?xml version="1.0" ?>
<WorkSet xmlns="http://labkey.org/etl/xml">
<Description>MFIA 9-Plex (CharlesRiver)</Description>
</WorkSet>')
The collapse argument doesn't do what you think it does.
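A minimal sketch of the intended call, assuming collapse is meant to be the separator string placed between the elements (here the empty string) rather than a logical flag. The variable name one_string is only illustrative.

```r
library(stringr)

# The four-element character vector from the example above
with_ns_xml <- c("<?xml version=\"1.0\" ?>",
                 "<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
                 "<Description>MFIA 9-Plex (CharlesRiver)</Description>",
                 "</WorkSet>")

# collapse takes the string used to join the elements into one string
one_string <- str_c(with_ns_xml, collapse = "")
length(one_string)

# A single string containing markup is parsed as literal XML
doc <- xml2::read_xml(one_string)
xml2::xml_name(doc)
```

Base R's paste(with_ns_xml, collapse = "") should serve equally well if stringr is not otherwise needed.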
Hadley
On Tue, Jan 31, 2017 at 5:36 PM, Mark Sharp <msharp at txbiomed.org> wrote:
> read_xml(str_c(with_ns_xml, collapse = TRUE))
--
http://hadley.nz
Hadley,
It's sometimes amazing the mistakes I can make. No, it did not do what I wanted, which was
read_xml(str_c(with_ns_xml, collapse = ""))
Reproducible example follows:
library(stringr)
library(xml2)
## Given the correct argument value for collapse, the next two lines work
no_ns <- read_xml(str_c(no_ns_xml, collapse = ""))
with_ns <- read_xml(str_c(with_ns_xml, collapse = ""))
## The next line finds the node in the XML without a namespace
xml_find_all(no_ns, "//WorkSet//Description")
## With a namespace designated in the XML
## Neither of the next two work, though I thought the second should
xml_find_all(with_ns, "//WorkSet//Description")
xml_find_all(with_ns, "/WorkSet//Description", ns = xml_ns(with_ns))
## Using xml_ns_strip() works as predicted
xml_find_all(xml_ns_strip(with_ns), "//WorkSet//Description")
## I was surprised to find the incorrect namespace value did not matter
xml_find_all(no_ns, "//WorkSet//Description", ns = xml_ns(with_ns))
## This also seems to ignore the namespace argument value
xml_find_all(xml_ns_strip(with_ns), "/WorkSet//Description", ns = xml_ns(with_ns))
Full output follows:
> ## Given the correct argument value for collapse, the next two lines work
> no_ns <- read_xml(str_c(no_ns_xml, collapse = ""))
> with_ns <- read_xml(str_c(with_ns_xml, collapse = ""))
> ## The next line finds the node in the XML without a namespace
> xml_find_all(no_ns, "//WorkSet//Description")
{xml_nodeset (1)}
[1] <Description>MFIA 9-Plex (CharlesRiver)</Description>
> ## With a namespace designated in the XML
> ## Neither of the next two work, though I thought the second should
> xml_find_all(with_ns, "//WorkSet//Description")
{xml_nodeset (0)}
> xml_find_all(with_ns, "/WorkSet//Description", ns = xml_ns(with_ns))
{xml_nodeset (0)}
> ## Using xml_ns_strip() works as predicted
> xml_find_all(xml_ns_strip(with_ns), "//WorkSet//Description")
{xml_nodeset (1)}
[1] <Description>MFIA 9-Plex (CharlesRiver)</Description>
> ## I was surprised to find the incorrect namespace value did not matter
> xml_find_all(no_ns, "//WorkSet//Description", ns = xml_ns(with_ns))
{xml_nodeset (1)}
[1] <Description>MFIA 9-Plex (CharlesRiver)</Description>
> ## This also seems to ignore the namespace argument value
> xml_find_all(xml_ns_strip(with_ns), "/WorkSet//Description", ns = xml_ns(with_ns))
{xml_nodeset (1)}
[1] <Description>MFIA 9-Plex (CharlesRiver)</Description>
R. Mark Sharp, Ph.D.
msharp at TxBiomed.org
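For reference, a minimal sketch of a namespace-aware query on the example document, as an alternative to stripping the namespace. It relies on the XPath 1.0 rule that an unprefixed name in a path matches only elements in no namespace, so supplying a namespace map has no effect unless the path itself uses a prefix from that map. The d1 prefix is the one xml2's xml_ns() assigns to the first default namespace; the d prefix in the XML-package call is an arbitrary label chosen here for illustration, and the variable names are hypothetical.

```r
# One string holding the namespaced example document from this thread
xml_string <- paste0(
  '<?xml version="1.0" ?>',
  '<WorkSet xmlns="http://labkey.org/etl/xml">',
  '<Description>MFIA 9-Plex (CharlesRiver)</Description>',
  '</WorkSet>'
)

## xml2: the default namespace is exposed under the prefix d1
doc_xml2 <- xml2::read_xml(xml_string)
xml2::xml_ns(doc_xml2)
# Every element step in the path carries the prefix
hits_xml2 <- xml2::xml_find_all(doc_xml2, "/d1:WorkSet//d1:Description",
                                ns = xml2::xml_ns(doc_xml2))
xml2::xml_text(hits_xml2)

## XML: bind a prefix of our own choosing (d) to the namespace URI
doc_XML <- XML::xmlParse(xml_string, asText = TRUE)
ns <- c(d = "http://labkey.org/etl/xml")
hits_XML <- XML::getNodeSet(doc_XML, "/d:WorkSet//d:Description",
                            namespaces = ns)
sapply(hits_XML, XML::xmlValue)
```

This would also account for the two observations above: a namespace map passed alongside a prefix-free path is simply never consulted, whether the document has a namespace or not.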