Hi,

I have mined XML extensively with R before now, but my xpath chops seem to be regressing recently. I know that I can roll up my sleeves and search through the child nodes of the root, but I can't noodle out why using the xpath description returns an empty nodeset.

Any suggestions and nudges most welcome.

### START

library(xml2)
library(httr)
library(magrittr)

daymet_uri <- "https://thredds.daac.ornl.gov/thredds/catalog/ornldaac/1328/catalog.xml"

# run the following to show the node in a browser
# httr::BROWSE(daymet_uri)

daymet <- httr::GET(daymet_uri) %>%
  httr::content(type = "text/xml", encoding = "UTF-8")

# list the children "service" and "dataset"
daymet %>% xml2::xml_children()
#{xml_nodeset (2)}
#[1] <service name="all" serviceType="Compound" base="">\n  <service name="odap" serviceTyp ...
#[2] <dataset name="Daymet: Daily Surface Weather Data on a 1-km Grid for North America, Ve ...

# find all descendants of node name "dataset"
#
# according to this tutorial we should find 'dataset'
# https://www.w3schools.com/xml/xpath_syntax.asp
daymet %>% xml2::xml_find_all(xpath = "//dataset")
# {xml_nodeset (0)}

# I have also tried every other xpath combination I can think of, e.g.
# ".//dataset", "./dataset", "/dataset" and "dataset"
# They each yield an empty nodeset

### END

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

other attached packages:
[1] magrittr_1.5 httr_1.4.1   xml2_1.2.2

loaded via a namespace (and not attached):
[1] compiler_3.5.1 R6_2.4.0       tools_3.5.1    curl_4.2
[5] yaml_2.2.0     Rcpp_1.0.3

Thanks,
Ben

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

Ecological Forecasting: https://eco.bigelow.org/

> xml_ns(daymet)
d1    <-> http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0
xlink <-> http://www.w3.org/1999/xlink
> daymet %>% xml2::xml_find_all(xpath = "d1:dataset")
{xml_nodeset (1)}
[1] <dataset name="Daymet: Daily Surface Weather Data on a 1-km Grid for Nort ...

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Nov 12, 2019 at 11:35 AM Ben Tupper <btupper at bigelow.org> wrote:
> # find all descendants of node name "dataset"
> daymet %>% xml2::xml_find_all(xpath = "//dataset")
> # {xml_nodeset (0)}
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
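
For anyone landing on this thread later, here is a small sketch of why the unprefixed query comes back empty. It assumes the catalog root declares a default namespace (which is what the xml_ns() output above suggests); the inline XML is a hypothetical stand-in for the real THREDDS catalog, not a copy of it. In XPath 1.0 an unprefixed name means "no namespace", so "//dataset" cannot match elements that inherit the default namespace, while the "d1" prefix that xml2 generates can. Note also that "d1:dataset" only looks at children of the context node, whereas "//d1:dataset" walks all descendants.

```r
library(xml2)

# Hypothetical miniature of the catalog: the root declares a default
# namespace, so every unprefixed element below it belongs to that namespace.
catalog_txt <- '<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <service name="all" serviceType="Compound" base=""/>
  <dataset name="outer">
    <dataset name="inner"/>
  </dataset>
</catalog>'
doc <- read_xml(catalog_txt)

# xml2 reports the default namespace under a generated prefix ("d1" here)
ns <- xml_ns(doc)
print(ns)

# Unprefixed name: looks for 'dataset' in no namespace, so nothing matches
unqualified <- xml_find_all(doc, "//dataset")

# Prefixed, relative to the root: direct children only
children <- xml_find_all(doc, "d1:dataset", ns)

# Prefixed, with the descendant axis: nested datasets as well
descendants <- xml_find_all(doc, "//d1:dataset", ns)

print(xml_attr(descendants, "name"))
```

Passing `ns` explicitly is optional, since xml_find_all() defaults to xml_ns(x), but it makes the dependence on the generated prefix visible.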

Forehead smack! Of course! Thank you, Bill!

> On Nov 12, 2019, at 2:50 PM, William Dunlap <wdunlap at tibco.com> wrote:
>
> > daymet %>% xml2::xml_find_all(xpath = "d1:dataset")
> {xml_nodeset (1)}
> [1] <dataset name="Daymet: Daily Surface Weather Data on a 1-km Grid for Nort ...

Ben Tupper
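
A possible follow-up for readers who would rather not carry the "d1" prefix around: two alternatives, sketched on a made-up document (the namespace URI and element names are placeholders). The first matches on local-name() and ignores namespaces altogether. The second uses xml2::xml_ns_strip(), which I believe has been available since xml2 1.2.0; it removes default namespace declarations and modifies the document in place, so only use it when the namespace information is not needed afterwards.

```r
library(xml2)

# Made-up document with a default namespace (placeholder URI)
doc <- read_xml(
  '<catalog xmlns="http://example.org/catalog">
     <dataset name="a"><dataset name="b"/></dataset>
   </catalog>'
)

# Option 1: compare only the local part of each element name
by_local_name <- xml_find_all(doc, "//*[local-name() = 'dataset']")

# Option 2: drop the default namespace, then query with plain names.
# xml_ns_strip() changes 'doc' itself and returns it invisibly.
xml_ns_strip(doc)
stripped <- xml_find_all(doc, "//dataset")

print(xml_attr(by_local_name, "name"))
print(xml_attr(stripped, "name"))
```

The local-name() form is safe on any document but gets verbose for longer paths; stripping is convenient when a document uses a single default namespace throughout.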