Hi,

I am scraping some data from the internet and I got what I want but in a big long string (sort of) and can't figure out a way to parse it.

What I have gotten to is

> mystring
{xml_nodeset (1)}
[1] <p>{\n "symbol": "ABI",\n "open": 21.04,\n "high": 21.05,\n "low": 20.06,\n "close": 20.2,\n "volume": 938700,\n "from": "2005-01-04"\n}</p>

But I can't find a way to isolate the individual elements such as symbol, open, etc.

I'll bet there is someone out there with a lot more experience at HTML parsing than me who can see a way to solve this in minutes.

Any guidance would be appreciated.

--John Sparks

On Sun, 5 Jan 2025 20:03:11 +0000, "Sparks, John via R-help" <r-help at r-project.org> wrote:

> > mystring
> {xml_nodeset (1)}
> [1] <p>{\n "symbol": "ABI",\n "open": 21.04,\n "high": 21.05,\n
> "low": 20.06,\n "close": 20.2,\n "volume": 938700,\n "from":
> "2005-01-04"\n}</p>
>
> But I can't find a way to isolate the individual elements such as
> symbol, open, etc.

This is a JSON string inside a <p> tag. It's not how a web page is normally constructed (the <p> tag is usually for human-readable text, not machine-readable JSON), but the good news is that

  mystring |> xml_text() |> jsonlite::parse_json()

should give you a named list with the desired contents.

--
Best regards,
Ivan
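
For anyone who wants to try the suggestion above end to end, here is a minimal sketch. It assumes the xml2 and jsonlite packages are installed and R >= 4.1 (for the native pipe). The `html` string is a hypothetical stand-in for the scraped page, built from the output shown in the question; the real page and the selector used to reach the <p> node are not known from this thread.

```r
library(xml2)
library(jsonlite)

# Hypothetical stand-in for the scraped page (not the real site).
html <- '<html><body><p>{\n "symbol": "ABI",\n "open": 21.04,\n "high": 21.05,\n "low": 20.06,\n "close": 20.2,\n "volume": 938700,\n "from": "2005-01-04"\n}</p></body></html>'

doc <- read_html(html)

# Reproduce an xml_nodeset of length 1 like the one in the question.
mystring <- xml_find_all(doc, "//p")

# Extract the text of the <p> node and parse it as JSON.
rec <- mystring |> xml_text() |> parse_json()

# Individual elements are now available by name.
rec$symbol
rec$open
rec$volume

# Optionally turn the record into a one-row data frame.
df <- as.data.frame(rec)
df$from <- as.Date(df$from)
```

If the nodeset ever holds more than one <p> node, xml_text() returns one string per node, so each string would need to be parsed separately, for example with lapply(xml_text(mystring), parse_json).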

On Sun, Jan 5, 2025 at 2:03 PM Sparks, John via R-help <r-help at r-project.org> wrote:
>
> Hi,
>
> I am scraping some data from the internet and I got what I want but in a big long string (sort of) and can't figure out a way to parse it.
>

What site are you scraping? There may be an easier way to get the data from it.

--
Joshua Ulrich | about.me/joshuaulrich
FOSS Trading | www.fosstrading.com