As others have pointed out, it's close to XML but not quite
there; however, you could use strapply in gsubfn to extract
the data. strapply pulls out the substrings matching the regular
expression, giving a vector, vec, consisting of: date price date price ...
Pulling out the odd and even elements separately and
converting them to Date and numeric, respectively, gives the
resulting data.frame.
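Here is a small base-R illustration of the odd/even step on a hand-typed vector with the same date price date price ... layout (the values are copied from the sample data). The matrix(..., byrow = TRUE) reshape at the end is an alternative I'm suggesting for illustration, not something strapply requires, and it assumes the vector strictly alternates date, price.

```r
# A character vector laid out as date, price, date, price, ...
vec <- c("2005-01-17", "1288.40002",
         "2005-01-18", "1291.69995",
         "2005-01-19", "1288.19995")

# Logical index: TRUE at odd positions (dates), FALSE at even ones (prices)
ix <- seq_along(vec) %% 2 == 1
DF <- data.frame(date = as.Date(vec[ix]), price = as.numeric(vec[!ix]))

# Alternative (assumes strict alternation): fill a two-column matrix
# row by row so each row is one (date, price) pair, then convert columns
m <- matrix(vec, ncol = 2, byrow = TRUE)
DF2 <- data.frame(date = as.Date(m[, 1]), price = as.numeric(m[, 2]))

print(DF)
```

Both forms produce the same three-row data.frame; the matrix version just makes the pairing explicit.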
See http://gsubfn.googlecode.com for more on the gsubfn package,
and the three vignettes in the zoo package for more on zoo.
Lines <- '- <Temp diffgr:id="Temp14" msdata:rowOrder="13">
<Date>2005-01-17T00:00:00+05:30</Date>
<SecurityID>10149</SecurityID>
<PriceClose>1288.40002</PriceClose>
</Temp>
- <Temp diffgr:id="Temp15" msdata:rowOrder="14">
<Date>2005-01-18T00:00:00+05:30</Date>
<SecurityID>10149</SecurityID>
<PriceClose>1291.69995</PriceClose>
</Temp>
- <Temp diffgr:id="Temp16" msdata:rowOrder="15">
<Date>2005-01-19T00:00:00+05:30</Date>
<SecurityID>10149</SecurityID>
<PriceClose>1288.19995</PriceClose>
</Temp>'

library(gsubfn)
# extract every date (....-..-..) and every decimal number, in order
vec <- strapply(Lines, "....-..-..|[0-9]+[.][0-9]+")[[1]]
# odd positions hold the dates, even positions hold the prices
ix <- seq_along(vec) %% 2 == 1
DF <- data.frame(date = as.Date(vec[ix]), price = as.numeric(vec[!ix]))

# or, instead of the last line, you could convert it to a zoo object so
# that it's in a more convenient form for time series manipulation:
library(zoo)
z <- zoo(as.numeric(vec[!ix]), as.Date(vec[ix]))
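If you would rather not install gsubfn, base R's regmatches() together with gregexpr() can do the same extraction. This is a sketch that assumes the same Lines text as above; note that it swaps in a stricter date pattern ([0-9]{4}-[0-9]{2}-[0-9]{2} instead of ....-..-..) so that only digit-and-hyphen dates match.

```r
# Same sample text as in the example above
Lines <- '- <Temp diffgr:id="Temp14" msdata:rowOrder="13">
<Date>2005-01-17T00:00:00+05:30</Date>
<SecurityID>10149</SecurityID>
<PriceClose>1288.40002</PriceClose>
</Temp>
- <Temp diffgr:id="Temp15" msdata:rowOrder="14">
<Date>2005-01-18T00:00:00+05:30</Date>
<SecurityID>10149</SecurityID>
<PriceClose>1291.69995</PriceClose>
</Temp>
- <Temp diffgr:id="Temp16" msdata:rowOrder="15">
<Date>2005-01-19T00:00:00+05:30</Date>
<SecurityID>10149</SecurityID>
<PriceClose>1288.19995</PriceClose>
</Temp>'

# Match either a YYYY-MM-DD date or a decimal number, left to right
pat <- "[0-9]{4}-[0-9]{2}-[0-9]{2}|[0-9]+[.][0-9]+"
vec <- regmatches(Lines, gregexpr(pat, Lines))[[1]]

# As before: odd positions are dates, even positions are prices
ix <- seq_along(vec) %% 2 == 1
DF <- data.frame(date = as.Date(vec[ix]), price = as.numeric(vec[!ix]))
print(DF)
```

The integer SecurityID values contain no decimal point, so neither alternative picks them up.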
On Wed, Nov 5, 2008 at 1:22 AM, RON70 <ron_michael70 at yahoo.com> wrote:
> Hi everyone,
>
> I have this kind of raw dataset :
>
> - <Temp diffgr:id="Temp14" msdata:rowOrder="13">
> <Date>2005-01-17T00:00:00+05:30</Date>
> <SecurityID>10149</SecurityID>
> <PriceClose>1288.40002</PriceClose>
> </Temp>
> - <Temp diffgr:id="Temp15" msdata:rowOrder="14">
> <Date>2005-01-18T00:00:00+05:30</Date>
> <SecurityID>10149</SecurityID>
> <PriceClose>1291.69995</PriceClose>
> </Temp>
> - <Temp diffgr:id="Temp16" msdata:rowOrder="15">
> <Date>2005-01-19T00:00:00+05:30</Date>
> <SecurityID>10149</SecurityID>
> <PriceClose>1288.19995</PriceClose>
> </Temp>
>
> I was looking for some R procedure to extract data from this, that should be
> in following format :
>
> 2005-01-17 1288.40002
> 2005-01-18 1291.69995
> 2005-01-19 1288.19995
>
> Can R help me to do this?
>
> --
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>