Hello,
I don't know whether this is general enough for all of your names, but try the following.
x <- scan(what = "character", text="
L1o3maxG10
L1o3P10
L2o3G10
noxP10
pm25S_01
comeanS_03
noxP_04")
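fun <- function(x){
    # Pollutant: split on the lag prefix and the station letter, drop the
    # empty pieces, and keep the first of each remaining pair.
    # (Assumes each name yields exactly two non-empty pieces.)
    r1 <- unlist(strsplit(x, "L[[:digit:]]+|G|P|S"))
    r1 <- r1[nchar(r1) != 0]
    r1 <- r1[rep(c(TRUE, FALSE), length(r1)/2)]
    # Remove the "max"/"mean" qualifier.
    r1 <- unlist(strsplit(r1, "max|mean"))
    r1 <- r1[nchar(r1) != 0]
    # Lag: default to "0", then fill in the L prefix where present.
    r2 <- rep("0", length(x))
    w2 <- grep("L[[:digit:]]+", x)
    re2 <- regmatches(x, regexpr("L[[:digit:]]+", x))
    re2 <- unlist(strsplit(re2, "L"))
    re2 <- re2[nchar(re2) != 0]
    r2[w2] <- re2
    # Lag: fill in the _NN suffix where present.
    w2 <- grep("G_|P_|S_", x)
    re2 <- regmatches(x, regexpr("(G_|P_|S_)[[:digit:]]+", x))
    re2 <- unlist(strsplit(re2, "G_|P_|S_"))
    re2 <- re2[nchar(re2) != 0]
    r2[w2] <- re2
    # Station: the first G, P or S in each name.
    r3 <- regmatches(x, regexpr("G|P|S", x))
    data.frame(pollutant = r1, Lag = r2, station = r3)
}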
fun(x)
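
The same idea can probably be written more compactly with sub() and Perl-style regular expressions. This is only a rough sketch: split_names and nms are names made up for the illustration, and it assumes that pollutant names are lower-case letters and digits, that the station is exactly one of G, P or S, and that the lag is either an "L" prefix or a "_NN" suffix.

```r
# Sample names copied from the question.
nms <- c("L1o3maxG10", "L1o3P10", "L2o3G10", "noxP10",
         "pm25S_01", "comeanS_03", "noxP_04")

# Hypothetical helper (illustration only).
split_names <- function(x) {
  # Pollutant: skip an optional L prefix, capture the shortest run of
  # lower-case letters/digits that is followed by an optional max/mean
  # qualifier and then the station letter.
  pollutant <- sub("^(L[0-9]+)?([a-z0-9]+?)(max|mean)?[GPS].*$", "\\2",
                   x, perl = TRUE)

  # Station: the first G, P or S in each name.
  station <- regmatches(x, regexpr("[GPS]", x))

  # Lag: "0" unless an L prefix or a _NN suffix says otherwise.
  lag <- rep("0", length(x))
  has_prefix <- grepl("^L[0-9]+", x)
  lag[has_prefix] <- sub("^L([0-9]+).*$", "\\1", x[has_prefix])
  has_suffix <- grepl("[GPS]_[0-9]+$", x)
  lag[has_suffix] <- sub("^.*[GPS]_([0-9]+)$", "\\1", x[has_suffix])

  data.frame(pollutant = pollutant, lag = lag, station = station,
             stringsAsFactors = FALSE)
}

res <- split_names(nms)
res
```

The lag is kept as character so that "01", "03" and "04" stay distinct from lag 1, 3 and 4. Names that do not follow the assumed pattern would need their own handling.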
Hope this helps,
Rui Barradas
On 16-11-2012 00:05, Zlatan wrote:
> I need to split a data frame into 3 columns. The column I want to split
> contains indices of lag (prefix L1 or L2 and suffix 01, 03, 04), station
> name (shown in the sample data as capitalized G, P and S) and pollutant
> name. Names with no "L" prefix or 01/04 suffix are lag 0. Lag 01 is the average
> of lags 0 and 1, and 04 is the average of days 0 to 4. How can one do that in R?
> I will ignore the other components (e.g. 10, max or mean).
>
>
>
> Current state:
>
> L1o3maxG10
> L1o3P10
> L2o3G10
> noxP10
> pm25S_01
> comeanS_03
> noxP_04
>
> What I want to get :
>
> pollutant  Lag  station
> o3         1    G
> o3         1    P
> o3         2    G
> nox        0    P
> Pm25       01   S
> co         03   S
> nox        04   P
>
>
> Thanks
>
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Split-data-frame-and-create-a-new-column-tp4649683.html
> Sent from the R help mailing list archive at Nabble.com.