Hi,
One problem with using substring() is that the result depends on the number of
digits, characters, etc. in the prefix.
For example:
substring("-005-190",6)
#[1] "190"
substring("-0057-190",6)
#[1] "-190"
#whereas
gsub("^-[^-]*-","","-0057-190")
#[1] "190"
Probably, your dataset doesn't have that sort of problem.
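To make the comparison above concrete, here is a minimal sketch that applies both approaches to a whole vector. The `ids` values are hypothetical and only meant to show prefixes of different lengths; they are not taken from the real data.

```r
# Hypothetical IDs whose prefixes have different lengths (an assumption)
ids <- c("-005-190", "-0057-190", "-12-7")

# Fixed position: only correct when the prefix is exactly 5 characters long
by_position <- substring(ids, 6)

# Pattern: drop the leading "-", everything up to the next "-", and that "-"
by_pattern <- sub("^-[^-]*-", "", ids)

# Side-by-side view of the two results
print(data.frame(ids, by_position, by_pattern))
```

sub() is enough here because the pattern is anchored with "^" and so can match at most once per string; gsub() gives the same result.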
dat1<- read.table(text="
project boro
123 m
134 k
123 m
123 m
543 q
543 q
134 k
",sep="",header=TRUE,stringsAsFactors=FALSE)
res<-split(dat1,gsub("\\.","",as.character(interaction(dat1[,2],dat1[,1]))))
res
#$k134
#  project boro
#2     134    k
#7     134    k
#
#$m123
#  project boro
#1     123    m
#3     123    m
#4     123    m
#
#$q543
#  project boro
#5     543    q
#6     543    q
str(res$k134)
#'data.frame':    2 obs. of  2 variables:
# $ project: int  134 134
# $ boro   : chr  "k" "k"
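As an alternative sketch (not the only way to do it), the group labels can be built directly with paste0(), which avoids the interaction() and gsub() round trip. If separate objects named m123, k134, etc. are really wanted, list2env() can create them from the list. The names `res2` and `env` are just illustrative.

```r
dat1 <- read.table(text = "
project boro
123 m
134 k
123 m
123 m
543 q
543 q
134 k
", header = TRUE, stringsAsFactors = FALSE)

# Build labels such as "m123" directly and split on them
res2 <- split(dat1, paste0(dat1$boro, dat1$project))
print(names(res2))

# Optional: one data frame per label, kept in a separate environment
# so that the workspace is not flooded with hundreds of objects
env <- new.env()
list2env(res2, envir = env)
print(ls(env))
```

With about 400 groups, keeping everything in the list and using lapply() on it is usually easier to manage than 400 separate objects.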
A.K.
I was able to strip the extraneous characters using
a<-substring(Project_NBR, first=6)
and then used cbind() to add the edited column to the data frame. I have a
sample but I am not sure how to provide it to you. I will try to produce
an example that's similar to what I have:
project boro
123 m
134 k
123 m
123 m
543 q
543 q
134 k
Basically I am trying to subset the data frame according to
project and boro, with each subset named boro followed by project (e.g.,
m123, k134).
I hope this provides more clarity to my problem.
----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: R help <r-help at r-project.org>
Cc:
Sent: Wednesday, July 17, 2013 11:06 AM
Subject: Re: Splitting dataframes and cleaning extraneous characters
Hi,
You could try:
?split()
split(ats,ats$Project_NBR)
You also mentioned two columns:
split(ats,list(ats$col1, ats$col2))
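One thing to watch with the two-column form: by default split() returns an element for every combination of levels, including combinations that never occur in the data. A small sketch, assuming a toy `ats` with hypothetical columns `col1` and `col2`:

```r
# Toy stand-in for the real data (column names and values are assumptions)
ats <- data.frame(col1 = c("m", "k", "m", "q"),
                  col2 = c(123, 134, 123, 543),
                  stringsAsFactors = FALSE)

# Every combination of the 3 x 3 levels, most of them empty data frames
all_combos <- split(ats, list(ats$col1, ats$col2))

# drop = TRUE keeps only the combinations that actually occur
observed <- split(ats, list(ats$col1, ats$col2), drop = TRUE)

print(length(all_combos))
print(length(observed))
print(names(observed))
```

With around 400 groups, drop=TRUE avoids a large number of zero-row data frames in the result.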
You should have provided an example dataset using ?dput() (e.g.,
dput(head(data,10))) for testing.
Also,
gsub("^-[^-]*-","","-005-190")
#[1] "190"
A.K.
Problem: I have a large data set and need to separate it based on factors
in 2 columns. The final output would be a collection of data frames
named after the corresponding factor levels.
So far I know that for each corresponding factor I can execute
x190<-ats[which(Project_NBR=='-005-190'),]
However, there are about 400 factors needing to be separated.
Also, I would like to remove the "-005-". Any guidance will be
greatly appreciated.