I have a data frame:
sID <- c("a", "1,2,3", "b", "4,5,6")
rID <- c("shr1125", "bwr331", "bwr330", "vjhr1022")
tmp <- data.frame(cbind(sID,rID))
but I need to split tmp$sID into three different columns, filling with NA
the positions where tmp$sID has only one value.
I can split tmp$sID at the commas
tmp.1 <- strsplit(as.character(tmp$sID), ",")  # as.character() in case sID is a factor (the default before R 4.0.0)
but I can't figure out how to convert the resulting list into a data frame.
Ideally, tmp will become four columns wide, something like
sID.a sID.b sID.c rID
NA    NA    a     shr1125
1     2     3     bwr331
NA    NA    b     bwr330
4     5     6     vjhr1022
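One general way to get from the strsplit() list to this layout is sketched below. It assumes the values should be right-aligned as in the table (shorter entries padded with NA on the left) and takes the number of sID columns from the longest entry instead of fixing it at three; the object names parts, n, padded, m and res are illustrative only.

```r
sID <- c("a", "1,2,3", "b", "4,5,6")
rID <- c("shr1125", "bwr331", "bwr330", "vjhr1022")

# split each entry at the commas: a list of character vectors of unequal length
parts <- strsplit(sID, ",")

# number of sID columns needed = length of the longest element
n <- max(lengths(parts))

# left-pad every element with NA so that all elements have length n
padded <- lapply(parts, function(x) c(rep(NA_character_, n - length(x)), x))

# stack the equal-length vectors as the rows of a character matrix
m <- do.call(rbind, padded)
colnames(m) <- paste("sID", letters[seq_len(n)], sep = ".")

# each matrix column becomes one data frame column
res <- data.frame(m, rID, stringsAsFactors = FALSE)
res
```

The sID columns come out as character, because sID.c mixes letters and digits; individual columns can be converted afterwards if numbers are needed.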
Thoughts or suggestions?
I tried
havecomma <- grep(',', tmp$sID)
for( i in 1:nrow(tmp)){
  if (!(tmp[i,] %in% havecomma)){
    tmp$sID[i] <- paste(', ,', tmp$sID[i], sep="")
  }
}
and thought that I might be able to force the list into a data frame once
each component had three items, but it just seemed to apply the paste()
function to everything, which gave me a list with varying numbers of items.
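In the loop above, the test tmp[i,] %in% havecomma compares the values stored in row i (a one-row data frame) with the row numbers returned by grep(), rather than comparing the row number i itself. A minimal sketch of the same padding idea with the test applied to i, assuming sID is kept as character (stringsAsFactors = FALSE, since data.frame() created factors by default before R 4.0.0) and padding with ",," instead of ", ," so that the missing fields split into empty strings, which are then replaced by NA:

```r
sID <- c("a", "1,2,3", "b", "4,5,6")
rID <- c("shr1125", "bwr331", "bwr330", "vjhr1022")
tmp <- data.frame(sID, rID, stringsAsFactors = FALSE)

havecomma <- grep(",", tmp$sID)    # row numbers whose sID contains a comma
for (i in seq_len(nrow(tmp))) {
  if (!(i %in% havecomma)) {       # test the row number, not the row contents
    tmp$sID[i] <- paste0(",,", tmp$sID[i])
  }
}

# every entry now has exactly three comma-separated fields
parts <- strsplit(tmp$sID, ",")
m <- do.call(rbind, parts)         # 4 x 3 character matrix
m[m == ""] <- NA                   # the empty padding fields become NA
colnames(m) <- paste("sID", letters[1:3], sep = ".")

out <- data.frame(m, rID = tmp$rID, stringsAsFactors = FALSE)
out
```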
I'm stuck.
Thanks for your help -
SR
Steven H. Ranney
Try,
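sID <- c("a", "1,2,3", "b", "4,5,6")
tmp1 <- strsplit(sID,',')            # list of character vectors
# pad the single values to three elements
tmp2 <- lapply(tmp1,
   function(x) if (length(x)==1) c('','',x) else x )
# unlist() concatenates element by element, so fill the matrix by row
tmp3 <- matrix(unlist(tmp2),ncol=3, byrow=TRUE)
rID <- c("shr1125", "bwr331", "bwr330", "vjhr1022")
newdf <- data.frame(cbind(tmp3,rID))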
You'll need to name the first three columns.
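A minimal sketch of the renaming, which also replaces the '' padding with NA to match the requested output. The sID.a, sID.b, sID.c names are taken from the original question; stringsAsFactors = FALSE is an added assumption that keeps the columns as character in every R version.

```r
# the steps above, repeated so that the example is self-contained
sID <- c("a", "1,2,3", "b", "4,5,6")
rID <- c("shr1125", "bwr331", "bwr330", "vjhr1022")
tmp1 <- strsplit(sID, ",")
tmp2 <- lapply(tmp1, function(x) if (length(x) == 1) c("", "", x) else x)
tmp3 <- matrix(unlist(tmp2), ncol = 3, byrow = TRUE)
newdf <- data.frame(cbind(tmp3, rID), stringsAsFactors = FALSE)

# name the first three columns as in the requested output
names(newdf)[1:3] <- paste("sID", letters[1:3], sep = ".")

# the padding used empty strings; turn them into NA
newdf[newdf == ""] <- NA
newdf
```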
As an aside, note that you don't need the cbind in your
data.frame(cbind(sID,rID))
because
data.frame(sID,rID)
does just as well.
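In my example cbind isn't strictly needed either, even though tmp3 is a matrix:
data.frame(tmp3, rID) also works, because data.frame() turns each column of a
matrix argument into a separate column.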
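A quick check that data.frame() accepts the matrix directly, assuming current R (stringsAsFactors = FALSE only makes the character columns explicit): building the data frame with and without cbind() gives the same column values; only the automatically generated column names may differ.

```r
sID <- c("a", "1,2,3", "b", "4,5,6")
rID <- c("shr1125", "bwr331", "bwr330", "vjhr1022")
tmp2 <- lapply(strsplit(sID, ","),
               function(x) if (length(x) == 1) c("", "", x) else x)
tmp3 <- matrix(unlist(tmp2), ncol = 3, byrow = TRUE)

# with cbind(): rID is first merged into a 4 x 4 character matrix
a <- data.frame(cbind(tmp3, rID), stringsAsFactors = FALSE)

# without cbind(): each column of the matrix argument becomes a column
b <- data.frame(tmp3, rID, stringsAsFactors = FALSE)

# compare the column values, ignoring the column names
identical(unname(as.list(a)), unname(as.list(b)))
```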
-Don
--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
On 8/13/13 12:09 PM, "Steven Ranney" <steven.ranney at gmail.com> wrote:
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
Hi,
You could try:
tmp[,1]<- as.character(tmp[,1]) #sID may be a factor
#prepend ",," to the entries without a comma, so that every entry has three fields
tmp[,1][-grep(",",tmp[,1])]<-paste0(",,",tmp[,1][-grep(",",tmp[,1])])
#read.table() treats blank fields in numeric columns as NA
tmp2<-data.frame(read.table(text=tmp[,1],sep=",",header=FALSE,stringsAsFactors=FALSE),rID=tmp[,2],stringsAsFactors=FALSE)
colnames(tmp2)[1:3]<-paste("sID",letters[1:3],sep=".")
tmp2
#  sID.a sID.b sID.c      rID
#1    NA    NA     a  shr1125
#2     1     2     3   bwr331
#3    NA    NA     b   bwr330
#4     4     5     6 vjhr1022
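One caveat with selecting the entries via negative indexing, x[-grep(...)]: if no entry contains a comma, grep() returns integer(0), and x[-integer(0)] selects nothing rather than everything, so no padding would happen. A minimal sketch of the same padding step using the logical grepl() instead, which avoids that edge case (pad_fields is a hypothetical helper name, not from the post):

```r
# hypothetical helper: prepend ",," to every entry that contains no comma
pad_fields <- function(s) {
  s <- as.character(s)                       # also handles factors
  ifelse(grepl(",", s), s, paste0(",,", s))
}

sID <- c("a", "1,2,3", "b", "4,5,6")
rID <- c("shr1125", "bwr331", "bwr330", "vjhr1022")

padded <- pad_fields(sID)
tmp2 <- data.frame(read.table(text = padded, sep = ",", header = FALSE,
                              stringsAsFactors = FALSE),
                   rID = rID, stringsAsFactors = FALSE)
colnames(tmp2)[1:3] <- paste("sID", letters[1:3], sep = ".")
tmp2

# the edge case: no entry contains a comma
x <- c("a", "b")
x[-grep(",", x)]   # negative indexing with integer(0) selects nothing
pad_fields(x)      # both entries are padded
```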
BTW,
data.frame(sID,rID,stringsAsFactors=FALSE) #cbind is not needed. Here it would do no harm, since both vectors are character:
#    sID      rID
#1     a  shr1125
#2 1,2,3   bwr331
#3     b   bwr330
#4 4,5,6 vjhr1022
#But if the columns were of different classes, cbind() would first coerce them all to character:
str(data.frame(cbind(sID,Col2=1:4),stringsAsFactors=FALSE))
#'data.frame':    4 obs. of  2 variables:
# $ sID : chr  "a" "1,2,3" "b" "4,5,6"
# $ Col2: chr  "1" "2" "3" "4"
str(data.frame(sID,Col2=1:4,stringsAsFactors=FALSE))
#'data.frame':    4 obs. of  2 variables:
# $ sID : chr  "a" "1,2,3" "b" "4,5,6"
# $ Col2: int  1 2 3 4
A.K.
----- Original Message -----
From: Steven Ranney <steven.ranney at gmail.com>
To: "r-help at r-project.org" <r-help at r-project.org>
Sent: Tuesday, August 13, 2013 3:09 PM
Subject: [R] Convert list with missing values to dataFrame