Hi,
It would be better if you provided the output of dput(dataset), as I am not
sure about the structure of your dataset.
The code below is based only on reading the data as shown.
dat1<- read.table(text="
separator,tissID
>,>,2
,2,1
,6,5
,11,13
>,>,4
,4,9
,6,2
,7,3
,21,1
,23,58
,25,9
,26,4
>,>,11
,1,12
>,>,21
,4,1
,11,3
",sep=",",header=TRUE,stringsAsFactors=FALSE,row.names=NULL)
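# Rows with a chevron in the first column mark the start of each block
indx <- which(grepl(">", dat1[, 1]))
# Number of rows in each block (chevron row included)
indx1 <- diff(c(indx, nrow(dat1) + 1))
# For each block, copy the label (column 3 of the chevron row) into
# column 1 of the rows that follow it
res1 <- do.call(rbind, lapply(seq_along(indx), function(i) {
  x1 <- dat1[indx[i]:(indx[i] + (indx1[i] - 1)), ]
  x1[-1, 1] <- x1[1, 3]
  x1
}))
res2 <- as.matrix(res1[, -1])
row.names(res2) <- res1[, 1]
res2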
#   separator tissID
#>  ">"       " 2"
#2  "2"       " 1"
#2  "6"       " 5"
#2  "11"      "13"
#>  ">"       " 4"
#4  "4"       " 9"
#4  "6"       " 2"
#4  "7"       " 3"
#4  "21"      " 1"
#4  "23"      "58"
#4  "25"      " 9"
#4  "26"      " 4"
#>  ">"       "11"
#11 "1"       "12"
#>  ">"       "21"
#21 "4"       " 1"
#21 "11"      " 3"
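For a table with about half a million rows, a vectorised variant may be worth trying. The following is only a sketch: it assumes the file is read without a header, that every block starts with a chevron row, and that the very first row is a chevron row. The column names sep, tissID and value are made up for the illustration.

```r
# Same sample rows as above, read as character columns without a header.
# The column names are hypothetical.
dat <- read.table(text = "
>,>,2
,2,1
,6,5
,11,13
>,>,4
,4,9
,6,2
,7,3
,21,1
,23,58
,25,9
,26,4
>,>,11
,1,12
>,>,21
,4,1
,11,3
", sep = ",", header = FALSE, stringsAsFactors = FALSE,
   col.names = c("sep", "tissID", "value"), colClasses = "character")

is_hdr <- dat$sep == ">"      # TRUE on the chevron rows
grp    <- cumsum(is_hdr)      # block number of every row (1, 1, 1, 1, 2, ...)
labels <- dat$value[is_hdr]   # one label per block, taken from column 3

# Fill the empty first column with the label of the block each row belongs to
dat$sep[!is_hdr] <- labels[grp[!is_hdr]]
dat
```

If a piece of text has to go in front of the label, paste0() could be applied to labels before the last assignment.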
A.K.
Hello,
I hope this is not too stupid a question, as I'm still new to R
(had a couple of days of tutorials this week, so still very wet behind
the ears).
A sample of my problem is here:
separator tissID
> > 2
        2 1
        6 5
        11 13
> > 4
        4 9
        6 2
        7 3
        21 1
        23 58
        25 9
        26 4
> > 11
        1 12
> > 21
        4 1
        11 3
I have a table of data I can load into R no problem. What I'm
trying to do is for all those empty cells in the first column, replace
them with the value that is next to the second chevron for the
corresponding range (everything up until the next chevron). So it would
look like:
sep tissID
> > 2
2 2 1
2 6 5
2 11 13
> > 4
4 4 9
4 6 2
4 7 3
4 21 1
4 23 58
4 25 9
4 26 4
> > 11
11 1 12
> > 21
21 4 1
21 11 3
(actually, I have to do something else with that value by
appending a piece of text in front, but you get the idea). So far, my
idea was to try:
for (i in 1:(length(targrow)-1)) {
  label <- test2[targrow[i],3]
  start <- targrow[i]+1
  end <- targrow[i+1]-1
  test2[start:end,1] <- label
}
where test2 is the read, delimited matrix (verified the columns
and such are properly formatted), and targrow was a vector I generated,
searching the main table to identify the rows that have those chevrons.
This returns an error though and it seems whatever I type to try to
change that label (text, number, whatever) returns the error message:
1: In `[<-.factor`(`*tmp*`, iseq, value = c(137L, 137L, 137L, 137L,  :
  invalid factor level, NA generated
repeated multiple times for however many entries I'm using in my
test case. If I try manually outside of a loop (i.e. just
test2[start:end,1] <- 'test' for example) it works. I presume I have
overlooked something in terms of variable properties or something, that
it doesn't work in the loop. This has to be done for a table with about
half a million entries, hence my interest in finding a way to automate
the process. Any suggestions (specific to this code, or if there's
another way - I know, I have the feeling that what I've come up with
already isn't exactly elegant, but I was trying to debug) would be most
welcome.
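A likely reading of the message above, offered as a guess rather than a diagnosis: the columns of test2 were read in as factors, and a factor column only accepts values that are already among its levels, so anything else becomes NA with a warning. The sketch below uses a small made-up data frame (the column names are hypothetical) and converts the columns to character before running a loop like the one in the question. It also adds an end marker so that the last block is filled in, which 1:(length(targrow)-1) would skip.

```r
# Toy stand-in for test2; the column names are made up for the illustration.
test2 <- data.frame(sep    = c(">", "", "", ">", ""),
                    tissID = c(">", "2", "6", ">", "4"),
                    value  = c("2", "1", "5", "4", "9"),
                    stringsAsFactors = TRUE)

# Convert every factor column to character so new values can be assigned
test2[] <- lapply(test2, as.character)

targrow <- which(test2$sep == ">")
# Append one position past the last row so the final block has an end too
bounds <- c(targrow, nrow(test2) + 1)

for (i in seq_along(targrow)) {
  start <- bounds[i] + 1
  end   <- bounds[i + 1] - 1
  if (start <= end) {                      # skip blocks with no data rows
    test2[start:end, 1] <- test2[targrow[i], 3]
  }
}
test2
```

Reading the file with stringsAsFactors=FALSE in read.table() should avoid the conversion step altogether.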