Dimitri Liakhovitski
2009-Sep-04 12:57 UTC
[R] transforming a badly organized data base into a list of data frames
Dear R-ers! I have a badly organized data base in Excel. Once I read it into R it looks like this (all variables become factors because of many spaces and other characters in Excel): x<-data.frame(A=c("","Name1","text1","text2","text3","","","","Name2","text1","text2","text3","","","Name3","text1","text2","text3","","Name1","text1","text2","text3",""), B=c("","",1,2,3,"","","","",4,5,6,"","","",7,8,9,"","",3,3,3,""),C=c("","",1,1,2,"","","","",5,6,5,"","","",9,8,7,"","",2,2,2,""), D=c("","",3,2,1,"","","","",6,5,4,"","","",7,7,8,"","",1,1,1,"")) x I was wondering if it's at all possible to grab the data blocks from this file and put them into a list, e.g., so that the first element of the list is the data frame: Name1 1 1 3 Name12 1 2 Name13 2 1 second element of the list is the data frame Name2 4 5 6 Name2 5 6 5 Name2 6 5 4 etc. Notice that all blocks have the same number of rows. I have several challenges: 1. I cannot use the row number because the number of irrelevant lines between blocks is always different. 2. I'd love to replace all the "text1", "text2", "text3" strings (which are always the same within each block) with the name that is one cell above it. 3. All variables are now factors - I am afraid if I recode them later as "as numeric" - they numbers will change their values. Thank you very much for your suggestions! -- Dimitri Liakhovitski Ninah.com Dimitri.Liakhovitski at ninah.com
Henrique Dallazuanna
2009-Sep-04 13:25 UTC
[R] transforming a badly organized data base into a list of data frames
Try this: lapply(grep("Name", x$A), function(idx)x[idx + 1:3,]) On Fri, Sep 4, 2009 at 9:57 AM, Dimitri Liakhovitski <ld7631@gmail.com>wrote:> Dear R-ers! > > I have a badly organized data base in Excel. Once I read it into R it > looks like this (all variables become factors because of many spaces > and other characters in Excel): > > > x<-data.frame(A=c("","Name1","text1","text2","text3","","","","Name2","text1","text2","text3","","","Name3","text1","text2","text3","","Name1","text1","text2","text3",""), > > B=c("","",1,2,3,"","","","",4,5,6,"","","",7,8,9,"","",3,3,3,""),C=c("","",1,1,2,"","","","",5,6,5,"","","",9,8,7,"","",2,2,2,""), > D=c("","",3,2,1,"","","","",6,5,4,"","","",7,7,8,"","",1,1,1,"")) > x > > I was wondering if it's at all possible to grab the data blocks from > this file and put them into a list, e.g., so that the first element of > the list is the data frame: > > Name1 1 1 3 > Name12 1 2 > Name13 2 1 > > second element of the list is the data frame > > Name2 4 5 6 > Name2 5 6 5 > Name2 6 5 4 > > etc. > Notice that all blocks have the same number of rows. > > I have several challenges: > 1. I cannot use the row number because the number of irrelevant lines > between blocks is always different. > 2. I'd love to replace all the "text1", "text2", "text3" strings > (which are always the same within each block) with the name that is > one cell above it. > 3. All variables are now factors - I am afraid if I recode them later > as "as numeric" - they numbers will change their values. > > Thank you very much for your suggestions! > > -- > Dimitri Liakhovitski > Ninah.com > Dimitri.Liakhovitski@ninah.com > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O [[alternative HTML version deleted]]
Gabor Grothendieck
2009-Sep-04 13:29 UTC
[R] transforming a badly organized data base into a list of data frames
Try this: x2 <- subset(x, A != "") split(x2, cumsum(x2$B == "")) On Fri, Sep 4, 2009 at 8:57 AM, Dimitri Liakhovitski<ld7631 at gmail.com> wrote:> Dear R-ers! > > I have a badly organized data base in Excel. Once I read it into R it > looks like this (all variables become factors because of many spaces > and other characters in Excel): > > x<-data.frame(A=c("","Name1","text1","text2","text3","","","","Name2","text1","text2","text3","","","Name3","text1","text2","text3","","Name1","text1","text2","text3",""), > B=c("","",1,2,3,"","","","",4,5,6,"","","",7,8,9,"","",3,3,3,""),C=c("","",1,1,2,"","","","",5,6,5,"","","",9,8,7,"","",2,2,2,""), > D=c("","",3,2,1,"","","","",6,5,4,"","","",7,7,8,"","",1,1,1,"")) > x > > I was wondering if it's at all possible to grab the data blocks from > this file and put them into a list, e.g., so that the first element of > the list is the data frame: > > Name1 1 1 3 > Name12 1 2 > Name13 2 1 > > second element of the list is the data frame > > Name2 4 5 6 > Name2 5 6 5 > Name2 6 5 4 > > etc. > Notice that all blocks have the same number of rows. > > I have several challenges: > 1. I cannot use the row number because the number of irrelevant lines > between blocks is always different. > 2. I'd love to replace all the "text1", "text2", "text3" strings > (which are always the same within each block) with the name that is > one cell above it. > 3. All variables are now factors - I am afraid if I recode them later > as "as numeric" - they numbers will change their values. > > Thank you very much for your suggestions! > > -- > Dimitri Liakhovitski > Ninah.com > Dimitri.Liakhovitski at ninah.com > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >