Hello R community, I am trying to combine two CSV files that look like this: File A Row_ID_CR, Data1, Data2, Data3 1, aa, bb, cc 2, dd, ee, ff File B Row_ID_N, Src_Row_ID, DataN1 1a, 1, This is comment 1 2a, 1, This is comment 2 3a, 2, This is comment 1 4a, 1, This is comment 3 And the output I am looking for is, comparing the values of Row_ID_CR and Src_Row_ID Output ROW_ID_CR, Data1, Data2, Data3, DataComment1, DataComment2, DataComment3 1, aa, bb, cc, This is comment1, This is comment2, This is comment 3 2, dd, ee, ff, This is comment1 I am a novice R user, I am able to replicate a left join but I need a bit more in the final result. Thanks!! [[alternative HTML version deleted]]
try this:> fileA <- read.csv(text = "Row_ID_CR, Data1, Data2, Data3+ 1, aa, bb, cc + 2, dd, ee, ff", as.is = TRUE)> > fileB <- read.csv(text = "Row_ID_N, Src_Row_ID, DataN1+ 1a, 1, This is comment 1 + 2a, 1, This is comment 2 + 3a, 2, This is comment 1 + 4a, 1, This is comment 3", as.is = TRUE)> > # get rid of leading/trailing blanks on comments > fileB$DataN1 <- gsub("^ *| *$", "", fileB$DataN1) > > # merge together > result <- merge(fileA, fileB, by.x = 'Row_ID_CR', by.y = "Src_Row_ID") > > # now partition by Row_ID_CR and aggregate the comments > result2 <- do.call(rbind,+ lapply(split(result, result$Row_ID_CR), function(.grp){ + cbind(.grp[1L, -c(5,6)], comment = paste(.grp$DataN1, collapse ', ')) + }) + )> result2Row_ID_CR Data1 Data2 Data3 comment 1 1 aa bb cc This is comment 1, This is comment 2, This is comment 3 2 2 dd ee ff This is comment 1>On Mon, Jun 10, 2013 at 4:38 PM, Shreya Rawal <rawal.shreya@gmail.com>wrote:> Hello R community, > > I am trying to combine two CSV files that look like this: > > File A > > Row_ID_CR, Data1, Data2, Data3 > 1, aa, bb, cc > 2, dd, ee, ff > > > File B > > Row_ID_N, Src_Row_ID, DataN1 > 1a, 1, This is comment 1 > 2a, 1, This is comment 2 > 3a, 2, This is comment 1 > 4a, 1, This is comment 3 > > And the output I am looking for is, comparing the values of Row_ID_CR and > Src_Row_ID > > Output > > ROW_ID_CR, Data1, Data2, Data3, DataComment1, > DataComment2, DataComment3 > 1, aa, bb, cc, This is > comment1, This is comment2, This is comment 3 > 2, dd, ee, ff, This is > comment1 > > > I am a novice R user, I am able to replicate a left join but I need a bit > more in the final result. > > > Thanks!! > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. [[alternative HTML version deleted]]
Hi, Try this: dat1<-read.table(text=" Row_ID_CR,? Data1,??? Data2,??? Data3 1,????????????????? aa,????????? bb,????????? cc 2,????????????????? dd,????????? ee,????????? ff ",sep=",",header=TRUE,stringsAsFactors=FALSE) dat2<-read.table(text=" Row_ID_N,? Src_Row_ID,? DataN1 1a,????????????? 1,????????????????? This is comment 1 2a,????????????? 1,????????????????? This is comment 2 3a,????????????? 2,????????????????? This is comment 1 4a,????????????? 1,????????????????? This is comment 3 ",sep=",",header=TRUE,stringsAsFactors=FALSE) library(stringr) dat2$DataN1<-str_trim(dat2$DataN1) res<- merge(dat1,dat2,by.x=1,by.y=2) ?res1<-res[,-5] library(plyr) ?res2<-ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize, DataN1=list(DataN1)) ?res2 ?# Row_ID_CR??????????????? Data1??????? Data2??????? Data3 #1???????? 1?????????????????? aa?????????? bb?????????? cc #2???????? 2?????????????????? dd?????????? ee?????????? ff #?????????????????????????????????????????????????? DataN1 #1 This is comment 1, This is comment 2, This is comment 3 #2?????????????????????????????????????? This is comment 1 res3<-data.frame(res2[,-5],t(apply(do.call(rbind,res2[,5]),1,function(x) {x[duplicated(x)]<-NA;x}))) ?colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))])) res3 #? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1 #1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1 #2???????? 2?????????????????? dd?????????? ee?????????? ff This is comment 1 #?????? DataComment2????? DataComment3 #1 This is comment 2 This is comment 3 #2????????????? <NA>????????????? <NA> A.K. ----- Original Message ----- From: Shreya Rawal <rawal.shreya at gmail.com> To: r-help at r-project.org Cc: Sent: Monday, June 10, 2013 4:38 PM Subject: [R] Combining CSV data Hello R community, I am trying to combine two CSV files that look like this: File A Row_ID_CR,? Data1,? ? Data2,? ? Data3 1,? ? ? ? ? ? ? ? ? aa,? ? ? ? ? bb,? ? ? ? ? cc 2,? ? ? ? ? ? ? ? ? dd,? ? ? ? ? ee,? ? ? ? ? ff File B Row_ID_N,? Src_Row_ID,? DataN1 1a,? ? ? ? ? ? ? 1,? ? ? ? ? ? ? ? ? This is comment 1 2a,? ? ? ? ? ? ? 1,? ? ? ? ? ? ? ? ? This is comment 2 3a,? ? ? ? ? ? ? 2,? ? ? ? ? ? ? ? ? This is comment 1 4a,? ? ? ? ? ? ? 1,? ? ? ? ? ? ? ? ? This is comment 3 And the output I am looking for is, comparing the values of Row_ID_CR and Src_Row_ID Output ROW_ID_CR,? ? Data1,? ? Data2,? ? Data3,? ? DataComment1, DataComment2,? ? ? ? ? DataComment3 1,? ? ? ? ? ? ? ? ? ? ? aa,? ? ? ? bb,? ? ? ? cc,? ? ? ? This is comment1,? ? This is comment2,? ? This is comment 3 2,? ? ? ? ? ? ? ? ? ? ? dd,? ? ? ? ? ee,? ? ? ? ff,? ? ? ? ? This is comment1 I am a novice R user, I am able to replicate a left join but I need a bit more in the final result. Thanks!! ??? [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
HI, You could use: result3<- data.frame(result2[,-5],read.table(text=as.character(result2$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE) colnames(result3)[5:7]<- paste0("DataComment",1:3) A.K. ________________________________ From: Shreya Rawal <rawal.shreya at gmail.com> To: arun <smartpink111 at yahoo.com> Sent: Tuesday, June 11, 2013 4:22 PM Subject: Re: [R] Combining CSV data Hey Arun, I guess you could guide me with this a little bit. I have been working on the solution Jim suggested (and also because that I could understand it with my little knowledge of R :)) So with these commands I am able to get the data in this format:> fileA <- read.csv(text = "Row_ID_CR, ? Data1, ? ?Data2, ? ?Data3+ 1, ? ? ? ? ? ? ? ? ? aa, ? ? ? ? ?bb, ? ? ? ? ?cc + 2, ? ? ? ? ? ? ? ? ? dd, ? ? ? ? ?ee, ? ? ? ? ?ff",?as.is?= TRUE)>? > fileB <- read.csv(text = "Row_ID_N, ? Src_Row_ID, ? DataN1+ 1a, ? ? ? ? ? ? ? 1, ? ? ? ? ? ? ? ? ? This is comment 1 + 2a, ? ? ? ? ? ? ? 1, ? ? ? ? ? ? ? ? ? This is comment 2 + 3a, ? ? ? ? ? ? ? 2, ? ? ? ? ? ? ? ? ? This is comment 1 + 4a, ? ? ? ? ? ? ? 1, ? ? ? ? ? ? ? ? ? This is comment 3",?as.is?= TRUE)>? > # get rid of leading/trailing blanks on comments > fileB$DataN1 <- gsub("^ *| *$", "", fileB$DataN1) >? > # merge together > result <- merge(fileA, fileB, by.x = 'Row_ID_CR', by.y = "Src_Row_ID") >? > # now partition by Row_ID_CR and aggregate the comments > result2 <- do.call(rbind,?+ ? ? lapply(split(result, result$Row_ID_CR), function(.grp){ + ? ? ? ? cbind(.grp[1L, -c(5,6)], comment = paste(.grp$DataN1, collapse = '|')) + ? ? }) + ) Row_ID_CR ? ? ? ? ? ? ? ? Data1 ? ? ? ?Data2 ? ? ? ?Data3 ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? comment 1 ? ? ? ? 1 ? ? ? ? ? ? ? ? ? ?aa ? ? ? ? ? bb ? ? ? ? ? cc ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?This is comment 1| This is comment 2| This is comment 3 2 ? ? ? ? 2 ? ? ? ? ? ? ? ? ? ?dd ? ? ? ? ? ee ? ? ? ? ? ff ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?This is comment 1| This is Comment 2 I can even split the last column by this:?strsplit(as.character(result2$comment), split='\\|') [[1]] [1] "This is comment 1" "This is comment 2" " This is comment 3" [[2]] [1] "This is comment 1" "This is comment 2" but now I am not sure how to combine everything together. I guess by now you must have realized how new I am to R :) Thanks!! Shreya On Tue, Jun 11, 2013 at 1:02 PM, arun <smartpink111 at yahoo.com> wrote: Hi,>If the dataset is like this with the comments in the order: > >dat2<-read.table(text=" >Row_ID_N,? Src_Row_ID,? DataN1 >1a,????????????? 1,????????????????? This is comment 1 >2a,????????????? 1,????????????????? This is comment 2 >3a,????????????? 2,????????????????? This is comment 1 >4a,????????????? 1,????????????????? This is comment 3 >",sep=",",header=TRUE,stringsAsFactors=FALSE) > >dat3<-read.table(text=" >Row_ID_N,? Src_Row_ID,? DataN1 >1a,????????????? 1,????????????????? This is comment 1 >2a,????????????? 1,????????????????? This is comment 2 >3a,????????????? 2,????????????????? This is comment 1?? # > >4a,????????????? 1,????????????????? This is comment 3 >5a,???????? 2,????????????????? This is comment 2? # > >",sep=",",header=TRUE,stringsAsFactors=FALSE) > > >library(stringr) >library(plyr) >fun1<- function(data1,data2){ >??? data2$DataN1<- str_trim(data2$DataN1)?? >??????? res<- merge(data1,data2,by.x=1,by.y=2) >??? res1<- res[,-5] >??? res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1)) >??? Mx1<- max(sapply(res2[,5],length)) >??? res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){ >????????????????????????????????? c(x,rep(NA,Mx1-length(x))) > >????????????????????????????????? })),stringsAsFactors=FALSE) >??? colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))])) >??? res3 >??? }??? > >????? >fun1(dat1,dat2) >#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1 >#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1 > >#2???????? 2?????????????????? dd?????????? ee?????????? ff This is comment 1 >#?????? DataComment2????? DataComment3 >#1 This is comment 2 This is comment 3 >#2????????????? <NA>????????????? <NA> > >?fun1(dat1,dat3) >#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1 >#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1 > >#2???????? 2?????????????????? dd?????????? ee?????????? ff This is comment 1 >?# ???? DataComment2????? DataComment3 >#1 This is comment 2 This is comment 3 > >#2 This is comment 2????????????? <NA> > > >Otherwise, you need to provide an example that matches the real dataset. >A.K. > >________________________________ >From: Shreya Rawal <rawal.shreya at gmail.com> >To: arun <smartpink111 at yahoo.com> >Cc: R help <r-help at r-project.org> >Sent: Tuesday, June 11, 2013 12:22 PM > >Subject: Re: [R] Combining CSV data > > > >Hi Arun, > >Thanks for your reply. Unfortunately the Comments are just text in the real data. There is no way to differentiate based on the value of the Comments column. I guess because of that reason I couldn't get your solution to work properly. Do you think I can try it for a more general case where we don't merger/split the comments based on the values? > >Thanks for your help, I appreciate! ?? > > > >On Mon, Jun 10, 2013 at 10:14 PM, arun <smartpink111 at yahoo.com> wrote: > >HI, >>I am not sure about your DataN1 column.? If there is any identifier to differentiate the comments (in this case 1,2,3), then it will easier to place that in the correct column. >>? My previous solution is not helpful in situations like these: >> >>dat2<-read.table(text=" >>Row_ID_N,? Src_Row_ID,? DataN1 >>1a,????????????? 1,????????????????? This is comment 1 >>2a,????????????? 1,????????????????? This is comment 2 >>3a,????????????? 2,????????????????? This is comment 2 >>4a,????????????? 1,????????????????? This is comment 3 >>",sep=",",header=TRUE,stringsAsFactors=FALSE) >>dat3<-read.table(text=" >> >>Row_ID_N,? Src_Row_ID,? DataN1 >>1a,????????????? 1,????????????????? This is comment 1 >>2a,????????????? 1,????????????????? This is comment 2 >>3a,????????????? 2,????????????????? This is comment 3 >>4a,????????????? 1,????????????????? This is comment 3 >>5a,??? ??? ?2,????????????????? This is comment 2 >>",sep=",",header=TRUE,stringsAsFactors=FALSE) >> >> >>library(stringr) >>library(plyr) >>fun1<- function(data1,data2){ >>??? data2$DataN1<- str_trim(data2$DataN1)??? >>??????? res<- merge(data1,data2,by.x=1,by.y=2) >>??? res1<- res[,-5] >>??? res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1)) >>??? Mx1<- max(sapply(res2[,5],length)) >>??? res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){ >>??? ??? ??? ??? ??? ??? ??? ??? ? indx<- as.numeric(gsub("[[:alpha:]]","",x)) >>??? ??? ??? ??? ??? ??? ??? ??? ? x[match(seq(Mx1),indx)] >>??? ??? ??? ??? ??? ??? ??? ??? ? })),stringsAsFactors=FALSE) >> >>??? colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))])) >>??? res3 >>??? }??? ??? ?? >>fun1(dat1,dat2) >> >>#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1 >>#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1 >>#2???????? 2?????????????????? dd?????????? ee?????????? ff????????????? <NA> >> >>#?????? DataComment2????? DataComment3 >>#1 This is comment 2 This is comment 3 >>#2 This is comment 2????????????? <NA> >>?fun1(dat1,dat3) >> >>#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1 >>#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1 >>#2???????? 2?????????????????? dd?????????? ee?????????? ff????????????? <NA> >> >>#?????? DataComment2????? DataComment3 >>#1 This is comment 2 This is comment 3 >>#2 This is comment 2 This is comment 3 >> >> >> >>A.K. >> >> >>----- Original Message ----- >> >>From: arun <smartpink111 at yahoo.com> >>To: Shreya Rawal <rawal.shreya at gmail.com> >>Cc: R help <r-help at r-project.org> >>Sent: Monday, June 10, 2013 6:41 PM >>Subject: Re: [R] Combining CSV data >> >>Hi, >>Try this: >> >>dat1<-read.table(text=" >>Row_ID_CR,? Data1,??? Data2,??? Data3 >>1,????????????????? aa,????????? bb,????????? cc >>2,????????????????? dd,????????? ee,????????? ff >>",sep=",",header=TRUE,stringsAsFactors=FALSE) >> >>dat2<-read.table(text=" >>Row_ID_N,? Src_Row_ID,? DataN1 >>1a,????????????? 1,????????????????? This is comment 1 >>2a,????????????? 1,????????????????? This is comment 2 >>3a,????????????? 2,????????????????? This is comment 1 >>4a,????????????? 1,????????????????? This is comment 3 >>",sep=",",header=TRUE,stringsAsFactors=FALSE) >>library(stringr) >>dat2$DataN1<-str_trim(dat2$DataN1) >>res<- merge(dat1,dat2,by.x=1,by.y=2) >>?res1<-res[,-5] >>library(plyr) >>?res2<-ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize, DataN1=list(DataN1)) >>?res2 >>?# Row_ID_CR??????????????? Data1??????? Data2??????? Data3 >>#1???????? 1?????????????????? aa?????????? bb?????????? cc >>#2???????? 2?????????????????? dd?????????? ee?????????? ff >>#?????????????????????????????????????????????????? DataN1 >>#1 This is comment 1, This is comment 2, This is comment 3 >>#2?????????????????????????????????????? This is comment 1 >> >> >> >>res3<-data.frame(res2[,-5],t(apply(do.call(rbind,res2[,5]),1,function(x) {x[duplicated(x)]<-NA;x}))) >>?colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))])) >>res3 >>#? Row_ID_CR??????????????? Data1??????? Data2??????? Data3????? DataComment1 >>#1???????? 1?????????????????? aa?????????? bb?????????? cc This is comment 1 >>#2???????? 2?????????????????? dd?????????? ee?????????? ff This is comment 1 >>#?????? DataComment2????? DataComment3 >>#1 This is comment 2 This is comment 3 >>#2????????????? <NA>????????????? <NA> >> >>A.K. >> >> >>----- Original Message ----- >>From: Shreya Rawal <rawal.shreya at gmail.com> >>To: r-help at r-project.org >>Cc: >>Sent: Monday, June 10, 2013 4:38 PM >>Subject: [R] Combining CSV data >> >>Hello R community, >> >>I am trying to combine two CSV files that look like this: >> >>File A >> >>Row_ID_CR,?? Data1,? ? Data2,? ? Data3 >>1,? ? ? ? ? ? ? ? ?? aa,? ? ? ? ? bb,? ? ? ? ? cc >>2,? ? ? ? ? ? ? ? ?? dd,? ? ? ? ? ee,? ? ? ? ? ff >> >> >>File B >> >>Row_ID_N,?? Src_Row_ID,?? DataN1 >>1a,? ? ? ? ? ? ?? 1,? ? ? ? ? ? ? ? ?? This is comment 1 >>2a,? ? ? ? ? ? ?? 1,? ? ? ? ? ? ? ? ?? This is comment 2 >>3a,? ? ? ? ? ? ?? 2,? ? ? ? ? ? ? ? ?? This is comment 1 >>4a,? ? ? ? ? ? ?? 1,? ? ? ? ? ? ? ? ?? This is comment 3 >> >>And the output I am looking for is, comparing the values of Row_ID_CR and >>Src_Row_ID >> >>Output >> >>ROW_ID_CR,? ? Data1,? ? Data2,? ? Data3,? ? DataComment1, >>DataComment2,? ? ? ? ? DataComment3 >>1,? ? ? ? ? ? ? ? ? ? ? aa,? ? ? ?? bb,? ? ? ?? cc,? ? ? ? This is >>comment1,? ? This is comment2,? ?? This is comment 3 >>2,? ? ? ? ? ? ? ? ? ? ? dd,? ? ? ? ? ee,? ? ? ?? ff,? ? ? ? ? This is >>comment1 >> >> >>I am a novice R user, I am able to replicate a left join but I need a bit >>more in the final result. >> >> >>Thanks!! >> >>??? [[alternative HTML version deleted]] >> >>______________________________________________ >>R-help at r-project.org mailing list >>https://stat.ethz.ch/mailman/listinfo/r-help >>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>and provide commented, minimal, self-contained, reproducible code. >> >> >
Hi Arun, Sorry to bother you again with this. When I was trying it out over the weekend with different data files I realized that its not exactly working the way it should. For this file B: ROW_ID |SRC_ROW_ID |NOTE 1a |1 |Comment 1 2a |2 |Comment 1a 3a |2 |Comment 2a 4a |1 |Comment 2 5a |1 |Comment 3 I am getting the following output for the comment columns: NOTE comment Comment.1a Comment.2a Comment 1a Comment 1a|Comment 2a Comment 2 Comment 3 Comment 1 Comment 1|Comment 2| Comment 3 Comment 2 Comment 3 Where as the expected result should be: NOTE comment Commentx Commentx Commentx Comment 1a Comment 1a|Comment 2a Comment 1a Comment 2a Comment 1 Comment 1|Comment 2| Comment 3 Comment 1 Comment 2 Comment 3 I am not really concerned about the column heading, they can all be same or even blank. Just the values needs to be under different columns. Thanks again!! On Wed, Jun 12, 2013 at 9:25 AM, arun <smartpink111@yahoo.com> wrote:> Hi, > No problem. > Arun > > > > > > > ________________________________ > From: Shreya Rawal <rawal.shreya@gmail.com> > To: arun <smartpink111@yahoo.com> > Cc: R help <r-help@r-project.org> > Sent: Wednesday, June 12, 2013 9:15 AM > Subject: Re: [R] Combining CSV data > > > > Ah that makes sense. This looks perfect now. Thanks for your help on this! > > > > > On Wed, Jun 12, 2013 at 9:10 AM, arun <smartpink111@yahoo.com> wrote: > > HI Shreya, > >#Looks like you run the two line code as a single line. > > > > > >result3<- > > >data.frame(result2[,-5],read.table(text=as.character(result2$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE) > > > > > >colnames(result3)[5:7]<- paste0("DataComment",1:3) > > > > result3 > > > ># Row_ID_CR Data1 Data2 Data3 > DataComment1 > >#1 1 aa bb cc This is > comment 1 > >#2 2 dd ee ff This is > comment 1 > ># DataComment2 DataComment3 > >#1 This is comment 2 This is comment 3 > >#2 <NA> <NA> > > > > > > > >A.K. > > > > > > > >________________________________ > >From: Shreya Rawal <rawal.shreya@gmail.com> > >To: arun <smartpink111@yahoo.com> > >Cc: R help <r-help@r-project.org>; jim holtman <jholtman@gmail.com> > >Sent: Wednesday, June 12, 2013 8:58 AM > > > >Subject: Re: [R] Combining CSV data > > > > > > > >Great, thanks Arun, but I seem to be running into this error. Not sure > what did I miss. > > > >> > result<-data.frame(final_ouput[,-5],read.table(text=as.character(final_output$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)colnames(result)[5:7]<-paste0("DataComment",1:3) > >Error: unexpected symbol in > "result<-data.frame(final_ouput[,-5],read.table(text=as.character(final_output$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)colnames" > > > > > > > >On Tue, Jun 11, 2013 at 5:09 PM, arun <smartpink111@yahoo.com> wrote: > > > > > >> > >> > >>HI, > >>You could use: > >>result3<- > data.frame(result2[,-5],read.table(text=as.character(result2$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE) > >>colnames(result3)[5:7]<- paste0("DataComment",1:3) > >> > >>A.K. > >>________________________________ > >>From: Shreya Rawal <rawal.shreya@gmail.com> > >>To: arun <smartpink111@yahoo.com> > >>Sent: Tuesday, June 11, 2013 4:22 PM > >> > >>Subject: Re: [R] Combining CSV data > >> > >> > >> > >>Hey Arun, > >> > >>I guess you could guide me with this a little bit. I have been working > on the solution Jim suggested (and also because that I could understand it > with my little knowledge of R :)) > >> > >>So with these commands I am able to get the data in this format: > >> > >>> fileA <- read.csv(text = "Row_ID_CR, Data1, Data2, Data3 > >>+ 1, aa, bb, cc > >>+ 2, dd, ee, ff", as.is = TRUE) > >>> > >>> fileB <- read.csv(text = "Row_ID_N, Src_Row_ID, DataN1 > >>+ 1a, 1, This is comment 1 > >>+ 2a, 1, This is comment 2 > >>+ 3a, 2, This is comment 1 > >>+ 4a, 1, This is comment 3", as.is > TRUE) > >>> > >>> # get rid of leading/trailing blanks on comments > >>> fileB$DataN1 <- gsub("^ *| *$", "", fileB$DataN1) > >>> > >>> # merge together > >>> result <- merge(fileA, fileB, by.x = 'Row_ID_CR', by.y = "Src_Row_ID") > >>> > >>> # now partition by Row_ID_CR and aggregate the comments > >>> result2 <- do.call(rbind, > >>+ lapply(split(result, result$Row_ID_CR), function(.grp){ > >>+ cbind(.grp[1L, -c(5,6)], comment = paste(.grp$DataN1, collapse > = '|')) > >>+ }) > >>+ ) > >> > >>Row_ID_CR Data1 Data2 Data3 > comment > >>1 1 aa bb cc > This is comment 1| This is comment 2| This > is comment 3 > >>2 2 dd ee ff > This is comment 1| This is Comment 2 > >> > >>I can even split the last column by > this: strsplit(as.character(result2$comment), split='\\|') > >> > >>[[1]] > >>[1] "This is comment 1" "This is comment 2" " This is comment 3" > >> > >>[[2]] > >>[1] "This is comment 1" "This is comment 2" > >> > >> > >>but now I am not sure how to combine everything together. I guess by now > you must have realized how new I am to R :) > >> > >>Thanks!! > >>Shreya > >> > >> > >> > >> > >> > >> > >>On Tue, Jun 11, 2013 at 1:02 PM, arun <smartpink111@yahoo.com> wrote: > >> > >>Hi, > >>>If the dataset is like this with the comments in the order: > >>> > >>>dat2<-read.table(text=" > >>>Row_ID_N, Src_Row_ID, DataN1 > >>>1a, 1, This is comment 1 > >>>2a, 1, This is comment 2 > >>>3a, 2, This is comment 1 > >>>4a, 1, This is comment 3 > >>>",sep=",",header=TRUE,stringsAsFactors=FALSE) > >>> > >>>dat3<-read.table(text=" > >>>Row_ID_N, Src_Row_ID, DataN1 > >>>1a, 1, This is comment 1 > >>>2a, 1, This is comment 2 > >>>3a, 2, This is comment 1 # > >>> > >>>4a, 1, This is comment 3 > >>>5a, 2, This is comment 2 # > >>> > >>>",sep=",",header=TRUE,stringsAsFactors=FALSE) > >>> > >>> > >>>library(stringr) > >>>library(plyr) > >>>fun1<- function(data1,data2){ > >>> data2$DataN1<- str_trim(data2$DataN1) > >>> res<- merge(data1,data2,by.x=1,by.y=2) > >>> res1<- res[,-5] > >>> res2<- > ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1)) > >>> Mx1<- max(sapply(res2[,5],length)) > >>> res3<- > data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){ > >>> c(x,rep(NA,Mx1-length(x))) > >>> > >>> })),stringsAsFactors=FALSE) > >>> colnames(res3)[grep("X",colnames(res3))]<- > paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))])) > >>> res3 > >>> } > >>> > >>> > >>>fun1(dat1,dat2) > >>># Row_ID_CR Data1 Data2 Data3 > DataComment1 > >>>#1 1 aa bb cc This is > comment 1 > >>> > >>>#2 2 dd ee ff This is > comment 1 > >>># DataComment2 DataComment3 > >>>#1 This is comment 2 This is comment 3 > >>>#2 <NA> <NA> > >>> > >>> fun1(dat1,dat3) > >>># Row_ID_CR Data1 Data2 Data3 > DataComment1 > >>>#1 1 aa bb cc This is > comment 1 > >>> > >>>#2 2 dd ee ff This is > comment 1 > >>> # DataComment2 DataComment3 > >>>#1 This is comment 2 This is comment 3 > >>> > >>>#2 This is comment 2 <NA> > >>> > >>> > >>>Otherwise, you need to provide an example that matches the real dataset. > >>>A.K. > >>> > >>>________________________________ > >>>From: Shreya Rawal <rawal.shreya@gmail.com> > >>>To: arun <smartpink111@yahoo.com> > >>>Cc: R help <r-help@r-project.org> > >>>Sent: Tuesday, June 11, 2013 12:22 PM > >>> > >>>Subject: Re: [R] Combining CSV data > >>> > >>> > >>> > >>>Hi Arun, > >>> > >>>Thanks for your reply. Unfortunately the Comments are just text in the > real data. There is no way to differentiate based on the value of the > Comments column. I guess because of that reason I couldn't get your > solution to work properly. Do you think I can try it for a more general > case where we don't merger/split the comments based on the values? > >>> > >>>Thanks for your help, I appreciate! > >>> > >>> > >>> > >>>On Mon, Jun 10, 2013 at 10:14 PM, arun <smartpink111@yahoo.com> wrote: > >>> > >>>HI, > >>>>I am not sure about your DataN1 column. If there is any identifier to > differentiate the comments (in this case 1,2,3), then it will easier to > place that in the correct column. > >>>> My previous solution is not helpful in situations like these: > >>>> > >>>>dat2<-read.table(text=" > >>>>Row_ID_N, Src_Row_ID, DataN1 > >>>>1a, 1, This is comment 1 > >>>>2a, 1, This is comment 2 > >>>>3a, 2, This is comment 2 > >>>>4a, 1, This is comment 3 > >>>>",sep=",",header=TRUE,stringsAsFactors=FALSE) > >>>>dat3<-read.table(text=" > >>>> > >>>>Row_ID_N, Src_Row_ID, DataN1 > >>>>1a, 1, This is comment 1 > >>>>2a, 1, This is comment 2 > >>>>3a, 2, This is comment 3 > >>>>4a, 1, This is comment 3 > >>>>5a, 2, This is comment 2 > >>>>",sep=",",header=TRUE,stringsAsFactors=FALSE) > >>>> > >>>> > >>>>library(stringr) > >>>>library(plyr) > >>>>fun1<- function(data1,data2){ > >>>> data2$DataN1<- str_trim(data2$DataN1) > >>>> res<- merge(data1,data2,by.x=1,by.y=2) > >>>> res1<- res[,-5] > >>>> res2<- > ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1)) > >>>> Mx1<- max(sapply(res2[,5],length)) > >>>> res3<- > data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){ > >>>> indx<- > as.numeric(gsub("[[:alpha:]]","",x)) > >>>> x[match(seq(Mx1),indx)] > >>>> })),stringsAsFactors=FALSE) > >>>> > >>>> colnames(res3)[grep("X",colnames(res3))]<- > paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))])) > >>>> res3 > >>>> } > >>>>fun1(dat1,dat2) > >>>> > >>>># Row_ID_CR Data1 Data2 Data3 > DataComment1 > >>>>#1 1 aa bb cc This is > comment 1 > >>>>#2 2 dd ee > ff <NA> > >>>> > >>>># DataComment2 DataComment3 > >>>>#1 This is comment 2 This is comment 3 > >>>>#2 This is comment 2 <NA> > >>>> fun1(dat1,dat3) > >>>> > >>>># Row_ID_CR Data1 Data2 Data3 > DataComment1 > >>>>#1 1 aa bb cc This is > comment 1 > >>>>#2 2 dd ee > ff <NA> > >>>> > >>>># DataComment2 DataComment3 > >>>>#1 This is comment 2 This is comment 3 > >>>>#2 This is comment 2 This is comment 3 > >>>> > >>>> > >>>> > >>>>A.K. > >>>> > >>>> > >>>>----- Original Message ----- > >>>> > >>>>From: arun <smartpink111@yahoo.com> > >>>>To: Shreya Rawal <rawal.shreya@gmail.com> > >>>>Cc: R help <r-help@r-project.org> > >>>>Sent: Monday, June 10, 2013 6:41 PM > >>>>Subject: Re: [R] Combining CSV data > >>>> > >>>>Hi, > >>>>Try this: > >>>> > >>>>dat1<-read.table(text=" > >>>>Row_ID_CR, Data1, Data2, Data3 > >>>>1, aa, bb, cc > >>>>2, dd, ee, ff > >>>>",sep=",",header=TRUE,stringsAsFactors=FALSE) > >>>> > >>>>dat2<-read.table(text=" > >>>>Row_ID_N, Src_Row_ID, DataN1 > >>>>1a, 1, This is comment 1 > >>>>2a, 1, This is comment 2 > >>>>3a, 2, This is comment 1 > >>>>4a, 1, This is comment 3 > >>>>",sep=",",header=TRUE,stringsAsFactors=FALSE) > >>>>library(stringr) > >>>>dat2$DataN1<-str_trim(dat2$DataN1) > >>>>res<- merge(dat1,dat2,by.x=1,by.y=2) > >>>> res1<-res[,-5] > >>>>library(plyr) > >>>> res2<-ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize, > DataN1=list(DataN1)) > >>>> res2 > >>>> # Row_ID_CR Data1 Data2 Data3 > >>>>#1 1 aa bb cc > >>>>#2 2 dd ee ff > >>>># DataN1 > >>>>#1 This is comment 1, This is comment 2, This is comment 3 > >>>>#2 This is comment 1 > >>>> > >>>> > >>>> > >>>>res3<-data.frame(res2[,-5],t(apply(do.call(rbind,res2[,5]),1,function(x) > {x[duplicated(x)]<-NA;x}))) > >>>> colnames(res3)[grep("X",colnames(res3))]<- > paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))])) > >>>>res3 > >>>># Row_ID_CR Data1 Data2 Data3 > DataComment1 > >>>>#1 1 aa bb cc This is > comment 1 > >>>>#2 2 dd ee ff This is > comment 1 > >>>># DataComment2 DataComment3 > >>>>#1 This is comment 2 This is comment 3 > >>>>#2 <NA> <NA> > >>>> > >>>>A.K. > >>>> > >>>> > >>>>----- Original Message ----- > >>>>From: Shreya Rawal <rawal.shreya@gmail.com> > >>>>To: r-help@r-project.org > >>>>Cc: > >>>>Sent: Monday, June 10, 2013 4:38 PM > >>>>Subject: [R] Combining CSV data > >>>> > >>>>Hello R community, > >>>> > >>>>I am trying to combine two CSV files that look like this: > >>>> > >>>>File A > >>>> > >>>>Row_ID_CR, Data1, Data2, Data3 > >>>>1, aa, bb, cc > >>>>2, dd, ee, ff > >>>> > >>>> > >>>>File B > >>>> > >>>>Row_ID_N, Src_Row_ID, DataN1 > >>>>1a, 1, This is comment 1 > >>>>2a, 1, This is comment 2 > >>>>3a, 2, This is comment 1 > >>>>4a, 1, This is comment 3 > >>>> > >>>>And the output I am looking for is, comparing the values of Row_ID_CR > and > >>>>Src_Row_ID > >>>> > >>>>Output > >>>> > >>>>ROW_ID_CR, Data1, Data2, Data3, DataComment1, > >>>>DataComment2, DataComment3 > >>>>1, aa, bb, cc, This is > >>>>comment1, This is comment2, This is comment 3 > >>>>2, dd, ee, ff, This is > >>>>comment1 > >>>> > >>>> > >>>>I am a novice R user, I am able to replicate a left join but I need a > bit > >>>>more in the final result. > >>>> > >>>> > >>>>Thanks!! > >>>> > >>>> [[alternative HTML version deleted]] > >>>> > >>>>______________________________________________ > >>>>R-help@r-project.org mailing list > >>>>https://stat.ethz.ch/mailman/listinfo/r-help > >>>>PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > >>>>and provide commented, minimal, self-contained, reproducible code. > >>>> > >>>> > >>> > >> > > >[[alternative HTML version deleted]]