Thanks. I think Nabble is good for programming questions. Bear with me if I'm incorrect.

Data: Genomics SNP information

Goal: I need to add chromosome and SNP position to the data frame I'm using, through apply.

I'd like to add new columns from text processed through the apply function.

For example: 10:60523:T:G (Column 2)
CHR: 10
Position: 60523

Dataset:

chr rs ps n_miss allele1 allele0 af beta se l_remle p_wald
-9 10:60523:T:G -9 0 T G 0.977 -1.769354e-02 3.597196e-02 1.566731e-01 6.228309e-01
-9 10:60684:A:C -9 0 A C 0.973 1.698925e-02 2.942366e-02 1.561001e-01 5.636926e-01
-9 10:61331:A:G -9 0 A G 0.973 1.708586e-02 2.942424e-02 1.560944e-01 5.614851e-01
-9 10:62010:C:T -9 0 C T 0.980 -8.513143e-03 3.837054e-02 1.566875e-01 8.244260e-01

Code:

--------------------------------------------------------
data<-read.table("small.txt",header = T) # read data
data<-data[,c(2,11)] #delete other columns not needed

#--split data on : and get chromosome and position

split_rs<-function(rs){

  chr<-vector(,length(rs)) # create new vector to store chr
  pos<-vector(,length(rs)) #create new vector to store position

  for(i in 1:length(rs)){ #iterate over RS column

    if(grepl(":",rs[i])){ #if : in column string
      temp <- strsplit(rs[i],":",fixed=T) #split
      chr[i] <-temp[[1]][1] #store CHR
      pos[i] <- temp[[1]][2] #store position
    }
  }
  return(list(chr=chr,pos=pos)) #return making a list
}

data$POS<-"NA" #add new column POS and make NA
data$CHR <- "NA" #add new column CHR and make NA

temp<-apply(data,2,split_rs) #send data frame to function

#--I assign value from list sent -- I would like to improve this part

data$CHR<-temp$rs$chr
data$POS<-temp$rs$pos

rm(temp)

colnames(data)<-c("SNP","P","CHR","BP")
--------------------------------------------------------

-----Original Message-----
From: Jeff Newmiller [mailto:jdnewmil at dcn.davis.ca.us]
Sent: Monday, March 5, 2018 1:48 PM
To: r-help at r-project.org; Sariya, Sanjeev <ss5505 at cumc.columbia.edu>; R Help <r-help at r-project.org>
Subject: Re: [R] Help with apply and new column?

Read the Posting Guide... (see message footer) ... some relevant things you can find there:

a) Yes, this appears to be about how to use an R base function so it is on topic
b) Post a reproducible example (include some sample data, preferably using the dput function)
c) Post using plain text so the mailing list doesn't convert it for you and mangle things in a way you did not intend.
--
Sent from my phone. Please excuse my brevity.

On March 5, 2018 10:07:24 AM PST, "Sariya, Sanjeev" <ss5505 at cumc.columbia.edu> wrote:
>Hello members,
>
>Can I ask a question about apply, adding a new column to a data frame, on this
>e-mail list?
>
>Thanks!
>
>	[[alternative HTML version deleted]]
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
Comments interspersed, and some code at the end.

On Mon, 5 Mar 2018, Sariya, Sanjeev wrote:

> Thanks. I think nabble is good for programming questions. Bear with me
> if I'm incorrect.

You may have found R-help archives at Nabble, but R-help has nothing to do with Nabble.

> Data: Genomics SNP information

I know almost nothing about using R for genomics.

> Goal: I need to add Chromosome and SNP position to the data frame I'm using through apply.
>
> I'd like to add new column from text processed through apply function.
>
> For example: 10:60523:T:G (Column 2)
> CHR: 10
> Position: 60523

Assuming Position is "P", what are your "SNP" and "BP" in the names you assigned below as c("SNP","P","CHR","BP")?

> colnames(data)<-c("SNP","P","CHR","BP")

######################################################
# Your code was pretty severely broken... it would not run,
# and I don't know what you expected to see as output.

# 1) data is the name of a function in base R... re-using
#    it can lead to puzzling errors
# 2) With all this character manipulation, you need to read
#    your character data in as character, not as factors
# 3) Don't use the T variable... use the constant TRUE,
#    since T can easily be overwritten to some non-TRUE value.
dta <- read.table( "small.txt", header = TRUE, as.is = TRUE ) # read data
dta <- dta[ , c( 2, 11 ) ] #delete other columns not needed

# 4) Not at all clear why you want to split all of the columns
#    using apply( ..., 2, ... ) when only one column has ":" characters
#temp <- apply( dta, 2, split_rs ) #send data frame to function
temp <- strsplit( dta$rs, ":" ) # gets the whole column splits at once
# wildly guessing here
rs_chrmatrix <- do.call( rbind, temp )
rs_DF <- as.data.frame( rs_chrmatrix, stringsAsFactors = FALSE )
names( rs_DF ) <- c( "CHR", "P", "X1", "X2" )
rs_DF$P <- as.integer( rs_DF$P )
str( rs_DF )
##################################################

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
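
The code in the reply stops at rs_DF and does not attach the new columns back to the original data, which is what the question asked for. What follows is a minimal sketch of that last step, not the poster's or the replier's method. It makes two assumptions: the four sample rows from the original post are embedded inline with read.table( text = ... ) so the example runs without small.txt, and the intended output names are guessed as SNP = the rs identifier, P = p_wald, CHR = chromosome, BP = base-pair position. CHR is left as character on the assumption that some chromosome labels are not numeric.

```r
# Sample rows copied from the original post, embedded inline so the
# example does not need small.txt.
txt <- "chr rs ps n_miss allele1 allele0 af beta se l_remle p_wald
-9 10:60523:T:G -9 0 T G 0.977 -1.769354e-02 3.597196e-02 1.566731e-01 6.228309e-01
-9 10:60684:A:C -9 0 A C 0.973 1.698925e-02 2.942366e-02 1.561001e-01 5.636926e-01
-9 10:61331:A:G -9 0 A G 0.973 1.708586e-02 2.942424e-02 1.560944e-01 5.614851e-01
-9 10:62010:C:T -9 0 C T 0.980 -8.513143e-03 3.837054e-02 1.566875e-01 8.244260e-01"

dta <- read.table(text = txt, header = TRUE, as.is = TRUE)
dta <- dta[, c("rs", "p_wald")]          # keep only the columns needed

# Split the whole rs column at once; each list element is one row's pieces.
parts <- strsplit(dta$rs, ":", fixed = TRUE)

# Take the first piece (chromosome) and second piece (position) of each row.
dta$CHR <- vapply(parts, `[`, character(1), 1)
dta$BP  <- as.integer(vapply(parts, `[`, character(1), 2))

# Assumed meaning of the requested names: SNP = rs, P = p_wald.
names(dta)[1:2] <- c("SNP", "P")
str(dta)
```

Selecting columns by name rather than by c( 2, 11 ) keeps the code working if the column order of the input file changes.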
Thank you, that helps.
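
One caveat on the do.call( rbind, ... ) approach in the earlier reply: the grepl( ":" ) check in the original loop suggests that some identifiers may not contain a colon. That is an assumption, since the sample data shows none. With pieces of unequal length, rbind would recycle the shorter ones rather than fill in NA. The sketch below is one possible way to get NA for such rows. The helper name split_snp_id and the identifiers "rs12345" and "X:61331:A:G" are hypothetical and not taken from the thread.

```r
# Hypothetical helper (not from the thread): extract chromosome and
# position from IDs of the form chr:pos[:...]; anything else gives NA.
split_snp_id <- function(id) {
  # TRUE where the ID starts with <non-colon text>:<digits>
  ok  <- grepl("^[^:]+:[0-9]+(:|$)", id)
  # Chromosome: everything before the first colon.
  chr <- ifelse(ok, sub(":.*$", "", id), NA_character_)
  # Position: the run of digits after the first colon.
  pos <- ifelse(ok, sub("^[^:]+:([0-9]+).*$", "\\1", id), NA_character_)
  data.frame(CHR = chr, BP = as.integer(pos), stringsAsFactors = FALSE)
}

# Made-up IDs for illustration: one well-formed, one without a colon,
# one with a non-numeric chromosome label.
ids <- c("10:60523:T:G", "rs12345", "X:61331:A:G")
res <- split_snp_id(ids)
print(res)
```

The result can be attached to an existing data frame with cbind( dta, split_snp_id( dta$rs ) ), assuming the identifier column is named rs as in the sample data.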