Dear UseRs,

I have a data frame that looks like this:

head(test2)
    attributes start   end StemExplant Callus RegenPlant
1  LTR_Unknown   120   535       3.198  1.931      1.927
3  LTR_Unknown  2955  3218       0.541  0.103      0.613
6  LTR_Unknown  6210  6423       6.080  4.650      9.081
9  LTR_Unknown  9658 10124       0.238  0.117      0.347
14 LTR_Unknown 14699 14894       3.545  3.625      2.116
25 LTR_Unknown 33201 33474       1.275  1.194      0.591

I need to subtract each value in the "end" column from its corresponding value in the "start" column, then append that difference as a new column in this data frame.

It seems like apply could be the way to approach this, but I can't see any easy way to designate "difference" as a function, like, say, sum or mean. Plus, all the apply/lapply examples I'm looking at seem to depend on a data frame being just the two columns on which to operate, without any way to designate which columns to use in the function(x,y) part of an lapply statement. Another alternative would be a for loop, but when I try this:

for(i in 1:nrow(test2)) {
  testout[i] <- (test2$end[i] - test2$start[i])
}

I get an error. So I'm stuck at the first step here. I think that once I can figure out how to get the differences, I can use cbind to append the data frame. But if there is a better way to do it, I'd like to know that as well.

Any help is appreciated.

--Kelly V.

Vining, Kelly wrote:
>
> [...]
> I need to subtract each value in the "end" column from its corresponding
> value in the "start" column, then append that difference as a new column
> in this data frame.
> [...]

Is this what you are looking for?

# Data
lines <- "attributes start end StemExplant Callus RegenPlant
LTR_Unknown 120 535 3.198 1.931 1.927
LTR_Unknown 2955 3218 0.541 0.103 0.613
LTR_Unknown 6210 6423 6.080 4.650 9.081
LTR_Unknown 9658 10124 0.238 0.117 0.347
LTR_Unknown 14699 14894 3.545 3.625 2.116
LTR_Unknown 33201 33474 1.275 1.194 0.591"

d = read.table(textConnection(lines), header = TRUE)

# Create new variable
d$new = d$end - d$start
print(d)

   attributes start   end StemExplant Callus RegenPlant new
1 LTR_Unknown   120   535       3.198  1.931      1.927 415
2 LTR_Unknown  2955  3218       0.541  0.103      0.613 263
3 LTR_Unknown  6210  6423       6.080  4.650      9.081 213
4 LTR_Unknown  9658 10124       0.238  0.117      0.347 466
5 LTR_Unknown 14699 14894       3.545  3.625      2.116 195
6 LTR_Unknown 33201 33474       1.275  1.194      0.591 273

HTH

Pete
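
For comparison, the same column can be added without repeating the data frame name on every term. The sketch below is not from Pete's message: it rebuilds only the attributes, start and end columns from the posted data and uses base R's transform() and within(). The names d2 and d3 are illustrative.

```r
# Rebuild part of the example data (values copied from the original post)
d <- data.frame(
  attributes = "LTR_Unknown",
  start = c(120, 2955, 6210, 9658, 14699, 33201),
  end   = c(535, 3218, 6423, 10124, 14894, 33474)
)

# transform() evaluates the expression using the data frame's own columns
# and returns a copy with the new column appended
d2 <- transform(d, new = end - start)

# within() does the same with ordinary assignment syntax
d3 <- within(d, new <- end - start)

print(d2$new)
# 415 263 213 466 195 273
```

Both calls return a new data frame and leave d unchanged, so the result has to be assigned back if it should replace the original.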

On 26/10/11 10:46, Vining, Kelly wrote:
> [...]
> Another alternative would be a for loop, but when I try this:
>
> for(i in 1:nrow(test2)) {
>   testout[i]<- (test2$end[i] - test2$start[i])
> }
>
> I get an error. [...]

Learn to think the R-ish way.

    test2$diff <- test2$end - test2$start

Simple as that.

Your unnecessary and inefficient for-loop approach probably would have *worked* had you initialised "testout" before the for loop. Like:

    testout <- numeric(nrow(test2))

It's hard to be sure since you didn't say *what* error was thrown.

But anyhow, *don't* do it that way.

    cheers,

        Rolf Turner
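
To make the comparison concrete, here is a minimal sketch, assuming only the start and end values from the original post (the other columns are omitted). It runs the loop with testout preallocated, the vectorised subtraction, and an mapply() call as one possible way to name the columns explicitly in an apply-style function. The name via_mapply is illustrative and does not appear in the thread.

```r
# start/end values copied from the original post
test2 <- data.frame(
  start = c(120, 2955, 6210, 9658, 14699, 33201),
  end   = c(535, 3218, 6423, 10124, 14894, 33474)
)

# Loop version: testout must exist before the first testout[i] <- ... assignment
testout <- numeric(nrow(test2))
for (i in seq_len(nrow(test2))) {
  testout[i] <- test2$end[i] - test2$start[i]
}

# Vectorised version: arithmetic on whole columns, no loop needed
test2$diff <- test2$end - test2$start

# Apply-style version: the columns to use are passed explicitly to mapply()
via_mapply <- mapply(function(s, e) e - s, test2$start, test2$end)

# All three approaches give the same differences
stopifnot(isTRUE(all.equal(testout, test2$diff)),
          isTRUE(all.equal(via_mapply, test2$diff)))
```

The vectorised line is the shortest of the three and also avoids the per-element overhead of the loop and of mapply().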