Bert Gunter
2017-Jul-22 00:41 UTC
[R] a difficult situation, how to do this using base function.
1. Please always reply to the list, especially here, so that others can see your clarification.

2. What happens if your match.start value exceeds all the cumulative sums?? -- you seem to imply that this cannot happen.

Your minimal example, while a little confusing (to me) and in HTML -- which can get mangled on this plain text list, though seemingly not here -- was very helpful. Essential, even.

Here is a solution that seems to work.

WARNING: There are a zillion ways that one might do this. Mine may be far from the most efficient or the most elegant or the most clear. I hope it is understandable.

The chief task here is to parse your second column so that it is numeric and your logic can be applied to it. Because of its simply structured format, I chose to do this by converting the dashes to commas and using strsplit() to split each string into a character vector of numeric values that can then be converted to numerics. Like this:

df <- data.frame(match.start = c(5, 10, 100, 200),
                 range.coordinates = c("1000-1050", "1500-1555",
                                       "5000-5050,6000-6180",
                                       "100-150,200-260,600-900"))

## Note the following to convert the default factor to a character vector.
## This is essential!
df[,2] <- as.character(df[,2])

numex <- gsub("-", ",", df[,2], fixed = TRUE)  ## convert dashes
## convert to a list of numeric vectors
numex <- lapply(strsplit(numex, ",", fixed = TRUE), as.numeric)

## Here's what you get:
> numex
[[1]]
[1] 1000 1050

[[2]]
[1] 1500 1555

[[3]]
[1] 5000 5050 6000 6180

[[4]]
[1] 100 150 200 260 600 900

Because of the fixed format, we know that the even numbered indices in each vector are the upper values of the ranges, and the odd indices are the lower values.
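The parsing step relies on every string having the "start-end,start-end,..." shape. A minimal sketch of the same conversion with that assumption checked explicitly follows; parse_ranges() is a hypothetical helper name, not something from the original post, and it assumes all coordinates are non-negative (a leading minus sign would be read as a separator).

```r
## parse_ranges() is a hypothetical helper, shown for illustration only.
## Assumes non-negative coordinates in the form "start-end,start-end,...".
parse_ranges <- function(s) {
  parts <- strsplit(gsub("-", ",", s, fixed = TRUE), ",", fixed = TRUE)
  lapply(parts, function(p) {
    v <- as.numeric(p)
    ## each range contributes a start and an end, so the length must be
    ## even, and every piece must have parsed as a number
    stopifnot(length(v) %% 2 == 0, !anyNA(v))
    v
  })
}

parse_ranges(c("1000-1050", "5000-5050,6000-6180"))
```

A malformed entry such as "5000-5050,6000" would then stop with an error at parse time rather than producing a misaligned odd/even split later.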
I just break these out in a convenient form -- a 2 column matrix, the first column giving the lower value and the second the cumulative ranges:

numex <- lapply(numex, function(x){
   i <- seq_along(x)
   odds <- i %% 2 == 1
   evens <- i %% 2 == 0
   cbind(x[odds], cumsum(x[evens] - x[odds]))
})

## Giving:
> numex
[[1]]
     [,1] [,2]
[1,] 1000   50

[[2]]
     [,1] [,2]
[1,] 1500   55

[[3]]
     [,1] [,2]
[1,] 5000   50
[2,] 6000  230

[[4]]
     [,1] [,2]
[1,]  100   50
[2,]  200  110
[3,]  600  410

Now I just apply your logic row by row (i.e. index by index) to get the desired column:

df$updated <- sapply(seq_len(nrow(df)), function(i){
   test <- numex[[i]]
   val <- df[i,1]
   if(nrow(test) == 1) test[1,1] + val
   else {
      wm <- which(val < test[,2])[1]
      ## subtract the cumulative width of the earlier ranges;
      ## this is 0 when val already falls in the first range (wm == 1)
      test[wm,1] + val - c(0, test[,2])[wm]
   }
})

> df$updated
[1] 1005 1510 6050  690

Cheers,
Bert

On Fri, Jul 21, 2017 at 4:24 PM, Honkit Wong <stephen66 at gmail.com> wrote:
> Sorry for confusion, it was right, it should be: 600+(200-60-50)=690.
> 60 and 50 are from difference of previous two ranges. Thanks! Any clue?
>
> Stephen (Hon-Kit) Wong
>
>> On Jul 21, 2017, at 4:13 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>
>> Shouldn't your last value in match.start.updated = 710, i.e. 600 + 60 + 50 ??
>>
>> If not, you will need to explain yourself more clearly (for me, anyway).
>>
>> Cheers,
>> Bert
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Fri, Jul 21, 2017 at 12:22 PM, Stephen HonKit Wong
>> <stephen66 at gmail.com> wrote:
>>> Hello,
>>>
>>> I have a following dataframe with many rows.
>>> data.frame(match.start=c(5,10,100,200),range.coordinates=c("1000-1050","1500-1555","5000-5050,6000-6180","100-150,200-260,600-900"))
>>>
>>> match.start    range.coordinates
>>> 5              1000-1050
>>> 10             1500-1555
>>> 100            5000-5050,6000-6180
>>> 200            100-150,200-260,600-900
>>>
>>> I want to test for each row element in column "match.start" (e.g. 100 on
>>> 3rd row) if it is less than the accumulated range (e.g. for 5000-5050,
>>> 6000-6180, the accumulated range is: 50, 230), then update the match start
>>> as 6000+ (100-50) = 6050. The result is put on third column.
>>>
>>> match.start    range.coordinates          match.start.updated
>>> 5              1000-1050                  1005
>>> 10             1500-1555                  1510
>>> 100            5000-5050,6000-6180        6050
>>> 200            100-150,200-260,600-900    690
>>>
>>> Many thanks.
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
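The thread leaves open what should happen when match.start is not below any cumulative range. The following is a minimal base-R sketch of the whole calculation as one function, under stated assumptions: update_start() is a hypothetical name, returning NA in the unanswered case is a choice made here rather than anything the poster specified, and (unlike the solution above) single-range rows are also checked against their range width.

```r
## update_start() is a hypothetical name, for illustration only.
## Assumption: return NA when match.start is not below any cumulative width.
update_start <- function(match.start, range.coordinates) {
  mapply(function(val, s) {
    v <- as.numeric(strsplit(gsub("-", ",", s, fixed = TRUE),
                             ",", fixed = TRUE)[[1]])
    i <- seq_along(v)
    lower <- v[i %% 2 == 1]                  # odd positions: range starts
    cumw <- cumsum(v[i %% 2 == 0] - lower)   # cumulative range widths
    wm <- which(val < cumw)[1]               # first range the offset falls in
    if (is.na(wm)) return(NA_real_)          # offset exceeds every range
    lower[wm] + val - c(0, cumw)[wm]         # remove widths of earlier ranges
  }, match.start, as.character(range.coordinates), USE.NAMES = FALSE)
}

df <- data.frame(match.start = c(5, 10, 100, 200),
                 range.coordinates = c("1000-1050", "1500-1555",
                                       "5000-5050,6000-6180",
                                       "100-150,200-260,600-900"),
                 stringsAsFactors = FALSE)
df$updated <- update_start(df$match.start, df$range.coordinates)
df$updated
## 1005 1510 6050 690
```

Prepending 0 to the cumulative widths means the first range needs no special case, so rows with one range and rows with several go through the same code path.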