Hi,

I have an integer vector which is extracted from a dataframe and sorted by another column of the dataframe. Now I would like to remove those elements of the vector whose values are near to those of other elements. For example, c(1,20,2,21) should become c(1,20).

I tried to write a function, but for some reason something won't work:

x <- 1:20
near <- function(x,th) {
  nr <- NROW(x)
  for (i in 1:(nr-1)){
    for (j in (i+1):nr){
      if (j > nr) break
      t=0
      if (abs(x[i] - x[j]) < th) t = 1
      if (t== 1) x <- x[-j]
      if (t== 1) nr <- nr-1
      if (t== 1) j <- (j-1)
      cat (" i",i," j",j,"\n")
    }
  }
  x
}
near(x,10)

This gives 1 3 7 13 17, while I was expecting 1 20 as the outcome. If you look at the intermediate output of cat(), you can see that after an element is removed, the next one is skipped.

Sorting the vector is not an option; the order is important. I used 1:20 as an example, while x <- sample((1:20),20) is perhaps more representative of our data, but it isn't reproducible for the output of the function.

Is there already an R function which does such a thing, or what is wrong with my code?

Thanks a lot for your time,
Bart
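On the reproducibility point in the question: calling set.seed() before sample() makes a random test vector repeatable, so the output of near() can be compared across runs. A minimal sketch; the seed value 123 is an arbitrary choice, and the permutation drawn for a given seed may differ between R versions, because the default sampling algorithm changed in R 3.6.0.

```r
# set.seed() fixes the state of the random number generator, so sample()
# returns the same permutation each time the script is run.
set.seed(123)              # 123 is an arbitrary seed
x <- sample((1:20), 20)    # a random permutation of 1..20
set.seed(123)              # reset to the same state
y <- sample((1:20), 20)
identical(x, y)            # TRUE: both draws are the same permutation
```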
Bart Joosen <bartjoosen <at> hotmail.com> writes:

> Hi,
>
> I have an integer which is extracted from a dataframe, which is sorted by
> another column of the dataframe.
> Now I would like to remove some elements of the integer, which are near to
> others by their value. For example:
> integer: c(1,20,2,21) should be c(1,20).
> ....
> Sorting the integer is not an option, the order is important.

Why not? It's extremely efficient for large series and the only method that would work with large arrays. The idea: keep the indexes of the sort order, mark the "near others" (for example by making their index NA), and restore the original order. No for-loop needed.

Dieter
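Dieter's sort-based idea can be sketched without a for loop as follows. This is an illustration under stated assumptions, not his code: the function name drop_near_sorted is hypothetical, "near" is taken to mean that neighbouring values in sorted order differ by less than th, and instead of setting indexes to NA the sketch gives each run of near values a group id, maps the ids back to the original positions through order(), and keeps the first member of each group in the original order. Because neighbouring gaps chain together, values further apart than th can end up in the same group.

```r
# Hypothetical helper sketching the sort-based approach (no explicit loop).
# Assumption: values whose sorted neighbour gap is below th form one group.
drop_near_sorted <- function(x, th) {
  o  <- order(x)                         # indexes of the sort order
  xs <- x[o]
  # start a new group wherever the gap to the previous sorted value is >= th
  grp_sorted <- cumsum(c(TRUE, diff(xs) >= th))
  grp <- integer(length(x))
  grp[o] <- grp_sorted                   # group ids back in the original order
  x[!duplicated(grp)]                    # first element of each group, original order kept
}

drop_near_sorted(c(1, 20, 2, 21), 10)    # 1 20
drop_near_sorted(c(1, 2, 3, 5), 3)       # 1: the gaps 1, 1, 2 chain all four values together
```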
One of the reasons it might not be working is that you are changing the index of the 'for' within the loop. The following is from the help page for 'for':

  The index seq in a for loop is evaluated at the start of the loop; changing it subsequently does not affect the loop. The variable var has the same type as seq, and is read-only: assigning to it does not alter seq. If seq is a factor (which is not strictly allowed) then its internal codes are used: the effect is that of as.integer not as.vector.

On 2/10/07, Bart Joosen <bartjoosen@hotmail.com> wrote:
> I have an integer which is extracted from a dataframe, which is sorted by
> another column of the dataframe.
> Now I would like to remove some elements of the integer, which are near to
> others by their value. For example: integer: c(1,20,2,21) should be c(1,20).
> [...]
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
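Following Jim's point, here is a minimal sketch of the same nested scan written with while loops, so the code controls the index itself: after x[j] is deleted, j is deliberately left unchanged, because the next element has slid into position j and still needs to be tested. The name near_while is hypothetical. Note that with the strict comparison < th, 1:20 with th = 10 gives 1 11 rather than 1 20, since 11 - 1 is exactly 10.

```r
# Hypothetical rewrite of near(): same logic, but the indexes are managed
# by hand with while loops instead of being reassigned inside a for loop.
near_while <- function(x, th) {
  i <- 1
  while (i < length(x)) {
    j <- i + 1
    while (j <= length(x)) {
      if (abs(x[i] - x[j]) < th) {
        # drop x[j]; the following element now sits at position j,
        # so j is NOT advanced here
        x <- x[-j]
      } else {
        j <- j + 1
      }
    }
    i <- i + 1
  }
  x
}

near_while(c(1, 20, 2, 21), 10)   # 1 20
near_while(1:20, 10)              # 1 11
```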
All, thanks for your help.

Dieter, thanks, it's a different way of tackling the problem. But wouldn't I still need a for loop to step through the vector? For example, for c(1,2,3,5) and a threshold of 3, c(1,5) should remain. If I make a vector of the differences between each element and the previous one, then 5 would be eliminated, while it shouldn't be. Or am I wrong in this assumption?

Thanks anyway,
Bart
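Bart's concern holds for the rule he describes. A sketch of that greedy rule (scan in the original order, keep a value only if it is at least th away from every value already kept) still uses one loop, but it never deletes elements or touches the loop index inside the loop. The name keep_greedy is hypothetical, and the cost grows quadratically in the worst case. The last line shows why grouping on sorted differences disagrees for c(1,2,3,5).

```r
# Hypothetical helper: greedy filter that preserves the original order.
keep_greedy <- function(x, th) {
  keep <- logical(length(x))            # all FALSE to start
  for (i in seq_along(x)) {
    kept <- x[keep]                     # values kept among positions 1..(i-1)
    keep[i] <- !any(abs(x[i] - kept) < th)
  }
  x[keep]
}

x <- c(1, 2, 3, 5)
keep_greedy(x, 3)                       # 1 5

# Grouping on sorted neighbour differences chains 1-2-3-5 into one group
# (every gap is below 3), so only a single value would survive:
cumsum(c(TRUE, diff(sort(x)) >= 3))     # 1 1 1 1
```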
Dear Bart,

"hclust" might be useful for this as well:

dat = c(1,20,2,21)
hc = hclust(dist(dat))
thresh = 2
ct = cutree(hc, h=thresh)
clusteredNumbers = split(dat, ct)
firstOne = dat[!duplicated(ct)]

> clusteredNumbers
$`1`
[1] 1 2

$`2`
[1] 20 21

> firstOne
[1] 1 20

Best wishes
Wolfgang

--
------------------------------------------------------------------
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber
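Wolfgang's approach can be wrapped in a small function; first_of_clusters is a hypothetical name. hclust() uses complete linkage by default, so after cutree(h = th) every pair of values inside a cluster is at most th apart. With method = "single", chains of close values merge into one cluster, much like grouping on sorted differences. The guard for fewer than two values is there because hclust() needs at least two observations.

```r
# Hypothetical wrapper around Wolfgang's hclust/cutree idea.
first_of_clusters <- function(x, th, method = "complete") {
  if (length(x) < 2) return(x)          # hclust() requires at least 2 observations
  hc <- hclust(dist(x), method = method)
  ct <- cutree(hc, h = th)              # cluster ids, numbered in order of appearance
  x[!duplicated(ct)]                    # first member of each cluster, original order
}

first_of_clusters(c(1, 20, 2, 21), 2)        # 1 20
first_of_clusters(c(1, 4, 8), 5)             # 1 8 (complete: 1 and 8 are 7 apart)
first_of_clusters(c(1, 4, 8), 5, "single")   # 1   (single: gaps 3 and 4 chain together)
```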