Here we are assuming:
1. a row should be extracted if its value differs by between 200 and 300
from the prior or next value with the same ind.
2. the input is sorted by values within ind.
If that's not the intention then modify the code accordingly.
First we read the data into the data frame DF.
Then we define between(x, min, max), a function that returns a logical
vector whose ith component is TRUE if x[i] is strictly between min and max.
Then we use ave() to get a selection vector. In this case ave() returns a
vector of zeros and ones (the logical results are coerced to the type of
DF$values), and we convert that to the logical vector sel which
defines the selection.
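To make the per-group step concrete, here is a small sketch of what the function passed to ave() computes for the ABBA-1 values alone. The names v, d, prev_ok and next_ok are only for illustration and are not used in the actual solution.

```r
# TRUE where x lies strictly between min and max
between <- function(x, min, max) x > min & max > x

v <- c(689, 1336, 1560)   # the ABBA-1 values, already sorted
d <- diff(v)              # gaps between neighbours: 647 224

# pad with 0 so each row lines up with the gap to its prior / next value
prev_ok <- between(c(0, d), 200, 300)   # FALSE FALSE  TRUE
next_ok <- between(c(d, 0), 200, 300)   # FALSE  TRUE FALSE

# a row is kept if either neighbouring gap is in range
prev_ok | next_ok                       # FALSE  TRUE  TRUE
```

The padding value 0 can never fall between 200 and 300, so the first row has no "prior" match and the last row has no "next" match.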
# read the data
Lines <- "values ind
1 2655 7A5
2 3028 7A5
3 689 ABBA-1
4 1336 ABBA-1
5 1560 ABBA-1
6 2820 ABLIM1
7 3339 ABLIM1
8 171 ACSM5
9 195 ACSM5
10 43 ADAMDEC1
11 129 ADAMDEC1
12 1105 AFF1
13 3202 AFF1
14 852 AFF3
15 2461 AFF3
16 45 AKT1
17 397 AKT1
18 1430 AQP2
19 2402 AQP2
20 2551 ARHGAP19"
DF <- read.table(textConnection(Lines), header = TRUE)
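# TRUE where x lies strictly between min and max
between <- function(x, min, max) x > min & max > x
# within each ind, flag rows whose gap to the prior or next value is in range;
# the padding 0 stands in for the missing neighbour at either end of a group
sel <- ave(DF$values, DF$ind, FUN = function(v)
  between(c(0, diff(v)), 200, 300) | between(c(diff(v), 0), 200, 300)
) > 0
# extract the selected rows (rows 4 and 5 for this data)
DF[sel, ]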
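If assumption 2 does not hold, one option is to sort before selecting. The following is only a sketch: DF2 is a hypothetical unsorted data frame made up for illustration (it is not part of the original data), and sel2 is likewise an illustrative name.

```r
# TRUE where x lies strictly between min and max
between <- function(x, min, max) x > min & max > x

# hypothetical input where values are NOT sorted within ind
DF2 <- data.frame(values = c(1560, 689, 1336, 2655, 3028),
                  ind = c("ABBA-1", "ABBA-1", "ABBA-1", "7A5", "7A5"))

# sort by ind, then by values within ind, so diff() sees neighbours in order
DF2 <- DF2[order(DF2$ind, DF2$values), ]

# same selection as before, applied to the sorted data frame
sel2 <- ave(DF2$values, DF2$ind, FUN = function(v)
  between(c(0, diff(v)), 200, 300) | between(c(diff(v), 0), 200, 300)
) > 0

DF2[sel2, ]   # the rows with values 1336 and 1560
```

Note that sel2 indexes the sorted data frame, so the row names of the result refer back to the positions in the original, unsorted input.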
On Sat, May 30, 2009 at 10:13 AM, Iain Gallagher
<iaingallagher at btopenworld.com> wrote:
>
> Hello list
>
> I have a problem with a dataset (see toy example below) where I am trying
> to find the difference between two (or more) numbers and discard those
> observations which fall outside a set interval.
>
> An example and further explanation:
>
>    values      ind
> 1    2655      7A5
> 2    3028      7A5
> 3     689   ABBA-1
> 4    1336   ABBA-1
> 5    1560   ABBA-1
> 6    2820   ABLIM1
> 7    3339   ABLIM1
> 8     171    ACSM5
> 9     195    ACSM5
> 10     43 ADAMDEC1
> 11    129 ADAMDEC1
> 12   1105     AFF1
> 13   3202     AFF1
> 14    852     AFF3
> 15   2461     AFF3
> 16     45     AKT1
> 17    397     AKT1
> 18   1430     AQP2
> 19   2402     AQP2
> 20   2551 ARHGAP19
>
> Each number in the values column above is associated with a label (in the
> ind column). For some inds there will be only 2 values but as can be seen
> from the data other inds have many values.
>
> Here's what I want to do using the ABBA-1 data from above as an example:
>
> calculate the differences between each value:
>
> 1560-1336 = 224
> 1336-689 = 647
>
> then use these values to create an index that will allow me to pull out
> values between set limits. If I set the limits to between 200 and 300 then
> the index will reference rows 4 & 5 in the above data set.
>
> I hope this is reasonably clear and I appreciate any suggestions.
>
> Thanks
>
> Iain