thr3ads.net - R help - [R] (no subject) [Feb 2005]

Soukup, Matt

2005-Feb-04 19:07 UTC

[R] (no subject)

Hi.

I have a problem that I can't seem to find an optimal way of solving other
than by doing things manually. I'm trying to subset a data frame by the
number of observations that occurred at a given row but want to take into
account the number of observations of preceding rows. Here's an example.

I'm looking at intervals of data [10,20), [10, 30), ....., [10,120) which
contain a certain number of observations for treatment A and treatment B. An
example is given by the following code.
> int <- as.factor(paste("[", rep(10, 11), ",",
+                        seq(20, 120, by = 10), ")"))
> nsamA <- c(62, 83, 118, 151, 180, 201, 212, 215, 216, 217, 218)
> nsamB <- c(65, 90, 128, 163, 190, 199, 209, 214, 215, 216, 218)
> df0 <- data.frame(int, nsamA, nsamB)
> df0
Since the interval [10, s) with n_s samples is nested in [10, t) with n_t
samples for s < t, we know n_t - n_s samples exist in the interval [s, t). If
this difference in sample size is small I want to exclude the interval
[10,s). This can be done by comparing adjacent rows using the
following.
> df0$itagA <- ifelse(c(10, diff(nsamA)) <= 4, 1, 0)
> df0$itagB <- ifelse(c(10, diff(nsamB)) <= 4, 1, 0)
> df0
> # Subset df0 on the tag results
> df1 <- df0[df0$itagA != 1 & df0$itagB != 1,]
> df1
This works fine, but here is my problem. This simply looks at only the
immediate preceding row and not at rows further "down the line". What I
would like to do is include the next interval that includes 5 or more
samples from each group since earlier intervals are nested in the latter
intervals. In the example given this would include the final interval [10,
120) as this contains more than 4 samples for each treatment. I can do this
by hand using something like
> df0[c(1:7,11),]
But this is not an attractive solution as it requires me to actually look at
the data set each time and determine the row numbers. This works for this
case, but I have many intervals (rows of data) to look at and this would be
cumbersome. I've considered using diff with different lag arguments, but
this still doesn't seem to work. I also want to note that I need to keep the
int factor (as used in the example above) as this is used throughout my
analysis (i.e. this is a true factor variable and not simply denoting an
interval). I'd be grateful for any possible suggestions as I'm stumped at
this moment.

Thanks,

Mat

R v. 2.0.1 on Windows XP

Disclaimer: The views and opinions expressed in this email are of the author
and not of the Food and Drug Administration.
***********************************************************************
Mat Soukup, Ph.D.
Mathematical Statistician, Biometrics III
Center for Drug Evaluation and Research
9201 Corporate Blvd. Rm. N250
Phone: 301.827.2081
***********************************************************************


Gabor Grothendieck

2005-Feb-05 04:36 UTC

[R] interval partition problem [was: (no subject)]

Soukup, Matt <SoukupM <at> cder.fda.gov> writes:

: 
: Hi.
: 
: I have a problem that I can't seem to find an optimal way of solving other
: than by doing things manually. I'm trying to subset a data frame by the
: number of observations that occurred at a given row but want to take into
: account the number of observations of preceding rows. Here's an example.
: 
: I'm looking at intervals of data [10,20), [10, 30), ....., [10,120) which
: contain a certain number of observations for treatment A and treatment B. An
: example is given by the following code.
: 
: > int <- as.factor(paste("[", rep(10, 11), ",",
: +                        seq(20, 120, by = 10), ")"))
: > nsamA <- c(62, 83, 118, 151, 180, 201, 212, 215, 216, 217, 218)
: > nsamB <- c(65, 90, 128, 163, 190, 199, 209, 214, 215, 216, 218)
: 
: > df0 <- data.frame(int, nsamA, nsamB)
: > df0
: 
: Since the interval [10, s) with n_s samples is nested in [10, t) with n_t
: samples for s < t, we know n_t - n_s samples exist in the interval [s, t). If
: this difference in sample size is small I want to exclude the interval
: [10,s). This can be done by comparing adjacent rows using the
: following.
: 
: > df0$itagA <- ifelse(c(10, diff(nsamA)) <= 4, 1, 0)
: > df0$itagB <- ifelse(c(10, diff(nsamB)) <= 4, 1, 0)
: > df0
: > # Subset df0 on the tag results
: > df1 <- df0[df0$itagA != 1 & df0$itagB != 1,]
: > df1
: 
: This works fine, but here is my problem. This simply looks at only the
: immediate preceding row and not at rows further "down the line". What I
: would like to do is include the next interval that includes 5 or more
: samples from each group since earlier intervals are nested in the latter
: intervals. In the example given this would include the final interval [10,
: 120) as this contains more than 4 samples for each treatment. I can do this
: by hand using something like
: 
: > df0[c(1:7,11),]
: 
: But this is not an attractive solution as it requires me to actually look at
: the data set each time and determine the row numbers. This works for this
: case, but I have many intervals (rows of data) to look at and this would be
: cumbersome. I've considered using diff with different lag arguments, but
: this still doesn't seem to work. I also want to note that I need to keep the
: int factor (as used in the example above) as this is used throughout my
: analysis (i.e. this is a true factor variable and not simply denoting an
: interval). I'd be grateful for any possible suggestions as I'm stumped at
: this moment.
: 


Delete the rows one by one and then recalculate diff
after each deletion (rather than diff'ing all at once 
and then deleting all at once).  Also, assuming you want 
every interval to be covered, force the last interval to 
end at the last row.

Assume too.few(df0, i) is a function, not shown here, which 
returns TRUE if there are too few As or Bs in row i minus row 
i-1 of df0 and otherwise FALSE. Then:

last.row <- df0[nrow(df0),]
i <- 1
while(i < nrow(df0)) if (too.few(df0, i)) df0 <- df0[-i,] else i <- i + 1
df0[nrow(df0),] <- last.row
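
Since too.few is not shown above, here is a minimal sketch of one way it might be written, run against the example data from the original post. It assumes "too few" means 4 or fewer new samples in either group (matching the <= 4 tags in the question) and that the first row, which has no predecessor, is always kept. The argument min.n and the working copy res are illustrative names, not part of the original posts.

```r
# Example data from the original post
int   <- as.factor(paste("[", rep(10, 11), ",", seq(20, 120, by = 10), ")"))
nsamA <- c(62, 83, 118, 151, 180, 201, 212, 215, 216, 217, 218)
nsamB <- c(65, 90, 128, 163, 190, 199, 209, 214, 215, 216, 218)
df0   <- data.frame(int, nsamA, nsamB)

# Hypothetical helper: TRUE if row i adds min.n or fewer samples over
# row i-1 in either group. Row 1 has no predecessor, so it is never dropped.
too.few <- function(df, i, min.n = 4) {
  if (i == 1) return(FALSE)
  (df$nsamA[i] - df$nsamA[i - 1] <= min.n) ||
    (df$nsamB[i] - df$nsamB[i - 1] <= min.n)
}

# Same loop as above, run on a copy so that df0 itself is left untouched
res <- df0
last.row <- res[nrow(res), ]
i <- 1
while (i < nrow(res)) {
  # After a deletion the next row slides into position i, so i only
  # advances when the current row is kept.
  if (too.few(res, i)) res <- res[-i, ] else i <- i + 1
}
res[nrow(res), ] <- last.row

rownames(res)   # original row numbers of the rows that were kept
```

Under these assumptions the loop keeps rows 1 to 7, 10 and 11. Row 10 survives because, relative to row 7, it adds 5 samples for A and 7 for B, so the result is not identical to the hand-picked df0[c(1:7,11),]. The int column stays a factor with all of its original levels, since row subsetting does not drop levels.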


P.S.

Please start a new thread rather than replying to an existing thread
and please use a meaningful subject.
