thr3ads.net - R help - [R] Help with a problem [Jul 2010]


Michael Hess

2010-Jul-17 18:10 UTC

[R] Help with a problem

Hello R users,

I am a researcher at the University of Michigan looking for a solution to an R
problem.  I have loaded my data in from a mysql database and it looks like this
> data
           ds c1 c2
1  2010-04-03        100           0
2  2010-04-30      11141          15
3  2010-05-01      3          16
4  2010-05-02       7615          14
5  2010-05-03       6910          17
6  2010-05-04       5035          3
7  2010-05-05       3007          15
8  2010-05-06       4          14
9  2010-05-07       8335          17
10 2010-05-08       2897          13
11 2010-05-09       6377          17
12 2010-05-10       3177          17
13 2010-05-11       7946          15
14 2010-05-12       8705          0
15 2010-05-13       9030          16
16 2010-05-14       8682          16
17 2010-05-15       8440          15


What I am trying to do is sort by ds, and take rows 1,7, see if c1 is at least
100 AND c2 is at least 8. If it is not, start with check rows 2,8 and if not
there 3,9....until it loops over the entire file.   If it finds a set that
matches, set a new variable equal to 1, if never finds a match, set it equal to
0.

I have done this in stata but on this project we are trying to use R.  Is this
something that can be done in R, if so, could someone point me in the correct
direction.

Thanks,

Michael Hess
University of Michigan
Health System

**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used
for urgent or sensitive issues

RICHARD M. HEIBERGER

2010-Jul-17 20:13 UTC


Michael,

This will get you started.  What you are doing with the seven rows isn't
clear from your description.  I made the dates into "Date" objects.  I
called your data "mydata" as "data" is potentially ambiguous.

Rich



mydata <- read.table(header=TRUE, textConnection("
          ds c1 c2
1  2010-04-03        100           0
2  2010-04-30      11141          15
3  2010-05-01      3          16
4  2010-05-02       7615          14
5  2010-05-03       6910          17
6  2010-05-04       5035          3
7  2010-05-05       3007          15
8  2010-05-06       4          14
9  2010-05-07       8335          17
10 2010-05-08       2897          13
11 2010-05-09       6377          17
12 2010-05-10       3177          17
13 2010-05-11       7946          15
14 2010-05-12       8705          0
15 2010-05-13       9030          16
16 2010-05-14       8682          16
17 2010-05-15       8440          15
"))

mydata$ds <- as.Date(mydata$ds)
result <- 0
for (i in seq(length=nrow(mydata)-6)) {
  ## do something with
  mydata[i:(i+6), 2:3]
  ## and
  c(100, 8)
  if (TRUE) {
    result <- i  ## I am returning the start row, not the generic 1.
    break
  }
}
result


Stephan Kolassa

2010-Jul-17 20:50 UTC


Mike,

I am slightly unclear on what you want to do. Do you want to check rows 
1 and 7 or 1 *to* 7? Should c1 be at least 100 for *any one* or *all* 
rows you are looking at, and same for c2?

You can sort your data like this:
data <- data[order(data$ds),]

Type ?order for help. But also do this for added enlightenment...:

library(fortunes)
fortune("dog")

Next, your analysis on the sorted data frame. As I said, I am not 
entirely clear on what you are looking at, but the following may solve 
your problem with choices "1 to 7" and "any one" above.

foo <- 0
## a window of 7 rows can start at rows 1 .. nrow(data)-6
for ( ii in 1:(nrow(data)-6) ) {
   if (any(data$c1[ii+seq(0,6)]>=100) && any(data$c2[ii+seq(0,6)]>=8)) {
     foo <- 1
     break
   }
}

The variable "foo" should contain what you want it to. Look at ?any 
(and, if this does not do what you want it to, at ?all) for further info.

No doubt this could be vectorized, but I think the loop is clear enough.
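
One way the loop above might be vectorized is with base R's embed(), which lays out every window of 7 consecutive values as one row of a matrix. This is only a sketch: the data frame is rebuilt here from the table in the original post so that the block runs on its own, and it keeps the "1 to 7" and "any one" choices.

```r
# Rebuild the posted data (assumption: values copied from the table above).
data <- data.frame(
  ds = c(as.Date("2010-04-03"),
         seq(as.Date("2010-04-30"), by = "day", length.out = 16)),
  c1 = c(100, 11141, 3, 7615, 6910, 5035, 3007, 4, 8335,
         2897, 6377, 3177, 7946, 8705, 9030, 8682, 8440),
  c2 = c(0, 15, 16, 14, 17, 3, 15, 14, 17, 13, 17, 17, 15, 0, 16, 16, 15)
)
data <- data[order(data$ds), ]

k <- 7
# embed() returns one row per window of k consecutive values
w1 <- embed(as.integer(data$c1 >= 100), k)
w2 <- embed(as.integer(data$c2 >= 8), k)

# "any one": each column qualifies at least once within the same window
foo <- as.integer(any(rowSums(w1) >= 1 & rowSums(w2) >= 1))
foo
```

Replacing ">= 1" with "== k" would give the "all" reading instead.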

Good luck!
Stephan




Michael Hess

2010-Jul-17 21:38 UTC


Sorry for not being clear.

In the dataset there are around 100 or so days of data (in this case, also rows
of data).

I need to make sure that the person meets the condition that c1 is at least 100
AND c2 is at least 8 on 5 of 7 continuous days.
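
If "continuous days" is meant as calendar days, gaps in ds matter (the sample jumps from 2010-04-03 to 2010-04-30). The following is a rough sketch, not a confirmed requirement: it assumes a missing day counts as not meeting the condition, and it rebuilds the sample data under the hypothetical name dat so that it runs on its own.

```r
# Rebuild the posted data (assumption: values copied from the original post).
dat <- data.frame(
  ds = c(as.Date("2010-04-03"),
         seq(as.Date("2010-04-30"), by = "day", length.out = 16)),
  c1 = c(100, 11141, 3, 7615, 6910, 5035, 3007, 4, 8335,
         2897, 6377, 3177, 7946, 8705, 9030, 8682, 8440),
  c2 = c(0, 15, 16, 14, 17, 3, 15, 14, 17, 13, 17, 17, 15, 0, 16, 16, 15)
)
dat <- dat[order(dat$ds), ]

# Gaps between successive rows, in days
gaps <- as.integer(diff(dat$ds))

# Expand to one row per calendar day; days without data get NA
all_days <- data.frame(ds = seq(min(dat$ds), max(dat$ds), by = "day"))
full <- merge(all_days, dat, by = "ds", all.x = TRUE)

# A day qualifies only if it has data and meets both thresholds
ok <- !is.na(full$c1) & full$c1 >= 100 & full$c2 >= 8
```

The logical vector ok then has one entry per calendar day and can be passed to whatever windowed check is used.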

I will play with what I have and see if I can find out how to do this.

Thanks for the help!

Michael  

Joshua Wiley

2010-Jul-17 23:15 UTC


Hi Michael,

The days in your example do not look continuous (at least from my
thinking), so you may have extra requirements in mind, but take a look
at this code.  My general thought was first to turn each column into a
logical vector (c1 >= 100 and c2 >= 8).  Taking advantage of the fact
that R treats TRUE as 1 and FALSE as 0, compute a rolling mean.  If
(and only if) 5 consecutive values are TRUE, the mean will be 1.  Next
I added the rolling means for each column, and then tested whether any
were 2 (i.e., 1 + 1).

Cheers,

Josh

###################
#Load required package
library(zoo)

#Your data with ds converted to Date
#from dput()
dat <-
structure(list(ds = structure(c(14702, 14729, 14730, 14731, 14732,
14733, 14734, 14735, 14736, 14737, 14738, 14739, 14740, 14741,
14742, 14743, 14744), class = "Date"), c1 = c(100L, 11141L, 3L,
7615L, 6910L, 5035L, 3007L, 4L, 8335L, 2897L, 6377L, 3177L, 7946L,
8705L, 9030L, 8682L, 8440L), c2 = c(0L, 15L, 16L, 14L, 17L, 3L,
15L, 14L, 17L, 13L, 17L, 17L, 15L, 0L, 16L, 16L, 15L)),
.Names = c("ds", "c1", "c2"), row.names = c(NA, -17L),
class = "data.frame")

#Order by ds
dat <- dat[order(dat$ds), ]

yourvar <- 0

#Test that 5 consecutive values from c1 AND c2 meet requirements
if(any(
 c(rollmean(dat$c1 >= 100, 5) + rollmean(dat$c2 >= 8, 5)) == 2)
   ) {yourvar <- 1}

###################
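
The same "5 consecutive rows meet both requirements" test can also be written without zoo, using base R's rle() on the combined condition. This is a swapped-in technique rather than the approach above, sketched under the assumption that both thresholds must hold on the same rows; the data are rebuilt from the original post so the block runs on its own.

```r
# Rebuild the posted data (assumption: values copied from the original post).
dat <- data.frame(
  ds = c(as.Date("2010-04-03"),
         seq(as.Date("2010-04-30"), by = "day", length.out = 16)),
  c1 = c(100, 11141, 3, 7615, 6910, 5035, 3007, 4, 8335,
         2897, 6377, 3177, 7946, 8705, 9030, 8682, 8440),
  c2 = c(0, 15, 16, 14, 17, 3, 15, 14, 17, 13, 17, 17, 15, 0, 16, 16, 15)
)
dat <- dat[order(dat$ds), ]

# TRUE where a row meets both thresholds
ok <- dat$c1 >= 100 & dat$c2 >= 8

# Run-length encoding: lengths of consecutive stretches of TRUE / FALSE
runs <- rle(ok)

# 1 if any stretch of TRUE is at least 5 rows long, else 0
yourvar <- as.integer(any(runs$values & runs$lengths >= 5))
yourvar
```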



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

Stephan Kolassa

2010-Jul-18 07:06 UTC


Hi all,

zoo::rollmean() is a nice idea. But if I understand Mike correctly, he 
wants 5 out of any 7 consecutive logicals to be TRUE, where these 5 do 
not necessarily need to be consecutive themselves. (remaining open 
question: could, e.g., the condition on c1 be TRUE for rows 1,2,3,4,5 
and on c2 for rows 3,4,5,6,7, or would it need to be TRUE for the same 
rows?). Then something like this would make sense:

any(rollmean(dat$c1>=100,7)>=5/7-.01 & rollmean(dat$c2>=8,7)>=5/7-.01)

or

any(rollmean(dat$c1>=100 & dat$c2>=8,7)>=5/7-.01)

depending on the open question above.

The "-.01" above may be necessary in light of FAQ 7.31.
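
The tolerance can be avoided altogether by counting TRUE values per window as integers instead of averaging them. The following is a base R sketch of that idea; count_in_window is a hypothetical helper, not a zoo function, and the data are rebuilt from the original post so the block runs on its own.

```r
# Rebuild the posted data (assumption: values copied from the original post).
dat <- data.frame(
  ds = c(as.Date("2010-04-03"),
         seq(as.Date("2010-04-30"), by = "day", length.out = 16)),
  c1 = c(100, 11141, 3, 7615, 6910, 5035, 3007, 4, 8335,
         2897, 6377, 3177, 7946, 8705, 9030, 8682, 8440),
  c2 = c(0, 15, 16, 14, 17, 3, 15, 14, 17, 13, 17, 17, 15, 0, 16, 16, 15)
)
dat <- dat[order(dat$ds), ]

# Hypothetical helper: number of TRUEs in each window of k consecutive rows,
# computed from differences of an integer cumulative sum.
count_in_window <- function(flag, k) {
  cs <- c(0L, cumsum(as.integer(flag)))
  cs[(k + 1):length(cs)] - cs[1:(length(cs) - k)]
}

# Conditions may hold on different rows within the window
separate_rows <- any(count_in_window(dat$c1 >= 100, 7) >= 5 &
                     count_in_window(dat$c2 >= 8, 7) >= 5)

# Both conditions must hold on the same rows
counts <- count_in_window(dat$c1 >= 100 & dat$c2 >= 8, 7)
same_rows <- any(counts >= 5)
```

Because the counts are integers, the comparison with 5 is exact and no "-.01" is needed.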

HTH,
Stephan



