thr3ads.net - R help - [R] Accessing data in groups created with split() and other beginner questions [Mar 2010]

Clay Heaton

2010-Mar-22 13:27 UTC

[R] Accessing data in groups created with split() and other beginner questions

Hi, very new to R here...

I have a data frame called 'set' with 100k+ rows in it that looks like
this:

  subject           timestamp  yvalue traceabs subjtrace
1       1 1992-07-12 06:05:00      12        1       1-1
2       1 1992-07-12 06:10:00      15        1       1-1
3       1 1992-07-12 06:15:00      17        1       1-1
4       1 1992-07-12 06:20:00      20        1       1-1
5       1 1992-07-12 06:25:00      24        1       1-1
....

There are 89 subjects, each of which has a different number of traces --
it's time series data. There are around 180 traces in total. The
"subjtrace" variable is just a concatenation of the subject number, a
hyphen, and the relative trace number. For instance, the first trace for subject
46 is "46-1", but the traceabs value for the same trace is 71.

I need to perform simple statistics on each subject and on each trace. I also
need to graph each trace.

It seems like the easy approach to identifying the variables would be to use the
split() function to create groups:
> temp <- split(set, set$subject)
When I then try, for example:
> summary(temp[1])
all I get as a result is:
  Length Class      Mode
1 5      data.frame list

So I went with:
> lapply(temp[1], summary)
That works, but I'm unable to do something like:
> lapply(temp[1]$yvalue, mean)
because the result returned is:
list()

Ultimately, I'm trying to run the exact same code on each group, as defined
by the subject number, and each trace. I would like to display something like
the following:

Subject # and Summary Statistics
-- Graph of a trace belonging to the subject
-- Summary statistics for the trace
-- Graph of the next trace belonging to the subject
-- Summary statistics for the trace
-- etc...

My intention is to dump this all into a .pdf file with Sweave and LaTeX.

Questions:
- Is split() the best function to use to create the proper groups? Or should I
create a separate variable for each group using subset(), like:
temp.46 <- subset(set, subject == 46, select = c(subject, timestamp, yvalue, subjtrace))

- How do I call functions on data within the groups created by split()? Like...
lapply(temp[1]$yvalue, sd)

- In an effort to try to learn the proper way to approach this, what would be
the best practice for iterating through the data and pushing it to .pdf?

Thanks!

Benilton Carvalho

2010-Mar-22 13:33 UTC

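To access a single element of a list (which is what split() returns), you
need to use "[[". Single brackets, "[", return a sub-list rather than the
element itself, which is why summary(temp[1]) describes a list of length 1.

Therefore,

summary(temp[[1]])

is what you meant to use (or even summ = lapply(temp, summary), which
will give you the summaries for every subject).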
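
A minimal sketch of the difference, using a small made-up stand-in for 'set' (the toy values and the names first_mean, subject_means, subject_sds and trace_means are assumptions for illustration, not from the original post). It also shows one possible way to get a statistic per subject and per trace:

```r
# Toy stand-in for the poster's 'set' data frame (values are made up).
set <- data.frame(
  subject   = c(1, 1, 1, 2, 2, 2),
  yvalue    = c(12, 15, 17, 20, 24, 28),
  subjtrace = c("1-1", "1-1", "1-2", "2-1", "2-1", "2-1"),
  stringsAsFactors = FALSE
)

temp <- split(set, set$subject)

class(temp[1])    # "list": a one-element list that still wraps the data frame
class(temp[[1]])  # "data.frame": the data frame itself

# temp[1]$yvalue is NULL because the list's only element is named "1",
# not "yvalue"; lapply(NULL, mean) therefore returns list().
first_mean <- mean(temp[[1]]$yvalue)

# One statistic per subject: apply a function to each data frame in the list.
subject_means <- sapply(temp, function(d) mean(d$yvalue))
subject_sds   <- sapply(temp, function(d) sd(d$yvalue))

# The same idea per trace, splitting on subjtrace instead of subject.
trace_means <- sapply(split(set, set$subjtrace), function(d) mean(d$yvalue))
```

With split(), the subset() approach of one variable per subject is not needed; the list keeps all groups together and can be iterated with lapply() or sapply().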

About producing PDFs, I'd recommend taking a look at Sweave (
http://www.statistik.lmu.de/~leisch/Sweave/ )
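
As a simpler first step before Sweave, the per-subject, per-trace loop could be prototyped with base R's pdf() device. This is only a sketch under assumptions: the toy data, the output path out_file, and the one-plot-per-page layout are all made up for illustration, and the summaries are printed to the console rather than into the PDF.

```r
# Toy data with a timestamp column (values are made up).
set <- data.frame(
  subject   = c(1, 1, 1, 1, 2, 2),
  timestamp = as.POSIXct("1992-07-12 06:05:00", tz = "UTC") + 300 * (0:5),
  yvalue    = c(12, 15, 17, 20, 24, 21),
  subjtrace = c("1-1", "1-1", "1-2", "1-2", "2-1", "2-1"),
  stringsAsFactors = FALSE
)

out_file <- file.path(tempdir(), "traces.pdf")  # hypothetical output path
pdf(out_file)                                   # each plot() starts a new page

for (subj in split(set, set$subject)) {
  cat("Subject", subj$subject[1], "\n")
  print(summary(subj$yvalue))

  # drop = TRUE skips traces that do not belong to this subject
  for (tr in split(subj, subj$subjtrace, drop = TRUE)) {
    plot(tr$timestamp, tr$yvalue, type = "l",
         main = paste("Trace", tr$subjtrace[1]),
         xlab = "timestamp", ylab = "yvalue")
    print(summary(tr$yvalue))
  }
}

invisible(dev.off())  # close the device so the file is written
```

The same nested loop would be a natural starting point for the body of a Sweave code chunk once the output looks right.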

b


