http://www.nabble.com/file/p18018170/subdata.csv subdata.csv

I've attached 100 rows of a data frame I am working with. I have one
factor, id, with 27 levels. There are two columns of reference data, x and
y (UTM coordinates), one column "date" in POSIXct format, and one column
"diff" in times format (chron package).

What I am trying to do is as follows: for each day of the year (date,
irrespective of time), select the row for each id that contains the
smallest "diff" value, resulting in an output containing, in general, one
value per id per day.

"aggregate" has been suggested, but it only produces the columns considered
in the function, and I need all columns intact. My data frame contains
almost 70,000 entries, so manual sorting is not an option. I know R is
robust, but my programming skills are elementary. The only way I know to
approach it is to first separate every id, then filter, then recombine
somehow. Is there not a more efficient way to do this relatively
straightforward filtering exercise?

Tyler
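For anyone following along without the attachment, here is a minimal sketch
of a data frame with the structure described above; all values are
invented, and the derived "day" column (the date irrespective of time) is
assumed from the replies below:

library(chron)

set.seed(1)
n <- 12
df <- data.frame(
  id   = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  x    = runif(n, 300000, 400000),    # UTM easting (invented)
  y    = runif(n, 5000000, 5100000),  # UTM northing (invented)
  date = as.POSIXct("2005-01-01", tz = "UTC") + sample(0:(3 * 86400 - 1), n)
)
df$day  <- format(df$date, "%d-%m-%y")        # day, irrespective of time
df$diff <- times(sample(0:86399, n) / 86400)  # chron "times" values
head(df)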
Hi Tyler,

I don't know if I understood well. Try this; if it doesn't work, I'll try
again and again :-)

df <- read.csv("http://www.nabble.com/file/p18018170/subdata.csv")
df.min.diff <- aggregate(df["diff"], df[c("day")], min)
df.subset <- subset(df, paste(df$day, df$diff) %in%
                        paste(df.min.diff$day, df.min.diff$diff))

On 6/19/08, T.D.Rudolph <prairie.picker@gmail.com> wrote:
>
> What I am trying to do is as follows:
> For each day of the year (date, irrespective of time), select that row
> for each id which contains the smallest "diff" value, resulting in an
> output containing in general one value per id per day.
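Note that, as written, this keeps the single smallest "diff" per day across
all ids. Since Tyler wants one row per id per day, the same trick should
work grouping by both columns (a sketch, assuming the file has "id" and
"day" columns as above):

df.min.diff <- aggregate(df["diff"], df[c("id", "day")], min)
df.subset <- subset(df, paste(df$id, df$day, df$diff) %in%
                        paste(df.min.diff$id, df.min.diff$day,
                              df.min.diff$diff))

One caveat with the paste() matching: if two rows within the same id and
day tie for the smallest "diff", both are kept.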
Hi Tyler,

> What I am trying to do is as follows:
> For each day of the year (date, irrespective of time), select that row
> for each id which contains the smallest "diff" value, resulting in an
> output containing in general one value per id per day.

There's a basic strategy that makes solving this type of problem much
easier. I call it split-apply-combine. The basic idea is that if you had
a single day, the problem would be pretty easy:

df <- read.csv("http://www.nabble.com/file/p18018170/subdata.csv")

oneday <- subset(df, day == "01-01-05")
oneday[which.min(oneday$diff), ]

# Let's make that into a function to make it easier to apply to all days
mindiff <- function(df) df[which.min(df$diff), ]

# Now we split up the data frame so that we have a data frame for
# each day
pieces <- split(df, df$day)

# And use lapply to apply that function to each piece:
results <- lapply(pieces, mindiff)

# Then finally join all the pieces back together
df_done <- do.call("rbind", results)

So we split the data frame into individual days, picked the correct row
for each day, and then joined all the pieces back together. This isn't
the most efficient solution, but I think it's easy to see how each part
works, and how you can apply it to new situations.

If you aren't familiar with lapply or do.call, it's worth having a look
at their examples to get a feel for how they work (although for this case
you can of course just copy and paste them without caring how they work).

Hadley

--
http://had.co.nz/
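The same caveat as above applies here: split(df, df$day) keeps one row per
day overall. To get one row per id per day, as asked, the pattern extends
to a split on both factors (a sketch, assuming columns named "id" and
"day"):

# Split on id *and* day so the result has one row per id per day.
# drop = TRUE avoids creating empty pieces for id/day combinations
# that never occur in the data.
pieces  <- split(df, list(df$id, df$day), drop = TRUE)
results <- lapply(pieces, mindiff)
df_done <- do.call("rbind", results)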