thr3ads.net - R help - [R] separation depending on equal contents in more than one field [Oct 2006]

Florian Jansen

2006-Oct-02 15:30 UTC

[R] separation depending on equal contents in more than one field

Hi,

I have a dataframe:

(obs <- data.frame(a=c(1,2,2,3,3,3), b=c(1,2,3,4,4,5), c=1:2))
attach(obs)

In reality it has about 1 million rows.

Some of the rows have the same contents in both col a and col b, like rows 4 and 5.
I want to do some calculations on col c within the duplicated rows and 
merge them afterwards:

layer <- function(x) round((1-prod(1-x/100))*100,0)
(covnew <- aggregate(c, list(a=a, b=b), layer))

This works fine, but not with 1 million rows because of memory
limitations.
So I thought to split the dataframe into the majority of unique rows on 
one hand and all duplicated rows on the other:

With
subset(obs, a %in% a[duplicated(a)])
and its negation, respectively, this works fine for a single-column comparison.
This must also be possible for a two-column comparison, but I can't get it to work.

Thanks
Florian

-- 
Dr. Florian Jansen
Geobotany & Nature Conservation
Institute for Botany and Landscape Ecology
Ernst-Moritz-Arndt-University
Grimmer Str. 88
17487 Greifswald - Germany
+49 (0)3834 86 4147

jim holtman

2006-Oct-02 17:13 UTC

One way is to 'split' the indices of the rows to determine which ones
to use.  For example, from the data given, I got the following:

> split(seq(nrow(obs)), list(obs$a, obs$b), drop=TRUE)
$`1.1`
[1] 1

$`2.2`
[1] 2

$`2.3`
[1] 3

$`3.4`
[1] 4 5

$`3.5`
[1] 6

You can then take the resulting list, find all entries with more
than one value, and use those to do your calculations.
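
A minimal sketch of that last step, using the example data from the original post. The names idx, multi, dup_rows, uniq and merged are made up for illustration, and whether this fits into memory for 1 million rows has not been tested here:

```r
# Example data and layer() from the original post (c = 1:2 is recycled).
obs <- data.frame(a = c(1, 2, 2, 3, 3, 3), b = c(1, 2, 3, 4, 4, 5), c = 1:2)
layer <- function(x) round((1 - prod(1 - x / 100)) * 100, 0)

# Row indices grouped by each (a, b) combination that actually occurs.
idx <- split(seq_len(nrow(obs)), list(obs$a, obs$b), drop = TRUE)

# Groups with more than one row are the duplicated (a, b) keys.
multi <- idx[lengths(idx) > 1]
dup_rows <- unlist(multi, use.names = FALSE)

# All remaining rows have a unique (a, b) key; for a single value,
# layer(x) reduces to round(x), so no grouping is needed for them.
uniq <- obs[setdiff(seq_len(nrow(obs)), dup_rows), ]
uniq$c <- round(uniq$c, 0)

# Apply layer() within each duplicated group, giving one row per group.
merged <- do.call(rbind, lapply(multi, function(i) {
  data.frame(a = obs$a[i[1]], b = obs$b[i[1]], c = layer(obs$c[i]))
}))

# Recombine the untouched unique rows with the merged duplicates.
covnew <- rbind(uniq, merged)
```

If only the two subsets are needed, the same rows can probably also be found without split(), for example with duplicated(obs[c("a", "b")]) | duplicated(obs[c("a", "b")], fromLast = TRUE), which marks every row whose (a, b) pair occurs more than once.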

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
