thr3ads.net - R devel - [Rd] subcripts on data frames (PR#9885) [Aug 2007]

m.crawley at imperial.ac.uk

2007-Aug-28 14:44 UTC

[Rd] subcripts on data frames (PR#9885)

I'm not sure if this is a bug, or if I'm doing something wrong.

From the worms dataframe, which is in a file called worms.txt at

http://www.imperial.ac.uk/bio/research/crawley/therbook
<http://www.imperial.ac.uk/bio/research/mjcraw/therbook/index.htm>

the idea is to extract a subset of the rows, sorted in declining order
of worm density, with only the maximum worm density from each vegetation
type:

worms<-read.table("c:\\temp\\worms.txt",header=T)
attach(worms)
names(worms)

[1] "Field.Name"   "Area"         "Slope"        "Vegetation"   "Soil.pH"
[6] "Damp"         "Worm.density"

Using "not duplicated" I get two rows for Meadow and none for Scrub:

worms[rev(order(Worm.density)),] [!duplicated(Vegetation),]

       Field.Name Area Slope Vegetation Soil.pH  Damp Worm.density
9     The.Orchard  1.9     0    Orchard     5.7 FALSE            9
16   Water.Meadow  3.9     0     Meadow     4.9  TRUE            8
10  Rookery.Slope  1.5     4  Grassland     5.0  TRUE            7
2  Silwood.Bottom  5.1     2     Arable     5.2 FALSE            7
4     Rush.Meadow  2.4     5     Meadow     4.9  TRUE            5

And here is the correct set of rows, but in the wrong order, using unique:

worms[rev(order(Worm.density)),] [unique(Vegetation),]

       Field.Name Area Slope Vegetation Soil.pH  Damp Worm.density
16   Water.Meadow  3.9     0     Meadow     4.9  TRUE            8
9     The.Orchard  1.9     0    Orchard     5.7 FALSE            9
11    Garden.Wood  2.9    10      Scrub     5.2 FALSE            8
2  Silwood.Bottom  5.1     2     Arable     5.2 FALSE            7
10  Rookery.Slope  1.5     4  Grassland     5.0  TRUE            7

Best wishes,

Mick

Prof  M.J. Crawley  FRS
Imperial College London
Silwood Park
Ascot
Berks
SL5 7PY
UK

Phone (0) 207 5942 216
Fax     (0) 207 5942 339


Tony Plate

2007-Aug-28 15:44 UTC

The line

worms[rev(order(Worm.density)),] [!duplicated(Vegetation),]

looks suspect to me -- it looks like you are first creating a sorted
version of the dataframe 'worms', and then subsetting it based on values
of 'Vegetation' in the original order.  When reordering dataframes I
would avoid 'attaching' them and I would break the expression into two
separate expressions, to be sure the subsetting is referring to the
appropriate values:

 > worms <- read.table("http://www.bio.ic.ac.uk/research/mjcraw/therbook/data/worms.txt",
 +                     header=T)
 > worms2 <- worms[rev(order(worms$Worm.density)), ]
 > worms2[!duplicated(worms2$Vegetation), ]
       Field.Name Area Slope Vegetation Soil.pH  Damp Worm.density
9     The.Orchard  1.9     0    Orchard     5.7 FALSE            9
16   Water.Meadow  3.9     0     Meadow     4.9  TRUE            8
11    Garden.Wood  2.9    10      Scrub     5.2 FALSE            8
10  Rookery.Slope  1.5     4  Grassland     5.0  TRUE            7
2  Silwood.Bottom  5.1     2     Arable     5.2 FALSE            7
 >
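
The mismatch described above can be reproduced without the worms file. What follows is only a minimal sketch: the data frame `toy`, its column names, and its values are made up for illustration and are not taken from worms.txt.

```r
# Made-up stand-in for the worms data (illustrative values only).
toy <- data.frame(
  Field      = c("A", "B", "C", "D", "E"),
  Vegetation = c("Meadow", "Scrub", "Meadow", "Scrub", "Arable"),
  Density    = c(2, 8, 9, 3, 5),
  stringsAsFactors = FALSE
)

# Step 1: sort by declining density, as in the thread.
sorted <- toy[rev(order(toy$Density)), ]

# Problem case: the logical mask is computed from the ORIGINAL row order
# but applied to the SORTED rows, so TRUE/FALSE values hit the wrong rows.
wrong <- sorted[!duplicated(toy$Vegetation), ]

# Fix: compute the mask from the same (sorted) frame it is applied to.
right <- sorted[!duplicated(sorted$Vegetation), ]

print(wrong)  # contains Meadow twice and no Arable row
print(right)  # one row per vegetation type, highest density first
```

With this toy data the mismatched version repeats one vegetation type and drops another, which is the same symptom reported in the original post.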

Here's a one-liner using 'subset':
 > subset(worms[rev(order(worms$Worm.density)), ], !duplicated(Vegetation))
       Field.Name Area Slope Vegetation Soil.pH  Damp Worm.density
9     The.Orchard  1.9     0    Orchard     5.7 FALSE            9
16   Water.Meadow  3.9     0     Meadow     4.9  TRUE            8
11    Garden.Wood  2.9    10      Scrub     5.2 FALSE            8
10  Rookery.Slope  1.5     4  Grassland     5.0  TRUE            7
2  Silwood.Bottom  5.1     2     Arable     5.2 FALSE            7
 >
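
The original post's second attempt, indexing with unique(Vegetation), is not explained above. One likely explanation, assuming Vegetation was read in as a factor (which read.table did by default for character columns at the time), is that R indexes by a factor's integer codes rather than by its labels. The sketch below shows that behaviour on a plain vector; the names `veg`, `u`, and `x` are made up for illustration.

```r
# A small factor; levels are sorted alphabetically:
# Arable = 1, Meadow = 2, Scrub = 3.
veg <- factor(c("Meadow", "Scrub", "Meadow", "Arable"))

# unique() keeps one entry per level, in order of first appearance.
u <- unique(veg)
codes <- as.integer(u)   # integer codes behind the labels: 2, 3, 1

# Indexing with a factor uses those integer codes as positions,
# not the labels themselves.
x <- c("first", "second", "third", "fourth")
picked <- x[u]           # same as x[c(2, 3, 1)]
print(picked)
```

If that assumption holds for the worms data, the five unique vegetation types would simply select positions 1 to 5 of the sorted data frame in a shuffled order, and those rows happen to belong to five different vegetation types. That would account for rows that look correct but appear in the wrong order.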

-- Tony Plate

