thr3ads.net - R help - [R] question about subseting a dataframe [May 2008]


Dipankar Basu

2008-May-10 03:23 UTC

[R] question about subseting a dataframe

Hi!

I am using R version 2.7.0 and am working on a panel dataset read into R as
a dataframe; I call it "ex". The variables in "ex" are: id, year, x

id: a character string which identifies the unit
year: identifies the time period
x: the variable of interest (which might contain NAs).

Here is an example:
> id <- rep(c("A","B","C"),2)
> year <- c(rep(1970,3),rep(1980,3))
> x <- c(20,30,40,25,35,45)
> ex <- data.frame(id=id,year=year,x=x)
> ex
  id year  x
1  A 1970 20
2  B 1970 30
3  C 1970 40
4  A 1980 25
5  B 1980 35
6  C 1980 45

I want to draw a subset of "ex" by selecting only the A and B units:
> ex1 <- subset(ex[which(ex$id=="A"|ex$id=="B"),])
Now I want to do some computations on x for each selected unit only:
> tapply(ex1$x, ex1$id, mean)
   A    B    C
22.5 32.5   NA

But this gives me an NA value for the unit C, which I thought I had already
left out. How do I ensure that the computation (in the last step) is limited
to only the units I have selected in the first step?

Dipankar


Charles Plessy

2008-May-10 03:36 UTC


On Fri, May 09, 2008 at 11:23:37PM -0400, Dipankar Basu wrote:
> > ex <- data.frame(id=id,year=year,x=x)
> > ex1 <- subset(ex[which(ex$id=="A"|ex$id=="B"),])
> > tapply(ex1$x, ex1$id, mean)
>   A    B    C
> 22.5 32.5   NA

Dear Dipankar,

The reason for this behaviour is that ex$id has class "factor", and a factor
keeps all of its levels (including "C") when it is subsetted. You can avoid the
conversion to a factor by wrapping id in I(), as in:

ex <- data.frame(id=I(id),year=year,x=x)
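
A related sketch, for illustration only: instead of I(), the id column can be kept as a plain character vector with the stringsAsFactors argument of data.frame(). The use of %in% for the selection and the variable name res are choices made for this example, not part of the original post.

```r
# Keep id as a character column so no factor levels are carried around.
id   <- rep(c("A", "B", "C"), 2)
year <- c(rep(1970, 3), rep(1980, 3))
x    <- c(20, 30, 40, 25, 35, 45)

ex  <- data.frame(id = id, year = year, x = x, stringsAsFactors = FALSE)

# Select the A and B units by plain logical indexing.
ex1 <- ex[ex$id %in% c("A", "B"), ]

# tapply() builds its grouping factor from the values still present in ex1$id,
# so only A and B appear in the result.
res <- tapply(ex1$x, ex1$id, mean)
print(res)
```

Because the grouping factor is created inside tapply() from the remaining character values, there is no leftover "C" level to report.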

Have a nice day,

-- 
Charles Plessy
http://charles.plessy.org
Wakō, Saitama, Japan

ronggui

2008-May-10 03:37 UTC


This happens because id is a factor in your data frame, and its levels
(including "C") are kept when it is subsetted. Here is one way to get rid of
"C":
> ex1$id <- factor(ex1$id)
> tapply(ex1$x, ex1$id, mean)
   A    B
22.5 32.5
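
As a hedged aside, later versions of R (not the 2.7.0 used in this thread) also provide droplevels(), which removes unused levels from every factor column of a data frame in one call. The sketch below assumes such a version is available. It stores id explicitly as a factor so the behaviour does not depend on the data.frame() defaults, and the names ex2 and means are introduced only for illustration.

```r
# Rebuild the example with id stored explicitly as a factor.
ex <- data.frame(
  id   = factor(rep(c("A", "B", "C"), 2)),
  year = c(rep(1970, 3), rep(1980, 3)),
  x    = c(20, 30, 40, 25, 35, 45)
)

# Select the A and B rows; the unused level "C" is still attached to id.
ex1 <- subset(ex, id %in% c("A", "B"))
print(levels(ex1$id))

# droplevels() removes unused levels from all factor columns of the data frame.
ex2 <- droplevels(ex1)
print(levels(ex2$id))

# With the stale level gone, tapply() reports only A and B.
means <- tapply(ex2$x, ex2$id, mean)
print(means)
```

This has the same effect as calling factor() on ex1$id, but it does not require naming each factor column individually.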




-- 
HUANG Ronggui, Wincent

Bachelor of Social Work, Fudan University, China

Master of sociology, Fudan University, China

Ph.D. Candidate, CityU of HK,
http://www.cityu.edu.hk/sa/psa_web2006/students/rdegree/huangronggui.html

Yasir Kaheil

2008-May-10 03:47 UTC


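You need to redefine the factor on ex1$id: call factor() on it again to
rebuild the levels. Note that as.factor() will not do this, because it returns
an object that is already a factor unchanged.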
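
A minimal sketch of how factor() and as.factor() behave on a factor that has already been subsetted. The vector f and the names sub, kept and rebuilt are hypothetical and used only for this illustration.

```r
# A factor with three levels, then a subset that no longer contains "C".
f   <- factor(c("A", "B", "C", "A", "B", "C"))
sub <- f[f != "C"]

# as.factor() returns an existing factor as it is, so "C" stays in the levels.
kept <- levels(as.factor(sub))
print(kept)

# factor() recomputes the levels from the values actually present.
rebuilt <- levels(factor(sub))
print(rebuilt)
```

So for a column that is already a factor, factor() is the call that rebuilds the levels.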



-----
Yasir H. Kaheil
Catchment Research Facility
The University of Western Ontario 
