Hi!

Sorry for asking a trivial question, but I can't seem to figure this out.

I have a dataframe called master containing 30-odd variables. In this dataframe, I have observations across these 30 variables from 1930 to 2003 (I've made a "year" variable). How can I drop all rows for which the year is less than 1960? I'm assuming something with ifelse() but I can't quite figure it out.

I would appreciate a suggestion of some syntax.

Thanks!
Toby
Tobias Muhlhofer <t.muhlhofer at lse.ac.uk> writes:
> How can I drop all rows for which the year is less than 1960?

myframe[myframe$year>=1960,]

or

subset(myframe, year >= 1960)

--
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen, Denmark
(p.dalgaard at biostat.ku.dk)
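A minimal sketch showing that both of these forms keep the same rows. It assumes a small made-up stand-in for Toby's `master` (the column `gdp` and all values are invented for illustration):

```r
# Hypothetical stand-in for 'master': the real one has 30-odd variables
master <- data.frame(year = c(1930, 1955, 1960, 1985, 2003),
                     gdp  = c(1.0, 2.5, 3.1, 4.7, 6.2))

# Logical indexing: keep the rows where the condition is TRUE, all columns
kept_index <- master[master$year >= 1960, ]

# subset(): the condition is evaluated with the columns of 'master' in scope
kept_subset <- subset(master, year >= 1960)

kept_index$year                                 # 1960 1985 2003
identical(kept_index$year, kept_subset$year)    # TRUE
```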
See ?subset, especially the example there. (I'm surprised this function isn't in R-intro...)

Andy

> From: Tobias Muhlhofer
> How can I drop all rows for which the year is less than 1960?
Tobias Muhlhofer wrote:
> How can I drop all rows for which the year is less than 1960?

In R this is called subsetting, and the simplest way to do this is with the subset function.

older <- subset(master, year < 1960)

See ?subset for more variations on this theme.
Thanks. The problem is that there is extremely little on dataframes or matrices in "An Intro to R", which I did read, and I frankly don't know where else to go. Once I know a function like subset() exists, I can read the help files on it and that's fine, but I would never dream this function up myself...

As for indexing, I DID read "An Introduction to R" and I did NOT catch the part where it says you can use any variable in the dataframe to index it, nor would I have thought of it by myself. From that documentation, I only learned about using row labels to index things...

But I am definitely thankful for the quick help given to me by people on this list, and so I guess being RTFM'ed is a small price to pay for figuring out how to solve the problem I need to solve.

Toby

Jeff Laake wrote:
> Here's an example:
>
> earlydata=data[data$year<1960,]
>
> Look up help and read manuals on manipulating dataframes.

--
When Thomas Edison invented the light bulb he tried over 2000 experiments before he got it to work. A young reporter asked him how it felt to have failed so many times. He said "I never failed once. I invented the light bulb. It just happened to be a 2000-step process."
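The point about indexing by a variable rather than by row labels comes down to this: an expression like `master$year >= 1960` is an ordinary logical vector with one element per row, and `[` keeps the rows where it is TRUE. A minimal sketch, assuming a made-up three-row data frame (the column `rate`, the row labels and all values are invented):

```r
# Hypothetical toy data frame; row labels are set explicitly for comparison
master <- data.frame(year = c(1950, 1965, 1980),
                     rate = c(0.2, 0.5, 0.9),
                     row.names = c("a", "b", "c"))

# The condition is just a logical vector, one element per row
keep <- master$year >= 1960
keep                         # [1] FALSE  TRUE  TRUE

# Indexing rows by that logical vector ...
by_logical <- master[keep, ]

# ... picks the same rows as indexing by row labels or by row positions
by_label    <- master[c("b", "c"), ]
by_position <- master[which(keep), ]

rownames(by_logical)         # "b" "c"
```

Jeff's `data[data$year<1960,]` is the same mechanism with the comparison reversed, so it selects the early rows instead of the later ones.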
Tobias,

I remember finding Patrick Burns' "S Poetry" (see http://www.burns-stat.com/) worth reading, and it covers this sort of thing nicely.

Peter Alspach

>>> Tobias Muhlhofer <t.muhlhofer at lse.ac.uk> 02/12/04 13:57:47 >>>
> Thanks. The problem is that there is extremely little on dataframes or matrices in "An Intro to R", which I did read, and I frankly don't know where else to go.
Douglas Bates <bates at stat.wisc.edu> wrote:
> In R this is called subsetting and the simplest way to do this
> is with the subset function.
> older <- subset(master, year < 1960)
I'm not sure that it's the "simplest".
Since rows for year < 1960 were to be dropped,
I'd say the _simplest_ way to do it is one which exploits
a primitive feature of R, logical indexing with "[":
master[master$year >= 1960,]
For me, the fact that the 'subset' argument of subset() is evaluated
in the scope of the data frame makes subset() quite a complicated way
to do things. It's certainly something I'd hesitate to use inside a
function which might be given a data frame without knowing _exactly_
which column names were going to be in scope for the 2nd argument.
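To make the scoping concern concrete, here is a minimal sketch of how it can bite. The functions `keep_from()` and `keep_from_index()` and the column named `cutoff` are hypothetical, chosen so that a column name collides with a function argument:

```r
# Hypothetical helper: intended to keep rows with year >= cutoff
keep_from <- function(df, cutoff) {
  # subset() evaluates the condition with df's columns in scope first,
  # so a column called 'cutoff' silently shadows the function argument
  subset(df, year >= cutoff)
}

# Same intent with logical indexing: 'cutoff' is always the argument
keep_from_index <- function(df, cutoff) {
  df[df$year >= cutoff, , drop = FALSE]
}

master <- data.frame(year   = c(1950, 1965, 1980),
                     cutoff = c(0, 0, 0))   # an unlucky column name

nrow(keep_from(master, 1960))        # 3: year was compared with the column
nrow(keep_from_index(master, 1960))  # 2
```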
The fact that the 'subset' argument is *not* evaluated in the scope
of the 1st argument in other cases (e.g. when the 1st argument is a
plain vector rather than a data frame) also makes subset() a somewhat
confusing function, compared with simple logical indexing.
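A minimal sketch of the "other cases", assuming a made-up vector `years`: for a plain vector subset() dispatches to `subset.default`, so the condition is an ordinary expression that has to name the vector, just as it would inside "[".

```r
years <- c(1930, 1955, 1960, 1985, 2003)

# No columns to put in scope: the condition refers to 'years' itself
subset(years, years >= 1960)    # 1960 1985 2003
years[years >= 1960]            # same values with plain indexing

# Data frame case: 'year' is found among the columns of the 1st argument
df <- data.frame(year = years)
nrow(subset(df, year >= 1960))  # 3
```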
Strengths of subset() include:
- you can select which columns you want, either instead of choosing
a subset or at the same time (but you can do this with indexing too)
- its drop= argument defaults to FALSE, whereas with "[" drop
defaults to TRUE (but this is not a problem for indexing data frames,
where master[master$year == 1960,] will give you a data frame even if
there is exactly one row with year 1960)
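A minimal sketch of both points. It assumes a small made-up data frame (the columns `gdp` and `pop` are invented) and a matrix built from it. The matrix is where the different `drop` defaults actually show up:

```r
master <- data.frame(year = c(1950, 1960, 1980),
                     gdp  = c(1.0, 2.0, 3.0),
                     pop  = c(10, 20, 30))

# Choosing rows and columns at the same time, two ways
a <- subset(master, year >= 1960, select = c(year, gdp))
b <- master[master$year >= 1960, c("year", "gdp")]
identical(names(a), names(b))                  # TRUE

# Data frame: a single matching row is still a data frame
is.data.frame(master[master$year == 1960, ])   # TRUE

# Matrix: "[" drops a single matching row to a plain vector by default,
# whereas subset() keeps a one-row matrix (its drop defaults to FALSE)
m <- as.matrix(master)
is.matrix(m[m[, "year"] == 1960, ])            # FALSE
is.matrix(subset(m, m[, "year"] == 1960))      # TRUE
```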
I would suggest that people who aren't yet thoroughly familiar with
what a simple "[" can do should add subset() to the list of things to
learn about _after_ they're done learning about "[". On second
thoughts, maybe looking at the implementation of subset.default and
subset.data.frame would itself be helpful in learning about "[".
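Reading the source of `subset.data.frame` reveals one detail. It combines the evaluated condition with `!is.na()` before indexing, so rows whose condition is NA are dropped. Bare logical indexing instead returns an all-NA row for each NA in the condition. A minimal sketch, assuming a made-up missing year (the column `gdp` is invented):

```r
master <- data.frame(year = c(1950, NA, 1980),
                     gdp  = c(1.0, 2.0, 3.0))

# Bare logical indexing: the NA in the condition yields an all-NA row
bad <- master[master$year >= 1960, ]
nrow(bad)                                    # 2 (one of them all NA)

# subset() treats NA as FALSE; wrapping the condition in which() does too
nrow(subset(master, year >= 1960))           # 1
nrow(master[which(master$year >= 1960), ])   # 1
```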