Hi!

Sorry for asking a trivial question, but I can't seem to figure this out.

I have a dataframe called master containing 30-odd variables.

In this dataframe, I have observations across these 30 variables from 1930 to 2003 (I've made a "year" variable). How can I drop all rows for which the year is less than 1960? I'm assuming something with ifelse() but I can't quite figure it out.

I would appreciate a suggestion of some syntax.

Thanks!
Toby
Tobias Muhlhofer <t.muhlhofer at lse.ac.uk> writes:

> In this dataframe, I have observations across these 30 variables from
> 1930 to 2003 (I've made a "year" variable). How can I drop all rows
> for which the year is less than 1960? I'm assuming something with
> ifelse() but I can't quite figure it out.
>
> I would appreciate a suggestion of some syntax.

    myframe[myframe$year>=1960,]

or

    subset(myframe, year >= 1960)

--
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Blegdamsvej 3, 2200 Cph. N, Denmark
(p.dalgaard at biostat.ku.dk)  Ph: (+45) 35327918  FAX: (+45) 35327907
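A minimal sketch of the two suggestions above, side by side. The data frame here is a toy stand-in for master (two columns instead of 30-odd, with invented values), so treat it as an illustration rather than the poster's actual data.

```r
# Toy stand-in for the poster's 'master' data frame (values are invented)
master <- data.frame(year = c(1930, 1959, 1960, 2003),
                     x    = c(1.1, 2.2, 3.3, 4.4))

# Logical indexing: keep the rows where the condition is TRUE, all columns
by_index <- master[master$year >= 1960, ]

# subset(): the condition is evaluated using the data frame's own columns
by_subset <- subset(master, year >= 1960)

print(by_index)
print(by_subset)
```

Both calls return a new data frame; master itself is unchanged unless the result is assigned back to it.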
See ?subset, especially the example there. (I'm surprised this function isn't in R-intro...)

Andy

> From: Tobias Muhlhofer
>
> In this dataframe, I have observations across these 30 variables from
> 1930 to 2003 (I've made a "year" variable). How can I drop
> all rows for which the year is less than 1960? I'm assuming
> something with ifelse() but I can't quite figure it out.
>
> I would appreciate a suggestion of some syntax.
>
> Thanks!
> Toby
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
Tobias Muhlhofer wrote:

> In this dataframe, I have observations across these 30 variables from
> 1930 to 2003 (I've made a "year" variable). How can I drop all rows for
> which the year is less than 1960? I'm assuming something with ifelse()
> but I can't quite figure it out.
>
> I would appreciate a suggestion of some syntax.

In R this is called subsetting and the simplest way to do this is with the subset function.

    older <- subset(master, year < 1960)

See ?subset for more variations on this theme.
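Note that the condition in subset() names the rows to keep, so year < 1960 selects the pre-1960 rows. Since the question asks to drop those rows, the condition has to be reversed. A small sketch, assuming a toy master with invented values:

```r
# Toy stand-in for 'master' (values are invented)
master <- data.frame(year = c(1930, 1959, 1960, 2003), x = 1:4)

# As written in the reply: keeps the rows from before 1960
older <- subset(master, year < 1960)

# What the question asks for: keeps 1960 onwards, dropping earlier rows
recent <- subset(master, year >= 1960)

# Equivalent: negate the "rows to drop" condition
recent2 <- subset(master, !(year < 1960))

print(older$year)
print(recent$year)
```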
Thanks.

The problem is that there is extremely little on dataframes or matrices in "An Intro to R", which I did read and I frankly don't know where else to go. Once I know a function like subset() exists, I can then read the help files on it and that's fine, but I would never dream this function up myself...

As for indexing, I DID read "An Introduction to R" and I did NOT catch the part where it says you can use any variable in the dataframe to index it, nor would I have thought of it by myself. From that documentation, I only learned about using row-labels to index things...

But I am definitely thankful for the quick help given to me by people on this list, and so I guess being RTFM'ed is a small price to pay for figuring out how to solve the problem I need to solve.

Toby

Jeff Laake wrote:

> Here's an example:
>
> earlydata=data[data$year<1960,]
>
> Lookup help and read manuals on manipulating dataframes.
>
> Tobias Muhlhofer wrote:
>
>> How can I drop all rows for which the year is less than 1960?
>> I'm assuming something with ifelse() but I can't quite figure it out.

--
**************************************************************************
When Thomas Edison invented the light bulb he tried over 2000 experiments before he got it to work. A young reporter asked him how it felt to have failed so many times. He said "I never failed once. I invented the light bulb. It just happened to be a 2000-step process."
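For readers with the same question about indexing, here is a rough sketch of why a column can be used to index the data frame, and why ifelse() does not do the job. The toy master and the names keep and flag are invented for illustration.

```r
# Toy stand-in for 'master' (values are invented)
master <- data.frame(year = c(1930, 1959, 1960, 2003), x = 1:4)

# A comparison on a column yields one TRUE/FALSE value per row
keep <- master$year >= 1960
print(keep)

# A logical vector in the row position keeps only the TRUE rows
recent <- master[keep, ]
print(recent)

# ifelse() maps each element to one of two values; its result has the
# same length as the input, so it labels rows but never removes any
flag <- ifelse(master$year >= 1960, "keep", "drop")
print(flag)
```

The index does not have to come from the data frame at all: any logical vector with one element per row works in the same way.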
Tobias

I remember finding Patrick Burns' "S Poetry" (see http://www.burns-stat.com/ ) worth reading - and it covers this sort of thing nicely.

Peter Alspach

>>> Tobias Muhlhofer <t.muhlhofer at lse.ac.uk> 02/12/04 13:57:47 >>>
> Thanks. The problem is that there is extremely little on dataframes or
> matrices in "An Intro to R", which I did read and I frankly don't know
> where else to go.
Douglas Bates <bates at stat.wisc.edu> wrote:

> In R this is called subsetting and the simplest way to do this is
> with the subset function.
>
> older <- subset(master, year < 1960)

I'm not sure that it's the "simplest". Since rows for year < 1960 were to be dropped, I'd say the _simplest_ way to do it is one which exploits a primitive feature of R:

    master[master$year >= 1960,]

For me, the fact that the 'subset' argument of subset() is evaluated in the scope of the data frame makes subset() quite a complicated way to do things. It's certainly something I'd hesitate to use inside a function which might be given a data frame without knowing _exactly_ which column names were going to be in scope for the 2nd argument.

The fact that the 'subset' argument is *not* evaluated in the scope of the 1st argument in other cases also makes subset() a somewhat confusing function, compared with simple logical indexing.

Strengths of subset() include

- you can select which columns you want, either instead of choosing a subset or at the same time (but you can do this with indexing too)
- its drop= argument, which is passed on to indexing, defaults to FALSE instead of TRUE (but this is not a problem for indexing data frames, where master[master$year == 1960,] will give you a data frame even if there is exactly one row with year 1960)

I would suggest that people who aren't yet thoroughly familiar with what a simple "[" can do should add subset() to the list of things to learn about _after_ they've finished learning about "[".

On second thoughts, maybe looking at the implementation of subset.default and subset.data.frame would be helpful in learning about "[".
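The scoping concern above can be sketched with a small, hypothetical pair of helper functions (from_year_subset and from_year_index are invented names, and master is a toy data frame). The function argument is deliberately given the same name as a column:

```r
# Toy stand-in for 'master' (values are invented)
master <- data.frame(year = c(1930, 1959, 1960, 2003), x = 1:4)

# Hypothetical helper: the argument 'year' has the same name as a column.
# Inside subset(), 'year' on both sides resolves to the column, so the
# condition is TRUE for every row and nothing is dropped.
from_year_subset <- function(df, year) subset(df, year >= year)

# With explicit indexing, df$year is the column and 'year' is the argument.
from_year_index <- function(df, year) df[df$year >= year, ]

print(nrow(from_year_subset(master, 1960)))
print(nrow(from_year_index(master, 1960)))

# A single matching row is still returned as a data frame, not a vector
one <- master[master$year == 1960, ]
print(is.data.frame(one))
```

Renaming the argument avoids the clash in this toy case, but a function that accepts arbitrary data frames cannot know in advance which column names it will meet, which is the hesitation expressed above.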