I am just trying to split a dataframe of 750 observations of 29 variables by "Site", which is a vector in the dataframe with five text names (e.g. PtaCaracol). I just want to generate summary statistics for my other variables for each site individually.

I know this should be simple, and I did read up on options, choosing to use "subset", but I am honestly confounded that this does not result in a new dataframe split by this factor. I tried "subset" with an argument based on my various other numeric vectors, and it worked just fine. My very simple code is below, and thank you for any responses. I apologize for asking such a simple question, but I wrote this out exactly as I found it in an exactly similar question, which indicated this should work. Now I am confused! Thank you for any assistance.

Ben Neal

# Load data from CSV file
Cover = read.csv ("/Users/benjaminneal/Documents/1110_Panama/Transect series /1210_BocasTransectSummary.csv", header=T)
# Divide dataframe by Site names
Site1 <- subset(Cover, Site = "PtaCaracol")

PS The csv loads fine, gives normal summary stats, and does not seem to be the issue. I thought about just renaming by hand all the sites in the file (i.e. "PtaCaracol" = 1, ...), but this seems like a weak solution.
On 21/04/12 13:52, Ben Neal wrote:
> [...]
> # Divide dataframe by Site names
> Site1 <- subset(Cover, Site = "PtaCaracol")
> [...]

What about a ***reproducible*** example? What isn't working? What (if any) error message did you get?

I am at least 99% confident that numeric versus character values in Cover$Site is completely irrelevant.

My ***guess*** (and it can only be a guess, given the vagueness and lack of detail in your question) is that you need a double ("logical") equals sign. I.e. I suspect that

    Site1 <- subset(Cover, Site == "PtaCaracol")

will work.

That being said --- since you want to *split* your data frame by "Site" --- why the <expletive deleted> don't you use split()??? E.g.:

    Splitz <- split(Cover, Cover$Site)

    cheers,

        Rolf Turner
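The difference between the single and the double equals sign, and the split() suggestion, can be seen on a small toy data frame. This is only a sketch: the real CSV is not available here, so the "Coral" column and the site names "SiteB" and "SiteC" are made up for illustration.

```r
# Toy stand-in for Cover; the real CSV is not available, so the column
# "Coral" and the sites "SiteB" and "SiteC" are hypothetical.
Cover <- data.frame(
  Site  = rep(c("PtaCaracol", "SiteB", "SiteC"), each = 4),
  Coral = c(10, 12, 11, 13, 20, 22, 21, 23, 30, 32, 31, 33)
)

# Single '=': Site is passed as a named argument that subset() does not use,
# so no row condition is supplied and every row comes back.
all_rows <- subset(Cover, Site = "PtaCaracol")

# Double '==': a logical test evaluated row by row.
one_site <- subset(Cover, Site == "PtaCaracol")

# split() returns a named list with one data frame per site.
by_site <- split(Cover, Cover$Site)

print(c(all = nrow(all_rows), filtered = nrow(one_site)))
print(names(by_site))
```

With the single equals sign no error is raised, which would explain why the returned data frame silently contains every original row.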
Thank you Jorge. I did get subset to work on a character vector last night, the only difference being that I used two equals signs:

(not working)
> Site1 <- subset(Cover, Site = "PtaCaracol")
(working)
> Site1 <- subset(Cover, Site == "PtaCaracol")

I am unsure why this is, but it worked. I believe this was the error, causing my returned dataframes to contain all the original entries.

Thanks again; I will explore the plyr package as well.

Ben

-----Original Message-----
From: Jorge I Velez [mailto:jorgeivanvelez at gmail.com]
Sent: Fri 4/20/2012 8:17 PM
To: Ben Neal
Subject: Re: [R] Splitting a dataframe by character vector

Dear Ben,

Check ?tapply, ?aggregate, ?ave for some ways to accomplish this using the base package. Also, check the plyr package at http://plyr.had.co.nz/ along with the examples and accompanying paper.

HTH,
Jorge.-

On Fri, Apr 20, 2012 at 9:52 PM, Ben Neal wrote:
> [...]
> # Divide dataframe by Site names
> Site1 <- subset(Cover, Site = "PtaCaracol")
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
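The base-R functions Jorge points to (tapply, aggregate, ave) could be used roughly as follows for per-site summaries. This is a sketch on made-up data, not the poster's file: the "Coral" column and the sites "SiteB" and "SiteC" are assumptions for illustration.

```r
# Toy stand-in for Cover; "Coral", "SiteB" and "SiteC" are hypothetical.
Cover <- data.frame(
  Site  = rep(c("PtaCaracol", "SiteB", "SiteC"), each = 4),
  Coral = c(10, 12, 11, 13, 20, 22, 21, 23, 30, 32, 31, 33)
)

# tapply(): one summary value per site, returned as a named array.
site_means <- tapply(Cover$Coral, Cover$Site, mean)

# aggregate(): the same summary, returned as a data frame (one row per site).
site_table <- aggregate(Coral ~ Site, data = Cover, FUN = mean)

# ave(): the per-site mean repeated on every row, aligned with the input.
Cover$SiteMean <- ave(Cover$Coral, Cover$Site)

print(site_means)
print(site_table)
print(head(Cover))
```

Which of the three fits best probably depends on the shape wanted afterwards: a named vector, a small summary table, or a new column alongside the raw observations.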
> -----Original Message-----
> I am just trying to split a dataframe of 750 observations of
> 29 variables by "Site", which is a vector in the dataframe
> with five text names (ex. PtaCaracol).

A couple of methods.

i) First, look up ?split, which chops your data frame into a list of five data frames. Then use lapply on the list to get a list of summaries, or sapply to get something that will (if the results of your summary are a simple number or vector) look like an array.

ii) Look up ?by, which will do something like lapply (and returns a list with extra features) and will print a tidier result.

> I tried "subset" ....
> # Divide dataframe by Site names
> Site1 <- subset(Cover, Site = "PtaCaracol")

Check your syntax again; that should have been

    Site1 <- subset(Cover, Site == "PtaCaracol")

Inside a function call '=' binds a value to a named argument (elsewhere it acts as an assignment operator); it is not the equality test that subset would be looking for.

Steve E
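The two methods above, split() followed by lapply()/sapply() and by(), might look like this on a small example. It is a sketch under assumptions: the toy data frame stands in for the real file, and the "Coral" column and the sites "SiteB" and "SiteC" are invented for illustration.

```r
# Toy stand-in for Cover; "Coral", "SiteB" and "SiteC" are hypothetical.
Cover <- data.frame(
  Site  = rep(c("PtaCaracol", "SiteB", "SiteC"), each = 4),
  Coral = c(10, 12, 11, 13, 20, 22, 21, 23, 30, 32, 31, 33)
)

# i) split() into a list of per-site data frames, then summarise each piece.
pieces <- split(Cover, Cover$Site)
site_summaries <- lapply(pieces, function(d) summary(d$Coral))  # list of summaries
site_means <- sapply(pieces, function(d) mean(d$Coral))         # named numeric vector

# ii) by() applies the function to each site's rows and prints a labelled result.
site_by <- by(Cover["Coral"], Cover$Site, summary)

print(site_means)
print(site_by)
```

sapply() collapses to a plain named vector here because each piece yields a single number; with a multi-value summary it would return a matrix instead.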