I have some data files in which some fields have multiple values. For example

first  last   sex  major
John   Smith  M    ANTH
Jane   Doe    F    HIST,BIOL

What's the best R-like way to handle these data (Jane's major in my example), so that I can do things like summarize the other fields by them (e.g., sex by major)?

Right now I'm processing the files (in Excel, since they're spreadsheets) by duplicating lines with two values in the major field, eliminating one value per row. I suspect there's a nifty R way to do this.

Thanks in advance!

John Muccigrosso
John,

I have to deal with this kind of thing too for my class.

# Some functions

# For ad$Full.name = "Mark Grimes"
# Split on spaces and return the first piece
get.first.name <- function(cell) {
  x <- unlist(strsplit(as.character(cell), " "))
  return(x[1])
}

# Split on spaces and return the second piece
get.last.name <- function(cell) {
  x <- unlist(strsplit(as.character(cell), " "))
  return(x[2])
}

# For roster$Name = "Grimes, Mark L"
# Take the part after the comma, then its first word
get.first.namec <- function(cell) {
  x <- unlist(strsplit(as.character(cell), ", "))
  y <- get.first.name(x[2])
  return(y)
}

# Take the part before the comma
get.last.namec <- function(cell) {
  x <- unlist(strsplit(as.character(cell), ", "))
  return(x[1])
}

Use these functions with the apply family for processing class files.

Hope this helps,

Mark

On Apr 6, 2012, at 9:09 AM, John D. Muccigrosso wrote:

> I have some data files in which some fields have multiple values. For example
>
> first  last   sex  major
> John   Smith  M    ANTH
> Jane   Doe    F    HIST,BIOL
>
> What's the best R-like way to handle these data (Jane's major in my example), so that I can do things like summarize the other fields by them (e.g., sex by major)?
>
> Right now I'm processing the files (in excel since they're spreadsheets) by duplicating lines with two values in the major field, eliminating one value per row. I suspect there's a nifty R way to do this.
>
> Thanks in advance!
>
> John Muccigrosso
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
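As a rough sketch of how helpers like Mark's might be used with the apply family, the block below applies a splitting function to every cell of a name column. The `roster` data frame, its contents, and the combined helper `get.part` are made up for illustration and are not from the original post; the helper is defined here only so the block runs on its own.

```r
# Hypothetical roster with names in "First Last" form (illustrative data only)
roster <- data.frame(Full.name = c("Mark Grimes", "Jane Doe"),
                     stringsAsFactors = FALSE)

# Hypothetical helper in the spirit of get.first.name / get.last.name:
# split a cell on spaces and return the i-th piece
get.part <- function(cell, i) {
  x <- unlist(strsplit(as.character(cell), " "))
  x[i]
}

# vapply calls the helper once per cell and checks that each result
# is a single character string
roster$first <- vapply(roster$Full.name, get.part, character(1),
                       i = 1, USE.NAMES = FALSE)
roster$last  <- vapply(roster$Full.name, get.part, character(1),
                       i = 2, USE.NAMES = FALSE)
```

`sapply` would work the same way here; `vapply` is used only because it fails loudly if a cell does not produce exactly one string.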
How about reading the lines and separating out the cases that have more than one major? For those cases, process the data to create duplicate rows, one for each major.
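The row-duplication idea can be sketched in base R roughly as follows. This assumes the spreadsheet has already been read into a data frame (here called `students`, a name chosen for illustration) and that multiple majors are separated by commas, as in the example from the original question.

```r
# Example data from the original question
students <- data.frame(
  first = c("John", "Jane"),
  last  = c("Smith", "Doe"),
  sex   = c("M", "F"),
  major = c("ANTH", "HIST,BIOL"),
  stringsAsFactors = FALSE
)

# Split each major field on commas; the result is a list with
# one character vector per original row
majors <- strsplit(students$major, ",", fixed = TRUE)

# Repeat each row index once per major it contains
n_each <- lengths(majors)
long <- students[rep(seq_len(nrow(students)), n_each), ]

# Replace the combined field with the individual majors
# (trimws drops stray spaces such as in "HIST, BIOL")
long$major <- trimws(unlist(majors))
rownames(long) <- NULL

# Cross-tabulate sex by major
sex_by_major <- table(long$sex, long$major)
```

With the example data, `long` has three rows: one for John and two for Jane, one per major. Note that a row whose major field is an empty string would be dropped by this approach, so such rows may need separate handling.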