Hello,

I've tried several times to learn R, but have never gotten past a particular gate. My data are organized by column in Excel, with column headers in the first row. The columns are of unequal lengths. I export them as CSV, then import the CSV file into R. I wish to summarize the data by column. R inserts NA for missing values, then refuses to operate on columns with NA. R is importing my data into a data frame, and I realize that is inappropriate for what I want to do.

How can I import my data so that I can work on columns of unequal length? The first thing I would like to do is generate a table containing mean, median, mode, standard deviation, min, max and count, all per column.

Thank you, Tom

Example data
  Dat1 Dat2 Dat3
1    1    5    4
2    7    7    9
3    3    3    5
4    2   NA    5
5    9   NA   NA

[[alternative HTML version deleted]]
> On Apr 14, 2016, at 2:33 PM, Tom Mosca <tom at vims.edu> wrote:
>
> How can I import my data so that I can work on columns of unequal length? The first thing I would like to do is generate a table containing mean, median, mode, standard deviation, min, max and count, all per column.

Most of the summary statistic functions have an na.rm option that you should set to TRUE.

> Example data
>   Dat1 Dat2 Dat3
> 1    1    5    4
> 2    7    7    9
> 3    3    3    5
> 4    2   NA    5
> 5    9   NA   NA

Looks like you have an R data frame already, so I would try:

colMeans(data, na.rm=TRUE)

> [[alternative HTML version deleted]]

And do learn to configure your email client to post to r-help in plain text.

> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
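A minimal sketch of the na.rm advice, assuming the example data have already been read into a data frame. The name `dat` and the result names are assumptions made for illustration; they do not come from the thread.

```r
# Example data from the original post, rebuilt as a data frame.
# `dat` is a name chosen for this sketch only.
dat <- data.frame(
  Dat1 = c(1, 7, 3, 2, 9),
  Dat2 = c(5, 7, 3, NA, NA),
  Dat3 = c(4, 9, 5, 5, NA)
)

# Without na.rm, any column that contains an NA yields NA.
print(colMeans(dat))

# With na.rm = TRUE the NAs are dropped before the mean is taken.
col_means <- colMeans(dat, na.rm = TRUE)
print(col_means)

# sapply() passes extra arguments on to the function it applies,
# so the same switch works for sd(), median(), min() and max().
col_sds     <- sapply(dat, sd, na.rm = TRUE)
col_medians <- sapply(dat, median, na.rm = TRUE)
print(col_sds)
print(col_medians)
```

The data frame stays rectangular here; the unequal column lengths are expressed only through the NA padding, which each function is told to skip.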
Hi Tom,

What you want is a list rather than a data frame. So:

df<-read.table(text="
 Dat1 Dat2 Dat3
1 1 5 4
2 7 7 9
3 3 3 5
4 2 NA 5
5 9 NA NA",
 header=TRUE)
# split the data frame into a list of column vectors
dflist<-as.list(df)
# drop the NA values from a vector
na.remove<-function(x) return(x[!is.na(x)])
# the cleaned columns differ in length, so sapply returns a list
sapply(dflist,na.remove)

Jim
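Once the columns are held as a list of NA-free vectors, the table asked for in the original post could be assembled roughly as follows. This is only a sketch: `cols`, `stat_mode` and `describe` are hypothetical names introduced here, and the tie-breaking rule for the mode (first value seen wins) is an assumption. Note that base R's mode() reports the storage mode of an object, not the statistical mode, so a small helper is needed.

```r
# Columns of unequal length as a named list (values from the example data).
cols <- list(
  Dat1 = c(1, 7, 3, 2, 9),
  Dat2 = c(5, 7, 3),
  Dat3 = c(4, 9, 5, 5)
)

# Hypothetical helper: most frequent value; on ties the value
# that appears first in the vector is returned.
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical helper: the seven requested statistics for one column.
describe <- function(x) {
  c(mean = mean(x), median = median(x), mode = stat_mode(x),
    sd = sd(x), min = min(x), max = max(x), count = length(x))
}

# Every describe() call returns seven values, so sapply() simplifies
# the result to a matrix; t() puts one data column on each row.
stats <- t(sapply(cols, describe))
print(round(stats, 2))
```

For a column in which every value is distinct, such as Dat1, the mode reported by this helper is simply the first value, which may or may not be what is wanted.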
Many basic summary stats in R will not work (i.e. they usually return NA) if there are NAs in the data unless you explicitly authorize them to do so. With your data set df:

with(df, mean(Dat2, na.rm = TRUE))
[1] 5

This, by the way, is functionally the same as

mean(df$Dat2, na.rm = TRUE)

It's just easier to type the first one.

In other cases R does not object to the NAs:

summary(df)
      Dat1          Dat2        Dat3
 Min.   :1.0   Min.   :3   Min.   :4.00
 1st Qu.:2.0   1st Qu.:4   1st Qu.:4.75
 Median :3.0   Median :5   Median :5.00
 Mean   :4.4   Mean   :5   Mean   :5.75
 3rd Qu.:7.0   3rd Qu.:6   3rd Qu.:6.00
 Max.   :9.0   Max.   :7   Max.   :9.00
               NA's   :2   NA's   :1

John Kane
Kingston ON Canada

> -----Original Message-----
> From: tom at vims.edu
> Sent: Thu, 14 Apr 2016 21:33:31 +0000
> To: r-help at r-project.org
> Subject: [R] Unequal column lengths
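For the import step itself, here is a sketch of what a CSV with unequal columns looks like to R, assuming Excel leaves the missing cells as empty fields. The inline text stands in for a real file, and the names `csv_text`, `dat` and `n_obs` are assumptions for illustration; with an actual export, a file path would be passed to read.csv() instead of `text =`.

```r
# A CSV in which the shorter columns leave empty fields at the bottom.
csv_text <- "Dat1,Dat2,Dat3
1,5,4
7,7,9
3,3,5
2,,5
9,,"

# read.csv() treats blank fields in numeric columns as missing values.
dat <- read.csv(text = csv_text)
print(dat)

# Count of actual observations per column (NAs excluded).
n_obs <- colSums(!is.na(dat))
print(n_obs)

# summary() copes with the NAs and reports how many each column has.
print(summary(dat))
```

The NA padding therefore does not need to be avoided at import time; the per-column counts recover the original column lengths.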