Hello,

I've tried several times to learn R, but have never gotten past a particular gate. My data are organized by column in Excel, with column headers in the first row. The columns are of unequal lengths. I export them as CSV, then import the CSV file into R. I wish to summarize the data by column. R inserts NA for missing values, then refuses to operate on columns with NA. R is importing my data into a data frame, and I realize that is inappropriate for what I want to do.

How can I import my data so that I can work on columns of unequal length? The first thing I would like to do is generate a table containing mean, median, mode, standard deviation, min, max and count, all per column.

Thank you, Tom

Example data

  Dat1 Dat2 Dat3
1    1    5    4
2    7    7    9
3    3    3    5
4    2   NA    5
5    9   NA   NA
> On Apr 14, 2016, at 2:33 PM, Tom Mosca <tom at vims.edu> wrote:
>
> How can I import my data so that I can work on columns of unequal length? The first thing I would like to do is generate a table containing mean, median, mode, standard deviation, min, max and count, all per column.

Most of the summary statistic functions have an na.rm option that you should set to TRUE.

> Example data
>   Dat1 Dat2 Dat3
> 1    1    5    4
> 2    7    7    9
> 3    3    3    5
> 4    2   NA    5
> 5    9   NA   NA

Looks like you have an R data frame already, so I would try

colMeans(data, na.rm = TRUE)

> [[alternative HTML version deleted]]

And do learn to configure your email client to post to r-help in plain text.

> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
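Building on the na.rm advice above, here is a minimal sketch of most of the per-column table Tom asked for. It assumes the example data sit in an all-numeric data frame called df (read here the same way as in Jim's reply below); summary_table is only an illustrative name. Mode is left out because base R has no function for the statistical mode: mode() reports how an object is stored (e.g. "numeric"), not its most frequent value.

```r
# Tom's example data; the unnamed first column becomes the row names
df <- read.table(text = "
 Dat1 Dat2 Dat3
1 1 5 4
2 7 7 9
3 3 3 5
4 2 NA 5
5 9 NA NA", header = TRUE)

# Apply a set of summary functions to every column, dropping NAs per column
summary_table <- sapply(df, function(x) c(
  mean   = mean(x, na.rm = TRUE),
  median = median(x, na.rm = TRUE),
  sd     = sd(x, na.rm = TRUE),
  min    = min(x, na.rm = TRUE),
  max    = max(x, na.rm = TRUE),
  count  = sum(!is.na(x))  # number of non-missing values
))
summary_table
```

Each column of the result is one of Tom's columns and each row is one statistic; t(summary_table) flips it if one row per data column is preferred.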
Hi Tom,

What you want is a list rather than a data frame. So:

# read the example data; the unnamed first column becomes the row names
df<-read.table(text="
 Dat1 Dat2 Dat3
1 1 5 4
2 7 7 9
3 3 3 5
4 2 NA 5
5 9 NA NA",
header=TRUE)
# a data frame is already a list of columns; make that explicit
dflist<-as.list(df)
# keep only the non-missing values of a vector
na.remove<-function(x) return(x[!is.na(x)])
# apply to every column; the lengths differ, so the result is a list
sapply(dflist,na.remove)

Jim
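Once the NAs are stripped out this way, each element of the list holds only that column's observed values, so the count and the mode no longer need any NA handling. A sketch under the same assumptions as above; stat_mode is a hypothetical helper written for this example, not a base R function, and it returns the first of the most frequent values (in order of appearance), which also means it returns the first value when nothing repeats.

```r
df <- read.table(text = "
 Dat1 Dat2 Dat3
1 1 5 4
2 7 7 9
3 3 3 5
4 2 NA 5
5 9 NA NA", header = TRUE)

# Drop NAs column by column; the result is a list of unequal-length vectors
cols <- lapply(df, function(x) x[!is.na(x)])

# Hypothetical helper: most frequent value, first one wins on ties
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

counts <- lengths(cols)            # non-missing values per column
modes  <- sapply(cols, stat_mode)  # one mode per column
counts
modes
```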
Many basic summary statistics in R will not work (they usually return NA) if there
are NAs in the data, unless you explicitly tell them to drop the NAs with na.rm = TRUE.
With your data set df:
with(df, mean(Dat2, na.rm = TRUE))
[1] 5
This, by the way, is functionally the same as
mean(df$Dat2, na.rm = TRUE)
The first one is just easier to type, especially when an expression uses several columns of df.
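A quick way to see the behaviour described above, using the Dat2 values from Tom's example typed in as a plain vector so the snippet runs on its own:

```r
# Dat2 from Tom's example, including its two missing values
dat2 <- c(5, 7, 3, NA, NA)

without_rm <- mean(dat2)                # NA: a single missing value makes the result missing
with_rm    <- mean(dat2, na.rm = TRUE)  # 5: NAs are dropped before averaging
n_missing  <- sum(is.na(dat2))          # 2: how many values were dropped
```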
In other cases R does not object to the NAs; summary(), for example, simply reports how many there are in each column:
summary(df)
      Dat1          Dat2        Dat3
 Min.   :1.0   Min.   :3   Min.   :4.00
 1st Qu.:2.0   1st Qu.:4   1st Qu.:4.75
 Median :3.0   Median :5   Median :5.00
 Mean   :4.4   Mean   :5   Mean   :5.75
 3rd Qu.:7.0   3rd Qu.:6   3rd Qu.:6.00
 Max.   :9.0   Max.   :7   Max.   :9.00
               NA's   :2   NA's   :1
John Kane
Kingston ON Canada
> -----Original Message-----
> From: tom at vims.edu
> Sent: Thu, 14 Apr 2016 21:33:31 +0000
> To: r-help at r-project.org
> Subject: [R] Unequal column lengths