Aniruddha Mukherjee
2012-Feb-29 08:45 UTC
[R] Error occurred during mean calculation of a column of a data frame, which is apparently contents numeric data
Hello R people,

How can I compute the mean of the "Pulse_rate" column of the data frame or matrix built from the following character object called "str_got"? It has 14 entries and each entry has 8 values, separated by commas. Please go through the following R commands to see how I tried to unstring and unlist the values to form a data frame.

> str_got
 [1] "bp,67,2011-12-09T19:59:44.044+05:30,9830576102,68.0,124.0,58.0,66.0" "bp,67,2011-12-09T20:19:31.031+05:30,9830576102,72.0,133.0,93.0,40.0"
 [3] "bp,25155,2011-12-12T13:08:48.048+05:30,9830576102,79.0,143.0,105.0,38.0" "bp,25155,2011-12-12T13:10:44.044+05:30,9830576102,72.0,121.0,72.0,49.0"
 [5] "bp,25155,2011-12-12T14:32:07.007+05:30,9830576102,97.0,146.0,67.0,79.0" "bp,25155,2011-12-12T15:39:33.033+05:30,9830576102,81.0,135.0,84.0,51.0"
 [7] "bp,25155,2011-12-12T19:08:08.008+05:30,9830576102,76.0,148.0,62.0,86.0" "bp,25155,2011-12-13T14:29:15.015+05:30,9830576102,99.0,124.0,60.0,64.0"
 [9] "bp,25155,2012-01-30T13:09:06.006+05:30,9830576102,64.0,120.0,91.0,29.0" "bp,25155,2012-02-06T09:03:35.035+05:30,9830576102,135.0,152.0,100.0,52.0"
[11] "bp,25155,2012-02-06T11:54:50.050+05:30,9830576102,72.0,152.0,123.0,29.0" "bp,25155,2012-02-06T13:39:59.059+05:30,9830576102,100.0,113.0,82.0,31.0"
[13] "bp,25155,2012-02-06T13:48:40.040+05:30,9830576102,99.0,117.0,84.0,33.0" "bp,25155,2012-02-06T13:48:42.042+05:30,9830576102,99.0,117.0,84.0,33.0"

> matr<-matrix(unlist(strsplit(str_got, ",")), nrows, byrow=T)
> matr
      [,1] [,2]    [,3]                            [,4]         [,5]    [,6]    [,7]    [,8]
 [1,] "bp" "67"    "2011-12-09T19:59:44.044+05:30" "9830576102" "68.0"  "124.0" "58.0"  "66.0"
 [2,] "bp" "67"    "2011-12-09T20:19:31.031+05:30" "9830576102" "72.0"  "133.0" "93.0"  "40.0"
 [3,] "bp" "25155" "2011-12-12T13:08:48.048+05:30" "9830576102" "79.0"  "143.0" "105.0" "38.0"
 [4,] "bp" "25155" "2011-12-12T13:10:44.044+05:30" "9830576102" "72.0"  "121.0" "72.0"  "49.0"
 [5,] "bp" "25155" "2011-12-12T14:32:07.007+05:30" "9830576102" "97.0"  "146.0" "67.0"  "79.0"
 [6,] "bp" "25155" "2011-12-12T15:39:33.033+05:30" "9830576102" "81.0"  "135.0" "84.0"  "51.0"
 [7,] "bp" "25155" "2011-12-12T19:08:08.008+05:30" "9830576102" "76.0"  "148.0" "62.0"  "86.0"
 [8,] "bp" "25155" "2011-12-13T14:29:15.015+05:30" "9830576102" "99.0"  "124.0" "60.0"  "64.0"
 [9,] "bp" "25155" "2012-01-30T13:09:06.006+05:30" "9830576102" "64.0"  "120.0" "91.0"  "29.0"
[10,] "bp" "25155" "2012-02-06T09:03:35.035+05:30" "9830576102" "135.0" "152.0" "100.0" "52.0"
[11,] "bp" "25155" "2012-02-06T11:54:50.050+05:30" "9830576102" "72.0"  "152.0" "123.0" "29.0"
[12,] "bp" "25155" "2012-02-06T13:39:59.059+05:30" "9830576102" "100.0" "113.0" "82.0"  "31.0"
[13,] "bp" "25155" "2012-02-06T13:48:40.040+05:30" "9830576102" "99.0"  "117.0" "84.0"  "33.0"
[14,] "bp" "25155" "2012-02-06T13:48:42.042+05:30" "9830576102" "99.0"  "117.0" "84.0"  "33.0"

> colnames(matr)<-c("Type", "S_Id", "Record_time", "P_id", "Pulse_rate","Syst", "Dias", "Pres")

Note that column names must be inserted before computing the desired mean value.

> matr1<-as.data.frame(matr)
> matr1
   Type  S_Id                   Record_time       P_id Pulse_rate  Syst  Dias Pres
1    bp    67 2011-12-09T19:59:44.044+05:30 9830576102       68.0 124.0  58.0 66.0
2    bp    67 2011-12-09T20:19:31.031+05:30 9830576102       72.0 133.0  93.0 40.0
3    bp 25155 2011-12-12T13:08:48.048+05:30 9830576102       79.0 143.0 105.0 38.0
4    bp 25155 2011-12-12T13:10:44.044+05:30 9830576102       72.0 121.0  72.0 49.0
5    bp 25155 2011-12-12T14:32:07.007+05:30 9830576102       97.0 146.0  67.0 79.0
6    bp 25155 2011-12-12T15:39:33.033+05:30 9830576102       81.0 135.0  84.0 51.0
7    bp 25155 2011-12-12T19:08:08.008+05:30 9830576102       76.0 148.0  62.0 86.0
8    bp 25155 2011-12-13T14:29:15.015+05:30 9830576102       99.0 124.0  60.0 64.0
9    bp 25155 2012-01-30T13:09:06.006+05:30 9830576102       64.0 120.0  91.0 29.0
10   bp 25155 2012-02-06T09:03:35.035+05:30 9830576102      135.0 152.0 100.0 52.0
11   bp 25155 2012-02-06T11:54:50.050+05:30 9830576102       72.0 152.0 123.0 29.0
12   bp 25155 2012-02-06T13:39:59.059+05:30 9830576102      100.0 113.0  82.0 31.0
13   bp 25155 2012-02-06T13:48:40.040+05:30 9830576102       99.0 117.0  84.0 33.0
14   bp 25155 2012-02-06T13:48:42.042+05:30 9830576102       99.0 117.0  84.0 33.0

This command did not give the mean; it returned NA with a warning, please see below:

> mean(matr1$Pulse_rate)
[1] NA
Warning message:
In mean.default(matr1$Pulse_rate) :
  argument is not numeric or logical: returning NA

The following commands should help to understand the object classes.

> typeof(str_got)
[1] "character"
> storage.mode(str_got)
[1] "character"
> typeof(matr)
[1] "character"
> typeof(matr1)
[1] "list"
> storage.mode(matr)
[1] "character"
> storage.mode(matr1)
[1] "list"
> typeof(matr1$Pulse_rate)
[1] "integer"

The following seems very weird:

> as.numeric(matr1$Pulse_rate)
 [1]  4  5  7  5  9  8  6 10  3  2  5  1 10 10
> typeof(matr[,5])
[1] "character"

I could get my mean value from the following command, but that is not what I want, as I have to use the column name, i.e. Pulse_rate:

> mean(as.numeric(matr[,5]))
[1] 86.64286
> as.numeric(matr[,5])
 [1]  68  72  79  72  97  81  76  99  64 135  72 100  99  99

Please help me by providing the correct steps and commands.
Berend Hasselman
2012-Feb-29 10:26 UTC
[R] Error occurred during mean calculation of a column of a data frame, which is apparently contents numeric data
On 29-02-2012, at 09:45, Aniruddha Mukherjee wrote:

> Hello R people,
>
> How can I compute the mean of the "Pulse_rate" column of the data frame or
> matrix from the following character object called "str_got". It has 14
> entries and each entry has 8 values, separated by commas. Please go thru
> the following R commands to know how I tried to unstring and unlist the
> values to form a data frame.
>
>> str_got
> [1] "bp,67,2011-12-09T19:59:44.044+05:30,9830576102,68.0,124.0,58.0,66.0"
> "bp,67,2011-12-09T20:19:31.031+05:30,9830576102,72.0,133.0,93.0,40.0"
> .....
>>
> matr<-matrix(unlist(strsplit(str_got, ",")), nrows, byrow=T)

nrows? I assume this was set somewhere in your script and not shown. Is it length(str_got)?

>> matr
> [,1] [,2] [,3]
> [,4] [,5] [,6] [,7] [,8]
> [1,] "bp" "67" "2011-12-09T19:59:44.044+05:30" "9830576102" "68.0"
> ......

> Note column names must be inserted before computing the desired mean
> value.
> matr1<-as.data.frame(matr)

Use

    matr1 <- as.data.frame(matr, stringsAsFactors=FALSE)

If you don't set stringsAsFactors=FALSE, the column will be a factor, and a factor is not equivalent to a numeric vector.

What's wrong with

    matr1$Pulse_rate <- as.numeric(matr1$Pulse_rate)

Then you can calculate the desired mean with

    mean(matr1$Pulse_rate)

or

    mean(matr1[,"Pulse_rate"])

Berend
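
As a minimal sketch of the advice above, the following R code rebuilds the data frame from the 14 records in the question and contrasts the factor route with the character route. Two assumptions are made here and are not from the thread: `nrow = length(str_got)` stands in for the undefined `nrows` (as the reply guesses), and `stringsAsFactors` is passed explicitly in both calls, because its default changed to FALSE in R 4.0.0. The helper names `as_factor`, `codes` and `from_factor` are illustrative only.

```r
str_got <- c(
  "bp,67,2011-12-09T19:59:44.044+05:30,9830576102,68.0,124.0,58.0,66.0",
  "bp,67,2011-12-09T20:19:31.031+05:30,9830576102,72.0,133.0,93.0,40.0",
  "bp,25155,2011-12-12T13:08:48.048+05:30,9830576102,79.0,143.0,105.0,38.0",
  "bp,25155,2011-12-12T13:10:44.044+05:30,9830576102,72.0,121.0,72.0,49.0",
  "bp,25155,2011-12-12T14:32:07.007+05:30,9830576102,97.0,146.0,67.0,79.0",
  "bp,25155,2011-12-12T15:39:33.033+05:30,9830576102,81.0,135.0,84.0,51.0",
  "bp,25155,2011-12-12T19:08:08.008+05:30,9830576102,76.0,148.0,62.0,86.0",
  "bp,25155,2011-12-13T14:29:15.015+05:30,9830576102,99.0,124.0,60.0,64.0",
  "bp,25155,2012-01-30T13:09:06.006+05:30,9830576102,64.0,120.0,91.0,29.0",
  "bp,25155,2012-02-06T09:03:35.035+05:30,9830576102,135.0,152.0,100.0,52.0",
  "bp,25155,2012-02-06T11:54:50.050+05:30,9830576102,72.0,152.0,123.0,29.0",
  "bp,25155,2012-02-06T13:39:59.059+05:30,9830576102,100.0,113.0,82.0,31.0",
  "bp,25155,2012-02-06T13:48:40.040+05:30,9830576102,99.0,117.0,84.0,33.0",
  "bp,25155,2012-02-06T13:48:42.042+05:30,9830576102,99.0,117.0,84.0,33.0"
)

# Assumption: one row per record, so the row count is length(str_got).
matr <- matrix(unlist(strsplit(str_got, ",")),
               nrow = length(str_got), byrow = TRUE)
colnames(matr) <- c("Type", "S_Id", "Record_time", "P_id",
                    "Pulse_rate", "Syst", "Dias", "Pres")

# Factor route (the pre-4.0.0 default): as.numeric() returns the integer
# level codes of the factor, not the values printed in the column.
as_factor <- as.data.frame(matr, stringsAsFactors = TRUE)
codes <- as.numeric(as_factor$Pulse_rate)

# If a factor is already in hand, converting via as.character() first
# recovers the real numbers.
from_factor <- as.numeric(as.character(as_factor$Pulse_rate))

# Character route, as suggested in the reply: keep strings as strings,
# convert the column once, then use the column name freely.
matr1 <- as.data.frame(matr, stringsAsFactors = FALSE)
matr1$Pulse_rate <- as.numeric(matr1$Pulse_rate)
pulse_mean <- mean(matr1$Pulse_rate)

print(codes)       # level codes, matching the "weird" output in the question
print(pulse_mean)  # 86.64286, matching mean(as.numeric(matr[,5]))
```

The level codes come from sorting the distinct strings, so "100.0" sorts first and "99.0" last, which explains why 99.0 showed up as 10 in the question.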