Henrik Parn
2006-Aug-03 13:46 UTC
[R] efficient way to make NAs of empty cells in a factor (or character)
Dear all,

I have some csv files (originating from Excel files) containing empty cells. In my example file I have four variables of different classes, each with some empty cells in the original csv file:

> test <- read.csv2("test.csv", dec=".")
> test
  id id2  x   y
1  a      1  NA
2  b   e NA 2.2
3      f  3 3.3
4  c   g  4 4.4
> class(test$id)
[1] "factor"
> class(test$id2)
[1] "factor"
> class(test$x)
[1] "integer"
> class(test$y)
[1] "numeric"

The help page of read.csv2 says: 'Blank fields are also considered to be missing values in logical, integer, numeric and complex fields.' Thus, empty cells in a factor (or, I assume, a character vector) are not treated as missing values but as a level of their own:

> is.na(test$id)
[1] FALSE FALSE FALSE FALSE
> levels(test$id)
[1] ""  "a" "b" "c"

When I work with my real (larger) dataset I would like to use functions like is.na() and !is.na() on factors. Is there an R alternative to Excel's 'search (for empty cells) and replace (with NA)'?

I have tried a modification of Uwe Ligges' suggestion on missing values, posted 2 Aug:

> is.na(test[test==""]) <- TRUE

...but it did not work on the data set:

Error in "[<-.data.frame"(`*tmp*`, test == "", value = c(NA, NA, NA, NA :
  rhs is the wrong length for indexing by a logical matrix

However, it worked fine when applied to a single vector:

> is.na(test$id[test$id==""]) <- TRUE
> test$id
[1] a    b    <NA> c
Levels: a b c
> is.na(test$id)
[1] FALSE FALSE  TRUE FALSE

Is there a more efficient way to fill the empty cells in all my factors in R, or should I just do it in advance in Excel with 'search and replace'?

Thanks in advance!

--
************************
Henrik Pärn
Department of Biology
NTNU
7491 Trondheim
Norway

+47 735 96282 (office)
+47 909 89 255 (mobile)
+47 735 96100 (fax)
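A possible explanation of the error above, as a minimal sketch. The data frame is a hypothetical reconstruction of test.csv read off the printed output, not the real file. Comparing the whole data frame with "" gives a logical matrix in which the NA cells of x and y compare as NA rather than FALSE. Extraction with that matrix returns those NA positions as well (four values, which matches the four NAs in the error message), whereas the replacement method only counts the TRUE cells (two), which appears to be the length mismatch. Setting the NA entries of the mask to FALSE seems to make the original is.na() idea work; droplevels() (R 2.12.0 and later, so newer than this thread) then removes the unused "" level.

```r
# Hypothetical reconstruction of test.csv, read off the printed data frame
test <- data.frame(
  id  = factor(c("a", "b", "", "c")),
  id2 = factor(c("", "e", "f", "g")),
  x   = c(1L, NA, 3L, 4L),
  y   = c(NA, 2.2, 3.3, 4.4)
)

# Comparing a data frame with "" yields a logical matrix; the NA cells of
# x and y compare as NA rather than FALSE.
blank <- test == ""

n_blank   <- sum(blank, na.rm = TRUE)  # 2: cells that really are ""
n_indexed <- length(test[blank])       # 4: extraction also returns NA positions

# Treat the NA comparisons as "not blank"; extraction and replacement now
# agree on the same 2 cells, so the is.na<- idea goes through.
blank[is.na(blank)] <- FALSE
is.na(test[blank]) <- TRUE

# The factors still carry an unused "" level; drop it (R >= 2.12.0).
test <- droplevels(test)
print(test)
```

The same mask fix should carry over to a larger data frame unchanged, since it never refers to specific columns.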
Dimitris Rizopoulos
2006-Aug-03 14:20 UTC
[R] efficient way to make NAs of empty cells in a factor (or character)
Try using the 'na.strings' argument of read.csv(), e.g.,

test <- read.csv("test.csv", na.strings = "")

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
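A self-contained sketch of the na.strings approach, assuming the file layout inferred from the printed output in the first message; the inline text is a hypothetical stand-in for test.csv. Passing na.strings = "" on its own replaces the default "NA", so c("", "NA") keeps both markers. stringsAsFactors = TRUE is set explicitly because R 4.0.0 and later read strings as character by default; the blanks become NA either way.

```r
# Inline stand-in for test.csv (hypothetical; ";"-separated as read.csv2 expects)
txt <- "id;id2;x;y
a;;1;
b;e;;2.2
;f;3;3.3
c;g;4;4.4"

test <- read.csv2(text = txt, dec = ".",
                  na.strings = c("", "NA"),  # blanks and "NA" both mean missing
                  stringsAsFactors = TRUE)   # factors, as in R < 4.0.0

is.na(test$id)     # FALSE FALSE TRUE FALSE: the blank id in row 3 is now NA
levels(test$id)    # "a" "b" "c": no "" level
```

Fixing the problem at read time means no clean-up pass is needed afterwards, and the factor never acquires an empty level in the first place.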
Petr Pikal
2006-Aug-03 14:40 UTC
[R] efficient way to make NAs of empty cells in a factor (or character)
Hi,

try setting na.strings = "" in the call to read.csv2. Works for me:

> is.na(read.delim("clipboard", na.strings="")$mono)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
> read.delim("clipboard", na.strings="")$mono
[1] hruby hruby jemny jemny nejhrubsi nejhrubsi standard standard <NA>
Levels: hruby jemny nejhrubsi standard

or you can try

test[(test=="")] <- NA

HTH
Petr

Petr Pikal
petr.pikal at precheza.cz
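A sketch of the one-line assignment, run on the same hypothetical reconstruction of the example data. Assigning a single NA sidesteps the length check that tripped up the is.na() version: with a length-one value, cells where the comparison is NA (the numeric NAs) are simply skipped. The factors keep "" as an unused level afterwards, so droplevels() is applied to tidy them; that function needs R 2.12.0 or later, which postdates this thread.

```r
# Hypothetical reconstruction of the example data (as read in R < 4.0.0)
test <- data.frame(
  id  = factor(c("a", "b", "", "c")),
  id2 = factor(c("", "e", "f", "g")),
  x   = c(1L, NA, 3L, 4L),
  y   = c(NA, 2.2, 3.3, 4.4)
)

# Assign a single NA to every cell equal to ""; with a length-one value the
# cells whose comparison is NA (the numeric NAs) are left alone.
test[test == ""] <- NA

levels_before <- levels(test$id)   # "" "a" "b" "c": the empty level survives
test <- droplevels(test)           # drop unused levels (R >= 2.12.0)
levels(test$id)                    # "a" "b" "c"
```

Numeric columns pass through untouched, so the line can be applied to the whole data frame without selecting the factor columns first.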