thr3ads.net - R help - [R] efficient way to make NAs of empty cells in a factor (or character) [Aug 2006]

Henrik Parn

2006-Aug-03 13:46 UTC

[R] efficient way to make NAs of empty cells in a factor (or character)

Dear all,

I have some csv-files (originating from Excel-files) containing empty 
cells. In my example file I have four variables of different classes, 
each with some empty cells in the original csv-file:

 > test <- read.csv2("test.csv", dec=".")

 > test
  id id2  x   y
1  a      1  NA
2  b   e NA 2.2
3      f  3 3.3
4  c   g  4 4.4


 > class(test$id)
[1] "factor"
 > class(test$id2)
[1] "factor"
 > class(test$x)
[1] "integer"
 > class(test$y)
[1] "numeric"

In the help text of read.csv2 you can read 'Blank fields are also 
considered to be missing values in logical, integer, numeric and complex 
fields.'. Thus, empty cells in a factor (or a character, I assume) are not 
considered missing values but form a level of their own:

 > is.na(test$id)
[1] FALSE FALSE FALSE FALSE
 > levels(test$id)
[1] ""  "a" "b" "c"

When I work with my real (larger) dataset I would like to use functions 
like 'is.na' and '!is.na' on factors. Now I wonder if there is an R 
alternative to 'search (for empty cells) and replace (with NA)' in Excel?

I have tried a modification of Uwe Ligges' suggestion on missing values 
posted 2 Aug:
 > is.na(test[test==""]) <- TRUE

...but it did not work on the data set:

Error in "[<-.data.frame"(`*tmp*`, test == "", value = c(NA, NA, NA, NA :
        rhs is the wrong length for indexing by a logical matrix


However it worked fine when applied to a single vector:

 > is.na(test$id[test$id==""]) <- TRUE
 > test$id
[1] a    b    <NA> c  
Levels:  a b c

 > is.na(test$id)
[1] FALSE FALSE  TRUE FALSE
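
The single-vector assignment above can be applied to every column in one pass. The following is only a minimal sketch, not something posted in the thread: the helper name blank_to_na is hypothetical, the data frame is rebuilt by hand to mimic the example (with id2 made a character column purely for illustration), and droplevels() is used to discard the leftover "" level.

```r
# Hand-built stand-in for the example data (an assumption, not the real file)
test <- data.frame(
  id  = factor(c("a", "b", "", "c")),
  id2 = c("", "e", "f", "g"),
  x   = c(1L, NA, 3L, 4L),
  y   = c(NA, 2.2, 3.3, 4.4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn "" into NA in factor and character columns,
# leave every other column type untouched
blank_to_na <- function(col) {
  if (is.factor(col)) {
    col[which(col == "")] <- NA   # which() skips any existing NAs
    droplevels(col)               # remove the now-unused "" level
  } else if (is.character(col)) {
    col[which(col == "")] <- NA
    col
  } else {
    col
  }
}

# test[] keeps the data frame structure while replacing each column
test[] <- lapply(test, blank_to_na)

is.na(test$id)    # FALSE FALSE  TRUE FALSE
levels(test$id)   # "a" "b" "c"
is.na(test$id2)   #  TRUE FALSE FALSE FALSE
```

Working column by column sidesteps the logical-matrix indexing that triggered the error on the whole data frame.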

Is there a more efficient way to fill empty cells in all my factors in R 
or should I just do it in advance in Excel by 'search and replace'?

Thanks in advance!

-- 
************************
Henrik Pärn
Department of Biology
NTNU
7491 Trondheim
Norway

+47 735 96282 (office)
+47 909 89 255 (mobile)
+47 735 96100 (fax)

Dimitris Rizopoulos

2006-Aug-03 14:20 UTC


[R] efficient way to make NAs of empty cells in a factor (or character)

Try using the 'na.strings' argument of read.csv(), e.g.,

test <- read.csv("test.csv", na.strings = "")
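
For a file like the one in the question (semicolon-separated, read with read.csv2 and dec="."), the same argument can be passed to read.csv2. A self-contained sketch, under the assumption that the file looks like the inline csv_text below; the text= input, stringsAsFactors = TRUE and the extra "NA" entry in na.strings are illustrative choices, not part of the original posts:

```r
# Inline stand-in for test.csv (assumed layout: ';' separator, '.' decimal)
csv_text <- "id;id2;x;y
a;;1;
b;e;;2.2
;f;3;3.3
c;g;4;4.4"

# na.strings = "" marks empty fields as NA in every column type;
# "NA" is kept as well so that literal NA strings are still recognised
test <- read.csv2(text = csv_text, dec = ".",
                  na.strings = c("", "NA"),
                  stringsAsFactors = TRUE)

is.na(test$id)    # FALSE FALSE  TRUE FALSE
levels(test$id)   # "a" "b" "c"  (no "" level is created)
is.na(test$id2)   #  TRUE FALSE FALSE FALSE
```

Because the empty strings never become a factor level, no clean-up is needed afterwards.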


I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm




Petr Pikal

2006-Aug-03 14:40 UTC


[R] efficient way to make NAs of empty cells in a factor (or character)

Hi

Try setting

na.strings = ""

when calling read.csv2. It works for me:

> is.na(read.delim("clipboard", na.strings="")$mono)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
> read.delim("clipboard", na.strings="")$mono
[1] hruby     hruby     jemny     jemny     nejhrubsi nejhrubsi standard  standard  <NA>
Levels: hruby jemny nejhrubsi standard

or you can try

test[(test=="")] <- NA
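
A small sketch of this in-place replacement on a hand-built copy of the example data (the data frame below is an assumption standing in for the real file). Note that the assignment leaves the unused "" level in the factor; droplevels() is one way to remove it, if that matters:

```r
# Hand-built stand-in for the data frame from the question
test <- data.frame(
  id  = factor(c("a", "b", "", "c")),
  id2 = factor(c("", "e", "f", "g")),
  x   = c(1L, NA, 3L, 4L),
  y   = c(NA, 2.2, 3.3, 4.4)
)

# Replace every empty string in the data frame with NA
test[test == ""] <- NA

is.na(test$id)    # FALSE FALSE  TRUE FALSE
levels(test$id)   # ""  "a" "b" "c"  (the empty level is still present)

# Optionally drop levels that no longer occur in any factor column
test <- droplevels(test)
levels(test$id)   # "a" "b" "c"
```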

HTH
Petr


Petr Pikal
petr.pikal at precheza.cz
