thr3ads.net - R help - [R] regular expression for na.strings / read.table [Feb 2008]

jessica.gervais at tudor.lu

2008-Feb-12 14:30 UTC

[R] regular expression for na.strings / read.table

Dear all,

I am working with a csv file.
Some data of the file are not valid and they are marked with a star '*'.
For example : *789.

I have attached with this email a example file (test.txt) that looks like
the data I have to work with.


I see two possibilities, neither of which I can get to work in R:

1-first & easiest solution:
Read the data with read.csv in R, and define as na strings all cells
containing a star (*).
Something which would look like this ...
> DATA<-read.csv("test.txt",na.strings=list(length(grep("\\*",DATA,value=T))==0))
> DATA
  X1 X.789 LNM. X78 X56  X89 X56.1 X100
1  2   700  AUW  78  56   89    56  100
2  3   400  TOC  78  56   89    56   10
3  4   389  RMN  78  56   89    56  *89
4  5   400  LNM  78  56 *452    56  100
5  6   200  UTC  78 *40   89    56  100
6  7   100  GAT  78  56    8    56 *100
7  8    79 *LNM  78  56    9    56  100
8  9    89  TCG  78  56  800    56 *100
9 10   78*  LNM  78  56   89    56  100


... but which would actually work (the stars are still there)! Does anyone
know how to do that?

2-Second solution:
- first read the file with DATA<-read.csv("test.txt")
- then replace all fields containing a * with NA by applying the following
function to the object DATA:
DATA_cleaned<-apply(DATA,c(1,2),function(x){if(length(grep("\\*",x,value=TRUE))==1){x<-NA}})
 DATA_cleaned
      X1   X.789 LNM. X78  X56  X89  X56.1 X100
 [1,] NULL NULL  NULL NULL NULL NULL NULL  NULL
 [2,] NULL NULL  NULL NULL NULL NULL NULL  NULL
 [3,] NULL NULL  NULL NULL NULL NULL NULL  NA
 [4,] NULL NULL  NULL NULL NULL NA   NULL  NULL
 [5,] NULL NULL  NULL NULL NA   NULL NULL  NULL
 [6,] NULL NULL  NULL NULL NULL NULL NULL  NA
 [7,] NULL NULL  NA   NULL NULL NULL NULL  NULL
 [8,] NULL NULL  NULL NULL NULL NULL NULL  NA
 [9,] NULL NA    NULL NULL NULL NULL NULL  NULL

The stars have disappeared, but so has everything else!
The problem comes from the fact that if a field does not contain any *, the
command
if(length(grep("\\*",x,value=T))==1) returns NULL instead of FALSE!

If you have any idea, please let me know!

Many thanks,

Jessica
____________________________________

Jessica Gervais
Mail: jessica.gervais at tudor.lu

Resource Centre for Environmental Technologies,
Public Research Centre Henri Tudor,
Technoport Schlassgoart,
66 rue de Luxembourg,
P.O. BOX 144,
L-4002 Esch-sur-Alzette, Luxembourg

(See attached file: test.txt)
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.txt
Url:
https://stat.ethz.ch/pipermail/r-help/attachments/20080212/b67d1cbd/attachment.txt
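
A short sketch of why the second attempt gives NULL almost everywhere: in R, an `if` without an `else` evaluates to NULL when its condition is FALSE, so the anonymous function hands back NULL for every field without a star. Adding an `else` branch that returns the field unchanged is one possible way around it. The toy data frame `toy` and the helper name `star_to_na` are invented for illustration and are not taken from the attached file.

```r
# An `if` with no `else` evaluates to NULL when the condition is FALSE.
res <- if (FALSE) NA
is.null(res)   # TRUE

# Hypothetical helper: NA for a starred field, otherwise the field itself.
star_to_na <- function(x) {
  if (grepl("*", x, fixed = TRUE)) NA else x
}

# Toy data (an assumption, loosely modelled on the example in the post).
toy <- data.frame(a = c("700", "*40", "389"),
                  b = c("AUW", "TOC", "*LNM"),
                  stringsAsFactors = FALSE)

# apply() coerces the data frame to a character matrix before calling
# the function on each cell, so the result is a character matrix too.
cleaned <- apply(toy, c(1, 2), star_to_na)
cleaned
```

Note that the result is a matrix of strings, so numeric columns would still need converting back afterwards.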

milton ruser

2008-Feb-12 15:01 UTC

Using brute force you can do something like:


my.df<-read.table(stdin(),head=T,sep=",")
X1,X.789,LNM.,X78,X56,X89,X56.1,X100
1,2,700,AUW,78,56,89,56,100
2,3,400,TOC,78,56,89,56,10
3,4,389,RMN,78,56,89,56,*89
4,5,400,LNM,78,56,*452,56,100
5,6,200,UTC,78,*40,89,56,100
6,7,100,GAT,78,56,8,56,*100
7,8,79,*LNM,78,56,9,56,100
8,9,89,TCG,78,56,800,56,*100
9,10,78*,LNM,78,56,89,56,100

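# indices of the X56 entries that contain a star
X56.fix.index<-grep("\\*",my.df$X56)

my.df$X56[X56.fix.index]<-NA
# go through as.character() so a factor column is not turned into level codes
my.df$X56<-as.numeric(as.character(my.df$X56))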
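
A small sketch of a pitfall around the final as.numeric() step. When the column is a factor (which read.table() produced by default for text columns before R 4.0.0), as.numeric() returns the internal level codes rather than the numbers printed on screen, so converting through as.character() first is safer. The vector below is a toy stand-in, not the real column.

```r
# Toy factor column (an assumption), as read.table() would produce
# with stringsAsFactors = TRUE.
X56 <- factor(c("56", "*40", "56", "8"))

# Blank out the starred entries, as in the reply above.
X56[grep("\\*", X56)] <- NA

codes  <- as.numeric(X56)                # level codes, not the values
values <- as.numeric(as.character(X56))  # the values themselves

codes
values
```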







> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Henrique Dallazuanna

2008-Feb-12 15:07 UTC

as.data.frame(sapply(DATA, function(x){x[grep(pattern="\\*", x)]<-NA;x}))
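
One thing to keep in mind with the one-liner above: sapply() simplifies its result to a matrix, so all columns come back sharing a single type. A possible variant, sketched here on a toy data frame (the values are assumptions, not the real file), uses lapply() with `DATA[] <-` to keep the data frame and type.convert() to turn cleaned columns back into numbers where that is possible.

```r
# Toy data frame standing in for DATA (an assumption for illustration).
DATA <- data.frame(X1   = 2:4,
                   X100 = c("100", "*89", "10"),
                   LNM. = c("AUW", "TOC", "*LNM"),
                   stringsAsFactors = FALSE)

# DATA[] <- lapply(...) replaces the columns but keeps the data frame.
DATA[] <- lapply(DATA, function(x) {
  x <- as.character(x)             # also safe if a column is a factor
  x[grepl("\\*", x)] <- NA         # any field containing a star becomes NA
  type.convert(x, as.is = TRUE)    # re-guess the type; text stays character
})

sapply(DATA, class)
```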


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

jim holtman

2008-Feb-12 16:41 UTC

Here is one way of doing it:
> # read the file in as lines, do the convert and then re-read
> x <- readLines(textConnection(" X1 X.789 LNM. X78 X56  X89 X56.1 X100
+ 1  2   700  AUW  78  56   89    56  100
+ 2  3   400  TOC  78  56   89    56   10
+ 3  4   389  RMN  78  56   89    56  *89
+ 4  5   400  LNM  78  56 *452    56  100
+ 5  6   200  UTC  78 *40   89    56  100
+ 6  7   100  GAT  78  56    8    56 *100
+ 7  8    79 *LNM  78  56    9    56  100
+ 8  9    89  TCG  78  56  800    56 *100
+ 9 10   78*  LNM  78  56   89    56  100"))
> x.c <- gsub("\\*[[:alnum:]]*|[[:alnum:]]*\\*", "NA", x)
> x.new <- read.table(textConnection(x.c), header=TRUE)
> closeAllConnections()
>
> x.new
  X1 X.789 LNM. X78 X56 X89 X56.1 X100
1  2   700  AUW  78  56  89    56  100
2  3   400  TOC  78  56  89    56   10
3  4   389  RMN  78  56  89    56   NA
4  5   400  LNM  78  56  NA    56  100
5  6   200  UTC  78  NA  89    56  100
6  7   100  GAT  78  56   8    56   NA
7  8    79 <NA>  78  56   9    56  100
8  9    89  TCG  78  56 800    56   NA
9 10    NA  LNM  78  56  89    56  100
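
A hedged variant of the same read, substitute, re-read idea. The `[[:alnum:]]*` pattern above stops at the first non-alphanumeric character, so a starred field with a decimal point such as `*7.5` would only be partly replaced. The sketch below instead replaces every whitespace-delimited token that contains a star, and passes the cleaned lines to read.table() through its `text` argument. The input lines are toy data (including the `*7.5` case, which is not in the original file), and the approach assumes whitespace-separated fields.

```r
# Toy lines (an assumption), whitespace-delimited like the transcript above.
x <- c("X56 X89 LNM.",
       "56 *7.5 AUW",
       "*40 89 *LNM",
       "78* 8 GAT")

# Replace each whole token containing a star with the string "NA".
x.c <- gsub("[^[:space:]]*\\*[^[:space:]]*", "NA", x)

# read.table() treats "NA" as missing by default (na.strings = "NA").
x.new <- read.table(text = x.c, header = TRUE)
x.new
```

For a comma-separated file the character class would need to exclude commas rather than whitespace, and read.csv() would take the place of read.table().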




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
