Leeds, Mark (IED)
2006-Nov-21 09:22 UTC
[R] Is there any way to know when a field is blank
I have many text files in the format below. In certain rare instances, such as the one below, one of the fields can be empty, so a double comma is written. I won't know this in advance because I am reading in many, many files sequentially.

# TEXT FILE

2004-02-10 00:01:31.00000,,105.60000000
2004-02-10 00:01:32.00001,,105.60000000
2004-02-10 00:01:45.00000,,105.60000000
2004-02-10 00:01:49.00000,,105.61000000
2004-02-10 00:02:08.00000,,105.60000000
2004-02-10 00:02:15.00000,,105.60000000
2004-02-10 00:02:23.00000,,105.60000000
2004-02-10 00:02:41.00000,,105.60000000
2004-02-10 00:03:09.00000,,105.59000000
2004-02-10 00:03:16.00000,,105.60000000
2004-02-10 00:03:19.00000,,105.59000000
2004-02-10 00:03:25.00000,,105.60000000
2004-02-10 00:03:39.00000,,105.59000000
2004-02-10 00:03:52.00000,,105.60000000
2004-02-10 00:03:54.00000,,105.60000000

# LINES OF CODE

fxdata <- read.zoo(file=fxfile, FUN=as.POSIXct, sep=",", col.names=c("date","bid","ask"))
fxdata <- fxdata[(fxdata[,"bid"] > 0.0) & (fxdata[,"ask"] > 0.0),]
aggfxdata <- as.zoo(aggregatebyminutes(zooobj=fxdata, aggtimeframe=aggtimeframe))

#=========================================================================

Even with the double comma being there, the fxdata <- read.zoo line and the fxdata <- fxdata line still work, but then on the aggfxdata <- as.zoo line I get the error:

"Error in rep.int(seq(1:d[i]), prod(d[seq(length = i - 1)]) * rep.int(1, :
  invalid number of copies in rep()"

This error is reasonable because the routine, aggregatebyminutes, probably has a problem with nothing being in the bid field. My question is if there is some way that I can know that nothing is in the bid field so that I can skip this file altogether and go onto the next one? I'm not showing the details of the function because I'm not interested in the error. I am only interested in knowing that the "bid" field does not exist.
I ask only because I am unsure how often this double comma/missing field scenario can happen so it would be better to automate the skipping of the file.

Thanks.
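To make the situation concrete, here is a minimal sketch of what an empty field looks like once it has been parsed. It assumes base R's read.csv() rather than read.zoo(), so that it runs without the zoo package; read.zoo() is built on read.table(), so the same NA behaviour is expected there, but that is an assumption rather than something shown in the post. The names sample_lines, raw, bid_is_blank and ask_is_blank are made up for the illustration.

```r
# Two sample rows in the format from the post: timestamp, empty bid, ask.
sample_lines <- c(
  "2004-02-10 00:01:31.00000,,105.60000000",
  "2004-02-10 00:01:32.00001,,105.60000000"
)

# read.csv() is used here (not read.zoo()) so the sketch needs only base R.
raw <- read.csv(text = sample_lines, header = FALSE,
                col.names = c("date", "bid", "ask"))

# An empty field is read as NA, so a blank column is one where
# every value is NA.
bid_is_blank <- all(is.na(raw$bid))
ask_is_blank <- all(is.na(raw$ask))
```

A check such as all(is.na(...)) on the column could then decide whether the file is worth passing on to the aggregation step.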
Leeds, Mark (IED) wrote:
> [...]
> My question is if there is some way that I can know that nothing
> is in the bid field so that I can skip this file altogether and go onto
> the next one?

You could avoid the as.zoo() part when dim(fxdata)[1] is equal to zero with something like this:

library(zoo)

fxdata <- read.zoo(file="fxfile", FUN=as.POSIXct, sep=",", col.names=c("date","bid","ask"))

fxdata <- fxdata[(fxdata[,"bid"] > 0.0) & (fxdata[,"ask"] > 0.0),]

if (dim(fxdata)[1] == 0) {
  cat("\n All missing! \n")
} else {
  aggfxdata <- as.zoo(aggregatebyminutes(zooobj=fxdata, aggtimeframe=aggtimeframe))
}

I'm not sure how to know without reading the file whether you have this problem, but once you have read it and know that dim(fxdata)[1] == 0, you can remove fxdata with rm().

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
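As a possible complement to the dim() check, the loop over files could look at the raw text before any parsing happens. The following is only a sketch under stated assumptions: has_empty_field() is a hypothetical helper, not part of zoo or base R; the two temporary files stand in for the real data; and the bid value 105.59000000 in the "good" file is invented for the demonstration.

```r
# Hypothetical helper: TRUE if any line of the file contains an empty
# field between two commas (",,"). The file is read as plain text only.
has_empty_field <- function(path) {
  lines <- readLines(path, warn = FALSE)
  any(grepl(",,", lines, fixed = TRUE))
}

# Demonstration with two temporary files: one complete, one with a blank bid.
good <- tempfile(fileext = ".csv")
bad  <- tempfile(fileext = ".csv")
writeLines("2004-02-10 00:01:31.00000,105.59000000,105.60000000", good)
writeLines("2004-02-10 00:01:31.00000,,105.60000000", bad)

fxfiles <- c(good, bad)
for (fxfile in fxfiles) {
  if (has_empty_field(fxfile)) {
    cat("Skipping", fxfile, ": empty field found\n")
    next
  }
  # The read.zoo() and aggregatebyminutes() calls from the post would go here.
}

# The same helper can also list the files that were skipped.
skipped <- Filter(has_empty_field, fxfiles)
```

This still reads each file once as text, and it only catches an empty field in the middle of a line; an empty first or last field would need an extra pattern.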