Dear all,

I've come across a problem using strptime. Can anyone explain what's going on? I'm using R version 2.7.0 on Windows XP.

Thank you,
Caroline

First I read in a data file using read.table:

alldata = read.table(file, header=F, skip=4, colClasses=c("character","numeric"))
dim(alldata)
[1] 223960      2

# inefficient, safe way of sorting out missing or dodgy data
alldata[,2][alldata[,2] < 0] = NA

# first ten lines of the data
alldata[1:10,]
               V1    V2
1  19800604062759    NA
2  19800604062800 0.271
3  19800604111900 0.286
4  19800604134300 0.362
5  19800604144400 0.465
6  19800604163300 0.510
7  19800604175400 0.518
8  19800604185100 0.526
9  19800611110900    NA
10 19800611110959    NA

Then I convert the first column using strptime:

datetimes = strptime(alldata[,1],format="%Y%m%d%H%M%S")

Then I want to get the minimum and maximum, but some values seem to be missing when they aren't.

length(as.POSIXlt(datetimes)) # also equal to length(datetimes)
[1] 9

Why isn't this 223960? Is it something to do with the class?

This is the really puzzling bit (to me anyway):

a =(1:223960)[is.na(datetimes)]
# which gives
1462 14295 18744 50499 50500 92472 92473 92474 92475 92476 137525 137526 137527 171066 171067 192353
# 16 values

alldata[a,]
                   V1    V2
1462   19810329012000 0.983
14295  19900325014300 0.219
18744  19920329014300 0.246
50499  19960331013000 0.564
50500  19960331015700 0.563
92472  19970330010200 0.173
92473  19970330011400 0.172
92474  19970330012700 0.172
92475  19970330014400 0.172
92476  19970330015500 0.172
137525 19980329011600 0.427
137526 19980329014100 0.427
137527 19980329015600 0.427
171066 19990328010300 0.223
171067 19990328011800 0.223
192353 20000326012800 0.189

datetimes[a]
 [1] "1981-03-29 01:20:00" "1990-03-25 01:43:00" "1992-03-29 01:43:00" "1996-03-31 01:30:00" "1996-03-31 01:57:00"
 [6] "1997-03-30 01:02:00" "1997-03-30 01:14:00" "1997-03-30 01:27:00" "1997-03-30 01:44:00" "1997-03-30 01:55:00"
[11] "1998-03-29 01:16:00" "1998-03-29 01:41:00" "1998-03-29 01:56:00" "1999-03-28 01:03:00" "1999-03-28 01:18:00"
[16] "2000-03-26 01:28:00"

They're all around the end of March! I've looked at the data file and I can't see anything funny in it around these dates.

The first few lines of the data file look like this:

#TZUTC+0|*|SANR08002|*|SNAMENAUL|*|SWATERDELVIN|*|CNR98808|*|
#CNAMEQ|*|CTYPEn-min-ip|*|CMW1440|*|RTIMELVLhigh-resolution|*|
#CUNITm3/s|*|RINVAL-777|*|RNR-1|*|REXCHANGE98913|*|
#RTYPEinstantaneous values|*|
19800604062759 -777.0
19800604062800 0.271
19800604111900 0.286
19800604134300 0.362
19800604144400 0.465
19800604163300 0.510
19800604175400 0.518
19800604185100 0.526
19800611110900 -777.0
19800611110959 -777.0
19800611111000 0.100
19800611211400 0.096
19800612000000 0.096
19800612065000 0.098
19800612133400 0.100

Caroline Keef
JBA Consulting
You probably want POSIXct instead of POSIXlt:

> x <- read.table(textConnection("#TZUTC+0|*|SANR08002|*|SNAMENAUL|*|SWATERDELVIN|*|CNR98808|*|
+ #CNAMEQ|*|CTYPEn-min-ip|*|CMW1440|*|RTIMELVLhigh-resolution|*|
+ #CUNITm3/s|*|RINVAL-777|*|RNR-1|*|REXCHANGE98913|*|
+ #RTYPEinstantaneous values|*|
+ 19800604062759 -777.0
+ 19800604062800 0.271
+ 19800604111900 0.286
+ 19800604134300 0.362
+ 19800604144400 0.465
+ 19800604163300 0.510
+ 19800604175400 0.518
+ 19800604185100 0.526
+ 19800611110900 -777.0
+ 19800611110959 -777.0
+ 19800611111000 0.100
+ 19800611211400 0.096
+ 19800612000000 0.096
+ 19800612065000 0.098
+ 19800612133400 0.100"), colClasses=c('character','numeric'))
> closeAllConnections()
> # you probably want POSIXct not POSIXlt
> datetimes <- as.POSIXct(strptime(x[,1], "%Y%m%d%H%M%S"))
> str(datetimes)
 POSIXct[1:15], format: "1980-06-04 06:27:59" "1980-06-04 06:28:00" "1980-06-04 11:19:00" ...
> length(datetimes)
[1] 15

On Wed, Jul 9, 2008 at 6:09 AM, Caroline Keef <caroline.keef at jbaconsulting.co.uk> wrote:
> I've come across a problem using strptime, can anyone explain what's
> going on? I'm using version 2.7.0 on Windows XP.

--
Jim Holtman
Cincinnati, OH

What is the problem you are trying to solve?
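The original question was about getting the minimum and maximum. A minimal sketch of how that might look once the values are POSIXct follows. The three timestamps are copied from the data listing in the thread; `tz = "UTC"` is an assumption that does not appear in the code above and is based only on the `#TZUTC+0` line in the file header.

```r
# A few timestamps copied from the data listing in the original post.
stamps <- c("19800604062759", "19800604111900", "19800612133400")

# ASSUMPTION: the data are in UTC (suggested by the "#TZUTC+0" header line).
lt <- strptime(stamps, format = "%Y%m%d%H%M%S", tz = "UTC")

# POSIXct stores each date-time as a single number, so the usual
# vector functions (length, min, max, range) work element-wise.
datetimes <- as.POSIXct(lt)

n_times <- length(datetimes)   # one element per timestamp
span    <- range(datetimes)    # earliest and latest date-time

print(n_times)
print(format(span, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
```

With POSIXct, `length()` counts date-times rather than list components, which is the behaviour the original post expected.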
Hi Caroline,

This happens because POSIXlt is a complicated structure: you are dealing with a list, not with the plain vector of date-times you might expect. In this version of R, length() counts the nine components of that list (sec, min, hour, mday, mon, year, wday, yday, isdst), not the number of date-times. Maybe this will help you see it more clearly.

strptime(19800604062759, format="%Y%m%d%H%M%S")
[1] "1980-06-04 06:27:59"

str(strptime(19800604062759, format="%Y%m%d%H%M%S"))
 POSIXlt[1:9], format: "1980-06-04 06:27:59"
## length == 9

strptime(c(19800604062759,19800604062800), format="%Y%m%d%H%M%S")
[1] "1980-06-04 06:27:59" "1980-06-04 06:28:00"

str(strptime(c(19800604062759,19800604062800), format="%Y%m%d%H%M%S"))
 POSIXlt[1:9], format: "1980-06-04 06:27:59" "1980-06-04 06:28:00"
## length == 9

typeof(strptime(c(19800604062759,19800604062800), format="%Y%m%d%H%M%S"))
[1] "list"

length(unlist(strptime(c(19800604062759,19800604062800), format="%Y%m%d%H%M%S")))
[1] 18
## 9 * 2 == 18

Hope this helps you on your way,

Mark
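To make the list structure a little more concrete, here is a small sketch that pulls individual components out of a POSIXlt object and compares it with the POSIXct form. The `tz = "UTC"` argument is an assumption added for reproducibility. Note also that the number of components, and what `length()` reports for a POSIXlt, differ between R versions, so the sketch avoids relying on either.

```r
# Two timestamps from the data listing, passed as character strings.
lt <- strptime(c("19800604062759", "19800604062800"),
               format = "%Y%m%d%H%M%S", tz = "UTC")  # tz is an assumption

# A POSIXlt is a list with one vector per calendar field.
storage    <- typeof(lt)            # "list"
components <- names(unclass(lt))    # includes "sec", "min", "hour", ...

# Each component holds one value per date-time.
hours   <- lt$hour
seconds <- lt$sec

# Converting to POSIXct gives an ordinary numeric vector
# (seconds since 1970-01-01 UTC), one element per date-time.
ct <- as.POSIXct(lt)
print(typeof(ct))
print(length(ct))
```

Working with the components directly (`lt$hour`, `lt$mday`, and so on) is the main reason to keep a POSIXlt; for counting, sorting, or taking minima and maxima, the POSIXct form is usually easier.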
I suggest you read the relevant article in R News 4/1 (the R Help Desk article on date and time classes). The zoo package, and read.zoo in particular, might also help.
is.na(strptime("19810329012000",format="%Y%m%d%H%M%S"))
[1] TRUE

The problem was to do with daylight saving time. All of the affected times fall between 01:00 and 02:00 on the last Sunday in March, which is the hour skipped when the clocks go forward in the UK. I need to specify a time zone, because these times don't exist in my operating system's current time zone.

I still think this is odd behaviour though! When you print the "missing" value it doesn't look missing at all.

Caroline
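A minimal sketch of the fix described above, assuming the data really are in UTC (the `#TZUTC+0` header line suggests so, but that is an assumption). As far as I know, `is.na()` on a POSIXlt works by converting to POSIXct first, which would explain why a value can print normally and still be reported as missing. The `"Europe/London"` zone name is also an assumption, standing in for the poster's unspecified system time zone.

```r
# Two of the timestamps reported as NA in the thread, plus one ordinary one.
stamps <- c("19810329012000", "19970330010200", "19800604062800")
fmt    <- "%Y%m%d%H%M%S"

# UTC has no daylight-saving transitions, so every well-formed
# timestamp corresponds to a real instant.
utc <- as.POSIXct(strptime(stamps, format = fmt, tz = "UTC"))
utc_missing <- is.na(utc)
print(utc_missing)

# For comparison: the same strings interpreted in a zone that skips
# 01:00-02:00 on those dates. How non-existent local times are handled
# is platform-dependent (NA on some systems, a shifted time on others),
# so no particular result is claimed here.
london <- strptime(stamps, format = fmt, tz = "Europe/London")
print(is.na(london))
```

Passing `tz` to `strptime()` keeps the parsing independent of the operating system's settings, so the same script should behave the same way on machines in different time zones.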