Cristina Silva
2017-Jul-10 15:31 UTC
[R] Problems with time formats when importing data using readHTMLTable
Hi,

I am extracting positions data from the marine traffic website. The table has a "Timestamp" column which, in the browser, appears with the format yyyy-mm-dd HH:MM (UTC), e.g. 2017-07-10 14:04 (UTC).

When I import the table, the same date "2017-07-10 14:04 (UTC)" appears as "1499696500149969650021 minutes ago". This is the most recent date and time. Older records, e.g. "2017-07-09 17:02 (UTC)", appear as e.g. "1499619726149961972621 hours, 59 minutes ago".

I don't know how to convert these data to the time formats used in R (POSIXct).

The script is very simple and worked before:

    library(XML)
    x <- readHTMLTable('url')

where 'url' is the link to the website with the specification of the vessel.

I appreciate any help.

Cristina

--
Cristina Silva
Divisão de Modelação e Gestão de Recursos Pesqueiros
Av. Dr. Alfredo Magalhães Ramalho
1495-165 Lisboa
@: csilva at ipma.pt
#: +351 213027096
Jeff Newmiller
2017-Jul-10 16:18 UTC
[R] Problems with time formats when importing data using readHTMLTable
Not reproducible. [1][2][3] If our answers don't seem to apply to your situation, it will likely be because you did not explain your question clearly.

Not plain text. This is a plain text mailing list, and the best-case scenario when you let your email program send HTML is that what you saw is not what we see (worst case is your email is scrambled on our end).

Have you read the documentation for the function you are using? In particular, what about the colClasses argument? If you don't let readHTMLTable guess what the format is (have it read in as character data) then you have a fighting chance to get it right yourself, e.g.

    as.POSIXct( "2017-07-10 14:04 (UTC)", format="%Y-%m-%d %H:%M (UTC)", tz="UTC" )

-----
[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
[2] http://adv-r.had.co.nz/Reproducibility.html
[3] https://cran.r-project.org/web/packages/reprex/index.html
--
Sent from my phone. Please excuse my brevity.
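The explicit-format approach suggested in this reply can be sketched as follows. The vector `stamps` is a hypothetical stand-in for a Timestamp column that was read in as character data; its two values are the browser-format examples quoted in the original question, not actual output of readHTMLTable.

```r
# Hypothetical character column, as it would look if the browser text came through intact
stamps <- c("2017-07-10 14:04 (UTC)", "2017-07-09 17:02 (UTC)")

# The literal " (UTC)" in the format string matches the trailing text in each value
parsed <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M (UTC)", tz = "UTC")

print(parsed)

# Time elapsed between the two records, in minutes
print(difftime(parsed[1], parsed[2], units = "mins"))
```

This only helps when the imported strings really have the yyyy-mm-dd HH:MM (UTC) layout; strings in any other layout would come back as NA.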
Cristina Silva
2017-Jul-11 13:25 UTC
[R] Problems with time formats when importing data using readHTMLTable
Dear Jeff,

I am sorry, I didn't notice that it was not plain text. I hope that it is now in the correct format.

I explain the problem again, now with more details. I am collecting the track positions of our research vessel from www.marinetraffic.com. In the page, the data appear in a table:

Timestamp               Source    Speed (kn)  Latitude (°)  Longitude (°)  Course (°)  Show on Map
2017-07-11 10:57 (UTC)  Terr-AIS  8.6         37.36228      -9.176811      200
2017-07-11 10:54 (UTC)  Terr-AIS  8.4         37.36875      -9.174048      200
2017-07-11 09:50 (UTC)  Terr-AIS  8           37.51499      -9.184502      149
2017-07-11 09:08 (UTC)  Terr-AIS  7.7         37.60513      -9.228263      169
2017-07-11 09:05 (UTC)  Terr-AIS  8           37.61168      -9.229627      167
2017-07-11 09:03 (UTC)  Terr-AIS  6.9         37.61626      -9.230456      132
2017-07-11 08:59 (UTC)  Terr-AIS  3.4         37.61758      -9.237928      101
2017-07-11 08:57 (UTC)  Terr-AIS  3.6         37.61808      -9.240235      116
2017-07-11 08:47 (UTC)  Terr-AIS  3.3         37.62267      -9.249762      112
2017-07-11 08:44 (UTC)  Terr-AIS  3.2         37.6241       -9.25348       115
2017-07-11 08:40 (UTC)  Terr-AIS  3.3         37.62557      -9.257295      112
2017-07-11 08:32 (UTC)  Terr-AIS  3.4         37.62903      -9.266028      111
2017-07-11 08:28 (UTC)  Terr-AIS  3.3         37.63051      -9.269725      114
2017-07-11 08:25 (UTC)  Terr-AIS  3.3         37.63207      -9.273642      120
2017-07-11 08:20 (UTC)  Terr-AIS  3.6         37.63395      -9.278254      118
2017-07-11 08:16 (UTC)  Terr-AIS  3.3         37.63586      -9.282853      116
2017-07-11 08:08 (UTC)  Terr-AIS  3.2         37.6392       -9.291157      118
2017-07-11 07:38 (UTC)  Sat-AIS
2017-07-11 06:58 (UTC)  Terr-AIS  3           37.68459      -9.321658      159
2017-07-11 06:53 (UTC)  Terr-AIS  3.3         37.68881      -9.324172      150

To get this table I had to copy it manually from the webpage, paste it into a text file, delete the information on figures, links, etc., and then import it into R. This works well: I read the csv file and have no problems with date formats. But instead of 20 records, I have to import e.g. 500 records, i.e. 10 pages of 50 records each, or more. I want to import these data with a quicker, more automated procedure.
I have done this before with no problems, but probably the webpage changed some permissions. Here is the script used to get the data directly from the webpage into R, for a sample of 20 records (10 per page):

    library(XML)
    x <- list()
    for (i in 1:2) {
      x[i] <- readHTMLTable(paste('http://www.marinetraffic.com/en/ais/index/positions/all/shipid:318358/mmsi:263601000/shipname:NORUEGA/per_page:10/page:', i, sep=''))
    }
    ais <- do.call('rbind', x)
    ais <- ais[,-7]

and I got the following table:

> ais
    Timestamp                                    Source    Speed (kn)  Latitude (°)  Longitude (°)  Course (°)
1   149977066014997706604 minutes ago            Terr-AIS  8.6         37.36228      -9.176811      200
2   149977048714997704876 minutes ago            Terr-AIS  8.4         37.36875      -9.174048      200
3   149976661414997666141 hour, 11 minutes ago   Terr-AIS  8.0         37.51499      -9.184502      149
4   149976410714997641071 hour, 53 minutes ago   Terr-AIS  7.7         37.60513      -9.228263      169
5   149976392714997639271 hour, 56 minutes ago   Terr-AIS  8.0         37.61168      -9.229627      167
6   149976378014997637801 hour, 58 minutes ago   Terr-AIS  6.9         37.61626      -9.230456      132
7   149976354014997635402 hours, 2 minutes ago   Terr-AIS  3.4         37.61758      -9.237928      101
8   149976342014997634202 hours, 4 minutes ago   Terr-AIS  3.6         37.61808      -9.240235      116
9   149976286114997628612 hours, 14 minutes ago  Terr-AIS  3.3         37.62267      -9.249762      112
10  149976264714997626472 hours, 17 minutes ago  Terr-AIS  3.2         37.6241       -9.25348       115
11  149976243014997624302 hours, 21 minutes ago  Terr-AIS  3.3         37.62557      -9.257295      112
12  149976193714997619372 hours, 29 minutes ago  Terr-AIS  3.4         37.62903      -9.266028      111
13  149976172814997617282 hours, 32 minutes ago  Terr-AIS  3.3         37.63051      -9.269725      114
14  149976150714997615072 hours, 36 minutes ago  Terr-AIS  3.3         37.63207      -9.273642      120
15  149976124714997612472 hours, 40 minutes ago  Terr-AIS  3.6         37.63395      -9.278254      118
16  149976098714997609872 hours, 45 minutes ago  Terr-AIS  3.3         37.63586      -9.282853      116
17  149976051014997605102 hours, 53 minutes ago  Terr-AIS  3.2         37.6392       -9.291157      118
18  149975870414997587043 hours, 23 minutes ago  Sat-AIS   Add to SAT Fleet for undelayed data  <NA>  <NA>  <NA>
19  149975631414997563144 hours, 3 minutes ago   Terr-AIS  3.0         37.68459      -9.321658      159
20  149975598814997559884 hours, 8 minutes ago   Terr-AIS  3.3         37.68881      -9.324172      150

The records correspond exactly to the first table. I have tried to change the colClasses to character, but the results are always the same.

My question is: how can I convert the Timestamp to a POSIXlt/POSIXct format, either in the importing phase or within R?

I hope that the question is clear now.

Cristina

--
Cristina Silva
Divisão de Modelação e Gestão de Recursos Pesqueiros
Avenida de Brasília
1449-006 Lisboa
@: csilva at ipma.pt
#: +351 213027096
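In the imported output posted in this message, each Timestamp value looks like a 10-digit number written twice, followed by the relative time shown on the page. For row 1, reading 1499770660 as seconds since 1970-01-01 UTC gives 2017-07-11 10:57:40, which agrees with the 10:57 shown in the browser. Under that assumption (an inference from the posted output, not something documented by the site), a minimal sketch for recovering POSIXct values is to keep only the first 10 characters of each string. The vector `raw` is copied from rows 1, 3 and 20 of the output; `epoch` and `when` are illustrative names.

```r
# Three Timestamp strings copied from the imported table (rows 1, 3 and 20)
raw <- c("149977066014997706604 minutes ago",
         "149976661414997666141 hour, 11 minutes ago",
         "149975598814997559884 hours, 8 minutes ago")

# Sanity check on the assumption: characters 11-20 repeat characters 1-10
stopifnot(substr(raw, 11, 20) == substr(raw, 1, 10))

# Assumption: the first 10 digits are Unix epoch seconds (UTC)
epoch <- as.numeric(substr(raw, 1, 10))

# Convert epoch seconds to POSIXct in UTC
when <- as.POSIXct(epoch, origin = "1970-01-01", tz = "UTC")

print(format(when, "%Y-%m-%d %H:%M"))
# "2017-07-11 10:57" "2017-07-11 09:50" "2017-07-11 06:53"
```

On the data frame built by the script, the same steps would presumably be applied to as.character(ais$Timestamp), since the column may have been imported as a factor.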