On 2020-07-18 18:09 +0100, Rui Barradas wrote: | ?s 17:59 de 18/07/2020, H escreveu: | | On Fri, Jul 17, 2020 at 6:28 PM H <agents at meddatainc.com> wrote: | | | | | | The problem I am having is that | | | the csv files have header rows | | | with column names that are | | | slightly different from the column | | | names I have assigned in the | | | dataframe and it seems that when I | | | read the csv data into the | | | dataframe, the column names from | | | the csv file replace the column | | | names I chose when creating the | | | dataframe. | | | | | | A secondary issue is that the csv | | | files have a column with a date in | | | mm/dd/yyyy format that I would | | | like to make into a Date type | | | column in my dataframe. Again, I | | | have been unable to find a way - | | | if at all possible - to force a | | | conversion into a Date format when | | | importing into the dataframe. The | | | best I have so far is to import is | | | a character column and then use | | | as.Date() to later force the | | | conversion of the dataframe | | | column. | | | | The documentation for read.csv.sql() | | suggests that colClasses() and/or | | field.types() should work but I may | | well have misunderstood the | | documentation, hence my question in | | this group. | | As for colClasses, those are R class | names. Ok Mister H, I might have hit the nail on the head this time with this badass example for your usecase: # Make a csv with %d/%m/%Y dates in it ... Lines <- "STM05-1 2005/02/28 17:35 Good -35.562 177.158 STM05-1 2005/02/28 19:44 Good -35.487 177.129 STM05-1 2005/02/28 23:01 Unknown -35.399 177.064 STM05-1 2005/03/01 07:28 Unknown -34.978 177.268 STM05-1 2005/03/01 18:06 Poor -34.799 177.027 STM05-1 2005/03/01 18:47 Poor -34.85 177.059 STM05-2 2005/02/28 12:49 Good -35.928 177.328 STM05-2 2005/02/28 21:23 Poor -35.926 177.314 " DF <- read.table(textConnection(Lines), as.is = TRUE, col.names = c("Id", "Date", "Time", "Quality", "Lat", "Long")) DF$Date <- format(as.Date(DF$Date, "%Y/%m/%d"), "%d/%m/%Y") write.csv(DF, file="df.csv", row.names=FALSE) colClasses <- c("character", "Date", "character", "character", "numeric", "numeric") sql <- paste0( "select ", "date(", # [2] "substr(Date, 8, 4) || '-' || ", # [1] "substr(Date, 5, 2) || '-' || ", "substr(Date, 2, 2)), Long, Lat, Quality ", "from ff where Quality like '%oo%' and Long>177.129") ff <- file(description="df.csv", open="r") dat <- sqldf::read.csv.sql( sql=sql, colClasses=colClasses) close(ff) str(dat) as.Date(dat[,1]) dat[,3] Both sqlite and Postgres has a function substr you can call on strings like this.[5] I have a hunch this has always been possible in sql from way back ... The warning from sqldf about unused connections, might suggest file descriptor handling to be a bit crusty ... [3] The thing is, defining the second column as of type Date in colClasses happens to work, but it's still character when you check with str(dat) ... perhaps it has something to do with this info from [4]: as_tibble_row() converts a vector to a tibble with one row. The input must be a bare vector, e.g. vectors of dates are not supported yet. If the input is a list, all elements must have length one. [1] https://stackoverflow.com/questions/15563656/convert-string-to-date-in-sqlite [2] https://www.sqlite.org/lang_datefunc.html [3] https://groups.google.com/forum/#!topic/sqldf/mcQ_K_E--q8 [4] https://tibble.tidyverse.org/reference/as_tibble.html [5] https://www.sqlite.org/lang_corefunc.html#substr, https://www.postgresql.org/docs/9.1/functions-string.html, http://www.h2database.com/html/functions.html#substring -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20200719/f6c63baa/attachment.sig>
On 07/18/2020 11:42 PM, Rasmus Liland wrote:> On 2020-07-18 18:09 +0100, Rui Barradas wrote: > | ?s 17:59 de 18/07/2020, H escreveu: > | | On Fri, Jul 17, 2020 at 6:28 PM H <agents at meddatainc.com> wrote: > | | | > | | | The problem I am having is that > | | | the csv files have header rows > | | | with column names that are > | | | slightly different from the column > | | | names I have assigned in the > | | | dataframe and it seems that when I > | | | read the csv data into the > | | | dataframe, the column names from > | | | the csv file replace the column > | | | names I chose when creating the > | | | dataframe. > | | | > | | | A secondary issue is that the csv > | | | files have a column with a date in > | | | mm/dd/yyyy format that I would > | | | like to make into a Date type > | | | column in my dataframe. Again, I > | | | have been unable to find a way - > | | | if at all possible - to force a > | | | conversion into a Date format when > | | | importing into the dataframe. The > | | | best I have so far is to import is > | | | a character column and then use > | | | as.Date() to later force the > | | | conversion of the dataframe > | | | column. > | | > | | The documentation for read.csv.sql() > | | suggests that colClasses() and/or > | | field.types() should work but I may > | | well have misunderstood the > | | documentation, hence my question in > | | this group. > | > | As for colClasses, those are R class > | names. > > Ok Mister H, I might have hit the nail > on the head this time with this badass > example for your usecase: > > # Make a csv with %d/%m/%Y dates in it ... > Lines <- "STM05-1 2005/02/28 17:35 Good -35.562 177.158 > STM05-1 2005/02/28 19:44 Good -35.487 177.129 > STM05-1 2005/02/28 23:01 Unknown -35.399 177.064 > STM05-1 2005/03/01 07:28 Unknown -34.978 177.268 > STM05-1 2005/03/01 18:06 Poor -34.799 177.027 > STM05-1 2005/03/01 18:47 Poor -34.85 177.059 > STM05-2 2005/02/28 12:49 Good -35.928 177.328 > STM05-2 2005/02/28 21:23 Poor -35.926 177.314 > " > DF <- read.table(textConnection(Lines), as.is = TRUE, > col.names = c("Id", "Date", "Time", "Quality", "Lat", "Long")) > DF$Date <- format(as.Date(DF$Date, "%Y/%m/%d"), "%d/%m/%Y") > write.csv(DF, file="df.csv", row.names=FALSE) > > colClasses <- > c("character", > "Date", > "character", > "character", > "numeric", > "numeric") > sql <- paste0( > "select ", > "date(", # [2] > "substr(Date, 8, 4) || '-' || ", # [1] > "substr(Date, 5, 2) || '-' || ", > "substr(Date, 2, 2)), Long, Lat, Quality ", > "from ff where Quality like '%oo%' and Long>177.129") > ff <- file(description="df.csv", open="r") > dat <- sqldf::read.csv.sql( > sql=sql, colClasses=colClasses) > close(ff) > > str(dat) > > as.Date(dat[,1]) > dat[,3] > > Both sqlite and Postgres has a function > substr you can call on strings like > this.[5] I have a hunch this has always > been possible in sql from way back ... > > The warning from sqldf about unused > connections, might suggest file > descriptor handling to be a bit crusty > ... [3] > > The thing is, defining the second column > as of type Date in colClasses happens to > work, but it's still character when you > check with str(dat) ... perhaps it has > something to do with this info from [4]: > > as_tibble_row() converts a vector to > a tibble with one row. The input > must be a bare vector, e.g. vectors > of dates are not supported yet. If > the input is a list, all elements > must have length one. > > [1] https://stackoverflow.com/questions/15563656/convert-string-to-date-in-sqlite > [2] https://www.sqlite.org/lang_datefunc.html > [3] https://groups.google.com/forum/#!topic/sqldf/mcQ_K_E--q8 > [4] https://tibble.tidyverse.org/reference/as_tibble.html > [5] https://www.sqlite.org/lang_corefunc.html#substr, > https://www.postgresql.org/docs/9.1/functions-string.html, > http://www.h2database.com/html/functions.html#substring > > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.Thank you for your extensive example. However, I have decided to simply convert column types as necessary and rename columns as desired after importing the data since that seems the simplest solution. [[alternative HTML version deleted]]
On 2020-07-20 17:54 -0400, H wrote:> On 07/18/2020 11:42 PM, Rasmus Liland wrote: > > > > The thing is, defining the second column > > as of type Date in colClasses happens to > > work, but it's still character when you > > check with str(dat) ... perhaps it has > > something to do with this info from [4]: > > > > as_tibble_row() converts a vector to > > a tibble with one row. The input > > must be a bare vector, e.g. vectors > > of dates are not supported yet. If > > the input is a list, all elements > > must have length one. > > > > [4] https://tibble.tidyverse.org/reference/as_tibble.html > > Thank you for your extensive example. > However, I have decided to simply > convert column types as necessary and > rename columns as desired after > importing the data since that seems > the simplest solution.Dear H, Right, I am glad you figured this out. Please just elaborate (if you want to/are able) what solution/idea you were after so I, others, learn something for another time. I imagined you were sitting on tonnes of csv files and were going to handle dates on some very specific, rarely-occurring rows in there ... Best, Rasmus -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20200721/8f6ed11a/attachment.sig>