On 07/18/2020 01:38 PM, William Michels wrote:> Do either of the postings/threads below help? > > https://r.789695.n4.nabble.com/read-csv-sql-to-select-from-a-large-csv-file-td4650565.html#a4651534 > https://r.789695.n4.nabble.com/using-sqldf-s-read-csv-sql-to-read-a-file-with-quot-NA-quot-for-missing-td4642327.html > > Otherwise you can try reading through the FAQ on Github: > > https://github.com/ggrothendieck/sqldf > > HTH, Bill. > > W. Michels, Ph.D. > > > > On Sat, Jul 18, 2020 at 9:59 AM H <agents at meddatainc.com> wrote: >> On 07/18/2020 11:54 AM, Rui Barradas wrote: >>> Hello, >>> >>> I don't believe that what you are asking for is possible but like Bert suggested, you can do it after reading in the data. >>> You could write a convenience function to read the data, then change what you need to change. >>> Then the function would return this final object. >>> >>> Rui Barradas >>> >>> ?s 16:43 de 18/07/2020, H escreveu: >>> >>>> On 07/17/2020 09:49 PM, Bert Gunter wrote: >>>>> Is there some reason that you can't make the changes to the data frame (column names, as.date(), ...) *after* you have read all your data in? >>>>> >>>>> Do all your csv files use the same names and date formats? >>>>> >>>>> >>>>> Bert Gunter >>>>> >>>>> "The trouble with having an open mind is that people keep coming along and sticking things into it." >>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >>>>> >>>>> >>>>> On Fri, Jul 17, 2020 at 6:28 PM H <agents at meddatainc.com <mailto:agents at meddatainc.com>> wrote: >>>>> >>>>> I have created a dataframe with columns that are characters, integers and numeric and with column names assigned by me. I am using read.csv.sql() to read portions of a number of large csv files into this dataframe, each csv file having a header row with columb names. >>>>> >>>>> The problem I am having is that the csv files have header rows with column names that are slightly different from the column names I have assigned in the dataframe and it seems that when I read the csv data into the dataframe, the column names from the csv file replace the column names I chose when creating the dataframe. >>>>> >>>>> I have been unable to figure out if it is possible to assign column names of my choosing in the read.csv.sql() function? I have tried various variations but none seem to work. I tried colClasses = c(....) but that did not work, I tried field.types = c(...) but could not get that to work either. >>>>> >>>>> It seems that the above should be feasible but I am missing something? Does anyone know? >>>>> >>>>> A secondary issue is that the csv files have a column with a date in mm/dd/yyyy format that I would like to make into a Date type column in my dataframe. Again, I have been unable to find a way - if at all possible - to force a conversion into a Date format when importing into the dataframe. The best I have so far is to import is a character column and then use as.Date() to later force the conversion of the dataframe column. >>>>> >>>>> Is it possible to do this when importing using read.csv.sql()? >>>>> >>>>> ______________________________________________ >>>>> R-help at r-project.org <mailto:R-help at r-project.org> mailing list -- To UNSUBSCRIBE and more, see >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>>> >>>> Yes, the files use the same column names and date format (at least as far as I know now.) I agree I could do it as you suggest above but from a purist perspective I would rather do it when importing the data using read.csv.sql(), particularly if column names and/or date format might change, or be different between different files. I am indeed selecting rows from a large number of csv files so this is entirely plausible. >>>> >>>> Has anyone been able to name columns in the read.csv.sql() call and/or force date format conversion in the call itself? The first refers to naming columns differently from what a header in the csv file may have. >>>> >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________________________ >>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >> The documentation for read.csv.sql() suggests that colClasses() and/or field.types() should work but I may well have misunderstood the documentation, hence my question in this group. >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code.I had read the sqldf() documentation but was left with the impression that what I want to do is not easily doable.
You might achieve this using readr: ``` library(readr) lines <- "Id, Date, Time, Quality, Lat, Long STM05-1, 2005/02/28, 17:35, Good, -35.562, 177.158 STM05-1, 2005/02/28, 19:44, Good, -35.487, 177.129 STM05-1, 2005/02/28, 23:01, Unknown, -35.399, 177.064 STM05-1, 2005/03/01, 07:28, Unknown, -34.978, 177.268 STM05-1, 2005/03/01, 18:06, Poor, -34.799, 177.027 STM05-1, 2005/03/01, 18:47, Poor, -34.85, 177.059 STM05-2, 2005/02/28, 12:49, Good, -35.928, 177.328 STM05-2, 2005/02/28, 21:23, Poor, -35.926, 177.314" read_csv(lines) read_csv( lines, skip = 1, # Ignore the header row col_names = c("myId", "myDate", "myTime", "myQuality", "myLat", "myLong"), col_types = cols( myDate = col_date(format = ""), myTime = col_time(format = ""), myLat = col_number(), myLong = col_number(), .default = col_character() ) ) read_csv( lines, col_types = cols_only( Id = col_character(), Date = col_date(format = ""), Time = col_time(format = "") ) ) read_csv( lines, skip = 1, # Ignore the header row col_names = c("myId", "myDate", "myTime", "myQuality", "myLat", "myLong"), col_types = cols_only( myId = col_character(), myDate = col_date(format = ""), myTime = col_time(format = "") ) ) ``` HTH Ulrik On 2020-07-20 02:07, H wrote:> On 07/18/2020 01:38 PM, William Michels wrote: >> Do either of the postings/threads below help? >> >> https://r.789695.n4.nabble.com/read-csv-sql-to-select-from-a-large-csv-file-td4650565.html#a4651534 >> https://r.789695.n4.nabble.com/using-sqldf-s-read-csv-sql-to-read-a-file-with-quot-NA-quot-for-missing-td4642327.html >> >> Otherwise you can try reading through the FAQ on Github: >> >> https://github.com/ggrothendieck/sqldf >> >> HTH, Bill. >> >> W. Michels, Ph.D. >> >> >> >> On Sat, Jul 18, 2020 at 9:59 AM H <agents at meddatainc.com> wrote: >>> On 07/18/2020 11:54 AM, Rui Barradas wrote: >>>> Hello, >>>> >>>> I don't believe that what you are asking for is possible but like >>>> Bert suggested, you can do it after reading in the data. >>>> You could write a convenience function to read the data, then change >>>> what you need to change. >>>> Then the function would return this final object. >>>> >>>> Rui Barradas >>>> >>>> ?s 16:43 de 18/07/2020, H escreveu: >>>> >>>>> On 07/17/2020 09:49 PM, Bert Gunter wrote: >>>>>> Is there some reason that you can't make the changes to the data >>>>>> frame (column names, as.date(), ...) *after* you have read all >>>>>> your data in? >>>>>> >>>>>> Do all your csv files use the same names and date formats? >>>>>> >>>>>> >>>>>> Bert Gunter >>>>>> >>>>>> "The trouble with having an open mind is that people keep coming >>>>>> along and sticking things into it." >>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >>>>>> >>>>>> >>>>>> On Fri, Jul 17, 2020 at 6:28 PM H <agents at meddatainc.com >>>>>> <mailto:agents at meddatainc.com>> wrote: >>>>>> >>>>>> I have created a dataframe with columns that are characters, >>>>>> integers and numeric and with column names assigned by me. I am >>>>>> using read.csv.sql() to read portions of a number of large csv >>>>>> files into this dataframe, each csv file having a header row with >>>>>> columb names. >>>>>> >>>>>> The problem I am having is that the csv files have header >>>>>> rows with column names that are slightly different from the column >>>>>> names I have assigned in the dataframe and it seems that when I >>>>>> read the csv data into the dataframe, the column names from the >>>>>> csv file replace the column names I chose when creating the >>>>>> dataframe. >>>>>> >>>>>> I have been unable to figure out if it is possible to assign >>>>>> column names of my choosing in the read.csv.sql() function? I have >>>>>> tried various variations but none seem to work. I tried colClasses >>>>>> = c(....) but that did not work, I tried field.types = c(...) but >>>>>> could not get that to work either. >>>>>> >>>>>> It seems that the above should be feasible but I am missing >>>>>> something? Does anyone know? >>>>>> >>>>>> A secondary issue is that the csv files have a column with a >>>>>> date in mm/dd/yyyy format that I would like to make into a Date >>>>>> type column in my dataframe. Again, I have been unable to find a >>>>>> way - if at all possible - to force a conversion into a Date >>>>>> format when importing into the dataframe. The best I have so far >>>>>> is to import is a character column and then use as.Date() to later >>>>>> force the conversion of the dataframe column. >>>>>> >>>>>> Is it possible to do this when importing using >>>>>> read.csv.sql()? >>>>>> >>>>>> ______________________________________________ >>>>>> R-help at r-project.org <mailto:R-help at r-project.org> mailing >>>>>> list -- To UNSUBSCRIBE and more, see >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>>> PLEASE do read the posting guide >>>>>> http://www.R-project.org/posting-guide.html >>>>>> and provide commented, minimal, self-contained, reproducible >>>>>> code. >>>>>> >>>>> Yes, the files use the same column names and date format (at least >>>>> as far as I know now.) I agree I could do it as you suggest above >>>>> but from a purist perspective I would rather do it when importing >>>>> the data using read.csv.sql(), particularly if column names and/or >>>>> date format might change, or be different between different files. >>>>> I am indeed selecting rows from a large number of csv files so this >>>>> is entirely plausible. >>>>> >>>>> Has anyone been able to name columns in the read.csv.sql() call >>>>> and/or force date format conversion in the call itself? The first >>>>> refers to naming columns differently from what a header in the csv >>>>> file may have. >>>>> >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> ______________________________________________ >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide >>>>> http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>> The documentation for read.csv.sql() suggests that colClasses() >>> and/or field.types() should work but I may well have misunderstood >>> the documentation, hence my question in this group. >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. > > I had read the sqldf() documentation but was left with the impression > that what I want to do is not easily doable. > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
You might achieve this using readr: ``` library(readr) lines <- "Id, Date, Time, Quality, Lat, Long STM05-1, 2005/02/28, 17:35, Good, -35.562, 177.158 STM05-1, 2005/02/28, 19:44, Good, -35.487, 177.129 STM05-1, 2005/02/28, 23:01, Unknown, -35.399, 177.064 STM05-1, 2005/03/01, 07:28, Unknown, -34.978, 177.268 STM05-1, 2005/03/01, 18:06, Poor, -34.799, 177.027 STM05-1, 2005/03/01, 18:47, Poor, -34.85, 177.059 STM05-2, 2005/02/28, 12:49, Good, -35.928, 177.328 STM05-2, 2005/02/28, 21:23, Poor, -35.926, 177.314" read_csv(lines) read_csv( lines, skip = 1, # Ignore the header row col_names = c("myId", "myDate", "myTime", "myQuality", "myLat", "myLong"), col_types = cols( myDate = col_date(format = ""), myTime = col_time(format = ""), myLat = col_number(), myLong = col_number(), .default = col_character() ) ) read_csv( lines, col_types = cols_only( Id = col_character(), Date = col_date(format = ""), Time = col_time(format = "") ) ) read_csv( lines, skip = 1, # Ignore the header row col_names = c("myId", "myDate", "myTime", "myQuality", "myLat", "myLong"), col_types = cols_only( myId = col_character(), myDate = col_date(format = ""), myTime = col_time(format = "") ) ) ``` HTH Ulrik On 2020-07-20 02:07, H wrote:> On 07/18/2020 01:38 PM, William Michels wrote: >> Do either of the postings/threads below help? >> >> https://r.789695.n4.nabble.com/read-csv-sql-to-select-from-a-large-csv-file-td4650565.html#a4651534 >> https://r.789695.n4.nabble.com/using-sqldf-s-read-csv-sql-to-read-a-file-with-quot-NA-quot-for-missing-td4642327.html >> >> Otherwise you can try reading through the FAQ on Github: >> >> https://github.com/ggrothendieck/sqldf >> >> HTH, Bill. >> >> W. Michels, Ph.D. >> >> >> >> On Sat, Jul 18, 2020 at 9:59 AM H <agents at meddatainc.com> wrote: >>> On 07/18/2020 11:54 AM, Rui Barradas wrote: >>>> Hello, >>>> >>>> I don't believe that what you are asking for is possible but like >>>> Bert suggested, you can do it after reading in the data. >>>> You could write a convenience function to read the data, then change >>>> what you need to change. >>>> Then the function would return this final object. >>>> >>>> Rui Barradas >>>> >>>> ?s 16:43 de 18/07/2020, H escreveu: >>>> >>>>> On 07/17/2020 09:49 PM, Bert Gunter wrote: >>>>>> Is there some reason that you can't make the changes to the data >>>>>> frame (column names, as.date(), ...) *after* you have read all >>>>>> your data in? >>>>>> >>>>>> Do all your csv files use the same names and date formats? >>>>>> >>>>>> >>>>>> Bert Gunter >>>>>> >>>>>> "The trouble with having an open mind is that people keep coming >>>>>> along and sticking things into it." >>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >>>>>> >>>>>> >>>>>> On Fri, Jul 17, 2020 at 6:28 PM H <agents at meddatainc.com >>>>>> <mailto:agents at meddatainc.com>> wrote: >>>>>> >>>>>> I have created a dataframe with columns that are characters, >>>>>> integers and numeric and with column names assigned by me. I am >>>>>> using read.csv.sql() to read portions of a number of large csv >>>>>> files into this dataframe, each csv file having a header row with >>>>>> columb names. >>>>>> >>>>>> The problem I am having is that the csv files have header >>>>>> rows with column names that are slightly different from the column >>>>>> names I have assigned in the dataframe and it seems that when I >>>>>> read the csv data into the dataframe, the column names from the >>>>>> csv file replace the column names I chose when creating the >>>>>> dataframe. >>>>>> >>>>>> I have been unable to figure out if it is possible to assign >>>>>> column names of my choosing in the read.csv.sql() function? I have >>>>>> tried various variations but none seem to work. I tried colClasses >>>>>> = c(....) but that did not work, I tried field.types = c(...) but >>>>>> could not get that to work either. >>>>>> >>>>>> It seems that the above should be feasible but I am missing >>>>>> something? Does anyone know? >>>>>> >>>>>> A secondary issue is that the csv files have a column with a >>>>>> date in mm/dd/yyyy format that I would like to make into a Date >>>>>> type column in my dataframe. Again, I have been unable to find a >>>>>> way - if at all possible - to force a conversion into a Date >>>>>> format when importing into the dataframe. The best I have so far >>>>>> is to import is a character column and then use as.Date() to later >>>>>> force the conversion of the dataframe column. >>>>>> >>>>>> Is it possible to do this when importing using >>>>>> read.csv.sql()? >>>>>> >>>>>> ______________________________________________ >>>>>> R-help at r-project.org <mailto:R-help at r-project.org> mailing >>>>>> list -- To UNSUBSCRIBE and more, see >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>>> PLEASE do read the posting guide >>>>>> http://www.R-project.org/posting-guide.html >>>>>> and provide commented, minimal, self-contained, reproducible >>>>>> code. >>>>>> >>>>> Yes, the files use the same column names and date format (at least >>>>> as far as I know now.) I agree I could do it as you suggest above >>>>> but from a purist perspective I would rather do it when importing >>>>> the data using read.csv.sql(), particularly if column names and/or >>>>> date format might change, or be different between different files. >>>>> I am indeed selecting rows from a large number of csv files so this >>>>> is entirely plausible. >>>>> >>>>> Has anyone been able to name columns in the read.csv.sql() call >>>>> and/or force date format conversion in the call itself? The first >>>>> refers to naming columns differently from what a header in the csv >>>>> file may have. >>>>> >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> ______________________________________________ >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide >>>>> http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>> The documentation for read.csv.sql() suggests that colClasses() >>> and/or field.types() should work but I may well have misunderstood >>> the documentation, hence my question in this group. >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. > > I had read the sqldf() documentation but was left with the impression > that what I want to do is not easily doable. > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Dear Ulrik, On 2020-07-29 17:14 +0200, Ulrik Stervbo via R-help wrote:> library(readr) > read_csv(This thread was about sqldf::read.csv.sql ... What is the purpose of bringing up readr::read_csv? I am unfamilliar with it, so it might be a good one. Best, Rasmus -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20200729/84c29673/attachment.sig>