On 07/18/2020 01:38 PM, William Michels wrote:> Do either of the postings/threads below help? > > https://r.789695.n4.nabble.com/read-csv-sql-to-select-from-a-large-csv-file-td4650565.html#a4651534 > https://r.789695.n4.nabble.com/using-sqldf-s-read-csv-sql-to-read-a-file-with-quot-NA-quot-for-missing-td4642327.html > > Otherwise you can try reading through the FAQ on Github: > > https://github.com/ggrothendieck/sqldf > > HTH, Bill. > > W. Michels, Ph.D. > > > > On Sat, Jul 18, 2020 at 9:59 AM H <agents at meddatainc.com> wrote: >> On 07/18/2020 11:54 AM, Rui Barradas wrote: >>> Hello, >>> >>> I don't believe that what you are asking for is possible but like Bert suggested, you can do it after reading in the data. >>> You could write a convenience function to read the data, then change what you need to change. >>> Then the function would return this final object. >>> >>> Rui Barradas >>> >>> ?s 16:43 de 18/07/2020, H escreveu: >>> >>>> On 07/17/2020 09:49 PM, Bert Gunter wrote: >>>>> Is there some reason that you can't make the changes to the data frame (column names, as.date(), ...) *after* you have read all your data in? >>>>> >>>>> Do all your csv files use the same names and date formats? >>>>> >>>>> >>>>> Bert Gunter >>>>> >>>>> "The trouble with having an open mind is that people keep coming along and sticking things into it." >>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >>>>> >>>>> >>>>> On Fri, Jul 17, 2020 at 6:28 PM H <agents at meddatainc.com <mailto:agents at meddatainc.com>> wrote: >>>>> >>>>> I have created a dataframe with columns that are characters, integers and numeric and with column names assigned by me. I am using read.csv.sql() to read portions of a number of large csv files into this dataframe, each csv file having a header row with columb names. >>>>> >>>>> The problem I am having is that the csv files have header rows with column names that are slightly different from the column names I have assigned in the dataframe and it seems that when I read the csv data into the dataframe, the column names from the csv file replace the column names I chose when creating the dataframe. >>>>> >>>>> I have been unable to figure out if it is possible to assign column names of my choosing in the read.csv.sql() function? I have tried various variations but none seem to work. I tried colClasses = c(....) but that did not work, I tried field.types = c(...) but could not get that to work either. >>>>> >>>>> It seems that the above should be feasible but I am missing something? Does anyone know? >>>>> >>>>> A secondary issue is that the csv files have a column with a date in mm/dd/yyyy format that I would like to make into a Date type column in my dataframe. Again, I have been unable to find a way - if at all possible - to force a conversion into a Date format when importing into the dataframe. The best I have so far is to import is a character column and then use as.Date() to later force the conversion of the dataframe column. >>>>> >>>>> Is it possible to do this when importing using read.csv.sql()? >>>>> >>>>> ______________________________________________ >>>>> R-help at r-project.org <mailto:R-help at r-project.org> mailing list -- To UNSUBSCRIBE and more, see >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>>> >>>> Yes, the files use the same column names and date format (at least as far as I know now.) I agree I could do it as you suggest above but from a purist perspective I would rather do it when importing the data using read.csv.sql(), particularly if column names and/or date format might change, or be different between different files. I am indeed selecting rows from a large number of csv files so this is entirely plausible. >>>> >>>> Has anyone been able to name columns in the read.csv.sql() call and/or force date format conversion in the call itself? The first refers to naming columns differently from what a header in the csv file may have. >>>> >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________________________ >>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >> The documentation for read.csv.sql() suggests that colClasses() and/or field.types() should work but I may well have misunderstood the documentation, hence my question in this group. >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code.I had read the sqldf() documentation but was left with the impression that what I want to do is not easily doable.
You might achieve this using readr:
```
library(readr)
lines <- "Id, Date, Time, Quality, Lat, Long
STM05-1, 2005/02/28, 17:35, Good, -35.562, 177.158
STM05-1, 2005/02/28, 19:44, Good, -35.487, 177.129
STM05-1, 2005/02/28, 23:01, Unknown, -35.399, 177.064
STM05-1, 2005/03/01, 07:28, Unknown, -34.978, 177.268
STM05-1, 2005/03/01, 18:06, Poor, -34.799, 177.027
STM05-1, 2005/03/01, 18:47, Poor, -34.85, 177.059
STM05-2, 2005/02/28, 12:49, Good, -35.928, 177.328
STM05-2, 2005/02/28, 21:23, Poor, -35.926, 177.314"
read_csv(lines)
read_csv(
lines,
skip = 1, # Ignore the header row
col_names = c("myId", "myDate", "myTime",
"myQuality", "myLat",
"myLong"),
col_types = cols(
myDate = col_date(format = ""),
myTime = col_time(format = ""),
myLat = col_number(),
myLong = col_number(),
.default = col_character()
)
)
read_csv(
lines,
col_types = cols_only(
Id = col_character(),
Date = col_date(format = ""),
Time = col_time(format = "")
)
)
read_csv(
lines,
skip = 1, # Ignore the header row
col_names = c("myId", "myDate", "myTime",
"myQuality", "myLat",
"myLong"),
col_types = cols_only(
myId = col_character(),
myDate = col_date(format = ""),
myTime = col_time(format = "")
)
)
```
HTH
Ulrik
On 2020-07-20 02:07, H wrote:> On 07/18/2020 01:38 PM, William Michels wrote:
>> Do either of the postings/threads below help?
>>
>>
https://r.789695.n4.nabble.com/read-csv-sql-to-select-from-a-large-csv-file-td4650565.html#a4651534
>>
https://r.789695.n4.nabble.com/using-sqldf-s-read-csv-sql-to-read-a-file-with-quot-NA-quot-for-missing-td4642327.html
>>
>> Otherwise you can try reading through the FAQ on Github:
>>
>> https://github.com/ggrothendieck/sqldf
>>
>> HTH, Bill.
>>
>> W. Michels, Ph.D.
>>
>>
>>
>> On Sat, Jul 18, 2020 at 9:59 AM H <agents at meddatainc.com>
wrote:
>>> On 07/18/2020 11:54 AM, Rui Barradas wrote:
>>>> Hello,
>>>>
>>>> I don't believe that what you are asking for is possible
but like
>>>> Bert suggested, you can do it after reading in the data.
>>>> You could write a convenience function to read the data, then
change
>>>> what you need to change.
>>>> Then the function would return this final object.
>>>>
>>>> Rui Barradas
>>>>
>>>> ?s 16:43 de 18/07/2020, H escreveu:
>>>>
>>>>> On 07/17/2020 09:49 PM, Bert Gunter wrote:
>>>>>> Is there some reason that you can't make the
changes to the data
>>>>>> frame (column names, as.date(), ...) *after* you have
read all
>>>>>> your data in?
>>>>>>
>>>>>> Do all your csv files use the same names and date
formats?
>>>>>>
>>>>>>
>>>>>> Bert Gunter
>>>>>>
>>>>>> "The trouble with having an open mind is that
people keep coming
>>>>>> along and sticking things into it."
>>>>>> -- Opus (aka Berkeley Breathed in his "Bloom
County" comic strip )
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 17, 2020 at 6:28 PM H <agents at
meddatainc.com
>>>>>> <mailto:agents at meddatainc.com>> wrote:
>>>>>>
>>>>>> I have created a dataframe with columns that are
characters,
>>>>>> integers and numeric and with column names assigned by
me. I am
>>>>>> using read.csv.sql() to read portions of a number of
large csv
>>>>>> files into this dataframe, each csv file having a
header row with
>>>>>> columb names.
>>>>>>
>>>>>> The problem I am having is that the csv files have
header
>>>>>> rows with column names that are slightly different from
the column
>>>>>> names I have assigned in the dataframe and it seems
that when I
>>>>>> read the csv data into the dataframe, the column names
from the
>>>>>> csv file replace the column names I chose when creating
the
>>>>>> dataframe.
>>>>>>
>>>>>> I have been unable to figure out if it is possible
to assign
>>>>>> column names of my choosing in the read.csv.sql()
function? I have
>>>>>> tried various variations but none seem to work. I tried
colClasses
>>>>>> = c(....) but that did not work, I tried field.types =
c(...) but
>>>>>> could not get that to work either.
>>>>>>
>>>>>> It seems that the above should be feasible but I
am missing
>>>>>> something? Does anyone know?
>>>>>>
>>>>>> A secondary issue is that the csv files have a
column with a
>>>>>> date in mm/dd/yyyy format that I would like to make
into a Date
>>>>>> type column in my dataframe. Again, I have been unable
to find a
>>>>>> way - if at all possible - to force a conversion into a
Date
>>>>>> format when importing into the dataframe. The best I
have so far
>>>>>> is to import is a character column and then use
as.Date() to later
>>>>>> force the conversion of the dataframe column.
>>>>>>
>>>>>> Is it possible to do this when importing using
>>>>>> read.csv.sql()?
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org <mailto:R-help at
r-project.org> mailing
>>>>>> list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained,
reproducible
>>>>>> code.
>>>>>>
>>>>> Yes, the files use the same column names and date format
(at least
>>>>> as far as I know now.) I agree I could do it as you suggest
above
>>>>> but from a purist perspective I would rather do it when
importing
>>>>> the data using read.csv.sql(), particularly if column names
and/or
>>>>> date format might change, or be different between different
files.
>>>>> I am indeed selecting rows from a large number of csv files
so this
>>>>> is entirely plausible.
>>>>>
>>>>> Has anyone been able to name columns in the read.csv.sql()
call
>>>>> and/or force date format conversion in the call itself? The
first
>>>>> refers to naming columns differently from what a header in
the csv
>>>>> file may have.
>>>>>
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained,
reproducible code.
>>> The documentation for read.csv.sql() suggests that colClasses()
>>> and/or field.types() should work but I may well have misunderstood
>>> the documentation, hence my question in this group.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
> I had read the sqldf() documentation but was left with the impression
> that what I want to do is not easily doable.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
You might achieve this using readr:
```
library(readr)
lines <- "Id, Date, Time, Quality, Lat, Long
STM05-1, 2005/02/28, 17:35, Good, -35.562, 177.158
STM05-1, 2005/02/28, 19:44, Good, -35.487, 177.129
STM05-1, 2005/02/28, 23:01, Unknown, -35.399, 177.064
STM05-1, 2005/03/01, 07:28, Unknown, -34.978, 177.268
STM05-1, 2005/03/01, 18:06, Poor, -34.799, 177.027
STM05-1, 2005/03/01, 18:47, Poor, -34.85, 177.059
STM05-2, 2005/02/28, 12:49, Good, -35.928, 177.328
STM05-2, 2005/02/28, 21:23, Poor, -35.926, 177.314"
read_csv(lines)
read_csv(
lines,
skip = 1, # Ignore the header row
col_names = c("myId", "myDate", "myTime",
"myQuality", "myLat",
"myLong"),
col_types = cols(
myDate = col_date(format = ""),
myTime = col_time(format = ""),
myLat = col_number(),
myLong = col_number(),
.default = col_character()
)
)
read_csv(
lines,
col_types = cols_only(
Id = col_character(),
Date = col_date(format = ""),
Time = col_time(format = "")
)
)
read_csv(
lines,
skip = 1, # Ignore the header row
col_names = c("myId", "myDate", "myTime",
"myQuality", "myLat",
"myLong"),
col_types = cols_only(
myId = col_character(),
myDate = col_date(format = ""),
myTime = col_time(format = "")
)
)
```
HTH
Ulrik
On 2020-07-20 02:07, H wrote:> On 07/18/2020 01:38 PM, William Michels wrote:
>> Do either of the postings/threads below help?
>>
>>
https://r.789695.n4.nabble.com/read-csv-sql-to-select-from-a-large-csv-file-td4650565.html#a4651534
>>
https://r.789695.n4.nabble.com/using-sqldf-s-read-csv-sql-to-read-a-file-with-quot-NA-quot-for-missing-td4642327.html
>>
>> Otherwise you can try reading through the FAQ on Github:
>>
>> https://github.com/ggrothendieck/sqldf
>>
>> HTH, Bill.
>>
>> W. Michels, Ph.D.
>>
>>
>>
>> On Sat, Jul 18, 2020 at 9:59 AM H <agents at meddatainc.com>
wrote:
>>> On 07/18/2020 11:54 AM, Rui Barradas wrote:
>>>> Hello,
>>>>
>>>> I don't believe that what you are asking for is possible
but like
>>>> Bert suggested, you can do it after reading in the data.
>>>> You could write a convenience function to read the data, then
change
>>>> what you need to change.
>>>> Then the function would return this final object.
>>>>
>>>> Rui Barradas
>>>>
>>>> ?s 16:43 de 18/07/2020, H escreveu:
>>>>
>>>>> On 07/17/2020 09:49 PM, Bert Gunter wrote:
>>>>>> Is there some reason that you can't make the
changes to the data
>>>>>> frame (column names, as.date(), ...) *after* you have
read all
>>>>>> your data in?
>>>>>>
>>>>>> Do all your csv files use the same names and date
formats?
>>>>>>
>>>>>>
>>>>>> Bert Gunter
>>>>>>
>>>>>> "The trouble with having an open mind is that
people keep coming
>>>>>> along and sticking things into it."
>>>>>> -- Opus (aka Berkeley Breathed in his "Bloom
County" comic strip )
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 17, 2020 at 6:28 PM H <agents at
meddatainc.com
>>>>>> <mailto:agents at meddatainc.com>> wrote:
>>>>>>
>>>>>> I have created a dataframe with columns that are
characters,
>>>>>> integers and numeric and with column names assigned by
me. I am
>>>>>> using read.csv.sql() to read portions of a number of
large csv
>>>>>> files into this dataframe, each csv file having a
header row with
>>>>>> columb names.
>>>>>>
>>>>>> The problem I am having is that the csv files have
header
>>>>>> rows with column names that are slightly different from
the column
>>>>>> names I have assigned in the dataframe and it seems
that when I
>>>>>> read the csv data into the dataframe, the column names
from the
>>>>>> csv file replace the column names I chose when creating
the
>>>>>> dataframe.
>>>>>>
>>>>>> I have been unable to figure out if it is possible
to assign
>>>>>> column names of my choosing in the read.csv.sql()
function? I have
>>>>>> tried various variations but none seem to work. I tried
colClasses
>>>>>> = c(....) but that did not work, I tried field.types =
c(...) but
>>>>>> could not get that to work either.
>>>>>>
>>>>>> It seems that the above should be feasible but I
am missing
>>>>>> something? Does anyone know?
>>>>>>
>>>>>> A secondary issue is that the csv files have a
column with a
>>>>>> date in mm/dd/yyyy format that I would like to make
into a Date
>>>>>> type column in my dataframe. Again, I have been unable
to find a
>>>>>> way - if at all possible - to force a conversion into a
Date
>>>>>> format when importing into the dataframe. The best I
have so far
>>>>>> is to import is a character column and then use
as.Date() to later
>>>>>> force the conversion of the dataframe column.
>>>>>>
>>>>>> Is it possible to do this when importing using
>>>>>> read.csv.sql()?
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org <mailto:R-help at
r-project.org> mailing
>>>>>> list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained,
reproducible
>>>>>> code.
>>>>>>
>>>>> Yes, the files use the same column names and date format
(at least
>>>>> as far as I know now.) I agree I could do it as you suggest
above
>>>>> but from a purist perspective I would rather do it when
importing
>>>>> the data using read.csv.sql(), particularly if column names
and/or
>>>>> date format might change, or be different between different
files.
>>>>> I am indeed selecting rows from a large number of csv files
so this
>>>>> is entirely plausible.
>>>>>
>>>>> Has anyone been able to name columns in the read.csv.sql()
call
>>>>> and/or force date format conversion in the call itself? The
first
>>>>> refers to naming columns differently from what a header in
the csv
>>>>> file may have.
>>>>>
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained,
reproducible code.
>>> The documentation for read.csv.sql() suggests that colClasses()
>>> and/or field.types() should work but I may well have misunderstood
>>> the documentation, hence my question in this group.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
> I had read the sqldf() documentation but was left with the impression
> that what I want to do is not easily doable.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Dear Ulrik, On 2020-07-29 17:14 +0200, Ulrik Stervbo via R-help wrote:> library(readr) > read_csv(This thread was about sqldf::read.csv.sql ... What is the purpose of bringing up readr::read_csv? I am unfamilliar with it, so it might be a good one. Best, Rasmus -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20200729/84c29673/attachment.sig>