thr3ads.net - R help - [R] How to import and create time series data frames in an efficient way? [Nov 2019]

If this information is useful, please help other people find it:
Share via:

Bert Gunter

2019-Nov-15 04:44 UTC

[R] How to import and create time series data frames in an efficient way?

Ha! -- A bug! "Corrected" version inline below:
Bert Gunter

On Thu, Nov 14, 2019 at 8:10 PM Bert Gunter <bgunter.4567 at gmail.com>
wrote:
> Brute force approach, possibly inefficient:
>
> 1. You have a vector of file names. Sort them in the appropriate (time)
> order. These names are also the component names of all the data frames in
> your list that you read in, call it yourlist.
>
> 2. Create a vector of all the unique ticker names, perhaps by creating a
> vector of all the names and then unique() -ing it. Call this vector snames
> with n.names in it. It will probably have length several hundred at least I
> assume.
>
> 3. Suppose the  6 columns of data of each data frame that you want are
> named cnames = c("stocknames","Open", "High",
"Low", "Close", "Volume").
>
> 4. You could proceed as you suggested, but it would likely be more
> efficient, since all data that you want are numeric, to create a 3D array
> of NA's via:
>
> yourdat <- array(dim = c(n.dates, n.names, 5), dimnames = list(NULL,
> snames, cnames[-1]))
>
> 5. Then just loop  through your list of files and use indexing to fill in
> the columns x category slices for each date. Stocks that are missing will
> be NA automatically. e.g. (warning: UNTESTED):
>
> For date "d", let df be the data frame from date "d" in
your list, i.e.
>
> df <- yourlist[["d"]][, cnames]
> ## Note The order of the listed stocks in the "stocknames" column
can be
> different from frame to frame of your master list.
>
> Then fill in the flat for the dth date (i.e. dth row) in your array by:
>
> ## corrected line here:
> yourdat[ ,df[ ,"stocknames"], cnames[-1] <- as.matrix(df[
,-1]) ## need
> to omit the column names so it converts to numeric matrix
>## need to get the names of the stocks in the "stocknames" column in
the
order they appear in df.
>
> This should fill in  the values of the 2nd and 3rd dimensions of the array
> for all the stocks on the dth date with the data for each stock in the data
> frame matched to the appropriate column in the array.
>
> The entire loop will give all dates for all stocks and all categories with
> NA's for missing days. (*IF IT WORKS!*)
> You may need to modify this sightly if, for example, your stock names are
> row names rather than a field in your data frame. I leave such adjustments
> to you.
>
> Note again that this is fairly elementary with just arrays and indexing.
> Basic tutorials should tell you about all of this. Also, when plotting,
> you'll have to convert your dates to suitable date-time format.
>
> Cheers,
> Bert
>
>
>
>
> On Thu, Nov 14, 2019 at 4:55 PM Nhan La <lathanhnhan at gmail.com>
wrote:
>
>> Hi Bert,
>>
>> I've attempted to find the answer and actually been able to import
the
>> individual data sets into a list of data frames.
>>
>> But I'm not sure how to go ahead with the next step. I'm not
necessarily
>> asking for a final answer. Perhaps if you (I mean others as well) would
>> like a constructive coaching, you would suggest a few key words to look
at?
>>
>> Sorry for the HTML thing, this is my first post. I'll do better
next
>> times.
>>
>> Thanks,
>> Nathan
>>
>>
>>
>> On Fri, Nov 15, 2019 at 11:34 AM Bert Gunter <bgunter.4567 at
gmail.com>
>> wrote:
>>
>>> So you've made no attempt at all to do this for yourself?!
>>>
>>> That suggests to me that you need to spend time with some R
tutorials.
>>>
>>> Also, please post in plain text on this plain text list. HTML can
get
>>> mangled, as it may have here.
>>>
>>> -- Bert
>>> "The trouble with having an open mind is that people keep
coming along
>>> and sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County"
comic strip )
>>>
>>>
>>> On Thu, Nov 14, 2019 at 4:11 PM Nhan La <lathanhnhan at
gmail.com> wrote:
>>>
>>>> I have many separate data files in csv format for a lot of
daily stock
>>>> prices. Over a few years there are hundreds of those data
files, whose
>>>> names are the dates of data record.
>>>>
>>>> In each file there are variables of ticker (or stock trading
code),
>>>> date,
>>>> open price, high price, low price, close price, and trading
volume. For
>>>> example, inside a data file named 20150128.txt it looks like
this:
>>>>
>>>> FB,20150128,1.075,1.075,0.97,0.97,725221
>>>> AAPL,20150128,2.24,2.24,2.2,2.24,63682
>>>> AMZN,20150128,0.4,0.415,0.4,0.415,194900
>>>> NFLX,20150128,50.19,50.21,50.19,50.19,761845
>>>> GOOGL,20150128,1.62,1.645,1.59,1.63,684835
...................and many
>>>> more..................
>>>>
>>>> In case it's relevant, the number of stocks in these files
are not
>>>> necessarily the same (so there will be missing data). I need to
import
>>>> and
>>>> create 5 separate time series data frames from those files, one
each for
>>>> Open, High, Low, Close and Volume. In each data frame, rows are
indexed
>>>> by
>>>> date, and columns by ticker. For example, the data frame Open
may look
>>>> like
>>>> this:
>>>>
>>>> DATE,FB,AAPL,AMZN,NFLX,GOOGL,...
20150128,1.5,2.2,0.4,5.1,1.6,...
>>>> 20150129,NA,2.3,0.5,5.2,1.7,... ...
>>>>
>>>> What will be an efficient way to do that? I've used the
following codes
>>>> to
>>>> read the files into a list of data frames but don't know
what to do next
>>>> from here.
>>>>
>>>> files = list.files(pattern="*.txt") mydata =
lapply(files,
>>>> read.csv,head=FALSE)
>>>>
>>>> Thanks,
>>>>
>>>> Nathan
>>>>
>>>> Disclaimer: In case it's relevant, this question is also
posted on
>>>> stackoverflow.
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible
code.
>>>>
>>>
	[[alternative HTML version deleted]]

Nhan La

2019-Nov-18 02:33 UTC

head link

[R] How to import and create time series data frames in an efficient way?

Thanks Bert. I also managed to get this work

files = list.files(pattern="*.txt")
df = ldply(files,
read_csv,col_names=c("ticker","date","open","high",
"low", "close", "volume"))
Cheers,
Nathan

On Fri, Nov 15, 2019 at 3:45 PM Bert Gunter <bgunter.4567 at gmail.com>
wrote:
> Ha! -- A bug! "Corrected" version inline below:
> Bert Gunter
>
> On Thu, Nov 14, 2019 at 8:10 PM Bert Gunter <bgunter.4567 at
gmail.com>
> wrote:
>
>> Brute force approach, possibly inefficient:
>>
>> 1. You have a vector of file names. Sort them in the appropriate (time)
>> order. These names are also the component names of all the data frames
in
>> your list that you read in, call it yourlist.
>>
>> 2. Create a vector of all the unique ticker names, perhaps by creating
a
>> vector of all the names and then unique() -ing it. Call this vector
snames
>> with n.names in it. It will probably have length several hundred at
least I
>> assume.
>>
>> 3. Suppose the  6 columns of data of each data frame that you want are
>> named cnames = c("stocknames","Open",
"High", "Low", "Close", "Volume").
>>
>> 4. You could proceed as you suggested, but it would likely be more
>> efficient, since all data that you want are numeric, to create a 3D
array
>> of NA's via:
>>
>> yourdat <- array(dim = c(n.dates, n.names, 5), dimnames = list(NULL,
>> snames, cnames[-1]))
>>
>> 5. Then just loop  through your list of files and use indexing to fill
in
>> the columns x category slices for each date. Stocks that are missing
will
>> be NA automatically. e.g. (warning: UNTESTED):
>>
>> For date "d", let df be the data frame from date
"d" in your list, i.e.
>>
>> df <- yourlist[["d"]][, cnames]
>> ## Note The order of the listed stocks in the "stocknames"
column can be
>> different from frame to frame of your master list.
>>
>> Then fill in the flat for the dth date (i.e. dth row) in your array by:
>>
>> ## corrected line here:
>
>> yourdat[ ,df[ ,"stocknames"], cnames[-1] <- as.matrix(df[
,-1]) ## need
>> to omit the column names so it converts to numeric matrix
>>
> ## need to get the names of the stocks in the "stocknames" column
in the
> order they appear in df.
>
>>
>> This should fill in  the values of the 2nd and 3rd dimensions of the
>> array for all the stocks on the dth date with the data for each stock
in
>> the data frame matched to the appropriate column in the array.
>>
>> The entire loop will give all dates for all stocks and all categories
>> with NA's for missing days. (*IF IT WORKS!*)
>> You may need to modify this sightly if, for example, your stock names
are
>> row names rather than a field in your data frame. I leave such
adjustments
>> to you.
>>
>> Note again that this is fairly elementary with just arrays and
indexing.
>> Basic tutorials should tell you about all of this. Also, when plotting,
>> you'll have to convert your dates to suitable date-time format.
>>
>> Cheers,
>> Bert
>>
>>
>>
>>
>> On Thu, Nov 14, 2019 at 4:55 PM Nhan La <lathanhnhan at
gmail.com> wrote:
>>
>>> Hi Bert,
>>>
>>> I've attempted to find the answer and actually been able to
import the
>>> individual data sets into a list of data frames.
>>>
>>> But I'm not sure how to go ahead with the next step. I'm
not necessarily
>>> asking for a final answer. Perhaps if you (I mean others as well)
would
>>> like a constructive coaching, you would suggest a few key words to
look at?
>>>
>>> Sorry for the HTML thing, this is my first post. I'll do better
next
>>> times.
>>>
>>> Thanks,
>>> Nathan
>>>
>>>
>>>
>>> On Fri, Nov 15, 2019 at 11:34 AM Bert Gunter <bgunter.4567 at
gmail.com>
>>> wrote:
>>>
>>>> So you've made no attempt at all to do this for yourself?!
>>>>
>>>> That suggests to me that you need to spend time with some R
tutorials.
>>>>
>>>> Also, please post in plain text on this plain text list. HTML
can get
>>>> mangled, as it may have here.
>>>>
>>>> -- Bert
>>>> "The trouble with having an open mind is that people keep
coming along
>>>> and sticking things into it."
>>>> -- Opus (aka Berkeley Breathed in his "Bloom County"
comic strip )
>>>>
>>>>
>>>> On Thu, Nov 14, 2019 at 4:11 PM Nhan La <lathanhnhan at
gmail.com> wrote:
>>>>
>>>>> I have many separate data files in csv format for a lot of
daily stock
>>>>> prices. Over a few years there are hundreds of those data
files, whose
>>>>> names are the dates of data record.
>>>>>
>>>>> In each file there are variables of ticker (or stock
trading code),
>>>>> date,
>>>>> open price, high price, low price, close price, and trading
volume. For
>>>>> example, inside a data file named 20150128.txt it looks
like this:
>>>>>
>>>>> FB,20150128,1.075,1.075,0.97,0.97,725221
>>>>> AAPL,20150128,2.24,2.24,2.2,2.24,63682
>>>>> AMZN,20150128,0.4,0.415,0.4,0.415,194900
>>>>> NFLX,20150128,50.19,50.21,50.19,50.19,761845
>>>>> GOOGL,20150128,1.62,1.645,1.59,1.63,684835
...................and many
>>>>> more..................
>>>>>
>>>>> In case it's relevant, the number of stocks in these
files are not
>>>>> necessarily the same (so there will be missing data). I
need to import
>>>>> and
>>>>> create 5 separate time series data frames from those files,
one each
>>>>> for
>>>>> Open, High, Low, Close and Volume. In each data frame, rows
are
>>>>> indexed by
>>>>> date, and columns by ticker. For example, the data frame
Open may look
>>>>> like
>>>>> this:
>>>>>
>>>>> DATE,FB,AAPL,AMZN,NFLX,GOOGL,...
20150128,1.5,2.2,0.4,5.1,1.6,...
>>>>> 20150129,NA,2.3,0.5,5.2,1.7,... ...
>>>>>
>>>>> What will be an efficient way to do that? I've used the
following
>>>>> codes to
>>>>> read the files into a list of data frames but don't
know what to do
>>>>> next
>>>>> from here.
>>>>>
>>>>> files = list.files(pattern="*.txt") mydata =
lapply(files,
>>>>> read.csv,head=FALSE)
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Nathan
>>>>>
>>>>> Disclaimer: In case it's relevant, this question is
also posted on
>>>>> stackoverflow.
>>>>>
>>>>>         [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained,
reproducible code.
>>>>>
>>>>
	[[alternative HTML version deleted]]

Jeff Newmiller

2019-Nov-18 06:50 UTC

head link

[R] How to import and create time series data frames in an efficient way?

The pattern argument is supposed to be a regular expression, not a file globbing
expression, so "." matches anything, and "*" is not supposed
to be the first character of the search string (read ?regex).

Also, I think you forgot to indicate where the ldply function came from.. the
plyr package.. and the read_csv function comes from the readr package which is
part of the tidyverse. The plyr package doesn't play well with the tidyverse
packages, so if you are going there then it is better to use the tidyr unnest
function... something like:

library(dplyr)
library(tidyr)
library(readr)

dta <- (   tibble( fname = list.files( pattern="\\.txt$" )
       %>% mutate( data = lapply( fnames, read_csv, col_names=c(
"ticker", "date", "open", "high",
"low", "close", "volume" ) )
       %>% unnest( cols = data )
       )

On November 18, 2019 3:33:49 AM GMT+01:00, Nhan La <lathanhnhan at
gmail.com> wrote:>Thanks Bert. I also managed to get this work
>
>files = list.files(pattern="*.txt")
>df = ldply(files,
read_csv,col_names=c("ticker","date","open","high",
>"low", "close", "volume"))
>Cheers,
>Nathan
>
>On Fri, Nov 15, 2019 at 3:45 PM Bert Gunter <bgunter.4567 at
gmail.com>
>wrote:
>
>> Ha! -- A bug! "Corrected" version inline below:
>> Bert Gunter
>>
>> On Thu, Nov 14, 2019 at 8:10 PM Bert Gunter <bgunter.4567 at
gmail.com>
>> wrote:
>>
>>> Brute force approach, possibly inefficient:
>>>
>>> 1. You have a vector of file names. Sort them in the appropriate
>(time)
>>> order. These names are also the component names of all the data
>frames in
>>> your list that you read in, call it yourlist.
>>>
>>> 2. Create a vector of all the unique ticker names, perhaps by
>creating a
>>> vector of all the names and then unique() -ing it. Call this vector
>snames
>>> with n.names in it. It will probably have length several hundred at
>least I
>>> assume.
>>>
>>> 3. Suppose the  6 columns of data of each data frame that you want
>are
>>> named cnames = c("stocknames","Open",
"High", "Low", "Close",
>"Volume").
>>>
>>> 4. You could proceed as you suggested, but it would likely be more
>>> efficient, since all data that you want are numeric, to create a 3D
>array
>>> of NA's via:
>>>
>>> yourdat <- array(dim = c(n.dates, n.names, 5), dimnames =
list(NULL,
>>> snames, cnames[-1]))
>>>
>>> 5. Then just loop  through your list of files and use indexing to
>fill in
>>> the columns x category slices for each date. Stocks that are
missing
>will
>>> be NA automatically. e.g. (warning: UNTESTED):
>>>
>>> For date "d", let df be the data frame from date
"d" in your list,
>i.e.
>>>
>>> df <- yourlist[["d"]][, cnames]
>>> ## Note The order of the listed stocks in the
"stocknames" column
>can be
>>> different from frame to frame of your master list.
>>>
>>> Then fill in the flat for the dth date (i.e. dth row) in your array
>by:
>>>
>>> ## corrected line here:
>>
>>> yourdat[ ,df[ ,"stocknames"], cnames[-1] <-
as.matrix(df[ ,-1]) ##
>need
>>> to omit the column names so it converts to numeric matrix
>>>
>> ## need to get the names of the stocks in the "stocknames"
column in
>the
>> order they appear in df.
>>
>>>
>>> This should fill in  the values of the 2nd and 3rd dimensions of
the
>>> array for all the stocks on the dth date with the data for each
>stock in
>>> the data frame matched to the appropriate column in the array.
>>>
>>> The entire loop will give all dates for all stocks and all
>categories
>>> with NA's for missing days. (*IF IT WORKS!*)
>>> You may need to modify this sightly if, for example, your stock
>names are
>>> row names rather than a field in your data frame. I leave such
>adjustments
>>> to you.
>>>
>>> Note again that this is fairly elementary with just arrays and
>indexing.
>>> Basic tutorials should tell you about all of this. Also, when
>plotting,
>>> you'll have to convert your dates to suitable date-time format.
>>>
>>> Cheers,
>>> Bert
>>>
>>>
>>>
>>>
>>> On Thu, Nov 14, 2019 at 4:55 PM Nhan La <lathanhnhan at
gmail.com>
>wrote:
>>>
>>>> Hi Bert,
>>>>
>>>> I've attempted to find the answer and actually been able to
import
>the
>>>> individual data sets into a list of data frames.
>>>>
>>>> But I'm not sure how to go ahead with the next step.
I'm not
>necessarily
>>>> asking for a final answer. Perhaps if you (I mean others as
well)
>would
>>>> like a constructive coaching, you would suggest a few key words
to
>look at?
>>>>
>>>> Sorry for the HTML thing, this is my first post. I'll do
better
>next
>>>> times.
>>>>
>>>> Thanks,
>>>> Nathan
>>>>
>>>>
>>>>
>>>> On Fri, Nov 15, 2019 at 11:34 AM Bert Gunter
><bgunter.4567 at gmail.com>
>>>> wrote:
>>>>
>>>>> So you've made no attempt at all to do this for
yourself?!
>>>>>
>>>>> That suggests to me that you need to spend time with some R
>tutorials.
>>>>>
>>>>> Also, please post in plain text on this plain text list.
HTML can
>get
>>>>> mangled, as it may have here.
>>>>>
>>>>> -- Bert
>>>>> "The trouble with having an open mind is that people
keep coming
>along
>>>>> and sticking things into it."
>>>>> -- Opus (aka Berkeley Breathed in his "Bloom
County" comic strip )
>>>>>
>>>>>
>>>>> On Thu, Nov 14, 2019 at 4:11 PM Nhan La <lathanhnhan at
gmail.com>
>wrote:
>>>>>
>>>>>> I have many separate data files in csv format for a lot
of daily
>stock
>>>>>> prices. Over a few years there are hundreds of those
data files,
>whose
>>>>>> names are the dates of data record.
>>>>>>
>>>>>> In each file there are variables of ticker (or stock
trading
>code),
>>>>>> date,
>>>>>> open price, high price, low price, close price, and
trading
>volume. For
>>>>>> example, inside a data file named 20150128.txt it looks
like
>this:
>>>>>>
>>>>>> FB,20150128,1.075,1.075,0.97,0.97,725221
>>>>>> AAPL,20150128,2.24,2.24,2.2,2.24,63682
>>>>>> AMZN,20150128,0.4,0.415,0.4,0.415,194900
>>>>>> NFLX,20150128,50.19,50.21,50.19,50.19,761845
>>>>>> GOOGL,20150128,1.62,1.645,1.59,1.63,684835
...................and
>many
>>>>>> more..................
>>>>>>
>>>>>> In case it's relevant, the number of stocks in
these files are
>not
>>>>>> necessarily the same (so there will be missing data). I
need to
>import
>>>>>> and
>>>>>> create 5 separate time series data frames from those
files, one
>each
>>>>>> for
>>>>>> Open, High, Low, Close and Volume. In each data frame,
rows are
>>>>>> indexed by
>>>>>> date, and columns by ticker. For example, the data
frame Open may
>look
>>>>>> like
>>>>>> this:
>>>>>>
>>>>>> DATE,FB,AAPL,AMZN,NFLX,GOOGL,...
20150128,1.5,2.2,0.4,5.1,1.6,...
>>>>>> 20150129,NA,2.3,0.5,5.2,1.7,... ...
>>>>>>
>>>>>> What will be an efficient way to do that? I've used
the following
>>>>>> codes to
>>>>>> read the files into a list of data frames but don't
know what to
>do
>>>>>> next
>>>>>> from here.
>>>>>>
>>>>>> files = list.files(pattern="*.txt") mydata =
lapply(files,
>>>>>> read.csv,head=FALSE)
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Nathan
>>>>>>
>>>>>> Disclaimer: In case it's relevant, this question is
also posted
>on
>>>>>> stackoverflow.
>>>>>>
>>>>>>         [[alternative HTML version deleted]]
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE
and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained,
reproducible
>code.
>>>>>>
>>>>>
>
>	[[alternative HTML version deleted]]
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
-- 
Sent from my phone. Please excuse my brevity.

R help - Nov 2019 - How to import and create time series data frames in an efficient way?

[R] How to import and create time series data frames in an efficient way?

[R] How to import and create time series data frames in an efficient way?

[R] How to import and create time series data frames in an efficient way?