Nhan La
2019-Nov-14 23:04 UTC
[R] How to import and create time series data frames in an efficient way?
I have many separate data files in csv format for a lot of daily stock prices. Over a few years there are hundreds of those data files, whose names are the dates of the data record.

In each file there are variables for ticker (or stock trading code), date, open price, high price, low price, close price, and trading volume. For example, the contents of a data file named 20150128.txt look like this:

FB,20150128,1.075,1.075,0.97,0.97,725221
AAPL,20150128,2.24,2.24,2.2,2.24,63682
AMZN,20150128,0.4,0.415,0.4,0.415,194900
NFLX,20150128,50.19,50.21,50.19,50.19,761845
GOOGL,20150128,1.62,1.645,1.59,1.63,684835
...................and many more..................

In case it's relevant, the number of stocks in these files is not necessarily the same (so there will be missing data). I need to import and create 5 separate time series data frames from those files, one each for Open, High, Low, Close and Volume. In each data frame, rows are indexed by date, and columns by ticker. For example, the data frame Open may look like this:

DATE,FB,AAPL,AMZN,NFLX,GOOGL,...
20150128,1.5,2.2,0.4,5.1,1.6,...
20150129,NA,2.3,0.5,5.2,1.7,...
...

What would be an efficient way to do that? I've used the following code to read the files into a list of data frames but don't know what to do next from here.

files = list.files(pattern="*.txt")
mydata = lapply(files, read.csv,head=FALSE)

Thanks,

Nathan

Disclaimer: In case it's relevant, this question is also posted on stackoverflow.

[[alternative HTML version deleted]]
Bert Gunter
2019-Nov-15 00:34 UTC
[R] How to import and create time series data frames in an efficient way?
So you've made no attempt at all to do this for yourself?!

That suggests to me that you need to spend time with some R tutorials.

Also, please post in plain text on this plain text list. HTML can get mangled, as it may have here.

-- Bert
"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
Nhan La
2019-Nov-15 00:57 UTC
[R] How to import and create time series data frames in an efficient way?
Hi Bert,

I've attempted to find the answer and have actually been able to import the individual data sets into a list of data frames. But I'm not sure how to go ahead with the next step.

I'm not necessarily asking for a final answer. Perhaps if you (I mean others as well) would like to offer some constructive coaching, you could suggest a few key words to look at?

Sorry for the HTML thing, this is my first post. I'll do better next time.

Thanks,

Nathan
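
For the "next step" asked about in this thread, one possible direction is to stack the per-day data frames into a single long table and then reshape it by date and ticker (key words: long versus wide format, reshaping, tapply). The following is only a minimal base-R sketch and does not come from any poster in the thread. The column names, the helper names read_day and to_wide, and the second day of data are assumptions made up for illustration; the first two data rows are taken from the original post.

```r
# Column names are an assumption based on the field order described in the post.
cols <- c("ticker", "date", "open", "high", "low", "close", "volume")

# Hypothetical helper: read one daily file without a header row.
read_day <- function(path) {
  read.csv(path, header = FALSE, col.names = cols,
           colClasses = c("character", "character", rep("numeric", 5)))
}

# Write two tiny demo files so the sketch runs on its own.
# The 20150129 row is invented; it omits FB to produce a missing value.
demo_dir <- tempfile("prices")
dir.create(demo_dir)
writeLines(c("FB,20150128,1.075,1.075,0.97,0.97,725221",
             "AAPL,20150128,2.24,2.24,2.2,2.24,63682"),
           file.path(demo_dir, "20150128.txt"))
writeLines("AAPL,20150129,2.3,2.35,2.25,2.3,50000",
           file.path(demo_dir, "20150129.txt"))

# list.files() takes a regular expression, so anchor on the extension.
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Stack all days into one long data frame (one row per ticker per date).
long <- do.call(rbind, lapply(files, read_day))

# Hypothetical helper: build a date-by-ticker table for one field.
# Each (date, ticker) pair is assumed to occur at most once;
# pairs that never occur become NA.
to_wide <- function(long, field) {
  m <- tapply(long[[field]], list(long$date, long$ticker),
              function(x) x[1])
  as.data.frame(m)
}

# One wide data frame per field, collected in a named list.
fields <- c("open", "high", "low", "close", "volume")
wide <- lapply(setNames(fields, fields), function(f) to_wide(long, f))

print(wide$open)
```

Here the dates end up as row names and the tickers as column names, sorted alphabetically. With hundreds of files, rbind over data frames may be slow; data.table (fread, rbindlist, dcast) or tidyr (pivot_wider) are other options worth looking up, though whether they are faster on this data has not been checked here.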