Hi everybody,
I'm working with very messy data. I have tried to clean it up in SAS and
SAS/IML, but there is not enough information on how to handle certain things
in SAS, so I have turned to R. The task itself should be rather
simple, so I was wondering if someone could help me out.
The original .csv has dimensions 7138 x 6338 (as reported by dim()). It holds funds
with the corresponding dates and observations for each date, for around 10 years and
4000+ funds. The funds are laid out side by side, so COL5 has the next fund's name,
and so on.
COL1             COL2       COL3        COL4
HBNNF US Equity  Date       EQY_SH_OUT  PX_VOLUME
                 #NAME?     #N/A N/A    135000
                 7/7/2008   #N/A N/A    105000
                 7/17/2008  #N/A N/A    590000
                 7/22/2008  #N/A N/A    40000
In R this .csv is somehow read as a list (according to typeof()) and not as a data
frame, and a lot of things, like regexpr searches over the whole file, do not work or
behave strangely. I want to stack the fund data and create a long dataset with fund
name, date, eqy_sh_out and px_volume, with the fund name present for each date.
That should look like this:
Fund_name        Date       EQY_SH_OUT  PX_VOLUME
HBNNF US Equity  7/7/2008   #N/A N/A    105000
HBNNF US Equity  7/17/2008  #N/A N/A    590000
HBNNF US Equity  7/22/2008  #N/A N/A    40000
HBNNF US Equity  7/24/2008  #N/A N/A    3000
HBNNF US Equity  7/31/2008  #N/A N/A    1000
HBNNF US Equity  8/20/2008  #N/A N/A    1000
HBNNF US Equity  8/26/2008  #N/A N/A    2000
HBNNF US Equity  8/27/2008  #N/A N/A    2000
HBNNF US Equity  9/2/2008   #N/A N/A    5000
HND CN Equity    1/17/2008  #N/A N/A    28000
HND CN Equity    1/18/2008  #N/A N/A    25000
HND CN Equity    1/21/2008  #N/A N/A    5000
HND CN Equity    1/22/2008  #N/A N/A    101000
HND CN Equity    1/23/2008  #N/A N/A    122000
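To make the target concrete, here is a rough sketch of the stacking step on a tiny made-up file. It rests on assumptions that may not hold for the real data: every fund occupies exactly four adjacent columns, the first data row carries the fund name in the first column of each block, and valid rows can be recognised by a date in month/day/year form. The names csv_text and stack_funds are hypothetical, made up for this illustration.

```r
# Toy stand-in for the real file (made-up layout: two funds, four columns each).
csv_text <- "COL1,COL2,COL3,COL4,COL5,COL6,COL7,COL8
HBNNF US Equity,Date,EQY_SH_OUT,PX_VOLUME,HND CN Equity,Date,EQY_SH_OUT,PX_VOLUME
,#NAME?,#N/A N/A,135000,,1/17/2008,#N/A N/A,28000
,7/7/2008,#N/A N/A,105000,,1/18/2008,#N/A N/A,25000
,7/17/2008,#N/A N/A,590000,,,,
"

# Read every column as character so nothing is turned into a factor.
raw <- read.csv(text = csv_text, header = TRUE, colClasses = "character")

# Hypothetical helper: cut the wide table into blocks of `block` columns
# and stack them, repeating the fund name on every row.
stack_funds <- function(raw, block = 4L) {
  stopifnot(ncol(raw) %% block == 0)        # assumption: equal-width blocks
  starts <- seq(1L, ncol(raw), by = block)
  pieces <- lapply(starts, function(j) {
    body <- raw[-1, j:(j + block - 1L), drop = FALSE]  # drop the name row
    # Keep only rows whose second column parses as a date (drops "#NAME?" and blanks).
    keep <- !is.na(as.Date(body[[2]], format = "%m/%d/%Y"))
    out <- data.frame(
      Fund_name  = rep(raw[1, j], nrow(body)),
      Date       = body[[2]],
      EQY_SH_OUT = body[[3]],
      PX_VOLUME  = body[[4]],
      stringsAsFactors = FALSE
    )
    out[keep, ]
  })
  long <- do.call(rbind, pieces)
  rownames(long) <- NULL
  long
}

long <- stack_funds(raw)
print(long)
```

The values are left as character strings here, so "#N/A N/A" survives as in the table above; converting Date with as.Date() and PX_VOLUME with as.numeric() would be a separate step.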
Is there any way to accomplish this? There should be an easy way, but I have never
worked with lists, and since the file somehow doesn't read as a data frame I get
strange results.
> small_raw[1,1]
[1] HBNNF US Equity
Levels: 0.26 0.46 COL1 HBNNF US Equity
> grep("Equity",as.character(small_raw))
integer(0)
> small_raw[[1]]
[1] HBNNF US Equity
[5]
[9]
[13]
[17]
[21]
[25]
[29]
[33]
[37]
[41]
[45]
[49]
[53]
[57]
[61]
[65]
[69]
[73]
[77]
[81]
[85]
[89]
[93]
[97] 0.46 0.46
[101] 0.46 0.26
[105] 0.26 0.26
[109] 0.26 0.26
[113] 0.26 0.26
[117] 0.26 0.26
[121] 0.26 0.26
[125] 0.26 0.26
[129] 0.26 0.26
[133] 0.26 0.26
[137] 0.26 0.26
[141] 0.26 0.26
[145] 0.26 0.26
[149] 0.26 0.26
[153] 0.26 0.26
[157] 0.26 0.26
[161] 0.26 0.26
[165] 0.26
[169]
[173]
[177]
[181]
[185]
[189]
[193]
[197]
Levels: 0.26 0.46 COL1 HBNNF US Equity
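If I read the documentation correctly, typeof() returns "list" for every data frame, so that result alone may not mean the import went wrong; class() or is.data.frame() seems to be the more telling check. The "Levels:" lines above also suggest the columns came in as factors, and as.character() applied to a whole data frame works column by column rather than cell by cell, which might be why grep() found nothing. A small sketch with made-up data (toy is a hypothetical stand-in, not my real file):

```r
# Made-up two-column example with factor columns, mimicking the old
# stringsAsFactors = TRUE default of read.csv().
toy <- data.frame(
  COL1 = c("HBNNF US Equity", "", ""),
  COL2 = c("Date", "7/7/2008", "7/17/2008"),
  stringsAsFactors = TRUE
)

# A data frame is stored as a list of columns, so typeof() says "list".
storage_type <- typeof(toy)
is_df <- is.data.frame(toy)

# as.character() on the whole data frame yields one string per column,
# built from the factors' integer codes, so the text is never searched.
whole_hits <- grep("Equity", as.character(toy))

# Converting a single column first searches the actual cell values.
column_hits <- grep("Equity", as.character(toy$COL1))

print(storage_type)
print(is_df)
print(whole_hits)
print(column_hits)
```

Reading the file with stringsAsFactors = FALSE (or colClasses = "character") should avoid the factor issue altogether, assuming that is indeed what is happening here.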
I have been on this for a while. Thank you in advance!
Arsenio