I would have thought that a csv file written out by Excel would have looked
like this for your data:
ADS GY Equity, ,ALV GY Equity,
Date , Px Last ,Date , Px Last
07/02/04 ," 41,395 ",07/01/31 ," 130,234"
07/02/05 ," 42,134 ",07/02/01 ," 133,353"
07/02/06 ," 41,875 ",07/02/04 ," 133,824"
, , 07/02/05," 134,734"
Here we have commas separating the fields and this will read in the data:
> x <- read.csv("/tempxx.csv", skip=2, header=FALSE, as.is=TRUE)
> str(x)
'data.frame': 4 obs. of 4 variables:
$ V1: chr "07/02/04 " "07/02/05 " "07/02/06
" " "
$ V2: chr " 41,395 " " 42,134 "
" 41,875
" " "
$ V3: chr "07/01/31 " "07/02/01 " "07/02/04
" " 07/02/05"
$ V4: chr " 130,234" " 133,353" " 133,824"
" 134,734"> x
V1 V2 V3 V4
1 07/02/04 41,395 07/01/31 130,234
2 07/02/05 42,134 07/02/01 133,353
3 07/02/06 41,875 07/02/04 133,824
4 07/02/05
134,734>
You will have to delete the ',' from the numeric fields, The
'dec=","' on
your command would have read '133,353' in as '133.353' which I
don't think
is what you want. You can delete the commas and convert to numeric with the
following:
> x$V4 <- as.numeric(gsub(',', '', x$V4))
> x
V1 V2 V3 V4
1 07/02/04 41,395 07/01/31 130234
2 07/02/05 42,134 07/02/01 133353
3 07/02/06 41,875 07/02/04 133824
4 07/02/05
134734>
So take a look at the options that you have with writing out the Excel data;
you should be able to get either tab or comma delimited. If not, you can
try reading in each line as a character string and then using 'substr'
to
split it apart assuming that you have constant width columns.
On 3/7/07, Hofert Marius <m_hofert@web.de> wrote:>
> Dear r-help users,
>
> I have the following simple problem: Reading data from a file. The
> file is a .txt file exported ("save as...") from Excel (see below
for
> an example). The Excel file consists of two header rows (first row
> consists of ticker symbols of stocks, the second row consists of
> column explanations ("Date","Px Last"), followed by
several rows of
> data. Now forget about the first two rows, I can deal with that (read
> separately, then extract the actual ticker symbols "ADS",
> "ALV", ...). For reading the rest, I tried several things, for
example:
>
data=read.table(infile,quote="",fill=T,dec=",",skip=2,colClasses=rep(c
> ("character","numeric"),ntickers))
> or
> data=matrix(scan(file=infile,what=rep(c
>
("character","numeric"),ntickers),dec=",",skip=2),ncol=2*ntickers,byrow>
T)
> where "infile" specifies the path to the input file and
"ntickers" is
> the number of ticker-columns in the data set, so in the example
> below, ntickers=2.
>
> Both ways of reading the data work perfectly fine if all columns have
> the same length (i.e. the same number of filled rows), so if the data
> is given in a (filled) "rectangular" form. Now, as you can
imagine,
> there are days when one stock is traded but not the other... so,
> there might be columns that do not have the same number of filled
> rows (see below, for the stock with ticker symbol "ADS", only 3
> trading days are shown, so this column is shorter than the data
> column for the stock "ALV"). Now, if I export such a structure to
> a .txt file, then all (by default) blank fields will be replaced by
> "\t", i.e. tabs. Both reading procedures as give above have
problems
> as they either display that the number of rows/columns do not fit
> together or as they read the table, but some cells are shifted to the
> left (for the example below, the entry "07/02/05 134,7"
appears in
> the empty field of the stock "ADS" which is of course not what we
want).
> So the simple question is: How do I read such a structure?
> Can there be a simple solution? The problem is simply that empty
> cells are replace by "\t" which are then ignored for reading. So
how
> do we distinguish between the empty cells that are given between the
> columns and the empty cells that actually "fill" a column to have
the
> same length as other columns. Of course I could manually put in a
> certain character (e.g. a "*") to fill in the gaps, but the data
set
> is simply too large. If it helps, these blank fields only appear in
> the end of each column, not in the middle.
>
> As I work on a Mac (OS X 10.4), it was not possible (at least to me)
> to read the data directly from the Excel file vial the library RODBC
> or read.xls.
>
> Note, that the same problem arises, when I export the Excel file as
> a .csv, then all blank fields are separated by ";" instead of
"\t"
> and the reading procedure can also not decide if the field
> corresponds to an empty separating column or actually to a column
> with given entries, but which is simply not as long as another column
> in the file.
>
> Hope, you can help. I would really appreciate it.
>
> Best regards.
>
> Marius
>
> Excel example (I hope it's displayed correctly, the entry in the last
> row should be aligned with the last column):
>
> ADS GY Equity ALV GY Equity
> Date Px Last Date Px Last
> 07/02/04 41,395 07/01/31 130,234
> 07/02/05 42,134 07/02/01 133,353
> 07/02/06 41,875 07/02/04 133,824
> 07/02/05 134,734
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?
[[alternative HTML version deleted]]