thr3ads.net - R help - [R] "Missing value representation in Excel before [Jan 2006]

If this information is useful, please help other people find it:
Share via:

Leif Kirschenbaum

2006-Jan-10 20:50 UTC

[R] "Missing value representation in Excel before

I reproduce from memory my exhaustive look into this issue.
RODBC uses the Microsoft ODBC DLL's developed by Microsoft.
  These DLL's perform an automatic determination of column type based on the
contents of the first N rows of cells in each column, where N [0,16]. N may be
set in the Windows system registry, and there are a few other things that may be
set in the system registry which control how the DLL parses an Excel
spreadsheet. Unfortunately, the Microsoft DLL's do not always pay attention
to the registry settings and do not always interpret them in the same manner.
  The end result is that no matter what you do with RODBC, and no matter how the
authors of RODBC re-write it, some Excel spreadsheets will always be unreadable
via RODBC given particular insidious combinations of data in some columns of
your spreadsheet. (until such time as Microsoft fixes their DLL bugs, I mean
features) I have some faint recollection that the Microsoft DLL incorrectly
parses a column with non-empty rows due to some formatting issue of those
particular columns, which I was unable to cure by re-formatting the source
worksheet.
  I have had to resort to using the gdata package which runs a Perl script
"xls2csv.pl", which converts an Excel spreadsheet to CSV, for a few
Excel spreadsheets which exhibit the particular anomalies preventing use of
RODBC.

Leif Kirschenbaum
Senior Yield Engineer
Reflectivity, Inc.
(408) 737-8100 x307
leif at reflectivity.com 
> Message: 21
> Date: Mon, 9 Jan 2006 18:06:49 +0100
> From: "Fredrik Lundgren" <fredrik.bg.lundgren at
bredband.net>
> Subject: Re: [R] "Missing value representation in Excel before
> 	extraction to	R with RODBC"
> To: "Prof Brian Ripley" <ripley at stats.ox.ac.uk>,
"Petr Pikal"
> 	<petr.pikal at precheza.cz>
> Cc: R-help <r-help at stat.math.ethz.ch>
> Message-ID: <000801c6153f$14ab6e60$4a9d72d5 at Larissa>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
> 	reply-type=response
> 
> Dear list,
> 
> Well, those columns in Excel that starts with NA (actually 8 
> NA's in my 
> case) is imported as all NA in R but if the columns starts 
> with at least 
> 3 cells with values (i.e not NA) the are imported correctly 
> to R. When 
> as.is=TRUE is used a simular conversion takes place but now 
> as all <NA> 
> and dates are represented as date-and-time.
> Is there any way to get this correct even when the Excel 
> columns start 
> with several NA's?
> 
> Sincerely
> Fredrik
> 
> 
> ----- Original Message ----- 
> From: "Prof Brian Ripley" <ripley at stats.ox.ac.uk>
> To: "Petr Pikal" <petr.pikal at precheza.cz>
> Cc: "Fredrik Lundgren" <fredrik.bg.lundgren at
bredband.net>; "R-help"
> <r-help at stat.math.ethz.ch>
> Sent: Monday, January 09, 2006 9:36 AM
> Subject: Re: [R] "Missing value representation in Excel before 
> extraction to R with RODBC"
> 
> 
> > On Mon, 9 Jan 2006, Petr Pikal wrote:
> >
> >> Hi
> >>
> >> I believe it has something to do with the column identification
> >> decision. When R decides what is in a column it uses only 
> some values
> >> from the beginning of a file.
> >
> > Not R, Excel.  Excel tells ODBC what the column types are.
> >

Fredrik Lundgren

2006-Jan-10 22:01 UTC

head link

[R] "Missing value representation in Excel before - solved in a way- summary"

Thanks to Leif Kirschenbaum, Brian Ripley and Petr Pikal,

who helped with this problem. Unfortunately it appears as if the problem 
with columns beginning with NAs is deeply connected to Microsoft ODBC 
DLL's automatic determination of column type based on the contents of 
the first N rows of each column and further that this behaviour can't be 
reliably tweaked so that RODBC can function with every file.xls. Until 
the Microsoft ODBC DLL is fixed I will have to use read.table from 
'textfile' or 'clipboard' or Gregory Warnes read.xls in package
'gdata'.
All three methods have worked OK (inspite of some minor formatting 
aspects of dateformats) even with this tricky file.

Fredrik
----- Original Message ----- 
From: "Leif Kirschenbaum" <leif at reflectivity.com>
To: <r-help at stat.math.ethz.ch>
Sent: Tuesday, January 10, 2006 9:50 PM
Subject: Re: [R] "Missing value representation in Excel before

>I reproduce from memory my exhaustive look into this issue.
> RODBC uses the Microsoft ODBC DLL's developed by Microsoft.
>  These DLL's perform an automatic determination of column type based 
> on the contents of the first N rows of cells in each column, where N 
> [0,16]. N may be set in the Windows system registry, and there are a 
> few other things that may be set in the system registry which control 
> how the DLL parses an Excel spreadsheet. Unfortunately, the Microsoft 
> DLL's do not always pay attention to the registry settings and do not 
> always interpret them in the same manner.
>  The end result is that no matter what you do with RODBC, and no 
> matter how the authors of RODBC re-write it, some Excel spreadsheets 
> will always be unreadable via RODBC given particular insidious 
> combinations of data in some columns of your spreadsheet. (until such 
> time as Microsoft fixes their DLL bugs, I mean features) I have some 
> faint recollection that the Microsoft DLL incorrectly parses a column 
> with non-empty rows due to some formatting issue of those particular 
> columns, which I was unable to cure by re-formatting the source 
> worksheet.
>  I have had to resort to using the gdata package which runs a Perl 
> script "xls2csv.pl", which converts an Excel spreadsheet to CSV,
for a
> few Excel spreadsheets which exhibit the particular anomalies 
> preventing use of RODBC.
>
> Leif Kirschenbaum
> Senior Yield Engineer
> Reflectivity, Inc.
> (408) 737-8100 x307
> leif at reflectivity.com
>
>> Message: 21
>> Date: Mon, 9 Jan 2006 18:06:49 +0100
>> From: "Fredrik Lundgren" <fredrik.bg.lundgren at
bredband.net>
>> Subject: Re: [R] "Missing value representation in Excel before
>> extraction to R with RODBC"
>> To: "Prof Brian Ripley" <ripley at stats.ox.ac.uk>,
"Petr Pikal"
>> <petr.pikal at precheza.cz>
>> Cc: R-help <r-help at stat.math.ethz.ch>
>> Message-ID: <000801c6153f$14ab6e60$4a9d72d5 at Larissa>
>> Content-Type: text/plain; format=flowed;
charset="iso-8859-1";
>> reply-type=response
>>
>> Dear list,
>>
>> Well, those columns in Excel that starts with NA (actually 8
>> NA's in my
>> case) is imported as all NA in R but if the columns starts
>> with at least
>> 3 cells with values (i.e not NA) the are imported correctly
>> to R. When
>> as.is=TRUE is used a simular conversion takes place but now
>> as all <NA>
>> and dates are represented as date-and-time.
>> Is there any way to get this correct even when the Excel
>> columns start
>> with several NA's?
>>
>> Sincerely
>> Fredrik
>>
>>
>> ----- Original Message ----- 
>> From: "Prof Brian Ripley" <ripley at stats.ox.ac.uk>
>> To: "Petr Pikal" <petr.pikal at precheza.cz>
>> Cc: "Fredrik Lundgren" <fredrik.bg.lundgren at
bredband.net>; "R-help"
>> <r-help at stat.math.ethz.ch>
>> Sent: Monday, January 09, 2006 9:36 AM
>> Subject: Re: [R] "Missing value representation in Excel before
>> extraction to R with RODBC"
>>
>>
>> > On Mon, 9 Jan 2006, Petr Pikal wrote:
>> >
>> >> Hi
>> >>
>> >> I believe it has something to do with the column
identification
>> >> decision. When R decides what is in a column it uses only
>> some values
>> >> from the beginning of a file.
>> >
>> > Not R, Excel.  Excel tells ODBC what the column types are.
>> >
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
>

Maybe Matching Threads

Search for more seemingly similar threads

R help - Jan 2006 - "Missing value representation in Excel before

[R] "Missing value representation in Excel before

[R] "Missing value representation in Excel before - solved in a way- summary"

Maybe Matching Threads