Dear All,
Writing sooner than I thought I'd need to.
I'm using R 2.4 on Mac OS X, with RODBC, PostgreSQL 8.1 and Actual's
ODBC driver. I have all my data in Filemaker 8.5, but it is
automatically exported into PostgreSQL for analysis as Filemaker's ODBC
and JDBC access is awful, slow and has a tendency to crash.
I have disability data where for each patient there is a survival time
in years from disease onset to a particular disease stage, namely
unilateral support, bilateral support, wheelchair use, and death. Valid
values may include NULL (patient hasn't reached that stage), 0 (for
example, patient needed support immediately at disease onset), and any
positive integer.
When I query the database manually using psql, it is clear there are
NULL values.
        3 |         3 |         18 |       |       27 |        1
          |           |            |       |       13 |        1
        1 |         5 |            |       |       10 |        0
       10 |        13 |         13 |       |       22 |        0
However, these are all converted to zeros when I use RODBC's sqlQuery(),
making interpretation impossible. I have tried using the nullstring and
na.strings options, but these don't seem to have any effect. I have
tried various combinations of NULL, NA and "". Forgive my awkward SQL.
> channel = odbcConnect("ataxia", uid="mark")
> disease = sqlQuery(channel, "select calc_survival_unilateral_support
as unlateral, calc_survival_bilateral_support as bilateral,
calc_survival_wheelchair as wheelchair,calc_survival_death as death,
calc_follow_up as followup, has_family_history_ataxia as familial from
clinical, patient where clinical.patient_fk = patient_id and excluded=0
and calc_walking_disability_valid=1")> disease   # and show results
127        3         3         18     0       27        1
128        0         0          0     0       13        1
129        1         5          0     0       10        0
130       10        13         13     0       22        0
It doesn't seem to be the old repeating rows NULL bug talked about <a
href="http://tolstoy.newcastle.edu.au/R/help/04/07/0803.html">here</a>.
Is this because my ODBC driver is not returning the correct values for
RODBC to parse? Is there anyway of debugging this (the intricacies of
ODBC are beyond my skill) and is my only alternative to store a
non-valid number in the database (999?) and use my query or R to remove
those datapoints afterwards?
Looking in the archives, there are lots of people asking about how to
convert NAs to numeric, but I want the NAs passed through unaltered!
Many thanks in advance,
Mark
-- 
Dr. Mark Wardle
Clinical research fellow and Specialist Registrar in Neurology,
C2-B2 link, Cardiff University, Heath Park, CARDIFF, CF14 4XN. UK
Mark Wardle wrote:> ... > Is this because my ODBC driver is not returning the correct values for > RODBC to parse? Is there anyway of debugging this (the intricacies of > ODBC are beyond my skill) and is my only alternative to store a > non-valid number in the database (999?) and use my query or R to remove > those datapoints afterwards? > ... >Actually, it appears that the Actual ODBC driver isn't returning the data properly. I've just tested it using Excel and it returns zeros for NULLs. Wasn't able to use iodbctest as it got very confused and tried to connect to a MySQL database (which I don't have). There is nothing RODBC can magic to fix this. It's a bit odd, as I use Filemaker to export data via raw SQL commands against the ODBC driver, and that does cope with NULLs, but it appears fetching, at least with Excel and RODBC, does not. I was just going to try installing "Rdbi" to see whether that has better luck, but I can't access CRAN this morning. Hopefully the "403 Forbidden" message will be temporary! So unless anyone knows a better alternative, I shall have to store nonsense values rather than NULLs in the database (or fix it within the SELECT query as a quick hack solution instead). Best wishes, Mark -- Dr. Mark Wardle Clinical research fellow and Specialist Registrar in Neurology, C2-B2 link, Cardiff University, Heath Park, CARDIFF, CF14 4XN. UK
What sqltype(s) are your variables?
For numeric types, RODBC merely maps values the ODBC driver says are NULL 
to NA.  Since you appear not to have character data,
nullstring: character string to be used when reading 'SQL_NULL_DATA'
           character items from the database.
na.strings: character string(s) to be mapped to 'NA' when reading
           character data.
are not relevant to you.
At least on Windows and Linux the PostgreSQL 8.1 ODBC driver works 
correctly, and NULLs in numeric columns are mapped to NAs in R.  (There is 
an example in my test suite.)
On Tue, 17 Oct 2006, Mark Wardle wrote:
> Dear All,
>
> Writing sooner than I thought I'd need to.
>
> I'm using R 2.4 on Mac OS X, with RODBC, PostgreSQL 8.1 and
Actual's
> ODBC driver. I have all my data in Filemaker 8.5, but it is
> automatically exported into PostgreSQL for analysis as Filemaker's ODBC
> and JDBC access is awful, slow and has a tendency to crash.
>
> I have disability data where for each patient there is a survival time
> in years from disease onset to a particular disease stage, namely
> unilateral support, bilateral support, wheelchair use, and death. Valid
> values may include NULL (patient hasn't reached that stage), 0 (for
> example, patient needed support immediately at disease onset), and any
> positive integer.
>
> When I query the database manually using psql, it is clear there are
> NULL values.
>        3 |         3 |         18 |       |       27 |        1
>          |           |            |       |       13 |        1
>        1 |         5 |            |       |       10 |        0
>       10 |        13 |         13 |       |       22 |        0
No, it is not clear.  It is clear that there are values which are printed 
as blank or empty strings.
> However, these are all converted to zeros when I use RODBC's
sqlQuery(),
> making interpretation impossible. I have tried using the nullstring and
> na.strings options, but these don't seem to have any effect. I have
> tried various combinations of NULL, NA and "". Forgive my awkward
SQL.
>
>> channel = odbcConnect("ataxia", uid="mark")
>> disease = sqlQuery(channel, "select
calc_survival_unilateral_support
> as unlateral, calc_survival_bilateral_support as bilateral,
> calc_survival_wheelchair as wheelchair,calc_survival_death as death,
> calc_follow_up as followup, has_family_history_ataxia as familial from
> clinical, patient where clinical.patient_fk = patient_id and excluded=0
> and calc_walking_disability_valid=1")
>> disease   # and show results
>
> 127        3         3         18     0       27        1
> 128        0         0          0     0       13        1
> 129        1         5          0     0       10        0
> 130       10        13         13     0       22        0
>
> It doesn't seem to be the old repeating rows NULL bug talked about
<a
>
href="http://tolstoy.newcastle.edu.au/R/help/04/07/0803.html">here</a>.
That was about R 1.9.1, about a problem solved long before then.  Let's 
not drag up ancient history ....
> Is this because my ODBC driver is not returning the correct values for
> RODBC to parse? Is there anyway of debugging this (the intricacies of
> ODBC are beyond my skill) and is my only alternative to store a
> non-valid number in the database (999?) and use my query or R to remove
> those datapoints afterwards?
Find out what the types involved are.  Perhaps try as.is=FALSE?
> Looking in the archives, there are lots of people asking about how to
> convert NAs to numeric, but I want the NAs passed through unaltered!
Since the mapping of NULLs to NAs works in other examples, I find it hard 
to see how this can be an RODBC issue.
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595