thr3ads.net - R help - [R] OT: batch processing XLS files to CSV [May 2008]

If this information is useful, please help other people find it:
Share via:

Chris Evans

2008-May-28 17:52 UTC

[R] OT: batch processing XLS files to CSV

Dear R gurus, particularly those of generous M$ tolerance and diverse 
gifts and knowledge!

I have an interesting challenge that I will end up crunching in R 
involving service usage by patients.  Maybe I can do all of it in R but 
I can't see how yet.

My situation is that our IT Department can give me loads of XLS files 
about patients one of our services have seen.  The are one per patient 
per time period.  All the data are in the first sheet of the XLS files 
and that sheet contains four variable length but fixed format matrices 
of data:
1) demographics (actually, this is fixed length, one row!);
2) community contacts with services, variable length, rarely zero rows 
but could be;
3) inpatient admissions, variable length, often zero rows;
and CPA information (don't ask what that is!), two rows, fixed format, 
just to make things tricky, they're spearated by a fixed few junk rows 
in the xls files. The column format of each block is different.

Each block starts with standard label rows so it will be easy to 
identify these start points and know the format on the rows that follow 
each one.  I could use perl to scan for these and then read the zero to 
many lines of the data in the matrix and end on finding the next header.

I would be fairly happy to do this with perl but would need to convert 
the xls (xls 2002) files to CSV to get at them in Perl (I think).

Anyone out there done anything like this and can give me any advice? 
I'm sorry, I'm sure there are more specific lists or web resources but I
think the skills are here too and if someone can tell me how to do this 
all in R, I'd be fascinated.

Many thanks,

Chris

-- 
Chris Evans <chris at psyctc.org> Skype: chris-psyctc
Professor of Psychotherapy, Nottingham University;
Consultant Psychiatrist in Psychotherapy, Notts PDD network;
Research Programmes Director, Nottinghamshire NHS Trust;
*If I am writing from one of those roles, it will be clear. Otherwise*
*my views are my own and not representative of those institutions    *

Pascal Pascal

2008-May-29 09:31 UTC

head link

[R] OT: batch processing XLS files to CSV

Un texte encapsul? et encod? dans un jeu de caract?res inconnu a ?t? nettoy?...
Nom : non disponible
URL :
<https://stat.ethz.ch/pipermail/r-help/attachments/20080529/25f263cd/attachment.pl>

Gabor Grothendieck

2008-May-29 10:24 UTC

head link

[R] OT: batch processing XLS files to CSV

1. read.xls in the gdata package has a pattern= argument that
will skip everything before the first cell containing that pattern.

It calls xls2csv from the same package which in turn uses
perl so it works on all platforms and does not require a copy
of Excel.

2. If that is not good enough you could call xls2csv directly and pick
out the required rows in a similar way to how read.xls works.  That
could be done entirely in R so no perl programming would be needed.

3. If even that is not sufficient then you could probably make a small
change to the underlying perl program that comes with gdata.

4. Alternatives to the above include the rcom and RDCOMClient packages
which allow one to control Excel from R. This only works on Windows,  This
requires a copy of Excel and involves detailed programming but is very
flexible.

On Wed, May 28, 2008 at 1:52 PM, Chris Evans <chris at psyctc.org>
wrote:> Dear R gurus, particularly those of generous M$ tolerance and diverse gifts
> and knowledge!
>
> I have an interesting challenge that I will end up crunching in R involving
> service usage by patients.  Maybe I can do all of it in R but I can't
see
> how yet.
>
> My situation is that our IT Department can give me loads of XLS files about
> patients one of our services have seen.  The are one per patient per time
> period.  All the data are in the first sheet of the XLS files and that
sheet
> contains four variable length but fixed format matrices of data:
> 1) demographics (actually, this is fixed length, one row!);
> 2) community contacts with services, variable length, rarely zero rows but
> could be;
> 3) inpatient admissions, variable length, often zero rows;
> and CPA information (don't ask what that is!), two rows, fixed format,
just
> to make things tricky, they're spearated by a fixed few junk rows in
the xls
> files. The column format of each block is different.
>
> Each block starts with standard label rows so it will be easy to identify
> these start points and know the format on the rows that follow each one.  I
> could use perl to scan for these and then read the zero to many lines of
the
> data in the matrix and end on finding the next header.
>
> I would be fairly happy to do this with perl but would need to convert the
> xls (xls 2002) files to CSV to get at them in Perl (I think).
>
> Anyone out there done anything like this and can give me any advice?
I'm
> sorry, I'm sure there are more specific lists or web resources but I
think
> the skills are here too and if someone can tell me how to do this all in R,
> I'd be fascinated.
>
> Many thanks,
>
> Chris
>
> --
> Chris Evans <chris at psyctc.org> Skype: chris-psyctc
> Professor of Psychotherapy, Nottingham University;
> Consultant Psychiatrist in Psychotherapy, Notts PDD network;
> Research Programmes Director, Nottinghamshire NHS Trust;
> *If I am writing from one of those roles, it will be clear. Otherwise*
> *my views are my own and not representative of those institutions    *
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Moshe Olshansky

2008-May-29 12:15 UTC

head link

[R] OT: batch processing XLS files to CSV

Another, not very nice but quite practical possibility
would be to put all your Excel files into a single
directory, convert one such file into CSV or any other
text file while recording a macro doing this (Record
Macro in Excel), then edit the recording macro making
it run over all the files in that directory (some
simple VBA is required). Once all files become regular
text files (CSV or other) you can easily process them
using Perl (or even R itself).

Regards,

Moshe.

--- Chris Evans <chris at psyctc.org> wrote:
> Dear R gurus, particularly those of generous M$
> tolerance and diverse 
> gifts and knowledge!
> 
> I have an interesting challenge that I will end up
> crunching in R 
> involving service usage by patients.  Maybe I can do
> all of it in R but 
> I can't see how yet.
> 
> My situation is that our IT Department can give me
> loads of XLS files 
> about patients one of our services have seen.  The
> are one per patient 
> per time period.  All the data are in the first
> sheet of the XLS files 
> and that sheet contains four variable length but
> fixed format matrices 
> of data:
> 1) demographics (actually, this is fixed length, one
> row!);
> 2) community contacts with services, variable
> length, rarely zero rows 
> but could be;
> 3) inpatient admissions, variable length, often zero
> rows;
> and CPA information (don't ask what that is!), two
> rows, fixed format, 
> just to make things tricky, they're spearated by a
> fixed few junk rows 
> in the xls files. The column format of each block is
> different.
> 
> Each block starts with standard label rows so it
> will be easy to 
> identify these start points and know the format on
> the rows that follow 
> each one.  I could use perl to scan for these and
> then read the zero to 
> many lines of the data in the matrix and end on
> finding the next header.
> 
> I would be fairly happy to do this with perl but
> would need to convert 
> the xls (xls 2002) files to CSV to get at them in
> Perl (I think).
> 
> Anyone out there done anything like this and can
> give me any advice? 
> I'm sorry, I'm sure there are more specific lists or
> web resources but I 
> think the skills are here too and if someone can
> tell me how to do this 
> all in R, I'd be fascinated.
> 
> Many thanks,
> 
> Chris
> 
> -- 
> Chris Evans <chris at psyctc.org> Skype: chris-psyctc
> Professor of Psychotherapy, Nottingham University;
> Consultant Psychiatrist in Psychotherapy, Notts PDD
> network;
> Research Programmes Director, Nottinghamshire NHS
> Trust;
> *If I am writing from one of those roles, it will be
> clear. Otherwise*
> *my views are my own and not representative of those
> institutions    *
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
>

sjbarry

2008-May-29 22:29 UTC

head link

[R] OT: batch processing XLS files to CSV

[This email is either empty or too large to be displayed at this time]

Reasonably Related Threads

Search for more possibly parallel threads

R help - May 2008 - OT: batch processing XLS files to CSV

[R] OT: batch processing XLS files to CSV

[R] OT: batch processing XLS files to CSV

[R] OT: batch processing XLS files to CSV

[R] OT: batch processing XLS files to CSV

[R] OT: batch processing XLS files to CSV

Reasonably Related Threads