thr3ads.net - R devel - [Rd] read.fwf doesn't work with header = TRUE (PR#8226) [Oct 2005]


Emmanuel.Paradis@mpl.ird.fr

2005-Oct-20 08:40 UTC

[Rd] read.fwf doesn't work with header = TRUE (PR#8226)

Full_Name: Emmanuel Paradis
Version: 2.1.1
OS: Linux
Submission from: (NULL) (193.49.41.105)


read.fwf(..., header = TRUE) does not work properly since:

1/ the original header is printed on the console and not in FILE;
2/ the different 'parts' of the header should be separated with tabs
   to work with the call to read.table.

Here is a suggested fix for src/library/utils/R/read.fwf.R:

38c38,40
<         cat(FILE, headerline, "\n")
---
>         headerline <- unlist(strsplit(headerline, " {1,}"))
>         headerline <- paste(headerline, collapse = "\t")
>         cat(file = FILE, headerline, "\n")

PS: my R is not up to date, but read.fwf.R does not seem to have been changed
in R 2.2.0.
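
Point 1/ of the report can be reproduced outside read.fwf. The following is a minimal sketch, not the code from read.fwf.R itself: FILE and headerline are stand-in values chosen for illustration. Because the first formal argument of cat() is `...`, an unnamed FILE is treated as one more string to print, so `file` has to be named.

```r
# Stand-in values (assumptions for illustration, not taken from read.fwf.R)
FILE <- tempfile()
headerline <- "id  name"

# Unnamed first argument: FILE is printed like any other string.
# capture.output() collects what would otherwise appear on the console.
console_text <- capture.output(cat(FILE, headerline, "\n"))
created_by_unnamed_call <- file.exists(FILE)

# Named argument: the header line is written to the file instead.
cat(file = FILE, headerline, "\n")
file_text <- readLines(FILE)

unlink(FILE)
```

After running this, console_text holds the temporary path followed by the header, while file_text holds only the header line.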

ripley@stats.ox.ac.uk

2005-Oct-21 07:36 UTC


On Thu, 20 Oct 2005 Emmanuel.Paradis at mpl.ird.fr wrote:
> Full_Name: Emmanuel Paradis
> Version: 2.1.1
> OS: Linux
> Submission from: (NULL) (193.49.41.105)
>
>
> read.fwf(..., header = TRUE) does not work properly since:
>
> 1/ the original header is printed on the console and not in FILE;
> 2/ the different 'parts' of the header should be separated with tabs
>   to work with the call to read.table.
>
> Here is a suggested fix for src/library/utils/R/read.fwf.R:
>
> 38c38,40
> <         cat(FILE, headerline, "\n")
> ---
> >         headerline <- unlist(strsplit(headerline, " {1,}"))
> >         headerline <- paste(headerline, collapse = "\t")
> >         cat(file = FILE, headerline, "\n")

Thanks, but I don't think that is right.  It assumes the header line is 
space-delimited (or at least that spaces get converted to tabs).  We have 
not specified the format of the header line, and it cannot usefully be 
fixed format.  So I think we need to specify it is delimited by 'sep'
(not tab).
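
What a header line delimited by 'sep' looks like in practice can be sketched as follows. This assumes a later version of R, where the read.fwf help page states that header names must be delimited by sep; the file contents and column names are made up for illustration.

```r
# Made-up example file: the header is delimited by 'sep' (a tab here),
# while the data lines are fixed width.
tmp <- tempfile()
writeLines(c("a\tb",
             "1xxxx23",
             "4yyyy56"), tmp)

# widths = c(1, -4, 2): read 1 character, skip 4, read 2.
# The header names only the two fields that are kept.
df <- read.fwf(tmp, widths = c(1, -4, 2), header = TRUE, sep = "\t")

unlink(tmp)
```

Note that the header carries no entry for the skipped field, which matches the position that skipped fields are never seen by read.table.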


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Emmanuel.Paradis@mpl.ird.fr

2005-Oct-21 16:03 UTC


Prof Brian Ripley wrote:
> On Thu, 20 Oct 2005 Emmanuel.Paradis at mpl.ird.fr wrote:
> 
>> Full_Name: Emmanuel Paradis
>> Version: 2.1.1
>> OS: Linux
>> Submission from: (NULL) (193.49.41.105)
>>
>>
>> read.fwf(..., header = TRUE) does not work properly since:
>>
>> 1/ the original header is printed on the console and not in FILE;
>> 2/ the different 'parts' of the header should be separated with tabs
>>   to work with the call to read.table.
>>
>> Here is a suggested fix for src/library/utils/R/read.fwf.R:
>>
>> 38c38,40
>> <         cat(FILE, headerline, "\n")
>> ---
>> >         headerline <- unlist(strsplit(headerline, " {1,}"))
>> >         headerline <- paste(headerline, collapse = "\t")
>> >         cat(file = FILE, headerline, "\n")
> 
> 
> Thanks, but I don't think that is right.  It assumes the header line is
> space-delimited (or at least that spaces get converted to tabs).  We 
> have not specified the format of the header line, and it cannot usefully 
> be fixed format.  So I think we need to specify it is delimited by 'sep'
> (not tab).

I see, but suppose we read selectively some columns in a file, eg with 
widths=c(1, -4, 2), how can we know how many variables have been skipped 
and then select the appropriate names in the header line?

Here is another proposed fix, but this assumes the header line is in 
fixed-width format (as specified by 'widths'):

38c38,41
<         cat(FILE, headerline, "\n")
---
>         head.last <- cumsum(widths)
>         head.first <- head.last - widths + 1
>         headerline <- substring(headerline, head.first, head.last)[drop]
>         cat(file = FILE, headerline, "\n", sep = sep)

?read.fwf says clearly that sep is used internally.

ripley@stats.ox.ac.uk

2005-Oct-23 16:59 UTC


On Fri, 21 Oct 2005, Emmanuel Paradis wrote:
> Prof Brian Ripley wrote:
>> On Thu, 20 Oct 2005 Emmanuel.Paradis at mpl.ird.fr wrote:
>> 
>>> Full_Name: Emmanuel Paradis
>>> Version: 2.1.1
>>> OS: Linux
>>> Submission from: (NULL) (193.49.41.105)
>>> 
>>> 
>>> read.fwf(..., header = TRUE) does not work properly since:
>>> 
>>> 1/ the original header is printed on the console and not in FILE;
>>> 2/ the different 'parts' of the header should be separated with tabs
>>>   to work with the call to read.table.
>>> 
>>> Here is a suggested fix for src/library/utils/R/read.fwf.R:
>>> 
>>> 38c38,40
>>> <         cat(FILE, headerline, "\n")
>>> ---
>>> >         headerline <- unlist(strsplit(headerline, " {1,}"))
>>> >         headerline <- paste(headerline, collapse = "\t")
>>> >         cat(file = FILE, headerline, "\n")
>> 
>> 
>> Thanks, but I don't think that is right.  It assumes the header line is
>> space-delimited (or at least that spaces get converted to tabs).  We have
>> not specified the format of the header line, and it cannot usefully be
>> fixed format.  So I think we need to specify it is delimited by 'sep'
>> (not tab).
>
> I see, but suppose we read selectively some columns in a file, eg with
> widths=c(1, -4, 2), how can we know how many variables have been skipped and
> then select the appropriate names in the header line?

You do not: as the help file says

      Negative-width fields are used to indicate columns to be skipped,
      eg '-5' to skip 5 columns.  These fields are not seen by
      'read.table' and so should not be included in a 'col.names' or
      'colClasses' argument.

> Here is another proposed fix, but this assumes the header line is in
> fixed-width format (as specified by 'widths'):

What happens if there are multi-line records?  Your `fix' crashes.

> 38c38,41
> <         cat(FILE, headerline, "\n")
> ---
> >         head.last <- cumsum(widths)
> >         head.first <- head.last - widths + 1
> >         headerline <- substring(headerline, head.first, head.last)[drop]
> >         cat(file = FILE, headerline, "\n", sep = sep)
>
> ?read.fwf says clearly that sep is used internally.

Not so: please check the current version.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Emmanuel.Paradis@mpl.ird.fr

2005-Oct-24 13:22 UTC


Prof Brian Ripley wrote:
> On Fri, 21 Oct 2005, Emmanuel Paradis wrote:
> 
>> Prof Brian Ripley wrote:
>>
>>> On Thu, 20 Oct 2005 Emmanuel.Paradis at mpl.ird.fr wrote:
>>>
>>>> Full_Name: Emmanuel Paradis
>>>> Version: 2.1.1
>>>> OS: Linux
>>>> Submission from: (NULL) (193.49.41.105)
>>>>
>>>>
>>>> read.fwf(..., header = TRUE) does not work properly since:
>>>>
>>>> 1/ the original header is printed on the console and not in FILE;
>>>> 2/ the different 'parts' of the header should be separated with tabs
>>>>   to work with the call to read.table.
>>>>
>>>> Here is a suggested fix for src/library/utils/R/read.fwf.R:
>>>>
>>>> 38c38,40
>>>> <         cat(FILE, headerline, "\n")
>>>> ---
>>>> >         headerline <- unlist(strsplit(headerline, " {1,}"))
>>>> >         headerline <- paste(headerline, collapse = "\t")
>>>> >         cat(file = FILE, headerline, "\n")
>>>
>>>
>>>
>>> Thanks, but I don't think that is right.  It assumes the header line
>>> is space-delimited (or at least that spaces get converted to tabs).
>>> We have not specified the format of the header line, and it cannot 
>>> usefully be fixed format.  So I think we need to specify it is 
>>> delimited by 'sep'
>>> (not tab).
>>
>>
>> I see, but suppose we read selectively some columns in a file, eg with 
>> widths=c(1, -4, 2), how can we know how many variables have been 
>> skipped and then select the appropriate names in the header line?
> 
> 
> You do not: as the help file says
> 
>      Negative-width fields are used to indicate columns to be skipped,
>      eg '-5' to skip 5 columns.  These fields are not seen by
>      'read.table' and so should not be included in a 'col.names' or
>      'colClasses' argument.

OK, but it is strange to me to not have all variables named in a header
line.

>> Here is another proposed fix, but this assumes the header line is in
>> fixed-width format (as specified by 'widths'):
>
> What happens if there are multi-line records?  Your `fix' crashes.

It crashes anyway because it should be [!drop] and not [drop] ;)

>> 38c38,41
>> <         cat(FILE, headerline, "\n")
>> ---
>> >         head.last <- cumsum(widths)
>> >         head.first <- head.last - widths + 1
>> >         headerline <- substring(headerline, head.first, head.last)[drop]
>> >         cat(file = FILE, headerline, "\n", sep = sep)
>>
>> ?read.fwf says clearly that sep is used internally.
>
> Not so: please check the current version.

Here is what I have in R 2.2.0:

      sep: character; the separator used internally; should be a
           character that does not occur in the file.

So, should the fix be simply:

38c38
<         cat(FILE, headerline, "\n")
---
>         cat(file = FILE, headerline, "\n")

?
