Emmanuel.Paradis@mpl.ird.fr
2005-Oct-20 08:40 UTC
[Rd] read.fwf doesn't work with header = TRUE (PR#8226)
Full_Name: Emmanuel Paradis
Version: 2.1.1
OS: Linux
Submission from: (NULL) (193.49.41.105)

read.fwf(..., header = TRUE) does not work properly since:

1/ the original header is printed on the console and not in FILE;
2/ the different 'parts' of the header should be separated with tabs to work with the call to read.table.

Here is a suggested fix for src/library/utils/R/read.fwf.R:

38c38,40
< cat(FILE, headerline, "\n")
---
> headerline <- unlist(strsplit(headerline, " {1,}"))
> headerline <- paste(headerline, collapse = "\t")
> cat(file = FILE, headerline, "\n")

PS: my R is not up to date, but read.fwf.R does not seem to have been changed in R 2.2.0.
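Point 1/ of the report can be reproduced outside read.fwf. The sketch below is only an illustration: FILE and headerline are hypothetical stand-ins for the variables of the same name inside read.fwf, and the header text is made up. Because the first formal argument of cat() is `...`, an unnamed FILE is treated as text to print rather than as the destination.

```r
# Hypothetical stand-ins for the variables used inside read.fwf
FILE <- tempfile("Rfwf.")
headerline <- "id name score"

# Positional call: FILE is matched to `...`, so its path is printed
# together with the header and nothing is written to disk.
printed <- capture.output(cat(FILE, headerline, "\n"))
exists_after_positional <- file.exists(FILE)

# Named call: the header line is written to the temporary file.
cat(file = FILE, headerline, "\n")
written <- readLines(FILE)
unlink(FILE)
```

Here capture.output() is used only to keep the console quiet while still showing where the text ended up.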
ripley@stats.ox.ac.uk
2005-Oct-21 07:36 UTC
[Rd] read.fwf doesn't work with header = TRUE (PR#8226)
On Thu, 20 Oct 2005 Emmanuel.Paradis at mpl.ird.fr wrote:

> Full_Name: Emmanuel Paradis
> Version: 2.1.1
> OS: Linux
> Submission from: (NULL) (193.49.41.105)
>
> read.fwf(..., header = TRUE) does not work properly since:
>
> 1/ the original header is printed on the console and not in FILE;
> 2/ the different 'parts' of the header should be separated with tabs
> to work with the call to read.table.
>
> Here is a suggested fix for src/library/utils/R/read.fwf.R:
>
> 38c38,40
> < cat(FILE, headerline, "\n")
> ---
> > headerline <- unlist(strsplit(headerline, " {1,}"))
> > headerline <- paste(headerline, collapse = "\t")
> > cat(file = FILE, headerline, "\n")

Thanks, but I don't think that is right. It assumes the header line is space-delimited (or at least that spaces get converted to tabs). We have not specified the format of the header line, and it cannot usefully be fixed format. So I think we need to specify it is delimited by 'sep' (not tab).

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
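The objection to splitting on spaces can be seen with a header whose names themselves contain spaces. This is a minimal sketch under assumptions: the header text is invented, and sep is set to "\t" because that is the documented default of read.fwf's sep argument.

```r
sep <- "\t"                              # default value of read.fwf's 'sep'
headerline <- "patient id\tvisit date"   # hypothetical header: two names delimited by sep

# Splitting on runs of spaces, as in the first proposed patch,
# cuts inside the names and leaves the tab embedded in one piece.
by_space <- unlist(strsplit(headerline, " {1,}"))

# Splitting on sep recovers the two intended names.
by_sep <- unlist(strsplit(headerline, sep, fixed = TRUE))
```

With this input, by_space has three pieces while by_sep has the two intended names, which is why a header delimited by 'sep' needs no assumption about spaces.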
Emmanuel.Paradis@mpl.ird.fr
2005-Oct-21 16:03 UTC
[Rd] read.fwf doesn't work with header = TRUE (PR#8226)
Prof Brian Ripley wrote:

> Thanks, but I don't think that is right. It assumes the header line is
> space-delimited (or at least that spaces get converted to tabs). We
> have not specified the format of the header line, and it cannot usefully
> be fixed format. So I think we need to specify it is delimited by 'sep'
> (not tab).

I see, but suppose we read selectively some columns in a file, eg with widths=c(1, -4, 2), how can we know how many variables have been skipped and then select the appropriate names in the header line?

Here is another proposed fix, but this assumes the header line is in fixed-width format (as specified by 'widths'):

38c38,41
< cat(FILE, headerline, "\n")
---
> head.last <- cumsum(widths)
> head.first <- head.last - widths + 1
> headerline <- substring(headerline, head.first, head.last)[drop]
> cat(file = FILE, headerline, "\n", sep = sep)

?read.fwf says clearly that sep is used internally.
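The fixed-width idea in this second patch can be tried on its own. The following is a sketch, not the code of read.fwf: the header text and widths are invented, abs(widths) is taken explicitly on the assumption that widths may still hold negative entries at this point, the index is written as [!drop] so that skipped fields are excluded, and trimws() is an added step to remove padding. It handles single-line records only.

```r
widths <- c(3, -4, 5)            # keep 3 characters, skip 4, keep 5 (hypothetical layout)
sep <- "\t"
headerline <- "id junkscore"     # hypothetical fixed-width header matching 'widths'

drop <- widths < 0               # TRUE for fields that are skipped
w <- abs(widths)                 # assumption: widths may still be negative here
head.last <- cumsum(w)
head.first <- head.last - w + 1

# Cut the header at the field boundaries and keep only the non-skipped fields.
fields <- substring(headerline, head.first, head.last)[!drop]
fields <- trimws(fields)         # added step: strip padding around the names

# Join the surviving names with sep, ready for read.table(sep = sep).
header_out <- paste(fields, collapse = sep)
```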
ripley@stats.ox.ac.uk
2005-Oct-23 16:59 UTC
[Rd] read.fwf doesn't work with header = TRUE (PR#8226)
On Fri, 21 Oct 2005, Emmanuel Paradis wrote:

> I see, but suppose we read selectively some columns in a file, eg with
> widths=c(1, -4, 2), how can we know how many variables have been skipped and
> then select the appropriate names in the header line?

You do not: as the help file says

     Negative-width fields are used to indicate columns to be skipped,
     eg '-5' to skip 5 columns. These fields are not seen by
     'read.table' and so should not be included in a 'col.names' or
     'colClasses' argument.

> Here is another proposed fix, but this assumes the header line is in
> fixed-width format (as specified by 'widths'):

What happens if there are multi-line records? Your `fix' crashes.

> 38c38,41
> < cat(FILE, headerline, "\n")
> ---
> > head.last <- cumsum(widths)
> > head.first <- head.last - widths + 1
> > headerline <- substring(headerline, head.first, head.last)[drop]
> > cat(file = FILE, headerline, "\n", sep = sep)
>
> ?read.fwf says clearly that sep is used internally.

Not so: please check the current version.

-- 
Brian D. Ripley
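The help-file passage quoted above can be illustrated with a small file and no header line. The data and column names below are made up for the example; the point is only that col.names lists the fields that are kept, not the skipped one.

```r
# Two fixed-width records: 1 character, 4 characters to skip, 2 characters.
ff <- tempfile()
writeLines(c("1abcd23", "9wxyz87"), ff)

# widths = c(1, -4, 2): the negative width marks a skipped field.
# Only the two kept fields reach read.table, so col.names has two entries.
df <- read.fwf(ff, widths = c(1, -4, 2), col.names = c("id", "score"))
unlink(ff)
```

Supplying a third name for the skipped field would not match what read.table sees, which is the situation the help file warns about.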
Emmanuel.Paradis@mpl.ird.fr
2005-Oct-24 13:22 UTC
[Rd] read.fwf doesn't work with header = TRUE (PR#8226)
Prof Brian Ripley wrote:

> You do not: as the help file says
>
>      Negative-width fields are used to indicate columns to be skipped,
>      eg '-5' to skip 5 columns. These fields are not seen by
>      'read.table' and so should not be included in a 'col.names' or
>      'colClasses' argument.

OK, but it is strange to me to not have all variables named in a header line.

>> Here is another proposed fix, but this assumes the header line is in
>> fixed-width format (as specified by 'widths'):
>
> What happens if there are multi-line records? Your `fix' crashes.

It crashes anyway because it should be [!drop] and not [drop] ;)

>> 38c38,41
>> < cat(FILE, headerline, "\n")
>> ---
>> > head.last <- cumsum(widths)
>> > head.first <- head.last - widths + 1
>> > headerline <- substring(headerline, head.first, head.last)[drop]
>> > cat(file = FILE, headerline, "\n", sep = sep)
>>
>> ?read.fwf says clearly that sep is used internally.
>
> Not so: please check the current version.

Here is what I have in R 2.2.0:

     sep: character; the separator used internally; should be a
          character that does not occur in the file.

So, should the fix be simply:

38c38
< cat(FILE, headerline, "\n")
---
> cat(file = FILE, headerline, "\n")

?