Artur Neumann
2016-Oct-13 19:12 UTC
[R] read.epiinfo() returns wrong data when reading epiinfo files with \032 at the end
Sorry to send this report by email, but I could not find a way to create a login on https://bugs.r-project.org

Problem description:

I'm using the foreign package to read Epi Info 6 files (.REC). All my .REC files end with a single character after the last line break: octal 032 / hex 1A. The data frame returned by read.epiinfo() has an extra row that repeats the content of the first record, but with the data shifted and \032 added at the beginning.

Expected result:

The last line (content: "\032") is ignored.

How to reproduce:

Read an Epi Info 6 file whose last line contains only the character octal 032 / hex 1A.

--------------------------------------------------
earData2<-read.epiinfo("EAR35.REC")
Warning messages:
1: In read.epiinfo("EAR35.REC") : wrong number of records
2: In matrix(datalines, nrow = multiline) :
  data length [3226] is not a sub-multiple or multiple of the number of rows [3]
--------------------------------------------------

Now the first data line is appended at the end again and the data is partly shifted:

------------------------------------------------------------------------------------
earData2[1,]
  FIRSTNAME SURNAME AGEYEARS MTHS SEX VDC SYRINGED
1 OM LAL SUBEDI 41 NA M BUR FALSE
  AUDIO FIRSTEARMA OTHEREARMA SNHLDETAIL CHLOTOSCLE DUMBYN CSOMDETAIL
1 TRUE CSO CSO <NA> NA NA TT
  OTHERDIAGN OPERATIONY GROMMETS HEARINGAID MYRINGTYMP EARDROPS
1 <NA> TRUE <NA> <NA> 2 NA
  MASTOIDECT ORALANTIBI STAPEDECTO OTHERTR OTHEROP TREATMENTD
1 <NA> NA <NA> NA <NA> <NA>

earData2[nrow(earData2),]
     FIRSTNAME SURNAME AGEYEARS MTHS SEX VDC
1076 \032OM LAL SUBEDI 4 1 <NA> MBUR
     SYRINGED AUDIO FIRSTEARMA OTHEREARMA SNHLDETAIL CHLOTOSCLE DUMBYN
1076 NA FALSE YCS OCS O NA NA
     CSOMDETAIL
1076 T
     OTHERDIAGN
1076 T
     OPERATIONY GROMMETS HEARINGAID MYRINGTYMP EARDROPS MASTOIDECT
1076 NA Y <NA> <NA> NA <NA>
     ORALANTIBI STAPEDECTO OTHERTR OTHEROP TREATMENTD
1076 NA <NA> NA <NA> !
------------------------------------------------------------------------------------

Debugging:

The problem is in https://github.com/cran/foreign/blob/master/R/read.epiinfo.R#L74

After line 66, datalines looks like this:

.........
[2737] "GOBINDA RAJ 34 MTIJ NNO.EO.E !"
[2738] " N Y !"
[2739] " BETNESOL !"
[2740] "AMAR RAJ 40 MKAL NNMYRNOR !"
[2741] " N Y !"
[2742] " GENT HC !"
[2743] "\032"

The warning is raised in line 73. After line 74:

datalines[,1]
[1] "HARI SUNDAR SHRESTHA 37 MDIP NNNORNOR EPISTAXIS !"
[2] " N !"
[3] " NEOSPORIN !"

and

datalines[,915]
[1] "\032"
[2] "HARI SUNDAR SHRESTHA 37 MDIP NNNORNOR EPISTAXIS !"
[3] "

The trailing "\032" starts a new column, matrix() fills the rest of that column by recycling the first lines, and the result is wrong.

I propose to ignore a last line that contains only "\032". Maybe something like this in line 67:

if (identical(tail(datalines, n = 1), c("\032"))) {
    length(datalines) <- (length(datalines) - 1)
}

------------------------------------------------------------------

Package: foreign
Version: 0.8-67
Maintainer: R Core Team <R-core at R-project.org>
Built: R 3.3.1; x86_64-pc-linux-gnu; 2016-09-26 12:57:55 UTC; unix

R Version:
 platform = x86_64-pc-linux-gnu
 arch = x86_64
 os = linux-gnu
 system = x86_64, linux-gnu
 status =
 major = 3
 minor = 3.1
 year = 2016
 month = 06
 day = 21
 svn rev = 70800
 language = R
 version.string = R version 3.3.1 (2016-06-21)
 nickname = Bug in Your Hair

Locale:
 LC_CTYPE=de_DE.utf8;LC_NUMERIC=C;LC_TIME=de_DE.utf8;LC_COLLATE=de_DE.utf8;LC_MONETARY=de_DE.utf8;LC_MESSAGES=de_DE.utf8;LC_PAPER=de_DE.utf8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=de_DE.utf8;LC_IDENTIFICATION=C

Search Path:
 .GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base

This package has a bug submission web page, which we will now attempt to open. The information above may be useful in your report. If the web page doesn't work, you should send email to the maintainer, R Core Team <R-core at R-project.org>.
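The check proposed in the Debugging section can be tried without a .REC file. The following is only a minimal sketch: the vector below is a made-up stand-in for datalines (two records of three lines each plus the trailing "\032"), and the names broken and fixed are hypothetical. Only matrix(datalines, nrow = multiline) is taken from the warning message above; the rest of read.epiinfo() is not reproduced here.

```r
# Made-up stand-in for `datalines`: two 3-line records, then the lone \032 line
datalines <- c("rec1-a", "rec1-b", "rec1-c",
               "rec2-a", "rec2-b", "rec2-c",
               "\032")
multiline <- 3

# Without the check: 7 values do not fill 3-row columns evenly, so matrix()
# warns and recycles the first lines into the last column
broken <- suppressWarnings(matrix(datalines, nrow = multiline))
print(broken[, ncol(broken)])

# Proposed check: drop a final element that is exactly "\032"
if (identical(tail(datalines, n = 1), "\032")) {
  length(datalines) <- length(datalines) - 1
}

# With the check: the data reshapes cleanly into one column per record
fixed <- matrix(datalines, nrow = multiline)
print(fixed)
```

In this toy case the last column of broken holds "\032" followed by the first two lines of the first record, which is the same pattern as in datalines[,915] above. The check only handles a final line that is exactly "\032"; a file where the marker is attached to other text would need a different test.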
Best regards,
Artur Neumann