Michael Bärtl
2012-May-22 16:26 UTC
[R] Could "incomplete final line found" be more serious than a warning?
Dear all,

I've been successfully reading Web of Science data from tab-delimited text files into a data.frame using an R script based on readLines(). With new data I just downloaded I suddenly get this warning:

    incomplete final line found

I know this warning has already been discussed numerous times, but unfortunately none of the previously suggested solutions worked for me, so please bear with me:

I suppressed the warning using "warn = FALSE", but the data still won't get read, so this seems to be more serious than a warning.

Adding a blank line or two at the end of the file did NOT help, i.e. R still does not read the file.

My old files still work properly, though. So I opened the text files using Notepad++ and saw that the last lines of both the old text files (i.e. the working ones) and the new text files (i.e. the ones that don't work for some reason) always end with a tab stop followed by a line break. I couldn't tell any difference between the ways these files ended. Their endings looked identical to me.

I was using R 2.14.0 (64-bit) on Windows when I discovered the problem. So I upgraded to 2.15.0 (64-bit), but the problem persists.

You can see small examples of an old and a new file at https://www.dropbox.com/s/2joadjo9ce86rij/WoS-old.txt and https://www.dropbox.com/s/lp9l1exx4mfws1s/WoS-new.txt, respectively.

Does anybody happen to have an idea of what could cause these problems for me?

Thank you very much for your consideration!
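Since the two files look identical in a text editor, one way to compare them is to inspect their first bytes directly. The following is only a sketch: guess_bom is a hypothetical helper (not part of R), the two temporary files merely stand in for WoS-old.txt and WoS-new.txt, and it is an assumption that the new export begins with a byte order mark at all.

```r
# Hypothetical helper: guess an encoding from a leading byte order mark (BOM).
# Returns NA when no known BOM is found, e.g. for plain ASCII files.
guess_bom <- function(path) {
  head_bytes <- readBin(path, what = "raw", n = 3L)
  starts_with <- function(sig) {
    length(head_bytes) >= length(sig) &&
      all(head_bytes[seq_along(sig)] == as.raw(sig))
  }
  if (starts_with(c(0xEF, 0xBB, 0xBF))) return("UTF-8")
  # Note: a UTF-32LE BOM also starts with FF FE; this sketch ignores that case.
  if (starts_with(c(0xFF, 0xFE))) return("UTF-16LE")
  if (starts_with(c(0xFE, 0xFF))) return("UTF-16BE")
  NA_character_
}

# Stand-in files: one plain ASCII, one UTF-16LE with a BOM.
old_file <- tempfile(fileext = ".txt")
new_file <- tempfile(fileext = ".txt")
writeBin(charToRaw("PT\tAU\n"), old_file)
utf16_body <- iconv("PT\tAU\n", from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xFF, 0xFE)), utf16_body), new_file)

old_guess <- guess_bom(old_file)
new_guess <- guess_bom(new_file)
```

If the two guesses differ for the real files, the encoding rather than the line ending is the more likely culprit.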
peter dalgaard
2012-May-22 17:33 UTC
[R] Could "incomplete final line found" be more serious than a warning?
Looks like someone had the bright idea of changing it to 16-bit UTF, so every 2nd byte is NUL. It works for me with

    x <- readLines(file("~/Downloads/WoS-new.txt", encoding="UTF-16"))

(except that for some reason, x won't print properly, although each individual line prints fine. Never mind, who cares as long as it reads...)

-pd

PS: The reason the printing is wacky is that one line has 148934 characters in it and the print routines pad all lines to the maximum length. Not sure what the point is in that.

--
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
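The approach above can be tried on a small, self-contained sample. This is a sketch under assumptions: the sample text and temporary file are made up, and the file is written as UTF-16LE without a byte order mark, so the connection is opened with encoding = "UTF-16LE" rather than "UTF-16". Behaviour may differ between platforms; the follow-up in this thread reports that the encoding argument did not help on Windows.

```r
# Build a small UTF-16LE sample (no byte order mark) so the sketch is self-contained.
sample_text <- "PT\tAU\tTI\nJ\tDoe, J\tAn example title\n"
path <- tempfile(fileext = ".txt")
writeBin(iconv(sample_text, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]], path)

# Open the connection explicitly so it can be closed afterwards,
# instead of leaving an anonymous file() connection behind.
con <- file(path, open = "r", encoding = "UTF-16LE")
x <- readLines(con)
close(con)

# Inspect line lengths rather than print(x), which pads every element
# to the width of the longest one.
line_lengths <- nchar(x)
```

For a real export that does start with a byte order mark, encoding = "UTF-16" as in the reply above is the more natural choice.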
Zhou Fang
2012-May-22 17:38 UTC
[R] Could "incomplete final line found" be more serious than a warning?
If you look at the new file in raw mode, you'll see that it's chock full of ASCII NULs, while the old file has none. This is probably what's giving you the problems, because R does not allow character strings containing embedded NUL characters. (I believe this is because NULs in strings are pretty dangerous in programming: they are often used to mark the end of a string, so allowing you to read them in directly could be used for various code injection exploits.)

To read the new data files, you need some way of dealing with the file as a raw stream and stripping out all the NUL characters before converting back to character. Investigate ?readBin...

Zhou
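The raw-stream idea could look roughly like this. It is a minimal sketch, not a general solution: the sample text and temporary file are invented for illustration, and dropping NUL bytes only recovers the text when every character is plain ASCII and the file has no byte order mark. Anything else needs a proper UTF-16 decode.

```r
# Build a small UTF-16LE sample (ASCII-only content, no byte order mark).
sample_text <- "PT\tAU\tTI\nJ\tDoe, J\tAn example title\n"
path <- tempfile(fileext = ".txt")
writeBin(iconv(sample_text, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]], path)

# Read the whole file as raw bytes and count the NULs.
bytes <- readBin(path, what = "raw", n = file.size(path))
nul_count <- sum(bytes == as.raw(0))

# Drop the NUL bytes, convert back to character, and split into lines.
clean <- bytes[bytes != as.raw(0)]
lines <- strsplit(rawToChar(clean), "\r?\n")[[1]]
```

A nul_count of zero for the old file and roughly half the file size for the new one would be consistent with the new file being UTF-16.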
Michael Bärtl
2012-May-25 20:45 UTC
[R] Could "incomplete final line found" be more serious than a warning?
Dear all,

thank you very much for the prompt reply by Professor Dalgaard! Unfortunately, the option he recommended doesn't work for me; it still gives me the same error. But when I manually change the encoding of the text file and save it, my script works properly again.

So, thanks a lot again! I hadn't thought the encoding could be to blame in this case. Now I'm just glad this is solved!

Best wishes,
Michael Baertl
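The manual re-encoding step could also be done from within R. The sketch below is an assumption about how one might do it, not the method used in this thread: it converts a made-up UTF-16LE sample (no byte order mark) to UTF-8 with iconv() and writes the result to a new file, which an unchanged readLines() call can then read.

```r
# Build a small UTF-16LE sample standing in for the new export.
sample_text <- "PT\tAU\tTI\nJ\tDoe, J\tAn example title\n"
utf16_path <- tempfile(fileext = ".txt")
writeBin(iconv(sample_text, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]], utf16_path)

# Read the raw bytes and convert them to a single UTF-8 string.
# iconv() accepts a list of raw vectors as input.
bytes <- readBin(utf16_path, what = "raw", n = file.size(utf16_path))
utf8_text <- iconv(list(bytes), from = "UTF-16LE", to = "UTF-8")

# Write the converted bytes unchanged to a new file.
utf8_path <- tempfile(fileext = ".txt")
writeBin(charToRaw(utf8_text), utf8_path)

# The converted file can now be read without an encoding argument.
x <- readLines(utf8_path)
```

Writing to a separate file keeps the original download intact in case the source encoding was guessed wrongly.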