Dear all,

I have a large data frame in which one of the records in a column must have been wrongly formatted; in particular, I think it is missing a closing ". When I try to show only that column's values I get a [1] followed by plenty of empty space, then the final record [45], and the system freezes. Also, when I try to plot I get a table's printout instead of a real plot.

Is there a way to identify the record with the wrong format? In a spreadsheet or text editor all records seem OK, and there are too many records to inspect them all visually.

--
Best regards,
Luigi
On 05/06/2019 6:12 a.m., Luigi Marongiu wrote:
> Is there a way to identify the record with the format?

Without seeing the data it is hard to be specific, but the count.fields() function should normally return the same number of fields for every line. You may need to specify some of its optional arguments, e.g. sep="," for a CSV file.

For example, with this file:

    1,2,3
    1,2,"4"
    1,2,"
    1,2,5
    1,2,"6"

I see

    > count.fields("~/temp/test.txt",sep=",")
    [1] 3 3 NA NA NA 3

indicating that there are problems on lines 3-5 (a missing closing quote on line 3).

Duncan Murdoch
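For a large file, the count.fields() check could be wrapped up roughly as follows. This is only a sketch: the helper name find_suspect_lines and the temporary file are made up for illustration, and it assumes a comma-separated file in which only the double quote acts as a quoting character.

```r
# Hypothetical helper (not part of base R): report the positions whose field
# count is NA or differs from the most common field count in the file.
find_suspect_lines <- function(path, sep = ",", quote = "\"") {
  n <- count.fields(path, sep = sep, quote = quote)
  counts <- table(n)                      # NA entries are left out by table()
  if (length(counts) == 0) {
    return(seq_along(n))                  # nothing countable: flag everything
  }
  expected <- as.integer(names(which.max(counts)))
  which(is.na(n) | n != expected)
}

# Illustrative file modelled on the example above (line 3 lacks a closing quote).
tmp <- tempfile(fileext = ".csv")
writeLines(c('1,2,3', '1,2,"4"', '1,2,"', '1,2,5', '1,2,"6"'), tmp)
print(find_suspect_lines(tmp))
```

Note that count.fields() skips blank lines by default (blank.lines.skip = TRUE) and treats # as a comment character (comment.char = "#"), so the reported positions may be offset from the true line numbers if the file contains either.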
I've seen that behaviour with a C" atom in a chemical structure. Here is code to identify lines with an uneven number of quotation marks. Read your file with readLines() to use it.

    myTxt <- '"This" "is" "fine"'
    myTxt[2] <- '"This" "is "not"'
    myTxt[3] <- 'This is ok'

    x <- lengths(regmatches(myTxt, gregexpr('\\"', myTxt)))  # (1)
    which(x %% 2 == 1)
    [1] 2

Cheers,
Boris

(1) credit to https://stackoverflow.com/questions/12427385/how-to-calculate-the-number-of-occurrence-of-a-given-character-in-each-row-of-a
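Applied to a file on disk, the same quote-counting idea might look like the sketch below. The helper name odd_quote_lines and the temporary file are illustrative only, and the check assumes that no field legitimately spans several lines (such a field would show up as a pair of odd-count lines).

```r
# Hypothetical helper (not part of base R): return the numbers of the lines
# that contain an odd number of double-quote characters.
odd_quote_lines <- function(path) {
  txt <- readLines(path, warn = FALSE)
  # fixed = TRUE matches the literal character, so no escaping is needed
  n_quotes <- lengths(regmatches(txt, gregexpr('"', txt, fixed = TRUE)))
  which(n_quotes %% 2 == 1)
}

# Illustrative file: only line 3 has an unbalanced quote.
tmp <- tempfile(fileext = ".csv")
writeLines(c('1,2,3', '1,2,"4"', '1,2,"', '1,2,5', '1,2,"6"'), tmp)
odd_quote_lines(tmp)
```

Because readLines() does no quote processing, the result refers to physical line numbers in the file, which makes the offending record easy to find in a text editor.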