Dear all,

I have encountered a strange problem with read.table(). When I try to read a tab delimited file I get an error message for line 260 not being equal to 14 (see below).

Using count.fields() suggests that a number of lines have length not equal to 14, but not 260.

Looking at the actual file, however, I cannot see anything wrong with any lines. They all seem to have length 14, there are no double tabs etc., and the file reads correctly in other programs. Does anyone have any suggestions as to what this might stem from?

I have placed a copy of the file at
http://dss.ucsd.edu/~kgledits/archigos_v.1.9.asc

regards,
Kristian Skrede Gleditsch

> archigos1.9 <- read.table("c:/work/work12/archigos/archigos_v.1.9.asc",
+ sep="\t",header=T,as.is=T,row.names=NULL)
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
        line 260 did not have 14 elements
> a <- count.fields("c:/work/work12/archigos/archigos_v.1.9.asc",sep="\t")
> a <- data.frame(c(1:length(a)),a)
> a[a[,2]!=14,]
     c.1.length.a..  a
150             150 10
313             313 10
424             424 10
1189           1189  5
1510           1510 10
1514           1514 10
1590           1590  5
1600           1600 10
1612           1612 10
1618           1618 10
1619           1619 10
1709           1709 10
1722           1722 10
1981           1981 10
1985           1985 10
2112           2112 10
2178           2178 10
2208           2208 10
2224           2224 10
2530           2530  5
2536           2536  5
2573           2573  5
2928           2928  5

--
Kristian Skrede Gleditsch
Department of Political Science, UCSD
(On leave, University of Essex, 2005-6)
Tel: +44 1206 872499, Fax: +44 1206 873234
Email: kgleditsch at ucsd.edu or ksg at essex.ac.uk
http://weber.ucsd.edu/~kgledits/

On Thu, 21 Jul 2005, Kristian Skrede Gleditsch wrote:

> Dear all,
>
> I have encountered a strange problem with read.table().

Most `strange problems' are user error, so please try not to blame your tools.

> When I try to
> read a tab delimited file I get an error message for line 260 not being
> equal to 14 (see below).

Yes, but not line 260 in that file, but line 260 as read by scan(). Think about quotes ... it works for me with quote="", and the quote on ca line 150 is causing you to get some very large fields with embedded new lines and tabs.

BTW, there is a 'R Data Import/Export' manual which goes through step-by-step the assumptions you make when using read.table with various options. Do read it now.

> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
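
For anyone following along, here is a minimal sketch of the diagnosis described above: comparing count.fields() with and without quoting enabled. The file contents, the column names and the variable names (rows, demo_file, cf_default, cf_noquote) are made up for illustration and are not taken from the archigos file.

```r
# Hypothetical three-column, tab-delimited file; two text fields contain apostrophes.
rows <- c(
  "name\tcountry\tyear",
  "Smith\tUK\t1990",
  "Gbagbo\tCote d'Ivoire\t2000",
  "Jones\tUSA\t1992",
  "O'Brien\tIreland\t1993",
  "Lee\tKorea\t1994"
)
demo_file <- tempfile(fileext = ".asc")
writeLines(rows, demo_file)

# Default quote set ("\"'"): an apostrophe opens a quoted string, so the
# counts are no longer a clean run of 3s (the exact pattern depends on the R version).
cf_default <- suppressWarnings(count.fields(demo_file, sep = "\t"))

# Quoting disabled: every line is split on tabs only.
cf_noquote <- count.fields(demo_file, sep = "\t", quote = "")

print(cf_default)
print(cf_noquote)
```

If the counts only become uniform once quote="" is supplied, stray quote characters in the text fields are a likely culprit.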

I don't really understand it, but the problem seems to come down to the presence of apostrophes (single right quotes "'") in the text strings. The first of these occurs in line 149 (not counting the header line). If one tries to scan just that line, one gets a vector of length 10. Fields 10 to 14 are read as a single field. Upon deleting the apostrophe, I got a vector of length 14 (OMMMMMMMMMMM!)

The help on scan() talks about a quote argument and indicates that if sep is not the newline character, then quote defaults to "'\"". It remarks that you can include quotes inside strings by doubling them. I did a global substitution, changing "'" to "''" throughout, and the read.table() worked (i.e. didn't complain and yielded up a data frame of dimension 2935 x 14). But no apostrophes appeared in the fields in the resulting data frame.

The help seems to indicate that you can get around the problem by specifying quote = some character which doesn't appear in the file. (This also saves having to do a global edit.) I tried quote="#" and it seemed to work in this instance. And the apostrophes ***did*** appear in the strings in the data frame.

I don't grok why the complaint shows up at line 260 rather than immediately at line 149 .... but it's a start.

cheers,

Rolf Turner
rolf at math.unb.ca

Original message:

> Date: Thu, 21 Jul 2005 14:11:36 +0100
> From: Kristian Skrede Gleditsch <kgleditsch at ucsd.edu>
> To: r-help at stat.math.ethz.ch
> Subject: [R] Problem with read.table()
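
The single-line experiment with scan() can be sketched roughly as follows. The record below is invented (it is not line 149 of the actual file), and the names line, fields_default and fields_noquote are only for illustration.

```r
# Hypothetical record: five tab-separated fields, two of which contain an apostrophe.
line <- "a\tO'Brien\tx\tD'Amato\tz"

# scan()'s default quote set includes "'" when sep is not "\n", so the text
# between the two apostrophes (tabs included) ends up inside a single field.
fields_default <- scan(text = line, what = "", sep = "\t", quiet = TRUE)

# With quote = "" the apostrophes are treated as ordinary characters.
fields_noquote <- scan(text = line, what = "", sep = "\t", quote = "", quiet = TRUE)

print(length(fields_default))
print(length(fields_noquote))
print(fields_noquote)
```

With quoting switched off, the field count matches the number of tab-separated fields and the apostrophes survive in the strings.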

Thanks to all who responded to my earlier message.

The problem is that apostrophes (i.e., ') in some of the text fields are read as quotes. The file can be read without problems by setting quote="" in read.table(). Incidentally, read.delim() also works, even without setting quote="" explicitly.

best regards,
Kristian Skrede Gleditsch

Department of Political Science, UCSD
(On leave, University of Essex, 2005-6)
Tel: +44 1206 872499, Fax: +44 1206 873234
Email: kgleditsch at ucsd.edu or ksg at essex.ac.uk
http://weber.ucsd.edu/~kgledits/
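
A small sketch of the two working calls, for the archives. The data and the names rows, demo_file, via_table and via_delim are hypothetical. read.delim() probably copes because its default quote set is just the double quote, whereas read.table() also treats the apostrophe as a quote character by default; the last two lines print both defaults so this can be checked on your own installation.

```r
# Hypothetical tab-delimited file with apostrophes in two text fields.
rows <- c(
  "leader\tcountry\tyear",
  "O'Brien\tIreland\t1993",
  "Gbagbo\tCote d'Ivoire\t2000",
  "Lee\tKorea\t1994"
)
demo_file <- tempfile(fileext = ".asc")
writeLines(rows, demo_file)

# read.table() with quoting disabled: apostrophes are ordinary characters.
via_table <- read.table(demo_file, sep = "\t", header = TRUE,
                        as.is = TRUE, quote = "")

# read.delim() with its defaults (sep = "\t", header = TRUE).
via_delim <- read.delim(demo_file, as.is = TRUE)

print(via_table)
print(via_delim)

# Default quote sets of the two functions.
print(formals(read.table)$quote)
print(formals(read.delim)$quote)
```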