agent dunham
2011-Feb-08 10:05 UTC
[R] read.xls counts more rows than I really have in my Excel file
Hi, i'm using read.xls, and it counts more rows in my Excel than I really have. i've used both: especie26 <- read.xls("especie26cargar.xls") datos26 <- read.xls("especie26cargar.xls", header = TRUE, as.is = FALSE, na.strings = "NA", skip = 0, check.names = TRUE, fill = FALSE, strip.white = FALSE, blank.lines.skip = TRUE) when i check dim(especie26) or dim(datos26) i got 858 17 and it should be 667 17 thanks in advance !!!! -- View this message in context: http://r.789695.n4.nabble.com/read-xls-counts-more-rows-than-I-really-have-in-my-Excel-file-tp3275643p3275643.html Sent from the R help mailing list archive at Nabble.com.
(Ted Harding)
2011-Feb-08 10:41 UTC
[R] read.xls counts more rows than I really have in my Excel
On 08-Feb-11 10:05:58, agent dunham wrote:> Hi, > i'm using read.xls, and it counts more rows in my Excel > than I really have. > > i've used both: > > especie26 <- read.xls("especie26cargar.xls") > > datos26 <- read.xls("especie26cargar.xls", header = TRUE, > as.is = FALSE, na.strings = "NA", > skip = 0, check.names = TRUE, fill = FALSE, > strip.white = FALSE, blank.lines.skip = TRUE) > > when i check dim(especie26) or dim(datos26) i got 858 17 > and it should be 667 17 > > thanks in advance !!!! > --You say "i'm using read.xls, and it counts more rows in my Excel than I really have." I would be inclined to the view that this should be "it counts more rows in my Excel than I think I really have." In other words, very probably there really are 858 rows in your spreadsheet, even though you think there are 667 (and can only see 667). As a check, I would suggest that you export the entire worksheet into a CSV file, e.g. "especie26cargar.csv". For the purposes of this check DO NOT highlight the data area to be saved (since doing this would export your possibly false impression of the data as well). Then, using some suitable utility, count the number of lines in the resulting CSV file "especie26cargar.csv". I am prepared to bet that you will find 858 lines in this file (plus 1 for the header). In that case the excess (858 - 667) = 191 rows would probably consist of rows containing only commas, like ,,,,,,,,,,,,,,,, (i.e. 16 commas, for the 17 fields, though the nunber might vary from row to row). The phenomenon on which I base this guess is that very often, when people enter data into an Excel spreadsheet, they somehow allow the cursor to wander outside the area of the spreadsheet which is to be occupied by real data. Internally (and invisibly to the user) Excel will then attribute the extraneous cells into which the cursor had wandered as being data cells with empty content. They therefore continue to appear on screen as pure blanks, but they have been invisibly added to the structure of the spreadsheet. A possible alternative to this check (but less definitive) is to look at the dataframe which you get after you have read the spreadsheet into R. E.g., after especie26 <- read.xls("especie26cargar.xls") on the basis tghat you should only have 667 data rows, have a look at the result of, say, especie26[(670:680),] or some other range of rows beyond (1:667). On the same basis as above, I am betting that these rows probably consist solely of NAs. Why am I saying this? Because more times than I have any wish to remember in detail I have encountered exactly this problem in spreadsheets sloppily prepared by people who believed what they saw on screen, totally unaware of what Excel had really done to their data. Excel is a con artist: it imparts an impression of success and confidence to the user, which may be quite false. By the way: By a similar mechanism, you can find that different rows in the spreadsheet have different numbers of fields. Ted. feel -------------------------------------------------------------------- E-Mail: (Ted Harding) <ted.harding at wlandres.net> Fax-to-email: +44 (0)870 094 0861 Date: 08-Feb-11 Time: 10:41:27 ------------------------------ XFMail ------------------------------