I'm new to R and am working on a Mac (OS X 10.6). I converted an Excel file to CSV in order to import it into R. I have tried many ways of importing the file; most do not work, and the best so far is:

filename <- read.csv("/Users/Desktop/csvfile.csv", header=T, sep=",")

I have also tried leaving out the header and sep arguments, and it still imports fine.

Regardless of what I do, R brings in the data but also includes all 16,000+ columns from Excel, named from X.1 up to X.16345 (for instance). Of course these columns are filled with "NA", since no data is present in them.

How do I get around this?

So far I have added strip.white=T and fill=T, which I have seen in other posts about csv import, but this does not work.

Thanks!

--
View this message in context: http://r.789695.n4.nabble.com/importing-csv-gets-me-all-16-000-columns-with-NA-tp3006480p3006480.html
Sent from the R help mailing list archive at Nabble.com.
Petr PIKAL
2010-Oct-22  07:25 UTC
[R] Odp: importing csv gets me all 16,000 columns with "NA"
Hi,

You did not provide much information to go on. What is the size of the imported file (columns x rows)?

r-help-bounces at r-project.org wrote on 22.10.2010 01:23:03:

> filename <- read.csv("/Users/Desktop/csvfile.csv", header=T, sep=",").
> I have also tried taking out the header and sep lines and it still
> imports fine.

Imports fine or not?

> Regardless of what I do, it always brings in the data but also includes
> all 16,000+ columns from Excel with X.1 as the first column up to
> X.16345 (for instance). Of course the files are filled with "NA" since
> no data is present.

It seems that this file has no header, or that the header consists of the numbers from 1 to 16345.

> So far I have included strip.white=T and fill=T, something I've seen in
> other csv import posts but this does not work.

There are many ways to import a whole file with the read.* functions, but you could also look at the scan or readLines functions. The result always depends on what your input file looks like (separators, decimal marks, missing values, etc.).

Regards
Petr
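Following up on the readLines suggestion, here is a minimal sketch of how one might look at the raw file before importing it. The file built here is a made-up stand-in, not the poster's data, and the names tmp, raw_lines and n_fields are only for illustration. It assumes the export contains trailing empty fields; readLines() and count.fields() are base R functions.

```r
# Build a small stand-in for the exported file: two real columns
# followed by trailing empty fields (hypothetical contents).
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,value,,,",
             "1,10,,,",
             "2,20,,,"), tmp)

# Look at the first raw lines exactly as they are stored on disk.
raw_lines <- readLines(tmp, n = 3)
print(raw_lines)

# Count the comma-separated fields on each line of the file.
n_fields <- count.fields(tmp, sep = ",")
print(n_fields)
```

If the counts are far larger than the number of columns you expect, the extra commas are already in the CSV file, which points at the export step and not at read.csv.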
(Ted Harding)
2010-Oct-22  10:14 UTC
[R] importing csv gets me all 16,000 columns with "NA"
On 21-Oct-10 23:23:03, mkinseth wrote:
> So far I have included strip.white=T and fill=T, something I've
> seen in other csv import posts but this does not work.

"strip.white=T" is not going to solve the problem, since it only changes the values of character fields that are present (by removing leading and trailing blanks). It will not remove blank fields.

Nor will "fill=T": if there is, anywhere in the CSV file, a row with more commas in it than the others, then *all* the rows will be extended so as to have the same number of fields as the excessively long one. See the result of entering ?read.csv for more detail.

Since, as you say, you converted the Excel file to CSV yourself, the solution is probably in your own hands. Excel is notorious for messing up data structures in ways which are not readily perceptible to the user.

One way in which this can happen is that if, during data entry, the cursor wanders beyond the intended limits of the data set (e.g. to a row below the last row of the data, or to a column to the right of the last column of the data) then, without the user entering anything whatever into the cell where the cursor happens to be, Excel may register that cell as having content.
The user is unlikely to be aware that this has happened, but once it has happened the CSV file will contain a rectangular block which includes this "outlying" cell, hence too many rows and/or too many columns.

One way to try to overcome this, in Excel, is to select the area of the table to be exported prior to initiating the export to CSV. I won't advise on techniques for that, since I haven't used an Excel version later than Win98 ... ! (and I have never been happy that I have done the right thing in any Excel operation). I hope others will give detailed advice.

Then you should find that the CSV contains only as many rows and columns as you have in your data and you should be OK.

It may simply be that, by default, your Excel has saved the whole spreadsheet, so you get all the columns. The same solution should apply.

Hoping that this helps,
Ted.
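If re-exporting from Excel is not convenient, the empty columns could also be dropped on the R side after the import. This is not something proposed in the thread, only a minimal sketch under the assumption that the unwanted columns contain nothing but NA. The file contents and the names tmp, dat, all_na and dat_clean are made up for illustration.

```r
# Build a small stand-in for the exported file: two real columns
# followed by three empty ones (hypothetical contents).
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,value,,,",
             "1,10,,,",
             "2,20,,,"), tmp)

# Import as usual; the empty trailing fields become columns of NA.
dat <- read.csv(tmp, header = TRUE)

# Flag the columns in which every entry is NA.
all_na <- vapply(dat, function(col) all(is.na(col)), logical(1))

# Keep only the columns that hold at least one non-missing value.
dat_clean <- dat[, !all_na, drop = FALSE]
str(dat_clean)
```

Note that this would also drop a genuine column that happens to be entirely missing, so fixing the export remains the cleaner route.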
Gabor Grothendieck
2010-Oct-22  13:31 UTC
[R] importing csv gets me all 16,000 columns with "NA"
On Thu, Oct 21, 2010 at 7:23 PM, mkinseth <mkinseth at projects.sdsu.edu> wrote:
> Regardless of what I do, it always brings in the data but also includes all
> 16,000+ columns from Excel with X.1 as the first column up to X.16345 (for
> instance). Of course the files are filled with "NA" since no data is
> present.
>
> How do I get around this?

It might be that the content of your spreadsheet is not what you think it is. You might want to try reading it directly from Excel using one of the methods listed at the link below, in case the problem arose when the intermediate csv file was created:

http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com