Dear List:

I had a strange experience importing data. I wonder whether any of you have had the same problem before, and I would greatly appreciate your suggestions.

The original data set is in Excel format.

Here is a brief summary of what I did:
1. I saved the original Excel data in csv and txt formats, separately.
2. I imported the two files using the following code. There were no error messages.
   dftxt = read.table('df.txt',header=T, sep='\t')
   dfcsv = read.csv('df.csv',header=T, sep=',')
3. When I checked the data with 'str', I found that the factor levels of one variable differed between the two data frames. dftxt had fewer levels than dfcsv (48 vs 52).
4. So I checked the 'df.txt' file and found that the missing levels were still there, i.e., there is no problem in the text file. I suspect that something happened when I imported it into R.

Since there were no errors when importing the file into R, I do not know where to start to fix it. Do you have any suggestions?

Thank you very much in advance,

SH
Hi,

We don't know anything about your data or your file, so it's utterly impossible to offer useful suggestions.

The very best thing you can do is condense your problem into a reproducible example, with fake data if necessary. Otherwise you're limited by the ability of the list to guess what you're looking at, and our track record with that is spotty.

Sarah

On Wed, Aug 21, 2013 at 10:35 AM, SH <emptican at gmail.com> wrote:
> [...]

--
Sarah Goslee
http://www.functionaldiversity.org
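A reproducible example of the kind requested could look roughly like the sketch below. Everything in it is made up for illustration: the data frame `fake`, the column `site`, and the assumption that some values contain a '#' character. It is not a diagnosis of the original file, only one way the two import calls can silently disagree on the number of factor levels. `stringsAsFactors = TRUE` is set explicitly because recent versions of R no longer create factors by default.

```r
# Fake data: 'site' has four distinct values, two of which contain '#'.
fake <- data.frame(
  id   = 1:6,
  site = c("A", "B", "A#1", "B#2", "A", "B"),
  stringsAsFactors = FALSE
)

# Write the same data as tab-delimited text and as csv, unquoted.
txt_file <- tempfile(fileext = ".txt")
csv_file <- tempfile(fileext = ".csv")
write.table(fake, txt_file, sep = "\t", quote = FALSE, row.names = FALSE)
write.table(fake, csv_file, sep = ",",  quote = FALSE, row.names = FALSE)

# Import both files the same way as in the original post.
dftxt <- read.table(txt_file, header = TRUE, sep = "\t", stringsAsFactors = TRUE)
dfcsv <- read.csv(csv_file, header = TRUE, sep = ",", stringsAsFactors = TRUE)

# read.table treats '#' as the start of a comment, so "A#1" is read as "A".
nlevels(dftxt$site)   # 2
nlevels(dfcsv$site)   # 4

# Which levels are present in the csv import but absent from the txt import?
setdiff(levels(dfcsv$site), levels(dftxt$site))
```

Running `setdiff()` on the real `dftxt` and `dfcsv` would show which four levels went missing, which is usually the quickest clue to the cause.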
This is not really enough information to diagnose the problem. What are the missing factor levels? Were the missing levels combined with another level, or do you have missing values (NA) for those observations? Do the extra factor levels include embedded commas? There are differences between read.table and read.csv in the default quote= and comment.char= arguments.

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of SH
Sent: Wednesday, August 21, 2013 9:36 AM
To: r-help at r-project.org
Subject: [R] data import: strange experience

[...]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
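The difference in defaults mentioned above can be checked directly from R, and the tab-delimited file can be read with the same quote and comment settings that read.csv uses. The sketch below is only an illustration under assumptions: the file contents (names containing an apostrophe and a '#') are hypothetical, chosen because those are the characters the two functions treat differently.

```r
# Show the default quote= and comment.char= values of the two functions.
defaults <- data.frame(
  argument   = c("quote", "comment.char"),
  read.table = c(formals(read.table)$quote, formals(read.table)$comment.char),
  read.csv   = c(formals(read.csv)$quote,   formals(read.csv)$comment.char)
)
print(defaults)

# Hypothetical tab-delimited file with an apostrophe and a '#' in the values.
txt_file <- tempfile(fileext = ".txt")
writeLines(c("id\tname", "1\tO'Brien", "2\tApt #3", "3\tD'Angelo"), txt_file)

# Read it with read.csv-style settings: only the double quote is a quoting
# character, and no character starts a comment.
dfsafe <- read.table(txt_file, header = TRUE, sep = "\t",
                     quote = "\"", comment.char = "",
                     stringsAsFactors = TRUE)

nrow(dfsafe)          # 3
levels(dfsafe$name)   # all three names, unchanged
```

`read.delim()` uses these same quote and comment settings by default, so it is often a more convenient choice than `read.table()` for tab-delimited exports.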
On Wed, 21 Aug 2013 10:35:53 -0400 SH <emptican at gmail.com> wrote:

It looks like your problem has already been answered. However, as a rule of thumb, any time you see a peculiarity like this you should look for minor variations between what you expected to export and what Excel really exported as delimited text. Occasionally there will be a space or other character (",'/$,- etc.) that may be handled as a signal or ignored by the importing program but not by Excel.

Usually Excel works as expected, but it is a good idea to examine the text file(s) in an editor like Notepad on Windows or Kate on Linux if you encounter an oddity. BTW, there are better choices than Notepad for Windows, and I would recommend one with column-selection abilities for work on delimited data files.

jwdougherty
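The same kind of inspection can also be done from within R by reading the file as raw lines, before any quoting or comment handling is applied. This is a rough sketch with made-up file contents; the column position (second field) and the choice of which characters count as suspicious are assumptions to adapt to the real file.

```r
# Hypothetical raw lines, as they might appear in an Excel text export.
txt_file <- tempfile(fileext = ".txt")
writeLines(c("id\tsite", "1\tA", "2\tA ", "3\tB", "4\tB#2"), txt_file)

# Read the file as plain text and split each data line on tabs by hand.
raw_lines <- readLines(txt_file)
fields    <- strsplit(raw_lines[-1], "\t", fixed = TRUE)
site_raw  <- vapply(fields, function(f) f[2], character(1))

# Values with leading or trailing whitespace.
padded <- site_raw[grepl("^\\s|\\s$", site_raw)]

# Values containing anything other than letters, digits, or underscores.
suspicious <- site_raw[grepl("[^[:alnum:]_]", site_raw)]

print(padded)       # "A "
print(suspicious)   # "A "  "B#2"
```

Comparing `unique(site_raw)` with the factor levels of the imported data frame shows which raw values were altered or merged during import.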