Hi, dear R community,

My original file has 1932 lines, but after I read it into R the result has only 1068 rows. How come?

cdu@nuuk:~/operon$ wc -l id_name_gh5.txt
1932 id_name_gh5.txt

> gene_name <- read.table("/home/cdu/operon/id_name_gh5.txt", sep="\t", skip=0, header=F, fill=T)
> dim(gene_name)
[1] 1068    3

--
Sincerely,
Changbin

Changbin Du
DOE Joint Genome Institute
Hi Changbin,

Try this code in R to count the lines of your file without reading it into a data frame:

length(count.fields("id_name_gh5.txt"))

Regards,
Mohamed

--
Mohamed Lajnef, IE
INSERM U955 eq 15
Pôle de Psychiatrie, Hôpital CHENEVIER
40, rue Mesly, 94010 CRETEIL Cedex, FRANCE
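A minimal sketch of the check suggested above, comparing the raw line count with what count.fields() reports. The file is a made-up four-line example written to a temporary path, not the poster's id_name_gh5.txt, and the variable names are only illustrative. With its defaults, count.fields() uses the same comment.char = "#" as read.table(), so a smaller count hints that some lines are being skipped.

```r
# Hypothetical four-line, tab-separated file standing in for the real data.
f <- tempfile(fileext = ".txt")
writeLines(c("id1\tgeneA", "#id2\tgeneB", "id3\tgeneC", "id4\tgeneD"), f)

# Physical lines in the file, comparable to `wc -l`.
n_raw <- length(readLines(f))

# Records as parsed with the default comment.char = "#".
n_default <- length(count.fields(f, sep = "\t"))

# Records with quoting and comment handling switched off.
n_plain <- length(count.fields(f, sep = "\t", quote = "", comment.char = ""))

print(c(raw = n_raw, default = n_default, plain = n_plain))
```

If the default count is lower than the raw count, the difference points at lines that the parser treats as comments or blank lines.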
On May 25, 2010, at 11:42 AM, Changbin Du wrote:

> My original file has 1932 lines, but when I read into R, it changed
> to 1068 lines, how comes?

How are we supposed to investigate this question? Have you looked at the last line of the file to see whether it matches the last row of gene_name? Isn't this isomorphic to a genetics question? What sort of mutation is it? A deletion? An abnormal stop codon? Figure out where the transcription process went wrong. This sort of analysis would appear to be right up the alley of someone doing genetics.

--
David Winsemius, MD
West Hartford, CT
On Tue, May 25, 2010 at 4:42 PM, Changbin Du <changbind at gmail.com> wrote:

> My original file has 1932 lines, but when I read into R, it changed to 1068
> lines, how comes?

Do any of your lines start with a "#"?

> read.table("test.txt", sep="\t")
      V1
1 line 1
2 line 2
3 line 3
4 line 4
> read.table("test.txt", comment.char="", sep="\t")
               V1
1          line 1
2      #commented
3          line 2
4          line 3
5 #nother comment
6          line 4

Just a guess. Hard to tell without the file...

Barry
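The comment-character effect shown above can be reproduced end to end with a sketch like the following. The contents of test.txt are not given in the thread, so the six lines written here are an assumption inferred from the printed output, and the temporary file and variable names are hypothetical.

```r
# Assumed contents of test.txt, inferred from the output shown above.
f <- tempfile(fileext = ".txt")
writeLines(c("line 1", "#commented", "line 2",
             "line 3", "#nother comment", "line 4"), f)

# Default comment.char = "#": lines starting with "#" are dropped silently.
with_comments_dropped <- read.table(f, sep = "\t")

# comment.char = "" turns comment handling off, so every line is kept.
all_lines <- read.table(f, sep = "\t", comment.char = "")

nrow(with_comments_dropped)  # 4
nrow(all_lines)              # 6
```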
Gene names often contain single quotes, for example:

5'-methylthioadenosine phosphorylase
ATP synthase B' chain
ppGpp 3'-pyrophosphohydrolase

so maybe try adding quote="" to the read.table options.

Chris Stubben
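A small sketch of why apostrophes can make rows disappear, assuming a made-up four-line file (the ids and the "hypothetical protein" entries are invented for illustration; two of the descriptions are taken from the list above). By default read.table() treats ' as a quoting character, so the text between one apostrophe and the next, including the newlines in between, ends up inside a single field.

```r
# Hypothetical tab-separated file; two descriptions contain an apostrophe.
f <- tempfile(fileext = ".txt")
writeLines(c(
  "1\tg1\t5'-methylthioadenosine phosphorylase",
  "2\tg2\thypothetical protein",
  "3\tg3\tppGpp 3'-pyrophosphohydrolase",
  "4\tg4\thypothetical protein"
), f)

# Default quoting: the apostrophes on lines 1 and 3 pair up as quotes,
# so the lines between them are merged into one field.
merged <- read.table(f, sep = "\t", header = FALSE, fill = TRUE)

# quote = "" disables quoting, so each physical line becomes one row.
fixed <- read.table(f, sep = "\t", header = FALSE, fill = TRUE, quote = "")

nrow(merged)  # fewer rows than the file has lines
nrow(fixed)   # 4
```

If the real file also contains "#" characters, comment.char = "" may be needed as well.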