thr3ads.net - R help - [R] write.table: strange output has been produced [Sep 2012]


Igor

2012-Sep-19 16:12 UTC

[R] write.table: strange output has been produced

Good afternoon all -

While making steady progress in learning R after Matlab, I encountered
a problem that I need some extra help to get past.
Basically, I want to merge data from a biological statistical dataset
with annotation data extracted from another dataset, using an 'id'
cross-reference, and write the result to a report file. The first part goes
absolutely fine: I have merged both datasets into a data.frame. But when I try
to write it to a csv file using 'write.table', it does write
the 'data.frame' object, but it also inserts some parts of the
annotation data which are not supposed to be there...
There is a little snapshot of the file output below to illustrate. The
upper half is fine; that's how it should be. The lower half, which
actually appears to be space-separated, not comma-separated, was obviously
grabbed from the annotation dataset and is not supposed to be here.

--------------------------------8<--------------------------------------------
"344","166128",126.44286392082,179.904700814932,72.9810270267088,0.40566492535281,-1.3016395254146,2.47449355237252e-07,4.2901159299567e-06,"Chitinas
"18816","238247",92.5282508325735,135.981255262454,49.0752464026927,0.36089714209487,-1.47034037615176,2.5330054329543e-07,4.38862252337004e-06,"Prot
"22072","222365",30.8191942806426,52.4262903365628,9.21209822472236,0.17571524068522,-2.50868876576414,2.54433836512085e-07,4.40531098485028e-06,NA,N
"25062","226605",30.808007579908,50.3976662241578,11.2183489356581,0.22259659575825,-2.16749656564076,2.54934711860645e-07,4.41103467375713e-06,NA,NA
"7539","247009",75.4175439970731,34.4643221134552,116.370765880691,3.37655751642533,1.75555313265164,2.60010673210741e-07,4.49585878338091e-06,NA,NA,
"407","267139",425.559675915702,279.393013150954,571.72633868045,2.04631580522577,1.03302881149302,2.61074218843609e-07,4.51123710239304e-06,NA,NA,NA
"26530","171300",146.80096060985,80.0063286553601,213.595592564339,2.66973370924738,1.4166958484644,2.68061220749976e-07,4.62888115991058e-06,NA,NA,N
"3078","159013",34.3260176515511,52.4580790080106,16.1939562950917,0.308702808057816,-1.69570948866688,2.69104298652827e-07,4.64379716436078e-06,"40S
"4657","159998",133.10761487064,185.450704462326,80.7645252789532,0.435504009074069,-1.19924209513405,2.75544399955331e-07,4.75176501174632e-06,"IMP-

171597  171597  KOG1347 Uncharacterized membrane protein, predicted
efflux pump General function prediction only    POORLY CHARACTERIZED
171658  171658  KOG4290 Predicted membrane protein  Function unknown
POORLY CHARACTERIZED
171660  171660  KOG0903 Phosphatidylinositol 4-kinase, involved in
intracellular trafficking and secretion  Signal transduction mechanisms
CELLULAR 
171660  171660  KOG0903 Phosphatidylinositol 4-kinase, involved in
intracellular trafficking and secretion  Intracellular trafficking,
secretion, and
171703  171703  KOG2674 Cysteine protease required for autophagy -
Apg4p/Aut2p  Cytoskeleton    CELLULAR PROCESSES AND SIGNALING
171703  171703  KOG2674 Cysteine protease required for autophagy -
Apg4p/Aut2p  Intracellular trafficking, secretion, and vesicular
transport   CELLU
and metabolism     METABOLISM
--------------------------------8<--------------------------------------------
And this is a piece of code that produced this:

--------------------------------8<--------------------------------------------
> n = nrow(statdata)
> extra = data.frame(kogdefline=rep(NA,n), kogClass = rep(NA,n), kogGroup = rep(NA,n))
> subset = intersect(statdata$id, annot$id)
> MR = match(subset, annot$id)
> ML = match(subset, statdata$id)
> extra[ML,1] = as.character(annot[MR,2])
> extra[ML,2] = as.character(annot[MR,3])
> extra[ML,3] = as.character(annot[MR,4])
> # strangely, if I do
> # extra[ML,] = as.character(annot[MR,2:4])
> # it produces digits (???) instead of a string value
> mergedData = data.frame(statdata, extra)
> write.table(mergedData, 'filename.csv', sep=',')
--------------------------------8<--------------------------------------------

Any ideas why this is happening?

Many thanks
-Igor
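
The id-based lookup described in this post can also be written with base R's merge(). The following is only a sketch on made-up toy data: the column names baseMean, kogdefline and kogClass, the values, and the output file are assumptions for illustration, not Igor's real statdata or annot. Note that an id occurring more than once in the annotation (as 171660 does in the snapshot) yields one output row per match.

```r
# Toy stand-ins for the two datasets (all values are hypothetical).
statdata <- data.frame(id = c("166128", "238247", "222365"),
                       baseMean = c(126.4, 92.5, 30.8),
                       stringsAsFactors = FALSE)
annot <- data.frame(id = c("166128", "238247", "238247"),
                    kogdefline = c("defline A", "defline B", "defline B"),
                    kogClass = c("class 1", "class 2", "class 3"),
                    stringsAsFactors = FALSE)

# Left join: keep every statdata row, fill NA where no annotation exists.
merged <- merge(statdata, annot, by = "id", all.x = TRUE)

# Write a plain CSV without the row-name column, then count the lines.
out_file <- tempfile(fileext = ".csv")
write.csv(merged, out_file, row.names = FALSE)
n_lines <- length(readLines(out_file))
print(merged)
```

If the file has more lines than nrow(merged) + 1, some text value probably contains a line break.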

David Winsemius

2012-Sep-19 17:26 UTC


[R] write.table: strange output has been produced

On Sep 19, 2012, at 9:12 AM, Igor wrote:
> Good afternoon all -
> 
> While making a steady progress in learning R after Matlab I encountered
> a problem which seems to require some extra help to move over.
> Basically I want to merge a data from biological statistical dataset
> with annotation data extracted from another dataset using an 'id'
> crossreference and write it to report file. The first part goes
> absolutely fine, I have merged both data into data.frame but when I try
> to write it to csv file using 'write.table' it seems like it does write
> the 'data.frame' object but it also insert some parts from the
> annotation data which are not suppose to be there...
> There is a little snapshot of the file output below to illustrate. The
> upper half is fine, that's how it should be. The lower half, which is
> actually appears to be space-separated, not coma, obviously grabbed from
> the annotation dataset and is not supposed to be here.
> 
> --------------------------------8<--------------------------------------------
> "344","166128",126.44286392082,179.904700814932,72.9810270267088,0.40566492535281,-1.3016395254146,2.47449355237252e-07,4.2901159299567e-06,"Chitinas
> "18816","238247",92.5282508325735,135.981255262454,49.0752464026927,0.36089714209487,-1.47034037615176,2.5330054329543e-07,4.38862252337004e-06,"Prot
> "22072","222365",30.8191942806426,52.4262903365628,9.21209822472236,0.17571524068522,-2.50868876576414,2.54433836512085e-07,4.40531098485028e-06,NA,N
> "25062","226605",30.808007579908,50.3976662241578,11.2183489356581,0.22259659575825,-2.16749656564076,2.54934711860645e-07,4.41103467375713e-06,NA,NA
> "7539","247009",75.4175439970731,34.4643221134552,116.370765880691,3.37655751642533,1.75555313265164,2.60010673210741e-07,4.49585878338091e-06,NA,NA,
> "407","267139",425.559675915702,279.393013150954,571.72633868045,2.04631580522577,1.03302881149302,2.61074218843609e-07,4.51123710239304e-06,NA,NA,NA
> "26530","171300",146.80096060985,80.0063286553601,213.595592564339,2.66973370924738,1.4166958484644,2.68061220749976e-07,4.62888115991058e-06,NA,NA,N
> "3078","159013",34.3260176515511,52.4580790080106,16.1939562950917,0.308702808057816,-1.69570948866688,2.69104298652827e-07,4.64379716436078e-06,"40S
> "4657","159998",133.10761487064,185.450704462326,80.7645252789532,0.435504009074069,-1.19924209513405,2.75544399955331e-07,4.75176501174632e-06,"IMP-
> 
> 171597  171597  KOG1347 Uncharacterized membrane protein, predicted
> efflux pump General function prediction only    POORLY CHARACTERIZED
> 171658  171658  KOG4290 Predicted membrane protein  Function unknown
> POORLY CHARACTERIZED
> 171660  171660  KOG0903 Phosphatidylinositol 4-kinase, involved in
> intracellular trafficking and secretion  Signal transduction mechanisms
> CELLULAR 
> 171660  171660  KOG0903 Phosphatidylinositol 4-kinase, involved in
> intracellular trafficking and secretion  Intracellular trafficking,
> secretion, and
> 171703  171703  KOG2674 Cysteine protease required for autophagy -
> Apg4p/Aut2p  Cytoskeleton    CELLULAR PROCESSES AND SIGNALING
> 171703  171703  KOG2674 Cysteine protease required for autophagy -
> Apg4p/Aut2p  Intracellular trafficking, secretion, and vesicular
> transport   CELLU
> and metabolism     METABOLISM
This looks like the sort of thing that occurs when there is a mismatched or
missing double or single quote, or perhaps a comment character (a "#" that
terminated a line read), somewhere in the input. The logical place to look is
in the line of data just above the pathological stretch. You have clearly only
offered a truncated version of the data, since many lines end without matching
quotes, including the very first one.

-- 
David.

David Winsemius, MD
Alameda, CA, USA
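
One way to follow up on the quote and comment-character suggestion above is sketched below. It assumes, without any confirmation from the thread, that the annotation was read from a tab-delimited text file; the file contents and the helper has_break() are hypothetical. With read.table() defaults, an apostrophe or "#" inside a description can be treated as a quote or comment start, so the sketch switches both off and then checks text values for embedded line breaks or tabs before anything is written out.

```r
# Toy tab-delimited annotation with an apostrophe and a '#' in the text.
annot_lines <- c(
  "id\tkogdefline",
  "171597\tprotein with 5' label",
  "171658\tgroup #2 protein",
  "171660\tanother 3' label"
)
annot_file <- tempfile(fileext = ".tab")
writeLines(annot_lines, annot_file)

# Disable quote and comment handling so each physical line is one record.
annot <- read.table(annot_file, header = TRUE, sep = "\t",
                    quote = "", comment.char = "",
                    stringsAsFactors = FALSE)

# Hypothetical helper: TRUE for character values holding a CR, LF or tab.
has_break <- function(x) is.character(x) & grepl("[\r\n\t]", x)

# Columns of the cleanly read table that contain such values (none expected).
suspect_cols <- names(annot)[sapply(annot, function(x) any(has_break(x)))]

# A value that swallowed part of the next record would be flagged like this.
swallowed <- c("fine", "first record\n171703\t171703\tKOG2674")
flagged <- has_break(swallowed)
```

Running the same check on the columns of mergedData before write.table() would show whether any single cell already contains several annotation lines.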

jim holtman

2012-Sep-19 17:36 UTC


[R] write.table: strange output has been produced

It would also be helpful if you could provide the output of 'str' for
all the objects that you are using.

e.g.,  str(statdata)    str(extra)


Also in creating your data.frame, use "stringsAsFactors = FALSE":

extra = data.frame(kogdefline=rep(NA,n)
    , kogClass = rep(NA,n)
    , kogGroup = rep(NA,n)
    , stringsAsFactors = FALSE
)
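
As a sketch of why str() and stringsAsFactors matter here, the toy example below (made-up data, not the poster's) shows a factor column's integer codes next to its labels. The "digits" mentioned in the original post are probably those codes, since as.character() applied to a whole data frame works on the list of columns rather than on each column's labels. Converting one column at a time with lapply() avoids that.

```r
# Toy annotation whose text column is stored as a factor.
annot <- data.frame(id = c(10, 20, 30),
                    kogdefline = c("beta", "alpha", "gamma"),
                    stringsAsFactors = TRUE)
str(annot)  # shows kogdefline as a Factor with 3 levels

codes <- as.integer(annot$kogdefline)     # internal level codes
labels <- as.character(annot$kogdefline)  # the text actually wanted

# Convert the text column(s) to character, one column at a time.
annot[2] <- lapply(annot[2], as.character)

# Pre-allocate the extra column as character NA rather than logical NA.
n <- nrow(annot)
extra <- data.frame(kogdefline = rep(NA_character_, n),
                    stringsAsFactors = FALSE)
extra[c(1, 3), "kogdefline"] <- annot[c(1, 3), "kogdefline"]
str(extra)
```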


-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
