Hi All,

Is there some document/manual about data manipulation within R that I could use as a reference (obviously, aside from the R manuals)?

The reason I am asking is that I have a number of data frames/matrices containing genetic data. The data is in character form, as in:

  V1 V2 V3 V4 V5
1 AA AG AA GG AG
2 AC AA AA GG AG
3 AA AG AA GG AG
4 AA AA AA GG AG
5 AA AA AA GG AA

I need to chop, subset, and variously manipulate this kind of data, sometimes keeping the data in its character format, sometimes converting it to numeric form (i.e. substituting each data point with the equivalent factor value). Since the data is often quite big, I have to keep things memory efficient.

This whole game is getting exceedingly time consuming and frustrating, because I end up with random pieces of code that I save, each patching a particular problem but difficult to 'abstract' for a new task, so I get back close to square one annoyingly often.

Cheers,

Federico Calboli

--
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG
Tel +44 (0)20 7594 1602
Fax (+44) 020 7594 3193
f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com
On Thursday May 4 2006 10:20, Federico Calboli wrote:
> The reason I am asking is that I have a number of data frames/matrices
> containing genetic data. The data is in a character form, as in:

Take a look at the Bioconductor project:

"Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data."

http://www.bioconductor.org/

> This whole game is getting exceedingly time consuming and frustrating,
> because I end up with random pieces of code that I save, patching a
> particular problem, but difficult to be 'abstracted' for a new task, so
> I get back close to square one annoyingly often.

This sounds like a software engineering problem, not an R problem. Does Imperial have a computer science dept.? Maybe they could advise on software engineering techniques.

Larry Howe
Federico Calboli wrote:
> Is there some document/manual about data manipulation within R that I
> could use as a reference (obviously, aside the R manuals)?

There is a large data manipulation section in the Alzola and Harrell document available on CRAN under contributed docs, or a slightly more up-to-date version at biostat.mc.vanderbilt.edu.

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
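
For the conversion described in the original question (character genotypes to "the equivalent factor value"), a minimal base-R sketch might look like the following. It rebuilds the five-row example from the question as toy data. The helper name genotype_codes is hypothetical, and coding each column separately, with levels in alphabetical order, is an assumption about what is wanted. Storing the result as an integer matrix is one way to keep memory use down compared with a data frame of strings.

```r
# Toy data copied from the example in the question
geno <- data.frame(
  V1 = c("AA", "AC", "AA", "AA", "AA"),
  V2 = c("AG", "AA", "AG", "AA", "AA"),
  V3 = c("AA", "AA", "AA", "AA", "AA"),
  V4 = c("GG", "GG", "GG", "GG", "GG"),
  V5 = c("AG", "AG", "AG", "AG", "AA"),
  stringsAsFactors = FALSE
)

# Keep the character form as a plain character matrix
geno_chr <- as.matrix(geno)

# Hypothetical helper (not from any package): recode every column of a
# character matrix to the integer codes of its factor levels.
# Assumption: levels are defined per column, in alphabetical order.
genotype_codes <- function(x) {
  codes <- matrix(0L, nrow = nrow(x), ncol = ncol(x), dimnames = dimnames(x))
  for (j in seq_len(ncol(x))) {
    codes[, j] <- as.integer(factor(x[, j]))
  }
  codes
}

geno_num <- genotype_codes(geno_chr)

# Subsetting works the same way on either form, e.g. rows where V1 is "AA"
aa_rows <- geno_chr[geno_chr[, "V1"] == "AA", c("V1", "V5"), drop = FALSE]
```

Wrapping the recoding in one small function, rather than repeating it inline, is also one way to make such snippets easier to reuse for a new task. If the same code should mean the same genotype in every column, the levels would have to be fixed across the whole matrix instead.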