Michael Glanville
2010-Mar-17 14:23 UTC
[R] Converting "factors" to "numeric" in a dataframe
I am currently trying to write a program that minimises the amount of work required for “auditable” qPCR data. At the moment I am using an Excel (.csv) spreadsheet as source data that has been transposed to the column format required for R to read. Unfortunately, this means I have to manually confirm the whole data set prior to doing any analysis, which is taking a considerable amount of time! My idea now is to read the raw data in directly and get R to do the transformation prior to analysis.

The problem I now have is that, upon transposition, the data are converted to “character” in a matrix, rather than “factor” and “numeric” in a dataframe. I have succeeded in changing the matrix to a dataframe (via as.data.frame(object)), but this then converts all the data to “factor”, which I can’t use for my analysis since, other than the column headings, I need the data to be numeric. I have tried coercing the data to numeric using the as() and as.numeric() commands, but this has no effect on the data format. I have no experience in programming and so am at a loss as to what to do: am I making a basic error in my programming or missing something essential (or both!)?

I am using R version 2.9.0 at the moment, but this will change as soon as I have sorted this issue out. Below is the code I have put together; as you can see, it is VERY brief but essential to allow my analysis to proceed:

    pcrdata<-read.csv("File_path",header=FALSE)
    pcrdata<-as.data.frame(t(pcrdata))
    pcrdata[2:51]<-as.numeric(as.character(pcrdata))

Any help would be gratefully appreciated,

Mike Glanville
Hi!

I don't really understand why you do

    pcrdata<-as.data.frame(t(pcrdata))

Do you need to transpose the dataset? read.csv() already creates a data frame.

Something I found really useful recently is the package xlsReadWrite, whose function read.xls() has an argument colClasses (read.table() and read.csv() have it too, but it never worked well for me) that lets you specify the class of each column when reading.

Is this what you were looking for, or am I completely wrong?

By the way, you could send us the output of str(pcrdata), taken just after reading in the data.

Ivan

--
Ivan CALANDRA
PhD Student
University of Hamburg
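The colClasses argument mentioned above can be sketched as follows. This is only an illustration on made-up data: the column names (sample, ct1, ct2) and values are hypothetical, and the text= argument of read.csv() is used purely so the example runs without a file on disk.

```r
# Made-up data already in "column" layout: one name column, then numeric columns.
csv_text <- "sample,ct1,ct2
A,21.5,22.1
B,30.2,29.8"

# colClasses fixes the class of each column at read time, so the numeric
# columns never become factors and need no later conversion.
pcr <- read.csv(text = csv_text, header = TRUE,
                colClasses = c("character", "numeric", "numeric"))

str(pcr)
```

This only helps when each column of the file holds a single kind of data; it does not apply directly to a file that still has to be transposed.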
Petr PIKAL
2010-Mar-17 15:13 UTC
[R] Odp: Converting "factors" to "numeric" in a dataframe
Hi

r-help-bounces at r-project.org wrote on 17.03.2010 15:23:34:

> pcrdata<-read.csv("File_path",header=FALSE)

This is supposed to be a data frame already. As you did not show us any clues about the data types, such as the output of str(pcrdata), it is difficult to say. However, from your description, your original data are in columns that hold numeric and character data together, which is not possible within a single column. I believe there are options for reading such data.

> pcrdata<-as.data.frame(t(pcrdata))

OK. Here you say you get the data in columns, but they are all character.

> pcrdata[2:51]<-as.numeric(as.character(pcrdata))

Here it depends on whether they are all numeric or whether some of them should be character (factor). Functions like those above cannot be used directly on data frames. You need to use apply:

    apply(pcrdata, 1, as.character)

The exact sequence of required functions is impossible to guess without knowing the structure of your objects. You should also consult the R-intro and R-data manuals.

Regards
Petr
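To illustrate the point that as.numeric(as.character(...)) has to be applied column by column rather than to the whole data frame, here is a minimal sketch. The data frame below is a hypothetical stand-in for the transposed data (its column names and values are invented), and lapply() is used here as one common way to loop over columns; it is not necessarily what the posters had in mind.

```r
# Hypothetical stand-in for the transposed data: every column is a factor.
pcrdata <- data.frame(sample = c("A", "B", "C"),
                      ct1 = c("21.5", "30.2", "18.9"),
                      ct2 = c("22.1", "29.8", "19.4"),
                      stringsAsFactors = TRUE)

# as.numeric() on a factor returns the internal level codes, not the values,
# so each column goes through as.character() first.
num_cols <- 2:3
pcrdata[num_cols] <- lapply(pcrdata[num_cols],
                            function(col) as.numeric(as.character(col)))

str(pcrdata)
```

The first column is left untouched, so it stays a factor holding the sample names.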
Hi

As you did not provide data, it is hard to say what is wrong. You can see that it works on data similar to what you described.

    test <- data.frame(x=letters[1:10], y=rnorm(10), z=runif(10))
    test
      x           y          z
    1 a  0.09980806 0.32211567
    2 b  0.70559139 0.32204076
    3 c -1.50514354 0.28267338
    ............

    testnum<-data.frame(t(test[,-1]))
    names(testnum)<-test[,1]
    testnum
               a         b          c         d         e           f         g
    y 0.09980806 0.7055914 -1.5051435 0.4421971 0.1041789 -1.54683799 0.3405809
    z 0.32211567 0.3220408  0.2826734 0.8819248 0.5189688  0.05171076 0.4583101
              h          i         j
    y 2.3137394 -0.4953507 0.7668954
    z 0.7515886  0.5876854 0.4192073

Regards
Petr

Michael Glanville <michael.glanville19 at googlemail.com> wrote on 18.03.2010 12:37:21:

> Thanks Petr, your suggestion has worked to a certain extent. The only issue is
> that the sample names don't appear in the final dataframe. However, I will
> persevere and see what I can do.
>
> Many thanks for your invaluable help,
>
> Mike
>
> On 18 March 2010 11:23, Petr PIKAL <petr.pikal at precheza.cz> wrote:
>
> > Hi Michael
> >
> > r-help-bounces at r-project.org wrote on 18.03.2010 12:02:19:
> >
> > > Hi petr,
> > >
> > > Thanks for the reply.
> > >
> > > My original data is in "comma separated variable" (csv) format with
> > > variable names in column 1 and numeric data in the remaining columns. The
> > > "read.csv" command reads this data set into object name "pcrdata" as a
> > > dataframe where the variable names and numeric data are conserved (as
> > > required). However,
> >
> > So why not transpose only the numeric part, then turn it into a data frame
> > and add the column names from the first column?
> >
> > Something like (untested)
> >
> > pcrdata<-read.csv("File_path",header=FALSE)
> > numdata<-t(pcrdata[,-1])
> > numdata<-data.frame(numdata)
> > names(numdata) <- pcrdata[,1]
> >
> > Regards
> > Petr
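The transposition approach quoted above can be tried end to end on made-up data. The sketch below rests on an assumption the thread does not confirm: that the raw file carries the sample labels in its first row (which would be one reason they go missing from the result). The file contents, the labels S1 to S3 and the variable names are all hypothetical.

```r
# Made-up raw file: variable name in column 1, one value per sample after it.
# ASSUMPTION: the first row holds the sample labels.
raw_text <- "sample,S1,S2,S3
ct_target,21.5,30.2,18.9
ct_reference,17.2,17.9,16.8"

raw <- read.csv(text = raw_text, header = FALSE, stringsAsFactors = FALSE)

# Drop the label row and the name column, then transpose: samples become rows.
mat <- t(as.matrix(raw[-1, -1]))
mode(mat) <- "numeric"            # the matrix is character until converted

numdata <- data.frame(mat)
names(numdata) <- raw[-1, 1]      # variable names from column 1

# Put the sample labels back as an ordinary column.
numdata <- cbind(sample = unlist(raw[1, -1]), numdata)
rownames(numdata) <- NULL

str(numdata)
```

Converting the matrix before building the data frame means no factor columns are created along the way, so no per-column conversion is needed afterwards.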