thr3ads.net - R help - [R] Fwd: which is faster "for" or "apply" [Dec 2014]

Karim Mezhoud

2014-Dec-31 16:24 UTC

[R] Fwd: which is faster "for" or "apply"

Concretely, I query cBioPortal through the cgdsr package.
Depending on the cases and genetic profiles requested, I generally receive a
data.frame with a heterogeneous structure. The worst case is a data.frame
composed of both numeric and character columns; in that case the numeric
columns are treated as factors. This happens when I explore or extract
information from clinical data (age, gender, tumor stage, ...). There I need
to convert only the numeric columns and not the character ones. I am using
grep("[0-9]*.[0-9]*",df[,i])!=0 {fun to convert}.
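
The pattern "[0-9]*.[0-9]*" probably matches more than intended: both digit runs may be empty and the unescaped "." matches any character, so any non-empty string qualifies. Below is a minimal sketch of one alternative, assuming a column counts as numeric when every non-missing entry parses with as.numeric(). The helper name is_numeric_like and the example data frame are hypothetical, not part of cgdsr.

```r
## Hypothetical helper: TRUE when every non-missing entry of a column parses
## as a number. "NA"/"NaN" tokens and empty strings count as missing here.
is_numeric_like <- function(x) {
    x <- as.character(x)
    x <- x[!is.na(x) & !(x %in% c("", "NA", "NaN"))]
    length(x) > 0 && !anyNA(suppressWarnings(as.numeric(x)))
}

## Made-up clinical-style data; not real cBioPortal output
clin <- data.frame(Age = c("61", "45.5", NA),
                   Gender = c("Male", "Female", "Male"),
                   Stage = c("IIA", "3", "IV"),
                   stringsAsFactors = TRUE)

idx <- vapply(clin, is_numeric_like, logical(1))
## go through as.character() so factors yield their labels, not their codes
clin[idx] <- lapply(clin[idx], function(col) as.numeric(as.character(col)))
```

Only Age is converted in this example; Gender and Stage stay as they were, and the row and column names are untouched.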

But this heterogeneity also shows up in data.frames that are supposed to be
purely numeric (gene expression). Here is an example:


library(cgdsr)
GeneList <- c("DDR2", "HPGDS", "MS4A2", "SSUH2", "MLH1", "MSH2",
              "ATM", "ATR", "MDC1", "PARP1")
cgds<-CGDS("http://www.cbioportal.org/public-portal/")

str(getProfileData(cgds,GeneList,
"stad_tcga_methylation_hm27","stad_tcga_methylation_hm27"))

str(getProfileData(cgds,GeneList,
"stad_tcga_methylation_hm450","stad_tcga_methylation_hm450"))


On my computer the two calls did not return the same structure (numeric vs. factor).

I also need to preserve the row and column names ;)
So I am working on resolving these details depending on the data from cBioPortal...

Thank you


http://bioinformatics.tn/



On Wed, Dec 31, 2014 at 4:37 PM, Karim Mezhoud <kmezhoud at gmail.com>
wrote:
> Many, many thanks!
> It is an instructive lesson. I need time to test all the examples :)
> Thank you for your time and support.
> Happy and healthy New Year
>
> http://bioinformatics.tn/
>
>
>
> On Wed, Dec 31, 2014 at 2:38 PM, Martin Morgan <mtmorgan at fredhutch.org>
> wrote:
>
>> On 12/31/2014 12:22 AM, Karim Mezhoud wrote:
>>
>>> Thanks,
>>> It seems the for loop takes less time ;)
>>>
>>> with
>>> dim(DataFrame)
>>> [1] 338  70
>>>
>>> For loop has
>>>     user  system elapsed
>>>    0.012   0.000   0.012
>>>
>>> and apply has
>>>    user  system elapsed
>>>    0.020   0.000   0.021
>>>
>>
>> The timings are so short that the answer in terms of speed is 'it does
>> not matter'.
>>
>> Here is a selection of approaches
>>
>> f0 <- function(df) {
>>     for (i in seq_along(df))
>>         df[,i] <- as.numeric(df[,i])
>>     df
>> }
>>
>> f0a <- function(df) {
>>     ## data.frame is a list-of-equal-length vectors; access each
>>     ## column with "[["
>>     for (i in seq_along(df))
>>         df[[i]] <- as.numeric(df[[i]])
>>     df
>> }
>>
>> f0c <- compiler::cmpfun(f0)  ## loops sometimes benefit from compilation
>>
>> f1 <- function(df)
>>     as.data.frame(apply(df, 2, as.numeric))
>>
>> f2 <- function(df) {
>>     ## replace all columns of df with list-of-vectors
>>     df[] <- lapply(df, as.numeric)
>>     df
>> }
>>
>> f3 <- function(df) {
>>     ## coerce to matrix to avoid the explicit loop, use mode<- to
>>     ## change storage of elements
>>     m <- as.matrix(df)
>>     mode(m) <- "numeric"
>>     as.data.frame(m)
>> }
>>
>> f4 <- function(df) {
>>     ## if it's a matrix, why are we returning a data.frame?
>>     m <- as.matrix(df)
>>     mode(m) <- "numeric"
>>     m
>> }
>>
>> f4a <- function(df)
>>     ## unlist to single vector, coerce, then format as matrix
>>     matrix(as.numeric(unlist(df, use.names=FALSE)), nrow(df),
>>            dimnames=dimnames(df))
>>
>> It's important to test that different methods return the same result
>> (perhaps allowing for differences in attributes such as row or column
>> names). The microbenchmark package repeats timings across multiple trials
>> (default 100 times).
>>
>> library(microbenchmark)
>> test <- function(df) {
>>     stopifnot(
>>         identical(f0(df), f0a(df)),
>>         identical(f0(df), f0c(df)),
>>         identical(f0(df), f1(df)),
>>         identical(f0(df), f2(df)),
>>         identical(f0(df), f3(df)),
>>         identical(as.matrix(f0(df)), f4(df)),
>>         all.equal(f4(df), f4a(df), check.attributes=FALSE))
>>     microbenchmark(f0(df), f0a(df), f0c(df), f1(df), f2(df), f3(df),
>>                    f4(df), f4a(df))
>> }
>>
>> Here are some data sets
>>
>> m <- matrix(rnorm(338 * 70), 338)
>> df <- as.data.frame(m)
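>> dfc <- as.data.frame(lapply(df, as.character), stringsAsFactors=FALSE)
>> ## stringsAsFactors=TRUE spelled out; it is not the default in R >= 4.0.0
>> dff <- as.data.frame(lapply(df, as.character), stringsAsFactors=TRUE)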
>>
>> and results
>>
>> > test(df)
>> Unit: microseconds
>>     expr      min        lq      mean    median        uq      max neval
>>   f0(df) 6208.956 6270.5500 6367.4138 6306.7110 6362.2225 7731.281   100
>>  f0a(df) 2917.973 2975.2090 3024.8623 3002.3805 3036.5365 3951.618   100
>>  f0c(df) 6078.399 6150.1085 6264.0998 6188.3690 6244.5725 7684.116   100
>>   f1(df) 2698.074 2743.2905 2821.8453 2769.3655 2805.5345 4033.229   100
>>   f2(df) 1989.057 2041.0685 2066.1830 2055.0020 2083.8545 2267.732   100
>>   f3(df) 1532.435 1572.9810 1609.7378 1597.6245 1624.2305 2003.584   100
>>   f4(df)  808.593  828.5445  852.2626  847.5355  864.6665 1180.977   100
>>  f4a(df)  422.657  437.2705  458.9845  455.2470  465.5815  695.443   100
>> > test(dfc)
>> Unit: milliseconds
>>     expr       min        lq      mean    median        uq       max neval
>>   f0(df) 11.416532 11.647858 11.915287 11.767647 12.016276 14.239622   100
>>  f0a(df)  8.095709  8.211116  8.380638  8.289895  8.454948  9.529026   100
>>  f0c(df) 11.339293 11.577811 11.772087 11.702341 11.896729 12.674766   100
>>   f1(df)  8.227371  8.277147  8.422412  8.331403  8.490411  9.145499   100
>>   f2(df)  6.907888  7.010828  7.162529  7.147198  7.239048  7.763758   100
>>   f3(df)  6.608107  6.688232  6.845936  6.792066  6.892635  8.359274   100
>>   f4(df)  5.859482  5.939680  6.046976  5.993804  6.105388  6.968601   100
>>  f4a(df)  5.372214  5.460987  5.556687  5.521542  5.614482  6.107081   100
>> > test(dff)
>> Error: identical(f0(df), f1(df)) is not TRUE
>>
>> Except when dealing with factors, the use of explicit loops is the
>> slowest. With factors, matrix-based methods coerce the level labels to
>> numeric, whereas vector-based methods coerce the underlying codes (level
>> values) of the factor; obviously great care needs to be taken.
>>
>> > f0(dff)[1:5, 1:5]
>>    V1  V2  V3  V4  V5
>> 1 150 232 294  88  56
>> 2 159   8  89  59  10
>> 3 132 171  40 205 119
>> 4 214 273  26 262 216
>> 5 281  49 255  31 233
>> > f1(dff)[1:5, 1:5]
>>           V1          V2         V3         V4          V5
>> 1 -1.7092463  0.50234009  0.8492982 -0.5636901 -0.38545566
>> 2 -2.3020854 -0.05580931 -0.5963673 -0.3671748 -0.09408031
>> 3 -1.2915110 -2.46181533 -0.2470108  0.3301129 -1.06810225
>> 4  0.3065989  0.89263099 -0.1717432  0.7721411  0.35856334
>> 5  0.8795616 -0.43049898  0.4560515 -0.1722099  0.46125149
>>
>> In terms of 'best practice', I would represent my data in the appropriate
>> data structure in the first place (as a matrix of appropriate type, rather
>> than data.frame, so the entire coercion is irrelevant). If faced with a
>> data.frame with specific columns to coerce I would use the approach
>>
>>     cidx <- sapply(df, is.character)      # index of columns to coerce
>>     df[cidx] <- lapply(df[cidx], as.numeric)
>>
>> which seems to be reasonably correct, expressive, compact, and speedy.
>>
>> Martin Morgan
>>
>>
>>
>>> http://bioinformatics.tn/
>>>
>>>
>>>
>>> On Wed, Dec 31, 2014 at 8:54 AM, Berend Hasselman <bhh at xs4all.nl> wrote:
>>>
>>>
>>>>  On 31-12-2014, at 08:40, Karim Mezhoud <kmezhoud at gmail.com> wrote:
>>>>>
>>>>> Hi All,
>>>>> I would like to choose between these two ways of converting a data
>>>>> frame. Which is faster?
>>>>>
>>>>>    for (i in 1:ncol(DataFrame)) {
>>>>>        DataFrame[,i] <- as.numeric(DataFrame[,i])
>>>>>    }
>>>>>
>>>>> OR
>>>>>
>>>>> DataFrame <- as.data.frame(apply(DataFrame, 2, function(x)
>>>>>     as.numeric(x)))
>>>>>
>>>>>
>>>>>
>>>> Try it and use system.time.
>>>>
>>>> Berend
>>>>
>>>>  Thanks
>>>>> Karim
>>>>>   ?__
>>>>> c/ /'_;~~~~kmezhoud
>>>>> (*) \(*)   ?????  ??????
>>>>> http://bioinformatics.tn/
>>>>>
>>
>> --
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>>
>
>

Karim Mezhoud

2014-Dec-31 16:51 UTC

[R] Fwd: which is faster "for" or "apply"

Yes, the last one is the best. But I need to test whether the returned
data.frame contains factor or character columns:
  cidx <- sapply(df, is.factor) or cidx <- sapply(df, is.character)
Thanks

http://bioinformatics.tn/




Karim Mezhoud

2014-Dec-31 16:55 UTC

[R] Fwd: which is faster "for" or "apply"

To cover both cases (factor and character columns):
cidx <- !(sapply(df, is.numeric))
df[cidx] <- lapply(df[cidx], as.numeric)
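
One caveat when the non-numeric columns are factors, echoing Martin Morgan's warning earlier in the thread: as.numeric() applied directly to a factor returns the integer level codes rather than the numbers shown in the labels. A minimal sketch with a made-up vector (the names f, codes, values and values2 are illustrative only):

```r
## Made-up factor whose labels look like numbers
f <- factor(c("0.81", "0.02", "0.57", "0.02"))

codes   <- as.numeric(f)                 # level codes: 3 1 2 1
values  <- as.numeric(as.character(f))   # labels parsed as numbers
values2 <- as.numeric(levels(f))[f]      # same result, each level parsed once
```

If factor columns are expected, the replacement step could instead be written as df[cidx] <- lapply(df[cidx], function(x) as.numeric(as.character(x))), which behaves the same as before for character columns.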


http://bioinformatics.tn/




William Dunlap

2014-Dec-31 17:39 UTC

[R] Fwd: which is faster "for" or "apply"

> But this heterogeneity  comes even with only supposed numeric data.frame
> (gene expression). here an example
>
> library(cgdsr)
> GeneList <- c("DDR2", "HPGDS", "MS4A2", "SSUH2", "MLH1", "MSH2",
>               "ATM", "ATR", "MDC1", "PARP1")
> cgds<-CGDS("http://www.cbioportal.org/public-portal/")
>
> str(getProfileData(cgds,GeneList,
> "stad_tcga_methylation_hm27","stad_tcga_methylation_hm27"))
>
> str(getProfileData(cgds,GeneList,
> "stad_tcga_methylation_hm450","stad_tcga_methylation_hm450"))
>
> With my computer I did not find the same structure (numeric vs factor).
Can you show us what you got?  I am a bit surprised that you got any factors
because putting a trace on read.table shows that getProfileData calls it
with as.is=TRUE (meaning to not convert character columns to factors).  I got
all numeric columns:
  > trace(read.table)
  > str(getProfileData(cgds,GeneList,
  + "stad_tcga_methylation_hm27","stad_tcga_methylation_hm27"))
  trace: read.table(url, skip = 0, header = TRUE, as.is = TRUE, sep = "\t",
      quote = "")
  'data.frame':   48 obs. of  10 variables:
   $ ATM  : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
   $ ATR  : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
   $ DDR2 : num  0.714 0.857 0.549 0.669 0.587 ...
   $ HPGDS: num  0.505 0.722 0.528 0.411 0.497 ...
   $ MDC1 : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
   $ MLH1 : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
   $ MS4A2: num  0.83 0.853 0.835 0.716 0.481 ...
   $ MSH2 : num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
   $ PARP1: num  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
   $ SSUH2: num  0.73 0.842 0.794 0.854 0.803 ...

  > str(getProfileData(cgds,GeneList,
  + "stad_tcga_methylation_hm450","stad_tcga_methylation_hm450"))
  trace: read.table(url, skip = 0, header = TRUE, as.is = TRUE, sep = "\t",
      quote = "")
  'data.frame':   338 obs. of  10 variables:
   $ ATM  : num  0.019 0.017 0.0168 0.015 0.014 ...
   $ ATR  : num  0.0356 0.0346 0.0231 0.0275 0.0285 ...
   $ DDR2 : num  0.81 0.786 0.596 0.861 0.646 ...
   $ HPGDS: num  0.576 0.528 0.703 0.781 0.622 ...
   $ MDC1 : num  0.189 0.265 0.201 0.199 0.249 ...
   $ MLH1 : num  0.404 0.0192 0.017 0.0124 0.0197 ...
   $ MS4A2: num  0.913 0.898 0.937 0.861 0.768 ...
   $ MSH2 : num  0.018 0.0184 0.016 0.0145 0.0168 ...
   $ PARP1: num  0.0191 0.0195 0.0146 0.0174 0.0181 ...
   $ SSUH2: num  0.848 0.874 0.644 0.621 0.652 ...

Perhaps some option or locale setting is causing input strings to be
interpreted as non-numbers.  (If you know all these columns should
be numeric, you could add colClasses=rep("numeric", length(GeneList))
to the call to read.table.  See which entries show up as NA and reread
with colClasses=rep("character", length(GeneList)) to see where they
came from.)
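
As a rough sketch of the "reread as character" step: the inline text below
is made up for illustration and stands in for what the portal returns, and
the names txt, raw and bad are hypothetical.  It only assumes that the
input is tab-separated with a header row, as in the traced read.table call.

```r
## Made-up tab-separated input; "n/a" is the entry we want to track down.
txt <- "ATM\tATR\n0.019\t0.0356\n0.017\tn/a\n0.0168\t0.0231\n"

## Read every column as character so nothing is silently converted.
raw <- read.table(text = txt, header = TRUE, sep = "\t", quote = "",
                  colClasses = rep("character", 2))

## For each column, list the strings that as.numeric() cannot parse.
bad <- lapply(raw, function(col) {
    num <- suppressWarnings(as.numeric(col))
    col[is.na(num) & !is.na(col)]
})
bad
```

Any string listed in bad is a candidate explanation for a column that did
not come back as numeric.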

It is almost always better to get the data input correctly rather than
trying to fix it up later.  If you must convert later, using apply(), which
converts the data.frame to a matrix with a single class for all columns,
often causes problems.  sapply() may or may not convert its output to a
matrix, depending on what FUN returns.  Use lapply() instead, with a
function that uses the class of its input to decide what to do.
DataFrame[] <- lapply(DataFrame, FUN=function(col)...)
will retain the class, row names, and column names of the data.frame.
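
A minimal sketch of that idiom follows.  The data and the helper
to_numeric_if_possible() are hypothetical, invented here to stand in for a
mixed clinical-data result; the rule "convert only when every entry parses
as a number" is one possible choice, not the only one.

```r
## Hypothetical mixed data.frame: numbers stored as text, plus real text.
DataFrame <- data.frame(
    age    = c("61", "47", "58"),
    gender = c("F", "M", "F"),
    stage  = factor(c("II", "III", "II")),
    row.names = c("TCGA.A", "TCGA.B", "TCGA.C"),
    stringsAsFactors = FALSE
)

## Decide per column, based on its class and content, whether to convert.
to_numeric_if_possible <- function(col) {
    txt <- if (is.factor(col)) as.character(col) else col  # labels, not codes
    if (!is.character(txt)) return(col)                    # leave other classes
    num <- suppressWarnings(as.numeric(txt))
    failed <- is.na(num) & !is.nan(num) & !is.na(txt)      # unparseable entries
    if (any(failed)) col else num
}

## Assigning into DataFrame[] keeps the data.frame class and dimnames.
DataFrame[] <- lapply(DataFrame, to_numeric_if_possible)
str(DataFrame)
```

Here only age becomes numeric; gender stays character, stage stays a
factor, and the row and column names are unchanged.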


Bill Dunlap
TIBCO Software
wdunlap tibco.com


Karim Mezhoud

2014-Dec-31 17:55 UTC

[R] Fwd: which is faster "for" or "apply"

Thanks, please find what I got:
> str(getProfileData(cgds,GeneList,"stad_tcga_methylation_hm27","stad_tcga_methylation_hm27"))
'data.frame':    48 obs. of  10 variables:
 $ ATM  : num  NA NA NA NA NA NA NA NA NA NA ...
 $ ATR  : num  NA NA NA NA NA NA NA NA NA NA ...
 $ DDR2 : num  0.714 0.857 0.549 0.669 0.587 ...
 $ HPGDS: num  0.505 0.722 0.528 0.411 0.497 ...
 $ MDC1 : num  NA NA NA NA NA NA NA NA NA NA ...
 $ MLH1 : num  NA NA NA NA NA NA NA NA NA NA ...
 $ MS4A2: num  0.83 0.853 0.835 0.716 0.481 ...
 $ MSH2 : num  NA NA NA NA NA NA NA NA NA NA ...
 $ PARP1: num  NA NA NA NA NA NA NA NA NA NA ...
 $ SSUH2: num  0.73 0.842 0.794 0.854 0.803 ...

> str(getProfileData(cgds,GeneList,"stad_tcga_methylation_hm450","stad_tcga_methylation_hm450"))
'data.frame':    338 obs. of  10 variables:
 $ ATM  : Factor w/ 338 levels "0.01060883","0.01065690",..: 256 182 170 101 53 302 183 236 298 334 ...
  ..- attr(*, "names")= chr  "TCGA.BR.6452.01" "TCGA.BR.6453.01" "TCGA.BR.6454.01" "TCGA.BR.6455.01" ...
 $ ATR  : Factor w/ 338 levels "0.009422188",..: 271 265 165 215 222 304 176 170 228 277 ...
  ..- attr(*, "names")= chr  "TCGA.BR.6452.01" "TCGA.BR.6453.01" "TCGA.BR.6454.01" "TCGA.BR.6455.01" ...
 $ DDR2 : Factor w/ 338 levels "0.38369598","0.42008010",..: 197 161 25 291 40 38 155 85 177 180 ...
  ..- attr(*, "names")= chr  "TCGA.BR.6452.01" "TCGA.BR.6453.01" "TCGA.BR.6454.01" "TCGA.BR.6455.01" ...
 $ HPGDS: Factor w/ 338 levels "0.16077929","0.18867898",..: 85 56 208 281 116 67 132 119 152 49 ...
  ..- attr(*, "names")= chr  "TCGA.BR.6452.01" "TCGA.BR.6453.01" "TCGA.BR.6454.01" "TCGA.BR.6455.01" ...
 $ MDC1 : Factor w/ 338 levels "0.06105770","0.06532153",..: 162 267 185 180 253 220 108 230 239 271 ...
  ..- attr(*, "names")= chr  "TCGA.BR.6452.01" "TCGA.BR.6453.01" "TCGA.BR.6454.01" "TCGA.BR.6455.01" ...
 $ MLH1 : Factor w/ 338 levels "0.009031445",..: 299 194 160 45 198 224 115 167 287 165 ...
  ..- attr(*, "names")= chr  "TCGA.BR.6452.01" "TCGA.BR.6453.01" "TCGA.BR.6454.01" "TCGA.BR.6455.01" ...
 $ MS4A2: Factor w/ 338 levels "0.31286204","0.438797860",..: 266 210 329 111 40 49 21 68 134 331 ...
  ..- attr(*, "names")= chr  "TCGA.BR.6452.01" "TCGA.BR.6453.01" "TCGA.BR.6454.01" "TCGA.BR.6455.01" ...
 $ MSH2 : Factor w/ 338 levels "0.009568869",..: 260 270 179 114 215 137 263 78 300 283 ...
  ..- attr(*, "names")= chr  "TCGA.BR.6452.01" "TCGA.BR.6453.01" "TCGA.BR.6454.01" "TCGA.BR.6455.01" ...
 $ PARP1: Factor w/ 338 levels "0.01110587","0.01208177",..: 249 260 65 191 219 204 32 132 130 225 ...
  ..- attr(*, "names")= chr  "TCGA.BR.6452.01" "TCGA.BR.6453.01" "TCGA.BR.6454.01" "TCGA.BR.6455.01" ...
 $ SSUH2: Factor w/ 338 levels "0.17618607","0.184911562",..: 243 276 93 82 99 236 51 88 163 138 ...
  ..- attr(*, "names")= chr  "TCGA.BR.6452.01" "TCGA.BR.6453.01" "TCGA.BR.6454.01" "TCGA.BR.6455.01" ...

  ?__
 c/ /'_;~~~~kmezhoud
(*) \(*)   ?????  ??????
http://bioinformatics.tn/
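
For factor columns like the hm450 ones above, calling as.numeric() directly
returns the integer level codes rather than the values spelled out by the
labels.  A small sketch with made-up values (the objects f and dff are
hypothetical, and this does not explain why the columns arrived as factors
in the first place):

```r
## A factor whose labels look like numbers.
f <- factor(c("0.019", "0.017", "0.0168"))

codes  <- as.numeric(f)                 # level codes, not the data
values <- as.numeric(as.character(f))   # what the labels spell out
same   <- as.numeric(levels(f))[f]      # same values, converts each level once

## Convert only the factor columns of a data.frame, keeping dimnames.
dff <- data.frame(
    ATM = f,
    ATR = factor(c("0.0356", "0.0346", "0.0231")),
    row.names = c("TCGA.BR.6452.01", "TCGA.BR.6453.01", "TCGA.BR.6454.01")
)
isf <- vapply(dff, is.factor, logical(1))
dff[isf] <- lapply(dff[isf], function(col) as.numeric(levels(col))[col])
str(dff)
```

Going through the labels is what keeps values such as 0.019 from turning
into small integers.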



