thr3ads.net - R help - [R] Very slow using S4 classes [Sep 2011]


André Rossi

2011-Sep-10 15:08 UTC

[R] Very slow using S4 classes

Hi everybody!

I'm creating an object of an S4 class that has two slots: ListExamples, which
is a list, and idx, which is an integer (see the code below).

Then I read a file into a data.frame with 10000 (ten thousand) rows and 10
columns, do some pre-processing and, basically, store each row as an
element of the list in the slot ListExamples of the S4 object. However, many
operations after this take a considerable time.

Can anyone explain why this happens? Is it possible to speed up a
script that deals with a large amount of data (it might be a data.frame or a
list)?

Thank you,

André Rossi

setClass("Buffer",
    representation=representation(
        Listexamples = "list",
        idx = "integer"
    )
)


Uwe Ligges

2011-Sep-10 15:37 UTC


0. This sounds inefficient even when not using S4 classes. You may want 
to reconsider how to solve your problem in another way.

1. A first step for speedup would be not to use S4 objects.

2. If you really want to use some S4 object to store the stuff, try to 
assign to a list (that is not wrapped within an S4 class) first and then 
assign the whole list into your Buffer object.
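
Point 2 could look roughly like the sketch below. It is only an illustration under assumptions: the matrix `m` is a hypothetical stand-in for the pre-processed data, and the names `rows` and `buf` are invented for the example. The idea is to fill an ordinary list first and to touch the S4 object only once.

```r
library(methods)

setClass("Buffer",
    representation = representation(
        Listexamples = "list",
        idx = "integer"
    )
)

# Hypothetical stand-in for the pre-processed data: 10000 rows, 10 columns.
m <- matrix(seq_len(10000 * 10), nrow = 10000, ncol = 10)

# Fill a plain list (not a slot) with one element per row.
rows <- vector("list", nrow(m))
for (i in seq_len(nrow(m))) {
    rows[[i]] <- m[i, ]
}

# Assign the complete list into the S4 object in a single step.
buf <- new("Buffer", Listexamples = rows, idx = 1L)
```

The loop works on `rows`, so no slot replacement happens inside it; the object is created once at the end.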

Best,
Uwe Ligges


Martin Morgan

2011-Sep-10 17:18 UTC


Hi André,

Can you provide a simpler and more reproducible example? For instance:

 > setClass("Buf", representation=representation(lst="list"))
[1] "Buf"
 > b=new("Buf", lst=replicate(10000, list(10), simplify=FALSE))
 > system.time({ b@lst[[1]][[1]] = 2 })
    user  system elapsed
   0.005   0.000   0.005

Generally it sounds like you're modeling the rows as elements of
ListExamples, but you're better served by modeling the columns (lst =
replicate(10, integer(10000), simplify=FALSE), if all of your 10 columns
were integer-valued, for instance). Also, S4 is providing some measure of
type safety, and you're undermining that by having your class contain a
'list'. I'd go for something like

setClass("Buffer",
          representation=representation(
            col1="integer",
            col2="character",
            col3="numeric"
            ## etc.
            ),
          validity=function(object) {
              nms <- slotNames(object)
              len <- sapply(nms, function(nm) length(slot(object, nm)))
              if (1L != length(unique(len)))
                  "slots must all be of same length"
              else TRUE
          })

Buffer <-
     function(col1, col2, col3, ...)
{
     new("Buffer", col1=col1, col2=col2, col3=col3, ...)
}
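
A short usage sketch of the column-oriented class above, with made-up values. The definition is repeated, limited to the three example columns, only so that the snippet runs on its own. The object names `ok` and `bad` are hypothetical.

```r
library(methods)

setClass("Buffer",
    representation = representation(
        col1 = "integer",
        col2 = "character",
        col3 = "numeric"
    ),
    validity = function(object) {
        nms <- slotNames(object)
        len <- sapply(nms, function(nm) length(slot(object, nm)))
        if (1L != length(unique(len)))
            "slots must all be of same length"
        else TRUE
    })

Buffer <- function(col1, col2, col3, ...) {
    new("Buffer", col1 = col1, col2 = col2, col3 = col3, ...)
}

# Three columns of equal length: the object passes the validity check.
ok <- Buffer(col1 = 1:3, col2 = c("a", "b", "c"), col3 = c(0.5, 1.5, 2.5))

# A whole column is updated with one vectorised assignment.
ok@col3 <- ok@col3 * 2

# Columns of unequal length: new() runs the validity function and
# signals an error, which is caught here and kept as a message.
bad <- tryCatch(
    Buffer(col1 = 1:3, col2 = c("a", "b"), col3 = c(0.5, 1.5, 2.5)),
    error = function(e) conditionMessage(e)
)
```

With this layout each slot is an atomic vector, so the slot type is checked by S4 and per-column operations stay vectorised.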

Let's see where the inefficiencies are before deciding that this is an 
S4 issue.

Martin

-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

André Rossi

2011-Sep-12 14:20 UTC


Dear Martin Morgan and Martin Maechler...

Here is an example comparing the computation time when a list of S4 objects
is stored in a slot of another S4 object with the time when the same list is
a plain variable. I'm sending you the data file.

Thank you!

Best regards,

André Rossi

############################################################

setClass("SupervisedExample",
    representation(
        attr.value = "ANY",
        target.value = "ANY"
))

setClass("StreamBuffer",
    representation=representation(
        examples = "list", #SupervisedExample
        max.length = "integer"
    ),
    prototype=list(
            max.length = as.integer(10000)
    )
)

b <- new("StreamBuffer")

load("~/Dropbox/dataList2.RData")

b@examples <- data #data is a list of SupervisedExample class.
> system.time({for (i in 1:100) b@examples[[1]]@attr.value[1] = 2 })
   user  system elapsed
 16.837   0.108  18.244
> system.time({for (i in 1:100) data[[1]]@attr.value[1] = 2 })
   user  system elapsed
  0.024   0.000   0.026

############################################################
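
For readers without the data file, the comparison can be approximated with synthetic data. The sketch below makes several assumptions: the list built with `replicate()` only imitates the contents of dataList2.RData, and the names `t.slot`, `t.local` and `ex` are invented here. It also shows one possible workaround, which is to copy the list out of the slot, modify it, and store it back once. Timings depend heavily on the R version and may differ a lot from the 2011 figures above.

```r
library(methods)

setClass("SupervisedExample",
    representation(attr.value = "ANY", target.value = "ANY"))

setClass("StreamBuffer",
    representation = representation(
        examples = "list",
        max.length = "integer"
    ),
    prototype = list(max.length = 10000L))

# Synthetic stand-in for the data file (an assumption, not the real data):
# 10000 examples with 10 attribute values each.
data <- replicate(10000,
    new("SupervisedExample", attr.value = numeric(10), target.value = 0),
    simplify = FALSE)

b <- new("StreamBuffer")
b@examples <- data

# Nested replacement through the slot, as in the timing above.
t.slot <- system.time(for (i in 1:100) b@examples[[1]]@attr.value[1] <- 2)

# Possible workaround: work on a local copy of the list and
# write it back to the slot once.
t.local <- system.time({
    ex <- b@examples
    for (i in 1:100) ex[[1]]@attr.value[1] <- 2
    b@examples <- ex
})

print(t.slot)
print(t.local)
```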


