Hi everybody!

I'm creating an object of an S4 class that has two slots: ListExamples, which
is a list, and idx, which is an integer (see the code below).

Then I read a data.frame file with 10000 (ten thousand) lines and 10
columns, do some pre-processing and, basically, store each line as an
element of the list in the slot ListExamples of the S4 object. However, many
operations after this take a considerable time.

Can anyone explain why this happens? Is it possible to speed up a
script that deals with a large amount of data (it might be a data.frame or a
list)?

Thank you,

André Rossi

setClass("Buffer",
    representation=representation(
        ListExamples = "list",
        idx = "integer"
    )
)

On 10.09.2011 17:08, André Rossi wrote:
> Can anyone explain why this happens? Is it possible to speed up a
> script that deals with a large amount of data (it might be a data.frame or
> a list)?

0. This sounds inefficient even when not using S4 classes. You may want to
   reconsider and solve your problem in another way.

1. A first step towards a speedup would be not to use S4 objects.

2. If you really want to use an S4 object to store the data, try assigning
   to a list (one that is not wrapped in an S4 class) first, and then assign
   the whole list into your Buffer object.

Best,
Uwe Ligges
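
Point 2 above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame `df` is a synthetic stand-in for the pre-processed file, the initial value `idx = 1L` is made up, and the slot name follows the original post's prose (ListExamples).

```r
library(methods)

setClass("Buffer",
         representation = representation(
           ListExamples = "list",
           idx = "integer"
         ))

# Hypothetical stand-in for the pre-processed data: 10000 rows, 10 columns.
df <- as.data.frame(matrix(seq_len(10000 * 10), nrow = 10000, ncol = 10))

# Work on a matrix so that extracting a row is cheap (all columns here
# share one type; a mixed-type data.frame would need a different approach).
m <- as.matrix(df)

# Build the complete list outside of any S4 object, one element per row.
rows <- lapply(seq_len(nrow(m)), function(i) m[i, ])

# Assign the finished list into the S4 object in a single step, instead of
# growing or modifying b@ListExamples element by element.
b <- new("Buffer", ListExamples = rows, idx = 1L)
```

The design choice is simply to keep all element-wise work on an ordinary list and to touch the slot once at the end.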

On 09/10/2011 08:08 AM, André Rossi wrote:
> Then I read a data.frame file with 10000 (ten thousand) lines and 10
> columns, do some pre-processing and, basically, store each line as an
> element of the list in the slot ListExamples of the S4 object. However, many
> operations after this take a considerable time.

Hi André,

Can you provide a simpler and more reproducible example, for instance

    > setClass("Buf", representation=representation(lst="list"))
    [1] "Buf"
    > b=new("Buf", lst=replicate(10000, list(10), simplify=FALSE))
    > system.time({ b@lst[[1]][[1]] = 2 })
       user  system elapsed
      0.005   0.000   0.005

Generally it sounds like you're modeling the rows as elements of
ListExamples, but you're better served by modeling the columns
(lst = replicate(10, integer(10000), simplify=FALSE), if all of your 10
columns were integer-valued, for instance). Also, S4 provides some measure
of type safety, and you're undermining that by having your class contain a
'list'. I'd go after

    setClass("Buffer",
        representation=representation(
            col1="integer",
            col2="character",
            col3="numeric"
            ## etc.
        ),
        validity=function(object) {
            nms <- slotNames(object)
            len <- sapply(nms, function(nm) length(slot(object, nm)))
            if (1L != length(unique(len)))
                "slots must all be of same length"
            else TRUE
        })

    Buffer <-
        function(col1, col2, col3, ...)
    {
        new("Buffer", col1=col1, col2=col2, col3=col3, ...)
    }

Let's see where the inefficiencies are before deciding that this is an S4
issue.

Martin

-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
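
To make the column-oriented suggestion concrete, here is a small usage sketch. It repeats the class and constructor from the message above so that it runs on its own. The column contents and the object names `buf` and `bad` are invented for illustration and are not from the thread.

```r
library(methods)

setClass("Buffer",
         representation = representation(
           col1 = "integer",
           col2 = "character",
           col3 = "numeric"
         ),
         validity = function(object) {
           nms <- slotNames(object)
           len <- sapply(nms, function(nm) length(slot(object, nm)))
           if (1L != length(unique(len)))
             "slots must all be of same length"
           else TRUE
         })

Buffer <- function(col1, col2, col3, ...) {
  new("Buffer", col1 = col1, col2 = col2, col3 = col3, ...)
}

# Hypothetical data: 10000 "rows" stored as three equal-length columns.
n <- 10000L
buf <- Buffer(col1 = seq_len(n),
              col2 = rep("a", n),
              col3 = numeric(n))

# Updating one cell modifies a single atomic vector, not a list of objects.
buf@col3[1] <- 2

# Columns of different lengths are rejected by the validity method;
# the error message is captured here instead of stopping the script.
bad <- tryCatch(Buffer(col1 = 1:3, col2 = "a", col3 = 0),
                error = function(e) conditionMessage(e))
```

Each slot has a declared type here, so a wrongly typed column fails at construction time rather than somewhere later in the script.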

Dear Martin Morgan and Martin Maechler,

Here is an example of the computation time when a list of S4 objects is
stored in a slot of another S4 object, compared with when it is a standalone
list. I'm sending you the data file.

Thank you!

Best regards,

André Rossi

############################################################
setClass("SupervisedExample",
    representation(
        attr.value = "ANY",
        target.value = "ANY"
    ))

setClass("StreamBuffer",
    representation=representation(
        examples = "list",   # elements are SupervisedExample objects
        max.length = "integer"
    ),
    prototype=list(
        max.length = as.integer(10000)
    )
)

b <- new("StreamBuffer")
load("~/Dropbox/dataList2.RData")
b@examples <- data   # data is a list of SupervisedExample objects

> system.time({for (i in 1:100) b@examples[[1]]@attr.value[1] = 2 })
   user  system elapsed
 16.837   0.108  18.244
> system.time({for (i in 1:100) data[[1]]@attr.value[1] = 2 })
   user  system elapsed
  0.024   0.000   0.026
############################################################
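
For readers without the data file, the comparison above can be approximated with synthetic data. This is a sketch under assumptions: the contents of `data` (10 random attribute values per example) are invented, and the timings are not printed here because they depend strongly on the R version and machine. The last part shows one possible workaround, which is to copy the slot into a local list, modify it there, and write it back once. Whether this helps in a given R version has to be measured.

```r
library(methods)

setClass("SupervisedExample",
         representation(attr.value = "ANY", target.value = "ANY"))
setClass("StreamBuffer",
         representation(examples = "list", max.length = "integer"),
         prototype = list(max.length = 10000L))

# Synthetic stand-in for dataList2.RData (assumption: 10000 examples,
# each with 10 numeric attribute values and an integer target).
data <- lapply(seq_len(10000), function(i)
  new("SupervisedExample", attr.value = runif(10), target.value = i))

b <- new("StreamBuffer")
b@examples <- data

# Nested assignment through the slot of the outer S4 object.
t.nested <- system.time(for (i in 1:100) b@examples[[1]]@attr.value[1] <- 2)

# The same assignment on the standalone list.
t.plain <- system.time(for (i in 1:100) data[[1]]@attr.value[1] <- 2)

# Possible workaround: take the list out of the slot, modify the local
# copy, and assign the whole list back a single time.
ex <- b@examples
for (i in 1:100) ex[[1]]@attr.value[1] <- 2
b@examples <- ex
```

Comparing `t.nested` and `t.plain` on the reader's own installation shows whether the gap reported above still occurs there.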