Hi everybody!

I'm creating an object of an S4 class that has two slots: ListExamples, which is a list, and idx, which is an integer (see the code below).

Then I read a data.frame file with 10000 (ten thousand) rows and 10 columns, do some pre-processing and, basically, store each row as an element of a list in the ListExamples slot of the S4 object. However, many operations after this take a considerable time.

Can anyone explain to me why this happens? Is it possible to speed up a script that deals with a large amount of data (it might be a data.frame or a list)?

Thank you,

André Rossi

    setClass("Buffer",
             representation=representation(
               Listexamples = "list",
               idx = "integer"
             )
    )
On 10.09.2011 17:08, André Rossi wrote:
> Then I read a data.frame file with 10000 (ten thousand) rows and 10
> columns, do some pre-processing and, basically, store each row as an
> element of a list in the ListExamples slot of the S4 object. However, many
> operations after this take a considerable time.
>
> Can anyone explain to me why this happens? Is it possible to speed up a
> script that deals with a large amount of data (it might be a data.frame or
> a list)?

0. This sounds inefficient even when not using S4 classes. You may want to reconsider how to solve your problem in another way.

1. A first step for speedup would be not to use S4 objects.

2. If you really want to use an S4 object to store the stuff, try to assign to a list (that is not wrapped within an S4 class) first and then assign the whole list into your Buffer object.

Best,
Uwe Ligges

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
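A minimal sketch of suggestion 2, for readers who want to see the pattern spelled out. The matrix `m` is a hypothetical stand-in for the pre-processed data (the real file is not part of this thread), and the variable names `rows` and `buf` are made up for illustration. The point is only the order of operations: all per-element work happens on a plain list, and the slot is assigned once.

```r
library(methods)

setClass("Buffer",
         representation = representation(
           Listexamples = "list",
           idx = "integer"
         ))

# Hypothetical stand-in for the pre-processed data: 10000 rows, 10 columns.
m <- matrix(seq_len(10000L * 10L), nrow = 10000L)

# Build the list of rows as an ordinary R list, outside of any S4 object.
rows <- lapply(seq_len(nrow(m)), function(i) m[i, ])

# Any element-wise changes are made on the plain list.
rows[[1]][1] <- 2L

# Assign the finished list into the S4 object in a single step.
buf <- new("Buffer", Listexamples = rows, idx = length(rows))
```

Whether this removes the slowdown in a given script depends on where the time is actually spent, so it is worth timing both variants with system.time() before and after the change.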
On 09/10/2011 08:08 AM, André Rossi wrote:
> Then I read a data.frame file with 10000 (ten thousand) rows and 10
> columns, do some pre-processing and, basically, store each row as an
> element of a list in the ListExamples slot of the S4 object. However, many
> operations after this take a considerable time.
>
> setClass("Buffer",
>          representation=representation(
>            Listexamples = "list",
>            idx = "integer"
>          )
> )

Hi André,

Can you provide a simpler and more reproducible example, for instance

    > setClass("Buf", representation=representation(lst="list"))
    [1] "Buf"
    > b=new("Buf", lst=replicate(10000, list(10), simplify=FALSE))
    > system.time({ b@lst[[1]][[1]] = 2 })
       user  system elapsed
      0.005   0.000   0.005

Generally it sounds like you're modeling the rows as elements of ListExamples, but you're better served by modeling the columns (lst = replicate(10, integer(10000), simplify=FALSE), if all of your 10 columns were integer-valued, for instance). Also, S4 is providing some measure of type safety, and you're undermining that by having your class contain a 'list'. I'd go after

    setClass("Buffer",
             representation=representation(
               col1="integer",
               col2="character",
               col3="numeric"
               ## etc.
             ),
             validity=function(object) {
               nms <- slotNames(object)
               len <- sapply(nms, function(nm) length(slot(object, nm)))
               if (1L != length(unique(len)))
                 "slots must all be of same length"
               else TRUE
             })

    Buffer <- function(col1, col2, col3, ...)
    {
      new("Buffer", col1=col1, col2=col2, col3=col3, ...)
    }

Let's see where the inefficiencies are before deciding that this is an S4 issue.
Martin

-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
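The column-oriented class proposed in this reply can be exercised as follows. This is only a sketch: the class and constructor are the ones from the message, while the sample values, the size `n`, and the variable names `b` and `bad` are assumptions made up for illustration.

```r
library(methods)

setClass("Buffer",
         representation = representation(
           col1 = "integer",
           col2 = "character",
           col3 = "numeric"
         ),
         validity = function(object) {
           nms <- slotNames(object)
           len <- sapply(nms, function(nm) length(slot(object, nm)))
           if (1L != length(unique(len)))
             "slots must all be of same length"
           else TRUE
         })

Buffer <- function(col1, col2, col3, ...) {
  new("Buffer", col1 = col1, col2 = col2, col3 = col3, ...)
}

# Hypothetical data: one atomic vector per column, 10000 entries each.
n <- 10000L
b <- Buffer(col1 = seq_len(n), col2 = rep("a", n), col3 = numeric(n))

# Changing one value modifies a single atomic vector in one slot.
b@col3[1] <- 2

# Columns of unequal length are rejected by the validity method
# when the object is constructed; the error message is captured here.
bad <- tryCatch(Buffer(col1 = 1:2, col2 = "a", col3 = 0),
                error = function(e) conditionMessage(e))
```

The design choice here is that each slot has a declared atomic type, so S4 can check the types on assignment, and the validity method keeps the columns aligned.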
Dear Martin Morgan and Martin Maechler,

Here is an example of the computational time when a list of S4 objects is modified through a slot of another S4 object, compared with modifying the same list as a plain object. I'm sending you the data file.

Thank you!

Best regards,

André Rossi

############################################################
    setClass("SupervisedExample",
             representation(
               attr.value = "ANY",
               target.value = "ANY"
             ))

    setClass("StreamBuffer",
             representation=representation(
               examples = "list",      # list of SupervisedExample
               max.length = "integer"
             ),
             prototype=list(
               max.length = as.integer(10000)
             )
    )

    b <- new("StreamBuffer")
    load("~/Dropbox/dataList2.RData")
    b@examples <- data   # data is a list of SupervisedExample objects

    > system.time({for (i in 1:100) b@examples[[1]]@attr.value[1] = 2 })
       user  system elapsed
     16.837   0.108  18.244

    > system.time({for (i in 1:100) data[[1]]@attr.value[1] = 2 })
       user  system elapsed
      0.024   0.000   0.026
############################################################

2011/9/10 Martin Morgan <mtmorgan at fhcrc.org>

> Let's see where the inefficiencies are before deciding that this is an S4
> issue.
>
> Martin
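One way to act on these timings, following the earlier advice in this thread to work on a plain list and assign it into the object once, is sketched below. The class definitions are the ones from the message above. The contents of dataList2.RData are not available here, so `data` is a hypothetical stand-in built from random numbers, and `ex` is a made-up name for the local copy. The sketch makes no claim about why the slot version is slow; the repeated nested assignment through the slot may involve extra copying or checking of the enclosing object, but that would need to be confirmed by profiling.

```r
library(methods)

setClass("SupervisedExample",
         representation(attr.value = "ANY", target.value = "ANY"))

setClass("StreamBuffer",
         representation = representation(
           examples = "list",
           max.length = "integer"
         ),
         prototype = list(max.length = 10000L))

# Hypothetical stand-in for the list stored in dataList2.RData.
data <- lapply(seq_len(1000L), function(i)
  new("SupervisedExample", attr.value = runif(10), target.value = i))

b <- new("StreamBuffer")
b@examples <- data

# Pull the list out of the slot, update it as a plain list,
# then write it back into the object with one assignment.
ex <- b@examples
for (i in 1:100) ex[[1]]@attr.value[1] <- 2
b@examples <- ex
```

Wrapping the loop in system.time() and comparing it with the two timings reported above would show whether this pattern helps on a given R version.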