C W
2017-Oct-20 18:11 UTC
[R] What exactly is an dgCMatrix-class. There are so many attributes.
Dear R list,

I came across dgCMatrix. I believe this class is associated with sparse matrices.

I see there are 8 attributes to train$data. I am confused about why there are so many; some are vectors. What do they do?

Here's the R code:

library(xgboost)
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test
attributes(train$data)

Where is the data, is it in $p, $i, or $x?

Thank you very much!
William Dunlap
2017-Oct-20 18:42 UTC
[R] What exactly is an dgCMatrix-class. There are so many attributes.
You should not really have to worry about the internal structure of such a thing - just treat it like a matrix. E.g.,

> train$data[1:3,1:3] # dots mean 0's
3 x 3 sparse Matrix of class "dgCMatrix"
     cap-shape=bell cap-shape=conical cap-shape=convex
[1,]              .                 .                1
[2,]              .                 .                1
[3,]              1                 .                .
> dim(train$data)
[1] 6513  126
> p <- train$data %*% matrix(1:126, ncol=1)
> dim(p)
[1] 6513    1

If that doesn't work in some situation, convert it to a matrix with as.matrix.

To see the details, in R, type
   class?dgCMatrix
or
   help("dgCMatrix-class")
In a browser search window type
   R dgCmatrix
You should not have to make use of those details in your code.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
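The "just treat it like a matrix" advice can be tried without the xgboost data. The following is a minimal sketch, assuming only that the Matrix package is installed; the small matrix `m` and the vector of weights are made up for illustration and are not part of the original thread.

```r
library(Matrix)

# A hypothetical 4 x 3 sparse matrix built from (row, column, value) triplets.
m <- sparseMatrix(i = c(1, 3, 4, 2), j = c(1, 1, 2, 3), x = c(1, 1, 1, 1),
                  dims = c(4, 3))

class(m)      # a "dgCMatrix", the same class as train$data
m[1:2, 1:3]   # subsetting works as it does for a base R matrix

# Matrix product with an ordinary dense column vector; the result is 4 x 1.
v <- m %*% matrix(1:3, ncol = 1)
dim(v)

# Fall back to a base R matrix if some function does not accept sparse input.
dense <- as.matrix(m)
is.matrix(dense)
```

Converting with as.matrix gives up the memory savings, so it is probably best kept as a fallback for small matrices.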
David Winsemius
2017-Oct-20 19:22 UTC
[R] What exactly is an dgCMatrix-class. There are so many attributes.
> On Oct 20, 2017, at 11:11 AM, C W <tmrsg11 at gmail.com> wrote:
>
> Dear R list,
>
> I came across dgCMatrix. I believe this class is associated with sparse
> matrix.

Yes. See:

  help('dgCMatrix-class', pack=Matrix)

If Martin Maechler happens to respond to this you should listen to him rather than anything I write. Much of what the Matrix package does appears to be magical to one such as I.

> I see there are 8 attributes to train$data, I am confused why are there so
> many, some are vectors, what do they do?
>
> Here's the R code:
>
> library(xgboost)
> data(agaricus.train, package='xgboost')
> data(agaricus.test, package='xgboost')
> train <- agaricus.train
> test <- agaricus.test
> attributes(train$data)

I got a bit of an annoying surprise when I did something similar. It appeared to me that I did not need to load the xgboost library since all that was being asked was "where is the data" in an object that should be loaded from that library using the `data` function. The last command asking for the attributes filled up my console with a 100K length vector (actually 2 of such vectors). The `str` function returns a more useful result.

> data(agaricus.train, package='xgboost')
> train <- agaricus.train
> names( attributes(train$data) )
[1] "i"        "p"        "Dim"      "Dimnames" "x"        "factors"  "class"
> str(train$data)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:143286] 2 6 8 11 18 20 21 24 28 32 ...
  ..@ p       : int [1:127] 0 369 372 3306 5845 6489 6513 8380 8384 10991 ...
  ..@ Dim     : int [1:2] 6513 126
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:126] "cap-shape=bell" "cap-shape=conical" "cap-shape=convex" "cap-shape=flat" ...
  ..@ x       : num [1:143286] 1 1 1 1 1 1 1 1 1 1 ...
  ..@ factors : list()

> Where is the data, is it in $p, $i, or $x?

So the "data" (meaning the values of the sparse matrix) are in the @x leaf. The values all appear to be the number 1. The @i leaf is the sequence of row locations for the value entries, while the @p items are somehow connected with the columns (I think, since 127 and 126 = number of columns from the @Dim leaf are only off by 1).

Doing this

> colSums(as.matrix(train$data))
     cap-shape=bell    cap-shape=conical
                369                    3
   cap-shape=convex       cap-shape=flat
               2934                 2539
  cap-shape=knobbed     cap-shape=sunken
                644                   24
cap-surface=fibrous  cap-surface=grooves
               1867                    4
  cap-surface=scaly   cap-surface=smooth
               2607                 2035
    cap-color=brown       cap-color=buff
               1816
# now snipping the rest of that output.

Now this makes me think that the @p vector gives you the cumulative sum of number of items per column:

> all( cumsum( colSums(as.matrix(train$data)) ) == train$data@p[-1] )
[1] TRUE

> Thank you very much!
>
> [[alternative HTML version deleted]]

Please read the Posting Guide. Your code was not mangled in this instance, but HTML code often arrives in an unreadable mess.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
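The guess about @p fits the compressed sparse column layout that dgCMatrix uses: @x holds the stored values column by column, @i holds the 0-based row index of each stored value, and @p holds ncol + 1 offsets marking where each column starts in @i and @x. The colSums check above works here because every stored value is 1; in general, diff(p) counts stored entries per column rather than summing them. The sketch below uses a made-up 4 x 3 matrix (not the agaricus data) and assumes only the Matrix package.

```r
library(Matrix)

# Hypothetical matrix with distinct values so the slots are easy to follow.
m <- sparseMatrix(i = c(1, 3, 4, 2), j = c(1, 1, 2, 3), x = c(5, 6, 7, 8),
                  dims = c(4, 3))

m@x   # stored values, in column order
m@i   # 0-based row index of each stored value
m@p   # column offsets: column j occupies positions p[j]+1 .. p[j+1] of i and x

# Number of stored entries in each column.
per_col <- diff(m@p)

# Rebuild 1-based (row, col, value) triplets from the three slots.
triplets <- data.frame(
  row   = m@i + 1L,
  col   = rep(seq_len(ncol(m)), times = per_col),
  value = m@x
)
triplets
```

For everyday use, summary(m) from the Matrix package prints a similar triplet view without touching the slots directly.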
C W
2017-Oct-20 19:51 UTC
[R] What exactly is an dgCMatrix-class. There are so many attributes.
Thank you for your responses.

I guess I don't feel alone. I don't find that the documentation goes into any detail.

I also find it surprising that,

> object.size(train$data)
1730904 bytes
> object.size(as.matrix(train$data))
6575016 bytes

the dgCMatrix actually takes less memory, though it *looks* like the opposite.

Cheers!
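The size difference can be roughly accounted for from the str() output earlier in the thread. This is a back-of-the-envelope sketch, assuming 4 bytes per integer and 8 bytes per double, and ignoring object headers and the column names; the variable names are made up for illustration.

```r
# Figures taken from the str() output quoted in this thread.
nnz <- 143286   # stored entries (length of @i and @x)
nr  <- 6513     # rows
nc  <- 126      # columns

# Sparse: integer row indices + double values + integer column offsets.
sparse_bytes <- 4 * nnz + 8 * nnz + 4 * (nc + 1)

# Dense: one double per cell, zeros included.
dense_bytes <- 8 * nr * nc

sparse_bytes              # compare with the reported 1730904 bytes
dense_bytes               # compare with the reported 6575016 bytes
density <- nnz / (nr * nc)   # fraction of cells that are actually stored
density
```

Both estimates land within about one percent of the object.size figures reported above, with the remainder plausibly down to the dimnames and per-object overhead. Only about 17 percent of the cells are stored, which is why the sparse form is smaller even though each stored entry costs 12 bytes instead of 8.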