On 09/07/2021 5:51 p.m., Jeff Newmiller wrote:
> "Strictly speaking", Greg is correct, Bert.
>
> https://cran.r-project.org/doc/manuals/r-release/R-lang.html#List-objects
>
> Lists in R are vectors. What we colloquially refer to as "vectors" are more precisely referred to as "atomic vectors". And without a doubt, this "vector" nature of lists is a key underlying concept that explains why adding a dim attribute creates a matrix that can hold data frames. It is also a stumbling block for programmers from other languages that have things like linked lists.

I would also object to v3 (below) as "extracting" a column from d.
"d[2]" doesn't extract anything, it "subsets" the data frame, so the
result is a data frame, not what you get when you extract something from
a data frame.

People don't realize that "x <- 1:10; y <- x[[3]]" is perfectly legal.
That extracts the 3rd element (the number 3). The problem is that R has
no way to represent a scalar number, only a vector of numbers, so x[[3]]
gets promoted to a vector containing that number when it is returned and
assigned to y.

Lists are vectors of R objects, so if x is a list, x[[3]] is something
that can be returned, and it is different from x[3].

Duncan Murdoch

>
> On July 9, 2021 2:36:19 PM PDT, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>> "1. a column, when extracted from a data frame, *is* a vector."
>> Strictly speaking, this is false; it depends on exactly what is meant
>> by "extracted." e.g.:
>>
>>> d <- data.frame(col1 = 1:3, col2 = letters[1:3])
>>> v1 <- d[,2] ## a vector
>>> v2 <- d[[2]] ## the same, i.e
>>> identical(v1,v2)
>> [1] TRUE
>>> v3 <- d[2] ## a data.frame
>>> v1
>> [1] "a" "b" "c" ## a character vector
>>> v3
>>   col2
>> 1    a
>> 2    b
>> 3    c
>>> is.vector(v1)
>> [1] TRUE
>>> is.vector(v3)
>> [1] FALSE
>>> class(v3) ## data.frame
>> [1] "data.frame"
>> ## but
>>> is.list(v3)
>> [1] TRUE
>>
>> which is simply explained in ?data.frame (where else?!) by:
>> "A data frame is a **list** [emphasis added] of variables of the same
>> number of rows with unique row names, given class "data.frame". If no
>> variables are included, the row names determine the number of rows."
>>
>> "2. maybe your question is "is a given function for a vector, or for a
>> data frame/matrix/array?". if so, i think the only way is reading
>> the help information (?foo)."
>>
>> Indeed! Is this not what the Help system is for?! But note also that
>> the S3 class system may somewhat blur the issue: foo() may work
>> appropriately and differently for different (S3) classes of objects. A
>> detailed explanation of this behavior can be found in appropriate
>> resources or (more tersely) via ?UseMethod .
>>
>> "you might find reading ?"[" and ?"[.data.frame" useful"
>>
>> Not just 'useful" -- **essential** if you want to work in R, unless
>> one gets this information via any of the numerous online tutorials,
>> courses, or books that are available. The Help system is accurate and
>> authoritative, but terse. I happen to like this mode of documentation,
>> but others may prefer more extended expositions. I stand by this claim
>> even if one chooses to use the "Tidyverse", data.table package, or
>> other alternative frameworks for handling data. Again, others may
>> disagree, but R is structured around these basics, and imo one remains
>> ignorant of them at their peril.
>>
>> Cheers,
>> Bert
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>> On Fri, Jul 9, 2021 at 11:57 AM Greg Minshall <minshall at umich.edu>
>> wrote:
>>>
>>> Kai,
>>>
>>>> one more question, how can I know if the function is for column
>>>> manipulations or for vector?
>>>
>>> i still stumble around R code. but, i'd say the following (and look
>>> forward to being corrected! :):
>>>
>>> 1. a column, when extracted from a data frame, *is* a vector.
>>>
>>> 2. maybe your question is "is a given function for a vector, or for a
>>>    data frame/matrix/array?". if so, i think the only way is reading
>>>    the help information (?foo).
>>>
>>> 3. sometimes, extracting the column as a vector from a data frame-like
>>>    object might be non-intuitive. you might find reading ?"[" and
>>>    ?"[.data.frame" useful (as well as ?"[.data.table" if you use that
>>>    package). also, the str() command can be helpful in understanding
>>>    what is happening. (the lobstr:: package's sxp() function, as well
>>>    as more verbose .Internal(inspect()) can also give you insight.)
>>>
>>>    with the data.table:: package, for example, if "DT" is a data.table
>>>    object, with "x2" as a column, adding or leaving off quotation marks
>>>    for the column name can make all the difference between ending up
>>>    with a vector, or with a (much reduced) data table:
>>> ----
>>>> is.vector(DT[, x2])
>>> [1] TRUE
>>>> str(DT[, x2])
>>>  num [1:9] 32 32 32 32 32 32 32 32 32
>>>>
>>>> is.vector(DT[, "x2"])
>>> [1] FALSE
>>>> str(DT[, "x2"])
>>> Classes ‘data.table’ and 'data.frame': 9 obs. of 1 variable:
>>>  $ x2: num 32 32 32 32 32 32 32 32 32
>>>  - attr(*, ".internal.selfref")=<externalptr>
>>> ----
>>>
>>>    a second level of indexing may or may not help, mostly depending on
>>>    the use of '[' versus of '[['. this can sometimes cause confusion
>>>    when you are learning the language.
>>> ----
>>>> str(DT[, "x2"][1])
>>> Classes ‘data.table’ and 'data.frame': 1 obs. of 1 variable:
>>>  $ x2: num 32
>>>  - attr(*, ".internal.selfref")=<externalptr>
>>>> str(DT[, "x2"][[1]])
>>>  num [1:9] 32 32 32 32 32 32 32 32 32
>>> ----
>>>
>>>    the tibble:: package (used in, e.g., the dplyr:: package) also
>>>    (always?) returns a single column as a non-vector. again, a
>>>    second indexing with double '[[]]' can produce a vector.
>>> ----
>>>> DP <- tibble(DT)
>>>> is.vector(DP[, "x2"])
>>> [1] FALSE
>>>> is.vector(DP[, "x2"][[1]])
>>> [1] TRUE
>>> ----
>>>
>>>    but, note that a list of lists is also a vector:
>>>> is.vector(list(list(1), list(1,2,3)))
>>> [1] TRUE
>>>> str(list(list(1), list(1,2,3)))
>>> List of 2
>>>  $ :List of 1
>>>   ..$ : num 1
>>>  $ :List of 3
>>>   ..$ : num 1
>>>   ..$ : num 2
>>>   ..$ : num 3
>>>
>>>    etc.
>>>
>>> hth. good luck learning!
>>>
>>> cheers, Greg
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
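
Duncan's distinction between subsetting with `[` and extracting with `[[`, and Jeff's remark that a list with a dim attribute becomes a matrix that can hold data frames, can be checked with a minimal base-R sketch. The names `lst`, `sub`, `elt`, and `m` are made up for illustration; `x` and `d` follow the examples quoted in the thread.

```r
# Atomic vector: [[ extracts one element, but R has no scalar type,
# so the result is still a vector (of length 1).
x <- 1:10
y <- x[[3]]
is.vector(y)   # TRUE
length(y)      # 1

# List: [ subsets (the result is again a list),
# while [[ extracts the element itself.
lst <- list(a = 1:3, b = "text", c = letters[1:2])
sub <- lst[3]     # a list of length 1, named "c"
elt <- lst[[3]]   # the character vector c("a", "b")

# A data frame is a list of columns, so the same distinction applies.
d <- data.frame(col1 = 1:3, col2 = letters[1:3])
is.data.frame(d[2])           # TRUE: still a data frame
identical(d[[2]], d[, 2])     # TRUE: both give the column itself

# A list is a vector, so it can carry a dim attribute.
# The result is a matrix whose cells hold arbitrary R objects.
m <- list(d, 1:2, "z", d)
dim(m) <- c(2, 2)
is.matrix(m)                  # TRUE
is.data.frame(m[[1, 1]])      # TRUE: the first cell holds a data frame
```

The sketch uses only base R, so it should run without data.table or tibble installed.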
OK, I stand somewhat chastised.

But my point still is that what you get when you "extract" depends on
how you define "extract." Do note that ?"[" yields a help file titled
"Extract or Replace Parts of an Object"; and afaics, the term "subset"
is not explicitly used as Duncan prefers. The relevant part of the Help
file for "[" on recursive objects says: "Indexing by [ is similar to
atomic vectors and selects a list of the specified element(s)."

That a data.frame is a list is explicitly stated, as I noted; that
lists are in fact vectors is also explicitly stated (?list says:
"Almost all lists in R internally are Generic Vectors") but then one is
stuck with: a data.frame is a list and therefore a vector, but
is.vector(v3) is FALSE. The explanation is explicit again in ?is.vector
("is.vector returns TRUE if x is a vector of the specified mode having
no attributes other than names. It returns FALSE otherwise.").

But I would say these issues are sufficiently murky that my warning to
be precise is not entirely inappropriate; unfortunately, I may have
made them more so. Sigh....

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Jul 9, 2021 at 3:05 PM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>
> On 09/07/2021 5:51 p.m., Jeff Newmiller wrote:
> > "Strictly speaking", Greg is correct, Bert.
> >
> > https://cran.r-project.org/doc/manuals/r-release/R-lang.html#List-objects
> >
> > Lists in R are vectors. What we colloquially refer to as "vectors" are more precisely referred to as "atomic vectors". And without a doubt, this "vector" nature of lists is a key underlying concept that explains why adding a dim attribute creates a matrix that can hold data frames. It is also a stumbling block for programmers from other languages that have things like linked lists.
>
> I would also object to v3 (below) as "extracting" a column from d.
> "d[2]" doesn't extract anything, it "subsets" the data frame, so the
> result is a data frame, not what you get when you extract something from
> a data frame.
>
> People don't realize that "x <- 1:10; y <- x[[3]]" is perfectly legal.
> That extracts the 3rd element (the number 3). The problem is that R has
> no way to represent a scalar number, only a vector of numbers, so x[[3]]
> gets promoted to a vector containing that number when it is returned and
> assigned to y.
>
> Lists are vectors of R objects, so if x is a list, x[[3]] is something
> that can be returned, and it is different from x[3].
>
> Duncan Murdoch
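
The apparent contradiction Bert describes (a data frame is a list, lists are vectors, yet `is.vector()` returns FALSE) follows from the attribute rule quoted from ?is.vector. A minimal base-R sketch, in which the names `u` and `v` and the "units" attribute are made up for illustration:

```r
d <- data.frame(col1 = 1:3, col2 = letters[1:3])

is.list(d)             # TRUE: a data frame is stored as a list of columns
is.vector(d)           # FALSE: it has attributes other than names
names(attributes(d))   # includes "class" and "row.names" besides "names"

# Strip everything except names and the same object passes is.vector().
u <- unclass(d)                # drops the "data.frame" class
attr(u, "row.names") <- NULL   # drops the row names
is.vector(u)                   # TRUE: a plain named list

# The same rule applies to atomic vectors.
v <- 1:3
is.vector(v)                   # TRUE
attr(v, "units") <- "cm"       # any attribute other than names
is.vector(v)                   # FALSE, although is.atomic(v) is still TRUE
```

So `is.vector()` tests for "no attributes other than names" rather than for the underlying storage type, which is why `is.list()` or `is.atomic()` may be more useful when the question is what kind of object you have.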
Thanks Bert,

I'm reading some books now. But it takes me a while to get familiar with R.

Best,

Kai