avi.e.gross at gmail.com
2023-Jun-13 23:24 UTC
[R] Problem with filling dataframe's column
Bert,

I stand corrected. What I said may once have been true, but the implementation has apparently changed at some level. I did not factor that in.

Nevertheless, whether you use an index as a key or as an offset into an attached vector of labels, it seems to work the same, and I think my comment applies well enough: changing a few labels instead of scanning lots of entries can sometimes be a good thing. As far as I can tell, the external interface seems the same for now.

One issue with R for a long time was how they did not do something more like a Python dictionary and it looks like ?

ABOVE

From: Bert Gunter <bgunter.4567 at gmail.com>
Sent: Tuesday, June 13, 2023 6:15 PM
To: avi.e.gross at gmail.com
Cc: javad bayat <j.bayat194 at gmail.com>; R-help at r-project.org
Subject: Re: [R] Problem with filling dataframe's column

Below.

On Tue, Jun 13, 2023 at 2:18 PM <avi.e.gross at gmail.com> wrote:
>
> Javad,
>
> There may be nothing wrong with the methods people are showing you, and if they satisfy you, great.
>
> But I note you have lots of data in over a quarter million rows. If much of the text data is redundant, and you want to simplify some operations such as changing some of the values to others in multiple ways, have you done any learning about an R feature very useful for dealing with categorical data called "factors"?
>
> If you have a vector or a column in a data.frame that contains text, then it can be replaced by a factor that often takes way less space as it stores a sort of dictionary of all the unique values and just records numbers like 1,2,3 to tell which one each item is.

-- This is false. It used to be true a **long time ago**, but R has for quite a while used hashing/global string tables to avoid this problem. See here <https://stackoverflow.com/questions/50310092/why-does-r-use-factors-to-store-characters> for details/references.

As a result, I think many would argue that working with strings *as strings*, not factors, is often a better default, though of course there are still situations where factors are useful (e.g. in ordering results by factor levels where the desired level order is not alphabetical).

**I would appreciate correction/clarification if my claims are wrong or misleading!**

In any case, please do check such claims before making them on this list.

Cheers,
Bert
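The relabeling point in the message above (changing a few labels rather than scanning every entry) can be sketched roughly as follows. This is only an illustration on made-up data: the vector names and the labels "sand", "silt" and "clay" are hypothetical and do not come from the thread, and the sketch makes no claim about memory use or speed.

```r
# Hypothetical data: a long column with only a few distinct labels.
set.seed(1)
x_chr <- sample(c("sand", "silt", "clay"), 250000, replace = TRUE)
x_fac <- factor(x_chr)

# Relabel on the character vector: every element is compared with "clay".
y_chr <- x_chr
y_chr[y_chr == "clay"] <- "CLAY"

# Relabel on the factor: only the short levels vector is compared and changed;
# the integer codes behind the factor stay as they are.
y_fac <- x_fac
levels(y_fac)[levels(y_fac) == "clay"] <- "CLAY"

# Both routes end up with the same labels.
identical(as.character(y_fac), y_chr)
```

Whether the factor route is actually worthwhile depends on the data and on what else is done with the column; as noted in the thread, plain character vectors are often a reasonable default.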
Consider

    m <- list(foo = c(1, 2), "B'ar" = matrix(1:4, 2, 2), "!*#" = c(FALSE, TRUE))

It is a collection of elements of different types/structures, accessible via string keys (and also by position).

Entries can be added:

    m[["fred"]] <- 47

Entries can be removed:

    m[["!*#"]] <- NULL

How much more like a Python dictionary do you need it to be?

On Wed, 14 Jun 2023 at 11:25, <avi.e.gross at gmail.com> wrote:
> One issue with R for a long time was how they did not do something more
> like a Python dictionary and it looks like ?
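To round out the named-list example above, here is a small sketch of a few more dictionary-style operations, plus an environment as an alternative container. The keys and values are illustrative only; the environment variant is an addition for comparison, not something proposed in the thread.

```r
# A named list used as a dictionary, following the message above.
m <- list(foo = c(1, 2), "!*#" = c(FALSE, TRUE))
m[["fred"]] <- 47            # add an entry
m[["!*#"]] <- NULL           # remove an entry
keys <- names(m)             # list the keys
has_fred <- "fred" %in% keys # membership test

# Assigning NULL with [[<- deletes a key, so storing an actual NULL value
# needs the single-bracket form with a list on the right-hand side.
m["empty"] <- list(NULL)

# An environment also maps string keys to values. It is hashed by default
# and is modified in place rather than copied on assignment.
e <- new.env()
assign("fred", 47, envir = e)
fred_found <- exists("fred", envir = e, inherits = FALSE)
fred_value <- get("fred", envir = e)
rm("fred", envir = e)
remaining <- ls(e)
```

A list keeps insertion order and can be indexed by position; an environment has reference semantics and no positional indexing, so which one is closer to a Python dictionary depends on what behaviour is wanted.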