thr3ads.net - R help - [R] Problem with filling dataframe's column [Jun 2023]


avi.e.gross at gmail.com

2023-Jun-13 23:24 UTC

[R] Problem with filling dataframe's column

Bert,

I stand corrected. What I said may have once been true but apparently the
implementation seems to have changed at some level.

I did not factor that in.

Nevertheless, whether you use an index as a key or as an offset into an attached
vector of labels, it seems to work the same, and I think my comment applies well
enough: changing a few labels instead of scanning lots of entries can
sometimes be a good thing. As far as I can tell, the external interface seems the
same for now.
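
As a rough illustration of the point about changing a few labels rather than scanning every entry, here is a minimal sketch. The vector names (colour_f, colour_chr) and the values are made up for the example and do not come from Javad's data.

```r
# Hypothetical data: a column where "grn" should have been "green".
colour_chr <- c("red", "grn", "blue", "grn", "red")
colour_f   <- factor(colour_chr)

# Factor route: only the vector of levels (one entry per unique value)
# is edited; the integer codes behind each element are left alone.
levels(colour_f)[levels(colour_f) == "grn"] <- "green"

# Character route: every element is compared and matching ones replaced.
colour_chr[colour_chr == "grn"] <- "green"

# Both routes end up with the same values.
print(as.character(colour_f))
print(colour_chr)
```

With a quarter million rows and only a handful of distinct values, the factor route touches a handful of labels, though whether that matters in practice depends on the data.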

One issue with R for a long time was how they did not do something more like a
Python dictionary and it looks like ?

ABOVE

From: Bert Gunter <bgunter.4567 at gmail.com> 
Sent: Tuesday, June 13, 2023 6:15 PM
To: avi.e.gross at gmail.com
Cc: javad bayat <j.bayat194 at gmail.com>; R-help at r-project.org
Subject: Re: [R] Problem with filling dataframe's column

Below.

On Tue, Jun 13, 2023 at 2:18 PM <avi.e.gross at gmail.com> wrote:
>
> Javad,
>
> There may be nothing wrong with the methods people are showing you and if
> it satisfied you, great.
>
> But I note you have lots of data in over a quarter million rows. If much of
> the text data is redundant, and you want to simplify some operations such as
> changing some of the values to others in multiple ways, have you done any
> learning about an R feature very useful for dealing with categorical data
> called "factors"?
>
> If you have a vector or a column in a data.frame that contains text, then
> it can be replaced by a factor that often takes way less space as it stores a
> sort of dictionary of all the unique values and just records numbers like
> 1,2,3 to tell which one each item is.

-- This is false. It used to be true a **long time ago**, but R has for quite a
while used hashing/global string tables to avoid this problem. See here
<https://stackoverflow.com/questions/50310092/why-does-r-use-factors-to-store-characters>
for details/references.
As a result, I think many would argue that working with strings *as strings,*
not factors, is often a better default, though of course there are still
situations where factors are useful (e.g. in ordering results by factor levels
where the desired level order is not alphabetical).
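
A small sketch of the ordering case mentioned above, where the desired order is not alphabetical. The vector size and its values are invented for illustration.

```r
# Hypothetical categorical values whose natural order is low < medium < high.
size <- c("medium", "low", "high", "low", "medium")

# Sorting the plain strings gives alphabetical order.
print(sort(size))

# Declaring the levels explicitly makes sort() and table() follow that order.
size_f <- factor(size, levels = c("low", "medium", "high"))
print(sort(size_f))
print(table(size_f))
```

The strings can stay as character for most work and be converted to a factor only at the point where the level order matters.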

**I would appreciate correction/ clarification if my claims are wrong or
misleading! **

In any case, please do check such claims before making them on this list.

Cheers,
Bert


Richard O'Keefe

2023-Jun-15 02:34 UTC


[R] Problem with filling dataframe's column

Consider

  m <- list(foo=c(1,2), "B'ar"=as.matrix(1:4,2,2), "!*#"=c(FALSE,TRUE))

It is a collection of elements of different types/structures, accessible
via string keys (and also by position).  Entries can be added:

  m[["fred"]] <- 47

Entries can be removed:

  m[["!*#"]] <- NULL

How much more like a Python dictionary do you need it to be?
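
To round out the comparison, here is a sketch of a few more dictionary-style operations on the same list, followed by an environment used the same way. The environment part is an addition for illustration, not something proposed in this thread. As far as I know, lookup by name in a list is a linear scan over the names, while environments are hashed, which may matter for large tables.

```r
# The list from the message above, rebuilt so this runs on its own.
m <- list(foo = c(1, 2), "B'ar" = as.matrix(1:4, 2, 2), "!*#" = c(FALSE, TRUE))
m[["fred"]] <- 47
m[["!*#"]] <- NULL

print(names(m))                  # the keys, in insertion order
print("fred" %in% names(m))      # membership test
print(is.null(m[["nope"]]))      # a missing key yields NULL, not an error

# Iterate over key/value pairs.
for (key in names(m)) cat(key, ":", class(m[[key]])[1], "\n")

# An environment as a hashed key/value store (variable name d is arbitrary).
d <- new.env(hash = TRUE)
assign("fred", 47, envir = d)
d[["foo"]] <- c(1, 2)
print(exists("fred", envir = d, inherits = FALSE))
print(get("fred", envir = d))
rm("fred", envir = d)
print(ls(d))                     # remaining keys
```

Unlike lists, environments have reference semantics, so a function that modifies one changes it for the caller as well.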



