thr3ads.net - R devel - [Rd] transform [Aug 2024]


peter dalgaard

2024-Aug-27 09:55 UTC

[Rd] transform

Yes. A quirk, rather than a bug I'd say. One issue is that the internal
logic of transform() relies on

    e <- eval(substitute(list(...)), `_data`, parent.frame())
    tags <- names(e)

so untagged entries in ... will not be included. The other part is a direct
consequence of a quirk in data.frame:
> data.frame(head(airquality), y=data.frame(x=rnorm(6)))
  Ozone Solar.R Wind Temp Month Day          x
1    41     190  7.4   67     5   1  0.3075402
2    36     118  8.0   72     5   2  0.7765265
3    12     149 12.6   74     5   3  0.3909341
4    18     313 11.5   62     5   4  0.4733170
5    NA      NA 14.3   56     5   5 -0.6947709
6    28      NA 14.9   66     5   6  0.1126040

whereas (the wisdom of this escapes me)
> data.frame(head(airquality), y=data.frame(x=rnorm(6),z=rnorm(6)))
  Ozone Solar.R Wind Temp Month Day        y.x         y.z
1    41     190  7.4   67     5   1 -0.9250228  0.46483406
2    36     118  8.0   72     5   2 -0.5035793  0.28822668
...

On the whole, I think that transform was never designed (nor documented) to take
data frame arguments, so caveat emptor.
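
The two effects described above can be reproduced in isolation. What follows is a minimal sketch, not the source of transform(): it assumes the tags are compared against the column names with match(), which is an assumption going beyond the two lines quoted above, and the names e_untagged, e_mixed and matched are purely illustrative.

```r
# Untagged arguments: list() produces no names at all.
e_untagged <- list(0:5, 1:6)
names(e_untagged)                      # NULL

# Assumed matching step: match() on NULL gives a zero-length result,
# and all() of a zero-length logical is TRUE, so "every tag matched
# an existing column" holds vacuously and nothing is appended.
matched <- !is.na(match(names(e_untagged), names(BOD)))
length(matched)                        # 0
all(matched)                           # TRUE

# With at least one tag, the untagged entries get "" as their name.
e_mixed <- list(0:5, 1:6, foo = 1)
names(e_mixed)                         # "" "" "foo"

# The data.frame() naming quirk on its own, using BOD:
one_col <- names(data.frame(BOD, x = data.frame(y = 1:6)))
two_col <- names(data.frame(BOD, x = data.frame(y = 1:6, z = 6:1)))
one_col                                # "Time" "demand" "y"
two_col                                # "Time" "demand" "x.y" "x.z"
```

The last two calls show that the tag x is dropped for a one-column data frame and used as a prefix for a two-column one, independently of transform().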

- Peter

> On 24 Aug 2024, at 16:41, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> 
> One oddity in transform that I recently noticed.  It seems that to include
> a one-column data frame in the arguments one must name it even though the
> name is ignored.  If the data frame has more than one column then it must
> also be named but in that case it is not ignored and the names are made up
> of a combination of that name and the data frame's names.  I would have
> thought that if we did not want a combination of names we would just not
> name the argument.
> 
>  # ignores second argument returning BOD unchanged
>  transform(BOD, data.frame(y = 1:6)) |> names()
>  ## [1] "Time"   "demand"
> 
>  # ignores second argument returning BOD unchanged
>  transform(BOD, data.frame(y = 1:6, z = 6:1)) |> names()
>  ## [1] "Time"   "demand"
> 
>  # with one column in data frame it adds the column and names it y, ignoring x
>  transform(BOD, x = data.frame(y = 1:6)) |> names()
>  ## [1] "Time"   "demand" "y"
> 
>  # with multiple columns in data frame it uses x.y and x.z as names
>  transform(BOD, x = data.frame(y = 1:6, z = 6:1)) |> names()
>  ## [1] "Time"   "demand" "x.y"    "x.z"
> 
> 
> -- 
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Gabor Grothendieck

2024-Aug-27 12:45 UTC


[Rd] transform

It could be enhanced to handle data frame args.  Unnamed args are
currently just ignored, so adding such handling would be backwards
compatible.  Any interest in this?

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

Sebastian Meyer

2024-Aug-27 13:16 UTC


[Rd] transform

On 27.08.24 at 11:55, peter dalgaard wrote:
> Yes. A quirk, rather than a bug I'd say. One issue is that the internal
> logic of transform() relies on
> 
>      e <- eval(substitute(list(...)), `_data`, parent.frame())
>      tags <- names(e)
> 
> so untagged entries in ... will not be included.
... unless at least one is tagged:

R> transform(BOD, 0:5, 1:6)
   Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8

R> transform(BOD, 0:5, 1:6, foo = 1)
   Time demand 0:5 1:6 foo
1    1    8.3   0   1   1
2    2   10.3   1   2   1
3    3   19.0   2   3   1
4    4   16.0   3   4   1
5    5   15.6   4   5   1
6    7   19.8   5   6   1

But as transform.data.frame is only documented for tagged vector 
expressions, all examples provided in this thread were formal misuses.
(It might make sense to warn about untagged entries.)
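
A warning of that kind could be prototyped outside base R. The wrapper below is a rough sketch under stated assumptions: transform_warn is a hypothetical name, not an existing function, and the warning text is made up for illustration. It only inspects the unevaluated arguments and then delegates to transform() unchanged.

```r
# Hypothetical wrapper, not part of base R.
transform_warn <- function(`_data`, ...) {
    # Look at the unevaluated arguments so that nothing is evaluated twice.
    args <- as.list(substitute(list(...)))[-1L]
    tags <- names(args)
    # names() is NULL when no argument is tagged at all.
    if (is.null(tags)) tags <- rep("", length(args))
    if (any(!nzchar(tags)))
        warning("untagged arguments in '...' are outside the documented use of transform()")
    transform(`_data`, ...)
}

# A tagged expression passes through silently.
out <- transform_warn(BOD, d2 = demand * 2)
names(out)
```

Because the expressions are evaluated by transform() from inside the wrapper, variables that are local to a calling function may not be found; the sketch is meant for top-level use only.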

Personally, I'd be quite confused about what to expect from syntax like

     transform(BOD, data.frame(y = 1:6))

as really no transformation is specified. Looks like cbind() or 
data.frame() was meant.
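
For comparison, here is a short sketch of what those two alternatives give when the columns of a second data frame are simply appended. The object name extra is illustrative only.

```r
# Appending the columns of another data frame without any transformation.
extra <- data.frame(y = 1:6, z = 6:1)

by_cbind <- cbind(BOD, extra)
by_df    <- data.frame(BOD, extra)

names(by_cbind)   # "Time" "demand" "y" "z"
names(by_df)      # "Time" "demand" "y" "z"
```

Neither call needs a tag for the second data frame, and the column names are taken over unchanged.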

	Sebastian

