thr3ads.net - R help - [R] Bug in print for data frames? [Oct 2023]


Rui Barradas

2023-Oct-26 10:42 UTC

[R] Bug in print for data frames?

At 07:18 on 25/10/2023, Christian Asseburg wrote:
> Hi! I came across this unexpected behaviour in R. First I thought it was a
> bug in the assignment operator <- but now I think it's maybe a bug in the
> way data frames are being printed. What do you think?
> 
> Using R 4.3.1:
> 
>> x <- data.frame(A = 1, B = 2, C = 3)
>> y <- data.frame(A = 1)
>> x
>    A B C
> 1 1 2 3
>> x$B <- y$A # works as expected
>> x
>    A B C
> 1 1 1 3
>> x$C <- y[1] # makes C disappear
>> x
>    A B A
> 1 1 1 1
>> str(x)
> 'data.frame':   1 obs. of  3 variables:
>   $ A: num 1
>   $ B: num 1
>   $ C:'data.frame':      1 obs. of  1 variable:
>    ..$ A: num 1
> 
> Why does the print(x) not show "C" as the name of the third
> element? I did mess up the data frame (and this was a mistake on my part), but
> finding the bug was harder because print(x) didn't show the C any longer.
> 
> Thanks. With best wishes -
> 
> . . . Christian
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Hello,

To expand on the good answers already given, I will present two other
example data sets.

Example 1. Imagine that instead of assigning just one column from y to
x$C you assign two columns. The result is a data.frame column. See what
is displayed as the column names.
And unlike what happens with `[`, when selecting columns 1:2 the
operator `[[` doesn't work: given a vector index it indexes recursively
instead of selecting several columns. You will have to extract the
columns y$A and y$B one by one.



x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1, B = 4)
str(y)
#> 'data.frame':    1 obs. of  2 variables:
#>  $ A: num 1
#>  $ B: num 4

x$C <- y[1:2]
x
#>   A B C.A C.B
#> 1 1 2   1   4

str(x)
#> 'data.frame':    1 obs. of  3 variables:
#>  $ A: num 1
#>  $ B: num 2
#>  $ C:'data.frame':   1 obs. of  2 variables:
#>   ..$ A: num 1
#>   ..$ B: num 4

x[[1:2]]  # doesn't work
#> Error in .subset2(x, i, exact = exact): subscript out of bounds
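
To make the difference between `[` and `[[` concrete, here is a minimal sketch using the same x and y as above. The names one_bracket, two_bracket and nested are only illustrative and do not come from the thread.

```r
x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1, B = 4)

# `[` with one index keeps the data.frame wrapper around the column
one_bracket <- y[1]
class(one_bracket)

# `[[` extracts the column itself, here a plain numeric vector
two_bracket <- y[[1]]
class(two_bracket)

# assigning the plain vector keeps x$C an ordinary column named C
x$C <- y[[1]]
str(x)

# with a vector index, `[[` indexes recursively:
# nested[[c(1, 2)]] is the same as nested[[1]][[2]]
nested <- list(a = list(b = 10, c = 20))
nested[[c(1, 2)]]
```

So x$C <- y[[1]] (or y$A) is probably what was intended in the original post, while y[1] stores a whole data frame inside the column.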



Example 2. Sometimes it is useful to get a result like this first and
then correct the resulting df. For instance, when computing more than
one summary statistic.

str(agg) below shows that the summary statistics are returned as a
matrix, so you have a matrix column. And once again the displayed names
reflect that.

The trick to make the result a df is to extract all but the last column
as a sub-df, extract the last column's values as a matrix (which it is)
and then cbind the two together.

cbind is a generic function. Since the first argument to cbind is a
sub-df, the method called is cbind.data.frame and the result is a df.



# the 15 values of A are recycled twice to match the 30 values of X
df1 <- data.frame(A = rep(c("a", "b", "c"), 5L), X = 1:30)

# the anonymous function computes more than one summary statistic
# note that it returns a named vector
agg <- aggregate(X ~ A, df1, \(x) c(Mean = mean(x), S = sd(x)))
agg
#>   A    X.Mean       X.S
#> 1 a 14.500000  9.082951
#> 2 b 15.500000  9.082951
#> 3 c 16.500000  9.082951

# similar effect as in the OP. The difference is that the last
# column is a matrix, not a data.frame
str(agg)
#> 'data.frame':    3 obs. of  2 variables:
#>  $ A: chr  "a" "b" "c"
#>  $ X: num [1:3, 1:2] 14.5 15.5 16.5 9.08 9.08 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "Mean" "S"

# nc is just a convenience, avoids repeated calls to ncol
nc <- ncol(agg)
cbind(agg[-nc], agg[[nc]])
#>   A Mean        S
#> 1 a 14.5 9.082951
#> 2 b 15.5 9.082951
#> 3 c 16.5 9.082951

# all is well
cbind(agg[-nc], agg[[nc]]) |> str()
#> 'data.frame':    3 obs. of  3 variables:
#>  $ A   : chr  "a" "b" "c"
#>  $ Mean: num  14.5 15.5 16.5
#>  $ S   : num  9.08 9.08 9.08



If the anonymous function hadn't returned a named vector, the new column
names would have been "1" and "2". Try it.
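
As a hedged alternative to the cbind approach, a common idiom is to pass the data frame's columns back through data.frame() with do.call, which also expands the matrix column. This is not from the message above; the name flat is made up for illustration. Note that this variant keeps the X. prefix in the column names.

```r
df1 <- data.frame(A = rep(c("a", "b", "c"), 5L), X = 1:30)
agg <- aggregate(X ~ A, df1, \(x) c(Mean = mean(x), S = sd(x)))

# agg has 2 columns, the second one being a 3 x 2 matrix
ncol(agg)

# each column of agg becomes an argument of data.frame();
# the matrix argument X is split into one column per matrix column
flat <- do.call(data.frame, agg)
names(flat)
str(flat)
```

Which of the two to use is mostly a matter of which column names you want: cbind(agg[-nc], agg[[nc]]) gives Mean and S, the do.call version gives X.Mean and X.S.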


Hope this helps,

Rui Barradas




Christian Asseburg

2023-Oct-26 12:03 UTC


[R] Bug in print for data frames?

Dear R users! Thank you for your excellent replies. I didn't know that
print.data.frame expands matrix-like values in this way. Why doesn't it call
the column in my example C.A? I understand that something like that happens when
the data.frame in position three has multiple columns. But your answers have
helped me understand this better.
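
A small sketch of how one might spot this situation without relying on the printed header. Judging only from the outputs shown in this thread, a nested data frame with a single column is displayed under its inner name (A), while one with several columns gets the outer name as a prefix (C.A, C.B). The outer name is still there in names(x). The variable nested_cols is hypothetical, not something from the thread.

```r
x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)
x$C <- y[1]   # stores a one-column data frame inside column C

# the outer names are unchanged, whatever print(x) displays
names(x)

# the name shown in the header comes from the nested data frame
names(x$C)

# flag columns that are themselves data frames or matrices
nested_cols <- names(x)[vapply(
  x,
  function(col) is.data.frame(col) || is.matrix(col),
  logical(1)
)]
nested_cols
```

Checking names(x) or str(x), as was done in the original post, is therefore a more reliable diagnostic than the printed column header.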

Ebert,Timothy Aaron

2023-Oct-26 12:32 UTC


[R] Bug in print for data frames?

The "problem" goes away if you use

x$C <- y[1,]

If you have another row in your x, say:
x <- data.frame(A=c(1,4), B=c(2,5), C=c(3,6))

then your code
x$C <- y[1]
returns an error.

If y has the same number of rows as x$C then R has the same outcome as in your
example.

It looks like your code tells R to replace all of column C (including the name)
with all of y[1], which is itself a data frame rather than a vector.

Maybe unexpected, but not a bug. It is consistent.
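
The two cases described above can be checked with a short sketch. The names x1, x2 and res are illustrative only, and the error is caught with tryCatch so the script keeps running.

```r
y <- data.frame(A = 1)

# with a row index on a one-column data frame, `[` drops to a plain vector
class(y[1, ])

x1 <- data.frame(A = 1, B = 2, C = 3)
x1$C <- y[1, ]
str(x1)   # C is an ordinary numeric column again

# with two rows in x, assigning the one-row data frame y[1] fails
x2 <- data.frame(A = c(1, 4), B = c(2, 5), C = c(3, 6))
res <- tryCatch(
  {
    x2$C <- y[1]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
res
```

The second case fails because the replacement is a one-row data frame, which is not recycled to two rows the way a length-one vector would be.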


