Deramus, Thomas Patrick
2024-Apr-18 16:31 UTC
[R] Tidyverse/dplyr solution for filling values of a tibble/dataframe from a column with a nested list.
Hi experts.
I have a tibble with a column containing a nested list
(<list<list<double>>> data type to be specific).
Looks something like the following (but in R/Arrow format):
ID   Nestedvals
001  [[1]](1,0.1) [[2]](2,0.2) [[3]](3,0.3) [[4]](4,0.4) [[5]](5,0.5)
002  [[1]](1,0.1) [[2]](2,0.2) [[3]](3,0.3) [[4]](4,0.4)
003  [[1]](1,0.1) [[2]](2,0.2) [[3]](3,0.3)
004  [[1]](1,0.1) [[2]](2,0.2)
005  [[1]](1,0.1)
Basically, each list contains a set of doubles, with the first indicating a
specific index (based on the 0-beginning Python index), and the second a certain
value (e.g. 0.5).
What I would like to do is generate a set of columns based on the range of unique
indexes of each nested list, e.g.:
col_1, col_2, col_3, col_4, col_5
Which I have done with the following:
tibble[paste0("col_", 1:5)] <- 0
And then replace each 0 with the value (second number in the nested list),
based on the index (first number in each nested list), for each row of the
tibble.
I wrote a function to split each nested list:
nestsplit <- function(x, y) {
  unlist(lapply(x, `[[`, y))
}
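As an illustration of what this helper returns, here is a minimal sketch on a single row. It assumes the row can be represented as a plain R list of numeric pairs (the real column is an Arrow list, which may behave differently), and the names row3, idx, vals and filled are hypothetical.

```r
# Helper from the message above: pull element y out of every pair in x
nestsplit <- function(x, y) {
  unlist(lapply(x, `[[`, y))
}

# One row's worth of (index, value) pairs as a plain list
# (assumption: stands in for one element of the Arrow list column)
row3 <- list(c(1, 0.1), c(2, 0.2), c(3, 0.3))

idx  <- nestsplit(row3, 1)  # first element of each pair: the indexes
vals <- nestsplit(row3, 2)  # second element of each pair: the values

# Start from zeros and overwrite only the positions named by the indexes
filled <- numeric(5)
filled[idx] <- vals
filled
```

The last step is the per-row assignment the question is about: positions without a pair simply stay 0.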
And then generate unique columns with the column names (by index) and values of
interest to append to the tibble:
tibble <-
  tibble |> rowwise() |> mutate(index_names = list(paste0(
    "col_", as.character(nestsplit(nestedvals, 1))
  )),
  index_values = list(nestsplit(nestedvals, 2)))

But I would like to see if there is an efficient, tidyverse/dplyr-based
solution to individually assign these values rather than writing a loop to
assign each of them by row.
So that an output like this:
ID   Nestedvals                          col_1  col_2  col_3  col_4  col_5
001  <Nested list of 5 pairs of values>  0      0      0      0      0
002  <Nested list of 4 pairs of values>  0      0      0      0      0
003  <Nested list of 3 pairs of values>  0      0      0      0      0
004  <Nested list of 2 pairs of values>  0      0      0      0      0
005  <Nested list of 1 pair of values>   0      0      0      0      0
Looks instead like the following:
ID   Nestedvals                          col_1  col_2  col_3  col_4  col_5
001  <Nested list of 5 pairs of values>  0.1    0.2    0.3    0.4    0.5
002  <Nested list of 4 pairs of values>  0.1    0.2    0.3    0.4    0
003  <Nested list of 3 pairs of values>  0.1    0.2    0.3    0      0
004  <Nested list of 2 pairs of values>  0.1    0.2    0      0      0
005  <Nested list of 1 pair of values>   0.1    0      0      0      0
-------------------------------------------------------------------------------------------------------------------------
I would love to give an example to simulate the exact nature of the data, but
I'm unfortunately not sure how to recreate this class for an example:
> typeof(tibble$var)
[1] "list"
> class(tibble$var)
[1] "arrow_list"    "vctrs_list_of" "vctrs_vctr"    "list"
The closest I have ever been able to get is with:
tibble(ID = c("001", "002", "003", "004", "005"),
       nestedvals = list(
         list(c(1,0.1), c(2,0.2), c(3,0.3), c(4,0.4), c(5,0.5)),
         list(c(1,0.1), c(2,0.2), c(3,0.3), c(4,0.4)),
         list(c(1,0.1), c(2,0.2), c(3,0.3)),
         list(c(1,0.1), c(2,0.2)),
         list(c(1,0.1))))
Which gives a list datatype instead of <list<list<double>>>
Ivan Krylov
2024-Apr-19 07:16 UTC
[R] Tidyverse/dplyr solution for filling values of a tibble/dataframe from a column with a nested list.
On Thu, 18 Apr 2024 16:31:46 +0000
"Deramus, Thomas Patrick" <tderamus at mgb.org> wrote:

> Basically, each list contains a set of doubles, with the first
> indicating a specific index (based on the 0 beginning python index),
> and a certain value (e.g. 0.5).
>
> What I would like to do is generate set of columns based on the range
> of unique indexes of each nested list. e.g.: col_1, col_2, col_3,
> col_4, col_5

It's possible to golf it down to something like the following:

 newcol <- t(sapply(tibble$nestedvals, \(x) {
  x <- simplify2array(x)
  ret <- numeric(5)
  ret[x[1,]] <- x[2,]
  ret
 }))

...which you can then rename and cbind() to your tibble. But the
problem remains that the desired data structure has to be generated row
by row and then transformed back into a column-oriented data structure.

Do you need a sparse matrix?

 spec <- do.call(cbind, Map(
  \(row, cols) rbind(row, simplify2array(cols)),
  seq_along(tibble$nestedvals),
  tibble$nestedvals
 ))
 sparse <- Matrix::sparseMatrix(spec[1,], spec[2,], x = spec[3,])

-- 
Best regards,
Ivan
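
For comparison with the base R approach above, one possible tidyverse-style route is to reshape to long form and back to wide. This is only a sketch under stated assumptions: the toy data uses plain lists rather than the Arrow list class from the original message, and the names dat, wide, result, index and value are hypothetical. It may need adjusting for the real <list<list<double>>> column.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy data built from plain lists
# (assumption: stands in for the arrow_list column in the question)
dat <- tibble(
  ID = c("001", "002", "003", "004", "005"),
  nestedvals = list(
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4), c(5, 0.5)),
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4)),
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3)),
    list(c(1, 0.1), c(2, 0.2)),
    list(c(1, 0.1))
  )
)

wide <- dat |>
  # One row per (index, value) pair; each pair stays a length-2 vector
  unnest_longer(nestedvals) |>
  mutate(
    index = vapply(nestedvals, `[`, numeric(1), 1),
    value = vapply(nestedvals, `[`, numeric(1), 2)
  ) |>
  select(ID, index, value) |>
  # One column per index; IDs lacking an index get 0
  pivot_wider(
    names_from = index,
    values_from = value,
    names_prefix = "col_",
    values_fill = 0
  )

# Re-attach the original nested column
result <- left_join(dat, wide, by = "ID")
result
```

This avoids rowwise() and explicit loops, but it still expands the data to one row per pair before widening, so for very large or very sparse inputs the sparse-matrix route above may be the better fit.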