Deramus, Thomas Patrick
2024-Apr-18 16:31 UTC
[R] Tidyverse/dplyr solution for filling values of a tibble/dataframe from a column with a nested list.
Hi experts.

I have a tibble with a column containing a nested list (<list<list<double>>> data type, to be specific).

It looks something like the following (but in R/Arrow format):

ID   Nestedvals
001  [[1]](1,0.1)[[2]](2,0.2)[[3]](3,0.3)[[4]](4,0.4)[[5]](5,0.5)
002  [[1]](1,0.1)[[2]](2,0.2)[[3]](3,0.3)[[4]](4,0.4)
003  [[1]](1,0.1)[[2]](2,0.2)[[3]](3,0.3)
004  [[1]](1,0.1)[[2]](2,0.2)
005  [[1]](1,0.1)

Basically, each list contains a set of doubles, with the first indicating a specific index (based on the 0-based Python index) and the second a certain value (e.g. 0.5).

What I would like to do is generate a set of columns based on the range of unique indexes of each nested list, e.g.: col_1, col_2, col_3, col_4, col_5

Which I have done with the following:

    tibble[paste0("col_", 1:5)] <- 0

And then replace each 0 with the value (second number in the nested list), based on the index (first number in each nested list), for each row of the tibble.

I wrote a function to split each nested list:

    nestsplit <- function(x, y) {
      unlist(lapply(x, `[[`, y))
    }

And then generate unique columns with the column names (by index) and values of interest to append to the tibble:

    tibble <-
      tibble |>
      rowwise() |>
      mutate(index_names = list(paste0(
               "col_", as.character(nestsplit(nestedvals, 1))
             )),
             index_values = list(nestsplit(nestedvals, 2)))

But I would like to see if there is an efficient, tidyverse/dplyr-based solution to individually assign these values rather than writing a loop to assign each of them by row.
So that an output like this:

ID   Nestedvals                           col_1  col_2  col_3  col_4  col_5
001  <Nested list of 5 pairs of values>   0      0      0      0      0
002  <Nested list of 4 pairs of values>   0      0      0      0      0
003  <Nested list of 3 pairs of values>   0      0      0      0      0
004  <Nested list of 2 pairs of values>   0      0      0      0      0
005  <Nested list of 1 pair of values>    0      0      0      0      0

Looks instead like the following:

ID   Nestedvals                           col_1  col_2  col_3  col_4  col_5
001  <Nested list of 5 pairs of values>   0.1    0.2    0.3    0.4    0.5
002  <Nested list of 4 pairs of values>   0.1    0.2    0.3    0.4    0
003  <Nested list of 3 pairs of values>   0.1    0.2    0.3    0      0
004  <Nested list of 2 pairs of values>   0.1    0.2    0      0      0
005  <Nested list of 1 pair of values>    0.1    0      0      0      0

----------------------------------------------------------------------

I would love to give an example to simulate the exact nature of the data, but I'm unfortunately not sure how to recreate this class for an example:

    > typeof(tibble$var)
    [1] "list"
    > class(tibble$var)
    [1] "arrow_list"    "vctrs_list_of" "vctrs_vctr"    "list"

The closest I have been able to get is with:

    tibble(ID = c("001", "002", "003", "004", "005"),
           nestedvals = list(
             list(c(1,0.1), c(2,0.2), c(3,0.3), c(4,0.4), c(5,0.5)),
             list(c(1,0.1), c(2,0.2), c(3,0.3), c(4,0.4)),
             list(c(1,0.1), c(2,0.2), c(3,0.3)),
             list(c(1,0.1), c(2,0.2)),
             list(c(1,0.1))))

Which gives a list datatype instead of <list<list<double>>>.
Ivan Krylov
2024-Apr-19 07:16 UTC
[R] Tidyverse/dplyr solution for filling values of a tibble/dataframe from a column with a nested list.
On Thu, 18 Apr 2024 16:31:46 +0000, "Deramus, Thomas Patrick" <tderamus at mgb.org> wrote:

> Basically, each list contains a set of doubles, with the first
> indicating a specific index (based on the 0-based Python index),
> and a certain value (e.g. 0.5).
>
> What I would like to do is generate a set of columns based on the range
> of unique indexes of each nested list, e.g.: col_1, col_2, col_3,
> col_4, col_5

It's possible to golf it down to something like the following:

    newcol <- t(sapply(tibble$nestedvals, \(x) {
      x <- simplify2array(x)   # 2 x n matrix: row 1 = indexes, row 2 = values
      ret <- numeric(5)        # one slot per target column, initially 0
      ret[x[1,]] <- x[2,]      # place each value at its index
      ret
    }))

...which you can then rename and cbind() to your tibble. But the problem remains that the desired data structure has to be generated row by row and then transformed back into a column-oriented data structure.

Do you need a sparse matrix?

    # 3 x N matrix: row 1 = tibble row, row 2 = index, row 3 = value
    spec <- do.call(cbind, Map(
      \(row, cols) rbind(row, simplify2array(cols)),
      seq_along(tibble$nestedvals), tibble$nestedvals
    ))
    sparse <- Matrix::sparseMatrix(spec[1,], spec[2,], x = spec[3,])

-- 
Best regards,
Ivan
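The question asked for a tidyverse route, so here is a minimal sketch of one. It is not taken from either post. It assumes the plain-list example data from the question (not a real arrow_list column) and 1-based indexes, as in the sample data. The names pairs_to_tibble, df, wide and result are hypothetical. The idea is to reshape each nested list into a long (index, value) table, then let tidyr::pivot_wider() create the col_* columns and fill the gaps with 0.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question (plain list column, an assumption
# standing in for the real arrow_list column).
df <- tibble(
  ID = c("001", "002", "003", "004", "005"),
  nestedvals = list(
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4), c(5, 0.5)),
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4)),
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3)),
    list(c(1, 0.1), c(2, 0.2)),
    list(c(1, 0.1))
  )
)

# Hypothetical helper: turn one nested list of (index, value) pairs
# into a two-column tibble.
pairs_to_tibble <- function(pairs) {
  m <- do.call(rbind, pairs)  # one row per pair: column 1 = index, column 2 = value
  tibble(index = m[, 1], value = m[, 2])
}

wide <- df |>
  mutate(pairs = lapply(nestedvals, pairs_to_tibble)) |>
  select(ID, pairs) |>
  unnest(pairs) |>                    # long format: one row per (ID, index, value)
  pivot_wider(
    names_from   = index,
    values_from  = value,
    names_prefix = "col_",
    values_fill  = 0                  # indexes missing for an ID become 0
  )

# Attach the new columns to the original tibble, keeping nestedvals.
result <- left_join(df, wide, by = "ID")
```

New columns appear in order of first appearance of each index, which matches col_1 to col_5 here because the first row contains every index. A row whose nested list is empty would have no match in wide and would get NA rather than 0 after the join, so that case would need separate handling.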