thr3ads.net - search: "open

2024 Dec 11

1

Cores hang when calling mcapply

...ient and faster for large-scale operations in R. An alternative is data.table's `dcast()`, which can handle large data more efficiently; data.table's in-memory operations often reduce overhead compared to tidyverse pivoting functions. Also, consider using data.table's `fread()` or `arrow::open_dataset()` directly with `as.data.table()` to keep everything in a data.table format. For example, you can do a large `dcast()` operation to create presence/absence columns by group. If your categories are extremely large, consider an approach that processes categories in segments as I mentioned earlier -...

Cores hang when calling mcapply
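
The `dcast()` suggestion in the snippet above could look roughly like the sketch below. It is only an illustration: the column names `ID_Key` and `column1` are borrowed from the thread, while the toy values and the `wide` variable are made up here, and the real data may need a different formula or a segment-by-segment approach.

```r
library(data.table)

# Hypothetical long-format data: one row per (ID_Key, category) observation.
dt <- data.table(
  ID_Key  = c("a", "a", "b", "c", "c", "c"),
  column1 = c("x", "y", "x", "z", "z", "y")
)

# Drop repeated (ID_Key, column1) pairs so that each cell count is 0 or 1,
# then cast to wide: one row per ID_Key, one column per category.
# With fun.aggregate = length, combinations that never occur become 0.
wide <- dcast(
  unique(dt),
  ID_Key ~ column1,
  value.var = "column1",
  fun.aggregate = length
)

print(wide)
```

Deduplicating before the cast keeps the result a presence/absence indicator rather than a count; dropping `unique()` would give per-group counts instead.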

2024 Dec 11

1

Cores hang when calling mcapply

...cient and faster for large-scale operations in R. An alternative is data.table's `dcast()`, which can handle large data more efficiently; data.table's in-memory operations often reduce overhead compared to tidyverse pivoting functions. Also, consider using data.table's `fread()` or `arrow::open_dataset()` directly with `as.data.table()` to keep everything in a data.table format. For example, you can do a large `dcast()` operation to create presence/absence columns by group. If your categories are extremely large, consider an approach that processes categories in segments as I mentioned earlier -...

Cores hang when calling mcapply

2024 Dec 11

1

Cores hang when calling mcapply

...cale operations in R. > > > An alternative is data.table's `dcast()`, which can handle large data more efficiently; data.table's in-memory operations often reduce overhead compared to tidyverse pivoting functions. > > Also, consider using data.table's `fread()` or `arrow::open_dataset()` directly with `as.data.table()` to keep everything in a data.table format. For example, you can do a large `dcast()` operation to create presence/absence columns by group. If your categories are extremely large, consider an approach that processes categories in segments as I mentioned earlier -...

Cores hang when calling mcapply

2024 Dec 11

1

Cores hang when calling mcapply

...; > > > > > An alternative is data.table's `dcast()`, which can handle large data more efficiently; data.table's in-memory operations often reduce overhead compared to tidyverse pivoting functions. > > > > Also, consider using data.table's `fread()` or `arrow::open_dataset()` directly with `as.data.table()` to keep everything in a data.table format. For example, you can do a large `dcast()` operation to create presence/absence columns by group. If your categories are extremely large, consider an approach that processes categories in segments as I mentioned earlier -...

Cores hang when calling mcapply

2024 Dec 12

1

Cores hang when calling mcapply

Hi Gregg. Just wanted to follow up on the solution you proposed. I had to make some adjustments to get exactly what I wanted, but it works, and takes about 15 minutes on our server configuration: temp <- open_dataset( sources = input_files, format = 'csv', unify_schema = TRUE, col_types = schema( "ID_Key" = string(), "column1" = string(), "column2" = string() ) ) |> as_t...

Cores hang when calling mcapply

2024 Dec 11

2

Cores hang when calling mcapply

...df <- df |> group_by(ID_Key) |> summarise(across(c(starts_with("column1_name_"), starts_with("column2_name_")), ~ sum(.x, na.rm = TRUE))) |> ungroup() return(df) } and splitting up the data into a list of 110k individual dataframes based on ID_Key: temp <- open_dataset( sources = input_files, format = 'csv', unify_schema = TRUE, col_types = schema( "ID_Key" = string(), "column1" = string(), "column2" = string() ) ) |> as_tibble() keeptabs <- split(temp, temp$ID_Key) I used a...

Cores hang when calling mcapply

2024 Dec 12

1

Cores hang when calling mcapply

...<tderamus at mgb.org> wrote: > Hi Gregg. > > Just wanted to follow up on the solution you proposed. > > I had to make some adjustments to get exactly what I wanted, but it works, and takes about 15 minutes on our server configuration: > > temp <- > open_dataset( > sources = input_files, > format = 'csv', > unify_schema = TRUE, > col_types = schema( > "ID_Key" = string(), > "column1" = string(), > "column2" = string...

search for: open_dataset