Displaying 13 results from an estimated 13 matches for "sdcols".
2024 Sep 28
1
lattice xyplot with cumsum() function inside
...ate("2024-01-01"), by = 1,
length.out = 50), xgroup = "A", x = runif(50, 0, 1))
mydt <- rbindlist(list(mydt, data.table(date = mydt$date, xgroup = "B", x = runif(50, 0, 3))))
mydt[, `:=`(xcumsum = cumsum(x)), by = .(xgroup)]
mydt[, lapply(.SD, sum), by = .(xgroup), .SDcols = c("x")]
#    xgroup        x
#    <char>    <num>
# 1:      A 26.00455
# 2:      B 71.55405
# For xgroup = "B", the line starts at the sum of all previous x values,
# including those from xgroup = "A"
# Intended result is to separate cumsum(x) for groups "A" and "...
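One plausible fix (a sketch, not necessarily the list's reply): since the xcumsum column above is already computed per group, plot that column instead of calling cumsum() inside the xyplot formula, where lattice would evaluate it over the whole vector before splitting into groups.

    library(data.table)
    library(lattice)

    # xcumsum was computed by group above, so plotting it directly keeps
    # the cumulative sums for "A" and "B" separate
    xyplot(xcumsum ~ date, data = mydt, groups = xgroup, type = "l",
           auto.key = TRUE)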
2002 May 11
1
deleting invariant rows and cols in a matrix
Greetings,
I couldn't find any existing function that would allow me to scan a
matrix and eliminate invariant rows and columns so I have started to
write a simple routine from scratch. The following code fails because
the array index goes out of bounds for obvious reasons you'll see
shortly.
Start with some data
x <- read.table("myex.dat", header = TRUE)
x
v1 v2 v3 v4 v5 id
1
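A minimal sketch of one way to avoid the out-of-bounds problem (an illustration, not the poster's routine): build logical masks first and subset once, instead of deleting rows or columns inside a loop while the indices shift.

    # keep only rows/columns with more than one distinct value,
    # subsetting once instead of deleting inside a loop
    keep_rows <- apply(x, 1, function(r) length(unique(r)) > 1)
    keep_cols <- apply(x, 2, function(v) length(unique(v)) > 1)
    x_trim <- x[keep_rows, keep_cols, drop = FALSE]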
2020 Sep 24
1
How to use `[` without evaluating the arguments.
...which(colnames(colData) %in% colIDs)
lockBinding('colIDs', internals)
# Assemble the pseudo row and column names for the LongTable
.pasteColons <- function(...) paste(..., collapse=':')
rowData[, `:=`(.rownames=mapply(.pasteColons, transpose(.SD))), .SDcols=internals$rowIDs]
colData[, `:=`(.colnames=mapply(.pasteColons, transpose(.SD))), .SDcols=internals$colIDs]
return(.LongTable(rowData=rowData, colData=colData,
assays=assays, metadata=metadata,
.intern=internals))
}
I have also defined a subset...
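The two `:=` lines above use a pattern worth isolating: transpose(.SD) turns the selected columns into a list of per-row vectors, and mapply() pastes each one into a single string. A self-contained sketch of that pattern, with made-up column names:

    library(data.table)

    dt <- data.table(drugid = c("d1", "d2"), cellid = c("c1", "c2"))
    .pasteColons <- function(...) paste(..., collapse = ":")

    # transpose(.SD) yields one vector per row; mapply() pastes each
    # row's values into a single "drugid:cellid" string
    dt[, .rownames := mapply(.pasteColons, transpose(.SD)),
       .SDcols = c("drugid", "cellid")]
    dt$.rownames   # "d1:c1" "d2:c2"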
2013 Mar 13
3
loop in a data.table
Hi everyone,
I have a data.table called "data" with many columns that I want to
group by column1 using data.table, given how fast it is.
The problem with looping over a data.table is that data.table does not
accept quoted column names (e.g. "col2" instead of col2).
I found a workaround, which is to use get("col2"); that works fine, but
the
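A hedged sketch of both idioms under discussion (the column names here are hypothetical): get() resolves one quoted name inside j, while .SDcols takes a character vector of names directly and is usually the cleaner way to loop over many columns.

    library(data.table)

    data <- data.table(column1 = c("a", "a", "b"), col2 = 1:3, col3 = 4:6)

    # get(): resolve a single quoted column name inside j
    data[, sum(get("col2")), by = column1]

    # .SDcols: operate on many quoted columns at once, no get() needed
    data[, lapply(.SD, sum), by = column1, .SDcols = c("col2", "col3")]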
2024 Dec 11
1
Cores hang when calling mcapply
...ID_Key (since they share same columns now)
> final_dt <- rbindlist(list(out1, out2), use.names = TRUE, fill = TRUE)
>
> # Step E: If needed, summarize across ID_Key to sum presence indicators
> final_result <- final_dt[, lapply(.SD, sum, na.rm = TRUE), by = ID_Key,
>                          .SDcols = setdiff(names(final_dt), "ID_Key")]
>
> # Note that final_result should now contain summed presence/absence
> # (0/1) indicators.
Hope this helps!
gregg
somewhereinArizona
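A self-contained sketch of the Step D/E pattern quoted above (toy data and made-up column names; the thread's real tables have hundreds of millions of rows): rbindlist() stacks the per-chunk indicator tables, then one grouped lapply(.SD, sum) collapses them per ID_Key.

    library(data.table)

    out1 <- data.table(ID_Key = c(1, 2), oak = c(1, 0), pine = c(0, 1))
    out2 <- data.table(ID_Key = c(1, 3), oak = c(0, 1))

    # fill = TRUE pads columns missing from one chunk with NA
    final_dt <- rbindlist(list(out1, out2), use.names = TRUE, fill = TRUE)

    # sum every indicator column per ID_Key; na.rm = TRUE ignores the padding
    final_result <- final_dt[, lapply(.SD, sum, na.rm = TRUE), by = ID_Key,
                             .SDcols = setdiff(names(final_dt), "ID_Key")]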
2024 Dec 12
1
Cores hang when calling mcapply
...out1[, (col) := 0]
for (col in out2_missing) out2[, (col) := 0]
setcolorder(out1, all_cols)
setcolorder(out2, all_cols)
final_dt <- rbindlist(list(out1, out2), use.names = TRUE, fill = TRUE)
final_result <- as_tibble(final_dt[, lapply(.SD, sum, na.rm = TRUE), by = ID_Key,
                                   .SDcols = setdiff(names(final_dt), "ID_Key")])
Worth noting, however:
* I unfortunately had to keep the multicore parameters for the janitor package to use make_clean_names() because it just took too long to run it on the full dataframe, but deploying data.table CONSIDERABLY reduces the time...
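One design observation on the snippet above (an aside, not from the thread): because rbindlist(fill = TRUE) already pads missing columns with NA, and the later sum uses na.rm = TRUE, the manual zero-filling and setcolorder() calls are belt-and-braces; what they do add is identical column order and 0 rather than NA in any intermediate output. A minimal sketch of that alignment step, with hypothetical table names:

    library(data.table)

    out1 <- data.table(ID_Key = 1, oak = 1)
    out2 <- data.table(ID_Key = 2, pine = 1)

    # give both tables the same columns, in the same order, before binding
    all_cols <- union(names(out1), names(out2))
    for (col in setdiff(all_cols, names(out1))) out1[, (col) := 0]
    for (col in setdiff(all_cols, names(out2))) out2[, (col) := 0]
    setcolorder(out1, all_cols)
    setcolorder(out2, all_cols)

    final_dt <- rbindlist(list(out1, out2), use.names = TRUE, fill = TRUE)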
2024 Dec 11
1
Cores hang when calling mcapply
...by ID_Key (since they share same columns now)
> final_dt <- rbindlist(list(out1, out2), use.names = TRUE, fill = TRUE)
>
> # Step E: If needed, summarize across ID_Key to sum presence indicators
> final_result <- final_dt[, lapply(.SD, sum, na.rm = TRUE), by = ID_Key,
>                          .SDcols = setdiff(names(final_dt), "ID_Key")]
>
> # Note that final_result should now contain summed presence/absence
> # (0/1) indicators.
Hope this helps!
gregg
somewhereinArizona
2024 Dec 11
1
Cores hang when calling mcapply
...ist(list(out1, out2), use.names = TRUE, fill = TRUE)
> > >
> > > # Step E: If needed, summarize across ID_Key to sum presence indicators
> > > final_result <- final_dt[, lapply(.SD, sum, na.rm = TRUE), by = ID_Key,
> > >                          .SDcols = setdiff(names(final_dt), "ID_Key")]
> > >
> > > # Note that final_result should now contain summed presence/absence
> > > # (0/1) indicators.
> >
> > Hope this helps!
> > gregg
> > somewhe...
2024 Dec 12
1
Cores hang when calling mcapply
...  for (col in out2_missing) out2[, (col) := 0]
>     setcolorder(out1, all_cols)
>     setcolorder(out2, all_cols)
>     final_dt <- rbindlist(list(out1, out2), use.names = TRUE, fill = TRUE)
>     final_result <- as_tibble(final_dt[, lapply(.SD, sum, na.rm = TRUE), by = ID_Key,
>                                        .SDcols = setdiff(names(final_dt), "ID_Key")])
>
> Worth noting, however:
>
> - I unfortunately had to keep the `multicore` parameters for the `janitor` package to use `make_clean_names()` because it just took too long to run it on the full dataframe, but deployin...
2024 Dec 11
1
Cores hang when calling mcapply
...ow)
> > final_dt <- rbindlist(list(out1, out2), use.names = TRUE, fill = TRUE)
> >
> > # Step E: If needed, summarize across ID_Key to sum presence indicators
> > final_result <- final_dt[, lapply(.SD, sum, na.rm = TRUE), by = ID_Key,
> >                          .SDcols = setdiff(names(final_dt), "ID_Key")]
> >
> > # Note that final_result should now contain summed presence/absence
> > # (0/1) indicators.
>
> Hope this helps!
> gregg
> somewhereinArizona
>
2024 Dec 11
1
Cores hang when calling mcapply
...by ID_Key (since they share same columns now)
> final_dt <- rbindlist(list(out1, out2), use.names = TRUE, fill = TRUE)
>
> # Step E: If needed, summarize across ID_Key to sum presence indicators
> final_result <- final_dt[, lapply(.SD, sum, na.rm = TRUE), by = ID_Key,
>                          .SDcols = setdiff(names(final_dt), "ID_Key")]
>
> # Note that final_result should now contain summed presence/absence
> # (0/1) indicators.
Hope this helps!
gregg
somewhereinArizona
2012 Sep 14
3
aggregate() runs out of memory
I have a large data.frame Z (2,424,185,944 bytes, 10,256,441 rows, 17 columns).
I want to get the result of
table(aggregate(Z$V1, FUN = length, by = list(id = Z$V2))$x)
Alas, aggregate has been running for ~30 minutes, RSS is 14G, VIRT is
24.3G, and no end in sight.
Both V1 and V2 are characters (not factors).
Is there anything I could do to speed this up?
Thanks.
--
Sam Steingold
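A hedged sketch of the usual data.table answer to this kind of question (not necessarily the reply Sam received): .N counts rows per group in one pass and tends to be far faster and leaner than aggregate() at this scale.

    library(data.table)

    setDT(Z)                   # convert the data.frame in place, no copy
    # distribution of group sizes; same result as the aggregate()/table() line
    table(Z[, .N, by = V2]$N)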
2024 Dec 11
2
Cores hang when calling mcapply
Hi R users.
Apologies for the lack of concrete examples; the dataset is large, and that size, I believe, is the issue.
I have multiple, very large datasets for which I need to generate 0/1 absence/presence columns.
Some include over 200M rows, with two columns that need presence/absence columns based on the strings contained within them. As an example, one set has ~29k unique values and the
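A hedged sketch of one common way to build such indicators with data.table (toy data and hypothetical column names; the snippet is cut off before the poster's own code): dcast() spreads the string column into one column per unique value, and clamping the resulting counts gives 0/1 presence flags.

    library(data.table)

    dt <- data.table(ID_Key = c(1, 1, 2), species = c("oak", "pine", "oak"))

    # one column per unique string, counting occurrences per ID_Key
    wide <- dcast(dt, ID_Key ~ species, fun.aggregate = length,
                  value.var = "species")

    # clamp the counts to 0/1 presence indicators
    cols <- setdiff(names(wide), "ID_Key")
    wide[, (cols) := lapply(.SD, function(v) as.integer(v > 0)), .SDcols = cols]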