Displaying 13 results from an estimated 13 matches for "sdcol".
2024 Sep 28
1
lattice xyplot with cumsum() function inside
...ate("2024-01-01"), by = 1,
length.out = 50), xgroup = "A", x = runif(50, 0, 1))
mydt <- rbindlist(list(mydt, data.table(date = mydt$date, xgroup = "B", x = runif(50, 0, 3))))
mydt[, `:=`(xcumsum = cumsum(x)), by = .(xgroup)]
mydt[, lapply(.SD, sum), by = .(xgroup), .SDcols = c("x")]
# xgroup x
# <char> <num>
#1: A 26.00455
#2: B 71.55405
#For xgroup = "B", line starts at the sum of all previous x values including xgroup = "A"
#Intended result is to separate cumsum(x) for groups "A" and "...
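In lattice, the terms of a formula such as `cumsum(x) ~ date` are evaluated on the full columns before the `groups` argument splits the data, which would explain why the "B" line continues from the end of "A". A minimal sketch, assuming the goal is one running total per group: compute the column beforehand with base R `ave()` and plot that column instead. The data frame `df` only imitates the post's `mydt` with random values.

```r
# Data shaped like mydt in the post (values are random, not the original)
set.seed(1)
df <- data.frame(
  date   = rep(seq(as.Date("2024-01-01"), by = 1, length.out = 50), 2),
  xgroup = rep(c("A", "B"), each = 50),
  x      = c(runif(50, 0, 1), runif(50, 0, 3))
)

# ave() applies cumsum() separately within each level of xgroup,
# returning a vector aligned with the original rows
df$xcumsum <- ave(df$x, df$xgroup, FUN = cumsum)

# Plot the precomputed column; skipped if lattice is not installed
if (requireNamespace("lattice", quietly = TRUE)) {
  p <- lattice::xyplot(xcumsum ~ date, data = df, groups = xgroup,
                       type = "l", auto.key = TRUE)
  # print(p) draws the plot
}
```

The post's own data.table line that assigns `xcumsum` with `by = .(xgroup)` already builds the same per-group column, so plotting `xcumsum ~ date` with `groups = xgroup` should give separate curves as well.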
2002 May 11
1
deleting invariant rows and cols in a matrix
...stp != 1){
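stp.row <- rep(0,nrow(clean))
stp.col <- rep(0,ncol(clean))
# Start with rows: flag every zero-variance row first, then drop them in one
# step (deleting inside the loop shifts the indices of rows not yet checked)
sdrow <- apply(clean, 1, sd)
stp.row[which(sdrow == 0)] <- 1
if (any(stp.row == 1)) clean <- clean[stp.row == 0, , drop = FALSE]
# Next check columns the same way
sdcol <- apply(clean, 2, sd)
stp.col[which(sdcol == 0)] <- 1
if (any(stp.col == 1)) clean <- clean[, stp.col == 0, drop = FALSE]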
# Do we need to continue with the process?
if (sum(stp.row)==0 && sum(stp.col)==0) stp <- 1
}
# Output cleaned data to new dataset name
cleaned <<- clean
}
---- end R c...
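Deleting rows or columns inside a `for` loop over the original indices shifts the positions of everything not yet checked, and the loop can then index past the end of the shrunken matrix. A self-contained sketch of the same idea flags the zero-variance rows and columns first and drops them in one step, repeating until nothing changes (removing a row can make a column constant). `drop_invariant()` is a hypothetical name, and it returns the result rather than assigning it globally with `<<-`.

```r
# Hypothetical helper: repeatedly drop zero-variance rows and columns of a
# numeric matrix until none remain
drop_invariant <- function(m) {
  repeat {
    # sd() of a constant vector is 0; which() skips NA results
    rows <- which(apply(m, 1, sd) == 0)
    if (length(rows)) m <- m[-rows, , drop = FALSE]
    cols <- which(apply(m, 2, sd) == 0)
    if (length(cols)) m <- m[, -cols, drop = FALSE]
    # Stop once a full pass removes nothing
    if (length(rows) == 0 && length(cols) == 0) return(m)
  }
}

# Made-up test matrix
m <- rbind(c(1, 2, 3, 4),
           c(5, 5, 5, 5),
           c(2, 2, 7, 1),
           c(3, 2, 0, 6))
result <- drop_invariant(m)
```

Here the second row is constant, and once it is gone the second column (2, 2, 2) becomes constant too, so `result` is a 3 x 3 matrix.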
2020 Sep 24
1
How to use `[` without evaluating the arguments.
...which(colnames(colData) %in% colIDs)
lockBinding('colIDs', internals)
# Assemble the pseudo row and column names for the LongTable
.pasteColons <- function(...) paste(..., collapse=':')
rowData[, `:=`(.rownames=mapply(.pasteColons, transpose(.SD))), .SDcols=internals$rowIDs]
colData[, `:=`(.colnames=mapply(.pasteColons, transpose(.SD))), .SDcols=internals$colIDs]
return(.LongTable(rowData=rowData, colData=colData,
assays=assays, metadata=metadata,
.intern=internals))
}
I have also defined a subset...
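The snippet joins the ID columns row by row with `mapply(.pasteColons, transpose(.SD))`. Because `paste()` is vectorised over its arguments, a common data.table idiom builds the same kind of key in a single call with `do.call(paste, c(.SD, sep = ":"))`; for atomic ID columns it should produce the same strings. The table and column names below are hypothetical.

```r
library(data.table)

# Hypothetical rowData with two ID columns and one unrelated column
rowData <- data.table(drug = c("a", "b"), dose = c(1, 10), extra = c(TRUE, FALSE))
rowIDs <- c("drug", "dose")

# c(.SD, sep = ":") turns the selected columns into separate paste() arguments,
# so each row's ID values are joined with ":" in one vectorised call
rowData[, .rownames := do.call(paste, c(.SD, sep = ":")), .SDcols = rowIDs]
```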
2013 Mar 13
3
loop in a data.table
Hi everyone,
I have a data.table called "data" with many columns which I want to
group by column1 using data.table, given how fast it is.
The problem with looping a data.table is that data.table does not like
quotations to define the column names (e.g. "col2" instead of col2).
I found a way around which is to use get("col2"), which works fine but
the
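A minimal sketch of the usual alternative to `get()`, assuming the goal is to apply the same summary to several columns named by strings: `.SDcols` accepts a character vector of column names, and `by` accepts the grouping column as a string. The table below is a hypothetical stand-in for the poster's `data`.

```r
library(data.table)

# Hypothetical table standing in for "data" in the post
data <- data.table(column1 = c("a", "a", "b"),
                   col2    = 1:3,
                   col3    = c(10, 20, 30))

# Column names held as strings, e.g. built inside a loop
cols <- c("col2", "col3")

# .SDcols takes a character vector, so no get() is needed:
# every selected column is summed within each column1 group
res <- data[, lapply(.SD, sum), by = column1, .SDcols = cols]

# `by` also accepts the grouping column as a string
res2 <- data[, lapply(.SD, sum), by = "column1", .SDcols = cols]

# For a single column named in a variable, [[ ]] avoids get() as well
col <- "col2"
total_col2 <- sum(data[[col]])
```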
2024 Dec 11
1
Cores hang when calling mcapply
...ID_Key (since they share same columns now)
> final_dt <- rbindlist(list(out1, out2), use.names = TRUE, fill = TRUE)
>
> # Step E: If needed, summarize across ID_Key to sum presence indicators
> final_result <- final_dt[, lapply(.SD, sum, na.rm = TRUE), by =
> ID_Key, .SDcols = setdiff(names(final_dt), "ID_Key")]
>
> # note that final_result should now contain summed presence/absence (0/1) indicators.
Hope this helps!
gregg
somewhereinArizona
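A small sketch of the summarising step, assuming `final_dt` holds one row per record with 0/1 indicator columns and an `ID_Key` column; the values are invented. Because the step sums indicators, an ID that matches a term in several rows ends up with a count above 1; the optional last step (an assumption about what is wanted) caps the sums back to 0/1 presence flags.

```r
library(data.table)

# Hypothetical stand-in for final_dt: 0/1 indicator columns, several rows
# per ID_Key, and one missing value (all values invented)
final_dt <- data.table(ID_Key = c("k1", "k1", "k2"),
                       term_a = c(1L, 0L, 0L),
                       term_b = c(1L, 1L, NA))

# Sum every column except the key within each ID_Key group;
# na.rm = TRUE treats a missing indicator as absent
final_result <- final_dt[, lapply(.SD, sum, na.rm = TRUE),
                         by = ID_Key,
                         .SDcols = setdiff(names(final_dt), "ID_Key")]

# Optional: cap the sums at 1 if strict presence/absence flags are wanted
flag_cols <- setdiff(names(final_result), "ID_Key")
final_result[, (flag_cols) := lapply(.SD, pmin, 1L), .SDcols = flag_cols]
```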
2024 Dec 12
1
Cores hang when calling mcapply
...out1[, (col) := 0]
for (col in out2_missing) out2[, (col) := 0]
setcolorder(out1, all_cols)
setcolorder(out2, all_cols)
final_dt <- rbindlist(list(out1, out2), use.names = TRUE, fill = TRUE)
final_result <- as_tibble(final_dt[, lapply(.SD, sum, na.rm = TRUE), by = ID_Key, .SDcols = setdiff(names(final_dt), "ID_Key")])
Worth noting however:
* I unfortunately had to keep the multicore parameters for the janitor package to use make_clean_names() because it just took too long to run it on the full dataframe, but deploying data.table CONSIDERABLY reduces the time...
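The column-alignment step can be sketched as follows, with hypothetical tables `out1` and `out2` that carry different indicator columns: each missing column is added as 0 (absent), both tables are put in the same column order with `setcolorder()`, and the stacked result is summed per `ID_Key`. Once the columns are aligned, `fill = TRUE` is no longer needed, so the sketch omits it.

```r
library(data.table)

# Hypothetical partial results with different indicator columns
out1 <- data.table(ID_Key = c("k1", "k2"), term_a = c(1L, 0L))
out2 <- data.table(ID_Key = c("k1", "k3"), term_b = c(1L, 1L))

all_cols <- union(names(out1), names(out2))

# Add each missing indicator column as 0 (absent) so both tables match
for (col in setdiff(all_cols, names(out1))) out1[, (col) := 0L]
for (col in setdiff(all_cols, names(out2))) out2[, (col) := 0L]
setcolorder(out1, all_cols)
setcolorder(out2, all_cols)

# Stack by name, then sum the indicators per ID_Key
final_dt <- rbindlist(list(out1, out2), use.names = TRUE)
final_result <- final_dt[, lapply(.SD, sum), by = ID_Key,
                         .SDcols = setdiff(names(final_dt), "ID_Key")]
```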
2024 Dec 11
1
Cores hang when calling mcapply
...by ID_Key (since they share same columns now)
> final_dt <- rbindlist(list(out1, out2), use.names = TRUE, fill = TRUE)
>
> # Step E: If needed, summarize across ID_Key to sum presence indicators
> final_result <- final_dt[, lapply(.SD, sum, na.rm = TRUE), by =
> ID_Key, .SDcols = setdiff(names(final_dt), "ID_Key")]
>
> # note that final_result should now contain summed presence/absence (0/1) indicators.
Hope this helps!
gregg
somewhereinArizona
The information in this e-mail is intended only for the ...
2024 Dec 11
1
Cores hang when calling mcapply
...ist(list(out1, out2), use.names = TRUE, fill = TRUE)
> > >
> > > # Step E: If needed, summarize across ID_Key to sum presence indicators
> > > final_result <- final_dt[, lapply(.SD, sum, na.rm = TRUE), by =
> > > ID_Key, .SDcols = setdiff(names(final_dt), "ID_Key")]
> > >
> > > # note that final_result should now contain summed presence/absence (0/1) indicators.
> >
> > Hope this helps!
> > gregg
> > somewh...
2024 Dec 12
1
Cores hang when calling mcapply
...for (col in out2_missing) out2[, (col) := 0]
>     setcolorder(out1, all_cols)
>     setcolorder(out2, all_cols)
>     final_dt <- rbindlist(list(out1, out2), use.names = TRUE, fill = TRUE)
>     final_result <- as_tibble(final_dt[, lapply(.SD, sum, na.rm = TRUE), by = ID_Key, .SDcols = setdiff(names(final_dt), "ID_Key")])
>
> Worth noting however:
>
> - I unfortunately had to keep the `multicore` parameters for the `janitor` package to use `make_clean_names()` because it just took too long to run it on the full dataframe, but deployi...
2024 Dec 11
1
Cores hang when calling mcapply
...ow)
> > final_dt <- rbindlist(list(out1, out2), use.names = TRUE, fill = TRUE)
> >
> > # Step E: If needed, summarize across ID_Key to sum presence indicators
> > final_result <- final_dt[, lapply(.SD, sum, na.rm = TRUE), by =
> > ID_Key, .SDcols = setdiff(names(final_dt), "ID_Key")]
> >
> > # note that final_result should now contain summed presence/absence (0/1) indicators.
>
> Hope this helps!
> gregg
> somewhereinArizona
2024 Dec 11
1
Cores hang when calling mcapply
...by ID_Key (since they share same columns now)
> final_dt <- rbindlist(list(out1, out2), use.names = TRUE, fill = TRUE)
>
> # Step E: If needed, summarize across ID_Key to sum presence indicators
> final_result <- final_dt[, lapply(.SD, sum, na.rm = TRUE), by =
> ID_Key, .SDcols = setdiff(names(final_dt), "ID_Key")]
>
> # note that final_result should now contain summed presence/absence (0/1) indicators.
Hope this helps!
gregg
somewhereinArizona
2012 Sep 14
3
aggregate() runs out of memory
I have a large data.frame Z (2,424,185,944 bytes, 10,256,441 rows, 17 columns).
I want to get the result of
table(aggregate(Z$V1, FUN = length, by = list(id=Z$V2))$x)
Alas, aggregate has been running for ~30 minutes, RSS is 14G, VIRT is
24.3G, and there is no end in sight.
Both V1 and V2 are character vectors (not factors).
Is there anything I could do to speed this up?
Thanks.
--
Sam Steingold
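Because `FUN = length` only counts the rows in each `V2` group, the inner `aggregate()` call carries the same counts as `table(Z$V2)`, and the whole expression reduces to `table(table(Z$V2))`, which avoids building a data frame of groups. A sketch on a small hypothetical stand-in for `Z`, with an optional data.table version that groups twice with `.N`; speed on the full 10M-row data is not demonstrated here.

```r
# Small hypothetical stand-in for Z (the real one has ~10M rows)
set.seed(42)
Z <- data.frame(V1 = sample(letters, 1000, replace = TRUE),
                V2 = sample(sprintf("id%03d", 1:200), 1000, replace = TRUE),
                stringsAsFactors = FALSE)

# Original expression: rows per V2 value via aggregate(), then tabulate
slow <- table(aggregate(Z$V1, FUN = length, by = list(id = Z$V2))$x)

# FUN = length only counts rows per group, so the inner step is table(Z$V2);
# tabulating that table gives "how many ids occur k times" directly
fast <- table(table(Z$V2))

# Optional data.table version: count rows per V2, then ids per count
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(Z)
  size_dist <- dt[, .(n = .N), by = V2][, .(ids = .N), keyby = n]
}
```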
2024 Dec 11
2
Cores hang when calling mcapply
Hi R users.
Apologies for the lack of concrete examples: the datasets are large, and I believe their size is the issue.
I have multiple very large datasets for which I need to generate 0/1 absence/presence columns.
Some include over 200M rows, with two columns that need presence/absence columns based on the strings contained within them; as an example, one set has ~29k unique values and the
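One way to build such 0/1 columns without parallel workers, sketched under the assumption that the strings can first be put in long format (one row per ID and string): `dcast()` from data.table counts each string per ID and fills absent combinations with 0, and the counts are then turned into 0/1 flags. The table and column names are hypothetical. With ~29k distinct strings this produces ~29k columns, so memory may still be the limiting factor.

```r
library(data.table)

# Hypothetical long-format input: one row per (ID_Key, string) occurrence,
# with a helper column to count
dt <- data.table(ID_Key = c("k1", "k1", "k2", "k3"),
                 value  = c("foo", "bar", "foo", "foo"),
                 hit    = 1L)

# dcast() counts the rows for each ID_Key/value pair (fun.aggregate = length);
# pairs that never occur are filled with length(integer(0)), i.e. 0
wide <- dcast(dt, ID_Key ~ value, fun.aggregate = length, value.var = "hit")

# Turn counts into 0/1 presence flags
val_cols <- setdiff(names(wide), "ID_Key")
wide[, (val_cols) := lapply(.SD, function(v) as.integer(v > 0)), .SDcols = val_cols]
```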