thr3ads.net - similar to: "str on large data.frame is slow on factors with many levels"

write.matrix.csr data conversion

2012 Aug 27

> write.matrix.csr(mx, y = y, file = file)
> table(y)
      0       1
5194394   23487
$ cut -d' ' -f1 f | sort | uniq -c
  23487 2
5194394 1
i.e., 0 is written as 1 and 1 is written as 2. Why? Is there a way to disable this?
-- Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000 http://www.childpsy.net/ http://palestinefacts.org

entropy package: how to compute mutual information?

2012 Feb 13

Suppose I have two factor vectors:
x <- as.factor(c("a","b","a","c","b","c"))
y <- as.factor(c("b","a","a","c","c","b"))
I can compute their entropies:
entropy(table(x))
[1] 1.098612
using library(entropy), but it is not clear how to compute their mutual information
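One way to approach the question, as a minimal base-R sketch rather than the entropy package's own interface: mutual information can be written as I(X;Y) = H(X) + H(Y) - H(X,Y), where H(X,Y) is the entropy of the joint contingency table. The helper name H below is an assumption introduced for illustration only.

```r
x <- as.factor(c("a","b","a","c","b","c"))
y <- as.factor(c("b","a","a","c","c","b"))

# Shannon entropy in nats of a table of counts (hypothetical helper);
# empty cells are dropped so that 0 * log(0) does not produce NaN.
H <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# I(X;Y) = H(X) + H(Y) - H(X,Y), using the joint table for the last term.
mi <- H(table(x)) + H(table(y)) - H(table(x, y))
```

For these two vectors each marginal is uniform over three levels and the six observed pairs are all distinct, so the result works out to 2*log(3) - log(6) = log(1.5).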

sum(hist$density) == 2 ?!

2012 Mar 14

> x <- rnorm(1000)
> h <- hist(x,plot=FALSE)
> sum(h$density)
[1] 2
-----------------------------
shouldn't it be 1?!
> h <- hist(x,plot=FALSE, breaks=(-4:4))
> sum(h$density)
[1] 1
-----------------------------
now it's 1. why?!
-- Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000 http://www.childpsy.net/ http://www.memritv.org
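A short sketch of what is likely going on: the density values are heights, so it is the area (height times bin width) that sums to 1, not the heights themselves. With unit-width breaks the two coincide; with bins of width 0.5 the heights sum to 2. The seed below is an arbitrary choice for reproducibility.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
x <- rnorm(1000)
h <- hist(x, plot = FALSE)

widths <- diff(h$breaks)         # width of each bin
area <- sum(h$density * widths)  # total area under the histogram
```

Here area equals 1 up to floating-point error, whatever bin width hist() picks.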

matrix.csr %*% matrix --> matrix

2012 Aug 27

When a sparse matrix is multiplied by a regular one, the result is usually not sparse. However, when matrix.csr is multiplied by a regular matrix in R, a matrix.csr is produced. Is there a way to avoid this? Thanks! -- Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000 http://www.childpsy.net/ http://palestinefacts.org http://truepeace.org

recover lost global function

2012 Apr 04

Since R has the same namespace for functions and variables,
> c <- 1
kills the global function, which can be restored by
> c <- get("c",mode="function")
Is there a way to prevent R from overriding globals, or at least warning when I do that, or at least warning when I replace a functional value with a non-functional one? Thanks.
-- Sam Steingold (http://sds.podval.org/)
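A quick check of the premise, as a sketch: when R evaluates a call such as c(2, 3) it looks for a function named c and skips non-function bindings, so the assignment masks the name without making the base function unreachable. Removing the masking variable with rm() is enough to undo it.

```r
c <- 1             # creates a variable named c that masks base::c
v <- c(2, 3)       # a call still resolves to the function, not the variable
masked <- c        # plain lookup sees the variable: 1

rm(c)              # drop the masking variable
restored <- is.function(c)   # c refers to base::c again
```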

LiblineaR: read/write model files?

2012 Jul 13

How do I read/write liblinear models to files? E.g., if I train a model using the command line interface, I might want to load it into R to look at the histogram of the weights. Or I might want to train a model in R and then apply it using a command line interface.
-- Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000 http://www.childpsy.net/

apply --> data.frame

2012 Aug 30

Is there a way for an apply-type function to return a data frame? The closest thing I can think of is
foo <- as.data.frame(sapply(...))
names(foo) <- c(....)
Is there a more "elegant" way? Thanks!
-- Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000 http://www.childpsy.net/ http://palestinefacts.org http://dhimmi.com http://honestreporting.com
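One common base-R pattern, sketched under the assumption that each call can return a one-row data frame: build the rows with lapply() and stack them with do.call(rbind, ...). The column names id and square are made up for illustration.

```r
# Each element of the list is a one-row data frame with named columns.
rows <- lapply(1:3, function(i) data.frame(id = i, square = i^2))

# Stack the rows; the column names carry over, so no names(foo) <- step is needed.
foo <- do.call(rbind, rows)
```

This keeps per-column types intact, whereas sapply() first collapses everything into a single-type matrix.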

aggregate help

2012 Sep 20

I want to count attributes of IDs:
--8<---------------cut here---------------start------------->8---
z <- data.frame(id=c(10,20,10,30,10,20),
                a1=c("a","b","a","c","b","b"),
                a2=c("x","y","x","z","z","y"),

create a data frame with the given column names

2011 Feb 16

How do I create a data frame with the given column names _NOT KNOWN IN ADVANCE_? I.e., I have a vector of strings for names and I want to get an _EMPTY_ data frame with these column names. Is it at all possible?
-- Sam Steingold (http://sds.podval.org/) on CentOS release 5.3 (Final) http://openvotingconsortium.org http://pmw.org.il http://memri.org http://mideasttruth.com
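A minimal sketch of one way to do this: build a zero-row matrix whose column names come from the vector, then convert it. The names in col_names are placeholders.

```r
col_names <- c("alpha", "beta", "gamma")   # placeholder names, known only at run time

# Zero rows, one column per name; dimnames supplies the column names.
empty <- as.data.frame(
  matrix(nrow = 0, ncol = length(col_names),
         dimnames = list(NULL, col_names))
)
```

The columns start out as logical; their types change once rows of other types are bound on.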

select rows with identical columns from a data frame

2013 Jan 18

I have a data frame with several columns. I want to select the rows with no NAs (as with complete.cases) and all columns identical. E.g., for
--8<---------------cut here---------------start------------->8---
> f <- data.frame(a=c(1,NA,NA,4),b=c(1,NA,3,40),c=c(1,NA,5,40))
> f
   a  b  c
1  1  1  1
2 NA NA NA
3 NA  3  5
4  4 40 40
--8<---------------cut
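A sketch of one possible selection, assuming all columns are of the same type: test each row for the absence of NAs and for having a single distinct value.

```r
f <- data.frame(a = c(1, NA, NA, 4), b = c(1, NA, 3, 40), c = c(1, NA, 5, 40))

# TRUE for rows with no NA and exactly one distinct value across columns.
keep <- apply(f, 1, function(r) !anyNA(r) && length(unique(r)) == 1)

result <- f[keep, , drop = FALSE]
```

For the example above only the first row qualifies: row 4 is complete but its columns differ.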

cedta decided 'igraph' wasn't data.table aware

2013 Apr 21

Hi, what does this mean?
--8<---------------cut here---------------start------------->8---
> graph <- graph.data.frame(merged[!v,], vertices=ve, directed=FALSE)
cedta decided 'igraph' wasn't data.table aware
cedta decided 'igraph' wasn't data.table aware
cedta decided 'igraph' wasn't data.table aware
cedta decided 'igraph' wasn't

per-vertex statistics of edge weights

2012 Aug 15

I have a graph with edge and vertex weights, stored in two data frames:
--8<---------------cut here---------------start------------->8---
vertices <- data.frame(vertex=c("a","b","c","d"),weight=c(1,2,1,3))
edges <-

time series from timed data

2011 Mar 18

Hi, I have data with multiple sub-second entries:
2011/03/15 09:32:15.035619,-0.403103,1.09664,48.6,126.92,117.32
2011/03/15 09:32:15.069331,-0.39851,1.09874,48.6,126.92,117.32
2011/03/15 09:32:15.289135,-0.402463,1.10084,48.59,126.92,117.32
2011/03/15 09:32:15.296110,-0.450244,1.10063,48.59,126.92,117.32
2011/03/15 09:32:15.451358,-0.438813,1.10273,48.59,126.93,117.32
2011/03/15

igraph: decompose.graph: Error: protect(): protection stack overflow

2012 Mar 20

I just got this error:
> library(igraph)
> comp <- decompose.graph(gr)
Error: protect(): protection stack overflow
Error: protect(): protection stack overflow
>
What can I do? The digraph is, indeed, large (300,000 vertexes), but there are very many very small components (which I would rather not discard).
PS. The doc for decompose.graph does not say which mode is the default.

merge a list of data frames

2012 Sep 06

I have a list of data frames:
> str(data)
List of 4
 $ :'data.frame': 700773 obs. of 3 variables:
  ..$ V1: chr [1:700773] "200130446465779" "200070050127778" "200030633708779" "200010587002779" ...
  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ V3: num [1:700773] 1 1 1 1 1 ...
 $ :'data.frame': 700773 obs. of 3 variables:
  ..$
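The post is cut off before it states the merge key, so the following is only a sketch under the assumption that the frames share an ID column (called V1 here) and otherwise have distinct column names. Reduce() folds merge() over the list pairwise; the toy data is made up.

```r
# Toy stand-in for the list in the post; column names x, y, z are illustrative.
dfs <- list(
  data.frame(V1 = c("a", "b", "c"), x = 1:3),
  data.frame(V1 = c("a", "b", "c"), y = 4:6),
  data.frame(V1 = c("a", "c"),      z = 7:8)
)

# Merge the first two frames, then merge the result with the third, and so on.
merged <- Reduce(function(left, right) merge(left, right, by = "V1"), dfs)
```

By default merge() keeps only keys present in every frame; passing all = TRUE inside the function would keep unmatched keys and fill the gaps with NA.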

the value of the last expression

2012 Feb 10

Is there an analogue of the Common Lisp "*" variable, which contains the value of the last expression? E.g., in lisp:
> (+ 1 2)
3
> *
3
I wish I could recover the value of the last expression without re-evaluating it. Thanks.
-- Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000 http://www.childpsy.net/ http://camera.org http://ffii.org

how to concatenate factor vectors?

2012 Oct 18

How do I concatenate two vectors of factors?
--8<---------------cut here---------------start------------->8---
> a <- factor(5:1,levels=1:9)
> b <- factor(9:1,levels=1:9)
> str(c(a,b))
 int [1:14] 5 4 3 2 1 9 8 7 6 5 ...
> str(unlist(list(a,b),use.names=FALSE))
 Factor w/ 9 levels "1","2","3","4",..: 5 4 3 2 1 9 8 7 6 5 ...
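Besides the unlist() route shown in the post, a sketch of an explicit alternative is to go through the labels and rebuild the factor with the combined level set. This does not depend on how c() treats factors.

```r
a <- factor(5:1, levels = 1:9)
b <- factor(9:1, levels = 1:9)

# Concatenate the labels, then rebuild the factor over the union of levels.
ab <- factor(c(as.character(a), as.character(b)),
             levels = union(levels(a), levels(b)))
```

The same approach works when the two factors have different level sets, since union() keeps every level once, in order of first appearance.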

removing NA from a data frame

2006 Mar 17

Hi, It appears that deal does not support missing values (NA), so I need to remove them (NAs) from my data frame. how do I do this? (I am very new to R, so a detailed step-by-step explanation with code samples would be nice). Some columns (variables) have quite a few NAs, so I would rather drop the whole column than sacrifice all the rows (observations) which have NA in that column. How do I
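A step-by-step sketch of the approach described: first drop the columns that are mostly NA, then drop the remaining rows that still contain an NA. The toy data and the 50% threshold are assumptions chosen for illustration.

```r
df <- data.frame(a = c(1, 2, 3, 4),
                 b = c(NA, NA, NA, 1),
                 c = c("x", NA, "y", "z"))

# Step 1: share of missing values per column.
na_share <- colMeans(is.na(df))

# Step 2: keep only columns with at most 50% NAs (threshold is an assumption).
kept <- df[, na_share <= 0.5, drop = FALSE]

# Step 3: drop the rows that still contain any NA.
clean <- na.omit(kept)
```

Here column b (75% missing) is dropped, and afterwards only the second row is lost instead of three.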

when to use `which'?

2011 Jul 12

When do I need to use which()?
> a <- c(1,2,3,4,5,6)
> a
[1] 1 2 3 4 5 6
> a[a==4]
[1] 4
> a[which(a==4)]
[1] 4
> which(a==4)
[1] 4
> a[which(a>2)]
[1] 3 4 5 6
> a[a>2]
[1] 3 4 5 6
>
seems unnecessary...
-- Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) X 11.0.60900031 http://jihadwatch.org http://palestinefacts.org http://mideasttruth.com
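One case where the two forms differ, as a short sketch: when the vector contains NA, the comparison yields NA at that position. Logical indexing passes the NA through, while which() returns only the positions that are TRUE.

```r
a <- c(1, NA, 3, 4)

by_logical <- a[a > 2]         # the NA in the condition yields an NA element
by_which   <- a[which(a > 2)]  # which() keeps only TRUE positions
```

by_logical has three elements (NA, 3, 4), while by_which has two (3, 4).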

"unsparse" a vector

2012 Feb 08

Suppose I have a vector of strings:
c("A1B2","A3C4","B5","C6A7B8")
[1] "A1B2"   "A3C4"   "B5"     "C6A7B8"
where each string is a sequence of <column><value> pairs (fixed width, in this example both value and name are 1 character, in reality the column name is 6 chars and value is 2 digits). I need to
