thr3ads.net - similar to: "cor() on sets of vectors"

Displaying 20 results from an estimated 1000 matches similar to: "cor() on sets of vectors"

entropy package: how to compute mutual information?

2012 Feb 13

Suppose I have two factor vectors:

    x <- as.factor(c("a","b","a","c","b","c"))
    y <- as.factor(c("b","a","a","c","c","b"))

Using library(entropy), I can compute their entropies:

    entropy(table(x))
    [1] 1.098612

but it is not clear how to compute their mutual information
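One way to get the mutual information without leaving base R is the identity I(X;Y) = H(X) + H(Y) - H(X,Y), applied to plug-in (empirical) entropies in nats. This is only a sketch: the helper name plugin_entropy is made up here, and it is not a function of the entropy package (which, if memory serves, also offers mi.empirical() for a joint count table).

```r
# Plug-in (maximum likelihood) entropy in nats from a table of counts.
# plugin_entropy is a hypothetical helper, not part of any package.
plugin_entropy <- function(counts) {
  p <- counts[counts > 0] / sum(counts)  # drop empty cells, normalise
  -sum(p * log(p))
}

x <- as.factor(c("a","b","a","c","b","c"))
y <- as.factor(c("b","a","a","c","c","b"))

# I(X;Y) = H(X) + H(Y) - H(X,Y); table(x, y) is the joint count table
mi <- plugin_entropy(table(x)) + plugin_entropy(table(y)) -
  plugin_entropy(table(x, y))
```

For these two vectors every (x, y) pair occurs exactly once, so the joint entropy is log(6) and the result is 2*log(3) - log(6) = log(1.5).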

the value of the last expression

2012 Feb 10

Is there an analogue of the Common Lisp "*" variable, which contains the value of the last expression? E.g., in Lisp:

    > (+ 1 2)
    3
    > *
    3

I wish I could recover the value of the last expression without re-evaluating it. Thanks.

-- Sam Steingold (http://sds.podval.org/)

list to matrix?

2012 Dec 04

How do I convert a list to a matrix?

--8<---------------cut here---------------start------------->8---
list(c(50000, 101), c(1e+05, 46), c(150000, 31), c(2e+05, 17), c(250000, 19),
     c(3e+05, 11), c(350000, 12), c(4e+05, 25), c(450000, 19), c(5e+05, 16))
as.matrix(a)
      [,1]
 [1,] Numeric,2
 [2,] Numeric,2
 [3,] Numeric,2
 [4,] Numeric,2
 [5,] Numeric,2
 [6,] Numeric,2
 [7,]
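A common idiom for this, assuming every list element has the same length, is to bind the elements together as rows with do.call(rbind, ...). The variable name a follows the post; m is just a name chosen for this sketch.

```r
a <- list(c(50000, 101), c(1e+05, 46), c(150000, 31), c(2e+05, 17),
          c(250000, 19), c(3e+05, 11), c(350000, 12), c(4e+05, 25),
          c(450000, 19), c(5e+05, 16))

# Each list element becomes one row of a numeric matrix
m <- do.call(rbind, a)
dim(m)  # 10 rows, 2 columns
```

If the elements should become columns instead, do.call(cbind, a) gives the transposed layout.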

"unsparse" a vector

2012 Feb 08

Suppose I have a vector of strings:

    c("A1B2","A3C4","B5","C6A7B8")
    [1] "A1B2" "A3C4" "B5" "C6A7B8"

where each string is a sequence of <column><value> pairs (fixed width; in this example both the name and the value are 1 character, in reality the column name is 6 characters and the value is 2 digits). I need to

plot means ?

2011 Jul 11

Hi, I need this plot. Given x, y (numerical vectors of length N), plot xi vs. mean(yj such that |xj - xi| < epsilon), i.e. a running mean. Alternatively, discretize x as if for histogram plotting and plot the mean of y over the center of each histogram group. Is there a simple way? Thanks!

-- Sam Steingold (http://sds.podval.org/)
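The second variant (bin x, then average y per bin) can be sketched with cut() and tapply(). The data and break points below are invented for illustration, and the names bins, means, and mids are arbitrary.

```r
# Toy data; in practice x and y are the two numeric vectors of length N
x <- c(0.1, 0.4, 0.6, 0.9, 1.2, 1.8)
y <- c(1, 3, 5, 7, 9, 11)

breaks <- c(0, 1, 2)                 # assumed bin edges
bins   <- cut(x, breaks)             # bin membership of each x
means  <- tapply(y, bins, mean)      # mean of y within each bin
mids   <- breaks[-length(breaks)] + diff(breaks) / 2  # bin centers
```

Calling plot(mids, means, type = "b") then draws the per-bin means over the bin centers. Bins that contain no points come out as NA in means.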

LiblineaR: read/write model files?

2012 Jul 13

How do I read/write liblinear models to files? E.g., if I train a model using the command line interface, I might want to load it into R to look at the histogram of the weights. Or I might want to train a model in R and then apply it using a command line interface.

-- Sam Steingold (http://sds.podval.org/)

count.fields inconsistent with read.table?

2012 Feb 24

Hi, batch is a vector of lines returned by readLines from a NL-line-terminated file, here is the relevant section: ========================================================= AA BB CC DD EE FF GG H H JJ KK LL MM ========================================================= as you can see, a line is corrupt; two CRLF's are inserted. This is okay, I drop the bad lines, at least I hope I do:

non-consing count

2013 Jan 04

Hi, to count vector elements with some property, the standard idiom seems to be length(which):

--8<---------------cut here---------------start------------->8---
x <- c(1,1,0,0,0)
count.0 <- length(which(x == 0))
--8<---------------cut here---------------end--------------->8---

However, this approach allocates and discards 2 vectors: a logical vector of length=length(x) and an
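A slightly cheaper idiom is to sum the logical vector directly, since TRUE counts as 1. This avoids the index vector that which() builds, although the logical vector from the comparison is still allocated, so it is not fully non-consing. The name count.0.sum is chosen here only to sit next to the post's count.0.

```r
x <- c(1, 1, 0, 0, 0)

# sum() over a logical vector counts the TRUE values
count.0.sum <- sum(x == 0)

# Same answer as the length(which(...)) idiom from the post
count.0 <- length(which(x == 0))
```

If x may contain NA, sum(x == 0, na.rm = TRUE) matches what length(which(...)) returns, because which() silently skips NA.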

naiveBayes: slow predict, weird results

2012 Feb 10

I did this:

    nb <- naiveBayes(users, platform)
    pl <- predict(nb,users)
    nrow(users) ==> 314781
    ncol(users) ==> 109

1. naiveBayes() was quite fast (~20 seconds), while predict() was slow (tens of minutes). Why?
2. The predict results were completely off the mark (quite the opposite of the expected overfitting). Suffice it to show the tables: pl: android blackberry ipad

removing NA from a data frame

2006 Mar 17

Hi, It appears that deal does not support missing values (NA), so I need to remove them (NAs) from my data frame. how do I do this? (I am very new to R, so a detailed step-by-step explanation with code samples would be nice). Some columns (variables) have quite a few NAs, so I would rather drop the whole column than sacrifice all the rows (observations) which have NA in that column. How do I
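A minimal sketch of that two-step cleanup: first drop the columns whose share of NAs is too high, then drop the remaining incomplete rows. The toy data frame, the 50% threshold, and the names na_share, keep, and clean are all assumptions made for this example.

```r
# Toy data frame; column b is mostly NA, column c has a single NA
df <- data.frame(a = c(1, 2, 3, 4),
                 b = c(NA, NA, NA, 1),
                 c = c(1, NA, 3, 4))

# Fraction of missing values per column
na_share <- colMeans(is.na(df))

# Keep columns with at most 50% NAs (threshold chosen arbitrarily here)
keep <- na_share <= 0.5
df2  <- df[, keep, drop = FALSE]

# Drop the rows that still contain an NA
clean <- na.omit(df2)
```

Here column b is removed as a whole, and only the one row with an NA in column c is sacrificed.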

drop zero slots from table?

2012 Sep 19

I find myself doing

--8<---------------cut here---------------start------------->8---
tab <- table(...)
tab <- tab[tab > 0]
tab <- sort(tab,decreasing=TRUE)
--8<---------------cut here---------------end--------------->8---

all the time. I am wondering if the "drop 0" (and maybe even sort?) can be effected by some magic argument to table() which I fail to discover
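When the zero counts come from unused factor levels, one alternative is to drop those levels before tabulating, so the zeros never appear. This is a sketch under that assumption; for a multi-way table with empty cells, the tab[tab > 0] filter from the post is still needed. The factor f is invented for the example.

```r
# Factor with an unused level "c", which table() reports as a zero count
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))
table(f)

# droplevels() removes unused levels, so no zero slots are produced
tab <- sort(table(droplevels(f)), decreasing = TRUE)
```

The result holds only the levels that actually occur, ordered from most to least frequent.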

as.data.frame(do.call(rbind,lapply)) produces something weird

2012 Nov 09

The following code:

--8<---------------cut here---------------start------------->8---
> myfun <- function (x) list(x=x,y=x*x)
> z <- as.data.frame(do.call(rbind,lapply(1:3,function(x) c(a=paste("a",x,sep=""),as.list(unlist(list(b=myfun(x),c=myfun(x*x*x))))))))
> z
   a b.x b.y c.x c.y
1 a1   1   1   1   1
2 a2   2   4   8  64
3 a3   3   9  27 729

aggregate() runs out of memory

2012 Sep 14

I have a large data.frame Z (2,424,185,944 bytes, 10,256,441 rows, 17 columns). I want to get the result of

    table(aggregate(Z$V1, FUN = length, by = list(id=Z$V2))$x)

Alas, aggregate has been running for ~30 minutes, RSS is 14G, VIRT is 24.3G, and there is no end in sight. Both V1 and V2 are characters (not factors). Is there anything I could do to speed this up? Thanks.

-- Sam Steingold
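Since FUN = length only counts the rows in each V2 group, the same distribution of group sizes can be obtained by tabulating V2 and then tabulating those counts, which skips aggregate() entirely. The small data frame below is a stand-in for the real Z, and the names slow and fast are arbitrary; whether this is fast enough on 10 million rows is not something this sketch demonstrates.

```r
# Small stand-in for the large data frame from the post
Z <- data.frame(V1 = c("p", "q", "r", "s", "t", "u", "v"),
                V2 = c("a", "a", "b", "c", "c", "c", "d"),
                stringsAsFactors = FALSE)

# The expression from the post
slow <- table(aggregate(Z$V1, FUN = length, by = list(id = Z$V2))$x)

# Group sizes via table(), then the distribution of those sizes
fast <- table(table(Z$V2))
```

Both give two groups of size 1, one of size 2, and one of size 3 for this toy input.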

summary for factors is not very informative

2011 Feb 15

summary() for a factor prints:

    ColName
    SNDK   :  72
    VXX    :  36
    MWW    :  30
    ACI    :  28
    FRO    :  28
    (Other):1801

It would have been much more useful if it additionally printed frequency stats, as if by

    summary(aggregate(frame$ColName,by=list(frame$ColName),FUN=length)$x)

-- Sam Steingold (http://sds.podval.org/)

recover lost global function

2012 Apr 04

Since R has the same namespace for functions and variables,

    > c <- 1

kills the global function, which can be restored by

    > c <- get("c",mode="function")

Is there a way to prevent R from overriding globals, or at least warning when I do that, or at least warning when I replace a functional value with a non-functional one? Thanks.

-- Sam Steingold (http://sds.podval.org/)
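For what it is worth, the assignment only masks the base function in the current environment rather than destroying it: when a name is used in call position, R skips non-function bindings, and removing the masking variable makes the name resolve to base::c again. A short sketch (the variable v is just for illustration):

```r
c <- 1                     # creates a variable that masks base::c by name

# In call position R looks for a function, so base::c is still found
v <- c(2, 3)

masked <- is.function(c)   # FALSE: as a plain value, c is now the number 1

rm(c)                      # remove the masking variable
restored <- is.function(c) # TRUE: the name resolves to base::c again
```

This does not add the warning asked for in the post; it only shows that rm() is enough to undo the masking.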

all.equal: subscript out of bounds

2011 Feb 15

When I do

    > all(all$X.Time == all$Y.Time);
    [1] TRUE

as expected, but

    > all.equal(all$X.Time,all$Y.Time);
    Error in target[[i]] : subscript out of bounds

Why? Thanks!

-- Sam Steingold (http://sds.podval.org/)

cedta decided 'igraph' wasn't data.table aware

2013 Apr 21

Hi, what does this mean?

--8<---------------cut here---------------start------------->8---
> graph <- graph.data.frame(merged[!v,], vertices=ve, directed=FALSE)
cedta decided 'igraph' wasn't data.table aware
cedta decided 'igraph' wasn't data.table aware
cedta decided 'igraph' wasn't data.table aware
cedta decided 'igraph' wasn't

plot with a regression line(s)

2012 Apr 04

I am sure a common need is to plot a scatterplot with some fitted line(s) and maybe save it to a file. I have this:

    plot.glm <- function (x, y, file = NULL, xlab = deparse(substitute(x)),
                          ylab = deparse(substitute(y)), main = NULL) {
      m <- glm(y ~ x)
      if (!is.null(file)) pdf(file = file)
      plot(x, y, xlab = xlab, ylab = ylab, main = main)
      lines(x, y =

igraph: decompose.graph: Error: protect(): protection stack overflow

2012 Mar 20

I just got this error:

    > library(igraph)
    > comp <- decompose.graph(gr)
    Error: protect(): protection stack overflow
    Error: protect(): protection stack overflow
    >

What can I do? The digraph is, indeed, large (300,000 vertices), but there are very many very small components (which I would rather not discard).

PS. The doc for decompose.graph does not say which mode is the default.

cannot turn some columns in a data frame into factors

2006 May 11

Hi, I have a data frame df and a list of names of columns that I want to turn into factors:

    df.names <- attr(df,"names")
    sapply(factors, function (name) {
      pos <- match(name,df.names)
      if (is.na(pos)) stop(paste(name,": no such column\n"))
      df[[pos]] <- factor(df[[pos]])
      cat(name,"(",pos,"):",is.factor(df[[pos]]),"\n")
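A likely cause, judging only from the part of the code shown, is that the assignment df[[pos]] <- ... inside the anonymous function changes a local copy of df, so the outer data frame is never updated. One way around it is to assign the converted columns back in a single step. The toy data frame and the contents of factors below are invented for this sketch.

```r
# Toy data frame and the names of the columns to convert (both made up)
df <- data.frame(g = c("x", "y", "x"), n = 1:3, stringsAsFactors = FALSE)
factors <- c("g")

# Fail early on unknown column names, as the original code intends
missing_cols <- setdiff(factors, names(df))
if (length(missing_cols) > 0)
  stop(paste(missing_cols, collapse = ", "), ": no such column")

# Convert the selected columns and assign them back to df itself
df[factors] <- lapply(df[factors], factor)
```

After this, is.factor(df$g) is TRUE while the other columns are left unchanged.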