thr3ads.net - similar to: "binning a vector"

Displaying 20 results from an estimated 4000 matches similar to: "binning a vector"

2004 May 10

R versus SAS: lm performance

Hello, A colleague of mine has compared the runtime of a linear model + anova in SAS and S+. He got the same results, but SAS took a bit more than a minute whereas S+ took 17 minutes. I've tried it in R (1.9.0) and it took 15 min. None of the machines ran out of memory, and I assume that all machines have similar hardware, but the S+ and SAS machines are on Windows whereas the R machine is Redhat

2004 May 13

storage of lm objects in a database

Hello, I'd like to use DBI to store lm objects in a database. I've to analyze many linear models and I cannot store them in a single R session (not enough memory). Also it'd be nice to have them persistent. Maybe it's possible to create a compact binary representation of the object (the kind of format created by "save"), so that one doesn't need to write
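
A minimal sketch of the "compact binary representation" idea raised in this post, assuming base R's `serialize()` / `unserialize()` are acceptable. The model and dataset (`mtcars`) are placeholders, and the actual database write is left out because it depends on the DBI driver in use.

```r
# Fit a small model on a built-in dataset (illustrative only).
fit <- lm(mpg ~ wt, data = mtcars)

# With connection = NULL, serialize() returns the object as a raw vector.
blob <- serialize(fit, connection = NULL)

# ... store `blob` in a binary column via your DBI driver (driver-specific, not shown) ...

# Later, possibly in another session: rebuild the object from the raw vector.
fit2 <- unserialize(blob)
```

Keeping only one model in memory at a time and writing each raw vector out as it is produced is one way to stay within the memory limit mentioned above.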

2004 May 14

help with memory greedy storage

Hello, I've a problem with a self-written routine taking a lot of memory (>1.2Gb). Maybe you can suggest some enhancements, I'm pretty sure that my implementation is not optimal ... I'm creating many linear models and storing coefficients, anova p-values ... all I need in different lists which are then finally returned in a list (list of lists). The input is a matrix with 84 rows

2003 Oct 17

sub data frame by expression

Hi All, I've the following data frame with 54 rows and 4 columns:
> x
                   Ratio  Dose Time Batch
R.010mM.04h.NEW     0.02 010mM  04h   NEW
R.010mM.04h.NEW.1   0.07 010mM  04h   NEW
...
R.010mM.24h.NEW.2   0.06 010mM  24h   NEW
R.010mM.04h.OLD     0.19 010mM  04h   OLD
...
R.010mM.04h.OLD.1   0.49 010mM  04h   OLD
R.100mM.24h.OLD     0.40 100mM  24h   OLD
I'd

2003 Sep 05

all values from a data frame

Hello, I've a data frame with 15 columns and 6000 rows, and I need the data in a single vector of size 90000 for a t-test. Is there such a conversion function in R, or would I have to write my own loop over the columns? thanks for your help + kind regards Arne
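
For the conversion asked about here, a short sketch using base R's `unlist()`, assuming every column is numeric. The data frame below is random placeholder data with the dimensions from the post.

```r
# Placeholder data frame: 6000 rows, 15 numeric columns.
df <- as.data.frame(matrix(rnorm(15 * 6000), nrow = 6000, ncol = 15))

# unlist() concatenates the columns one after another into a single vector;
# use.names = FALSE skips building 90000 element names.
v <- unlist(df, use.names = FALSE)

length(v)
```

The values come out column by column, so the first 6000 elements of `v` are the first column of `df`.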

2006 Mar 08

data import problem

Dear All, I'm trying to read a text data file that contains several records separated by a blank line. Each record starts with a row that contains its ID and the number of rows for the record (two columns), then the data table itself, e.g.
123 5
89.1791 1.1024
90.5735 1.1024
92.5666 1.1024
95.0725 1.1024
101.2070 1.1024

321 3
60.1601 1.1024
64.8023 1.1024
70.0593

2005 Jul 01

p-values for classification

Dear All, I'm classifying some data with various methods (binary classification). I'm interpreting the results via a confusion matrix from which I calculate the sensitivity and the fdr. The classifiers are trained on 575 data points and my test set has 50 data points. I'd like to calculate p-values for obtaining <=fdr and >=sensitivity for each classifier. I was thinking about

2006 Sep 15

graphics and 'layout' question

Hello, I got stuck with a graphics question: I've 3 figures that I present on a single page (window) via 'layout'. The layout is layout(matrix(c(1,1,2,3), 2, 2, byrow=TRUE)); so that the first plot spans both columns in row one. Now I'd like to magnify the first figure so that it takes 20% more vertical space (i.e. more space for the y-axis). How would I do this in R?
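
One possible approach is the `heights` argument of `layout()`, which sets the relative height of each row. The sketch below assumes "20% more" means the top row should be 1.2 times as tall as the bottom row. The plotted data are placeholders, and `pdf(NULL)` is only there so the example runs without a display.

```r
pdf(NULL)  # throwaway device; drop this line for interactive use

m <- matrix(c(1, 1, 2, 3), 2, 2, byrow = TRUE)

# Relative row heights: row 1 is 1.2 units tall, row 2 is 1 unit tall.
n_fig <- layout(m, heights = c(1.2, 1))

plot(1:10, main = "figure 1 (spans both columns)")
plot(rnorm(10), main = "figure 2")
plot(runif(10), main = "figure 3")

dev.off()
```

`layout.show(n_fig)` can be called right after `layout()` to preview the regions before plotting.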

2004 Jun 28

unbalanced design for anova with low number of replicates

Hello, I'm wondering what's the best way to analyse an unbalanced design with a low number of replicates. I'm not a statistician, and I'm looking for some direction for this problem. I've a 2 factor design: Factor batch with 3 levels, and factor dose within each batch with 5 levels. Dose level 1 in batch one is replicated 4 times, level 3 is replicated only 2 times. all

2004 Feb 04

number point under-flow

Hello, I've come across the following situation in R-1.8.1 (compiled and running under RedHat 7.1):
> phyper(24, 514, 5961-514, 53, lower.tail=T)
[1] 1
> phyper(24, 514, 5961-514, 53, lower.tail=F)
[1] -1.037310e-11
I'd expect the latter to be 0 or some very small positive number. Is this a number under-flow of the calculation? Do you think I'm safe if I just set the result to 0
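
A cautious sketch of two ways to keep the upper tail non-negative, using the numbers from the post. Clamping at zero is the workaround the poster suggests. Summing `dhyper()` over the tail is an alternative shown here only for comparison, not something taken from the thread.

```r
# Upper tail P(X > 24) as in the post, clamped so it can never go below 0.
p_upper <- max(phyper(24, 514, 5961 - 514, 53, lower.tail = FALSE), 0)

# Alternative: add up the point probabilities for 25..53 directly.
# Each dhyper() term is non-negative, so the sum is too.
p_direct <- sum(dhyper(25:53, 514, 5961 - 514, 53))
```

Current R versions may no longer return the negative value seen in R-1.8.1, so the clamp is mainly a safeguard.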

2006 Nov 01

splitting very long character string

Hello, I've a very long character array (>500k characters) that I need to split by '\n' resulting in an array of about 60k numbers. The help on strsplit says to use perl=TRUE to get better performance, but still it takes several minutes to split this string. The massive string is the return value of a call to xmlElementsByTagName from the XML library and looks like this: ... 12345
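
Since the separator here is a literal newline rather than a pattern, one option worth trying is `strsplit()` with `fixed = TRUE`, which skips regular-expression matching. The sketch below builds a placeholder string of 60000 newline-separated numbers; the real input would come from the XML call described above.

```r
# Placeholder input: "1\n2\n...\n60000" as one long string.
s <- paste(seq_len(60000), collapse = "\n")

# fixed = TRUE treats "\n" as a literal string, not a regular expression.
parts <- strsplit(s, "\n", fixed = TRUE)[[1]]

# Convert the pieces to numbers.
nums <- as.numeric(parts)
```

`scan(text = s, quiet = TRUE)` is another base R route that reads whitespace-separated numbers from a string in one step.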

2005 Jul 21

RandomForest question

Hello, I'm trying to find out the optimal number of splits (mtry parameter) for a randomForest classification. The classification is binary and there are 32 explanatory variables (mostly factors with up to 4 levels each, but also some numeric variables) and 575 cases. I've seen that although there are only 32 explanatory variables the best classification performance is reached when

2003 Oct 08

updating via CRAN and http

Hello, thanks for the tips on updating packages for 1.8.0. The updating is a real problem for me, since I've to do it more or less manually using my web browser or wget. I'm behind a firewall that requires http/ftp authentication (username and passwd) for every request it sends to a server outside our intranet. Therefore all the nice tools for automatic updating (cran, cpan ...) don't

2004 Sep 30

Boxplot, space to axis

Hello, I've created a boxplot with 84 boxes. So far everything is as I expect, but there is quite some space between the 1st box and axis 2 and the last box and axis 4. Since 84 boxes get very slim anyway I'd like to distribute as much of the horizontal space as possible over the x-axis. Maybe I've forgotten about a graphics parameter? Thanks for your help, Arne

2004 Apr 27

paste dimnames problem

Hello, I've the following list n:
> n
[[1]]
[1] "NEW" "OLD" "PRG"

[[2]]
[1] "04h" "24h"

[[3]]
[1] "000mM" "010mM" "025mM" "050mM" "100mM"

where n <- dimnames(some.multidim.array) I'm trying to define a generic function that generates meaningful names from this list, e.g.

2005 Jun 28

svm and scaling input

Dear All, I've a question about scaling the input variables for an analysis with svm (package e1071). Most of my variables are factors with 4 to 6 levels but there are also some numeric variables. I'm not familiar with the math behind SVMs, so my assumptions may be completely wrong ... or obvious. Will the svm automatically expand the factors into a binary matrix? If I add numeric

2003 Oct 05

Jonckheere-Terpstra test

Hello, can anybody here explain what a Jonckheere-Terpstra test is and whether it is implemented in R? I just know it's a non-parametric test, otherwise I've no clue about it ;-( . Are there alternatives to this test? thanks for help, Arne

2006 Feb 02

calculating IC50

Hello, I was wondering if there is an R package to automatically calculate the IC50 value (concentration of a substance that inhibits cell growth to 50%) for some measurements. kind regards, Arne

2003 Nov 27

significance in difference of proportions

Hello, I'm looking for some guidance with the following problem: I've 2 samples A (111 items) and B (10 items) drawn from the same unknown population. Within A I find 9 "positives" and in B 0 positives. I'd like to know if the 2 samples A and B are different, i.e. is there a way to find out whether the number of "positives" is significantly different in A and B?
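
With counts this small, one commonly used option is Fisher's exact test on the 2x2 table of positives and negatives. The sketch below builds that table from the numbers in the post and calls base R's `fisher.test()`. The choice of test is an assumption here, not something the thread settles.

```r
# Rows: positive / negative; columns: sample A (111 items) and B (10 items).
tab <- matrix(c(9, 111 - 9,
                0, 10 - 0),
              nrow = 2,
              dimnames = list(result = c("positive", "negative"),
                              sample = c("A", "B")))

# Exact test of independence between sample membership and result.
ft <- fisher.test(tab)
ft$p.value
```

With only 10 items in B, any such test has little power, so a large p-value would not show that the two samples behave alike.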

2002 Dec 09

Principal component analysis

Dear R users, I'm trying to cluster 30 gene chips using principal component analysis in package mva.prcomp. Each chip is a point with 1,000 dimensions. PCA is probably just one of several methods to cluster the 30 chips. However, I don't know how to run prcomp, and I don't know how to interpret its output. If there are 30 data points in 1,000 dimensions each, do I have to
