thr3ads.net - similar to: "Processing logic for Huge Data set"

Displaying 20 results from an estimated 9000 matches similar to: "Processing logic for Huge Data set"

2004 Apr 20

Rank - Descending order

Dear All, Is there any simple way to produce "rank" for a given list, but in descending order? E.g.: x = list(a=c(1,5,2,4)); rank(x$a); produces 1,4,2,3. However, I am looking for a way to generate (4,1,3,2). It would be particularly nice if the proposed solution had all the niceties of the rank function (like NA handling and ties.method functionality). TIA Manoj
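
The usual answer is to rank the negated values. The sketch below does this in R. Because `rank()` itself still does the ranking, `na.last` and `ties.method` behave exactly as they would in an ordinary call. The `xtfrm()` line for non-numeric input is an added assumption and is not part of the original question.

```r
x <- list(a = c(1, 5, 2, 4))

# Descending rank: rank the negated values
desc <- rank(-x$a)
print(desc)                             # 4 1 3 2

# NA handling and ties come straight from rank() itself
rank(-c(1, NA, 5), na.last = "keep")    # 2 NA 1
rank(-c(3, 3, 1), ties.method = "min")  # 1 1 3

# Non-numeric input (e.g. character): negate xtfrm() instead
rank(-xtfrm(c("b", "a", "c")))          # 2 3 1
```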

Multiple lapply get-around

2004 Aug 30

I am faced with a situation wherein I have to use multiple lapply's. The pseudo-code could be approximated to something as below:

    For each X from i=1 to n
      For each Y based on j=1 to m
        For each F from 1 to f
          Do some calculation based on Fij
          Store Xi,Yj = Fij
        End for F
      End for Y
    End for X

Is there any way to optimize the processing logic further? I *guess* using the multiple lapply
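
One way to flatten the nested loops is to enumerate every (i, j) pair with `expand.grid()` and evaluate the cell function over that grid. The sketch below does this under stated assumptions: `calc_cell()`, `xs` and `ys` are hypothetical stand-ins for the poster's "calculation based on Fij". The `lapply()`/`mapply()` family mainly tidies the code. Substantial speed-ups usually come only when the per-cell calculation itself can be vectorized.

```r
# Hypothetical per-cell calculation standing in for "some calculation based on Fij"
calc_cell <- function(x, y) sum(seq_len(x) * y)

xs <- 1:4
ys <- c(0.5, 2, 3)

# Nested-loop version, for reference
res_loop <- matrix(NA_real_, length(xs), length(ys))
for (i in seq_along(xs)) {
  for (j in seq_along(ys)) {
    res_loop[i, j] <- calc_cell(xs[i], ys[j])
  }
}

# Grid version: expand.grid() varies i fastest, which matches how
# matrix() fills its values column by column
grid <- expand.grid(i = seq_along(xs), j = seq_along(ys))
vals <- mapply(function(i, j) calc_cell(xs[i], ys[j]), grid$i, grid$j)
res_grid <- matrix(vals, nrow = length(xs), ncol = length(ys))
```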

Multiple regression

2004 Jun 09

Hi, I am trying to do multiple regression on a set of data using backward stepwise regression... however, backward stepwise regression is criticised for overfitting data. To actually observe the bias and to come up with a better method to use, could you all stats experts kindly give me pointers to any alternative procedure (or references) to use instead of backward stepwise regression from your

Performing regression using R & C

2004 Nov 19

Dear All, Is it possible to perform OLS using C code? I am trying to optimize an n-period "moving window" OLS on a huge dataset, hence I was wondering whether such a thing is possible. Ideally, the solution I am looking for would involve C code accepting two float arrays and returning computed parameters such as the t-statistic, coefficients, etc. I have glanced through the FAQs and tried
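
The poster asks for C. This R sketch instead spells out the closed-form arithmetic such a routine would perform for each window: intercept, slope, the slope's standard error and its t-statistic. It assumes a single regressor plus an intercept and a window length `n` of at least 3. `roll_ols()` is a hypothetical helper name. The loop body uses only sums and products, so it would port to C fairly directly.

```r
# Rolling ("moving window") simple OLS of y on x with an intercept.
# roll_ols() is a hypothetical helper; each row of the result is one window.
roll_ols <- function(x, y, n) {
  stopifnot(length(x) == length(y), n >= 3, n <= length(x))
  m <- length(x) - n + 1
  out <- matrix(NA_real_, nrow = m, ncol = 4,
                dimnames = list(NULL, c("intercept", "slope", "se_slope", "t_slope")))
  for (s in seq_len(m)) {
    idx <- s:(s + n - 1)
    xi <- x[idx]; yi <- y[idx]
    xm <- mean(xi); ym <- mean(yi)
    sxx <- sum((xi - xm)^2)                 # centred sum of squares of x
    sxy <- sum((xi - xm) * (yi - ym))       # centred cross-product
    b <- sxy / sxx                          # slope
    a <- ym - b * xm                        # intercept
    rss <- sum((yi - a - b * xi)^2)         # residual sum of squares
    se <- sqrt(rss / (n - 2) / sxx)         # standard error of the slope
    out[s, ] <- c(a, b, se, b / se)
  }
  out
}

set.seed(42)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
r <- roll_ols(x, y, n = 20)   # 31 windows
```

Each row should agree with `lm()` fitted to the same window. The check below compares window 11 (observations 11 to 30).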

Statistical analysis of huge datasets.

2003 Aug 07

Dear R-users, I am faced with the problem of analyzing a huge dataset (2+ million records, 150+ variables) that does not fit into memory. I would like to know if there are pre-packaged tools (in the spirit of Insightful I-Miner, for instance) aimed at subsampling or splitting the dataset into data-frameable subdatasets, applying functions record-wise, etc. Thank you very much for your
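
With base R alone, a file can be processed in chunks by keeping a connection open. Each `read.csv()` call then continues where the previous one stopped, so only `chunk_size` rows are in memory at a time. This sketch accumulates running sums to get column means. It assumes a comma-separated file with a header row and all-numeric columns. `chunked_col_means()` is a hypothetical helper and is not one of the pre-packaged tools the poster asks about.

```r
# chunked_col_means() is a hypothetical helper: column means of a large CSV,
# computed without ever loading more than chunk_size rows.
chunked_col_means <- function(path, chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Read the header line once; scan() strips the quotes write.csv() adds
  header <- scan(con, what = "", sep = ",", nlines = 1L, quiet = TRUE)
  sums <- setNames(numeric(length(header)), header)
  n <- 0
  repeat {
    # On an open connection, read.csv() continues from the current position;
    # it signals an error once no lines are left, which ends the loop
    chunk <- tryCatch(
      read.csv(con, header = FALSE, nrows = chunk_size, col.names = header),
      error = function(e) NULL)
    if (is.null(chunk) || nrow(chunk) == 0L) break
    sums <- sums + colSums(chunk)
    n <- n + nrow(chunk)
  }
  sums / n
}
```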

CVnn2 + nnet question

2004 Jun 14

Hi, I am trying to determine the number of units in the hidden layer and the decay rate using the CVnn2 script found in the MASS directory (reference: p. 348, MASS-4). The model that I am using is of the form Y ~ X1 + X2 + X3 + ... + X11, and the underlying data are time series in nature. I found the MASS and nnet packages extremely useful (many thanks to the contributors). However, I am getting
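
The sketch below is not the CVnn2 script. It is a hedged stand-in that searches a grid of `size` and `decay` values with k-fold cross-validation around `nnet()`, using `linout = TRUE` for a continuous response. Because the data are time series, the folds are contiguous blocks of rows rather than random splits. This assumes the rows are already in time order. Blocked folds still let later observations inform models scored on earlier blocks, so a forward-chaining scheme may suit some series better. `cv_nnet_grid()` is a hypothetical name.

```r
library(nnet)

# cv_nnet_grid() is a hypothetical helper, not the CVnn2 script from MASS.
# It scores each (size, decay) pair by k-fold cross-validated mean squared error.
cv_nnet_grid <- function(formula, data, sizes, decays, k = 5L, ...) {
  n <- nrow(data)
  # Contiguous blocks of rows as folds (assumption: rows are in time order)
  fold <- cut(seq_len(n), breaks = k, labels = FALSE)
  yname <- all.vars(formula)[1]
  grid <- expand.grid(size = sizes, decay = decays)
  grid$cv_mse <- NA_real_
  for (g in seq_len(nrow(grid))) {
    sse <- 0
    for (f in seq_len(k)) {
      train <- data[fold != f, , drop = FALSE]
      held  <- data[fold == f, , drop = FALSE]
      fit <- nnet(formula, data = train, size = grid$size[g],
                  decay = grid$decay[g], linout = TRUE, trace = FALSE, ...)
      pred <- predict(fit, newdata = held)
      sse <- sse + sum((held[[yname]] - pred)^2)
    }
    grid$cv_mse[g] <- sse / n
  }
  grid
}
```

The row with the smallest `cv_mse` suggests a `size`/`decay` pair. Because `nnet()` starts from random weights, results vary between runs unless a seed is set.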

Systems of equations in glm?

2002 May 30

I have a student that I'm encouraging to use R rather than SAS or Stata and within just 2 weeks he has come up with a question that stumps me. What does a person do about endogeneity in generalized linear models? Suppose Y1 and Y2 are 5 category ordinal dependent variables. I see that MASS has polr for estimation of models like that, as long as they are independent. But what if the

Generate a serie of new vars that correlate with existingvar

2007 Apr 05

Hello list, why not add the smart proposal by Greg Snow as a built-in function in {stats}, just changing the "x234" and "newc" lines to allow more distributions to be generated? Or am I missing an already existing function that does this? Regards. Olivier # slight modification of the original code by Greg Snow [mailto:Greg.Snow at intermountainmail.org] # on April 04, 2007
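
Greg Snow's code does not survive in this snippet, so the sketch below shows one standard technique instead. It strips the linear dependence on `x` out of random noise, then mixes the standardized `x` with that orthogonal noise using weights rho and sqrt(1 - rho^2). The result has a *sample* correlation with `x` of exactly rho. `make_correlated()` is a hypothetical name, and the noise is assumed to be normal. Replacing `rnorm()` with another generator keeps the exact correlation, but the new variable is then a mixture rather than a draw from that distribution.

```r
# make_correlated() is a hypothetical helper, not Greg Snow's original code.
# For each target correlation in rho it returns a new variable whose
# sample correlation with x equals that value exactly.
make_correlated <- function(x, rho) {
  stopifnot(all(abs(rho) <= 1))
  xs <- as.vector(scale(x))                     # standardized x
  sapply(rho, function(r) {
    e <- rnorm(length(x))                       # assumption: normal noise
    e_perp <- residuals(lm(e ~ xs))             # part of e uncorrelated with x
    r * xs + sqrt(1 - r^2) * as.vector(scale(e_perp))
  })
}

set.seed(1)
x <- rnorm(200, mean = 10, sd = 3)
z <- make_correlated(x, c(0.2, 0.5, -0.7))   # one column per target correlation
```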

Party package: varimp(..., conditional=TRUE) error: term 1 would require 9e+12 columns

2011 Oct 14

I would like to build a forest of regression trees to see how well some covariates predict a response variable and to examine the importance of the covariates. I have a small number of covariates (8) and large number of records (27368). The response and all of the covariates are continuous variables. A cursory examination of the covariates does not suggest they are correlated in a simple fashion

using aggregate with survey-design and survey functions

2003 Sep 20

Hi R users, I am trying to use the aggregate function with a survey design object and survey functions, but get the following error. I think I am incorrectly using the syntax somehow, and it may not be possible to access variables directly by name in a survey-design object. Am I right? How do I fix this problem? I have used aggregate with "mean" and "weighted.mean", and
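
In current versions of the survey package, grouped design-based estimates are usually computed with `svyby()` rather than `aggregate()`. The reason is that the weights and strata live inside the design object, not in ordinary columns. This sketch uses the `apistrat` example data shipped with the package. This page does not establish whether `svyby()` was available in the 2003 release the poster used.

```r
library(survey)
data(api)   # example data shipped with the survey package

# Stratified design, as in the package's own examples
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# Design-based mean of api00 within each school type, with standard errors
by_type <- svyby(~api00, ~stype, dstrat, svymean)
print(by_type)
```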

Using 'dimname names' in aperm() and apply()

2010 Jul 29

I think that the "dimname names" of tables and arrays could make aperm() and apply() (and probably some other functions) easier to use. (dimname names are, for example, created by table() ) The use would be something like: -- x <-table( from=sample(3,100,rep=T), to=sample(5,100,rep=T)) trans <- x / apply(x,"from",sum) y <- aperm( trans,
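
As far as current R documentation goes, the proposed usage works today. When an array has named dimnames, `apply()` accepts those names in `MARGIN` and `aperm()` accepts them in `perm`. The sketch reuses the table from the post, with a seed added for reproducibility.

```r
set.seed(1)
x <- table(from = sample(3, 100, replace = TRUE),
           to   = sample(5, 100, replace = TRUE))

# Row-normalize using the dimension *name* rather than its index;
# the length-3 vector recycles down the "from" dimension
trans <- x / apply(x, "from", sum)

# Permute dimensions by name
y <- aperm(trans, c("to", "from"))
names(dimnames(y))   # "to" "from"
```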

type conversion with apply or not

2010 Jun 08

Folks, I thought it should be straightforward, but after a few hours poking around I decided it's best to post my question on this list. I have a data frame consisting of a (large) number of date columns, which are read in from a csv file as character strings. I want to convert them to Date type. Following is an example, where the first column is of integer type, while the rest are type
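
A common approach is to convert the columns with `lapply()` rather than `apply()`. `apply()` first turns the data frame into a matrix, and its simplified result does not keep the Date class. The small data frame below is made up, and the `"%Y-%m-%d"` format is an assumption about how the dates are written.

```r
# Small made-up example: one integer column, the rest dates stored as character
df <- data.frame(id = 1:2,
                 d1 = c("2010-06-01", "2010-06-08"),
                 d2 = c("2009-01-31", NA),
                 stringsAsFactors = FALSE)

date_cols <- names(df)[-1]   # assumption: every column except the first is a date

# lapply() works column by column and returns a list of Date vectors,
# which replaces the selected columns in place
df[date_cols] <- lapply(df[date_cols], as.Date, format = "%Y-%m-%d")
str(df)
```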

Change of parsing parameters to functions between 0.63.1 and 0.63.3 ?

1999 Mar 24

Hi, I wonder whether the mechanism of parsing parameters to functions has changed between 0.63.1 and 0.63.3? The following code yields different results in R 0.63.1 (Version 0.63.1 (Dec 5, 1998)) and R 0.63.3. cave<-function(x,a,b) { return(c(mean(x[a],na.rm=T),mean(x[b],na.rm=T))) } datx <- data.frame(rbind(c(1,2,3,4),c(4,5,6,7)))

fetching columns from another file

2010 Jan 24

Hi all, I am trying to fetch rows from a data frame that match the first 2 columns of another data frame. Here is an example of what I am trying to do: > ptable=read.table(file="All.txt",header=T,sep="\t") > ptable=as.matrix(ptable) > dim(ptable) [1] 9275 6 > head(ptable) Gene1 Gene2 PCC PCC3 PCC23 PCC123 [1,]
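
The sketch below shows two common ways to keep the rows whose (Gene1, Gene2) pair also appears in a second table. Both work on data frames, so the `as.matrix()` step in the post is not needed. The small tables are made-up stand-ins, and the assumption is that the second file uses the same `Gene1`/`Gene2` column names.

```r
# Made-up stand-ins for the two files
ptable <- data.frame(Gene1 = c("A", "B", "C"), Gene2 = c("X", "Y", "Z"),
                     PCC = c(0.9, 0.1, 0.5))
other  <- data.frame(Gene1 = c("C", "A"), Gene2 = c("Z", "Q"))

# Option 1: inner join on the two key columns
matched <- merge(ptable, other, by = c("Gene1", "Gene2"))

# Option 2: keep ptable's original row order with a combined key and %in%
key <- function(d) paste(d$Gene1, d$Gene2, sep = "\t")
matched2 <- ptable[key(ptable) %in% key(other), ]
```

`merge()` may reorder rows and can duplicate them if `other` contains repeated pairs. The `%in%` filter never duplicates rows.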

How to obtain coefficient standard error from the result of polr?

2007 Jun 04

Hi - I am using polr. I can get a result from polr fit by calling result.plr <- polr(formula, data=mydata, method="probit"); However, from the 'result.plr', how can I access standard error of the estimated coefficients as well as the t statistics for each one of them? What I would like to do ultimately is to see which coefficients are not significant and try to refit the
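
The sketch below uses the `housing` example data from MASS rather than the poster's data. Fitting with `Hess = TRUE` stores the Hessian, so `summary()` does not have to refit. `coef(summary(fit))` is then a matrix with `Value`, `Std. Error` and `t value` columns covering both the coefficients and the cutpoints. The same standard errors can also be taken from `vcov()`.

```r
library(MASS)

# housing is an example data set shipped with MASS (not the poster's data)
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing,
            method = "probit", Hess = TRUE)

ctable <- coef(summary(fit))    # columns: Value, Std. Error, t value
ctable[, "Std. Error"]          # standard errors (coefficients, then cutpoints)
ctable[, "t value"]             # t statistics

# The same standard errors straight from the covariance matrix
se <- sqrt(diag(vcov(fit)))
```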

Strange parametrization in polr

2004 Jan 08

In Venables & Ripley 3rd edition (p. 231) the proportional odds model is described as: logit(p<=k) = zeta_k + eta but polr apparently thinks there is a minus in front of eta, as is apparent below. Is this a bug or a feature I have overlooked? Here is the naked code for reproduction, below the results. ------------------------------------------------------------------------ --- version
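
The current `polr` documentation writes the model as logit P(Y <= k | x) = zeta_k - eta, with the minus sign. The sketch below checks this on the MASS `housing` data, not the poster's data. It rebuilds the cumulative fitted probabilities from the fitted cutpoints (`fit$zeta`) and linear predictor (`fit$lp`) using the minus sign, then compares them with cumulative sums of `fitted(fit)`.

```r
library(MASS)
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)

# Cumulative probabilities implied by logit P(Y <= k) = zeta_k - eta
manual <- plogis(outer(fit$lp, fit$zeta, function(eta, zeta) zeta - eta))

# Cumulative sums of polr's fitted category probabilities
# (the last column is always 1, so it is dropped)
K <- length(fit$lev)
from_fit <- t(apply(fitted(fit), 1, cumsum))[, -K]
```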

No P.values in polr summary

2013 Oct 18

Hi everyone, If I fit an "Ordered Logistic or Probit Regression" with the polr function from the MASS package, the summary gives me the coefficients, standard errors and t values, but not the p-values directly. I can compute the p-values manually, but is there a way to obtain them directly? And I wonder why the p-value is not calculated directly; is there a reason? Example
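
A minimal way to add p-values is to refer each "t value" (coefficient divided by its standard error) to the standard normal distribution. This page does not say why `summary()` leaves them out. Any such p-value rests on a large-sample normal approximation. The sketch uses the MASS `housing` data as a stand-in.

```r
library(MASS)

fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing, Hess = TRUE)
ctable <- coef(summary(fit))

# Two-sided p-values from the normal reference distribution
p <- 2 * pnorm(abs(ctable[, "t value"]), lower.tail = FALSE)
cbind(ctable, "p value" = p)
```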

Subpopulations in Complex Surveys

2003 Feb 19

Hi, is there a way to analyze subpopulations (e.g. women over 50, those who answered "yes" to a particular question) in a survey using the survey package? Other packages (e.g. Stata, SUDAAN) do this with a subpopulation option to identify the subpopulation for which the analysis should be done. I did not see this option in the survey package. Is there another way to do this?
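
In current versions of the survey package, a subpopulation is usually handled by calling `subset()` on the *design object* rather than on the data frame. That keeps the full design information, so standard errors are domain estimates. This sketch uses the package's `apistrat` example data. A condition such as `sex == "F" & age > 50` would be written the same way, where `sex` and `age` are hypothetical variable names.

```r
library(survey)
data(api)   # example data shipped with the survey package

dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# Subset the design, not the data frame, so the full design information
# is retained for domain (subpopulation) standard errors
elem <- subset(dstrat, stype == "E")
est <- svymean(~api00, elem)
print(est)
```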

SPlus to R

2011 Oct 05

I'm trying to convert an S-Plus program to R. Since I'm a SAS programmer I'm not facile in either S-Plus or R, so I need some help. All I did was convert the underscores in S-Plus to the assignment operator <-. Here are the first few lines of the S-Plus file: sshc _ function(rc, nc, d, method, alpha=0.05, power=0.8, tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2),
