similar to: Processing logic for Huge Data set

Displaying 20 results from an estimated 9000 matches similar to: "Processing logic for Huge Data set"

2004 Apr 20
2
Rank - Descending order
Dear All, Is there any simple way to produce "rank", for a given list, but in descending order? E.g.: x = list(a=c(1,5,2,4)); rank(x$a); produces 1,4,2,3. However, I am looking for a way to generate (4,1,3,2). It would be particularly nice if the proposed solution had all the niceties of the rank function (like NA handling and ties.method functionality). TIA Manoj
2004 Aug 30
3
Multiple lapply get-around
I am faced with a situation wherein I have to use multiple lapply's. The pseudo-code could be approximated as something like the following: For each X from i=1 to n For each Y based on j=1 to m For each F from 1 to f Do some calculation based on Fij Store Xi,Yj = Fij End For F End for Y End for X Is there any way to optimize the processing logic further? I *guess* using the multiple lapply
2004 Jun 09
1
Multiple regression
Hi, I am trying to do multiple regression on a set of data using backward stepwise regression. However, backward stepwise regression is criticised for overfitting data. To actually observe the bias and to come up with a better method to use, could all you stats experts kindly give me pointers to any alternative procedure (or references) to use over backward stepwise regression from your
2004 Nov 19
2
Performing regression using R & C
Dear All, Is it possible to perform OLS using C code? I am trying to optimize an n-period "moving window" OLS on a huge dataset, hence was wondering if such a thing is possible. Ideally the solution that I am looking for would involve C code accepting two float arrays and returning computed parameters such as t-stat, coefficient etc. I have glanced through the FAQs and tried
2003 Aug 07
2
Statistical analysis of huge datasets.
Dear R-users, I am faced with the problem of analyzing a huge dataset (2 million+ records, 150+ variables) which does not fit into memory. I would like to know if there are pre-packaged tools (in the spirit of Insightful I-Miner, for instance) aimed at subsampling or splitting the dataset into data-frameable subdatasets, applying functions record-wise, etc. Thank you very much for your
2004 Jun 14
2
CVnn2 + nnet question
Hi, I am trying to determine the number of units in the hidden layer and the decay rate using the CVnn2 script found in MASS directory (reference: pg 348,MASS-4). The model that I am using is in the form of Y ~ X1 + X2 + X3... + X11 and the underlying data is time-series in nature. I found the MASS and nnet package extremely useful (many thanks to the contributors). However I am getting
2002 May 30
2
Systems of equations in glm?
I have a student that I'm encouraging to use R rather than SAS or Stata and within just 2 weeks he has come up with a question that stumps me. What does a person do about endogeneity in generalized linear models? Suppose Y1 and Y2 are 5 category ordinal dependent variables. I see that MASS has polr for estimation of models like that, as long as they are independent. But what if the
2007 Apr 05
1
Generate a series of new vars that correlate with existing var
Hello, list. Why not add the smart proposal by Greg Snow as a built-in function in {stats}, just changing the "x234" and "newc" lines to allow for more distributions to be generated? Or am I missing an already existing function that does that? Regards. Olivier # slight modification of the original code by Greg Snow [mailto:Greg.Snow at intermountainmail.org] # on April 04, 2007
2011 Oct 14
1
Party package: varimp(..., conditional=TRUE) error: term 1 would require 9e+12 columns
I would like to build a forest of regression trees to see how well some covariates predict a response variable and to examine the importance of the covariates. I have a small number of covariates (8) and large number of records (27368). The response and all of the covariates are continuous variables. A cursory examination of the covariates does not suggest they are correlated in a simple fashion
2003 Sep 20
4
using aggregate with survey-design and survey functions
Hi R users, I am trying to use the aggregate function with a survey design object and survey functions, but get the following error. I think I am incorrectly using the syntax somehow, and it may not be possible to access variables directly by name in a survey-design object. Am I right? How do I fix this problem? I have used aggregate with "mean" and "weighted.mean", and
2010 Jul 29
1
Using 'dimname names' in aperm() and apply()
I think that the "dimname names" of tables and arrays could make aperm() and apply() (and probably some other functions) easier to use. (dimname names are, for example, created by table() ) The use would be something like: -- x <-table( from=sample(3,100,rep=T), to=sample(5,100,rep=T)) trans <- x / apply(x,"from",sum) y <- aperm( trans,
2010 Jun 08
2
type conversion with apply or not
Folks, I thought it should be straightforward, but after a few hours poking around I decided it's best to post my question on this list. I have a data frame consisting of a (large) number of date columns, which are read in from a csv file as character strings. I want to convert them to Date type. Following is an example, where the first column is of integer type, while the rest are type
1999 Mar 24
2
Change of parsing parameters to functions between 0.63.1 and 0.63.3 ?
Hi, I wonder whether the mechanism of parsing parameters to functions has changed between 0.63.1 and 0.63.3? The following code yields different results in R 0.63.1 (Version 0.63.1 (Dec 5, 1998)) and R 0.63.3. cave<-function(x,a,b) { return(c(mean(x[a],na.rm=T),mean(x[b],na.rm=T))) } datx <- data.frame(rbind(c(1,2,3,4),c(4,5,6,7)))
2010 Jan 24
2
fetching columns from another file
Hi all! I am trying to fetch rows from a data frame that match the first 2 columns of another data frame. Here is an example of what I am trying to do: > ptable=read.table(file="All.txt",header=T,sep="\t") > ptable=as.matrix(ptable) > dim(ptable) [1] 9275 6 > head(ptable) Gene1 Gene2 PCC PCC3 PCC23 PCC123 [1,]
2007 Jun 04
2
How to obtain coefficient standard error from the result of polr?
Hi - I am using polr. I can get a result from a polr fit by calling result.plr <- polr(formula, data=mydata, method="probit"); However, from 'result.plr', how can I access the standard errors of the estimated coefficients as well as the t statistics for each of them? What I would like to do ultimately is to see which coefficients are not significant and try to refit the
2004 Jan 08
3
Strange parametrization in polr
In Venables & Ripley 3rd edition (p. 231) the proportional odds model is described as: logit(p<=k) = zeta_k + eta but polr apparently thinks there is a minus in front of eta, as is apparent below. Is this a bug or a feature I have overlooked? Here is the naked code for reproduction, below the results. ------------------------------------------------------------------------ --- version
2013 Oct 18
1
No P.values in polr summary
Hi everyone, If I compute an "Ordered Logistic or Probit Regression" with the polr function from the MASS package, the summary gives me coefficients, standard errors and t values, but not the p-value directly. I can compute the p-value "manually", but is there a way to obtain the p-value directly? I also wonder why the p-value is not calculated directly; is there a reason? Example
2003 Feb 19
5
Subpopulations in Complex Surveys
Hi, is there a way to analyze subpopulations (e.g. women over 50, those who answered "yes" to a particular question) in a survey using the Survey package? Other packages (e.g. Stata, SUDAAN) do this with a subpopulation option to identify the subpopulation for which the analysis should be done. I did not see this option in the Survey package. Is there another way to do this?
2011 Oct 05
4
SPlus to R
I'm trying to convert an S-Plus program to R. Since I'm a SAS programmer I'm not facile in either S-Plus or R, so I need some help. All I did was convert the underscores in S-Plus to the assignment operator <-. Here are the first few lines of the S-Plus file: sshc _ function(rc, nc, d, method, alpha=0.05, power=0.8, tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2),