similar to: read large amount of data

Displaying 20 results from an estimated 1000 matches similar to: "read large amount of data"

2005 Jul 13
1
read.table
Hi, I have a question on read.table. I have a dataset with 273,000 lines and 195 columns. I used read.table to load the data into R: trn<-read.table('train1.dat', header=F, sep='|', na.strings='.') I found that it takes forever. Then I ran read.table again on 1/10 of the data (test), and this time it finished quickly. So, there might be something wrong in my data
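When a large '|'-separated file loads slowly and malformed rows are suspected, one possible first check is to count the fields on each line before reading. The sketch below is only an illustration: the temporary file and its three rows are made up to stand in for 'train1.dat', and the assumption of all-numeric columns may not hold for the real data.

```r
# Hypothetical three-line file standing in for 'train1.dat' (not the poster's data)
tmp <- tempfile(fileext = ".dat")
writeLines(c("1|2|3", "4|.|6", "7|8"), tmp)

# Count fields per line; disable quote and comment handling so stray
# quote or '#' characters cannot merge or cut lines
n_fields <- count.fields(tmp, sep = "|", quote = "", comment.char = "")
bad_lines <- which(n_fields != 3)   # lines whose field count is not the expected 3
print(bad_lines)

# Declaring colClasses up front spares read.table from guessing column types,
# which usually helps on wide files; fill = TRUE pads short rows with NA
dat <- read.table(tmp, header = FALSE, sep = "|", na.strings = ".",
                  quote = "", comment.char = "", fill = TRUE,
                  colClasses = "numeric")
print(dim(dat))
```

On the real file the expected field count would be 195 rather than 3.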
2005 Aug 12
2
need help
Hi, there: I think i need to re-phrase my question since last time I did not get any reply but i think the question is not that hard, probably i did not make the question clear: I want to find cases like 35, 90, 330, 330, 335 from the rest which look like 3, 3, 3, 3.2, 3.3 4, 4.4, 4.5, 4.6, 4.7 .... basically there is one (or more) big 'gap' in the case i seek. thanks, weiwei --
2005 Oct 11
1
a problem in random forest
Hi, there: I spent some time on this but I think I really cannot figure it out, maybe I missed something here: my data looks like this: > dim(trn3) [1] 7361 209 > dim(val3) [1] 7427 209 > mg.rf2<-randomForest(x=trn3[,1:208], y=trn3[,209], data=trn3, xtest=val3[, 1:208], ytest=val3[,209], importance=T) my test data has 7427 observations but after prediction, > dim(mg.rf2$votes)
2005 Oct 04
1
generalized linear model and missing handling
Hi, I have a dataset and want to build a generalized linear model on it. Unfortunately, complete.cases(df) returns null, which means I have to find a way to "fill" those missing values. One way, following my previous post, is to use the median as a replacement (or the most frequent level in the categorical case), but I am wondering if there are other ways, when glm or something like it is
2006 Dec 12
0
Re : Re : implementation of t.test
Apologies, I made a mistake in my previous mail. Type stats:::t.test.default. The formal way is to use getAnywhere(t.test) Justin BEM Elève Ingénieur Statisticien Economiste BP 294 Yaoundé. Tél (00237)9597295. ----- Original message ---- From: justin bem <justin_bem@yahoo.fr> To: Weiwei Shi <helprhelp@gmail.com> Cc: R-help@stat.math.ethz.ch Sent: Tuesday, 12 December 2006,
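The two lookups mentioned in this reply can be tried as follows. This is a small sketch; getS3method() is added here as a third option and is not part of the original message.

```r
# Direct access to the non-exported default method in the stats namespace
f1 <- stats:::t.test.default

# getAnywhere() searches attached packages, namespaces and registered S3
# methods; it accepts a name or a character string
found <- getAnywhere("t.test.default")
print(found$where)

# getS3method() (an addition, not from the message) returns the method itself
f2 <- getS3method("t.test", "default")

print(is.function(f1))
print(is.function(f2))
```

Printing `found` itself shows the source of the function along with where it was located.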
2005 Jul 08
1
"more" and "tab" functionalities in R under linux
Hi, forgive me if it is due to my "laziness" :) I am wondering if there are functionalities in R that work like "more" and "tab" in linux: more(one.data.frame) so I can browse through it. Sometimes I can use one.data.frame[1:100,], but it is still not as good as "more" in linux. tab: can I use tab to auto-complete a defined object name in R so I don't
2005 Oct 05
1
pca in dimension reduction
Hi, there: I am wondering if anyone here can provide an example using pca doing dimension reduction for a dataset. The dataset can be n*q (n>=q or n<=q). As to dimension reduction, are there other implementations for like ICA, Isomap, Locally Linear Embedding... Thanks, weiwei -- Weiwei Shi, Ph.D "Did you always know?" "No, I did not. But I believed..." ---Matrix III
2005 Oct 11
1
an error in my using of nnet
Hi, there: I am trying nnet as follows: > mg.nnet<-nnet(x=trn3[,r.v[1:100]], y=trn3[,209], size=5, decay = 5e-4, maxit = 200) # weights: 511 initial value 13822.108453 iter 10 value 7408.169201 iter 20 value 7362.201934 iter 30 value 7361.669408 iter 40 value 7361.294379 iter 50 value 7361.045190 final value 7361.038121 converged Error in y - tmp : non-numeric argument to binary operator
2005 Jul 25
1
cluster
Dear listers: Here I have a question on clustering methods available in R. I am trying to down-sample the majority class in a classification problem on an imbalanced dataset. Since I don't want to lose information in the original dataset, I don't want to use naive down-sampling: I think using clustering on the majority class to select "representative" samples might
2005 Jul 07
2
randomForest
> From: Weiwei Shi > > it works. > thanks, > > but: (just curious) > why i tried previously and i got > > > is.vector(sample.size) > [1] TRUE Because a list is also a vector: > a <- c(list(1), list(2)) > a [[1]] [1] 1 [[2]] [1] 2 > is.vector(a) [1] TRUE > is.numeric(a) [1] FALSE Actually, the way I initialize a list of known length is by
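The point made in this reply, that a list also counts as a vector, can be checked directly. The sketch below repeats the reply's example and adds the `mode` argument of is.vector() and is.atomic() as stricter tests; those two checks are additions, not part of the original message.

```r
a <- c(list(1), list(2))

# A list is a (generic) vector, so the default check passes
v_any <- is.vector(a)
# ...but it is not numeric
v_num <- is.numeric(a)

# Stricter checks (additions): ask for a specific mode, or test for atomic vectors
v_mode_numeric <- is.vector(a, mode = "numeric")
v_mode_list    <- is.vector(a, mode = "list")
v_atomic       <- is.atomic(a)

print(c(v_any, v_num, v_mode_numeric, v_mode_list, v_atomic))
```

For input validation, is.numeric() or is.vector(x, mode = "numeric") is usually the safer check than a bare is.vector().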
2011 May 27
4
network package in R
Hi there, I need a network builder and it can change the node size and color; I am not sure if network package in R can do this or not. The other functions I wanted have been found in that package. BTW, if there is another package in R relating to this, please suggest too. Thanks, Weiwei -- Weiwei Shi, Ph.D Research Scientist "Did you always know?" "No, I did not. But I
2005 Aug 08
2
computationally singular
Hi, I have a dataset which has around 138 variables and 30,000 cases. I am trying to calculate a mahalanobis distance matrix for them and my procedure is like this: Suppose my data is stored in mymatrix > S<-cov(mymatrix) # this is fine > D<-sapply(1:nrow(mymatrix), function(i) mahalanobis(mymatrix, mymatrix[i,], S)) Error in solve.default(cov, ...) : system is computationally
2005 Dec 15
2
question on write.table
Hi, I have a question on write.table: I have a data.frame called t7 as below: > dim(t7) [1] 14015184 6 > t7[1:5,] uci uce par line graphical.forms stems 1 0 0 0 0 active activ 2 0 0 0 0 policy polici 3 0 0 0 0 wc PC 4 0 0 0 0 eff elf 5 0 0 0 0 icn ICC I want to write the
2011 Oct 24
1
heatmap for plotting categorical matrix
Hi there, I have a matrix like this: > a4[1:20, 1:5] 194 211 294 314 315 GO:0000003 1 1 1 1 1 GO:0000072 0 0 0 0 0 GO:0000076 1 0 0 0 0 GO:0000082 1 3 1 1 1 GO:0000083 1 0 0 0 1 GO:0000086 0 1 0 1 1 GO:0000114 0 0 0 0 0 GO:0000115 0 0 0 0 0 GO:0000117 0 0 0 0 0 GO:0000160 0 0 1 0 0
2005 Jun 03
1
factor vector manipulation
Hi, I have one question on factor vector. I have 3 factor vectors: a<-factor(c("1", "2", "3")) b<-factor(c("a", "b", "c")) c<-factor(c("b", "a", "c")) what I want is like: c x 1 b 2 2 a 1 3 c 3 which means, I use b as keys and vector a as values and I find values for c. I used the following
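One possible way to do the lookup described here (b as keys, a as values, looked up for each element of c) is match(). This is a sketch and not necessarily what the poster used; the name `query` stands in for the post's `c` to avoid confusion with the base function c().

```r
a <- factor(c("1", "2", "3"))      # values
b <- factor(c("a", "b", "c"))      # keys
query <- factor(c("b", "a", "c"))  # the post's vector `c`

# match() compares factors by their labels and returns, for each element
# of `query`, its position within the keys `b`
idx <- match(query, b)

# Index the values by those positions and pair them with the queried keys
result <- data.frame(c = query, x = a[idx])
print(result)
```

Keys that do not appear in b would yield NA in `idx`, and therefore NA in the x column.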
2006 Jun 03
1
time series clustering
Dear Listers: I happened to have a problem requiring time-series clustering since the clusters will change with time (too old data need to be removed from data while new data comes in). I am wondering if there is some paper or reference on this topic and there is some kind of implementation in R? Thanks, Weiwei -- Weiwei Shi, Ph.D "Did you always know?" "No, I did not. But I
2009 Jul 22
1
margins defined in randomForest and supclust
Hi there, How to solve the conflicts as to the same object between two packages, for example, like margins in both randomForest and supclust? When both libraries are installed, supclust will complain "margins" defined in randomForest. I can only solve it by re-starting R, which is very inconvenient, any clever way? Thanks, Weiwei -- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc.
2006 Jan 09
0
Looking for packages to do Feature Selection and Classification
Hi, You should also check my msc.features.select from caMassClass package. It has feature selection algorithm that I found useful in case of mass-spectra data. It performs individual feature selection and/or removes highly correlated neighbor features. Jarek -----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] Sent: Friday, January
2005 Aug 04
1
some thoughts on outlier detection, need help!
Dear listers: I have an idea for outlier detection and I need to implement it in R first. I hope I can get some input from all the gurus here. I chose a distance-based approach--- step 1: calculate the distance between every pair of rows in a data frame. To account for scaling among different variables, I chose mahalanobis, using variance as the scaler. step 2: Let k be the number of
2006 Apr 24
2
regression modeling
Hi, there: I am looking for a regression modeling (like regression trees) approach for a large-scale industry dataset. Any suggestion on a package from R or from other sources which has a decent accuracy and scalability? Any recommendation from experience is highly appreciated. Thanks, Weiwei -- Weiwei Shi, Ph.D "Did you always know?" "No, I did not. But I believed..."