Displaying 20 results from an estimated 1000 matches similar to: "read large amount of data"
2005 Jul 13
1
read.table
Hi,
I have a question on read.table.
I have a dataset with 273,000 lines and 195 columns. I used the
read.table to load the data into R:
trn<-read.table('train1.dat', header=F, sep='|', na.strings='.')
I found it takes forever.
Then I ran 1/10 of the data (test) using read.table again, and this
time it finished quickly. So, there might be something wrong in my
data
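
A minimal sketch of two checks that often help with slow read.table calls on wide files. It uses a tiny temporary file as a stand-in for 'train1.dat' and assumes every column is numeric, which the post does not state. count.fields() can reveal lines with an unexpected number of fields, and declaring colClasses while disabling quote and comment scanning spares read.table from guessing types.

```r
# Small pipe-delimited file standing in for 'train1.dat' (hypothetical data).
tmp <- tempfile(fileext = ".dat")
writeLines(c("1|2.5|.", "4|.|6", "7|8.5|9"), tmp)

# Number of fields on each line; unequal counts usually point to stray
# quotes or separators somewhere in the data.
n_fields <- count.fields(tmp, sep = "|", quote = "", comment.char = "")
table(n_fields)

# Declare the column types up front and switch off quote/comment scanning.
# colClasses = "numeric" is an assumption about the data, not from the post.
trn <- read.table(tmp, header = FALSE, sep = "|", na.strings = ".",
                  colClasses = "numeric", quote = "", comment.char = "",
                  nrows = 3)
str(trn)
```

If count.fields() reports differing counts on the real file, the offending lines are a likely place to look before tuning anything else.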
2005 Aug 12
2
need help
Hi, there:
I think I need to rephrase my question, since last time I did not get
any reply. I think the question is not that hard; probably I did
not make the question clear:
I want to find cases like
35, 90, 330, 330, 335
from the rest which look like
3, 3, 3, 3.2, 3.3
4, 4.4, 4.5, 4.6, 4.7
....
basically there is one (or more) big 'gap' in the case i seek.
thanks,
weiwei
--
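
One possible reading of the "big gap" criterion, sketched with the three example rows from the post: flag a case when any ratio between consecutive sorted values exceeds a threshold. The function name has_big_gap and the threshold of 2 are illustrative assumptions, and the ratio test only makes sense for positive values.

```r
# TRUE when any ratio between consecutive sorted values exceeds `ratio`.
# `has_big_gap` and the default threshold are hypothetical choices.
has_big_gap <- function(x, ratio = 2) {
  x <- sort(x)
  any(x[-1] / x[-length(x)] > ratio)
}

cases <- list(
  c(35, 90, 330, 330, 335),
  c(3, 3, 3, 3.2, 3.3),
  c(4, 4.4, 4.5, 4.6, 4.7)
)
flags <- sapply(cases, has_big_gap)
flags
```

An absolute version of the same idea would replace the ratio with diff(sort(x)) and a threshold on the difference.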
2005 Oct 11
1
a problem in random forest
Hi, there:
I spent some time on this but I think I really cannot figure it out, maybe I
missed something here:
my data looks like this:
> dim(trn3)
[1] 7361 209
> dim(val3)
[1] 7427 209
> mg.rf2<-randomForest(x=trn3[,1:208], y=trn3[,209], data=trn3, xtest=val3[,
1:208], ytest=val3[,209], importance=T)
my test data has 7427 observations but after prediction,
> dim(mg.rf2$votes)
2005 Oct 04
1
generalized linear model and missing handling
Hi,
I have a dataset and want to build a generalized linear model on it.
Unfortunately, complete.cases(df) returns null, which means I have to find a
way to "fill" those missing values. One way, following my previous post, is to use
the median as a replacement (or the most frequent level in the categorical
case), but I am wondering if there are other ways, when glm or something
like it is
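
The median / most-frequent-level replacement mentioned in this post could be sketched as follows. The helper name impute_simple and the toy data frame are hypothetical. Note that glm() by default drops incomplete rows (na.action = na.omit), so some form of filling is only needed when too few complete rows remain.

```r
# Replace NAs by the column median (numeric) or the most frequent level (factor).
# `impute_simple` is an illustrative helper, not part of any package.
impute_simple <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.numeric(col)) {
      col[is.na(col)] <- median(col, na.rm = TRUE)
    } else if (is.factor(col)) {
      mode_level <- names(which.max(table(col)))  # table() ignores NA by default
      col[is.na(col)] <- mode_level
    }
    df[[nm]] <- col
  }
  df
}

# Toy data with one missing numeric value and one missing factor value.
df <- data.frame(x = c(1, NA, 3, 10),
                 g = factor(c("a", "b", NA, "b")),
                 y = c(0, 1, 1, 0))
filled <- impute_simple(df)
filled
```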
2006 Dec 12
0
Re : Re : implementation of t.test
Apologies, I made a mistake in my previous mail.
Type stats:::t.test.default
The formal way is to use getAnywhere(t.test)
Justin BEM
Student Statistician-Economist Engineer
P.O. Box 294, Yaoundé.
Tel. (00237)9597295.
----- Original Message ----
From: justin bem <justin_bem@yahoo.fr>
To: Weiwei Shi <helprhelp@gmail.com>
Cc: R-help@stat.math.ethz.ch
Sent: Tuesday, 12 December 2006,
2005 Jul 08
1
"more" and "tab" functionalities in R under linux
Hi,
forgive me if it is due to my "laziness" :)
I am wondering if there are functionalities in R, which can do like
"more" and "tab" in linux:
more(one.data.frame) so I can browse through it. Sometimes I can use
one.data.frame[1:100,], but still not as good as "more" in linux.
tab:
can I use tab to auto-complete a defined object name in R so I don't
2005 Oct 05
1
pca in dimension reduction
Hi, there:
I am wondering if anyone here can provide an example of using PCA for
dimension reduction on a dataset.
The dataset can be n*q (n>=q or n<=q).
As to dimension reduction, are there other implementations for like ICA,
Isomap, Locally Linear Embedding...
Thanks,
weiwei
--
Weiwei Shi, Ph.D
"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
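
A minimal sketch of PCA-based dimension reduction with prcomp() from base R, on simulated data. The matrix size and the 90% variance cutoff are assumptions chosen for illustration. prcomp() works whether n >= q or n <= q.

```r
set.seed(1)
X <- matrix(rnorm(50 * 8), nrow = 50, ncol = 8)  # hypothetical n = 50, q = 8 data

# Centre and scale the columns, then compute the principal components.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Cumulative share of variance explained by the leading components.
var_explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of components reaching 90% of the variance (arbitrary cutoff).
k <- which(var_explained >= 0.9)[1]

# Scores on the first k components: the reduced representation of X.
X_reduced <- pca$x[, 1:k, drop = FALSE]
dim(X_reduced)
```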
2005 Oct 11
1
an error in my using of nnet
Hi, there:
I am trying nnet as follows:
> mg.nnet<-nnet(x=trn3[,r.v[1:100]], y=trn3[,209], size=5, decay = 5e-4,
maxit = 200)
# weights: 511
initial value 13822.108453
iter 10 value 7408.169201
iter 20 value 7362.201934
iter 30 value 7361.669408
iter 40 value 7361.294379
iter 50 value 7361.045190
final value 7361.038121
converged
Error in y - tmp : non-numeric argument to binary operator
2005 Jul 25
1
cluster
Dear listers:
Here I have a question on clustering methods available in R. I am
trying to down-sample the majority class in a classification problem
on an imbalanced dataset. Since I don't want to lose information in
the original dataset, I don't want to use naive down-sampling: I think
using clustering on the majority class' side to select
"representative" samples might
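
One way the cluster-based down-sampling idea could look, sketched with kmeans() on simulated majority-class rows. The data, the choice of k = 10, and the rule of keeping the row nearest each centroid are all assumptions for illustration, not the poster's procedure.

```r
set.seed(1)
majority <- matrix(rnorm(300 * 2), ncol = 2)  # hypothetical majority-class rows

k <- 10                                        # number of representatives to keep
km <- kmeans(majority, centers = k, nstart = 5)

# For each cluster, keep the index of the row closest to the cluster centroid.
representatives <- sapply(seq_len(k), function(j) {
  idx <- which(km$cluster == j)
  centred <- sweep(majority[idx, , drop = FALSE], 2, km$centers[j, ])
  idx[which.min(rowSums(centred^2))]
})

downsampled <- majority[representatives, , drop = FALSE]
dim(downsampled)
```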
2005 Jul 07
2
randomForest
> From: Weiwei Shi
>
> it works.
> thanks,
>
> but: (just curious)
> why i tried previously and i got
>
> > is.vector(sample.size)
> [1] TRUE
Because a list is also a vector:
> a <- c(list(1), list(2))
> a
[[1]]
[1] 1
[[2]]
[1] 2
> is.vector(a)
[1] TRUE
> is.numeric(a)
[1] FALSE
Actually, the way I initialize a list of known length is by
2011 May 27
4
network package in R
Hi there,
I need a network builder and it can change the node size and color; I am not
sure if network package in R can do this or not. The other functions I
wanted have been found in that package.
BTW, if there is another package in R relating to this, please suggest too.
Thanks,
Weiwei
--
Weiwei Shi, Ph.D
Research Scientist
"Did you always know?"
"No, I did not. But I
2005 Aug 08
2
computationally singular
Hi,
I have a dataset which has around 138 variables and 30,000 cases. I am
trying to calculate a mahalanobis distance matrix for them and my
procedure is like this:
Suppose my data is stored in mymatrix
> S<-cov(mymatrix) # this is fine
> D<-sapply(1:nrow(mymatrix), function(i) mahalanobis(mymatrix, mymatrix[i,], S))
Error in solve.default(cov, ...) : system is computationally
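
This error typically appears when the covariance matrix is rank-deficient, for example because some columns are linear combinations of others. The sketch below reproduces that situation on simulated data, checks the rank, and falls back to a pseudo-inverse passed to mahalanobis() with inverted = TRUE. The helper pinv is hypothetical (MASS::ginv offers the same operation if that package is available), and whether rank deficiency is the cause in the poster's data is an assumption.

```r
set.seed(1)
m <- matrix(rnorm(200 * 3), ncol = 3)
mymatrix <- cbind(m, m[, 1] + m[, 2])  # 4th column is a sum of two others

S <- cov(mymatrix)
qr(S)$rank                             # smaller than ncol(S): S is singular

# Moore-Penrose pseudo-inverse via SVD; `pinv` is an illustrative helper.
pinv <- function(A, tol = sqrt(.Machine$double.eps)) {
  s <- svd(A)
  keep <- s$d > tol * max(s$d)         # drop near-zero singular values
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

S_inv <- pinv(S)

# Squared Mahalanobis distances of all rows from row 1, using the
# precomputed (pseudo-)inverse instead of letting mahalanobis() call solve().
d2 <- mahalanobis(mymatrix, mymatrix[1, ], S_inv, inverted = TRUE)
head(d2)
```

Computing the inverse once also avoids repeating solve() for each of the 30,000 rows, though the full pairwise distance matrix would still be very large.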
2005 Dec 15
2
question on write.table
Hi,
I have a question on write.table:
I have a data.frame called t7 as below:
> dim(t7)
[1] 14015184 6
> t7[1:5,]
  uci uce par line graphical.forms  stems
1   0   0   0    0          active  activ
2   0   0   0    0          policy polici
3   0   0   0    0              wc     PC
4   0   0   0    0             eff    elf
5   0   0   0    0             icn    ICC
I want to write the
2011 Oct 24
1
heatmap for plotting categorical matrix
Hi there,
I have a matrix like this:
> a4[1:20, 1:5]
           194 211 294 314 315
GO:0000003   1   1   1   1   1
GO:0000072   0   0   0   0   0
GO:0000076   1   0   0   0   0
GO:0000082   1   3   1   1   1
GO:0000083   1   0   0   0   1
GO:0000086   0   1   0   1   1
GO:0000114   0   0   0   0   0
GO:0000115   0   0   0   0   0
GO:0000117   0   0   0   0   0
GO:0000160   0   0   1   0   0
2005 Jun 03
1
factor vector manipulation
Hi,
I have one question on factor vector.
I have 3 factor vectors:
a<-factor(c("1", "2", "3"))
b<-factor(c("a", "b", "c"))
c<-factor(c("b", "a", "c"))
what I want is like:
c x
1 b 2
2 a 1
3 c 3
which means, I use b as keys and vector a as values and I find values for c.
I used the following
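
The key/value lookup described in this post can be sketched with match(), using the three factors exactly as given. This is one possible approach and not necessarily the one the poster tried.

```r
a <- factor(c("1", "2", "3"))   # values
b <- factor(c("a", "b", "c"))   # keys
c <- factor(c("b", "a", "c"))   # keys to look up

# Position of each element of c within b, then the value of a at that position.
x <- a[match(as.character(c), as.character(b))]

data.frame(c = c, x = x)
```

Comparing the character representations avoids surprises when the two factors have different level sets.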
2006 Jun 03
1
time series clustering
Dear Listers:
I happened to have a problem requiring time-series clustering since the
clusters will change with time (too old data need to be removed from data
while new data comes in). I am wondering if there is some paper or reference
on this topic and there is some kind of implementation in R?
Thanks,
Weiwei
--
Weiwei Shi, Ph.D
"Did you always know?"
"No, I did not. But I
2009 Jul 22
1
margins defined in randomForest and supclust
Hi there,
How to solve the conflicts as to the same object between two packages, for
example, like margins in both randomForest and supclust?
When both libraries are installed, supclust will complain "margins" defined
in randomForest.
I can only solve it by re-starting R, which is very inconvenient, any clever
way?
Thanks,
Weiwei
--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.
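
A general way to disambiguate objects with the same name is the `package::name` form, which selects one package's version regardless of attach order. The sketch below illustrates the mechanism with a user-defined function masking stats::sd, as a stand-in for two packages exporting the same name. Applying the same form to randomForest and supclust is an assumption and is not run here.

```r
# A user-defined `sd` masks stats::sd, mimicking a name clash between packages.
sd <- function(x) "masked version"

result_masked <- sd(c(1, 2, 3))        # picks the masking definition
result_stats  <- stats::sd(c(1, 2, 3)) # `::` selects the stats version explicitly

rm(sd)                                 # remove the mask again

# conflicts(detail = TRUE) lists names defined in more than one place on the
# search path, and detach("package:<name>", unload = TRUE) removes an attached
# package without restarting R (not run here).
result_masked
result_stats
```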
2006 Jan 09
0
Looking for packages to do Feature Selection and Classification
Hi,
You should also check my msc.features.select from caMassClass package. It
has feature selection algorithm that I found useful in case of mass-spectra
data. It performs individual feature selection and/or removes highly
correlated neighbor features.
Jarek
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch]
Sent: Friday, January
2005 Aug 04
1
some thoughts on outlier detection, need help!
Dear listers:
I have an idea to do the outlier detection and I need to use R to
implement it first. Here I hope I can get some input from all the
gurus here.
I select distance-based approach---
step 1:
calculate the distance of any two rows for a dataframe. considering
the scaling among different variables, I choose mahalanobis, using
variance as scaler.
step 2:
Let k be the number of
2006 Apr 24
2
regression modeling
Hi, there:
I am looking for a regression modeling (like regression trees) approach for
a large-scale industry dataset. Any suggestion on a package from R or from
other sources which has a decent accuracy and scalability? Any
recommendation from experience is highly appreciated.
Thanks,
Weiwei
--
Weiwei Shi, Ph.D
"Did you always know?"
"No, I did not. But I believed..."