Displaying 20 results from an estimated 2000 matches similar to: "Mean or mode imputation for missing values"
2011 Dec 02
2
Imputing data
So I have a very big matrix of about 900 by 400 and there are a couple of NA
in the list. I have used the following functions to impute the missing data
data(pc)
pc.na<-pc
pc.roughfix <- na.roughfix(pc.na)
pc.narf <- randomForest(pc.na, na.action=na.roughfix)
yet it does not replace the NA in the list. Presently I want to replace the
NA with maybe the mean of the rows or columns or
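A minimal sketch of the column-mean replacement asked about above, in base R. The small matrix `m` is a hypothetical stand-in for the poster's 900 by 400 data, and the names `col_means`, `na_idx`, and `m_imputed` are illustrative only. Note that `na.roughfix` returns a new object rather than changing its argument, so the imputed values would be in `pc.roughfix`, not in `pc.na`.

```r
# Hypothetical stand-in for the poster's matrix.
set.seed(1)
m <- matrix(rnorm(20), nrow = 5, ncol = 4)
m[2, 1] <- NA
m[4, 3] <- NA

# Column means computed over the non-missing entries only.
col_means <- colMeans(m, na.rm = TRUE)

# Row/column positions of every NA cell.
na_idx <- which(is.na(m), arr.ind = TRUE)

# Fill each NA with the mean of its own column.
m_imputed <- m
m_imputed[na_idx] <- col_means[na_idx[, "col"]]
```

For row means instead, the same pattern should work with `rowMeans(m, na.rm = TRUE)` indexed by `na_idx[, "row"]`.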
2008 May 21
2
an unknown error message when using gamm function
Dear everyone,
I'm encountering an unknown error message when using gamm function:
> fitoutput <-
gamm(cvd~as.factor(dow)+pm10+s(time,bs="cr",k=15,fx=TRUE)+s(tmean,bs="cr",k=7,fx=TRUE)
+
,correlation=corAR1(form=~1|city),family=poisson,random=list(city=~pm10),data=mimp)
Maximum number of PQL iterations: 20
iteration 1
iteration 2
iteration 3
iteration 4
2009 Apr 28
1
I can't install dprep
When I try to install "dprep", I always get this message:
> install.packages("dprep")
Warning in install.packages("dprep") :
argument 'lib' is missing: using 'C:\Users\Documents/R/win-library/2.8'
--- Please select a CRAN mirror for use in this session ---
Warning message:
package 'dprep' is not available
I have tried a lot of mirror...
2009 Sep 06
1
Two packages and one method
Hi!
I want to use the "combinations" function from the "gtools" package, but my code
must also use the "dprep" package, which has a "combinations" function too. Here
is what my help call shows:
Help on topic 'combinations' was found in the following packages:
Package Library
dprep /usr/lib64/R/library
gtools
2007 Aug 10
1
rfImpute
I am having trouble with the rfImpute function in the randomForest package.
Here is a sample...
clunk.roughfix<-na.roughfix(clunk)
>
> clunk.impute<-rfImpute(CONVERT~.,data=clunk)
ntree OOB 1 2
300: 26.80% 3.83% 85.37%
ntree OOB 1 2
300: 18.56% 5.74% 51.22%
Error in randomForest.default(xf, y, ntree = ntree, ..., do.trace = ntree,
:
NA not
2010 Jun 30
2
anyone know why package "RandomForest" na.roughfix is so slow??
Hi all,
I am using the package "random forest" for random forest predictions. I
like the package. However, I have fairly large data sets, and it can often
take *hours* just to go through the "na.roughfix" call, which simply goes
through and cleans up any NA values to either the median (numerical data) or
the most frequent occurrence (factors).
I am going to start
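The behaviour described above (median for numeric columns, most frequent level for factors) can be sketched in base R as follows. This is an illustration of the idea only, not the randomForest implementation; `rough_fix` and the toy data frame `d` are hypothetical names.

```r
# Hypothetical helper: impute NAs column by column.
rough_fix <- function(df) {
  for (j in seq_along(df)) {
    x <- df[[j]]
    miss <- is.na(x)
    if (!any(miss)) next
    if (is.numeric(x)) {
      # Numeric column: use the median of the observed values.
      x[miss] <- median(x, na.rm = TRUE)
    } else if (is.factor(x)) {
      # Factor column: use the most frequent level.
      # table() drops NA by default; which.max takes the first level on ties.
      x[miss] <- names(which.max(table(x)))
    }
    df[[j]] <- x
  }
  df
}

d <- data.frame(a = c(1, NA, 3, 10), b = factor(c("u", "v", NA, "v")))
fixed <- rough_fix(d)
```

Columns that are neither numeric nor factor are left unchanged in this sketch.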
2011 Oct 03
2
Import in R with White Spaces
Hi,
I have a simple question about importing data, I would be very grateful if
you could help me out.
I have used read.csv(file name, header=T, sep=",") to bring in a csv file I
saved in MS Excel. The problem is I have white spaces in the middle of values
(not in the column names), and this messes up the column entries. Since I
have many many files that I am importing and I have spaces
2008 Dec 09
4
Pre-model Variable Reduction
Hello All,
I am trying to carry out variable reduction. I do not have information
about the dependent variable, and have only the X variables as it
were.
In selecting variables I wish to keep, I have considered the following criteria.
1) Percentage of missing value in each column/variable
2) Variance of each variable, with a cut-off value.
I recently came across Weka and found that there is an
2005 Feb 04
2
genetic algorithm
Hi,
I am doing some research on feature selection for classification
problem using genetic algorithm in a wrapper approach. I am wondering
if there is some package which is already built for this purpose. I
was advised before about dprep package but I don't think it used GA
there (if I am wrong, please correct me!)
Thanks,
Ed
2007 Jun 20
1
Error in function distancia() in package dprep(v1.0) (PR#9745)
Full_Name: Kang Yousan
Version: 2.5.0
OS: Windows XP
Submission from: (NULL) (211.137.211.67)
There is a bug in function distancia() in package dprep. See the description
below.
> distancia
1 function (x, y)
2 {
3 if (class(y) == "matrix") {
4 distancia = drop(sqrt(colSums((x - t(y))^2)))
5 distancia = t(distancia)
6 }
7 else distancia =
2003 Aug 05
1
na.action in randomForest --- Summary
A few days ago I asked whether there were options other than
na.action=na.fail for the R port of Breiman's randomForest; the function's
help page did not say anything about other options.
I have since discovered that a pdf document called "The randomForest
Package" and made available by Andy Liaw (who made the tool available in
R---thank you) does discuss an option. It is an implementation of
2012 Mar 05
1
Order a data frame based on the order of another data frame
Hi, I am trying to match the order of the rownames of a dataframe with
the rownames of another dataframe (I can't simply sort both sets
because I would have to change the order of many other connected
datasets if I did that). Also, the second dataset (snp.matrix$fam) is
a snp matrix slot:
so for example:
data_one:
x y
2011 Oct 24
1
Creating data frame with residuals of a data frame
Dear experts,
I am trying to create a data frame from the residuals I get after
having applied a linear regression to each column of a data frame, but
I don't know how to create this data frame from the resulting list
since the list has differing numbers of rows.
So for example:
age<- c(5,6,10,14,16,NA,18)
value1<- c(30,70,40,50,NA,NA,NA)
value2<- c(2,4,1,4,4,4,4)
df<-
2013 Feb 18
1
ggplot2 and facet_wrap help
Dear R experts,
I am trying to arrange multiple plots, creating one graph for each
size1 factor variable in my data frame, and each plot has the median
price on the y-axis and the size2 on the x-axis grouped by clarity:
library(ggplot2)
df <- data.frame(price=matrix(sample(1:1000, 100, replace = TRUE), ncol = 1))
df$size1 = 1:nrow(df)
df$size1 = cut(df$size1, breaks=11)
2012 Apr 14
3
Choose between duplicated rows
Dear r experts,
Sorry for this basic question, but I can't seem to find a solution?
I have this data frame:
df <- data.frame(id = c("id1", "id1", "id1", "id2", "id2", "id2"), A =
c(11905, 11907, 11907, 11829, 11829, 11829), v1 = c(NA, 3, NA,1,2,NA), v2 =
c(NA,2,NA, 2, NA,NA), v3 = c(NA,1,NA,1,NA,NA), v4 = c("N",
2011 Oct 03
2
Merge two data frames and find common values and non-matching values
Hi,
I am trying to find a function to match two data frames of different lengths
for one field only.
So, for example,
df1 is:
Name Position location
francesca A 75
cristina B 36
And df2 is:
location Country
75 UK
56 Austria
And I would like to match on "Location" and the output to be something like:
Name Position Location Match
francesca A 75 1
cristina B 36 0
I have tried with
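One possible way to produce the 0/1 "Match" column from the example above, sketched in base R with `%in%`. The data frames are rebuilt from the values shown in the post; the column name `location` follows `df1` as given, and this is only one of several approaches (a `merge` on `location` would be another).

```r
# Example data rebuilt from the values in the post.
df1 <- data.frame(Name = c("francesca", "cristina"),
                  Position = c("A", "B"),
                  location = c(75, 36))
df2 <- data.frame(location = c(75, 56),
                  Country = c("UK", "Austria"))

# 1 if the location also appears in df2, 0 otherwise.
df1$Match <- as.integer(df1$location %in% df2$location)
```

This keeps every row of `df1` in its original order and does not need the two data frames to have the same length.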
2012 Apr 17
1
Problem accessing .Rdata objects in a loop
Hi,
I am trying to access many .Rdata objects and do some operations with them
using a loop. I can load the files but can't access them. The files' names
are stored in a character vector called "names". After loading the objects,
I can view each one using ls() and see that two objects are present for
each. I am trying to access the one with the name which is the same as the
2006 Dec 18
1
Memory problem on a linux cluster using a large data set
Hello,
I have a large data set 320.000 rows and 1000 columns. All the data has the values 0,1,2.
I wrote a script to remove all the rows with more than 46 missing values. This works perfect on a smaller dataset. But the problem arises when I try to run it on the larger data set I get an error “cannot allocate vector size 1240 kb”. I’ve searched through previous posts and found out that it might
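The row filter described above can be sketched in base R as below. The small matrix `x` is a hypothetical stand-in for the 320.000 by 1000 data set, and `drop_sparse_rows` is an illustrative name; the poster's actual script is not shown, so this is an assumption about what it does.

```r
# Hypothetical small stand-in for the large data set (values 0, 1, 2, NA).
set.seed(42)
x <- matrix(sample(c(0L, 1L, 2L, NA), 20 * 50, replace = TRUE,
                   prob = c(0.3, 0.3, 0.3, 0.1)),
            nrow = 20, ncol = 50)

# Keep rows with at most `max_missing` NAs; 46 is the cutoff from the post.
drop_sparse_rows <- function(m, max_missing = 46) {
  m[rowSums(is.na(m)) <= max_missing, , drop = FALSE]
}

# A smaller cutoff is used here so the toy data actually loses some rows.
kept <- drop_sparse_rows(x, max_missing = 5)
```

On the full data, `is.na(m)` allocates a logical matrix with the same dimensions as the input, so applying the filter to blocks of rows may be needed if memory is tight.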
2003 Jul 27
1
multiple imputation with fit.mult.impute in Hmisc
I have always avoided missing data by keeping my distance from
the real world. But I have a student who is doing a study of
real patients. We're trying to test regression models using
multiple imputation. We did the following (roughly):
f <- aregImpute(~ [list of 32 variables, separated by + signs],
n.impute=20, defaultLinear=T, data=t1)
# I read that 20 is better than the default of
2008 Nov 26
1
multiple imputation with fit.mult.impute in Hmisc - how to replace NA with imputed value?
I am doing multiple imputation with Hmisc, and
can't figure out how to replace the NA values with
the imputed values.
Here's a general outline of the process:
> set.seed(23)
> library("mice")
> library("Hmisc")
> library("Design")
> d <- read.table("DailyDataRaw_01.txt",header=T)
> length(d);length(d[,1])
[1] 43
[1] 2666