thr3ads.net - similar to: "problem about set operation and computation after split"

Displaying 20 results from an estimated 600 matches similar to: "problem about set operation and computation after split"

split() is slow on data.frame (PR#14123)

2009 Dec 09

Please see the following code for the runtime comparison between split() and mysplit.data.frame() (they do the same thing semantically). mysplit.data.frame() is a fix of split() in terms of performance. Could somebody include this fix (with possible checking for corner cases) in a future version of R and let me know when it is included?
m=300000
n=6
k=30000
set.seed(0)
x=replicate(n,rnorm(m))

lme unequal random-effects variances varIdent pdMat Pinheiro Bates nlme

2004 Jul 12

How does one implement a likelihood-ratio test, to test whether the variances of the random effects differ between two groups of subjects? Suppose your data consist of repeated measures on subjects belonging to two groups, say boys and girls, and you are fitting a linear mixed-effects model for the response as a function of time. The within-subject errors (residuals) have the same variance in

split data, but ensure each level of the factor is represented

2008 Oct 13

Hello, I'll use part of the iris dataset for an example of what I want to do.
> data(iris)
> iris<-iris[1:10,1:4]
> iris
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3          4.7         3.2          1.3         0.2
4          4.6         3.1          1.5

Using apply for two datasets

2009 Jan 06

I can run a one-sample t-test on an array, for example a matrix myData1, with the following: apply(myData1, 2, t.test). Is there a similar way, using apply() or something else, to run a two-sample t-test with datasets from two groups, myData1 and myData2, without looping? TIA, Gang
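One possible way to approach the question above is sketched here. It assumes that myData1 and myData2 have the same number of columns and that column j of one is to be compared with column j of the other; the simulated matrices are hypothetical stand-ins for the poster's data. Note that lapply() still iterates internally, it just avoids an explicit for-loop.

```r
# Hypothetical stand-ins for the poster's myData1 and myData2
# (same number of columns, possibly different numbers of rows).
set.seed(1)
myData1 <- matrix(rnorm(50), nrow = 10, ncol = 5)
myData2 <- matrix(rnorm(60, mean = 0.5), nrow = 12, ncol = 5)

# Run a two-sample t-test on each pair of matching columns.
results <- lapply(seq_len(ncol(myData1)), function(j) {
  t.test(myData1[, j], myData2[, j])
})

# Collect the p-values into a numeric vector, one per column.
p_values <- vapply(results, function(r) r$p.value, numeric(1))
print(p_values)
```

Each element of results is a full "htest" object, so statistics other than the p-value (for example r$statistic or r$conf.int) can be extracted the same way.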

Boxplot Help for Neophyte

2006 Feb 20

R helpers, I am getting to grips with R but came across a small problem today that I could not fix by myself. I have 3 text files, each with a single column of data. I read them in using:
myData1<-scan("C:/Program Files/R/myData1.txt")
myData2<-scan("C:/Program Files/R/myData2.txt")
myData3<-scan("C:/Program Files/R/myData3.txt")
I wanted to produce a

reading data into R

2012 May 15

Hi, I am really new to using R, so this is really beginner stuff! I created a very small data set in Excel and then converted it to a .csv file. I am able to open the data in R using the command "read.table ("mydata1.csv", sep=",", header=T)" and it works just fine. But when I want to work on the data (e.g. calculate the mean of variable "X") R says

speed up process

2011 Feb 25

Dear users, I have a double for loop that does exactly what I want, but is quite slow. It is not so much with this simplified example, but IRL it is slow. Can anyone help me improve it? The data and code for foo_reg() are available at the end of the email; I preferred going directly into the problematic part. Here is the code (I tried to simplify it but I cannot do it too much or else it

insert missing dates

2012 Jul 03

Hello I have dataframes. mydata1 <-data.frame(value=c(15,20,25,30,45,50),dates=c("2005-05-25 07:00:00 ","2005-05-25 19:00:00","2005-06-25 07:00:00","2005-06-25 19:00:00 ","2005-07-25 07:00:00","2005-8-25 19:00:00")) or mydata2 <-data.frame(value=c(15,20,25,30,45,50),dates=c("2005-05-25 00:00:00 ","2005-05-25

randomForest memory footprint

2011 Sep 07

Hello, I am attempting to train a random forest model using the randomForest package on 500,000 rows and 8 columns (7 predictors, 1 response). The data set is the first block of data from the UCI Machine Learning Repo dataset "Record Linkage Comparison Patterns" with the slight modification that I dropped two columns with lots of NA's and I used knn imputation to fill in other gaps.

GLS - Plotting Graphs with 95% conf interval

2011 Jul 11

Hi, I am trying to plot the original data with the line of the model using the predict function. I want to add SE to the graph, but not sure how to get them out as the predict function for gls does not appear to allow for SE=TRUE argument. Here is my code so far: f1<-formula(MaxNASC40_50~hu3+flcmax+TidalFlag) vf1Exp<-varExp(form=~hu3) B1D<-gls(f1,correlation=corGaus(form=Lat~Lon,

How can I import user-defined missings from Spss?

2008 Apr 15

Hi, I can import SPSS datasets via library(foreign) with read.spss or via library Hmisc with spss.get. But no matter which way I import the data, user-defined missings from SPSS are always lost (it makes no difference whether there is a single value, a range, or any combination of them; they are always ignored). Is there any way in R to find out if any value was user-defined missing

Semi Parametric Bootstrap

2013 Jan 10

Greetings to you all, I am performing a semi-parametric bootstrap in R on Gamma-distributed data and on Binomial-distributed data. The main challenge I am facing is the fact that the residual variance depends on the mean (if I am correct). I strongly feel that the script below may be wrong due to the mean-variance relationship.
#####R code#######
fit1s

creating NAs for some values only

2011 Feb 13

Hello, I have a data file, say, mydata:
1,2,3,4,5,6,7
3,3,4,4,w,w,1
w,3,6,5,7,8,9
4,4,w,5,3,3,0
I want to replace some percentage of the values in the "mydata" file with NAs, but only among the values that are NOT w's. I know how to apply the percentage part but don't know how to select the values that are not "w"s. So far, I was able to do it but the result replaces the w's

stats 'dist' euclidean distance calculation

2018 Mar 15

Hello, I am working with a matrix of multilocus genotypes for ~180 individual snail samples, with substantial missing data. I am trying to calculate the pairwise genetic distance between individuals using the stats package 'dist' function, using euclidean distance. I took a subset of this dataset (3 samples x 3 loci) to test how euclidean distance is calculated: 3x3 subset used

confidence intervals for mean (GLM)

2010 Jan 22

Dear useRs, How could I obtain the confidence intervals for the means of my treatments, when my data was fitted to a GLM? I need the CI's for the Poisson and Negative Binomial distributions. Here's what I have:
mydata1 <- data.frame('treatments'=gl(4,20), 'value'=rpois(80, 1))
model1 <- glm(value ~ treatments, data=mydata1, family=poisson)
means1 <-
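A common approach to this kind of question, sketched here for the Poisson case only, is to compute Wald intervals on the link (log) scale with predict(..., se.fit = TRUE) and then back-transform them. This is an illustration under assumptions, not the thread's accepted answer: the names newdat, pred, z and ci are hypothetical, and the 95% level is chosen for the example.

```r
# Rebuild the poster's simulated data and Poisson model.
set.seed(1)
mydata1 <- data.frame(treatments = gl(4, 20), value = rpois(80, 1))
model1 <- glm(value ~ treatments, data = mydata1, family = poisson)

# One row per treatment level (newdat is a hypothetical helper name).
lev <- levels(mydata1$treatments)
newdat <- data.frame(treatments = factor(lev, levels = lev))

# Predictions and standard errors on the link (log) scale.
pred <- predict(model1, newdata = newdat, type = "link", se.fit = TRUE)

# Wald 95% interval on the log scale, back-transformed with exp().
z <- qnorm(0.975)
ci <- data.frame(
  treatments = newdat$treatments,
  mean  = exp(pred$fit),
  lower = exp(pred$fit - z * pred$se.fit),
  upper = exp(pred$fit + z * pred$se.fit)
)
print(ci)
```

Building the interval on the log scale keeps the lower bound positive, which a symmetric interval on the response scale would not guarantee.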

Kolmogorov Smirnov Test

2010 Nov 11

I'm using ks.test (mydata, dnorm) on my data. I know some of my different variable samples (mydata1, mydata2, etc) must be normally distributed but the p value is always < 2.0^-16 (the 2.0 can change but not the exponent). I want to test mydata against a normal distribution. What could I be doing wrong? I tried instead using rnorm to create a normal distribution: y = rnorm
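A likely cause of the tiny p-values described above is that ks.test() expects a cumulative distribution function (such as pnorm, or the string "pnorm") rather than the density dnorm. The sketch below uses simulated data as a hypothetical stand-in for the poster's mydata. It also assumes the mean and standard deviation are estimated from the same sample, which makes the resulting p-value only approximate.

```r
# Hypothetical sample standing in for the poster's mydata.
set.seed(42)
mydata <- rnorm(200, mean = 10, sd = 2)

# Passing the density dnorm treats it as if it were a CDF, so the
# test compares the data against the wrong function.
bad <- ks.test(mydata, dnorm)

# Passing the CDF by name, with parameters matched to the sample.
good <- ks.test(mydata, "pnorm", mean = mean(mydata), sd = sd(mydata))

print(bad$p.value)
print(good$p.value)
```

Without the mean and sd arguments, ks.test() compares against a standard normal (mean 0, sd 1), which would also reject data that are normal but on a different scale.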

Error Handling

2008 Jun 24

Hi All, The for-loop below stopped when the error "Cannot get confidence intervals on var-cov components: Non-positive definite approximate variance-covariance" occurred. I assigned a row of NA values to the data frame "m1" manually and reset "j" in the for-loop every time an error was returned. I’m wondering if there is a function that can detect an error or failure, so the
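Base R's tryCatch() is one way to detect an error inside a loop and carry on. The sketch below is a minimal illustration under assumptions: risky_fit() is a hypothetical stand-in for the poster's model-fitting step (the original loop is not shown), and the three result columns are invented for the example.

```r
# Hypothetical stand-in for the step that sometimes fails.
risky_fit <- function(j) {
  if (j %% 2 == 0) {
    stop("Non-positive definite approximate variance-covariance")
  }
  c(lower = j - 1, est = j, upper = j + 1)
}

# Preallocate the results with NA so failed iterations stay NA.
m1 <- matrix(NA_real_, nrow = 4, ncol = 3,
             dimnames = list(NULL, c("lower", "est", "upper")))

for (j in 1:4) {
  # On error, return NULL instead of stopping the loop.
  fit <- tryCatch(risky_fit(j), error = function(e) NULL)
  if (!is.null(fit)) m1[j, ] <- fit
}

m1 <- as.data.frame(m1)
print(m1)
```

Rows for the failed iterations remain NA, so there is no need to reset j or patch the data frame by hand.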

How do you save in R?

2009 May 18

I know it sounds like a silly question, but whenever I click on "save to file" it doesn't save. Whenever I use the function attach(___) it doesn't work, and says the object cannot be found. I have a series of data (0,0,0,1,1) that I need to save, then I want to attach(...) it in another R window. Please help. Thanks

Sweaving single master file to get multiple individualised reports

2007 Nov 08

Hi, Apologies in advance if I've missed something obvious. I have read the Sweave manual, the first article in R News, looked at the Help pages, googled Sweave and words like loop, output, files, multiple, done much the same on R site search (in case I missed something on Google) and I couldn't find exactly what I'm after.
What I'm trying to do?
Make
