thr3ads.net - similar to: "splitting and saving a large dataframe"

Displaying 20 results from an estimated 400 matches similar to: "splitting and saving a large dataframe"

2005 Oct 20

splitting an integer

Hi there, From the vector X of integers, X = c(11999, 122000, 81997), I would like to make these two vectors: Z = c(1999, 2000, 1997) and Y = c(1, 12, 8). That is, each entry of vector Z receives the last four digits of the corresponding entry of X, and Y receives "the rest". Any suggestions? Thanks in advance, Dimitri
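One way to get both vectors is integer arithmetic: the remainder modulo 10000 gives the last four digits, and integer division by 10000 gives the rest. A minimal sketch in base R, assuming every entry of X is a non-negative whole number:

```r
# Split each integer into its last four digits (Z) and the leading part (Y).
X <- c(11999, 122000, 81997)
Z <- X %% 10000    # remainder: the last four digits
Y <- X %/% 10000   # integer quotient: everything before them
```

Leading zeros are lost with this approach: an entry such as 120005 would give Z = 5 rather than "0005".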

2005 Oct 25

Inf in regressions

Hi, Suppose I wish to run lm( y ~ x + z + log(w) ) where w assumes non-negative values. A problem arises when w=0, as log(0) = -Inf, and R doesn't accept that (as it "accepts" NA). Is there a way to tell R to do with -Inf the same it does with NA, i.e., to ignore it? (Otherwise I have to do something like w[w==0] <- NA which doesn't hurt, but might be a bit
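The workaround mentioned in the question can be sketched as follows. With the default na.action (na.omit), lm() drops rows that contain NA, so recoding the zeros as NA before taking logs has the effect asked for. The subset argument is an alternative that leaves the data untouched. The data frame d and all of its values are made up for illustration.

```r
# Hypothetical data: w is non-negative and contains two exact zeros.
set.seed(1)
d <- data.frame(x = rnorm(20), z = rnorm(20), w = c(0, 0, runif(18, 1, 10)))
d$y <- 1 + 2 * d$x - d$z + log(d$w + 1) + rnorm(20, sd = 0.1)

# Option 1: recode zeros as NA; lm() then omits those rows.
d1 <- d
d1$w[d1$w == 0] <- NA
fit_na <- lm(y ~ x + z + log(w), data = d1)

# Option 2: keep the data as is and restrict the fit to rows with w > 0.
fit_subset <- lm(y ~ x + z + log(w), data = d, subset = w > 0)
```

Both fits use the same 18 rows, so their coefficients agree.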

2006 Jan 26

efficiency with "%*%"

Hi, x and y are (numeric) vectors. I wonder if one of the following is more efficient than the other: x%*%y or sum(x*y) ? Thanks, Dimitri Szerman
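A quick way to compare the two is to time them on a long vector. The sketch below makes no claim about which one wins, since that depends on the R build and the BLAS library in use; the vector length and the repetition count are arbitrary choices.

```r
set.seed(1)
x <- rnorm(1e6)
y <- rnorm(1e6)

# Both expressions compute the same inner product;
# x %*% y returns a 1 x 1 matrix, sum(x * y) a plain number.
same_result <- isTRUE(all.equal(drop(x %*% y), sum(x * y)))

t_matmul <- system.time(for (i in 1:50) x %*% y)
t_sum    <- system.time(for (i in 1:50) sum(x * y))

# Elapsed seconds for 50 repetitions of each expression.
print(c(matmul = t_matmul[["elapsed"]], sum = t_sum[["elapsed"]]))
```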

2006 Jul 05

creating a data frame from a list

Dear all, I have a list with three (named) numeric vectors:
> lst = list(a=c(A=1,B=8) , b=c(A=2,B=3,C=0), c=c(B=2,D=0) )
> lst
$a
A B
1 8
$b
A B C
2 3 0
$c
B D
2 0
Now, I'd love to use this list to create the following data frame:
> dtf = data.frame(a=c(A=1,B=8,C=NA,D=NA),
+ b=c(A=2,B=3,C=0,D=NA),
+ c=c(A=NA,B=2,C=NA,D=0) )
> dtf
a b
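One possible approach, sketched in base R: collect the union of the element names, index every vector by that full set of names (a name the vector lacks yields NA), and assemble the padded vectors as columns. The helper names nms and cols are introduced here for illustration only.

```r
lst <- list(a = c(A = 1, B = 8), b = c(A = 2, B = 3, C = 0), c = c(B = 2, D = 0))

# All names that occur in any element, in sorted order.
nms <- sort(unique(unlist(lapply(lst, names))))

# Indexing a named vector by a name it lacks gives NA, which pads each column.
cols <- lapply(lst, function(v) unname(v[nms]))
dtf <- data.frame(cols, row.names = nms)
```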

2006 Jul 12

help in vectorization

Hi, I have two data frames. One is like
> dtf = data.frame(y=c(rep(2002,4), rep(2003,5)),
+ m=c(9:12, 1:5),
+ def=c(.74,.75,.76,.78,.80,.82,.85,.85,.87))
and the other
dtf2 = data.frame(y=rep( c(2002,2003),20), m=c(trunc(runif(20,1,5)),trunc(runif(20,9,12))), inc=rnorm(40,mean=300,sd=150) )
What I want is to divide

2011 Apr 07

Assigning a larger number of levels to a factor that has fewer levels

Hello! I have a larger and a smaller data frame with 1 factor in each - it's the same factor:
large.frame<-data.frame(myfactor=LETTERS[1:10])
small.frame<-data.frame(myfactor=LETTERS[c(9,7,5,3,1)])
levels(large.frame$myfactor)
levels(small.frame$myfactor)
table(large.frame$myfactor)
table(small.frame$myfactor)
myfactor has 10 levels in large.frame and 5 levels in small.frame. All 5
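Judging from the thread title, the goal is for the smaller factor to carry all ten levels. A minimal sketch: rebuild the factor with the levels of the larger one. The factors are created explicitly here because, from R 4.0.0 on, data.frame() no longer converts character columns to factors by default.

```r
large.frame <- data.frame(myfactor = factor(LETTERS[1:10]))
small.frame <- data.frame(myfactor = factor(LETTERS[c(9, 7, 5, 3, 1)]))

# Re-create the factor with the full level set; the values are unchanged
# and the five unused levels get a count of zero in table().
small.frame$myfactor <- factor(small.frame$myfactor,
                               levels = levels(large.frame$myfactor))
```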

2012 Mar 28

discrepancy between paired t test and glht on lme models

Hi folks, I am working with repeated measures data and I ran into issues where the paired t-test results did not match those obtained by employing glht() contrasts on an lme model. While the lme model itself appears to be fine, there seems to be some discrepancy with using glht() on the lme model (unless I am missing something here). I was wondering if someone could help identify the issue. On

2009 May 06

tapply changing order of factor levels?

Hi, Does tapply change the order when applied on a factor? Below is the code I tried. > mylevels<-c("IN0020020155","IN0019800021","IN0020020064") >

2012 Jan 18

drop rare factors

I have a data frame with some factor columns. I want to drop the rows with rare factor values (and remove the factor values from the factors). E.g., frame$MyFactor takes values A 1,000 times, B 2,000 times, C 30 times and D 4 times. I want to remove all rows which assume rare values (<1%), i.e., C and D, i.e., frame <- frame[frame$MyFactor %in% c("A","B"), ] except
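A sketch of how the rare levels could be found and dropped without naming them by hand, using the counts from the question as made-up data. The 1% threshold comes from the post; the helper names share and keep are illustrative.

```r
# Hypothetical data matching the counts in the question.
frame <- data.frame(MyFactor = factor(rep(c("A", "B", "C", "D"),
                                          times = c(1000, 2000, 30, 4))))

share <- prop.table(table(frame$MyFactor))   # relative frequency per level
keep  <- names(share)[share >= 0.01]         # levels at or above 1%

# Keep rows with common levels, then drop the now-unused levels.
frame <- droplevels(frame[frame$MyFactor %in% keep, , drop = FALSE])
```

droplevels() removes the unused levels from every factor column of the data frame, so C and D no longer appear in levels(frame$MyFactor).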

2010 Jul 05

repeated measures with missing data

Dear R help group, I am teaching myself linear mixed models with missing data since I would like to analyze a stats design with this kind of model. The textbook example is for the procedure "proc MIXED" in SAS, but I would like to know if there is an equivalent in R. This example only includes two time-measurements across subjects (a t-test "with missing values"), but I

2012 Nov 24

Adding a new variable to each element of a list

Hello, I have a list of data with multiple elements, and each element in the list has multiple variables in it. Here's an example:
### Make the fake data
dv <- c(1,3,4,2,2,3,2,5,6,3,4,4,3,5,6)
subject <- factor(c("s1","s1","s1","s2","s2","s2","s3","s3","s3",

2006 Apr 26

help using tapply

Dear R-mates,
# Here's what I am trying to do. I have a dataset like this:
id = c(rep(1,8), rep(2,8))
dur1 <- c( 17,18,19,18,24,19,24,24 )
est1 <- c( rep(1,5), rep(2,3) )
dur2 <- c(1,1,3,4,8,12,13,14)
est2 <- rep(1,8)
mydata = data.frame(id, estat=c(est1, est2), durat=c(dur1, dur2))
# I want to have this:
id = c(rep(1,8), rep(2,8))

2011 Mar 30

summing values by week - based on daily dates - but with some dates missing

Dear everybody, I have the following challenge. I have a data set with 2 subgroups, dates (days), and corresponding values (see example code below). Within each subgroup: I need to aggregate (sum) the values by week - for weeks that start on a Monday (for example, 2008-12-29 was a Monday). I find it difficult because I have missing dates in my data - so that sometimes I don't even have the
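A sketch of one way to handle this in base R: map every date to the Monday that starts its week, then sum within subgroup and week. Missing days do not matter because each available date is assigned to its week independently. The data frame d and its values are made up; the post's own example code is not shown in this excerpt.

```r
# Hypothetical data: two subgroups, daily dates with gaps.
d <- data.frame(
  group = c("g1", "g1", "g1", "g2", "g2"),
  date  = as.Date(c("2008-12-29", "2008-12-31", "2009-01-06",
                    "2008-12-30", "2009-01-11")),
  value = c(1, 2, 4, 10, 20)
)

# "%u" is the weekday number (1 = Monday, ..., 7 = Sunday), so subtracting
# (weekday - 1) days maps every date to the Monday of its week.
d$week_start <- d$date - (as.integer(format(d$date, "%u")) - 1)

weekly <- aggregate(value ~ group + week_start, data = d, FUN = sum)
weekly <- weekly[order(weekly$group, weekly$week_start), ]
```

Weeks in which a subgroup has no rows at all are simply absent from the result; if explicit zero rows are needed, they would have to be added in a separate step.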

2009 Mar 03

repeated measures anova, sphericity, epsilon, etc

I have 3 questions (below). Background: I am teaching an introductory statistics course in which we are covering (among other things) repeated measures anova. This time around teaching it, we are using R for all of our computations. We are starting by covering the univariate approach to repeated measures anova. Doing a basic repeated measures anova (univariate approach) using aov() seems

2011 Nov 04

Efficiency of factor objects

R factors are the natural way to represent factors -- and should be efficient since they use small integers. But in fact, for many (but not all) operations, R factors are considerably slower than integers, or even character strings. This appears to be because whenever a factor vector is subsetted, the entire levels vector is copied. For example:
> i1 <- sample(1e4,1e6,replace=T)
> c1

2007 Apr 06

lm() intercept at the end, rather than at the beginning

Hi, I wonder if someone has already figured out a way of making summary(mylm) # where mylm is an object of the class lm() to print the "(Intercept)" at the last line, rather than the first line of the output. I don't know about, say, biostatistics, but in economics the intercept is usually the least interesting of the parameters of a regression model. That's why, say, Stata
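A sketch of one way to approximate this: take the coefficient table from summary() and reorder its rows so that "(Intercept)" comes last. This reorders only the coefficient matrix, not the full printed summary, and the built-in mtcars data set is used purely as example data.

```r
mylm <- lm(mpg ~ wt + hp, data = mtcars)

ct <- coef(summary(mylm))  # matrix: Estimate, Std. Error, t value, Pr(>|t|)
others <- setdiff(rownames(ct), "(Intercept)")
ct_reordered <- ct[c(others, "(Intercept)"), , drop = FALSE]

# Print the reordered table in the usual coefficient layout.
printCoefmat(ct_reordered)
```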

2003 Jun 04

convert factor to numeric

Hi R-experts! Every once in a while I need to convert a factor to a vector of numeric values. as.numeric(myfactor) of course returns a nice numeric vector of the indexes of the levels which is usually not what I had in mind:
> v <- c(25, 3.78, 16.5, 37, 109)
> f <- factor(v)
> f
[1] 25 3.78 16.5 37 109
Levels: 3.78 16.5 25 37 109
> as.numeric(f)
[1] 3 1 2 4 5
>
What I
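The usual way around this is to convert the level labels rather than the integer codes. A short sketch with the vector from the post; the names n1 and n2 are introduced here only to hold the results.

```r
v <- c(25, 3.78, 16.5, 37, 109)
f <- factor(v)

# Convert the labels, not the internal integer codes.
n1 <- as.numeric(as.character(f))

# Equivalent: convert the levels once, then index them by the codes.
n2 <- as.numeric(levels(f))[f]
```

Both variants recover the original values of v. The second converts each distinct level only once, which can matter for long factors with few levels.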

2003 Nov 25

O2 optimization produces wrong code (PR#5315)

Full_Name: jean coursol
Version: 1.7.1, 1.8.0
OS: linux & Windows-XP
Submission from: (NULL) (129.175.52.7)
Binary MS-Windows akima module from CRAN (1.8.0 version) produces wrong results with some data. Installing akima source in linux, with same data:
-with gcc-2.95.3 -O2 : gives correct results (under R 1.7.1);
-with gcc-3.2.3 -O2 : gives wrong results (under R-1.7.1 and R-1.8.0);
-with

2005 Jun 24

Gini with frequencies

Hi there, I am trying to compute Gini coefficients for vectors containing income classes. The data I possess look like this: yit <- c(135, 164, 234, 369) piit <- c(367, 884, 341, 74 ) where yit is the vector of income classes, and piit is the vector of associated frequencies. (This data is from Rustichini, Ichino and Checchi (Journal of Public Economics, 1999) ). In ineq package, Gini( )
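One simple way to use the frequencies, sketched in base R: expand each income class by its frequency and apply the uncorrected sample Gini formula to the expanded vector. gini_freq is a hypothetical helper written for this illustration, not a function from the ineq package.

```r
yit  <- c(135, 164, 234, 369)   # income classes
piit <- c(367, 884, 341, 74)    # frequency of each class

gini_freq <- function(y, freq) {
  x <- sort(rep(y, times = freq))  # one entry per observation, ascending
  n <- length(x)
  # Uncorrected sample Gini: sum_i (2i - n - 1) * x_i / (n * sum(x))
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

g <- gini_freq(yit, piit)
```

Expanding with rep() is fine for counts of this size; for very large frequencies a weighted formula that avoids building the full vector would be preferable.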

2007 Apr 05

creating a data frame from a list

Dear all, A few months ago, I asked for your help on the following problem: I have a list with three (named) numeric vectors:
> lst = list(a=c(A=1,B=8) , b=c(A=2,B=3,C=0), c=c(B=2,D=0) )
> lst
$a
A B
1 8
$b
A B C
2 3 0
$c
B D
2 0
Now, I'd love to use this list to create the following data frame:
> dtf = data.frame(a=c(A=1,B=8,C=NA,D=NA),
+
