2005 Oct 20
5
splitting an integer
Hi there,
From the vector X of integers,
X = c(11999, 122000, 81997)
I would like to make these two vectors:
Z = c(1999, 2000, 1997)
Y = c(1, 12, 8)
That is, each entry of Z receives the last four digits of the corresponding entry of X, and Y receives "the rest".
Any suggestions?
Thanks in advance,
Dimitri
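One possible approach, as a minimal sketch assuming every entry of X is a non-negative whole number and the split is always at the last four decimal digits: integer remainder (`%%`) and integer division (`%/%`) give the two parts directly.

```r
X <- c(11999, 122000, 81997)

# Last four decimal digits: remainder after division by 10^4
Z <- X %% 10000

# "The rest": integer quotient once those four digits are dropped
Y <- X %/% 10000

print(Z)
print(Y)
```

Note that a value such as 120050 would give Z = 50, not "0050"; if the zero-padded text is needed, `sprintf("%04d", Z)` formats it that way.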
2005 Oct 25
2
Inf in regressions
Hi,
Suppose I wish to run
lm( y ~ x + z + log(w) )
where w takes non-negative values. A problem arises when w = 0, as log(0)
= -Inf, and R doesn't accept that (as it "accepts" NA). Is there a way to
tell R to treat -Inf the same way it treats NA, i.e., to ignore it?
(Otherwise I have to do something like
w[w==0] <- NA
which doesn't hurt, but might be a bit
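Two options, sketched on made-up data (the data frame `d` and its values are hypothetical): drop the offending rows with the `subset` argument of `lm()`, or turn non-finite values into NA so that the default `na.action` (usually `na.omit`) removes them. Both fit the model on the same rows.

```r
# Hypothetical example data; the first two values of w are zero
set.seed(1)
d <- data.frame(x = rnorm(20), z = rnorm(20), w = c(0, 0, rpois(18, 5) + 1))
d$y <- 1 + d$x - d$z + rnorm(20)

# Option 1: exclude rows with w == 0 before fitting
fit1 <- lm(y ~ x + z + log(w), data = d, subset = w > 0)

# Option 2: map -Inf (and any other non-finite value) to NA,
# then let the default na.action drop those rows
d$logw <- log(d$w)
d$logw[!is.finite(d$logw)] <- NA
fit2 <- lm(y ~ x + z + logw, data = d)

nobs(fit1)
coef(fit1)
```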
2006 Jan 26
1
efficiency with "%*%"
Hi,
x and y are (numeric) vectors. I wonder if one of the following is more
efficient than the other:
x%*%y
or
sum(x*y)
?
Thanks,
Dimitri Szerman
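A quick way to check on a given machine, as a sketch: `x %*% y` returns a 1 x 1 matrix while `sum(x*y)` returns a plain number and allocates a temporary vector of length(x). Which one is faster likely depends on the BLAS that R is linked against and on the vector length, so timing both is more reliable than a general rule. `crossprod()` is included as a third candidate.

```r
set.seed(1)
x <- rnorm(1e5)
y <- rnorm(1e5)

# %*% gives a 1 x 1 matrix; drop() turns it into a plain number
a <- drop(x %*% y)
b <- sum(x * y)
# crossprod(x, y) computes t(x) %*% y without forming t(x) explicitly
c2 <- drop(crossprod(x, y))

# The three results agree up to floating-point rounding
all.equal(a, b)
all.equal(a, c2)

# Compare timings on your own machine and BLAS
system.time(for (i in 1:100) x %*% y)
system.time(for (i in 1:100) sum(x * y))
system.time(for (i in 1:100) crossprod(x, y))
```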
2006 Jul 05
1
creating a data frame from a list
Dear all,
I have a list with three (named) numeric vectors:
> lst = list(a=c(A=1,B=8) , b=c(A=2,B=3,C=0), c=c(B=2,D=0) )
> lst
$a
A B
1 8
$b
A B C
2 3 0
$c
B D
2 0
Now, I'd love to use this list to create the following data frame:
> dtf = data.frame(a=c(A=1,B=8,C=NA,D=NA),
+ b=c(A=2,B=3,C=0,D=NA),
+ c=c(A=NA,B=2,C=NA,D=0) )
> dtf
a b
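A minimal sketch of one way to build that data frame: collect every name that occurs in any element, then index each vector by the full set of names. Indexing a named vector by a name it lacks yields NA, which fills the gaps. The sorted order of the row names is an assumption; drop `sort()` to keep first-appearance order.

```r
lst <- list(a = c(A = 1, B = 8), b = c(A = 2, B = 3, C = 0), c = c(B = 2, D = 0))

# All names that occur in any element, in sorted order
nms <- sort(unique(unlist(lapply(lst, names))))

# Index each vector by the full set of names; missing names give NA
cols <- lapply(lst, function(v) unname(v[nms]))

dtf <- data.frame(cols, row.names = nms)
dtf
```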
2006 Jul 12
1
help in vectorization
Hi,
I have two data frames. One is like
> dtf = data.frame(y=c(rep(2002,4), rep(2003,5)),
+ m=c(9:12, 1:5),
+ def=c(.74,.75,.76,.78,.80,.82,.85,.85,.87))
and the other
dtf2 = data.frame(y=rep( c(2002,2003),20),
m=c(trunc(runif(20,1,5)),trunc(runif(20,9,12))),
inc=rnorm(40,mean=300,sd=150) )
What I want is to divide
2011 Apr 07
1
Assigning a larger number of levels to a factor that has fewer levels
Hello!
I have a larger and a smaller data frame, each with one factor; it's
the same factor:
large.frame <- data.frame(myfactor = factor(LETTERS[1:10]))
small.frame <- data.frame(myfactor = factor(LETTERS[c(9,7,5,3,1)]))
levels(large.frame$myfactor)
levels(small.frame$myfactor)
table(large.frame$myfactor)
table(small.frame$myfactor)
myfactor has 10 levels in large.frame and 5 levels in small.frame. All
5
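One way to do this, sketched below: re-create the smaller factor with `factor(..., levels = ...)`, taking the levels from the larger one. The explicit `factor()` calls are there because `data.frame()` no longer converts strings to factors by default (R >= 4.0.0).

```r
# factor() made explicit; data.frame() keeps strings as character in R >= 4.0.0
large.frame <- data.frame(myfactor = factor(LETTERS[1:10]))
small.frame <- data.frame(myfactor = factor(LETTERS[c(9, 7, 5, 3, 1)]))

# Re-create the small factor using the full level set of the large one
small.frame$myfactor <- factor(small.frame$myfactor,
                               levels = levels(large.frame$myfactor))

levels(small.frame$myfactor)  # all 10 levels
table(small.frame$myfactor)   # unused levels show a count of 0
```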
2012 Mar 28
1
discrepancy between paired t test and glht on lme models
Hi folks,
I am working with repeated measures data and I ran into issues where the
paired t-test results did not match those obtained by employing glht()
contrasts on an lme model. While the lme model itself appears to be fine,
there seems to be some discrepancy with using glht() on the lme model
(unless I am missing something here). I was wondering if someone could
help identify the issue. On
2009 May 06
4
tapply changing order of factor levels?
Hi,
Does tapply change the order of the levels when applied to a factor? Below is
the code I tried.
> mylevels<-c("IN0020020155","IN0019800021","IN0020020064")
>
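A small illustration of what is likely going on (the values in `vals` are made up): `tapply()` returns one result per factor level, in the order of `levels()`, and `factor()` sorts the levels by default. Supplying `levels =` explicitly keeps the original order.

```r
mylevels <- c("IN0020020155", "IN0019800021", "IN0020020064")
vals <- c(10, 20, 30)  # hypothetical values, one per ID

# Default factor(): levels are sorted, so tapply() reports groups in sorted order
f_sorted <- factor(mylevels)
tapply(vals, f_sorted, sum)

# Explicit levels: tapply() follows the order given here
f_given <- factor(mylevels, levels = mylevels)
tapply(vals, f_given, sum)
```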
2012 Jan 18
1
drop rare factors
I have a data frame with some factor columns.
I want to drop the rows with rare factor values
(and remove the factor values from the factors).
E.g., frame$MyFactor takes values
A 1,000 times,
B 2,000 times,
C 30 times and
D 4 times.
I want to remove all rows that take rare values (<1%), i.e., C and D.
In other words, something like
frame <- frame[frame$MyFactor %in% c("A", "B"), ]
except
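A sketch that finds the rare levels from the data instead of listing them by hand, using the counts from the question on a hypothetical `frame`: compute relative frequencies with `prop.table(table(...))`, keep rows whose level reaches the 1% threshold, then `droplevels()` removes the unused levels from every factor column.

```r
# Hypothetical data with the counts from the question
set.seed(42)
frame <- data.frame(
  MyFactor = factor(rep(c("A", "B", "C", "D"), times = c(1000, 2000, 30, 4))),
  value    = rnorm(3034)
)

# Relative frequency of each level
freq <- prop.table(table(frame$MyFactor))

# Levels that make up at least 1% of the rows
common <- names(freq)[freq >= 0.01]

# Keep those rows, then drop the now-unused levels from the factor columns
frame <- droplevels(frame[frame$MyFactor %in% common, ])

table(frame$MyFactor)
```

With these counts C is 30/3034, just under 1%, so only A and B remain.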
2010 Jul 05
2
repeated measures with missing data
Dear R help group, I am teaching myself linear mixed models with missing data, since I would like to analyze a study design with this kind of model. The textbook example uses PROC MIXED in SAS, but I would like to know if there is an equivalent in R. This example only includes two time measurements across subjects (a t-test "with missing values"), but I
2012 Nov 24
1
Adding a new variable to each element of a list
Hello,
I have a list of data with multiple elements, and each element in the list
has multiple variables in it. Here's an example:
### Make the fake data
dv <- c(1,3,4,2,2,3,2,5,6,3,4,4,3,5,6)
subject <- factor(c("s1","s1","s1","s2","s2","s2","s3","s3","s3",
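A general pattern, sketched on a hypothetical list of data frames (the column names `dv_centered` and `subject` are illustrative only): `lapply()` applies a function that adds the column and returns the modified data frame, and `Map()` helps when the new variable depends on the element's name.

```r
# Hypothetical list of data frames, one per subject
lst <- list(
  s1 = data.frame(dv = c(1, 3, 4)),
  s2 = data.frame(dv = c(2, 2, 3)),
  s3 = data.frame(dv = c(2, 5, 6))
)

# Each call adds a column and returns the modified data frame;
# lapply() collects the results into a new list with the same names
lst <- lapply(lst, function(d) {
  d$dv_centered <- d$dv - mean(d$dv)  # hypothetical new variable
  d
})

# Map() walks the list and its names in parallel
lst <- Map(function(d, nm) { d$subject <- nm; d }, lst, names(lst))

lst$s1
```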
2006 Apr 26
1
help using tapply
Dear R-mates,
# Here's what I am trying to do. I have a dataset like this:
id = c(rep(1,8), rep(2,8))
dur1 <- c( 17,18,19,18,24,19,24,24 )
est1 <- c( rep(1,5), rep(2,3) )
dur2 <- c(1,1,3,4,8,12,13,14)
est2 <- rep(1,8)
mydata = data.frame(id,
estat=c(est1, est2),
durat=c(dur1, dur2))
# I want to have this:
id = c(rep(1,8), rep(2,8))
2011 Mar 30
2
summing values by week - based on daily dates - but with some dates missing
Dear everybody,
I have the following challenge. I have a data set with 2 subgroups,
dates (days), and corresponding values (see example code below).
Within each subgroup: I need to aggregate (sum) the values by week -
for weeks that start on a Monday (for example, 2008-12-29 was a
Monday).
I find it difficult because I have missing dates in my data - so that
sometimes I don't even have the
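Missing dates stop mattering if each row is labelled by the Monday that starts its week and the sums are taken per label. A minimal sketch on hypothetical data: `as.POSIXlt(date)$wday` numbers days 0 (Sunday) to 6 (Saturday), so `(wday + 6) %% 7` is the number of days since the last Monday.

```r
# Hypothetical data: two subgroups, daily dates with gaps
d <- data.frame(
  group = rep(c("g1", "g2"), each = 5),
  date  = as.Date(c("2008-12-29", "2008-12-31", "2009-01-04", "2009-01-06", "2009-01-12",
                    "2008-12-30", "2009-01-02", "2009-01-05", "2009-01-09", "2009-01-11")),
  value = 1:10
)

# Days since the most recent Monday (0 for a Monday, 6 for a Sunday)
days_since_monday <- (as.POSIXlt(d$date)$wday + 6) %% 7

# Label each row with the Monday that starts its week
d$week <- d$date - days_since_monday

# Sum within each subgroup and week
weekly <- aggregate(value ~ group + week, data = d, FUN = sum)
weekly
```

Weeks with no rows at all do not appear in the result; if they should show up as zeros, one option is to merge with a full sequence of Mondays.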
2009 Mar 03
1
repeated measures anova, sphericity, epsilon, etc
I have 3 questions (below).
Background: I am teaching an introductory statistics course in which we are
covering (among other things) repeated measures anova. This time around
teaching it, we are using R for all of our computations. We are starting by
covering the univariate approach to repeated measures anova.
Doing a basic repeated measures anova (univariate approach) using aov()
seems
2011 Nov 04
2
Efficiency of factor objects
R factors are the natural way to represent factors -- and should be
efficient since they use small integers. But in fact, for many (but
not all) operations, R factors are considerably slower than integers,
or even character strings. This appears to be because whenever a
factor vector is subsetted, the entire levels vector is copied. For
example:
> i1 <- sample(1e4,1e6,replace=T)
> c1
2007 Apr 06
2
lm() intercept at the end, rather than at the beginning
Hi,
I wonder if someone has already figured out a way of making
summary(mylm) # where mylm is an object of class "lm"
to print the "(Intercept)" at the last line, rather than the first
line of the output. I don't know about, say, biostatistics, but in
economics the intercept is usually the least interesting of the
parameters of a regression model. That's why, say, Stata
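This does not change how `summary()` itself prints, but a sketch of a workaround is to reorder the coefficient table and print it with `printCoefmat()`, which gives the usual formatting and significance stars. The model on `mtcars` is just a stand-in for any fitted `lm` object.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)  # stand-in for any lm fit

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
ct <- coef(summary(fit))

# Move the "(Intercept)" row to the end
ord <- c(setdiff(rownames(ct), "(Intercept)"), "(Intercept)")
ct <- ct[ord, , drop = FALSE]

printCoefmat(ct)
```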
2003 Jun 04
2
convert factor to numeric
Hi R-experts!
Every once in a while I need to convert a factor to a vector of numeric
values. as.numeric(myfactor) of course returns a nice numeric vector of
the indexes of the levels which is usually not what I had in mind:
> v <- c(25, 3.78, 16.5, 37, 109)
> f <- factor(v)
> f
[1] 25 3.78 16.5 37 109
Levels: 3.78 16.5 25 37 109
> as.numeric(f)
[1] 3 1 2 4 5
>
What I
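The usual idiom, also given in the R FAQ (7.10), is to convert the level labels to numbers and then index that vector by the factor, whose integer codes select the right label for each element. `as.numeric(as.character(f))` gives the same result but converts every element rather than each level once.

```r
v <- c(25, 3.78, 16.5, 37, 109)
f <- factor(v)

# Convert each level label once, then index by the factor's integer codes
as.numeric(levels(f))[f]

# Equivalent, converting every element
as.numeric(as.character(f))
```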
2003 Nov 25
2
O2 optimization produces wrong code (PR#5315)
Full_Name: jean coursol
Version: 1.7.1, 1.8.0
OS: linux & Windows-XP
Submission from: (NULL) (129.175.52.7)
Binary MS-Windows akima module from CRAN (1.8.0 version) produces wrong results
with some data.
Installing akima source in linux, with same data:
-with gcc-2.95.3 -O2: gives correct results (under R 1.7.1);
-with gcc-3.2.3 -O2: gives wrong results (under R-1.7.1 and R-1.8.0);
-with
2005 Jun 24
2
Gini with frequencies
Hi there,
I am trying to compute Gini coefficients for vectors containing income classes. My data look like this:
yit <- c(135, 164, 234, 369)
piit <- c(367, 884, 341, 74 )
where yit is the vector of income classes and piit is the vector of associated frequencies. (These data are from Rustichini, Ichino and Checchi (Journal of Public Economics, 1999).) In the ineq package, Gini( )
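A base-R sketch for grouped data, under the assumption that everyone in a class earns exactly the class value (so within-class inequality is ignored) and with no small-sample correction: the Gini coefficient is the mean absolute difference over all pairs of people divided by twice the mean, and with frequencies each pair of classes is weighted by the product of their counts. The function name `gini_grouped` is hypothetical. Expanding the data with `rep(yit, piit)` and applying the same formula per person serves as a cross-check.

```r
yit  <- c(135, 164, 234, 369)   # income classes
piit <- c(367, 884, 341, 74)    # number of people in each class

# Weighted mean absolute difference over all pairs, divided by 2 * mean
gini_grouped <- function(y, w) {
  N  <- sum(w)
  mu <- sum(w * y) / N
  sum(outer(w, w) * abs(outer(y, y, "-"))) / (2 * N^2 * mu)
}

gini_grouped(yit, piit)

# Cross-check: one value per person, same formula without weights
x <- rep(yit, piit)
sum(abs(outer(x, x, "-"))) / (2 * length(x)^2 * mean(x))
```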
2007 Apr 05
2
creating a data frame from a list
Dear all,
A few months ago, I asked for your help on the following problem:
I have a list with three (named) numeric vectors:
> lst = list(a=c(A=1,B=8) , b=c(A=2,B=3,C=0), c=c(B=2,D=0) )
> lst
$a
A B
1 8
$b
A B C
2 3 0
$c
B D
2 0
Now, I'd love to use this list to create the following data frame:
> dtf = data.frame(a=c(A=1,B=8,C=NA,D=NA),
+