thr3ads.net - similar to: "selecting first row of a variable with long-format data"

Displaying 20 results from an estimated 10000 matches similar to: "selecting first row of a variable with long-format data"

aggregating using 'with' function

2010 Feb 20

Hi All, I am interested in aggregating a data frame based on 2 categories: the mean effect size (r) for each combination of 'id' and 'mod1'. The 'with' function works well when aggregating on one category (e.g., based on 'id' below) but doesn't work if I try 2 categories. How can this be accomplished?

# sample data
id<-c(1,1,1,rep(4:12))
n<-c(10,20,13,22,28,12,12,36,19,12,
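
A base R sketch of one way to do this, not taken from the thread: the formula interface of `aggregate()` accepts several grouping variables, and `tapply()` (which works inside `with()`) accepts a list of them. The post's sample data is cut off, so the values below are made up; only the column names `id`, `mod1` and `r` come from the post.

```r
# Made-up data: the post's sample data is truncated
dat <- data.frame(
  id   = c(1, 1, 1, 4, 4, 5),
  mod1 = c("a", "a", "b", "a", "b", "b"),
  r    = c(0.20, 0.40, 0.30, 0.10, 0.50, 0.60)
)

# Long result: one row per id x mod1 combination that occurs in the data
res <- aggregate(r ~ id + mod1, data = dat, FUN = mean)
res

# Wide result with with()/tapply(): rows are ids, columns are mod1 levels,
# and NA marks combinations that do not occur
m <- with(dat, tapply(r, list(id, mod1), mean))
m
```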

Data.frame manipulation

2010 Jan 28

Hi All, I'm conducting a meta-analysis and have taken a data.frame with multiple rows per study (one for each effect size) and performed a weighted average of effect size for each study. This results in a reduced number of rows. I am particularly interested in simply reducing the additional variables in the data.frame to the first row of the corresponding id variable. For example:
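
One base R way to keep only the first row per id, sketched on made-up data because the post's example is cut off (the columns `year` and `es` are hypothetical):

```r
# Made-up long-format data: several effect sizes (es) per study id
dat <- data.frame(
  id   = c(1, 1, 1, 2, 3, 3),
  year = c(2001, 2001, 2001, 2005, 2009, 2009),
  es   = c(0.10, 0.25, 0.30, 0.40, 0.05, 0.15)
)

# duplicated() is FALSE for the first occurrence of each id,
# so negating it keeps exactly the first row per id, in the current row order
first_rows <- dat[!duplicated(dat$id), ]
first_rows
```

The reduced rows can then be joined to the per-study weighted averages with `merge(..., by = "id")`.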

long format - find age when another variable is first 'high'

2009 May 25

Dear R, I've got a data frame with children examined multiple times and at various ages. I'm trying to find the first age at which another variable (LDL-Cholesterol) is >= 130 mg/dL; for some children, this may never happen. I can do this with transformBy and ddply, but with 10,000 different children, these functions take some time on my PCs - is there a faster way to do this in R?
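
A vectorized base R sketch that avoids calling a function once per child; it has not been timed on 10,000 children, and the data and column names (`child`, `age`, `ldl`) are made up:

```r
# Made-up data: one row per child per examination
exams <- data.frame(
  child = c(1, 1, 1, 2, 2, 3, 3, 3),
  age   = c(5, 8, 11, 6, 9, 4, 7, 10),
  ldl   = c(110, 135, 140, 100, 120, 131, 90, 150)
)

# Keep only the 'high' visits, sorted by child and then by age
high <- exams[exams$ldl >= 130, ]
high <- high[order(high$child, high$age), ]

# The first remaining row per child is the earliest age with LDL >= 130
first_high <- high[!duplicated(high$child), c("child", "age")]

# Children who never reach 130 get NA through a left join on the full id list
all_children <- data.frame(child = unique(exams$child))
result <- merge(all_children, first_high, by = "child", all.x = TRUE)
result
```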

Take variables in data.frame and create list of matrices

2011 Nov 03

Hi, I have this sample data below and would like to create a list of matrices.

set.seed(1254)
id <- c(1,1,1,1 ,2,2,2)
o <- as.factor(c(1:4, 1, 3, 4))
r <- rep(.5, 7)
v <- rnorm(7)
s <- rnorm(7)
dat <-data.frame(id, o, r, v, s)
dat
#> dat
# id o r v s
# 1 1 0.5 0.7024631 2.0813672
# 1 2 0.5 -0.5541955 0.1095156
# 1 3 0.5 -1.0418167 0.4164930
# 1
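
Assuming the goal is one numeric matrix per `id` (the post is cut off before saying what each matrix should hold), `split()` followed by `lapply()` gives a named list. This reuses the post's data; putting the columns `r`, `v` and `s` into each matrix is an assumption.

```r
set.seed(1254)
id <- c(1, 1, 1, 1, 2, 2, 2)
o  <- as.factor(c(1:4, 1, 3, 4))
r  <- rep(.5, 7)
v  <- rnorm(7)
s  <- rnorm(7)
dat <- data.frame(id, o, r, v, s)

# Split the chosen numeric columns by id, then turn each piece into a matrix;
# the list is named "1", "2", ... after the id values
mats <- lapply(split(dat[, c("r", "v", "s")], dat$id), as.matrix)
str(mats)
```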

the first and last observation for each subject

2009 Jan 02

I have the following data:

ID  x   y   time
1   10  20  0
1   10  30  1
1   10  40  2
2   12  23  0
2   12  25  1
2   12  28  2
2   12  38  3
3   5   10  0
3   5   15  2
.....

x is time invariant, ID is the subject id number, and y changes over time. I want to find the difference between the first and last observed y values for each subject and get a table like:

ID  x   y
1   10  20
2   12  15
3   5   5
......

Is there any easy way to generate
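
A base R sketch using only the rows shown in the post: sort by subject and time, then let `aggregate()` take last minus first within each subject. Grouping on `x` as well is safe because the post says it is time invariant.

```r
dat <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  x    = c(10, 10, 10, 12, 12, 12, 12, 5, 5),
  y    = c(20, 30, 40, 23, 25, 28, 38, 10, 15),
  time = c(0, 1, 2, 0, 1, 2, 3, 0, 2)
)

# Within each subject, rows must run from first to last observation
dat <- dat[order(dat$ID, dat$time), ]

# Last minus first observed y per subject; x is carried along as a grouping column
result <- aggregate(y ~ ID + x, data = dat,
                    FUN = function(v) v[length(v)] - v[1])
result <- result[order(result$ID), ]
result
```

This reproduces the desired table above (20, 15 and 5 for subjects 1, 2 and 3).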

Removing rows in dataframe w'o duplicated values

2011 Nov 22

Hi, is there an easy way to remove dataframe rows without duplicated values of a specified column ('id')? e.g.,

dat <- data.frame(id = c(1,1,1,2,3,3), value = c(5,6,7,4,5,4), value2 = c(1,4,3,3,4,3))
dat
  id value value2
1  1     5      1
2  1     6      4
3  1     7      3
4  2     4      3
5  3     5      4
6  3     4      3

This is sample data and the real data has hundreds of
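
A base R sketch on the post's own sample data: `duplicated()` alone misses the first occurrence of a repeated id, so it is checked from both ends.

```r
dat <- data.frame(id = c(1, 1, 1, 2, 3, 3),
                  value = c(5, 6, 7, 4, 5, 4),
                  value2 = c(1, 4, 3, 3, 4, 3))

# TRUE for every row whose id occurs more than once anywhere in the column
keep <- duplicated(dat$id) | duplicated(dat$id, fromLast = TRUE)
dat_dup <- dat[keep, ]
dat_dup
```

An equivalent condition is `ave(dat$id, dat$id, FUN = length) > 1`, which also makes it easy to require a higher minimum count.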

Equivalent of 'first.var' or 'last.var' from SAS in R?

2008 Sep 25

Hi, I want to sort a data frame by multiple columns and then take the first record in each unique level of the "by" group I used to sort the data frame. Does someone have an example of how to do this? Thanks, Matt
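
A base R sketch of SAS-style `first.`/`last.` flags: after `order()`, `!duplicated()` marks the first row of each BY group and `fromLast = TRUE` marks the last. The data and the column names `grp`, `score`, `name` are made up.

```r
# Made-up data
df <- data.frame(
  grp   = c("b", "a", "b", "a", "a"),
  score = c(3, 7, 1, 2, 9),
  name  = c("p", "q", "r", "s", "t")
)

# Sort by the BY variable, then by the column that decides which row comes first
df <- df[order(df$grp, df$score), ]

first_grp <- !duplicated(df$grp)                   # like SAS first.grp
last_grp  <- !duplicated(df$grp, fromLast = TRUE)  # like SAS last.grp

first_rows <- df[first_grp, ]
last_rows  <- df[last_grp, ]
first_rows
last_rows
```

With several BY columns, pass them together, e.g. `!duplicated(df[, c("g1", "g2")])`, where `g1` and `g2` stand for your own column names.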

Retaining variable name in a function

2010 Mar 17

Hi All, I'm interested in creating a function that will convert a variable within a data.frame to a factor while retaining the original name (yes, I know that I can just use var <- factor(var), but I need it as a function for other purposes). e.g.:

# this was an attempt but fails.
facts <- function(meta, mod, modname = "spec") {
  meta$mod <- factor(meta$mod)
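
In the attempt above, `meta$mod` refers to a column literally named "mod", not to the argument. A reduced sketch of `facts()` that indexes with `[[` instead, so the column named by the string is converted and keeps its name (the example data frame is made up):

```r
# Convert the column named by `modname` to a factor, keeping its name
facts <- function(meta, modname = "spec") {
  # [[ ]] uses the value of modname; meta$modname would look for a column called "modname"
  meta[[modname]] <- factor(meta[[modname]])
  meta
}

meta <- data.frame(spec = c(1, 2, 2, 3), r = c(0.1, 0.3, 0.2, 0.4))
meta <- facts(meta, "spec")
str(meta)
```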

how do I calculate means or cov matrix for multivariate groups

2010 Feb 22

Hello, given the matrix d:

> d
   value value2 class
1      1      1     x
2      2      2     x
3      3      3     x
4      4      2     x
5      5      1     y
6     11      3     y
7     12      4     z
8     13      5     z
9     14      6     z
10    15      7     z

I want to calculate the means and cov matrix for groups x, y, z. I know how to do it the long way. I tried to use tapply and
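
A base R sketch on the data shown above, assuming `d` is a data frame as its printout suggests: split the numeric columns by `class`, then apply `colMeans()` and `cov()` to each piece.

```r
d <- data.frame(
  value  = c(1, 2, 3, 4, 5, 11, 12, 13, 14, 15),
  value2 = c(1, 2, 3, 2, 1, 3, 4, 5, 6, 7),
  class  = c("x", "x", "x", "x", "y", "y", "z", "z", "z", "z")
)

# One data frame of numeric columns per class
groups <- split(d[, c("value", "value2")], d$class)

means <- t(sapply(groups, colMeans))  # one row of column means per class
covs  <- lapply(groups, cov)          # named list of 2 x 2 covariance matrices
means
covs$x
```

`by(d[, 1:2], d$class, colMeans)` gives the same means in a different container. Note that group y has only two rows, so its covariance matrix rests on very little data.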

speeding up regressions using ddply

2010 Sep 22

Hi, I have a data set that I'd like to run logistic regressions on, using ddply to speed up the computation of many models with different combinations of variables. I would like to run regressions on every unique two-variable combination in a portion of my data set, but I can't quite figure out how to do this using ddply. The data set looks like this, with "status" as
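
`ddply()` splits rows into groups, whereas this loop runs over pairs of columns, so `combn()` plus `lapply()` may fit more naturally. A sketch, not from the thread; the post's data is cut off, so the binary `status` column and the predictors `x1` to `x4` below are simulated:

```r
# Simulated data: binary outcome plus four candidate predictors
set.seed(1)
n <- 200
dat <- data.frame(
  status = rbinom(n, 1, 0.4),
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n)
)

predictors <- c("x1", "x2", "x3", "x4")
pairs <- combn(predictors, 2, simplify = FALSE)  # every unique two-variable combination

# One logistic regression per pair
fits <- lapply(pairs, function(p) {
  f <- reformulate(p, response = "status")  # e.g. status ~ x1 + x2
  glm(f, data = dat, family = binomial)
})
names(fits) <- vapply(pairs, paste, character(1), collapse = "+")

# Example summary across models
sapply(fits, AIC)
```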

ddply from plyr package - any alternatives?

2011 Aug 24

Hello everyone, I was asked to repost this, sorry for any inconvenience. I'm looking for a replacement for the ddply function from the plyr package. The function allows applying a function by category stored in any column or columns. Regular loops or lapply calls slow down greatly because my count of unique combinations exceeds 9000. Is there any available solution that allows me to apply a function by category?

sum specific rows in a data frame

2010 Apr 14

I have a data frame called "pose":

  DESCRIPTION        QUANITY CLOSING.PRICE
1 WHEAT May/10             1        467.75
2 WHEAT May/10             2        467.75
3 WHEAT May/10             1        467.75
4 WHEAT May/10             1        467.75
5 COTTON NO.2 May/10       1         78.13
6 COTTON NO.2 May/10       3         78.13
7 COTTON NO.2 May/10       1         78.13
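
Assuming the goal is the total QUANITY per DESCRIPTION (the post is cut off before saying so), `aggregate()` can sum within groups. This uses the seven rows shown; the column names are kept exactly as in the post.

```r
pose <- data.frame(
  DESCRIPTION   = c(rep("WHEAT May/10", 4), rep("COTTON NO.2 May/10", 3)),
  QUANITY       = c(1, 2, 1, 1, 1, 3, 1),
  CLOSING.PRICE = c(rep(467.75, 4), rep(78.13, 3))
)

# Sum QUANITY within each DESCRIPTION; CLOSING.PRICE is constant per description,
# so adding it as a grouping variable simply carries it into the result
totals <- aggregate(QUANITY ~ DESCRIPTION + CLOSING.PRICE, data = pose, FUN = sum)
totals
```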

ddply - how to transform df column "in place"

2011 Aug 23

Dear R-users, I am trying to get the plyr syntax right, without much success. Given:

d<- data.frame(cbind(x=1,y=seq(20100801,20100830,1)))
names(d)<-c("first", "daterep")
d2<-d
# I can convert the daterep column in place the classic way:
d$daterep<-as.Date(strptime(d$daterep, format="%Y%m%d"))
# How to do it the plyr way?
ddply(d2,
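
No grouping is involved here, so `ddply()` isn't really needed; base R's `transform()` returns a copy with the column replaced under the same name. A sketch on the post's data:

```r
d <- data.frame(first = 1, daterep = seq(20100801, 20100830, 1))

# Replace daterep with a Date column; the numbers are converted to "YYYYMMDD" strings first
d2 <- transform(d, daterep = as.Date(as.character(daterep), format = "%Y%m%d"))
str(d2)
```

With plyr loaded, its `mutate()` should accept the same `daterep = ...` expression; `ddply()` only becomes necessary when the transformation has to run separately per group.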

big big problem

2010 Jun 17

Dear list, I'll try to be clearer in explaining my problem. I have a data frame like this called X:

CLUSTER YEAR variable value1 value2
M1 2005 EC01 NA NA
M1 2006 EC01 2 5
M1 2007

New PLYR issue

2012 Jan 17

Hello everyone, I have the same problem, with the same error message. Using R 2.14.1, plyr 1.7.1, R.Studio 0.94.110, Windows XP. The plyr mailing list has not provided any help so far.

>require(plyr)
>c(sample(c(1:100), 50, replace=TRUE))->V1
>c(rep( 1:5, 10))->f1 #variable to group V1
>data.frame(cbind(V1, f1))->DF
>str(DF)
>ddply(DF$V1, DF$f1,

Subsetting for the ten highest values by group in a dataframe

2012 Jan 27

Hello, I am looking for a way to subset a data frame by choosing the top ten maximum values from that dataframe. This also has to happen within the levels of a factor.

## I've used plyr here but I'm not married to this approach
require(plyr)
## I've created a data.frame with two groups and then an id variable (y)
df <- data.frame(x=rnorm(400, mean=20), y=1:400,
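
A base R sketch without plyr: sort by group and by descending value, number the rows within each group with `ave()`, and keep positions 1 to 10. The post's data frame is cut off, so the grouping column `grp` is a hypothetical stand-in.

```r
set.seed(42)
# Stand-in for the post's data: value x, id y, and a hypothetical two-level group
df <- data.frame(
  x   = rnorm(400, mean = 20),
  y   = 1:400,
  grp = rep(c("A", "B"), each = 200)
)

# Sort by group, then by x from largest to smallest
df_sorted <- df[order(df$grp, -df$x), ]

# Position of each row within its group after sorting (1 = largest x)
pos <- ave(seq_len(nrow(df_sorted)), df_sorted$grp, FUN = seq_along)

top10 <- df_sorted[pos <= 10, ]
table(top10$grp)
```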

plyr and table question

2009 Apr 03

Dear all, I'm puzzled by the following example inspired by a recent question on R-help,

cc <- textConnection("user_id website time
20 google 0930
21 yahoo 0935
20 facebook 1000
25 facebook 1015
61 google 0940")
d <- read.table(cc, head=T) ; close(cc)
table(d$user_id) # count the

ddply to count frequency of combinations

2011 Jun 21

I have a dataframe df with two columns x and y. I want to count the number of times a unique x, y combination occurs. For example:

x<- c(1,2,3,4,5,1,2,3,4)
y<- c(1,2,3,4,5,1,2,4,1)
df<-as.data.frame(cbind(x, y))
#what is the correct way to use ddply for this example?
ddply(df, c('x','y'), summarize, ??)
#desired output -- format and order doesn't matter
# (x, y)
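
A base R sketch on the post's data: add a column of ones and sum it within each (x, y) pair.

```r
x <- c(1, 2, 3, 4, 5, 1, 2, 3, 4)
y <- c(1, 2, 3, 4, 5, 1, 2, 4, 1)
df <- data.frame(x, y)

# n = 1 for every row; summing it per (x, y) group counts the rows in that group
counts <- aggregate(n ~ x + y, data = transform(df, n = 1), FUN = sum)
counts
```

With plyr loaded, `ddply(df, c("x", "y"), nrow)` should give the same counts in a column named `V1`.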

Selecting n observation

2012 Oct 11

Hello R help, I have a question similar to one posted by someone before. My problem is that instead of the last assessment, I want to choose the last two.

I have a data set with several time assessments for each participant. I want to select the last assessment for each participant. My dataset looks like this:

ID week outcome
1 2 14
1 4 28
1 6 42
4 2 14
4 6 46
4 9 64
4 9
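
A base R sketch using the six complete rows shown (the seventh is cut off): after sorting, `ave()` counts each row's position back from the participant's latest assessment, and rows within the last `n_last` positions are kept.

```r
dat <- data.frame(
  ID      = c(1, 1, 1, 4, 4, 4),
  week    = c(2, 4, 6, 2, 6, 9),
  outcome = c(14, 28, 42, 14, 46, 64)
)

n_last <- 2  # number of latest assessments to keep per participant

# Order by participant and time, then count positions back from each participant's last row
dat <- dat[order(dat$ID, dat$week), ]
from_end <- ave(dat$week, dat$ID, FUN = function(w) rev(seq_along(w)))

last_n <- dat[from_end <= n_last, ]
last_n
```

Setting `n_last <- 1` gives the original "last assessment" case. If a participant has two rows in the same week, they count as two separate assessments here.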

[plyr] Moving average filter with plyr

2013 Aug 27

Dear all, I'm stuck with a problem using plyr to process a rather large chunk of data. What I'm trying to do is apply a moving average to all the subparts of the dataframe (the example data can be found here https://dl.dropboxusercontent.com/u/2414056/testData.Rdata).

require(plyr)
load("testData.Rdata")
applyfilter<-function(x){
return(filter(x,rep(1/5, times=5)))
}
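
A sketch that avoids plyr, assuming the data has one grouping column and one numeric column; the linked testData.Rdata isn't reproduced here, so the column names `id` and `value` and the values are made up. `ave()` applies the filter within each group and returns the results in the original row order, so no join is needed afterwards.

```r
# Made-up stand-in for testData.Rdata
set.seed(7)
dat <- data.frame(
  id    = rep(c("s1", "s2"), each = 20),
  value = rnorm(40)
)

# Centered 5-point moving average; as.numeric() drops the ts class
# that stats::filter() returns
ma5 <- function(v) as.numeric(stats::filter(v, rep(1/5, times = 5)))

# Apply the filter separately within each id
dat$smooth <- ave(dat$value, dat$id, FUN = ma5)
head(dat, 8)
```

The first and last two rows of each group are NA because a centered 5-point window does not fit there. The post's `applyfilter()` could be passed as `FUN` the same way once its result is wrapped in `as.numeric()`.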
