thr3ads.net - similar to: "efficiently picking one row from a data frame per unique key"

Displaying 20 results from an estimated 2000 matches similar to: "efficiently picking one row from a data frame per unique key"

Error in unsplit() with tibbles

2020 Nov 21

Hello, using the `unsplit()` function with tibbles currently leads to the following error:

> mtcars_tb <- as_tibble(mtcars, rownames = NULL)
> s <- split(mtcars_tb, mtcars_tb$gear)
> unsplit(s, mtcars_tb$gear)
Error: Must subset rows with a valid subscript vector.
i Logical subscripts must match the size of the indexed input.
x Input has size 15 but subscript `rep(NA, len)` has

[R] bug in unsplit()? (PR#1843)

2002 Jul 28

Hedderik van Rijn <hedderik@cmu.edu> writes:

> If the second argument to unsplit is not a simple vector (but a "list containing multiple lists"), the function seems to have some problems.
>
> Given a slight modification of the examples in help(split):
>
> > xg <- split(x,list(g1=g,g2=g))
> > unsplit(xg,list(g1=g,g2=g))
> [1] -0.7877109

Error in unsplit() with tibbles

2020 Nov 21

I get the sentiment, but this is really just bad coding (on my own part, I suspect), so we might as well just fix it...

-pd

> On 21 Nov 2020, at 17:42, Marc Schwartz via R-devel <r-devel at r-project.org> wrote:
>
>> On Nov 21, 2020, at 10:55 AM, Mario Annau <mario.annau at gmail.com> wrote:
>>
>> Hello,
>>
>> using the `unsplit()`

Using split and then unsplit

2010 Apr 19

Hello everyone, I use the split function, splitting with the f function, on a data frame with 3 columns and more than 100 000 rows. Once it's split I have a list of data frames, each still with 3 columns and n rows. I manipulate those list elements and get a list of data frames still with 3 columns but fewer rows. So when I unsplit it, I get an error as I use the same factor function I used to split ( f in

NAs in unsplit factor

2006 Jun 08

R-devel, Below is a simple example calling split and unsplit on a numeric vector of length 2 where 'f' is c(1,NA).

> unsplit(split(c(1,2), c(1,NA)), c(1,NA))
[1] 1 0

I noticed that the call to vector in unsplit gives us 0 as the 2nd element of the result. Is this the intended result, as opposed to NA? Thanks for your help, Jeff

-- Jeff Enos Kane Capital Management jeff at

Using unsplit - unsplit does not seem to reverse the effect of split

2005 Sep 27

In data OME in MASS I would like to extract the first 5 observations per subject (=ID). So I do

library(MASS)
OMEsub <- split(OME, OME$ID)
OMEsub <- lapply(OMEsub,function(x)x[1:5,])
unsplit(OMEsub, OME$ID)

- which results in

[[1]]
[1] 1 1 1 1 1

[[2]]
[1] 30 30 30 30 30

[[3]]
[1] low low low low low
Levels: N/A high low

[[4]]
[1] 35 35 40 40 45

[[5]]
[1] coherent incoherent coherent
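A possible way around the problem in the snippet above, offered only as a sketch: unsplit() expects each piece to have as many rows as the grouping factor has entries for that level, so it is not a natural fit once rows have been dropped. One alternative is to select row indices per group and subset the original data frame. The toy data frame `df` and the name `n_keep` below are hypothetical stand-ins (the real OME data is not used here).

```r
# Toy data standing in for MASS::OME (hypothetical values).
df <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 3),
  trial = c(1, 2, 3, 1, 2, 1)
)

n_keep <- 2  # number of rows to keep per ID (assumption for this sketch)

# Split the row indices by ID, keep at most the first n_keep of each group,
# then subset the original data frame in its original row order.
idx <- unlist(lapply(split(seq_len(nrow(df)), df$ID), head, n_keep),
              use.names = FALSE)
first_rows <- df[sort(idx), ]

# Special case of exactly one row per key: keep the first occurrence.
one_per_id <- df[!duplicated(df$ID), ]
```

Because head() returns fewer elements when a group is short, no NA padding rows are created, unlike x[1:5, ] on a group with fewer than 5 rows.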

unsplit list of data.frames with one column

2009 May 08

Perhaps this is the intended behavior, but I discovered that unsplit throws an error when it tries to set rownames of a variable that has no dimension. This occurs when unsplit is passed a list of data.frames that have only a single column. An example:

df <- data.frame(letters[seq(25)])
fac <- rep(seq(5), 5)
unsplit(split(df, fac), fac)

For reference, I'm using R version 2.9.0

Problems with unsplit()

2011 May 19

Hi everyone, I have already used split() and unsplit() on data frames without problems, but now I'm applying these functions to other data, and when using unsplit() I received the following message:

Error in `row.names<-.data.frame`(`*tmp*`, value = c("1", "2", "3", "4", : duplicate 'row.names' are not allowed

In

getting percentiles by factor

2011 Mar 10

Hello, I'm trying to get percentiles (PERCENTRANK for Excel users) by factor in the following data.frame:

myExample <- data.frame(Ret=seq(-2, 2.5, by=0.5),PE=seq(10,19),Sectors=rep(c("Financial","Industrial"),5))
myExample <- na.omit(myExample)

Thanks to Patrick I managed to put together the following lines, which do it for the "Ret" column:

myecdf

Efficient cbind of elements from two lists

2009 Nov 19

Hi! I have a data.frame "data" and split it:

data <- split(data, data[,1])

This is quite a slow procedure, and I do not want to do it again. So any unsplit and "resplit" is not an option for me. But I have to cbind "variables" to the split data from another list, which consists of vectors with matching sizes, so

for (i in 1:length(data)) { data[[i]]

Coda: On the efficiency of unsplit() for Rolf Turner's recent post

2024 Oct 06

(only of interest -- maybe! -- to those who followed this thread of a couple of weeks ago) Just for the heckuva it, I compared the timing of Deepayan's unsplit(x,f) solution to my as.vector(do.call(rbind, x)) approach to the query for a list of 3 vectors each of length 1000 (the original toy example was for a list of 3 vectors of length 5). Unsurprisingly, I think, because the unsplit()
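As a rough illustration of the two approaches named in this snippet, the sketch below checks that they agree on a small input and then times them. The original thread's data and grouping factor are not shown here, so the interleaving factor `f`, the list `x`, and the repetition count are assumptions made for this example only.

```r
# Three vectors of equal length (hypothetical stand-in for the thread's list).
x <- list(1:4, 11:14, 21:24)

# Assumed grouping factor: elements are interleaved as 1, 2, 3, 1, 2, 3, ...
f <- rep(seq_along(x), times = length(x[[1]]))

via_unsplit <- unsplit(x, f)
via_rbind   <- as.vector(do.call(rbind, x))

# Both approaches should give the same interleaved vector.
stopifnot(identical(via_unsplit, via_rbind))

# Timing on a larger input (3 vectors of length 1000, as in the snippet);
# the numbers depend on the machine and R version, so none are quoted here.
big_x <- list(rnorm(1000), rnorm(1000), rnorm(1000))
big_f <- rep(1:3, times = 1000)
t_unsplit <- system.time(for (i in 1:200) unsplit(big_x, big_f))
t_rbind   <- system.time(for (i in 1:200) as.vector(do.call(rbind, big_x)))
```

Comparing `t_unsplit` and `t_rbind` on your own machine is the only reliable way to see the difference the poster describes.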

Is there a sexy way ...?

2024 Sep 27

>>>>> Chris Evans via R-help
>>>>> on Fri, 27 Sep 2024 12:20:47 +0200 writes:

> Oh glorious! Thanks Duncan.
> Fortune cookie nomination!

I don't disagree with the nomination -- thank you, Duncan! However, please note that I'm sure Rolf's challenge / question was meant to work correctly for all factors `f` with levels

group means: split and unsplit

2005 Jun 25

Took me a while but I figured out how to put in common values of group means/counts, etc. to do the same thing as egen. lapply with split and then unsplit. Thomas Davidoff Assistant Professor Haas School of Business UC Berkeley Berkeley, CA 94720 phone: (510) 643-1425 fax: (510) 643-7357 davidoff@haas.berkeley.edu http://faculty.haas.berkeley.edu/davidoff [[alternative HTML
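The snippet above does not include the poster's code, so the following is only a sketch of the general split / lapply / unsplit pattern for attaching group means to every row, alongside base R's ave(), which wraps the same idea. The data frame `df` and its column names are hypothetical.

```r
# Hypothetical data: a grouping column g and a numeric column y.
df <- data.frame(
  g = c("a", "a", "b", "b", "b"),
  y = c(1, 3, 2, 4, 6)
)

# split -> compute each group's mean, repeated to the group's length ->
# unsplit puts the values back in the original row order.
df$g_mean <- unsplit(
  lapply(split(df$y, df$g), function(v) rep(mean(v), length(v))),
  df$g
)

# ave() performs the same split/apply/unsplit steps in one call.
df$g_mean_ave <- ave(df$y, df$g, FUN = mean)

# Group counts can be attached the same way.
df$g_n <- ave(df$y, df$g, FUN = length)
```

For simple per-group summaries broadcast back to the rows, ave() is usually the shorter spelling; the explicit unsplit() form is useful when the per-group step is more involved.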

Possible bug in "unsplit" (PR#14084)

2009 Nov 25

Dear R-bug-people, I have encountered a problem with "unsplit", which I believe may be caused by a bug in the function. However, being inexperienced with bug reports, I apologise if this is merely a user problem rather than a problem within R. The problem occurs if an object is split by several grouping factors with levels not occurring in the data, and using drop = TRUE. This may appear as

Problem with POSIXct in ave

2010 Aug 20

Hi, I am having trouble using the ave function with a POSIXct object. For example:

x<-Sys.time()+0:9*3600
dat<-data.frame(id=rep(c('a',' b','c'),each=10),dt=rep(x,3),i=rep(1:10,3))
dat
# This is what I want to do:
dat$time.elapsed<-unsplit(lapply(split(dat$dt,dat$id),function(x) x-x[1]),f=dat$id)
dat
# The above code does the trick, but from the standpoint of

Scalling/Centering the Data by an Index

2006 Jul 13

Dear All: I would like to center the data in 'x' by 'group'. The following code scales the data, and I have not been able to figure out how to change it so that I get the centered data.

x <- c(1, 2, 3, 4, 5, 6, 7, 8)
group <- c(1,1,1,2,2,2,2,2)
unsplit(lapply(split(x,group),scale),group)

I would appreciate your help. Ashraf
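One way the question above might be answered, as a sketch rather than the reply actually given on the list: replace the per-group scale() call with a function that only subtracts the group mean. The names `centered` and `centered_ave` are introduced here for illustration.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
group <- c(1, 1, 1, 2, 2, 2, 2, 2)

# Center within each group: subtract the group's own mean, keep the spread.
centered <- unsplit(
  lapply(split(x, group), function(v) v - mean(v)),
  group
)

# The same result via ave(), which handles the split and unsplit internally.
centered_ave <- ave(x, group, FUN = function(v) v - mean(v))
```

scale() also accepts `scale = FALSE` to center without dividing by the standard deviation, but it returns a one-column matrix, so the plain subtraction above keeps the result a simple numeric vector.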

Cumsum in Lattice Panel Function

2011 May 06

I'm trying to create an xyplot with a "groups" argument where the y-variable is the cumsum of the values stored in the input data frame. I almost have it, but I can't get it to automatically adjust the y-axis scale. How do I get the y-axis to automatically scale as it would have if the cumsum values had been stored in the data frame? Here is the code I have so far:

Clustered standard errors in a panel

2005 Jul 21

I want to do the following: glm(y ~ x1 + x2 +...) within a panel. Hence y, x1, and x2 all vary at the individual level. However, there is likely correlation of these variables within an individual, so standard errors need adjustment. I do not want to estimate fixed effects, but do want to cluster standard errors at the individual level. Is there an automated way to do this? Nothing in

Error in unsplit() with tibbles

2020 Nov 21

Cool - thank you Peter! @Marc: This is really not a tidyverse vs base-R debate and I personally think that they should both work together for most parts. The common environment is still R. But just to give you the full picture I also filed a bug for tibbles (https://github.com/tidyverse/tibble/issues/829). With these two fixes I think that split/unsplit would work for tibbles and users (like me)

Error in unsplit() with tibbles

2020 Nov 21

> On Nov 21, 2020, at 10:55 AM, Mario Annau <mario.annau at gmail.com> wrote:
>
> Hello,
>
> using the `unsplit()` function with tibbles currently leads to the
> following error:
>
>> mtcars_tb <- as_tibble(mtcars, rownames = NULL)
>> s <- split(mtcars_tb, mtcars_tb$gear)
>> unsplit(s, mtcars_tb$gear)
> Error: Must subset rows with a valid
