Displaying 20 results from an estimated 2000 matches similar to: "efficiently picking one row from a data frame per unique key"
2020 Nov 21
3
Error in unsplit() with tibbles
Hello,
using the `unsplit()` function with tibbles currently leads to the
following error:
> mtcars_tb <- as_tibble(mtcars, rownames = NULL)
> s <- split(mtcars_tb, mtcars_tb$gear)
> unsplit(s, mtcars_tb$gear)
Error: Must subset rows with a valid subscript vector.
i Logical subscripts must match the size of the indexed input.
x Input has size 15 but subscript `rep(NA, len)` has
2002 Jul 28
1
[R] bug in unsplit()? (PR#1843)
Hedderik van Rijn <hedderik@cmu.edu> writes:
> If the second argument to unsplit is not a simple vector (but a "list
> containing multiple lists"), the function seems to have some problems.
>
> Given a slight modification of the examples in help(split):
>
> > xg <- split(x,list(g1=g,g2=g))
> > unsplit(xg,list(g1=g,g2=g))
> [1] -0.7877109
2020 Nov 21
2
Error in unsplit() with tibbles
I get the sentiment, but this is really just bad coding (on my own part, I suspect), so we might as well just fix it...
-pd
> On 21 Nov 2020, at 17:42 , Marc Schwartz via R-devel <r-devel at r-project.org> wrote:
>
>
>> On Nov 21, 2020, at 10:55 AM, Mario Annau <mario.annau at gmail.com> wrote:
>>
>> Hello,
>>
>> using the `unsplit()`
2010 Apr 19
2
Using split and then unsplit
Hello everyone,
I use the split function, splitting by the factor f, on a data frame with 3
columns and more than 100 000 rows. Once it's split I have a list of data
frames, still with 3 columns and n rows. I manipulate those list elements and
get a list of data frames, still with 3 columns but fewer rows. So when I
unsplit it, I get an error, as I use the same factor I used to split
( f in
2006 Jun 08
1
NAs in unsplit factor
R-devel,
Below is a simple example calling split and unsplit on a numeric
vector of length 2 where 'f' is c(1,NA).
> unsplit(split(c(1,2), c(1,NA)), c(1,NA))
[1] 1 0
I noticed that the call to vector in unsplit gives us 0 as the 2nd
element of the result.
Is this the intended result, as opposed to NA?
Thanks for your help,
Jeff
--
Jeff Enos
Kane Capital Management
jeff at
2005 Sep 27
2
Using unsplit - unsplit does not seem to reverse the effect of split
In the OME data set in MASS, I would like to extract the first 5 observations per subject (= ID). So I do
library(MASS)
OMEsub <- split(OME, OME$ID)
OMEsub <- lapply(OMEsub,function(x)x[1:5,])
unsplit(OMEsub, OME$ID)
- which results in
[[1]]
[1] 1 1 1 1 1
[[2]]
[1] 30 30 30 30 30
[[3]]
[1] low low low low low
Levels: N/A high low
[[4]]
[1] 35 35 40 40 45
[[5]]
[1] coherent incoherent coherent
2009 May 08
1
unsplit list of data.frames with one column
Perhaps this is the intended behavior, but I discovered that unsplit
throws an error when it tries to set rownames of a variable that has
no dimension. This occurs when unsplit is passed a list of
data.frames that have only a single column.
An example:
df <- data.frame(letters[seq(25)])
fac <- rep(seq(5), 5)
unsplit(split(df, fac), fac)
For reference, I'm using R version 2.9.0
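A minimal sketch of one possible workaround, assuming the goal is only to round-trip the single column: split and unsplit the column vector itself rather than the one-column data frame, then wrap the result again. The column name `x` and the object `restored` are illustrative and not part of the original report.

```r
# One-column data frame and grouping vector, as in the report
# (the column name `x` is an assumption added for readability).
df <- data.frame(x = letters[seq(25)])
fac <- rep(seq(5), 5)

# Split the column as a plain vector, so unsplit() never has to
# assign row names, then rebuild a one-column data frame.
restored <- data.frame(x = unsplit(split(df$x, fac), fac))
```

This sidesteps the data-frame branch of unsplit() entirely; it does not address the underlying behaviour described above.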
2011 May 19
1
Problems with unsplit()
Hi everyone,
I have already used split() and unsplit() in data frames without problems,
but now I’m applying these functions to other data and when using unsplit()
I have received the following message:
Error in `row.names<-.data.frame`(`*tmp*`, value = c("1", "2", "3", "4", :
duplicate 'row.names' are not allowed
In
2011 Mar 10
1
getting percentiles by factor
Hello,
I'm trying to get percentiles (PERCENTRANK for excel users) by factor in the
following data.frame:
myExample <- data.frame(Ret=seq(-2, 2.5,
by=0.5),PE=seq(10,19),Sectors=rep(c("Financial","Industrial"),5))
myExample <- na.omit(myExample)
Thanks to Patrick I managed to put together the following lines, which do
it for the "Ret" column:
myecdf
2009 Nov 19
2
Efficient cbind of elements from two lists
Hi!
I have a data.frame "data" and split it.
data <- split(data, data[,1])
This is quite a slow procedure, and I do not want to do it again. So
any unsplit and "resplit" is not an option for me.
But: I have to cbind "variables" to the split data from another list
that consists of vectors with matching sizes, so
for (i in 1:length(data)) {
data[[i]]
2024 Oct 06
0
Coda: On the efficiency of unsplit() for Rolf Turner's recent post
(only of interest -- maybe! -- to those who followed this thread of a
couple of weeks ago)
Just for the heckuva it, I compared the timing of Deepayan's unsplit(x,f)
solution to my as.vector(do.call(rbind, x)) approach to the query for a
list of 3 vectors each of length 1000 (the original toy example was for a
list of 3 vectors of length 5). Unsurprisingly, I think, because the
unsplit()
2024 Sep 27
1
Is there a sexy way ...?
>>>>> Chris Evans via R-help
>>>>> on Fri, 27 Sep 2024 12:20:47 +0200 writes:
> Oh glorious! Thanks Duncan.
> Fortune cookie nomination!
I don't disagree with the nomination -- thank you, Duncan!
However, please note that I'm sure Rolf's challenge /
question was meant to work correctly for all factors `f` with
levels
2005 Jun 25
1
group means: split and unsplit
Took me a while but I figured out how to put in common values of
group means/counts, etc. to do the same thing as egen. lapply with
split and then unsplit.
Thomas Davidoff
Assistant Professor
Haas School of Business
UC Berkeley
Berkeley, CA 94720
phone: (510) 643-1425
fax: (510) 643-7357
davidoff@haas.berkeley.edu
http://faculty.haas.berkeley.edu/davidoff
2009 Nov 25
0
Possible bug in "unsplit" (PR#14084)
Dear R-bug-people
I have encountered a problem with "unsplit", which I believe may be
caused by a bug in the function. However, being inexperienced with bug reports,
I apologise if this is merely a user problem rather than a problem
within R.
The problem occurs if an object is split by several grouping factors
with levels not occurring in the data, and using drop = TRUE. This may
appear as
2010 Aug 20
2
Problem with POSIXct in ave
Hi,
I am having trouble using the ave function with a POSIXct object. For
example:
x<-Sys.time()+0:9*3600
dat<-data.frame(id=rep(c('a','b','c'),each=10),dt=rep(x,3),i=rep(1:10,3))
dat
# This is what I want to do:
dat$time.elapsed<-unsplit(lapply(split(dat$dt,dat$id),function(x)
x-x[1]),f=dat$id)
dat
# The above code does the trick, but from the standpoint of
2006 Jul 13
1
Scaling/Centering the Data by an Index
Dear All:
I would like to center the data in 'x' by 'group'. The following code scales
the data, and I have not been able to figure out how to change it so that I get
the centered data.
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
group <- c(1,1,1,2,2,2,2,2)
unsplit(lapply(split(x,group),scale),group)
I would appreciate your help.
Ashraf
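One way the centering might be done, as a sketch rather than a definitive answer: replace scale() with a function that only subtracts the group mean. The names `centered` and `centered_ave` are illustrative; ave() from base R is shown as an alternative that avoids the explicit split/unsplit.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
group <- c(1, 1, 1, 2, 2, 2, 2, 2)

# Center only: subtract each group's mean and leave the spread unchanged.
centered <- unsplit(lapply(split(x, group), function(v) v - mean(v)), group)

# Same idea with ave(), which applies FUN within each group and
# returns the results in the original order.
centered_ave <- ave(x, group, FUN = function(v) v - mean(v))
```

Both forms keep the values in their original positions, so the result can be assigned straight back alongside 'x'.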
2011 May 06
1
Cumsum in Lattice Panel Function
I'm trying to create an xyplot with a "groups" argument where the y-variable
is the cumsum of the values stored in the input data frame. I almost have
it, but I can't get it to automatically adjust the y-axis scale. How do I
get the y-axis to automatically scale as it would have if the cumsum values
had been stored in the data frame?
Here is the code I have so far:
2005 Jul 21
1
Clustered standard errors in a panel
I want to do the following:
glm(y ~ x1 + x2 +...)
within a panel. Hence y, x1, and x2 all vary at the individual
level. However, there is likely correlation of these variables
within an individual, so standard errors need adjustment.
I do not want to estimate fixed effects, but do want to cluster
standard errors at the individual level.
Is there an automated way to do this? Nothing in
2020 Nov 21
0
Error in unsplit() with tibbles
Cool - thank you Peter!
@Marc: This is really not a tidyverse vs base-R debate and I personally
think that they should both work together for most parts. The common
environment is still R. But just to give you the full picture I also filed
a bug for tibbles (https://github.com/tidyverse/tibble/issues/829). With
these two fixes I think that split/unsplit would work for tibbles and users
(like me)
2020 Nov 21
0
Error in unsplit() with tibbles
> On Nov 21, 2020, at 10:55 AM, Mario Annau <mario.annau at gmail.com> wrote:
>
> Hello,
>
> using the `unsplit()` function with tibbles currently leads to the
> following error:
>
>> mtcars_tb <- as_tibble(mtcars, rownames = NULL)
>> s <- split(mtcars_tb, mtcars_tb$gear)
>> unsplit(s, mtcars_tb$gear)
> Error: Must subset rows with a valid