thr3ads.net - similar to: "Select only unique rows from a data frame"

Displaying 20 results from an estimated 10000 matches similar to: "Select only unique rows from a data frame"

2013 Feb 01

expand.grid on contents of a list

Hello! I have a list of variable length. One example is: X=vector("list",3) X[[1]]=1:2 X[[2]]=1:2 X[[3]]=1:2 How could I run expand.grid on the elements of X so that the results would be the same as expand.grid(1:2,1:2,1:2)? Thank you! Dimitri -- Dimitri Liakhovitski gfk.com <http://marketfusionanalytics.com/>
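A minimal base-R sketch (not necessarily the answer given in the thread): do.call() passes the elements of X to expand.grid() as separate arguments, so the call below is equivalent to expand.grid(X[[1]], X[[2]], X[[3]]) for a list of any length. expand.grid() also accepts a single list, so expand.grid(X) gives the same result.

```r
# Build the example list from the question
X <- vector("list", 3)
X[[1]] <- 1:2
X[[2]] <- 1:2
X[[3]] <- 1:2

# do.call() unpacks the list into separate arguments:
# equivalent to expand.grid(X[[1]], X[[2]], X[[3]])
g <- do.call(expand.grid, X)
print(g)  # 8 rows, columns Var1, Var2, Var3
```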

2013 Feb 12

grabbing from elements of a list without a loop

Hello! # I have a list with several data frames: mylist<-list(data.frame(a=1:2,b=2:3), data.frame(a=3:4,b=5:6),data.frame(a=7:8,b=9:10)) (mylist) # I want to grab only one specific column from each list element neededcolumns<-c(1,2,0) # number of the column I need from each element of the list # Below, I am doing it using a loop: newlist<-NULL for(i in 1:length(mylist) ) {
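One way to drop the loop, sketched under the assumption that a 0 in neededcolumns means "take no column": Map() walks mylist and neededcolumns in parallel and applies single-bracket indexing, so each result stays a data frame (df[0] is a data frame with zero columns). This is a base-R alternative, not necessarily the solution proposed in the thread.

```r
mylist <- list(data.frame(a = 1:2, b = 2:3),
               data.frame(a = 3:4, b = 5:6),
               data.frame(a = 7:8, b = 9:10))
neededcolumns <- c(1, 2, 0)  # column to take from each list element

# Map() pairs the i-th data frame with the i-th column number;
# df[j] keeps the result a data frame (df[0] has no columns)
newlist <- Map(function(df, j) df[j], mylist, neededcolumns)
str(newlist)
```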

2013 Jan 29

Fastest way to compare a single value with all values in one column of a data frame

Hello! I have a large data frame x: x<-data.frame(item=letters[1:5],a=1:5,b=11:15) # in actuality, x has 1000 rows x$item<-as.character(x$item) I also have a small data frame y with just 1 row: y<-data.frame(item="f",a=3,b=10) y$item<-as.character(y$item) I have to decide if y$a is larger than the smallest of all the values in x$a. If it is, I want y to replace the whole
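The comparison itself needs no loop: min() and which.min() each make one vectorized pass over x$a, which is cheap even for far more than 1000 rows. A sketch using the toy data from the question (what should happen to y afterwards is cut off in the excerpt, so it is left out here):

```r
x <- data.frame(item = letters[1:5], a = 1:5, b = 11:15, stringsAsFactors = FALSE)
y <- data.frame(item = "f", a = 3, b = 10, stringsAsFactors = FALSE)

i_min <- which.min(x$a)        # row index of the smallest value in x$a
is_larger <- y$a > x$a[i_min]  # same as y$a > min(x$a)
print(i_min)
print(is_larger)
```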

2013 Feb 03

Looping through rows of all elements of a list that has variable length

Dear R-ers, I have a list of data frames such that the length of the list is unknown in advance (it could be 1 or 2 or more). Each element of the list contains a data frame. I need to loop through all rows of the list element 1 AND (if applicable) of the list element 2 etc. and do something at each iteration. I am trying to figure out how to write a code that is generic, i.e., loops through the
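If the goal is to visit every combination of rows (one row from element 1, one from element 2, and so on), which is an assumption since the excerpt is cut off, the nesting depth can be made generic by building the index combinations with expand.grid(), as in the first thread on this page. A minimal sketch with a hypothetical two-element list:

```r
# Hypothetical list; its length is not known in advance
mylist <- list(data.frame(a = 1:2), data.frame(b = c(10, 20, 30)))

# One column of row indices per list element, one row per combination
idx <- expand.grid(lapply(mylist, function(d) seq_len(nrow(d))))

n_iter <- 0
for (k in seq_len(nrow(idx))) {
  # rows[[j]] is the current row of list element j
  rows <- Map(function(d, i) d[i, , drop = FALSE], mylist, unlist(idx[k, ]))
  # ... do something with rows here ...
  n_iter <- n_iter + 1
}
n_iter  # 2 * 3 = 6 iterations
```

If each list element should instead be processed on its own, an outer for (df in mylist) with an inner loop over seq_len(nrow(df)) already works for any list length.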

2012 Dec 07

Assigning cases to groupings based on the values of several variables

Dear R-ers, my task is simple: to assign cases to desired groupings based on the combined values of 2 variables. I can think of 3 methods of doing it. Method 1 seems pretty R-like to me, but it requires many lines of code, which is onerous. Method 2 is a loop, so not very good, as it loops through all rows of mydata. Method 3 is a loop but loops through fewer lines, so it seems to me more
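Since the three methods are cut off in the excerpt, here is a separate loop-free sketch with hypothetical data: store the desired grouping for each combination of the two variables in a small lookup table, then match() each case's combined key against it. The column names v1, v2 and group are made up for illustration.

```r
# Hypothetical data: two grouping variables per case
mydata <- data.frame(v1 = c(1, 1, 2, 2, 1),
                     v2 = c("a", "b", "a", "b", "a"),
                     stringsAsFactors = FALSE)

# Hypothetical lookup table: desired group for each (v1, v2) combination
lookup <- data.frame(v1 = c(1, 1, 2, 2),
                     v2 = c("a", "b", "a", "b"),
                     group = c("G1", "G2", "G2", "G3"),
                     stringsAsFactors = FALSE)

# Combine both variables into one key and look it up in one vectorized step
key <- function(d) paste(d$v1, d$v2, sep = "|")
mydata$group <- lookup$group[match(key(mydata), key(lookup))]
mydata
```

Combinations missing from the lookup table get NA. merge() would also work, but it reorders the rows, whereas match() keeps the original order of mydata.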

2011 Sep 16

"rounding" to a number that is LOWER than my number

Hello! What function would allow me to "round" down, rather than up? For example, x<-1.98 I'd like to get 1.9 - rather than 2.0. Reason - I am creating a minimum for an axis for a plot, and I need it to be lower than x (which, in turn, is the lowest number already). Thank you! -- Dimitri Liakhovitski marketfusionanalytics.com
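round() goes to the nearest value, so one option for "rounding down" is to scale, apply floor(), and scale back. A small sketch (round_down is a made-up helper name):

```r
# Round x down to the given number of decimal places
round_down <- function(x, digits = 0) floor(x * 10^digits) / 10^digits

round_down(1.98, 1)   # 1.9
round_down(-1.98, 1)  # -2: floor() moves toward -Inf, unlike trunc()
```

Floating-point representation can occasionally push a value that already sits on the grid one step lower (for example, 0.29 * 100 is stored as slightly less than 29). For an axis minimum that only needs to be below x, this is harmless.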

2011 Aug 05

summing columns with NAs present

Hello! I have a data frame with some NAs. test<-data.frame(a=c(1,2,NA),b=c(10,NA,20)) I need to sum up values in 2 variables. However, test$a+test$b produces NAs in rows that have NAs. How could I sum up columns while ignoring NAs (the way sum(..., na.rm=T) does)? Thank you! -- Dimitri Liakhovitski marketfusionanalytics.com
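rowSums() takes an na.rm argument, so it gives the row-wise equivalent of sum(..., na.rm=TRUE). A sketch with the question's data; note that a row in which every value is NA would sum to 0 rather than NA.

```r
test <- data.frame(a = c(1, 2, NA), b = c(10, NA, 20))

# Row-wise sum of the selected columns, skipping NAs
test$total <- rowSums(test[c("a", "b")], na.rm = TRUE)
test$total  # 11  2 20
```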

2011 Aug 04

Efficient way of creating a shifted (lagged) variable?

Hello! I have a data set: set.seed(123) y<-data.frame(week=seq(as.Date("2010-01-03"), as.Date("2011-01-31"),by="week")) y$var1<-c(1,2,3,round(rnorm(54),1)) y$var2<-c(10,20,30,round(rnorm(54),1)) # All I need is to create lagged variables for var1 and var2. I looked around a bit and found several ways of doing it. They all seem quite complicated - while in
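For a one-step lag of regularly spaced rows that are already sorted by week, shifting each vector by one position is enough. A base-R sketch using the question's data (lag_vec is a hypothetical helper, not a function from a package):

```r
# Shift v down by k positions, padding the start with NA
lag_vec <- function(v, k = 1) c(rep(NA, k), head(v, -k))

set.seed(123)
y <- data.frame(week = seq(as.Date("2010-01-03"), as.Date("2011-01-31"), by = "week"))
y$var1 <- c(1, 2, 3, round(rnorm(54), 1))
y$var2 <- c(10, 20, 30, round(rnorm(54), 1))

# Lag both columns at once; rows must be in week order for this to be valid
y[c("var1_lag1", "var2_lag1")] <- lapply(y[c("var1", "var2")], lag_vec)
head(y)
```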

2011 Aug 01

Identifying US holidays

Hello! I am trying to identify which dates in a vector of dates are US holidays and, ideally, which holiday each one is. I do not know a priori which dates those should be. I have, for example: x<-seq(as.Date("2011-01-01"),as.Date("2011-12-31"),by="day") (x) I think chron should help me here - but maybe I am not using it properly: library(chron) is.holiday(chron) #
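Independently of chron, a base-R sketch is to compute the dates of the federal holidays for a given year and match() the date vector against them. The helpers nth_wday(), last_monday_may() and us_holidays() are hypothetical. The list covers the ten federal holidays as defined in 2011 (Juneteenth was added only in 2021) and deliberately ignores the rule that shifts a holiday falling on a weekend to an adjacent weekday.

```r
# n-th given weekday (0 = Sunday ... 6 = Saturday) of a month
nth_wday <- function(year, month, wday, n) {
  first <- as.Date(sprintf("%d-%02d-01", year, month))
  first + (wday - as.POSIXlt(first)$wday) %% 7 + 7 * (n - 1)
}

# Last Monday in May (Memorial Day)
last_monday_may <- function(year) {
  last <- as.Date(sprintf("%d-05-31", year))
  last - (as.POSIXlt(last)$wday - 1) %% 7
}

# Federal holidays of one year, without weekend "observed" shifts
us_holidays <- function(year) {
  data.frame(
    holiday = c("New Year's Day", "Martin Luther King Jr. Day",
                "Washington's Birthday", "Memorial Day", "Independence Day",
                "Labor Day", "Columbus Day", "Veterans Day", "Thanksgiving",
                "Christmas Day"),
    date = c(as.Date(sprintf("%d-01-01", year)),
             nth_wday(year, 1, 1, 3),   # 3rd Monday in January
             nth_wday(year, 2, 1, 3),   # 3rd Monday in February
             last_monday_may(year),
             as.Date(sprintf("%d-07-04", year)),
             nth_wday(year, 9, 1, 1),   # 1st Monday in September
             nth_wday(year, 10, 1, 2),  # 2nd Monday in October
             as.Date(sprintf("%d-11-11", year)),
             nth_wday(year, 11, 4, 4),  # 4th Thursday in November
             as.Date(sprintf("%d-12-25", year))),
    stringsAsFactors = FALSE
  )
}

x <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), by = "day")
h <- us_holidays(2011)
hit <- match(x, h$date)  # NA for ordinary days
result <- data.frame(date = x[!is.na(hit)],
                     holiday = h$holiday[hit[!is.na(hit)]],
                     stringsAsFactors = FALSE)
result
```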

2012 Jan 27

Superimposing one map in spplot onto another

Hello! I have 2 maps - both created in spplot and both identical in terms of outline. Is there any way to superimpose Map1 (which has black borders between Canadian provinces) onto Map2 (which is also a map of Canada)? Thanks a lot for your hints! Dimitri ### A. Reading in Canada data at the province and then at the county level: library(raster) getData('ISO3') # Canada's code is

2013 Mar 11

glm and lm can't find weights

Hello, and apologies for not providing an example. However, my question is more general. I have a lengthy function. This function is using another internal function that modifies the data frame I am reading in. This internal function is using the command model.frame (with data and weights inside) and returns a data frame I am using for further analyses. However, when I try to run my function

2011 Nov 10

optim seems to be finding a local minimum

Hello! I am trying to create an R optimization routine for a task that's currently being done using Excel (lots of tables, formulas, and Solver). However, optim seems to be finding a local minimum. Example data, functions, and comparison with the solution found in Excel are below. I am not experienced in optimizations so thanks a lot for your advice! Dimitri ### 2 Inputs:
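A common remedy, sketched here with a hypothetical test function rather than the poster's Excel-derived objective, is to restart optim() from many starting points and keep the best result. The Rastrigin function below has many local minima and a global minimum of 0 at (0, 0); multistart_optim is a made-up helper. optim() also offers method = "SANN" (simulated annealing), which is meant to escape local minima but is slower and stochastic.

```r
# Hypothetical objective with many local minima (2-D Rastrigin function)
rastrigin <- function(p) 10 * length(p) + sum(p^2 - 10 * cos(2 * pi * p))

# Run optim() from every row of `starts`; return the fit with the lowest value
multistart_optim <- function(fn, starts, ...) {
  fits <- lapply(seq_len(nrow(starts)), function(i) optim(starts[i, ], fn, ...))
  values <- vapply(fits, function(f) f$value, numeric(1))
  fits[[which.min(values)]]
}

set.seed(42)
# First start is a fixed point; the other 20 are random in [-5, 5]^2
starts <- rbind(c(3.2, -2.7), matrix(runif(40, -5, 5), ncol = 2))
best <- multistart_optim(rastrigin, starts, method = "Nelder-Mead")
single <- optim(c(3.2, -2.7), rastrigin, method = "Nelder-Mead")
c(single = single$value, multistart = best$value)
```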

2011 Jul 21

squared "pie chart" - is there such a thing?

Hello! It's a shot in the dark, but I'll try. If one has a total of 100 (e.g., %), and three components of the total, e.g., mytotal=data.frame(x=50,y=30,z=20), one could build a pie chart with 3 sectors representing x, y, and z according to their proportions in the total. I am wondering if it's possible to build something very similar, but not on a circle but in a square - such that
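Because the parts add up to 100, one way to draw a "square pie" is a 10 x 10 grid of cells (often called a waffle chart) in which each component fills as many cells as its percentage. A base-graphics sketch, assuming the values are whole numbers summing to 100:

```r
mytotal <- data.frame(x = 50, y = 30, z = 20)
counts <- unlist(mytotal)                        # named: x = 50, y = 30, z = 20
cells <- rep(seq_along(counts), times = counts)  # one category code per cell
grid <- matrix(cells, nrow = 10, ncol = 10)      # fills from the bottom row up

cols <- c("steelblue", "orange", "grey60")       # colours for x, y, z
image(1:10, 1:10, grid, col = cols, asp = 1, axes = FALSE,
      xlab = "", ylab = "",
      main = paste(names(counts), counts, sep = " = ", collapse = ", "))
abline(h = 0.5:10.5, v = 0.5:10.5, col = "white")  # separate the cells
```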

2011 Jul 19

calculating mean excluding zeros

Sorry if it's been discussed before - don't seem to find it. I'd like to calculate a mean while ignoring zeros. "mean" doesn't seem to have an option for that. Any other function/package that could do it? Thanks for a pointer! -- Dimitri Liakhovitski marketfusionanalytics.com
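mean() has no such option, but subsetting first does the job. A small sketch (mean_nonzero is a hypothetical name); if every value is zero the result is NaN, because the mean of an empty vector is NaN.

```r
# Mean of the non-zero values; na.rm is passed on to mean()
mean_nonzero <- function(x, na.rm = FALSE) mean(x[x != 0], na.rm = na.rm)

mean_nonzero(c(0, 2, 4, 0))              # 3
mean_nonzero(c(0, NA, 4), na.rm = TRUE)  # 4
```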

2011 Jul 29

splitting a string based on the last underscore

Hello! Hope you could help me split the strings. I have a set of strings: x<-c("name_a1_2.5.o","name_a2_2.53.o","name_a3_bla_1.o") I need to extract from each string: 1. Its unique part that comes before the last "_", i.e.: "a1","a2","a3_bla". 2. The part that comes after the last "_" and before ".o"
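Greedy regular expressions handle the "last underscore" requirement: .* matches as much as possible, so the _ that follows it matches the final underscore. A sketch with sub(), assuming that the common prefix to drop is everything up to the first underscore ("name_").

```r
x <- c("name_a1_2.5.o", "name_a2_2.53.o", "name_a3_bla_1.o")

# 1. Drop the prefix up to the first "_" and everything from the last "_" on
before_last <- sub("^[^_]*_(.*)_[^_]*$", "\\1", x)

# 2. Keep what lies between the last "_" and the trailing ".o"
after_last <- sub("^.*_([^_]*)\\.o$", "\\1", x)

before_last  # "a1" "a2" "a3_bla"
after_last   # "2.5" "2.53" "1"
```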

2011 Sep 13

using vif from package "car" - "aliased coefficients in the model"

Hello! I have run a simple regression (lm) and created a regression object "myreg". I can see all the coefficients when I print(myreg). Then I tried to run vif(myreg) from the package "car". However, it's giving me an error: in vif.lm(regr.f) : there are aliased coefficients in the model. Sorry for the basic question: is there any way to get the VIFs for all predictors?
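The error means that at least one predictor is an exact linear combination of others, so lm() reports an NA coefficient for it and VIFs cannot be computed for that model. A sketch with simulated data: alias() shows which coefficient is aliased and how, and refitting without it lets vif() run. The car call only runs if the package is installed.

```r
set.seed(1)
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
d$x3 <- d$x1 + d$x2                 # exact linear combination, hence aliased
d$y <- 1 + d$x1 - d$x2 + rnorm(20)

fit <- lm(y ~ x1 + x2 + x3, data = d)
print(coef(fit))                    # the coefficient of x3 is NA
print(alias(fit))                   # reports x3 in terms of x1 and x2

# Drop the aliased predictor and refit
fit2 <- lm(y ~ x1 + x2, data = d)
if (requireNamespace("car", quietly = TRUE)) print(car::vif(fit2))
```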

2011 Aug 02

identifying weeks (dates) that certain days (dates) fall into

Hello! I have dates for the beginning of each week, e.g.: weekly<-data.frame(week=seq(as.Date("2010-04-01"), as.Date("2011-12-26"),by="week")) week # each week starts on a Monday I also have a vector of dates I am interested in, e.g.: july4<-as.Date(c("2010-07-04","2011-07-04")) I would like to flag the weeks in my weekly$week that
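findInterval() returns, for each date, the index of the last week start on or before it, which is the week the date falls into. A sketch with the question's data; it assumes the week starts are sorted and contiguous. A date before the first week gets index 0 (no flag), and a date after the end of the last week would still be assigned to the last week.

```r
weekly <- data.frame(week = seq(as.Date("2010-04-01"), as.Date("2011-12-26"), by = "week"))
july4 <- as.Date(c("2010-07-04", "2011-07-04"))

# Index of the week start that each date of interest falls after
idx <- findInterval(july4, weekly$week)
weekly$flag <- seq_len(nrow(weekly)) %in% idx
weekly[weekly$flag, , drop = FALSE]
```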

2011 Aug 12

Which Durbin-Watson is correct? (weights involved) - using durbinWatsonTest and dwtest (packages car and lmtest)

Hello! I have a data frame mysample (sorry for the long way of creating it below, but I need it in this form, and it works). I regress Y onto X1 through X11, first without weights, then with weights: regtest1<-lm(Y~., data=mysample[-13]) regtest2<-lm(Y~., data=mysample[-13], weights=mysample$weight) summary(regtest1) summary(regtest2) Then I calculate Durbin-Watson for both regressions
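One way to see where the two packages differ is to compute the statistic by hand, DW = sum over t = 2..n of (e_t - e_(t-1))^2 divided by sum over t = 1..n of e_t^2, once from the ordinary residuals and once from the weighted residuals sqrt(w) * e. This sketch uses simulated data; which residuals durbinWatsonTest() and dwtest() use internally is not checked here, so compare the two hand-computed numbers with each package's output.

```r
# Durbin-Watson statistic from a residual vector (always between 0 and 4)
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

set.seed(2)
n <- 50
d <- data.frame(x = rnorm(n), w = runif(n, 0.5, 2))
d$y <- 1 + 2 * d$x + rnorm(n)
fit_w <- lm(y ~ x, data = d, weights = w)

dw_vals <- c(raw      = dw_stat(residuals(fit_w)),          # unweighted e
             weighted = dw_stat(weighted.residuals(fit_w)))  # sqrt(w) * e
dw_vals
```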

2009 Sep 04

transforming a badly organized data base into a list of data frames

Dear R-ers! I have a badly organized data base in Excel. Once I read it into R it looks like this (all variables become factors because of many spaces and other characters in Excel):

2012 Jul 11

help with merging 2 data frames

Dear R-ers, I feel I am close, but can't get it quite right. Thanks a lot for your help! Dimitri # I have 2 data frames: x<-data.frame(a=c("aa","aa","ab","ab","ba","ba","bb","bb"),b=c(1:2,1:2,1:2,1:2),d=c(10,20,30,40,50,60,70,80))
