thr3ads.net - similar to: "problems with merge() - the output has many repeated lines"

Displaying 20 results from an estimated 9000 matches similar to: "problems with merge() - the output has many repeated lines"

remove extreme values or winsorize – loop - dataframe

2010 Aug 01

Hi everyone!
#I need a loop or a function that creates an X2 variable that is X1 without the extreme values (or X1 winsorized) by industry and year.
#My reproducible example:
firm<-sort(rep(1:1000,10),decreasing=F)
year<-rep(1998:2007,1000)
industry<-rep(c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10),rep(6,10),rep(7,10),rep(8,10),rep(9,10), rep(10,10)),100)
X1<-rnorm(10000)
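
A minimal sketch of the group-wise approach, without an explicit loop: ave() applies a function to X1 separately inside every industry-year cell. The 1st and 99th percentile cut-offs are an assumption (change `probs` as needed), and the helper names `winsorize` and `trim` are hypothetical. The first version clamps extreme values to the cut-offs; the second sets them to NA instead.

```r
# Clamp values outside the [probs[1], probs[2]] quantiles to those quantiles
winsorize <- function(v, probs = c(0.01, 0.99)) {
  q <- quantile(v, probs = probs, na.rm = TRUE)
  pmin(pmax(v, q[1]), q[2])
}

# Alternative: drop extreme values (set them to NA) instead of clamping
trim <- function(v, probs = c(0.01, 0.99)) {
  q <- quantile(v, probs = probs, na.rm = TRUE)
  ifelse(v < q[1] | v > q[2], NA, v)
}

set.seed(1)
firm     <- sort(rep(1:1000, 10), decreasing = FALSE)
year     <- rep(1998:2007, 1000)
industry <- rep(rep(1:10, each = 10), 100)  # same layout as the post, length 10000
X1       <- rnorm(10000)
data     <- data.frame(firm, year, industry, X1)

# ave() runs the function once per industry-year group and keeps row order
data$X2         <- ave(data$X1, data$industry, data$year, FUN = winsorize)
data$X2_trimmed <- ave(data$X1, data$industry, data$year, FUN = trim)
```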

subset in dataframes

2011 Oct 02

I need help in subsetting a dataframe:
data1<-data.frame(year=c(2001,2002,2003,2004,2001,2002,2003,2004,2001,2002,2003,2004,2001,2002,2003,2004),
                  firm=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
                  x=c(11,22,-32,25,-26,47,85,98,101,14,87,56,12,43,67,54),
                  y=c(110,220,302,250,260,470,850,980,1010,140,870,560,120,430,670,540))
data1
I want to keep the firms where all x>0 (where there are
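
One way to keep only the firms whose x values are all positive is to compute, for each row, whether all x values of that row's firm are positive, using ave() with FUN = all, and then index the rows. This is a sketch; the object names `all_pos` and `data2` are hypothetical. With the data above, firms 1 and 2 each have a negative x, so only firms 3 and 4 remain.

```r
data1 <- data.frame(
  year = rep(2001:2004, 4),
  firm = rep(1:4, each = 4),
  x = c(11, 22, -32, 25, -26, 47, 85, 98, 101, 14, 87, 56, 12, 43, 67, 54),
  y = c(110, 220, 302, 250, 260, 470, 850, 980, 1010, 140, 870, 560, 120, 430, 670, 540)
)

# TRUE for every row whose firm has x > 0 in all of its rows
all_pos <- ave(data1$x > 0, data1$firm, FUN = all)
data2 <- data1[all_pos, ]
data2
```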

if else

2009 Jun 08

Hi R-helpers! I have the following dataframe:
firm<-c(rep(1:3,4))
year<-c(rep(2001:2003,4))
X1<-rep(c(10,NA),6)
X2<-rep(c(5,NA,2),4)
data<-data.frame(firm, year,X1,X2)
data
So I want to obtain the same dataframe with a variable X3 that is:
X1, if X2 is NA;
X2, if X1 is NA;
X1+X2, if neither X1 nor X2 is NA.
So my final data is
X3<-c(15,NA,12,5,10,2,15,NA,12,5,10,2)
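
The three rules can be written as two nested vectorised ifelse() calls, which also gives NA when both X1 and X2 are NA (rows 2 and 8 in the expected result). A minimal sketch:

```r
firm <- rep(1:3, 4)
year <- rep(2001:2003, 4)
X1 <- rep(c(10, NA), 6)
X2 <- rep(c(5, NA, 2), 4)
data <- data.frame(firm, year, X1, X2)

# X2 when X1 is NA, X1 when X2 is NA, X1 + X2 otherwise (NA if both are NA)
data$X3 <- with(data, ifelse(is.na(X1), X2, ifelse(is.na(X2), X1, X1 + X2)))
data$X3
# 15 NA 12  5 10  2 15 NA 12  5 10  2
```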

R: subset dataframe/list

2009 Jun 02

Thank you all!!! The problem was the decimal symbol! My data was saved in a txt file, so I added dec="," to read.table and it worked. What I've done was:
coeficientes<-read.table("coeficientes.txt",sep="\t",header=T,dec=",")
Then, subset worked fine:
coeficientesWanted<-subset(coeficientes,b1>0)
Thanks again,
Cecília Carmo
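
To illustrate why dec = "," matters, the sketch below reads a few of the rows shown in the original question from an inline string (via read.table's `text` argument) instead of the file. Without dec = ",", columns such as "0,033120395" are not recognised as numbers and are read as text, so `b1 > 0` is not a numeric comparison.

```r
# Tab-separated text with a comma as decimal mark
txt <- "caedois\tb1\tb2\tb3
1\t0,033120395\t-20,29478338\t-0,274638864
2\t-0,040629634\t74,54239889\t-0,069958424
5\t-0,001116816\t35,2398622\t0,214327185"

coeficientes <- read.table(text = txt, sep = "\t", header = TRUE, dec = ",")
str(coeficientes)  # b1, b2 and b3 are now numeric

coeficientesWanted <- subset(coeficientes, b1 > 0)
coeficientesWanted
```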

help with a loop (coefficients with lmList)

2009 Aug 09

Hi R-helpers.
#I start with the reproducible example:
firm<-c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10))
year<-c(rep(1998:2007,5))
industry<-c(rep(1,20),rep(5,10),rep(7,10),rep(9,10))
X1<-rnorm(50)
X2<-rnorm(50,mean=0.5,sd=0.1)
Y<-rnorm(50,mean=0,sd=0.5)
data<-data.frame(firm, industry,year,X1,X2,Y)
data
#I need to calculate for all the industries the following

paired samples, matching rows, merge()

2010 Aug 20

Hi everyone! I'm matching two samples to create one sample that has pairs of observations with equal values of the k1 variable. merge() doesn't work because I don't want the values to be recycled.
x <- data.frame(k1=c(1,1,2,3,3,5), k2=c(20,21,22,23,24,25))
x
y <- data.frame(k1=c(1,1,2,2,3,4,5,5), k2=c(10,11,12,13,14,15,16,17))
y
merge(x,y,by="k1")
  k1 k2.x k2.y
1  1   20
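
merge() on k1 alone returns every combination of rows that share a key. A common workaround, sketched here under the assumption that rows should be paired in their order of appearance, is to number the occurrences of each k1 value in both samples and merge on k1 plus that counter, so each row is used at most once. The column name `idx` is hypothetical.

```r
x <- data.frame(k1 = c(1, 1, 2, 3, 3, 5), k2 = c(20, 21, 22, 23, 24, 25))
y <- data.frame(k1 = c(1, 1, 2, 2, 3, 4, 5, 5), k2 = c(10, 11, 12, 13, 14, 15, 16, 17))

# Occurrence number of each k1 value within each sample: 1st, 2nd, ...
x$idx <- ave(seq_along(x$k1), x$k1, FUN = seq_along)
y$idx <- ave(seq_along(y$k1), y$k1, FUN = seq_along)

# Matching on k1 AND occurrence number pairs rows one-to-one
pairs <- merge(x, y, by = c("k1", "idx"))
pairs <- pairs[order(pairs$k1, pairs$idx), ]
pairs
```

Unmatched rows (here the second k1 = 3 row of x, and the extra rows of y) are dropped; add all.x = TRUE or all.y = TRUE to keep them.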

subset dataframe/list

2009 Jun 01

Hi R-helpers! I have the following object:
> head(coeficientes)
  caedois           b1           b2           b3
1       1  0,033120395 -20,29478338 -0,274638864
2       2 -0,040629634  74,54239889 -0,069958424
3       5 -0,001116816   35,2398622  0,214327185
4      10     0,171875
5      14  0,007288399  40,06560548 -0,081828338
6      15  0,027530346  0,969969409  0,102775555
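
If the data have already been loaded with comma decimals (so b1, b2 and b3 are text rather than numbers), they can also be converted after the fact by replacing the comma with a point. This is a sketch on the first four rows shown above; the helper `to_num` is hypothetical, and the missing b2/b3 values of row 4 are represented as NA.

```r
coeficientes <- data.frame(
  caedois = c(1, 2, 5, 10),
  b1 = c("0,033120395", "-0,040629634", "-0,001116816", "0,171875"),
  b2 = c("-20,29478338", "74,54239889", "35,2398622", NA),
  b3 = c("-0,274638864", "-0,069958424", "0,214327185", NA)
)

# as.character() also covers columns that were read as factors
to_num <- function(v) as.numeric(sub(",", ".", as.character(v), fixed = TRUE))

cols <- c("b1", "b2", "b3")
coeficientes[cols] <- lapply(coeficientes[cols], to_num)
subset(coeficientes, b1 > 0)
```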

Select the rows in a dataframe that matches a criteria in another dataframe

2009 May 10

Hi everyone! Thank you for the help you have been giving me, and here I am with another problem with my dataframes: I have two dataframes (with many more observations), like these:
Dataframe1
Firm       Year  cash
500400200  2007   100
500400200  2006   200
500400200  2005   400
500400300  2007   300
500400300  2006   240
500400300  2005   120
500400400
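
The selection criterion itself is cut off above, so the sketch below assumes the second dataframe lists the firms (or firm-year pairs) to keep; `Dataframe2` and its contents are hypothetical. The first option keeps all rows of the listed firms with %in%; the second keeps only rows whose Firm and Year both appear in Dataframe2, via an inner merge().

```r
Dataframe1 <- data.frame(
  Firm = rep(c(500400200, 500400300), each = 3),
  Year = rep(c(2007, 2006, 2005), 2),
  cash = c(100, 200, 400, 300, 240, 120)
)
# Hypothetical criterion table
Dataframe2 <- data.frame(Firm = 500400300, Year = c(2006, 2007))

# Option A: every row of the firms that appear in Dataframe2
sel_a <- Dataframe1[Dataframe1$Firm %in% Dataframe2$Firm, ]

# Option B: only the rows whose Firm-Year pair appears in Dataframe2
sel_b <- merge(Dataframe1, Dataframe2, by = c("Firm", "Year"))
```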

subset dataframe by number of rows of equal values

2009 May 24

Hi R helpers! I have the following dataframe "choose":
choose<-data.frame(firm=c(1,1,2,2,2,2,3,3,4,4,4,4,4,4),
                   year=c(2000,2001,2000,2001,2002,2003,2000,2003,2001,2002,2003,2004,2005,2006),
                   code=c(10,10,11,11,11,11,12,12,13,13,13,13,13,13))
choose
I want to subset it to obtain another one with those observations for which there are more than 2 observations with the same value in the column "code". So I want a
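
A short sketch: ave() with FUN = length gives, for every row, the number of rows that share its code, and rows with a count above 2 are kept. Here codes 11 (4 rows) and 13 (6 rows) remain, while 10 and 12 (2 rows each) are dropped. The names `n_code` and `chosen` are hypothetical.

```r
choose <- data.frame(
  firm = c(1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4),
  year = c(2000, 2001, 2000, 2001, 2002, 2003, 2000, 2003, 2001, 2002, 2003, 2004, 2005, 2006),
  code = c(10, 10, 11, 11, 11, 11, 12, 12, 13, 13, 13, 13, 13, 13)
)

# Number of rows sharing each row's code
n_code <- ave(choose$code, choose$code, FUN = length)
chosen <- choose[n_code > 2, ]
chosen
```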

importing spreadsheet data - linear regression - panel data

2009 Apr 19

Hi everyone and thank you for any help you can give me. My data is in a spreadsheet. The 1st column identifies the firm (with the fiscal number), and columns 2 to 11 have the variable's value for 11 years. I have many variables (files like this). Each file has about 40,000 firms (rows). I transformed all the files into txt files. The data is panel data, like this:
firm revenue2007 revenue2006
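
Files in this "one column per year" layout are usually reshaped to a long firm-year panel before running panel regressions. A sketch with base R's reshape(), on a small hypothetical extract: with sep = "", reshape() splits a name like "revenue2007" into the variable "revenue" and the time 2007. Each variable file could be reshaped this way and the results merged on firm and year.

```r
# Hypothetical extract of one file: firm id plus one column per year
wide <- data.frame(
  firm = c(501, 502),
  revenue2007 = c(120, 80),
  revenue2006 = c(110, 75)
)

long <- reshape(wide, direction = "long",
                varying = c("revenue2007", "revenue2006"),
                sep = "", idvar = "firm", timevar = "year")
long <- long[order(long$firm, long$year), ]
long
```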

matched samples, dataframe, panel data

2013 Jun 07

Hi R-helpers,
#I have a data panel of thousands of firms, by year and industry, and
#one dummy variable that separates the firms into two categories: 1 if the firm has an auditor; 0 if not,
#and another variable that represents the firm size (total assets in thousands of euros).
#I need to create two separate samples with the same number of firms where
#one firm in the first has a corresponding
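
The matching rule is cut off above; a common choice is to pair each audited firm with the non-audited firm of the same year and industry whose total assets are closest. The sketch below does greedy one-to-one nearest-neighbour matching without replacement, so both samples end up with the same number of firms. The column names firm, year, industry, auditor and assets and the function `match_firms` are hypothetical, and because the matching is greedy the result can depend on the order in which audited firms are processed.

```r
match_firms <- function(df) {
  pairs <- list()
  cells <- split(df, list(df$year, df$industry), drop = TRUE)
  for (cell in cells) {
    treated  <- cell[cell$auditor == 1, ]
    controls <- cell[cell$auditor == 0, ]
    for (i in seq_len(nrow(treated))) {
      if (nrow(controls) == 0) break              # no candidates left in this cell
      j <- which.min(abs(controls$assets - treated$assets[i]))
      pairs[[length(pairs) + 1]] <- data.frame(
        year = treated$year[i], industry = treated$industry[i],
        firm_auditor = treated$firm[i], firm_no_auditor = controls$firm[j],
        assets_auditor = treated$assets[i], assets_no_auditor = controls$assets[j])
      controls <- controls[-j, ]                  # each control is used only once
    }
  }
  do.call(rbind, pairs)
}

# Hypothetical example data
panel <- data.frame(
  firm = 1:6, year = 2010, industry = c(1, 1, 1, 1, 2, 2),
  auditor = c(1, 0, 0, 1, 1, 0),
  assets = c(100, 90, 500, 480, 50, 55)
)
m <- match_firms(panel)
m
```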

plm package, R squared, dummies in panel data

2011 Sep 05

Hi R-helpers, I have two questions I hope you can help me with: In the plm package, how can I calculate the R2 within, R2 between and R2 overall? Is there any particular reason why these values are not displayed? When using first differences, do I need to take special care with dummies (both year dummies and industry dummies)? (A friend who works with Stata told me that there is
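
If the goal is the three figures Stata's `xtreg, fe` prints, they are, as I understand that documentation, squared correlations computed with the within (fixed-effects) slopes b: within = corr((x_it − x̄_i)b, y_it − ȳ_i)², between = corr(x̄_i b, ȳ_i)², overall = corr(x_it b, y_it)². The sketch below computes them in base R under that assumption, without plm; the function `fe_r2` and the simulated data are hypothetical.

```r
fe_r2 <- function(y, X, id) {
  X <- as.matrix(X)
  demean <- function(v) v - ave(v, id)          # subtract each firm's mean
  yw <- demean(y)
  Xw <- apply(X, 2, demean)
  b  <- coef(lm(yw ~ Xw - 1))                   # within (fixed-effects) slopes
  xb <- drop(X %*% b)                           # prediction without the firm effects
  ybar  <- as.numeric(tapply(y,  id, mean))     # firm means of y
  xbbar <- as.numeric(tapply(xb, id, mean))     # firm means of xb
  c(within  = cor(drop(Xw %*% b), yw)^2,
    between = cor(xbbar, ybar)^2,
    overall = cor(xb, y)^2)
}

set.seed(42)
id <- rep(1:50, each = 5)
x1 <- rnorm(250)
x2 <- rnorm(250)
y  <- rep(rnorm(50), each = 5) + 0.8 * x1 - 0.5 * x2 + rnorm(250)
r2 <- fe_r2(y, cbind(x1, x2), id)
r2
```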

balanced panel data

2011 May 19

I have a dataframe with many firm-year observations and many variables. Not all firms have information for all the years. I want another dataframe with only those firms that have information for all years. That is, I want a balanced panel, but with the maximum number of years. In my reproducible example I want to keep firms 1, 2 and 3 (period 2000 to 2004). I need your help to create a
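
For a given window of years, a balanced panel can be obtained by restricting the data to those years and keeping only firms observed in every one of them. This sketch uses hypothetical data in which firm 4 misses 2002 and firm 5 only starts in 2003, so for 2000 to 2004 only firms 1, 2 and 3 survive. To find the window that keeps the most firm-years, the same function could be applied to each candidate window and the results compared.

```r
# Keep only firms observed in every year of `years`, restricted to those years
balance_panel <- function(df, years) {
  df <- df[df$year %in% years, ]
  n_years <- ave(df$year, df$firm, FUN = function(y) length(unique(y)))
  df[n_years == length(years), ]
}

# Hypothetical unbalanced panel
panel <- data.frame(
  firm = c(rep(1:3, each = 5), rep(4, 4), rep(5, 3)),
  year = c(rep(2000:2004, 3), c(2000, 2001, 2003, 2004), 2003:2005)
)
balanced <- balance_panel(panel, 2000:2004)
unique(balanced$firm)
```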

error message in plm

2009 May 29

Hi everyone, could anyone tell me what the following error message means?
Error in xj[i] : invalid subscript type 'closure'
It happens when I run the function plm, like this:
>ff<-totaccz~lactivoz+varvolnegz
>ss<-plm(ff,data=regaccdis,na.omit)
Error in xj[i] : invalid subscript type 'closure'
> coef(ss)
(Intercept)    lactivoz  varvolnegz
 0.02571212  6.94227541
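
A likely cause: plm(), like lm(), takes `formula, data, subset, weights, na.action` as its first five arguments, so an unnamed third argument is matched to `subset`. The function na.omit (a "closure" in R terms) then ends up being used as a row index. Naming the argument avoids this. The sketch shows the pattern with lm() on hypothetical data; for plm the analogous call would be `plm(ff, data = regaccdis, na.action = na.omit)`.

```r
set.seed(1)
d <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
d$x1[3] <- NA

names(formals(lm))[1:5]  # "formula" "data" "subset" "weights" "na.action"

ff  <- y ~ x1 + x2
fit <- lm(ff, data = d, na.action = na.omit)  # named, so not taken as `subset`
nobs(fit)
```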

save plm coefficients

2009 May 29

Hi R-helpers, I want to determine the coefficients of the following regression for several subsets, and I want to save them in a dataframe: the data is in "regaccdis", "regaccdis$caedois" is the column that defines the subsets, and the function I have run is
coef(plm(ff,data=regaccdis,na.action=na.omit,model="pooling",subset=(regaccdis$caedois==i)))
I've created a dataframe named
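
Instead of filling a pre-created dataframe inside a loop, one option is to compute the coefficient vector for each subset with sapply() and turn the resulting matrix into a dataframe with one row per subset. Because a pooled model estimates the same coefficients as OLS, this sketch uses lm() on simulated data; the same function body could call plm(..., model = "pooling") instead. The simulated regaccdis is hypothetical.

```r
set.seed(2)
regaccdis <- data.frame(
  caedois = rep(c(1, 2, 5), each = 30),
  lactivoz = rnorm(90),
  varvolnegz = rnorm(90)
)
regaccdis$totaccz <- 0.02 + 0.5 * regaccdis$lactivoz + rnorm(90)
ff <- totaccz ~ lactivoz + varvolnegz

groups <- sort(unique(regaccdis$caedois))
# One column of coefficients per subset, transposed to one row per subset
coefs <- t(sapply(groups, function(i) {
  coef(lm(ff, data = regaccdis[regaccdis$caedois == i, ], na.action = na.omit))
}))
coefs <- data.frame(caedois = groups, coefs, check.names = FALSE)
coefs
```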

textual analysis - transforming several pdf to txt - naming the files

2023 Jul 05

library(magrittr)  # provides %>%
convertpdf2txt <- function(dirpath){
  files <- list.files(dirpath, pattern = "Consoli.*\\.pdf$", full.names = TRUE)
  files <- chartr("\\", "/", files)
  x <- lapply(files, function(x){
    pdftools::pdf_text(x) %>%
      paste0(collapse = " ") %>%
      stringr::str_squish()
  })
  new_names <-
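
For the naming step, one straightforward option is to reuse each pdf's base name with the extension swapped to .txt and write the texts in the same order as `files`. This base-R sketch assumes `texts` is a list like the `x` produced by the lapply() above; the helpers `txt_name` and `write_txts` and the example paths are hypothetical.

```r
# Output path: same base name as the pdf, with a .txt extension, in out_dir
txt_name <- function(pdf_path, out_dir) {
  file.path(out_dir, sub("\\.pdf$", ".txt", basename(pdf_path), ignore.case = TRUE))
}

# Write texts[[k]] to the txt file derived from files[k]
write_txts <- function(texts, files, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  out <- vapply(files, txt_name, character(1), out_dir = out_dir, USE.NAMES = FALSE)
  for (k in seq_along(texts)) writeLines(texts[[k]], out[k])
  invisible(out)
}

# Usage with placeholder texts
tmp   <- file.path(tempdir(), "txt_out")
files <- c("C:/reports/ConsoliA_2020.pdf", "C:/reports/ConsoliB_2021.pdf")
out   <- write_txts(list("text a", "text b"), files, tmp)
basename(out)
```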

simple loop

2009 Jun 28

Hi everyone! I have this dataframe:
firm<-c(rep(1,4),rep(2,4),rep(3,4),rep(4,4),rep(5,4),rep(6,4))
year<-c(rep(2000:2003,6))
industry<-c(rep(10,4),rep(20,4),rep(30,4),rep(10,4),rep(20,4),rep(30,4))
X1<-c(10,14,18,16,20,45,23,54,24,67,98,58,16,32,57,12,54,0,0,22,11,3,5,6)
data<-data.frame(firm, industry,year,X1)
data
I need a loop that calculates the mean of X1 by year and by
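
Assuming the grouping is year and industry (the sentence is cut off), no explicit loop is needed. aggregate() returns one row per industry-year with the mean, and ave() attaches that mean to every firm-year row. For example, industry 10 in 2000 contains firms 1 and 4, so its mean is (10 + 16) / 2 = 13.

```r
firm <- rep(1:6, each = 4)
year <- rep(2000:2003, 6)
industry <- rep(rep(c(10, 20, 30), each = 4), 2)
X1 <- c(10, 14, 18, 16, 20, 45, 23, 54, 24, 67, 98, 58,
        16, 32, 57, 12, 54, 0, 0, 22, 11, 3, 5, 6)
data <- data.frame(firm, industry, year, X1)

# One row per industry-year with the mean of X1
means <- aggregate(X1 ~ industry + year, data = data, FUN = mean)

# The same group mean attached to every firm-year row
data$meanX1 <- ave(data$X1, data$industry, data$year)
```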

save the output of summary(lmList(x)) into a dataframe

2009 Jun 16

Hi r-helpers! I need to save the output of the summary() function that I've run like this:
z<- lmList(y~x1+x2| x3, na.action=na.omit,data1,subset=year==1999)
w<-summary(z)
The output (w) is something like this:
Call:
  Model: y ~ x1 + x2 | x3
   Data: data1
Coefficients:
   (Intercept)
     Estimate   Std. Error t value Pr(>|t|)
1 0.081110514 1.141352e-01
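
An alternative that avoids parsing printed output is to fit one lm() per group, take the coefficient table from coef(summary(fit)) (a matrix with Estimate, Std. Error, t value and Pr(>|t|)), and stack the tables into one dataframe with the group and term as columns. This is a base-R sketch on simulated data, not lmList itself; the simulated data1 and the names `tabs` and `result` are hypothetical.

```r
set.seed(3)
data1 <- data.frame(year = 1999, x3 = rep(1:3, each = 20),
                    x1 = rnorm(60), x2 = rnorm(60))
data1$y <- 1 + 0.5 * data1$x1 - 0.2 * data1$x2 + rnorm(60)

sub99 <- subset(data1, year == 1999)
tabs <- lapply(split(sub99, sub99$x3), function(d) {
  ct <- coef(summary(lm(y ~ x1 + x2, data = d, na.action = na.omit)))
  data.frame(group = d$x3[1], term = rownames(ct), ct,
             row.names = NULL, check.names = FALSE)
})
result <- do.call(rbind, tabs)
rownames(result) <- NULL
result
```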

linear model coefficients by year and industry, fitted values, residuals, panel data

2013 Apr 03

Hi R-helpers, my real data is a panel (unbalanced and with gaps in years) of thousands of firms, by year and industry, with financial information (variables X, Y, Z, for example). The number of firms by year and industry is not always equal, and neither is the number of years by industry.
#reproducible example
firm1<-sort(rep(1:10,5),decreasing=F)
year1<-rep(2000:2004,10)
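
One way to get coefficients, fitted values and residuals for each year-industry cell is to split the panel by cell, fit a model in each piece, and bind the pieces back together. Gaps and unbalanced cells are handled naturally because each cell is fitted on whatever rows it has, although each cell needs more observations than coefficients. The model Y ~ X + Z and the simulated data are assumptions for illustration; na.exclude keeps fitted values and residuals aligned with the rows if there are missing values.

```r
set.seed(4)
panel <- data.frame(
  firm = rep(1:40, each = 5),
  year = rep(2000:2004, 40),
  industry = rep(rep(1:2, each = 5), 20)
)
panel$X <- rnorm(200)
panel$Z <- rnorm(200)
panel$Y <- 1 + 2 * panel$X - panel$Z + rnorm(200)

fit_cell <- function(d) {
  m <- lm(Y ~ X + Z, data = d, na.action = na.exclude)
  d$fitted <- as.numeric(fitted(m))
  d$resid  <- as.numeric(residuals(m))
  d
}

cells <- split(panel, list(panel$year, panel$industry), drop = TRUE)
out   <- do.call(rbind, lapply(cells, fit_cell))
out   <- out[order(out$firm, out$year), ]

# One row of coefficients per year-industry cell
coefs <- t(sapply(cells, function(d) coef(lm(Y ~ X + Z, data = d))))
```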

textual analysis - transforming several pdf to txt - naming the files

2023 Jul 05

I am taking my first steps in textual analysis with R. I have pdf files consisting of company reports for several years (1 file corresponds to 1 company and 1 year). My idea is to start by transforming all my pdf files into txt files for further treatment and analysis (this will allow me to group the files by company or by year, depending on the future analysis to be performed). I do not have