Displaying 20 results from an estimated 500 matches similar to: "remove extreme values or winsorize – loop - dataframe"
2013 Jun 07
4
matched samples, dataframe, panel data
Hi R-helpers
#I have a data panel of thousands of firms, by year and industry and
#one dummy variable that separates the firms into two categories: 1 if the firm has an auditor; 0 if not
#and another variable that represents the firm dimension (total assets in thousands of euros)
#I need to create two separate samples with the same number of firms where
#one firm in the first has a corresponding
2013 Apr 03
1
linear model coefficients by year and industry, fitted values, residuals, panel data
Hi R-helpers,
My real data is a panel (unbalanced and with gaps in years) of thousands of firms, by year and industry, with financial information (variables X, Y, Z, for example). The number of firms by year and industry is not always equal, and neither is the number of years by industry.
#reproducible example
firm1<-sort(rep(1:10,5),decreasing=F)
year1<-rep(2000:2004,10)
2013 Jun 10
2
please check this
Hi,
Try this:
which(duplicated(res10Percent))
# [1] 117 125 157 189 213 235 267 275 278 293 301 327 331 335 339 367 369 371 379
#[20] 413 415 417 441 459 461 477 479 505
res10PercentSub1<-subset(res10Percent[which(duplicated(res10Percent)),],dummy==1) #most of the duplicated are dummy==1
res10PercentSub0<-subset(res10Percent[which(duplicated(res10Percent)),],dummy==0)
2009 May 04
4
Creating a variable which is the sum of equal rows in a dataframe
Hi everyone:
I need to count the number of banks of each firm in my
data. The firm is identified by the fiscal number. The
banks of each firm appear like this:
Firm Banks
500600700 Citybank
500600700 CGD
500600700 BES
500600800 Citybank
500600800 Bank1
500600900 CGD
I want to obtain the following dataframe:
Firm
2011 Oct 02
2
subset in dataframes
I need help in subsetting a dataframe:
data1<-data.frame(year=c(2001,2002,2003,2004,2001,2002,2003,2004,
2001,2002,2003,2004,2001,2002,2003,2004),
firm=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),x=c(11,22,-32,25,-26,47,85,98,
101,14,87,56,12,43,67,54),
y=c(110,220,302,250,260,470,850,980,1010,140,870,560,120,430,670,540))
data1
I want to keep the firms where all x>0 (where there are
2011 Sep 22
2
the opposite of lag() in panel data
Hi R-helpers
I want a function that performs the opposite of lag() with panel data.
I have transformed my data before with pdata.frame(mydata,
index=c("groupindex", "timeindex"))
And then I’ve done lag(mydata, -1) but it doesn’t work.
The error message was:
Error in rep(1, ak) : invalid 'times' argument
Thank you in advance,
Cecília Carmo
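
A minimal sketch of one way to get a lead (the opposite of a lag) within each panel group, using only base R rather than plm. The helper name lead_by_group and the toy data are hypothetical, and the sketch assumes the rows are already sorted by time within each group, with no gaps in years. Recent versions of plm also appear to provide a lead() function for panel series, which may be worth checking in the installed version.

```r
# Hypothetical helper: shift values k periods forward within each group.
# Assumes rows are sorted by time within each group and k >= 1.
lead_by_group <- function(x, group, k = 1) {
  ave(x, group, FUN = function(v) {
    n <- length(v)
    # drop the first k values, then pad the end with NA
    c(v[-seq_len(k)], rep(NA, min(k, n)))
  })
}

# Toy panel (illustrative data, not from the original post)
mydata <- data.frame(
  groupindex = c("a", "a", "a", "b", "b"),
  timeindex  = c(2001, 2002, 2003, 2001, 2002),
  x          = c(1, 2, 3, 10, 20)
)
mydata$x_lead <- lead_by_group(mydata$x, mydata$groupindex)
mydata
```

The last observation of each group has no following period, so its lead is NA.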
2009 Jun 01
1
Fwd: subset dataframe/list
--- the forwarded message follows ---
-------------- next part --------------
An embedded message was scrubbed...
From: "Cecilia Carmo" <cecilia.carmo at ua.pt>
Subject: Re: [R] subset dataframe/list
Date: Mon, 01 Jun 2009 21:33:15 +0100
Size: 3657
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090601/921f7638/attachment-0002.mht>
2010 Aug 21
3
problems with merge() - the output has many repeated lines
Hi everyone,
I have been merging many big dataframes (about 80000 rows
each) and I never had this problem, but now it happened to
me and I want to know if someone knows what could be
happening.
The final dataframe has many rows, an impossible number! I
have done edit(dataframe) and I saw that there are many
repeated rows (all equal).
Thanks for any help,
Cecília Carmo
Universidade de
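
One common cause of this symptom, offered as a guess since the post does not show the data: when the key column has repeated values in both data frames, merge() returns every pairwise combination of the matching rows. A minimal sketch with made-up data frames a and b (all names and values here are illustrative):

```r
# Illustrative data: key k = 1 appears twice in a and three times in b
a <- data.frame(k = c(1, 1, 2), va = c("a1", "a2", "a3"))
b <- data.frame(k = c(1, 1, 1, 2), vb = c("b1", "b2", "b3", "b4"))

m <- merge(a, b, by = "k")
nrow(m)  # 2 * 3 rows for k = 1, plus 1 * 1 row for k = 2

# Check how often each key repeats before merging
table(a$k)
table(b$k)
anyDuplicated(a$k) > 0
```

If the keys are meant to be unique, checking them with table() or anyDuplicated() before merging usually shows where the extra rows come from.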
2009 Jun 28
2
simple loop
Hi everyone!
I have this dataframe:
firm<-c(rep(1,4),rep(2,4),rep(3,4),rep(4,4),rep(5,4),rep(6,4))
year<-c(rep(2000:2003,6))
industry<-c(rep(10,4),rep(20,4),rep(30,4),rep(10,4),rep(20,4),rep(30,4))
X1<-c(10,14,18,16,20,45,23,54,24,67,98,58,16,32,57,12,54,0,0,22,11,3,5,6)
data<-data.frame(firm, industry,year,X1)
data
I need a loop that calculates the mean of X1 by year and
by
2009 Jun 16
2
save the output of summary(lmList(x)) into a dataframe
Hi r-helpers!
I need to save the output of the summary() function that I've
run like this:
z<- lmList(y~x1+x2| x3,
na.action=na.omit,data1,subset=year==1999)
w<-summary(z)
The output (w) is something like this:
Call:
Model: y ~ x1 + x2 | x3
Data: data1
Coefficients:
(Intercept)
Estimate Std. Error t value Pr(>|t|)
1 0.081110514 1.141352e-01
2009 Jun 02
1
R: subset dataframe/list
Thank you all!!!
The problem was the decimal symbol! My data was saved in a
txt file, so I've introduced the dec="," in 'read.table'
and it worked. What I've done was
coeficientes<-read.table("coeficientes.txt",sep="\t",header=T,dec=",")
Then, subset worked fine
coeficientesWanted<-subset(coeficientes,b1>0)
Thanks again,
Cecília Carmo
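
A small self-contained sketch of the fix described above, using inline text instead of the file coeficientes.txt. The values are made up for illustration; only the dec="," argument and the subset() call come from the post.

```r
# Illustrative tab-separated text with decimal commas (made-up values)
txt <- "caedois\tb1\tb2\n1\t0,0331\t-20,29\n2\t-0,0406\t74,54\n5\t-0,0011\t35,24"

# Without dec=",", b1 is not read as numeric
wrong <- read.table(text = txt, sep = "\t", header = TRUE)
class(wrong$b1)

# With dec=",", b1 is numeric and the comparison works as intended
coeficientes <- read.table(text = txt, sep = "\t", header = TRUE, dec = ",")
class(coeficientes$b1)
coeficientesWanted <- subset(coeficientes, b1 > 0)
coeficientesWanted
```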
2009 Jun 01
2
subset dataframe/list
Hi R-helpers!
I have the following object:
> head(coeficientes)
caedois b1 b2 b3
1 1 0,033120395 -20,29478338 -0,274638864
2 2 -0,040629634 74,54239889 -0,069958424
3 5 -0,001116816 35,2398622 0,214327185
4 10 0,171875
5 14 0,007288399 40,06560548 -0,081828338
6 15 0,027530346 0,969969409 0,102775555
2010 Aug 20
5
paired samples, matching rows, merge()
Hi everyone!
I'm matching two samples to create one sample that has
pairs of observations equal for the k1 variable. Merge()
doesn't work because I don't want to recycle the values.
x <- data.frame(k1=c(1,1,2,3,3,5),
k2=c(20,21,22,23,24,25))
x
y <- data.frame(k1=c(1,1,2,2,3,4,5,5),
k2=c(10,11,12,13,14,15,16,17))
y
merge(x,y,by="k1")
k1 k2.x k2.y
1 1 20
2011 May 19
2
balanced panel data
I have a dataframe with many firm-year observations and many variables.
Not all firms have information for all the years.
I want another dataframe with only those firms that have information all
years.
That is, I want a balanced panel, but with the maximum number of years.
In my reproducible example I want to keep firms 1, 2 and 3 (period 2000 to
2004).
I need your help to create a
2009 Jun 08
5
if else
Hi R-helpers!
I have the following dataframe:
firm<-c(rep(1:3,4))
year<-c(rep(2001:2003,4))
X1<-rep(c(10,NA),6)
X2<-rep(c(5,NA,2),4)
data<-data.frame(firm, year,X1,X2)
data
So I want to obtain the same dataframe with a variable X3
that is:
X1, if X2=NA
X2, if X1=NA
X1+X2 if X1 and X2 are not NA
So my final data is
X3<-c(15,NA,12,5,10,2,15,NA,12,5,10,2)
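
One possible way to build X3 from the rules above is a nested ifelse(), sketched here on the data from the post. This is only one option among several.

```r
firm <- rep(1:3, 4)
year <- rep(2001:2003, 4)
X1 <- rep(c(10, NA), 6)
X2 <- rep(c(5, NA, 2), 4)
data <- data.frame(firm, year, X1, X2)

# X2 when X1 is missing, X1 when X2 is missing, otherwise the sum;
# if both are missing the result stays NA
data$X3 <- ifelse(is.na(data$X1), data$X2,
                  ifelse(is.na(data$X2), data$X1, data$X1 + data$X2))
data$X3
```

Note that a test such as X2 == NA never returns TRUE in R, so is.na() is needed for the comparisons.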
2009 Apr 19
2
importing spreadsheet data - linear regression - panel data
Hi everyone and thank you for the help you could give me.
My data is in a spreadsheet. The 1st column identifies the
firm (with the fiscal number), the columns 2 to 11 have
the variable value for 11 years. I have many variables
(files like this). Each file has about 40.000 firms
(rows). I transformed all the files into txt files. The data
is panel data, like this:
firm revenu2007 revenue2006
2009 May 24
1
subset dataframe by number of rows of equal values
Hi R helpers!
I have the following dataframe "choose"
choose<-data.frame(firm=c(1,1,2,2,2,2,3,3,4,4,4,4,4,4),
year=c(2000,2001,2000,2001,2002,2003,2000,2003,2001,2002,2003,2004,2005,2006),code=c(10,10,11,11,11,11,12,12,13,13,13,13,13,13))
choose
I want to subset it to obtain another one with those
observations for which there are more than 2 observations in
the column "code". So I want a
2009 Aug 09
1
help with a loop (coefficients with lmList)
Hi R-helpers.
#I start with the reproducible example:
firm<-c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10))
year<-c(rep(1998:2007,5))
industry<-c(rep(1,20),rep(5,10),rep(7,10),rep(9,10))
X1<-rnorm(50)
X2<-rnorm(50,mean=0.5,sd=0.1)
Y<-rnorm(50,mean=0,sd=0.5)
data<-data.frame(firm, industry,year,X1,X2,Y)
data
#I need to calculate for all the industries the following
2011 Sep 05
1
plm package, R squared, dummies in panel data
Hi R-helpers,
I have two questions I hope you can help me with:
In the plm package how can I calculate the R2 within, R2 between and R2
overall? Is there any special reason to not display these values?
When using first differences do I need to take special care with
dummies (both year dummies and industry dummies)?
(A friend who works with Stata told me that there is
2011 May 19
1
Problems with unsplit()
Hi everyone,
I have already used split() and unsplit() in data frames without problems,
but now I’m applying these functions to other data and when using unsplit()
I have received the following message:
Error in `row.names<-.data.frame`(`*tmp*`, value = c("1", "2", "3", "4", :
duplicate 'row.names' are not allowed
In