Hi,
It is a bit unclear what you want. The example dataset has no
democracy_index, but you want it in the result. Regarding the median
calculation, I am guessing you want the median for each country.
I added one more country (China) with fake data.
Maybe this helps:
dat1<-read.table(text="
Country log_GDP yr
Canada 9.115211 1950
Canada 9.205848 1955
Canada 9.247975 1960
Canada 9.429002 1965
Canada 9.554069 1970
Canada 9.719351 1975
Canada 9.851376 1980
Canada 9.937892 1985
Canada 10.01457 1990
Canada 10.04093 1995
Canada 10.20005 2000
USA 9.27824 1950
USA 9.38968 1955
USA 9.415136 1960
USA 9.594625 1965
USA 9.70207 1970
USA 9.800418 1975
USA 9.96813 1980
USA 10.07001 1985
USA 10.18331 1990
USA 10.25446 1995
USA 10.4131 2000
China 7.5 1950
China 7.32 1955
China 7.33 1960
China 7.6 1965
China 7.8 1970
China 8.0 1975
China 8.2 1980
China 8.3 1985
China 8.5 1990
China 8.6 1995
China 8.7 2000
",sep="",header=TRUE,stringsAsFactors=FALSE)
# Mean and median log GDP for each country
dat2<-with(dat1,aggregate(log_GDP,by=list(Country=Country),mean))
colnames(dat2)[2]<-"Mean"
dat3<-with(dat1,aggregate(log_GDP,by=list(Country=Country),median))
colnames(dat3)[2]<-"Median"
dat4<-merge(dat3,dat2)
# Label a country high income if its mean exceeds its median, low income otherwise
dat4$HighIncome<-ifelse(dat4$Mean>dat4$Median,dat4$Country,NA)
dat4$LowIncome<-ifelse(dat4$Mean>dat4$Median,NA,dat4$Country)
dat5<-dat4[,-2]  # drop the Median column
dat5
#  Country     Mean HighIncome LowIncome
#1  Canada 9.665116       <NA>    Canada
#2   China 7.986364       <NA>     China
#3     USA 9.824471        USA      <NA>
res<-merge(dat1,dat5)
head(res)
#  Country  log_GDP   yr     Mean HighIncome LowIncome
#1  Canada 9.115211 1950 9.665116       <NA>    Canada
#2  Canada 9.205848 1955 9.665116       <NA>    Canada
#3  Canada 9.247975 1960 9.665116       <NA>    Canada
#4  Canada 9.429002 1965 9.665116       <NA>    Canada
#5  Canada 9.554069 1970 9.665116       <NA>    Canada
#6  Canada 9.719351 1975 9.665116       <NA>    Canada
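
If the boundary is instead meant to be the median of the country means (as
described in the original post: high income for mean > median of the means,
low income otherwise), something like the following might be closer. This is
only a sketch under that assumption. The names means, cutoff, Income,
summarise_group and tab are made up for illustration, and the democracy index
row is left out because the example data does not contain that variable.

```r
# Same data as dat1 above (China is fake), rebuilt compactly so this runs alone
dat1 <- data.frame(
  Country = rep(c("Canada", "USA", "China"), each = 11),
  log_GDP = c(9.115211, 9.205848, 9.247975, 9.429002, 9.554069, 9.719351,
              9.851376, 9.937892, 10.01457, 10.04093, 10.20005,
              9.27824, 9.38968, 9.415136, 9.594625, 9.70207, 9.800418,
              9.96813, 10.07001, 10.18331, 10.25446, 10.4131,
              7.5, 7.32, 7.33, 7.6, 7.8, 8.0, 8.2, 8.3, 8.5, 8.6, 8.7),
  yr = rep(seq(1950, 2000, by = 5), times = 3),
  stringsAsFactors = FALSE
)

# One row per country with its mean log GDP
means <- aggregate(log_GDP ~ Country, data = dat1, FUN = mean)
colnames(means)[2] <- "Mean"

# Boundary (assumption): the median of the country means
cutoff <- median(means$Mean)

# High income if strictly above the cutoff, low income otherwise
means$Income <- ifelse(means$Mean > cutoff, "High", "Low")
means

# Attach the income group to every observation
res <- merge(dat1, means)

# Hypothetical helper: mean log GDP, number of observations, number of countries
summarise_group <- function(d) {
  c(Log_GDP = mean(d$log_GDP),
    Obs = nrow(d),
    Countries = length(unique(d$Country)))
}

# Columns: all countries, high income, low income
tab <- cbind(All  = summarise_group(res),
             High = summarise_group(res[res$Income == "High", ]),
             Low  = summarise_group(res[res$Income == "Low", ]))
tab
```

With only three countries the cutoff is Canada's own mean, so Canada falls in
the low income group under the <= rule; with the full dataset the split should
be more even.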
A.K.
----- Original Message -----
From: fuckecon <iamstanhu at gmail.com>
To: r-help at r-project.org
Cc:
Sent: Sunday, October 28, 2012 12:16 AM
Subject: [R] keep average values and delete duplicate rows
Hello experts,
I am sorry that my subject line is confusing, because I am confused as nuts.
Let me take a shot at explaining what I am trying to do.
I have a data set of log GDP, education, democracy index, and a whole bunch
of variables for every country from 1950 to 2000. Each country accounts for
10 observations with each observation representing the mean GDP for each 5
year interval.
Example:
Country   log GDP    yr
Canada    9.115211   1950
Canada    9.205848   1955
Canada    9.247975   1960
Canada    9.429002   1965
Canada    9.554069   1970
Canada    9.719351   1975
Canada    9.851376   1980
Canada    9.937892   1985
Canada    10.01457   1990
Canada    10.04093   1995
Canada    10.20005   2000
USA       9.27824    1950
USA       9.38968    1955
USA       9.415136   1960
USA       9.594625   1965
USA       9.70207    1970
USA       9.800418   1975
USA       9.96813    1980
USA       10.07001   1985
USA       10.18331   1990
USA       10.25446   1995
USA       10.4131    2000
For log GDP:
I want to create a new object in R with one line for each country and the
average log GDP from the 10 five-year interval observations. With that subset
I then want to create a table with 3 columns and 4 rows.
(I have no idea how to write the code to create the new object. A friend said
something about a conditional median.)
Columns
1) All countries
2) High income countries
3) Low income countries
Rows
1) Democracy index
2) Log GDP
3) Obs
4) Countries
To create the high and low income columns, I am using the median as the
boundary (i.e. high income for gdp > median of the mean for each country,
low income for gdp <= median of the mean for each country).
I hope someone can understand what I am writing here and help me out with
it.
Thanks so much!
--
View this message in context:
http://r.789695.n4.nabble.com/keep-average-values-and-delete-duplicate-rows-tp4647677.html