Hi all

I have a large data frame with (among others) a categorical variable of 52 levels and would like to create a barplot with the bars ordered by decreasing frequency of the levels. I believe it is referred to as a Pareto plot.

Consider a subset where I keep only the categorical variable in question.

# Example:
v1 = c("aa", "cc", "bb", "bb", "cc", "bb")
df = data.frame(v1=v1)

# How can I tell ggplot to sort the bars?
# First bar = "bb" (3), second bar "cc" (2) and third bar "aa" (1)
# with 52 levels in the real data frame (many with equal counts)
# and other similar variables, I hope it is possible to script this efficiently

ggplot(df) + geom_bar(aes(v1))

Thank you in advance

Morten
--
View this message in context: http://r.789695.n4.nabble.com/ggplot2-Pareto-plot-Barplot-in-decreasing-frequency-tp2965796p2965796.html
Sent from the R help mailing list archive at Nabble.com.
David Winsemius
2010-Oct-06 22:03 UTC
[R] ggplot2 Pareto plot (Barplot in decreasing frequency)
On Oct 6, 2010, at 5:19 PM, Morten wrote:

> Hi all
>
> I have a large dataframe with (among others) a categorical variable of 52
> levels and would like to create a barplot with the bars ordered in
> decreasing frequency of the levels. I believe it is referred to as a pareto
> plot.
>
> Consider a subset where I keep only the categorical variable in question.
>
> # Example:
> v1 = c("aa", "cc", "bb", "bb", "cc", "bb")
> df = data.frame(v1=v1)
>
> # How can I tell ggplot to sort the bars?

I seem to remember that sorting the levels is the way this has been answered in the past for ggplot and ggplot2. So table() "v1", order the counts, and then reverse:

> v2 <- factor(v1, levels=names(table(v1))[rev(order(table(v1)))] )
> v2
[1] aa cc bb bb cc bb
Levels: bb cc aa

> # First bar = "bb" (3), second bar "cc" (2) and third bar "aa" (1)
> # with 52 levels in the real data frame (many with equal counts)
> # and other similar variables, I hope it is possible to script this
> efficiently
>
> ggplot(df) + geom_bar(aes(v1))

Substituting v2 for v1 in that code seemed to do the trick in ggplot2.

> Thank you in advance
>
> Morten
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
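The level-sorting idea above can be wrapped in a small helper so the same step is reusable for other variables. This is only a sketch: the function name order_by_freq is hypothetical (it is not from the thread or from any package), and it uses sort(table(x), decreasing = TRUE) in place of the rev(order(...)) idiom. The plotting step is guarded so the example still runs where ggplot2 is not installed.

```r
# Hypothetical helper (not from the thread): rebuild a factor so that its
# levels run from the most frequent value to the least frequent one.
order_by_freq <- function(x) {
  counts <- sort(table(x), decreasing = TRUE)  # named counts, largest first
  factor(x, levels = names(counts))
}

v1 <- c("aa", "cc", "bb", "bb", "cc", "bb")
df <- data.frame(v1 = v1)

df$v1 <- order_by_freq(df$v1)
levels(df$v1)   # "bb" "cc" "aa"

# Bars follow the factor level order, so no further sorting is needed here.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = v1)) + geom_bar()
  print(p)
}
```

For several similar columns, something like df[cols] <- lapply(df[cols], order_by_freq) should apply the same step to each one, where cols is a hypothetical character vector of column names.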
Dennis Murphy
2010-Oct-07 03:48 UTC
[R] ggplot2 Pareto plot (Barplot in decreasing frequency)
Hi:

# Generate a factor and a random set of counts/frequencies
df <- data.frame(gp = LETTERS[1:20], frq = rpois(20, 30))

> str(df)
'data.frame': 20 obs. of 2 variables:
 $ gp : Factor w/ 20 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ frq: num 25 30 33 25 29 29 25 26 31 31 ...

# bar plot in lexicographic order of factor levels
ggplot(df, aes(x = gp)) + geom_bar(aes(y = frq), stat = 'identity')

# bar plot in increasing order of frequency
ggplot(df, aes(x = reorder(gp, frq))) + geom_bar(aes(y = frq), stat = 'identity')

# bar plot in decreasing order of frequency
# (negating frq avoids the need for desc(), which comes from plyr/dplyr)
ggplot(df, aes(x = reorder(gp, -frq))) + geom_bar(aes(y = frq), stat = 'identity')

Reordering the levels of the factor according to some measure of interest is precisely what the reorder(fac, measure) function does; see ?reorder in the stats package. Another example below.

On Wed, Oct 6, 2010 at 2:19 PM, Morten <Morten.Lindberg@siv.no> wrote:

> Hi all
>
> I have a large dataframe with (among others) a categorical variable of 52
> levels and would like to create a barplot with the bars ordered in
> decreasing frequency of the levels. I believe it is referred to as a pareto
> plot.
>
> Consider a subset where I keep only the categorical variable in question.
>
> # Example:
> v1 = c("aa", "cc", "bb", "bb", "cc", "bb")
> df = data.frame(v1=v1)
>
> # How can I tell ggplot to sort the bars?
> # First bar = "bb" (3), second bar "cc" (2) and third bar "aa" (1)
> # with 52 levels in the real data frame (many with equal counts)
> # and other similar variables, I hope it is possible to script this
> efficiently

Another toy example:

x2 <- sample(c('aa', 'bb', 'cc', 'dd', 'ee'), 200, replace = TRUE)

# Create a data frame from a table prior to calling ggplot()
x2df <- as.data.frame(table(x2))   # summarize table into data frame
x2df                               # your mileage will vary...
  x2 Freq
1 aa   49
2 bb   42
3 cc   31
4 dd   40
5 ee   38

# plot in lexicographic order
ggplot(x2df, aes(x = x2)) + geom_bar(aes(y = Freq), stat = 'identity')

# plot in order of increasing frequency
ggplot(x2df, aes(x = reorder(x2, Freq))) + geom_bar(aes(y = Freq), stat = 'identity')

HTH,
Dennis

> ggplot(df) + geom_bar(aes(v1))
>
> Thank you in advance
>
> Morten
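To connect the summarised-table approach back to the raw vector in the original question, here is a hedged sketch in decreasing order. It tabulates v1, reorders the levels with reorder() on the negated counts, and adds the cumulative count and percentage that a Pareto chart usually overlays. The column names cum_count and cum_pct and the overlaid line are assumptions added for illustration; they do not appear in the thread.

```r
v1 <- c("aa", "cc", "bb", "bb", "cc", "bb")

# Summarise the raw vector into one row per level (columns: v1, Freq).
tab <- as.data.frame(table(v1))

# Reorder levels by decreasing count: reorder() sorts ascending by score,
# so the counts are negated. Then sort the rows to match the level order.
tab$v1 <- reorder(tab$v1, -tab$Freq)
tab <- tab[order(tab$v1), ]

# Cumulative count and percentage (illustrative column names).
tab$cum_count <- cumsum(tab$Freq)
tab$cum_pct <- 100 * tab$cum_count / sum(tab$Freq)

# Bars in decreasing frequency with the cumulative count drawn on top.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(tab, aes(x = v1, y = Freq)) +
    geom_bar(stat = "identity") +
    geom_line(aes(y = cum_count, group = 1))
  print(p)
}
```

With many tied counts, as in the 52-level case, the relative order of tied levels is whatever the sort produces, so it may be worth adding an explicit tie-breaker if that order matters.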