R-help Forum

I am attempting to create a stacked bar chart but I have too many categories.
The following code works and I end up plotting all 134 countries, but I really
only need (say) the top 50 or so.

I am trying to figure out how to filter further, down to the countries with the
largest total medal counts, before plotting. The bolded red code is the point
where I think I would do this. I've tried several different methods, but to no
avail. Any suggestions?

# Load data file matching NOCs with map regions (countries)
noc <- read_csv("~/NGA_Files/JuneMakeoverMonday/noc_regions.csv",
                col_types = cols(
                  NOC = col_character(),
                  region = col_character()
                ))

# Add regions to data and remove missing points
data_regions <- data %>%
  left_join(noc, by = "NOC") %>%
  filter(!is.na(region))

# Subset to variables of interest
medals <- data_regions %>%
  select(region, Medal)

# count number of medals awarded to each Team
medal_counts_ctry <- medals %>% filter(!is.na(Medal)) %>%
  group_by(region, Medal) %>%
  summarize(Count = length(Medal))

#head(medal_counts_ctry)

# order Team by total medal count
levs_medal <- medal_counts_ctry %>%
  group_by(region) %>%
  summarize(Total = sum(Count)) %>%
  arrange(desc(Total))

medal_counts_ctry$region <- factor(medal_counts_ctry$region,
                                   levels = levs_medal$region)

medal_data <- medal_counts_ctry %>% filter(medal_counts_ctry$.rows > 100)

# plot
ggplot(medal_data, aes(x = region, y = Count, fill = Medal)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = c("darkorange3", "darkgoldenrod1", "cornsilk3")) +
  ggtitle("Historical medal counts from Country Teams") +
  theme(plot.title = element_text(hjust = 0.5))

> str(medal_counts_ctry)
grouped_df [323 x 3] (S3: grouped_df/tbl_df/tbl/data.frame)
 $ region: Factor w/ 134 levels "USA","Russia",..: 101 70 70 70 29 29 29 73 73 73 ...
 $ Medal : Factor w/ 3 levels "Bronze","Gold",..: 1 1 2 3 1 2 3 1 2 3 ...
 $ Count : int [1:323] 2 8 5 4 91 91 92 9 2 5 ...
 - attr(*, "groups")= tibble [134 x 2] (S3: tbl_df/tbl/data.frame)
  ..$ region: Factor w/ 134 levels "USA","Russia",..: 1 2 3 4 5 6 7 8 9 10 ...
  ..$ .rows : list<int> [1:134]
  .. ..$ : int [1:3] 307 308 309
  .. ..$ : int [1:3] 235 236 237
  .. ..$ : int [1:3] 102 103 104
  .. ..$ : int [1:3] 296 297 298
  .. ..$ : int [1:3] 95 96 97
  .. ..$ : int [1:3] 138 139 140
  .. ..$ : int [1:3] 263 264 265
  .. ..$ : int [1:3] 46 47 48
  .. ..$ : int [1:3] 11 12 13
  .. ..$ : int [1:3] 117 118 119
  .. ..$ : int [1:3] 194 195 196
  .. ..$ : int [1:3] 208 209 210
  .. ..$ : int [1:3] 52 53 54
  .. ..$ : int [1:3] 147 148 149
  .. ..$ : int [1:3] 92 93 94
  .. ..$ : int [1:3] 266 267 268
  .. ..$ : int [1:3] 232 233 234
  .. ..$ : int [1:3] 69 70 71
  .. ..$ : int [1:3] 253 254 255
  ..........

Jeff Reichman

[[alternative HTML version deleted]]
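
A note on the str() output in the post: .rows appears under attr(*, "groups"), so it is grouping metadata rather than a column of medal_counts_ctry, and it holds row positions rather than medal totals. The sketch below illustrates this on a small made-up table (the data are hypothetical, not the poster's), assuming dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy stand-in for the poster's `medals` table
medals <- tibble(
  region = c("USA", "USA", "USA", "Russia", "Russia", "Fiji"),
  Medal  = c("Gold", "Gold", "Silver", "Gold", "Bronze", "Gold")
)

# Same shape as the poster's summary; the result stays grouped by region
counts <- medals %>%
  group_by(region, Medal) %>%
  summarize(Count = n(), .groups = "drop_last")

# .rows is not among the columns of the summarised data ...
has_rows_column <- ".rows" %in% names(counts)

# ... it lives in the grouping metadata, one entry per region,
# listing the row positions that belong to that region
grp <- group_data(counts)
```

Here group_data() is the dplyr accessor for that metadata; ranking regions by medals needs a per-region total such as the Total column in levs_medal instead.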
As has already been pointed out to you (several times, I believe) --
**HTML code is stripped on this *plain text* list**. Hence, "bolded, red code"
is meaningless!

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Sun, Jun 27, 2021 at 9:10 AM Jeff Reichman <reichmanj at sbcglobal.net> wrote:

> The bolded red code is the point where I am thinking is the point where
> I would do this.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Hello,

Something like this?

# count number of medals awarded to each Team
medal_counts_ctry <- medals %>%
  na.omit() %>%
  count(region, Medal, name = "Count")

#head(medal_counts_ctry)

# order Team by total medal count
levs_medal <- medal_counts_ctry %>%
  group_by(region) %>%
  summarize(Total = sum(Count)) %>%
  arrange(desc(Total)) %>%
  pull(region)

medal_counts_ctry$region <- factor(medal_counts_ctry$region, levels = levs_medal)

# keep top 50 medal counts
top_count <- 50
medal_data <- medal_counts_ctry %>%
  slice_max(order_by = Count, n = top_count)

Hope this helps,

Rui Barradas

At 17:10 on 27/06/21, Jeff Reichman wrote:
> I am trying to figure out how to further filter out the countries with the
> largest total medal counts to plot.
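
One caveat on the reply's last step: slice_max(order_by = Count, n = top_count) ranks the individual region/Medal rows, so it keeps the 50 largest rows rather than the 50 countries with the largest totals. If the aim is the top countries by total medal count, as the question states, a possible variant is sketched below. The toy data and the name top_regions are made up for illustration, top_count is set to 2 only so the small example has something to drop, and dplyr 1.0.0 or later is assumed.

```r
library(dplyr)

# Hypothetical toy counts standing in for medal_counts_ctry
medal_counts_ctry <- tibble(
  region = c("USA", "USA", "USA", "Russia", "Russia", "Fiji", "Peru"),
  Medal  = c("Bronze", "Gold", "Silver", "Gold", "Silver", "Gold", "Bronze"),
  Count  = c(5L, 10L, 7L, 6L, 4L, 1L, 2L)
)

top_count <- 2  # 50 in the thread

# Total medals per region, keep the top_count largest, best first
top_regions <- medal_counts_ctry %>%
  group_by(region) %>%
  summarize(Total = sum(Count)) %>%
  slice_max(order_by = Total, n = top_count, with_ties = FALSE) %>%
  arrange(desc(Total)) %>%
  pull(region)

# Keep every medal row of those regions; order the factor by total
medal_data <- medal_counts_ctry %>%
  ungroup() %>%
  filter(region %in% top_regions) %>%
  mutate(region = factor(region, levels = top_regions))
```

The resulting medal_data can go into the original ggplot() call unchanged. with_ties = FALSE caps the result at exactly top_count regions; leaving it at its default keeps any regions tied at the cutoff.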