Diego Bellavia
2022-Feb-22 08:51 UTC
[R] How to compute ggplot2 boxplot outliers after Reshape/Melt in LONG/stacked format (in aloop)
Here it is a test Dataframe (just a subset of the real one), that is in LONG format (each subject / PID has several repeated measures over time) dput(head(testdf)) structure(list(pid = c(1, 1, 1, 1, 1, 2), age = c(52, 52, 52, 52, 52, 76), height = c(175, 175, 175, 175, 175, 164), weight = c(68, 70, 68, 68, 68, 95.5), bsa = c(1.82549041118621, 1.84811901382349, 1.82549041118621, 1.82549041118621, 1.82549041118621, 2.01198114104261 ), lab_bnp_log = c(5.96100533962327, 5.29330482472449, 5.9597158487934, 5.76205138278018, 5.03695260241363, 6.85583020611722), lab_tni_log = c(-5.49676830527187, -5.36019277026612, -5.80914299031403, -5.5727542122498, -5.59942245933196, -4.56594947283481), lab_crp_log = c(NA, 1.50407739677627, 0.955511445027436, 1.54756250871601, 1.54756250871601, 0.8754687373539), lab_bun = c(57, 47, 57, 49, 61, 52), lab_creat = c(1.2, 1, 1.3, 1.2, 1.4, 1.5 )), row.names = c(NA, 6L), class = "data.frame" I am trying to correctly label boxplot outliers using GGPLOT2 package, as a step of my exploratory analysis. To plot an Histogram and a Boxplot I use a For loop, that starts after having "melted" the dataset library(ggplot2) library(reshape2) library(tidyverse) library(egg) myplots <- melt(testdf, id.vars = "pid", na.rm = TRUE) is_outlier <- function(x) { return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)) } pdf(paste("EXPL_QUANT.pdf",sep="")) # Start the FOR Loop, where i = Variable name and value = variable value (number) for(i in 1:length(levels(myplots$variable))) { MyHisto <- ggplot(myplots[myplots$variable == levels(myplots$variable)[i],], aes(x=value)) + geom_histogram(aes(y = ..density..), binwidth = 0.7, # Modify it to have more/less bins in the histo fill = 'yellow', alpha = 0.7, col = 'black') + geom_density(colour="#00000040", lwd = 1, fill="lightyellow", alpha=0.5) + geom_vline(aes(xintercept = summarised_value, color = stat), size = 1, data = . %>% summarise(mean = mean(value), median = median(value)) %>% gather ("stat", "summarised_value", mean:median)) + scale_color_manual(values = c("blue", "red")) + theme_bw() + theme(axis.text.x=element_text(size=14), axis.title.x=element_text(size=16), axis.text.y=element_text(size=14), axis.title.y=element_text(size=16), legend.position = c(.95, .95), legend.justification = c(1,1), legend.background = element_rect(color = "black")) + ggtitle(paste0("Explorative Analysis of ",levels(myplots$variable)[i])) } # Plot the BoxPlot dat <- myplots[myplots$variable == levels(myplots$variable)[i],] %>% tibble::rownames_to_column(var="outlier") %>% mutate(outlier = ifelse(is_outlier(myplots$value[i]), myplots$value[i], as.numeric(NA))) dat$outlier[which(is.na(dat$is_outlier))] <- as.numeric(NA) MyBox <- ggplot(dat, aes(x = i, y = value)) + geom_boxplot(fill = "royalblue" , colour = "black" , outlier.colour = "red" , outlier.shape = 20 # , outlier.size = 1 ) + coord_flip() + # Horizontal Boxplot scale_y_continuous(name = levels(myplots$variable)[i]) + # Variable name on the Y Axis scale_x_continuous(name = levels(myplots$variable)[i]) + # Variable name on the X Axis geom_text(aes(label=outlier),na.rm=TRUE,nudge_y=0.05) egg::ggarrange(MyHisto, MyBox, heights = 2:1) # Combine and Line up Histogram and Boxplot (EGG Package) } dev.off()The loop complete with no errors and the graphs are programmatically built, I can see the outliers in the boxplots depicted as red dots (that is fine) BUT I cannot see the labels. I believe the problem is that outliers column contains just NAs for each variable, so of course I cannot label anything, and I think the following line mutate(outlier = ifelse(is_outlier(myplots$value[i]), myplots$value[i], as.numeric(NA))should be modified so to provide the is_outlier() function with a proper vector of values, each grouped by PID and by variable, but, even if I am correct I do not know how to do that. Any help in understanding why the loop is not working and how to fix it would be greatly appreciated Best, Diego [[alternative HTML version deleted]]