Hi R community,
I have a data frame with three variables, where each row adds up to 90.
I want to assign a category of low, medium, or high to the values in each row - where the lowest value per row will be set to 10, the medium value set to 30, and the high value set to 50 - so each row still adds up to 90.

For example:
Data: Orig
tree shrub grass
  32    11    47
  23    41    26
  49    23    18

Data: New
tree shrub grass
  30    10    50
  10    50    30
  50    30    10

I am not attaching any code here as I have not been able to write anything effective! appreciate help with this!
thank you,
JC
Some ideas:

You could fit a cluster model with k=3 to each of the 3 variables, to determine what constitutes high/medium/low centroid values for each of the 3 plant types. The centroid values could then be used to set the upper/lower boundaries of the high/med/low ranges.

Or use a histogram of each variable, together with quantiles, densities, etc., to find the natural breaks for the high/med/low ranges of each of the IVs.

On 2022-05-29 15:28, Janet Choate wrote:
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
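A minimal sketch of the quantile idea, assuming that low/medium/high means the lower, middle and upper third of each column's own values. The tercile cut points, the data frame name plants and the helper to_level() are assumptions made for illustration; none of them come from the thread. Note that this classifies within each column rather than within each row, so unlike the 10/30/50 recoding it does not keep row sums at 90.

```r
# Hypothetical data frame with the same layout as the example in the thread
plants <- data.frame(tree  = c(32, 23, 49),
                     shrub = c(11, 41, 23),
                     grass = c(47, 26, 18))

# Assumption: break each column at its own terciles
to_level <- function(v) {
  breaks <- quantile(v, probs = c(0, 1/3, 2/3, 1))
  cut(v, breaks = breaks, labels = c("low", "medium", "high"),
      include.lowest = TRUE)   # keep the column minimum in the "low" bin
}

# One factor column of low/medium/high per original column
levels_by_column <- as.data.frame(lapply(plants, to_level))
levels_by_column
```

cut() stops with an error when the breaks are not unique, so a column with many repeated values would need different cut points.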
You could write a function that deals with one row of your data, based on the rank() function. With three values, rank(x) is 1 for the lowest, 2 for the middle and 3 for the highest, so it can index c(10,30,50) directly; ties.method="first" keeps the ranks whole numbers if two values are equal. E.g.,

> to_10_30_50 <- function(x) {
+     stopifnot(is.numeric(x), length(x)==3, sum(x)==90, all(x>0))
+     c(10,30,50)[rank(x, ties.method="first")]
+ }
> to_10_30_50(c(23,41,26))
[1] 10 50 30

Then loop over the rows. Since this is a data.frame and not a matrix, you need to coerce each row from a single-row data.frame to a numeric vector:

> data <- data.frame(tree=c(32,23,49), shrub=c(11,41,23), grass=c(47,26,18))
> for(i in 1:nrow(data)) data[i,] <- to_10_30_50(as.numeric(data[i,]))
> data
  tree shrub grass
1   30    10    50
2   10    50    30
3   50    30    10

-Bill
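When c(10,30,50) is indexed by position within a row, it is worth distinguishing rank() from order(). The two agree on the three example rows in the question, but not on every row. A small sketch, using a made-up row (40, 20, 30) and a hypothetical helper recode_row():

```r
x <- c(40, 20, 30)      # hypothetical row: high, low, medium

order(x)                # positions of the sorted values: 2 3 1
rank(x)                 # rank of each value where it stands: 3 1 2

by_rank  <- c(10, 30, 50)[rank(x)]    # 50 10 30, the intended recoding
by_order <- c(10, 30, 50)[order(x)]   # 30 50 10, not what is wanted here

# The same recoding applied to every row of the example data without a loop
plants <- data.frame(tree  = c(32, 23, 49),
                     shrub = c(11, 41, 23),
                     grass = c(47, 26, 18))
recode_row <- function(x) c(10, 30, 50)[rank(x, ties.method = "first")]
recoded <- t(apply(as.matrix(plants), 1, recode_row))
colnames(recoded) <- names(plants)
recoded
```

order() answers "which position holds the smallest value", while rank() answers "how large is the value in this position", and the second question is the one the recoding asks.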
Orig <- read.table(text="
tree shrub grass
32 11 47
23 41 26
49 23 18
", header=TRUE)

New <- Orig
## rank() gives 1/2/3 for the lowest/middle/highest value in each row
for (i in seq_len(nrow(Orig)))
  New[i,] <- c(10, 30, 50)[rank(unlist(Orig[i,]), ties.method="first")]
New
Hello,

Here is a way. Define a function to change the values and call it in an apply loop.
But Tom's suggestions are more reasonable; you should have a good reason to change the data.

x <- '
tree shrub grass
32 11 47
23 41 26
49 23 18'
orig <- read.table(textConnection(x), header = TRUE)

f <- function(x) {
  stopifnot(length(x) == 3L)
  i_min <- which.min(x)
  i_max <- which.max(x)
  # amount taken off the two extremes, to be added to the middle value
  s <- (x[i_min] - 10) + (x[i_max] - 50)
  x[i_min] <- 10
  x[i_max] <- 50
  x[-c(i_min, i_max)] <- x[-c(i_min, i_max)] + s
  x
}

t(apply(orig, 1, f))
#      tree shrub grass
# [1,]   30    10    50
# [2,]   10    50    30
# [3,]   50    30    10

Hope this helps,

Rui Barradas
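The reason the middle value lands on 30 is that the row sums to 90: whatever is taken off the two extremes to bring them to 10 and 50 is added to the middle. A short check of that arithmetic on row 2 of the example (the names s and middle here are only for illustration):

```r
x <- c(23, 41, 26)                    # row 2 of the example; sums to 90

s      <- (min(x) - 10) + (max(x) - 50)   # 13 + (-9) = 4
middle <- sum(x) - min(x) - max(x)        # the remaining value, 26

# middle + s = sum(x) - 60, which is 30 whenever the row sums to 90
adjusted <- middle + s
adjusted
```

If a row does not sum to 90, the middle value comes out as sum(x) - 60 rather than 30, so this approach depends on that condition holding.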
Suppose your data are in a table called plant.data.
Suppose you want to process row i. Then plant.data[i,] is a one-row data frame holding 3 numbers, and unlist() turns it into a plain numeric vector.

    ord <- order(unlist(plant.data[i,]))

gives you a vector of 3 positive integers such that plant.data[i,ord] is in ascending order.

    plant.data[i,ord[1]] <- 10
    plant.data[i,ord[2]] <- 30
    plant.data[i,ord[3]] <- 50

or even

    plant.data[i,ord] <- c(10,30,50)

Wrapping it up and tying a bow on it:

    new.values <- c(10,30,50)
    for (i in 1:nrow(plant.data))
        plant.data[i,order(unlist(plant.data[i,]))] <- new.values

> plant.orig <- data.frame(
+     tree = c(32,23,49),
+     shrub = c(11,41,23),
+     grass = c(47,26,18))
> plant.orig
  tree shrub grass
1   32    11    47
2   23    41    26
3   49    23    18
> new.values <- c(10,30,50)
> plant.new <- plant.orig
> for (i in 1:nrow(plant.new))
+     plant.new[i,order(unlist(plant.new[i,]))] <- new.values
> plant.new
  tree shrub grass
1   30    10    50
2   10    50    30
3   50    30    10
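The row-by-row assignment above could be wrapped in a small function that works on a copy and leaves the input untouched. This is only a sketch: the function name recode_rows() and the extra fourth row are made up for illustration, and it assumes every column is numeric.

```r
recode_rows <- function(df, new.values = c(10, 30, 50)) {
  stopifnot(ncol(df) == length(new.values))
  out <- df
  for (i in seq_len(nrow(df))) {
    ord <- order(unlist(df[i, ]))   # column positions, smallest value first
    out[i, ord] <- new.values       # smallest gets 10, middle 30, largest 50
  }
  out
}

plant.orig <- data.frame(tree  = c(32, 23, 49, 40),
                         shrub = c(11, 41, 23, 20),
                         grass = c(47, 26, 18, 30))
plant.new <- recode_rows(plant.orig)
plant.new
rowSums(plant.new)                  # every row still adds up to 90
```

Assigning into the ordered positions, as here, gives the same result as indexing the new values by rank(); seq_len() is used so that a data frame with zero rows is simply returned unchanged.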