Hi R community,
I have a data frame with three variables, where each row adds up to 90.
I want to assign a category of low, medium, or high to the values in each
row - where the lowest value per row will be set to 10, the medium value
set to 30, and the high value set to 50 - so each row still adds up to 90.

For example:
Data: Orig
tree shrub grass
32   11    47
23   41    26
49   23    18

Data: New
tree shrub grass
30   10    50
10   50    30
50   30    10

I am not attaching any code here as I have not been able to write anything
effective! I would appreciate help with this!
Thank you,
JC
Some ideas:

You could fit a cluster model with k=3 to each of the 3 variables, to
determine what constitutes high/medium/low centroid values for each of the
3 plant types. The centroid values could then be used as the upper/lower
boundaries of the high/med/low ranges.

Or use a histogram of each variable, together with quantiles, densities,
etc., to determine the natural breaks for the high/med/low ranges of each
of the IVs.

On 2022-05-29 15:28, Janet Choate wrote:
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
You could write a function that deals with one row of your data, based on
the order() function. E.g.,
> to_10_30_50 <- function(x) {
+   stopifnot(is.numeric(x), length(x)==3, sum(x)==90, all(x>0))
+   # write 10/30/50 into the positions of the smallest/middle/largest value
+   x[order(x)] <- c(10,30,50)
+   x
+ }
> to_10_30_50(c(23,41,26))
[1] 10 50 30
Then loop over the rows. Since this is a data.frame and not a matrix, you
need to coerce each row from a single-row data.frame to a numeric vector:
> data <- data.frame(tree=c(32,23,49), shrub=c(11,41,23),
+                    grass=c(47,26,18))
> for(i in seq_len(nrow(data))) data[i,] <- to_10_30_50(as.numeric(data[i,]))
> data
  tree shrub grass
1   30    10    50
2   10    50    30
3   50    30    10
-Bill
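
One caveat about order() in this kind of recoding: order(x) returns the
positions of the sorted elements, whereas rank(x) returns each element's
place in the sorted order. The two happen to agree on the three example
rows, but they differ for a row such as c(41, 23, 26), so indexing
c(10, 30, 50) directly by order(x) can put values in the wrong columns.
A small sketch of the difference (the row below is an illustrative
assumption, not part of the original data):

```r
x <- c(41, 23, 26)                    # hypothetical row: high, low, medium

order(x)                              # positions of the sorted values
rank(x)                               # rank of each value within the row

# Indexing the new values by order(x) misplaces them for this row
by_order <- c(10, 30, 50)[order(x)]

# Indexing by rank gives the intended recoding
by_rank <- c(10, 30, 50)[rank(x, ties.method = "first")]

# Assigning into the ordered positions is equivalent to the rank version
by_assign <- x
by_assign[order(x)] <- c(10, 30, 50)

print(rbind(by_order, by_rank, by_assign))
```

With ties.method = "first", tied values are ranked in column order, which
is one possible convention; the original question does not say how ties
should be handled.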
Orig <- read.table(text="
tree shrub grass
32 11 47
23 41 26
49 23 18
", header=TRUE)

New <- Orig
## write 10/30/50 into the columns holding the smallest/middle/largest value
for (i in seq_len(nrow(Orig)))
  New[i, order(unlist(Orig[i,]))] <- c(10, 30, 50)
New
Hello,
Here is a way. Define a function to change the values and call it in an
apply loop. But Tom's suggestions are more reasonable; you should have a
good reason to change the data.
x <- '
tree shrub grass
32 11 47
23 41 26
49 23 18'
orig <- read.table(textConnection(x), header = TRUE)
f <- function(x) {
stopifnot(length(x) == 3L)
i_min <- which.min(x)
i_max <- which.max(x)
s <- (x[i_min] - 10) + (x[i_max] - 50)
x[i_min] <- 10
x[i_max] <- 50
x[-c(i_min, i_max)] <- x[-c(i_min, i_max)] + s
x
}
t(apply(orig, 1, f))
# tree shrub grass
# [1,] 30 10 50
# [2,] 10 50 30
# [3,] 50 30 10
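
A note on why the middle value comes out as 30: each row sums to 90, and
the minimum and maximum are pinned to 10 and 50, so the remaining value
must equal 90 - 60 = 30. Also, t(apply(...)) returns a matrix rather than
a data frame. A sketch of converting it back, assuming a data frame is
wanted (the name recoded is hypothetical, and f is repeated from above so
the snippet runs on its own):

```r
orig <- data.frame(tree  = c(32, 23, 49),
                   shrub = c(11, 41, 23),
                   grass = c(47, 26, 18))

# Same function as above: pin the min to 10 and the max to 50,
# and let the middle value absorb the difference
f <- function(x) {
  stopifnot(length(x) == 3L)
  i_min <- which.min(x)
  i_max <- which.max(x)
  s <- (x[i_min] - 10) + (x[i_max] - 50)
  x[i_min] <- 10
  x[i_max] <- 50
  x[-c(i_min, i_max)] <- x[-c(i_min, i_max)] + s
  x
}

# apply() works row by row on a matrix; convert the result back
recoded <- as.data.frame(t(apply(orig, 1, f)))
print(recoded)
```

This relies on every row summing to exactly 90; for rows that do not, the
middle value would not be 30.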
Hope this helps,
Rui Barradas
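
For comparison, the quantile idea among Tom's suggestions could be
sketched roughly as follows. This is only one reading of that suggestion:
the helper name to_level, the use of terciles as break points, and the
labels are all assumptions made for illustration.

```r
orig <- data.frame(tree  = c(32, 23, 49),
                   shrub = c(11, 41, 23),
                   grass = c(47, 26, 18))

# Hypothetical helper: label each value of one column by its tercile
to_level <- function(v) {
  breaks <- quantile(v, probs = c(0, 1/3, 2/3, 1))
  cut(v, breaks = breaks, labels = c("low", "medium", "high"),
      include.lowest = TRUE)
}

# Apply the helper to every column separately
levels_by_column <- as.data.frame(lapply(orig, to_level))
print(levels_by_column)
```

Unlike the row-wise recoding, this classifies each value relative to its
own column, so a row is not guaranteed to contain exactly one low, one
medium, and one high value. cut() also stops with an error if ties make
the break points non-unique.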
Suppose your data are in a data frame called plant.data.
Suppose you want to process row i.
Then plant.data[i,] is a one-row data frame holding 3 numbers, and
    ord <- order(unlist(plant.data[i,]))
gives you a vector of 3 positive integers such
that plant.data[i,ord] is in ascending order.
    plant.data[i,ord[1]] <- 10
    plant.data[i,ord[2]] <- 30
    plant.data[i,ord[3]] <- 50
or even
    plant.data[i,ord] <- c(10,30,50)
Wrapping it up and tying a bow on it:
    new.values <- c(10,30,50)
    for (i in 1:nrow(plant.data))
        plant.data[i,order(unlist(plant.data[i,]))] <- new.values
> plant.orig <- data.frame(
+     tree = c(32,23,49),
+     shrub = c(11,41,23),
+     grass = c(47,26,18))
> plant.orig
  tree shrub grass
1   32    11    47
2   23    41    26
3   49    23    18
> new.values <- c(10,30,50)
> plant.new <- plant.orig
> for (i in 1:nrow(plant.new))
+     plant.new[i,order(plant.new[i,])] <- new.values
> plant.new
  tree shrub grass
1   30    10    50
2   10    50    30
3   50    30    10
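
A quick sanity check on the result may be worthwhile: every row should
still sum to 90, and the within-row ordering should match the original.
A minimal sketch, assuming the same example data (the name same_ranks is
introduced here for illustration, and unlist() is used so that order()
receives a plain numeric vector):

```r
plant.orig <- data.frame(tree  = c(32, 23, 49),
                         shrub = c(11, 41, 23),
                         grass = c(47, 26, 18))
new.values <- c(10, 30, 50)

plant.new <- plant.orig
for (i in seq_len(nrow(plant.new))) {
  # positions of the smallest/middle/largest value in row i
  ord <- order(unlist(plant.orig[i, ]))
  plant.new[i, ord] <- new.values
}

# Every row should still add up to 90
stopifnot(all(rowSums(plant.new) == 90))

# The within-row ranking should be the same before and after
same_ranks <- all(t(apply(plant.orig, 1, rank)) ==
                  t(apply(plant.new, 1, rank)))
stopifnot(same_ranks)
```

If a row contains tied values, the rank comparison would fail even though
the recoding ran, because the recoded row has no ties.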