Rui Barradas
2022-Aug-26 04:14 UTC
[R] Getting minimum value of a column according a factor column of a dataframe
Hello, To return 2 rows for each Code, one for the min and another for the max, try the following. I'm borrowing Bert's by() idea, it makes everything simpler. There is a hack to have the original Code order kept, since the final result res should have two rows for each Code, see what is order()'ed below. # the output has 2 consecutive rows with # the same Code, so repeat the unique Codes i <- order(rep(unique(df1$Code), each = 2)) res <- by(df1, df1$Code, \(x) x[c(which.min(x$Q), which.max(x$Q)), ]) res <- do.call(rbind, res)[order(i), ] # remake the row names, they're ugly after rbind row.names(res) <- NULL res Hope this helps, Rui Barradas ?s 19:02 de 25/08/2022, javad bayat escreveu:> ;Dear all > First of all I appreciate you for the answers you have sent. I did the > codes that Rui provided and I got what I wanted. > " > res <- lapply(split(df1, df1$Code), \(x) x[which.min(x$Q),]) > res <- do.call(rbind, res) > i <- order(unique(df1$Code)) > res[order(i), ] > " > I think I should explain more about my request. I had a large data frame > (11059 rows and 16 columns). The Code column represented the stations code, > totally the number of stations were 128. At each station I had many > measured variables, like Q and N and O, and these variables were measured > in different years and days. The days that data were measured were > different for each station, and due to this reason I had different rows for > stations. For example, station number one (41009) had 158 rows and station > number 2 (41011) had 113 rows. Note that the station's codes are not in > order format (e.g smallest to largest). > Back to my request, I wanted to extract the minimum value of the Q for each > station (based on the Code column). The problem was that I wanted to have > other column values which were measured for this minimum of the Q. > I hope my explanation was clear enough. As I said before, I used the Rui's > codes and I got what I wanted. Although, other solutions provided by others > were all correct. > > Regarding my request, unfortunately I faced another problem. I had to > extract the maximum of the Q and put it exactly under the minimum of the Q. > Something like the below one: > " > > Code > > Y > > M > > D > > Q > > N > > O > > 41003 > > 81 > > 1 > > 19 > > 0.16 > > 7.17 > > 2.5 > > 41003 > > 79 > > 8 > > 17 > > 10.21 > > 5.5 > > 7.2 > > 41009 > > 79 > > 2 > > 21 > > 0.218 > > 5.56 > > 4.04 > 41009 79 10 20 12.24 5.3 7.1 > . > . > . > . > " > I extract both min and max according to the codes, and I have 2 dataframes, > one for the minimum values and another for the max values. Both dataframe > have a Code column which is exactly similar. > Can I extract both min and max simultaneously or I have to combine two > dataframes? > I used the rbind and merge function but they did not give the desired > results. >> df3 = merge (df1, df2, by = "Code") > The result of this code adds a second dataframe as columns to the first > one. I want the first row of the second dataframe put below the first row > of the first dataframe and so on. I used a function to do this but it seems > it does not work correctly. > >> fun2 = function(x,y){ > i = 1 > for(i in x) { > if (x[i,1] == y[i,1]){ > rbind(x[i,],y[i,]) > i = i+1 > } > } > } >> fun2(df1, df2) > > Sincerely > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Aug 25, 2022 at 9:08 PM <avi.e.gross at gmail.com> wrote: > >> Yes, Timothy, the request was not seen by all of us as the same. >> >> Indeed if the request was to show a subset of the original data consisting >> of only the rows that were the minimum for each Code and also showed ties, >> then the solution is a tad more complex. I would then do something along >> the >> lines of what others showed such as generating another column showing the >> minimum for each row and then showing only rows that matched their value in >> two columns or whatever was needed. >> >> As noted, keeping the output in a specific order was not initially >> requested. >> >> Keeping the data in some order is a common enough request but in this >> situation, I suspect the order many might want would be the one showing the >> minimums in order, not the codes in the original order. >> >> -----Original Message----- >> From: Ebert,Timothy Aaron <tebert at ufl.edu> >> Sent: Thursday, August 25, 2022 11:59 AM >> To: avi.e.gross at gmail.com >> Cc: R-help at r-project.org >> Subject: RE: [R] Getting minimum value of a column according a factor >> column >> of a dataframe >> >> My assumption (maybe wrong) was that we needed to keep the other variables. >> I want to find the values of Y, M, D, N, and O for the minimum value of Q >> within each unique value of Code, keeping the data in the original order. >> All one need to do is filter Q in the original dataframe by your answer for >> minQ. >> >> Keeping the data in the original order seems unnecessary, but that is what >> was asked in a later post. >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > >
javad bayat
2022-Aug-26 04:38 UTC
[R] Getting minimum value of a column according a factor column of a dataframe
Dear all; Many thanks for your suggestions and answers. The problem was solved by the codes that Rui sent just now. Jim's codes also were correct but did not give the results in original order. Dear avi.e.gross at gmail.co, has provided codes but they just result in columns with max or min but they did not give the whole row's values related to the Q min. Thank you so much again. Sincerely On Fri, Aug 26, 2022 at 8:44 AM Rui Barradas <ruipbarradas at sapo.pt> wrote:> Hello, > > To return 2 rows for each Code, one for the min and another for the max, > try the following. > > I'm borrowing Bert's by() idea, it makes everything simpler. > There is a hack to have the original Code order kept, since the final > result res should have two rows for each Code, see what is order()'ed > below. > > > # the output has 2 consecutive rows with > # the same Code, so repeat the unique Codes > i <- order(rep(unique(df1$Code), each = 2)) > res <- by(df1, df1$Code, \(x) x[c(which.min(x$Q), which.max(x$Q)), ]) > res <- do.call(rbind, res)[order(i), ] > > # remake the row names, they're ugly after rbind > row.names(res) <- NULL > res > > > Hope this helps, > > Rui Barradas > > ?s 19:02 de 25/08/2022, javad bayat escreveu: > > ;Dear all > > First of all I appreciate you for the answers you have sent. I did the > > codes that Rui provided and I got what I wanted. > > " > > res <- lapply(split(df1, df1$Code), \(x) x[which.min(x$Q),]) > > res <- do.call(rbind, res) > > i <- order(unique(df1$Code)) > > res[order(i), ] > > " > > I think I should explain more about my request. I had a large data frame > > (11059 rows and 16 columns). The Code column represented the stations > code, > > totally the number of stations were 128. At each station I had many > > measured variables, like Q and N and O, and these variables were measured > > in different years and days. The days that data were measured were > > different for each station, and due to this reason I had different rows > for > > stations. For example, station number one (41009) had 158 rows and > station > > number 2 (41011) had 113 rows. Note that the station's codes are not in > > order format (e.g smallest to largest). > > Back to my request, I wanted to extract the minimum value of the Q for > each > > station (based on the Code column). The problem was that I wanted to have > > other column values which were measured for this minimum of the Q. > > I hope my explanation was clear enough. As I said before, I used the > Rui's > > codes and I got what I wanted. Although, other solutions provided by > others > > were all correct. > > > > Regarding my request, unfortunately I faced another problem. I had to > > extract the maximum of the Q and put it exactly under the minimum of the > Q. > > Something like the below one: > > " > > > > Code > > > > Y > > > > M > > > > D > > > > Q > > > > N > > > > O > > > > 41003 > > > > 81 > > > > 1 > > > > 19 > > > > 0.16 > > > > 7.17 > > > > 2.5 > > > > 41003 > > > > 79 > > > > 8 > > > > 17 > > > > 10.21 > > > > 5.5 > > > > 7.2 > > > > 41009 > > > > 79 > > > > 2 > > > > 21 > > > > 0.218 > > > > 5.56 > > > > 4.04 > > 41009 79 10 20 12.24 5.3 7.1 > > . > > . > > . > > . > > " > > I extract both min and max according to the codes, and I have 2 > dataframes, > > one for the minimum values and another for the max values. Both dataframe > > have a Code column which is exactly similar. > > Can I extract both min and max simultaneously or I have to combine two > > dataframes? > > I used the rbind and merge function but they did not give the desired > > results. > >> df3 = merge (df1, df2, by = "Code") > > The result of this code adds a second dataframe as columns to the first > > one. I want the first row of the second dataframe put below the first row > > of the first dataframe and so on. I used a function to do this but it > seems > > it does not work correctly. > > > >> fun2 = function(x,y){ > > i = 1 > > for(i in x) { > > if (x[i,1] == y[i,1]){ > > rbind(x[i,],y[i,]) > > i = i+1 > > } > > } > > } > >> fun2(df1, df2) > > > > Sincerely > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Aug 25, 2022 at 9:08 PM <avi.e.gross at gmail.com> wrote: > > > >> Yes, Timothy, the request was not seen by all of us as the same. > >> > >> Indeed if the request was to show a subset of the original data > consisting > >> of only the rows that were the minimum for each Code and also showed > ties, > >> then the solution is a tad more complex. I would then do something along > >> the > >> lines of what others showed such as generating another column showing > the > >> minimum for each row and then showing only rows that matched their > value in > >> two columns or whatever was needed. > >> > >> As noted, keeping the output in a specific order was not initially > >> requested. > >> > >> Keeping the data in some order is a common enough request but in this > >> situation, I suspect the order many might want would be the one showing > the > >> minimums in order, not the codes in the original order. > >> > >> -----Original Message----- > >> From: Ebert,Timothy Aaron <tebert at ufl.edu> > >> Sent: Thursday, August 25, 2022 11:59 AM > >> To: avi.e.gross at gmail.com > >> Cc: R-help at r-project.org > >> Subject: RE: [R] Getting minimum value of a column according a factor > >> column > >> of a dataframe > >> > >> My assumption (maybe wrong) was that we needed to keep the other > variables. > >> I want to find the values of Y, M, D, N, and O for the minimum value of > Q > >> within each unique value of Code, keeping the data in the original > order. > >> All one need to do is filter Q in the original dataframe by your answer > for > >> minQ. > >> > >> Keeping the data in the original order seems unnecessary, but that is > what > >> was asked in a later post. > >> > >> ______________________________________________ > >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide > >> http://www.R-project.org/posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. > >> > > > > >-- Best Regards Javad Bayat M.Sc. Environment Engineering Alternative Mail: bayat194 at yahoo.com [[alternative HTML version deleted]]
Ebert,Timothy Aaron
2022-Aug-26 12:34 UTC
[R] Getting minimum value of a column according a factor column of a dataframe
What is the point of this data frame? Beyond the fun of being able to morph data into any shape desired, there is usually some next step. I know that the values in Q are the minimum and maximum for each code, and the first value is the minimum and other value the maximum, but what comes next? If I want to get the difference in Y for the min and max Q I will need to reshape the data (or it would be easier that way). I cannot make a graph that identifies min and max Q, or an equivalent statistical analysis. It appears that no thought has gone into what to do about ties for min or max or what to do if min=max. The latter could happen if there is only one observation for Code or if there is no variability in Q. While it is very educational for me to see other answers to such problems, I just fail to see the point in the final product in this case because it will take more work to make the data useful. The choices in regards to ties might make the analysis results more (or less) meaningful. Maybe the latter works if Q is a non-decreasing function, though in that case an alternative coding is to select the first and last value for each Code. Using min and max is only relevant if Q sometimes decreases. Tim -----Original Message----- From: R-help <r-help-bounces at r-project.org> On Behalf Of Rui Barradas Sent: Friday, August 26, 2022 12:14 AM To: javad bayat <j.bayat194 at gmail.com>; avi.e.gross at gmail.com Cc: R-help at r-project.org Subject: Re: [R] Getting minimum value of a column according a factor column of a dataframe [External Email] Hello, To return 2 rows for each Code, one for the min and another for the max, try the following. I'm borrowing Bert's by() idea, it makes everything simpler. There is a hack to have the original Code order kept, since the final result res should have two rows for each Code, see what is order()'ed below. # the output has 2 consecutive rows with # the same Code, so repeat the unique Codes i <- order(rep(unique(df1$Code), each = 2)) res <- by(df1, df1$Code, \(x) x[c(which.min(x$Q), which.max(x$Q)), ]) res <- do.call(rbind, res)[order(i), ] # remake the row names, they're ugly after rbind row.names(res) <- NULL res Hope this helps, Rui Barradas ?s 19:02 de 25/08/2022, javad bayat escreveu:> ;Dear all > First of all I appreciate you for the answers you have sent. I did the > codes that Rui provided and I got what I wanted. > " > res <- lapply(split(df1, df1$Code), \(x) x[which.min(x$Q),]) res <- > do.call(rbind, res) i <- order(unique(df1$Code)) res[order(i), ] " > I think I should explain more about my request. I had a large data > frame > (11059 rows and 16 columns). The Code column represented the stations > code, totally the number of stations were 128. At each station I had > many measured variables, like Q and N and O, and these variables were > measured in different years and days. The days that data were measured > were different for each station, and due to this reason I had > different rows for stations. For example, station number one (41009) > had 158 rows and station number 2 (41011) had 113 rows. Note that the > station's codes are not in order format (e.g smallest to largest). > Back to my request, I wanted to extract the minimum value of the Q for > each station (based on the Code column). The problem was that I wanted > to have other column values which were measured for this minimum of the Q. > I hope my explanation was clear enough. As I said before, I used the > Rui's codes and I got what I wanted. Although, other solutions > provided by others were all correct. > > Regarding my request, unfortunately I faced another problem. I had to > extract the maximum of the Q and put it exactly under the minimum of the Q. > Something like the below one: > " > > Code > > Y > > M > > D > > Q > > N > > O > > 41003 > > 81 > > 1 > > 19 > > 0.16 > > 7.17 > > 2.5 > > 41003 > > 79 > > 8 > > 17 > > 10.21 > > 5.5 > > 7.2 > > 41009 > > 79 > > 2 > > 21 > > 0.218 > > 5.56 > > 4.04 > 41009 79 10 20 12.24 5.3 7.1 > . > . > . > . > " > I extract both min and max according to the codes, and I have 2 > dataframes, one for the minimum values and another for the max values. > Both dataframe have a Code column which is exactly similar. > Can I extract both min and max simultaneously or I have to combine two > dataframes? > I used the rbind and merge function but they did not give the desired > results. >> df3 = merge (df1, df2, by = "Code") > The result of this code adds a second dataframe as columns to the > first one. I want the first row of the second dataframe put below the > first row of the first dataframe and so on. I used a function to do > this but it seems it does not work correctly. > >> fun2 = function(x,y){ > i = 1 > for(i in x) { > if (x[i,1] == y[i,1]){ > rbind(x[i,],y[i,]) > i = i+1 > } > } > } >> fun2(df1, df2) > > Sincerely > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Aug 25, 2022 at 9:08 PM <avi.e.gross at gmail.com> wrote: > >> Yes, Timothy, the request was not seen by all of us as the same. >> >> Indeed if the request was to show a subset of the original data >> consisting of only the rows that were the minimum for each Code and >> also showed ties, then the solution is a tad more complex. I would >> then do something along the lines of what others showed such as >> generating another column showing the minimum for each row and then >> showing only rows that matched their value in two columns or whatever >> was needed. >> >> As noted, keeping the output in a specific order was not initially >> requested. >> >> Keeping the data in some order is a common enough request but in this >> situation, I suspect the order many might want would be the one >> showing the minimums in order, not the codes in the original order. >> >> -----Original Message----- >> From: Ebert,Timothy Aaron <tebert at ufl.edu> >> Sent: Thursday, August 25, 2022 11:59 AM >> To: avi.e.gross at gmail.com >> Cc: R-help at r-project.org >> Subject: RE: [R] Getting minimum value of a column according a factor >> column of a dataframe >> >> My assumption (maybe wrong) was that we needed to keep the other variables. >> I want to find the values of Y, M, D, N, and O for the minimum value >> of Q within each unique value of Code, keeping the data in the original order. >> All one need to do is filter Q in the original dataframe by your >> answer for minQ. >> >> Keeping the data in the original order seems unnecessary, but that is >> what was asked in a later post. >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsta >> t.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C01%7Ctebert%40u >> fl.edu%7C2cba5775104e4a1277fd08da871974f0%7C0d4da0f84a314d76ace60a623 >> 31e1b84%7C0%7C0%7C637970840667682013%7CUnknown%7CTWFpbGZsb3d8eyJWIjoi >> MC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C% >> 7C%7C&sdata=M1vxIIc8o0fs40Io8KmRUSyEYw1ad8mikRKzBuQd4rg%3D&re >> served=0 >> PLEASE do read the posting guide >> https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww. >> r-project.org%2Fposting-guide.html&data=05%7C01%7Ctebert%40ufl.ed >> u%7C2cba5775104e4a1277fd08da871974f0%7C0d4da0f84a314d76ace60a62331e1b >> 84%7C0%7C0%7C637970840667682013%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wL >> jAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C >> &sdata=vdMExt9qwLs13FqKl7Q2rUVCvTByUSIPYhVySnb5JI8%3D&reserve >> d=0 and provide commented, minimal, self-contained, reproducible >> code. >> > >______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C01%7Ctebert%40ufl.edu%7C2cba5775104e4a1277fd08da871974f0%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7C637970840667682013%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=M1vxIIc8o0fs40Io8KmRUSyEYw1ad8mikRKzBuQd4rg%3D&reserved=0 PLEASE do read the posting guide https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r-project.org%2Fposting-guide.html&data=05%7C01%7Ctebert%40ufl.edu%7C2cba5775104e4a1277fd08da871974f0%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7C637970840667682013%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=vdMExt9qwLs13FqKl7Q2rUVCvTByUSIPYhVySnb5JI8%3D&reserved=0 and provide commented, minimal, self-contained, reproducible code.