javad bayat
2022-Aug-25 18:02 UTC
[R] Getting minimum value of a column according a factor column of a dataframe
;Dear all
First of all I appreciate you for the answers you have sent. I did the
codes that Rui provided and I got what I wanted.
"
res <- lapply(split(df1, df1$Code), \(x) x[which.min(x$Q),])
res <- do.call(rbind, res)
i <- order(unique(df1$Code))
res[order(i), ]
"
I think I should explain more about my request. I had a large data frame
(11059 rows and 16 columns). The Code column represented the stations code,
totally the number of stations were 128. At each station I had many
measured variables, like Q and N and O, and these variables were measured
in different years and days. The days that data were measured were
different for each station, and due to this reason I had different rows for
stations. For example, station number one (41009) had 158 rows and station
number 2 (41011) had 113 rows. Note that the station's codes are not in
order format (e.g smallest to largest).
Back to my request, I wanted to extract the minimum value of the Q for each
station (based on the Code column). The problem was that I wanted to have
other column values which were measured for this minimum of the Q.
I hope my explanation was clear enough. As I said before, I used the Rui's
codes and I got what I wanted. Although, other solutions provided by others
were all correct.
Regarding my request, unfortunately I faced another problem. I had to
extract the maximum of the Q and put it exactly under the minimum of the Q.
Something like the below one:
"
Code
Y
M
D
Q
N
O
41003
81
1
19
0.16
7.17
2.5
41003
79
8
17
10.21
5.5
7.2
41009
79
2
21
0.218
5.56
4.04
41009 79 10 20 12.24 5.3 7.1
.
.
.
.
"
I extract both min and max according to the codes, and I have 2 dataframes,
one for the minimum values and another for the max values. Both dataframe
have a Code column which is exactly similar.
Can I extract both min and max simultaneously or I have to combine two
dataframes?
I used the rbind and merge function but they did not give the desired
results.> df3 = merge (df1, df2, by = "Code")
The result of this code adds a second dataframe as columns to the first
one. I want the first row of the second dataframe put below the first row
of the first dataframe and so on. I used a function to do this but it seems
it does not work correctly.
> fun2 = function(x,y){
i = 1
for(i in x) {
if (x[i,1] == y[i,1]){
rbind(x[i,],y[i,])
i = i+1
}
}
}> fun2(df1, df2)
Sincerely
On Thu, Aug 25, 2022 at 9:08 PM <avi.e.gross at gmail.com> wrote:
> Yes, Timothy, the request was not seen by all of us as the same.
>
> Indeed if the request was to show a subset of the original data consisting
> of only the rows that were the minimum for each Code and also showed ties,
> then the solution is a tad more complex. I would then do something along
> the
> lines of what others showed such as generating another column showing the
> minimum for each row and then showing only rows that matched their value in
> two columns or whatever was needed.
>
> As noted, keeping the output in a specific order was not initially
> requested.
>
> Keeping the data in some order is a common enough request but in this
> situation, I suspect the order many might want would be the one showing the
> minimums in order, not the codes in the original order.
>
> -----Original Message-----
> From: Ebert,Timothy Aaron <tebert at ufl.edu>
> Sent: Thursday, August 25, 2022 11:59 AM
> To: avi.e.gross at gmail.com
> Cc: R-help at r-project.org
> Subject: RE: [R] Getting minimum value of a column according a factor
> column
> of a dataframe
>
> My assumption (maybe wrong) was that we needed to keep the other variables.
> I want to find the values of Y, M, D, N, and O for the minimum value of Q
> within each unique value of Code, keeping the data in the original order.
> All one need to do is filter Q in the original dataframe by your answer for
> minQ.
>
> Keeping the data in the original order seems unnecessary, but that is what
> was asked in a later post.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Best Regards
Javad Bayat
M.Sc. Environment Engineering
Alternative Mail: bayat194 at yahoo.com
[[alternative HTML version deleted]]
Jim Lemon
2022-Aug-25 21:42 UTC
[R] Getting minimum value of a column according a factor column of a dataframe
Hi Javad, In that case, just modify the function to extract the rows with both the minimum and maximum Q from each station df1<-read.table(text="Code Y M D Q N O 41003 81 1 19 0.16 7.17 2.5 41003 77 9 22 0.197 6.8 2.2 41003 79 7 28 0.21 4.7 6.2 41005 79 8 17 0.21 5.5 7.2 41005 80 10 30 0.21 6.84 2.6 41005 80 12 20 0.21 6.84 2.4 41005 79 6 14 0.217 5.61 3.55 41009 79 2 21 0.218 5.56 4.04 41009 79 5 27 0.218 6.4 3.12 41009 80 11 29 0.22 6.84 2.8 41009 78 5 28 0.232 6 3.2 41009 81 8 20 0.233 6.39 1.6 41009 79 9 30 0.24 5.6 7.5 41017 79 10 20 0.24 5.3 7.1 41017 80 7 30 0.24 6.73 2.6", stringsAsFactors=FALSE,header=TRUE) # define a function that returns the desired rows minmaxQrow<-function(x) return(x[c(which.min(x$Q),which.max(x$Q)),]) # apply the function to the data frame df1a<-by(df1,df1$Code,minmaxQrow) # set the result to the first element of the list df1b<-df1a[[1]] # rbind the remaining rows for(i in 2:length(df1a)) df1b<-rbind(df1b,df1a[[i]]) # display the result df1b Jim On Fri, Aug 26, 2022 at 5:25 AM javad bayat <j.bayat194 at gmail.com> wrote:> ... > I think I should explain more about my request. I had a large data frame > (11059 rows and 16 columns). The Code column represented the stations code, > totally the number of stations were 128. At each station I had many > measured variables, like Q and N and O, and these variables were measured > in different years and days. The days that data were measured were > different for each station, and due to this reason I had different rows for > stations. For example, station number one (41009) had 158 rows and station > number 2 (41011) had 113 rows. Note that the station's codes are not in > order format (e.g smallest to largest). > Back to my request, I wanted to extract the minimum value of the Q for each > station (based on the Code column). The problem was that I wanted to have > other column values which were measured for this minimum of the Q. > I hope my explanation was clear enough. As I said before, I used the Rui's > codes and I got what I wanted. Although, other solutions provided by others > were all correct. >
@vi@e@gross m@iii@g oii gm@ii@com
2022-Aug-25 21:50 UTC
[R] Getting minimum value of a column according a factor column of a dataframe
The requirements keep being clarified and it would have been very useful to know
more in advance.
To be clear. My earlier suggestion was based on JUST wanting the minimum for
each unique version of Code. Then you wanted it in the original order so that
was handled by carefully making that a factor column in the order you wanted the
output. Now the request is to throw back in ALL the columns for as many rows as
are deemed minimums.
So, not in any way demeaning the various methods others offer, I suggest you
look at database style joins. R has some in the core with names like merge() and
other packages such as dplyr have all kinds of variations on a theme.
In my case, you can extend the part of my code that makes this as a second
data.frame/tibble:
Code minQ
1 41003 0.160
2 88888 0.160
3 41005 0.210
4 41009 0.218
5 41017 0.240
Call it df2 as in:
mydf.min <-
mydf %>%
group_by(Code) %>%
summarize(minQ = min(Q))
You now have the original thing I called mydf that has a column called Code and
lots of other columns and you have a smaller one with fewer rows and columns
called mydf.min and they share a single common column.
You want to merge these two using whatever kind of join makes sense. Dplyr
offers an inner_join(), left_join(), right_join() and full_join() and you can
tweak merge() and others to do similar things.
What you seem to want is to find all rows that share both a particular value for
Code and at the same time a particular value for minQ in one versus Q in the
other. You want to ignore all others. What gets returned can have all the
original columns and perhaps also the minQ column (which can be removed) and if
two or more rows in one grouping share exactly the same minimum, may get
slightly similar but different lines matched. Is that what you want? Do note
matching floating point equally can be dangerous but in this case since the
numbers were all just read in, should be fine.
You can play with it but I tried this:
test <- left_join(mydf.min, mydf, by=c("Code", "minQ" =
"Q"))
The values returned (and not I added an 88888 category earlier in my data) look
like this:
Code minQ Y M D N O
1 41003 0.160 81 1 19 7.17 2.50
2 88888 0.160 81 1 19 7.17 2.50
3 41005 0.210 79 8 17 5.50 7.20
4 41005 0.210 80 10 30 6.84 2.60
5 41005 0.210 80 12 20 6.84 2.40
6 41009 0.218 79 2 21 5.56 4.04
7 41009 0.218 79 5 27 6.40 3.12
8 41017 0.240 79 10 20 5.30 7.10
9 41017 0.240 80 7 30 6.73 2.60
You can see there are three minimum rows for Code 41005 for example and the join
keeps them all.
You can trivially remove the minQ column by naming the original as Q or just
removing it. You can again reorder things if you wish including by sorting on
other columns ascending or descending using other functions/verbs.
As noted, this has to work for you with your larger code set and sometimes
complex enough such requirements can be done many ways as it can be as much art
as science. I personally would probably write most of the above as one long
pipeline looking a bit like:
mydf <-
read.table(?) %>%
mutate(Code=factor(..., levels=unique(...)))
result <-
mydf %>%
group_by(Code) %>%
summarize(...) %>%
left_join(mydf, by=c("Code", "minQ" = "Q"))
%>%
select(-minQ)
And of course it may be better to use the new R pipe operator if your version is
new and so on, filling in whatever details make sense to you. At the end of the
pipeline, you might want to use verbs that sort the data as described above.
My guess is you will now tell us about yet another condition suggestions like
mine do not fulfill and I will likely then ignore ?
From: javad bayat <j.bayat194 at gmail.com>
Sent: Thursday, August 25, 2022 2:02 PM
To: avi.e.gross at gmail.com
Cc: R-help at r-project.org
Subject: Re: [R] Getting minimum value of a column according a factor column of
a dataframe
;Dear all
First of all I appreciate you for the answers you have sent. I did the codes
that Rui provided and I got what I wanted.
"
res <- lapply(split(df1, df1$Code), \(x) x[which.min(x$Q),])
res <- do.call(rbind, res)
i <- order(unique(df1$Code))
res[order(i), ]
"
I think I should explain more about my request. I had a large data frame (11059
rows and 16 columns). The Code column represented the stations code, totally the
number of stations were 128. At each station I had many measured variables, like
Q and N and O, and these variables were measured in different years and days.
The days that data were measured were different for each station, and due to
this reason I had different rows for stations. For example, station number one
(41009) had 158 rows and station number 2 (41011) had 113 rows. Note that the
station's codes are not in order format (e.g smallest to largest).
Back to my request, I wanted to extract the minimum value of the Q for each
station (based on the Code column). The problem was that I wanted to have other
column values which were measured for this minimum of the Q.
I hope my explanation was clear enough. As I said before, I used the Rui's
codes and I got what I wanted. Although, other solutions provided by others were
all correct.
Regarding my request, unfortunately I faced another problem. I had to extract
the maximum of the Q and put it exactly under the minimum of the Q. Something
like the below one:
"
Code
Y
M
D
Q
N
O
41003
81
1
19
0.16
7.17
2.5
41003
79
8
17
10.21
5.5
7.2
41009
79
2
21
0.218
5.56
4.04
41009
79
10
20
12.24
5.3
7.1
.
.
.
.
"
I extract both min and max according to the codes, and I have 2 dataframes, one
for the minimum values and another for the max values. Both dataframe have a
Code column which is exactly similar.
Can I extract both min and max simultaneously or I have to combine two
dataframes?
I used the rbind and merge function but they did not give the desired results.
> df3 = merge (df1, df2, by = "Code")
The result of this code adds a second dataframe as columns to the first one. I
want the first row of the second dataframe put below the first row of the first
dataframe and so on. I used a function to do this but it seems it does not work
correctly.
> fun2 = function(x,y){
i = 1
for(i in x) {
if (x[i,1] == y[i,1]){
rbind(x[i,],y[i,])
i = i+1
}
}
}> fun2(df1, df2)
Sincerely
On Thu, Aug 25, 2022 at 9:08 PM <avi.e.gross at gmail.com
<mailto:avi.e.gross at gmail.com> > wrote:
Yes, Timothy, the request was not seen by all of us as the same.
Indeed if the request was to show a subset of the original data consisting
of only the rows that were the minimum for each Code and also showed ties,
then the solution is a tad more complex. I would then do something along the
lines of what others showed such as generating another column showing the
minimum for each row and then showing only rows that matched their value in
two columns or whatever was needed.
As noted, keeping the output in a specific order was not initially
requested.
Keeping the data in some order is a common enough request but in this
situation, I suspect the order many might want would be the one showing the
minimums in order, not the codes in the original order.
-----Original Message-----
From: Ebert,Timothy Aaron <tebert at ufl.edu <mailto:tebert at ufl.edu>
>
Sent: Thursday, August 25, 2022 11:59 AM
To: avi.e.gross at gmail.com <mailto:avi.e.gross at gmail.com>
Cc: R-help at r-project.org <mailto:R-help at r-project.org>
Subject: RE: [R] Getting minimum value of a column according a factor column
of a dataframe
My assumption (maybe wrong) was that we needed to keep the other variables.
I want to find the values of Y, M, D, N, and O for the minimum value of Q
within each unique value of Code, keeping the data in the original order.
All one need to do is filter Q in the original dataframe by your answer for
minQ.
Keeping the data in the original order seems unnecessary, but that is what
was asked in a later post.
______________________________________________
R-help at r-project.org <mailto:R-help at r-project.org> mailing list --
To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Best Regards
Javad Bayat
M.Sc. Environment Engineering
Alternative Mail: bayat194 at yahoo.com <mailto:bayat194 at yahoo.com>
[[alternative HTML version deleted]]
Rui Barradas
2022-Aug-26 04:14 UTC
[R] Getting minimum value of a column according a factor column of a dataframe
Hello, To return 2 rows for each Code, one for the min and another for the max, try the following. I'm borrowing Bert's by() idea, it makes everything simpler. There is a hack to have the original Code order kept, since the final result res should have two rows for each Code, see what is order()'ed below. # the output has 2 consecutive rows with # the same Code, so repeat the unique Codes i <- order(rep(unique(df1$Code), each = 2)) res <- by(df1, df1$Code, \(x) x[c(which.min(x$Q), which.max(x$Q)), ]) res <- do.call(rbind, res)[order(i), ] # remake the row names, they're ugly after rbind row.names(res) <- NULL res Hope this helps, Rui Barradas ?s 19:02 de 25/08/2022, javad bayat escreveu:> ;Dear all > First of all I appreciate you for the answers you have sent. I did the > codes that Rui provided and I got what I wanted. > " > res <- lapply(split(df1, df1$Code), \(x) x[which.min(x$Q),]) > res <- do.call(rbind, res) > i <- order(unique(df1$Code)) > res[order(i), ] > " > I think I should explain more about my request. I had a large data frame > (11059 rows and 16 columns). The Code column represented the stations code, > totally the number of stations were 128. At each station I had many > measured variables, like Q and N and O, and these variables were measured > in different years and days. The days that data were measured were > different for each station, and due to this reason I had different rows for > stations. For example, station number one (41009) had 158 rows and station > number 2 (41011) had 113 rows. Note that the station's codes are not in > order format (e.g smallest to largest). > Back to my request, I wanted to extract the minimum value of the Q for each > station (based on the Code column). The problem was that I wanted to have > other column values which were measured for this minimum of the Q. > I hope my explanation was clear enough. As I said before, I used the Rui's > codes and I got what I wanted. Although, other solutions provided by others > were all correct. > > Regarding my request, unfortunately I faced another problem. I had to > extract the maximum of the Q and put it exactly under the minimum of the Q. > Something like the below one: > " > > Code > > Y > > M > > D > > Q > > N > > O > > 41003 > > 81 > > 1 > > 19 > > 0.16 > > 7.17 > > 2.5 > > 41003 > > 79 > > 8 > > 17 > > 10.21 > > 5.5 > > 7.2 > > 41009 > > 79 > > 2 > > 21 > > 0.218 > > 5.56 > > 4.04 > 41009 79 10 20 12.24 5.3 7.1 > . > . > . > . > " > I extract both min and max according to the codes, and I have 2 dataframes, > one for the minimum values and another for the max values. Both dataframe > have a Code column which is exactly similar. > Can I extract both min and max simultaneously or I have to combine two > dataframes? > I used the rbind and merge function but they did not give the desired > results. >> df3 = merge (df1, df2, by = "Code") > The result of this code adds a second dataframe as columns to the first > one. I want the first row of the second dataframe put below the first row > of the first dataframe and so on. I used a function to do this but it seems > it does not work correctly. > >> fun2 = function(x,y){ > i = 1 > for(i in x) { > if (x[i,1] == y[i,1]){ > rbind(x[i,],y[i,]) > i = i+1 > } > } > } >> fun2(df1, df2) > > Sincerely > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Aug 25, 2022 at 9:08 PM <avi.e.gross at gmail.com> wrote: > >> Yes, Timothy, the request was not seen by all of us as the same. >> >> Indeed if the request was to show a subset of the original data consisting >> of only the rows that were the minimum for each Code and also showed ties, >> then the solution is a tad more complex. I would then do something along >> the >> lines of what others showed such as generating another column showing the >> minimum for each row and then showing only rows that matched their value in >> two columns or whatever was needed. >> >> As noted, keeping the output in a specific order was not initially >> requested. >> >> Keeping the data in some order is a common enough request but in this >> situation, I suspect the order many might want would be the one showing the >> minimums in order, not the codes in the original order. >> >> -----Original Message----- >> From: Ebert,Timothy Aaron <tebert at ufl.edu> >> Sent: Thursday, August 25, 2022 11:59 AM >> To: avi.e.gross at gmail.com >> Cc: R-help at r-project.org >> Subject: RE: [R] Getting minimum value of a column according a factor >> column >> of a dataframe >> >> My assumption (maybe wrong) was that we needed to keep the other variables. >> I want to find the values of Y, M, D, N, and O for the minimum value of Q >> within each unique value of Code, keeping the data in the original order. >> All one need to do is filter Q in the original dataframe by your answer for >> minQ. >> >> Keeping the data in the original order seems unnecessary, but that is what >> was asked in a later post. >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > >