thr3ads.net - R help - [R] Adding SORT to UNIQUE [Dec 2021]


Fox, John

2021-Dec-21 17:28 UTC

[R] Adding SORT to UNIQUE

Dear Jeff,

On 2021-12-21, 11:59 AM, "R-help on behalf of Jeff Newmiller" <r-help-bounces at r-project.org on behalf of jdnewmil at dcn.davis.ca.us> wrote:

    Intuitive, perhaps, but noticeably slower. 

I think that in most applications, one wouldn't notice the difference; for
example:

> D <- data.frame(matrix(rnorm(1000*1e6), 1e6, 1000))

> microbenchmark(D[, 1])
Unit: microseconds
   expr   min    lq    mean median     uq    max neval
 D[, 1] 3.321 3.362 3.98561  3.444 3.5875 51.291   100

> microbenchmark(D[[1]])
Unit: microseconds
   expr   min    lq    mean median     uq    max neval
 D[[1]] 1.722 1.763 1.99137  1.804 1.8655 17.876   100
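A base-R sketch of the same comparison for readers without the microbenchmark package. It is a stand-in, not John's benchmark: the frame is 1,000 by 1,000 (about 8 MB) instead of 1e6 by 1,000 (roughly 8 GB), and timing uses `system.time()` around a loop, so the per-call figures include loop overhead and will vary by machine.

```r
# Smaller stand-in for the frame in the message above (1,000 rows, 1,000 columns).
set.seed(1)
D <- data.frame(matrix(rnorm(1000 * 1000), 1000, 1000))

# For a data frame, both forms return the same numeric vector.
stopifnot(identical(D[, 1], D[[1]]))

n <- 1e4  # number of repeated extractions to time
t_matrix <- system.time(for (i in seq_len(n)) D[, 1])[["elapsed"]]
t_list   <- system.time(for (i in seq_len(n)) D[[1]])[["elapsed"]]

# Approximate microseconds per call (loop overhead included in both).
per_call_us <- c(matrix_style = t_matrix, list_style = t_list) / n * 1e6
print(per_call_us)
```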

Best,
 John


    And it doesn't work on tibbles by design. Data frames are lists of columns.
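Jeff's point that data frames are lists of columns can be checked directly in base R. A minimal sketch with a made-up two-column frame `D`:

```r
# Made-up example frame; stringsAsFactors = FALSE keeps y as character on R < 4.0.
D <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

typeof(D)   # "list": the underlying storage is a list
length(D)   # 2: one list element per column, so length(D) == ncol(D)

# Because it is a list, list tools such as vapply() iterate over the columns.
col_classes <- vapply(D, class, character(1))
col_classes  # named: x = "integer", y = "character"
```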


    On December 21, 2021 8:38:35 AM PST, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
    >On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
    >> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
    >>> Thanks for the reply.
    >>>
    >>> sort(unique(Data[1]))
    >>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) :
    >>>      undefined columns selected
    >> 
    >> That's the wrong syntax:  Data[1] is not "column one of Data".  Use
    >> Data[[1]] for that, so
    >> 
    >>     sort(unique(Data[[1]]))
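A small illustration of Duncan's point, using a made-up one-column `Data` in place of the CSV from the thread: single brackets return a one-column data frame, while double brackets return the column vector that `unique()` and `sort()` expect.

```r
# Hypothetical stand-in for the Data read from Source.csv.
Data <- data.frame(v = c("b", "a", "b", "c"), stringsAsFactors = FALSE)

class(Data[1])    # "data.frame": a one-column sub-data frame
class(Data[[1]])  # "character": the column itself

# unique() then sort() on the column vector gives the distinct values in order.
sorted_v <- sort(unique(Data[[1]]))
sorted_v          # "a" "b" "c"
```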
    >
    >Actually, I'd probably recommend
    >
    >   sort(unique(Data[, 1]))
    >
    >instead.  This treats Data as a matrix rather than as a list. 
    >Dataframes are lists that look like matrices, but to me the matrix 
    >aspect is usually more intuitive.
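Duncan's matrix-style form relies on `[.data.frame` dropping a single selected column to a vector. A sketch with a made-up frame, showing that `drop = FALSE` turns that off. A tibble, by contrast, returns a one-column tibble from `[, 1]`, which is the design choice Jeff refers to.

```r
# Made-up example frame.
Data <- data.frame(a = c(2, 1, 2), b = c("x", "y", "x"), stringsAsFactors = FALSE)

same_vector <- identical(Data[, 1], Data[[1]])  # TRUE: one column is dropped to a vector
kept_frame  <- Data[, 1, drop = FALSE]          # still a one-column data frame

sort(unique(Data[, 1]))                         # 1 2
```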
    >
    >Duncan Murdoch
    >
    >> 
    >> I think Rui already pointed out the typo in the quoted text below...
    >> 
    >> Duncan Murdoch
    >> 
    >>>
    >>> The recommended syntax did not work, as listed above.
    >>>
    >>> What I want is sorted output of the distinct column values. Again, the
    >>> column may be text or numbers. This is a huge analysis effort with data
    >>> coming at me from many different sources.
    >>>
    >>>
    >>> *Stephen Dawson, DSL*
    >>> /Executive Strategy Consultant/
    >>> Business & Technology
    >>> +1 (865) 804-3454
    >>> http://www.shdawson.com <http://www.shdawson.com>
    >>>
    >>>
    >>> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
    >>>> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
    >>>>> Thanks everyone for the replies.
    >>>>>
    >>>>> It is clear one either needs to write a function or put the unique
    >>>>> entries into another dataframe.
    >>>>>
    >>>>> It seems odd R cannot sort a list of unique column entries with ease.
    >>>>> Python and SQL can do it with ease.
    >>>>
    >>>> I've seen several responses that looked pretty simple.  It's hard to
    >>>> beat sort(unique(x)), though there's a fair bit of confusion about
    >>>> what you actually want.  Maybe you should post an example of the code
    >>>> you'd use in Python?
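Applying `sort(unique(x))` to every column at once needs only base R's `lapply()`, because a data frame is a list of columns. A sketch with made-up data standing in for `Data`; on the real file, `Data[1:4]` would restrict it to the four columns from the original post.

```r
# Made-up stand-in for the data read from Source.csv.
Data <- data.frame(
  id    = c(3L, 1L, 2L, 3L),
  grade = c("b", "a", "b", "c"),
  stringsAsFactors = FALSE
)

# One sorted vector of distinct values per column, in a list named by column.
sorted_uniques <- lapply(Data, function(x) sort(unique(x)))
print(sorted_uniques)
```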
    >>>>
    >>>> Duncan Murdoch
    >>>>
    >>>>>
    >>>>> QUESTION
    >>>>> Is there a simpler means other than the unique function to capture
    >>>>> distinct column entries, then sort that list?
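One base-R alternative to calling `unique()` and `sort()` separately is `levels(factor(x))`, since `factor()` builds its levels from the sorted distinct values. A sketch with a made-up numeric vector; the result is always character and `NA` is dropped, so `sort(unique(x))` remains the better choice when the original type matters.

```r
x <- c(10, 2, 1, 2, NA)

levels(factor(x))  # "1" "2" "10": sorted as numbers, then converted to character
sort(unique(x))    # 1 2 10: same order, but still numeric (NA dropped by sort)
```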
    >>>>>
    >>>>>
    >>>>> On 12/20/21 5:53 PM, Rui Barradas wrote:
    >>>>>> Hello,
    >>>>>>
    >>>>>> Inline.
    >>>>>>
    >>>>>> At 21:18 on 20/12/21, Stephen H. Dawson, DSL via R-help wrote:
    >>>>>>> Thanks.
    >>>>>>>
    >>>>>>> sort(unique(Data[[1]]))
    >>>>>>>
    >>>>>>> This syntax provides row numbers, not column values.
    >>>>>>
    >>>>>> This is not right.
    >>>>>> The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
    >>>>>> extracts the column vector.
    >>>>>>
    >>>>>> As for my previous answer, it did not address the question: I
    >>>>>> misinterpreted it as a question about sorting in numeric order
    >>>>>> when the data is not numeric. Here is a, hopefully, complete answer,
    >>>>>> still with package stringr.
    >>>>>>
    >>>>>>
    >>>>>> cols_to_sort <- 1:4
    >>>>>>
    >>>>>> Data2 <- lapply(Data[cols_to_sort], \(x){
    >>>>>>      stringr::str_sort(unique(x), numeric = TRUE)
    >>>>>> })
    >>>>>>
    >>>>>>
    >>>>>> Or using Avi's suggestion of writing a function to do all the work and
    >>>>>> simplify the lapply loop later,
    >>>>>>
    >>>>>>
    >>>>>> unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
    >>>>>> Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
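Rui's version depends on stringr's `str_sort(numeric = TRUE)`. For readers who want to stay in base R, here is a sketch of a hypothetical helper, `sort_unique_mixed()` (not from the thread), that puts entries parseable as numbers first in numeric order and the remaining text after them in alphabetical order. It is not a full natural sort: a value such as "a10" is treated as plain text, so it still sorts before "a2".

```r
# Hypothetical helper: distinct values, numeric-looking ones first in numeric
# order, then the rest sorted as text. NA is dropped, as sort() does by default.
sort_unique_mixed <- function(x) {
  u <- unique(as.character(x))            # factors and numbers become character
  u <- u[!is.na(u)]
  num <- suppressWarnings(as.numeric(u))  # NA where a value is not a number
  is_num <- !is.na(num)
  c(u[is_num][order(num[is_num])], sort(u[!is_num]))
}

x <- c("10", "pear", "2", "apple", "2", NA, "1.5")
sort_unique_mixed(x)  # "1.5" "2" "10" "apple" "pear"
```

With the thread's `Data`, the call would mirror Rui's: `lapply(Data[cols_to_sort], sort_unique_mixed)`.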
    >>>>>>
    >>>>>>
    >>>>>> Hope this helps,
    >>>>>>
    >>>>>> Rui Barradas
    >>>>>>
    >>>>>>
    >>>>>>>
    >>>>>>> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:
    >>>>>>>> Hi,
    >>>>>>>>
    >>>>>>>>
    >>>>>>>> Running a simple syntax set to review entries in dataframe columns.
    >>>>>>>> Here is the working code.
    >>>>>>>>
    >>>>>>>> Data <- read.csv("./input/Source.csv", header = TRUE)
    >>>>>>>> describe(Data)
    >>>>>>>> summary(Data)
    >>>>>>>> unique(Data[1])
    >>>>>>>> unique(Data[2])
    >>>>>>>> unique(Data[3])
    >>>>>>>> unique(Data[4])
    >>>>>>>>
    >>>>>>>> I would like to sort the unique entries. The data in the various
    >>>>>>>> columns are not only numbers but also text. I realize 1 and 10 will
    >>>>>>>> not sort properly, as the column is not defined as a number, but I
    >>>>>>>> want to see what I have in the columns, viewed as sorted.
    >>>>>>>>
    >>>>>>>> QUESTION
    >>>>>>>> What is the best process to sort unique output, please?
    >>>>>>>>
    >>>>>>>>
    >>>>>>>> Thanks.
    >>>>>>>
    >>>>>>> ______________________________________________
    >>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
    >>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
    >>>>>>> PLEASE do read the posting guide
    >>>>>>> http://www.R-project.org/posting-guide.html
    >>>>>>> and provide commented, minimal, self-contained, reproducible code.
    >>>>>>
    >>>>>
    >>>>
    >>>>
    >>>
    >>>
    >>
    >

    -- 
    Sent from my phone. Please excuse my brevity.


Jeff Newmiller

2021-Dec-21 17:58 UTC


[R] Adding SORT to UNIQUE

When your brain is wired to treat a data frame like a matrix, then you think
things like

for ( col in colnames( D ) ) {
  idx <- expr
  D[ idx, col ] <- otherexpr
}

are reasonable, when

for ( col in colnames( D ) ) {
  idx <- expr
  D[[ col ]][ idx ] <- otherexpr
}

actually runs significantly faster.
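A runnable sketch of Jeff's two loops. `expr` and `otherexpr` are placeholders in his message; here they are replaced by a hypothetical choice (find entries above 0.5 and set them to 0) on a made-up numeric frame, so the two styles can be checked for equal results and timed locally. Timings will vary with R version and data size.

```r
set.seed(42)
D1 <- data.frame(matrix(runif(1e5 * 20), 1e5, 20))  # made-up numeric data
D2 <- D1

# Matrix-style: rows first, then the column, via `[<-.data.frame`.
t_matrix <- system.time(
  for (col in colnames(D1)) {
    idx <- D1[[col]] > 0.5   # hypothetical stand-in for `expr`
    D1[idx, col] <- 0        # hypothetical stand-in for `otherexpr`
  }
)

# List-style: assign into one column vector at a time.
t_list <- system.time(
  for (col in colnames(D2)) {
    idx <- D2[[col]] > 0.5
    D2[[col]][idx] <- 0
  }
)

same_result <- isTRUE(all.equal(D1, D2))  # both styles should agree
print(c(matrix_style = t_matrix[["elapsed"]], list_style = t_list[["elapsed"]]))
```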


