thr3ads.net - R help - [R] Adding SORT to UNIQUE [Dec 2021]

Duncan Murdoch

2021-Dec-21 17:53 UTC

[R] Adding SORT to UNIQUE

On 21/12/2021 12:29 p.m., Jeff Newmiller wrote:
> It is a very rational choice, not a design flaw. I don't like every
> choice they have made for that class, but this one is very solid, and treating
> data frames as lists of columns consistently helps all of us.

I think outlawing matrix notation is a really bad idea.  It makes code
harder to read, and makes it much harder to switch to matrices, which
sometimes gives a huge speed boost to code.

For example, John Fox posted an example showing that operations on
whole columns of dataframes are about twice as fast using list notation
as using matrix notation.  But for operating on whole rows, matrices are
about 100 times faster than dataframes.  You shouldn't use notation that
makes the switch to matrices more difficult.

Duncan Murdoch
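
A rough way to check the column and row claims on your own machine is sketched below. The 1000 x 50 matrix, the helper names (col_list, col_matrix, row_df, row_mat) and the repeat counts are made up for illustration; this is not John Fox's original example, and the timings depend on the machine and the R version.

```r
# Hypothetical data: 1000 rows x 50 numeric columns (not from the thread).
set.seed(1)
m  <- matrix(rnorm(1000 * 50), nrow = 1000)
df <- as.data.frame(m)

# Whole-column access on a data frame: list notation vs matrix notation.
col_list   <- function(d) for (j in seq_along(d)) d[[j]]
col_matrix <- function(d) for (j in seq_len(ncol(d))) d[, j]

# Whole-row access: data frame vs true matrix, same notation for both.
row_df  <- function(d) for (i in seq_len(nrow(d))) d[i, ]
row_mat <- function(x) for (i in seq_len(nrow(x))) x[i, ]

# Elapsed seconds for each variant; no particular ratio is implied here.
elapsed <- c(
  col_list   = system.time(for (k in 1:20) col_list(df))[["elapsed"]],
  col_matrix = system.time(for (k in 1:20) col_matrix(df))[["elapsed"]],
  row_df     = system.time(row_df(df))[["elapsed"]],
  row_mat    = system.time(row_mat(m))[["elapsed"]]
)
print(elapsed)

# Both notations return the same values; only the cost differs.
same_column <- identical(df[[3]], df[, 3])
same_row    <- isTRUE(all.equal(unlist(df[7, ], use.names = FALSE), m[7, ]))
```

Because row_df and row_mat use the same `x[i, ]` notation, moving from the data frame to the matrix needs no change to the loop body.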
> 
> On December 21, 2021 9:02:56 AM PST, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>> On 21/12/2021 11:59 a.m., Jeff Newmiller wrote:
>>> Intuitive, perhaps, but noticeably slower. And it doesn't work
>>> on tibbles by design. Data frames are lists of columns.
>>
>> That's just one of the design flaws in tibbles, but not the worst one.
>>
>> Duncan Murdoch
>>
>>>
>>> On December 21, 2021 8:38:35 AM PST, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>>>> On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
>>>>> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
>>>>>> Thanks for the reply.
>>>>>>
>>>>>> sort(unique(Data[1]))
>>>>>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) :
>>>>>>   undefined columns selected
>>>>>
>>>>> That's the wrong syntax:  Data[1] is not "column one of Data".  Use
>>>>> Data[[1]] for that, so
>>>>>
>>>>>       sort(unique(Data[[1]]))
>>>>
>>>> Actually, I'd probably recommend
>>>>
>>>>     sort(unique(Data[, 1]))
>>>>
>>>> instead.  This treats Data as a matrix rather than as a list.
>>>> Dataframes are lists that look like matrices, but to me the matrix
>>>> aspect is usually more intuitive.
>>>>
>>>> Duncan Murdoch
>>>>
>>>>>
>>>>> I think Rui already pointed out the typo in the quoted text below...
>>>>>
>>>>> Duncan Murdoch
>>>>>
>>>>>>
>>>>>> The recommended syntax did not work, as listed above.
>>>>>>
>>>>>> What I want is the sort of distinct column output. Again, the column may
>>>>>> be text or numbers. This is a huge analysis effort with data coming at
>>>>>> me from many different sources.
>>>>>>
>>>>>>
>>>>>> *Stephen Dawson, DSL*
>>>>>> /Executive Strategy Consultant/
>>>>>> Business & Technology
>>>>>> +1 (865) 804-3454
>>>>>> http://www.shdawson.com <http://www.shdawson.com>
>>>>>>
>>>>>>
>>>>>> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
>>>>>>> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>> Thanks everyone for the replies.
>>>>>>>>
>>>>>>>> It is clear one either needs to write a function or put the unique
>>>>>>>> entries into another dataframe.
>>>>>>>>
>>>>>>>> It seems odd R cannot sort a list of unique column entries with ease.
>>>>>>>> Python and SQL can do it with ease.
>>>>>>>
>>>>>>> I've seen several responses that looked pretty simple.  It's hard to
>>>>>>> beat sort(unique(x)), though there's a fair bit of confusion about
>>>>>>> what you actually want.  Maybe you should post an example of the code
>>>>>>> you'd use in Python?
>>>>>>>
>>>>>>> Duncan Murdoch
>>>>>>>
>>>>>>>>
>>>>>>>> QUESTION
>>>>>>>> Is there a simpler means, other than the unique function, to capture
>>>>>>>> distinct column entries and then sort that list?
>>>>>>>>
>>>>>>>>
>>>>>>>> *Stephen Dawson, DSL*
>>>>>>>> /Executive Strategy Consultant/
>>>>>>>> Business & Technology
>>>>>>>> +1 (865) 804-3454
>>>>>>>> http://www.shdawson.com <http://www.shdawson.com>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 12/20/21 5:53 PM, Rui Barradas wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Inline.
>>>>>>>>>
>>>>>>>>> At 21:18 on 20/12/21, Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> sort(unique(Data[[1]]))
>>>>>>>>>>
>>>>>>>>>> This syntax provides row numbers, not column values.
>>>>>>>>>
>>>>>>>>> This is not right.
>>>>>>>>> The syntax Data[1] extracts a sub-data.frame; the syntax Data[[1]]
>>>>>>>>> extracts the column vector.
>>>>>>>>>
>>>>>>>>> As for my previous answer, it was not addressing the question; I
>>>>>>>>> misinterpreted it as being a question on how to sort by numeric order
>>>>>>>>> when the data is not numeric. Here is a, hopefully, complete answer.
>>>>>>>>> Still with package stringr.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> cols_to_sort <- 1:4
>>>>>>>>>
>>>>>>>>> Data2 <- lapply(Data[cols_to_sort], \(x){
>>>>>>>>>   stringr::str_sort(unique(x), numeric = TRUE)
>>>>>>>>> })
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Or using Avi's suggestion of writing a function to do all the work and
>>>>>>>>> simplify the lapply loop later,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
>>>>>>>>> Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hope this helps,
>>>>>>>>>
>>>>>>>>> Rui Barradas
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Stephen Dawson, DSL*
>>>>>>>>>> /Executive Strategy Consultant/
>>>>>>>>>> Business & Technology
>>>>>>>>>> +1 (865) 804-3454
>>>>>>>>>> http://www.shdawson.com <http://www.shdawson.com>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Running a simple syntax set to review entries in dataframe columns.
>>>>>>>>>>> Here is the working code.
>>>>>>>>>>>
>>>>>>>>>>> Data <- read.csv("./input/Source.csv", header = TRUE)
>>>>>>>>>>> describe(Data)
>>>>>>>>>>> summary(Data)
>>>>>>>>>>> unique(Data[1])
>>>>>>>>>>> unique(Data[2])
>>>>>>>>>>> unique(Data[3])
>>>>>>>>>>> unique(Data[4])
>>>>>>>>>>>
>>>>>>>>>>> I would like to sort the unique entries. The data in the various
>>>>>>>>>>> columns are not all defined as numbers; some are text. I realize 1 and
>>>>>>>>>>> 10 will not sort properly, as the column is not defined as a number,
>>>>>>>>>>> but want to see what I have in the columns viewed as sorted.
>>>>>>>>>>>
>>>>>>>>>>> QUESTION
>>>>>>>>>>> What is the best process to sort unique output, please?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ______________________________________________
>>>>>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
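
The recipe discussed in the quoted thread, sort(unique(x)) applied to one column or to every column, can be sketched as follows. The small data frame is a made-up stand-in, since the poster's CSV is not shown, and only base R is used here; stringr::str_sort(..., numeric = TRUE), as suggested above, is one way to get numeric-aware ordering of text.

```r
# Hypothetical stand-in for the poster's Data (the real CSV is not in the thread).
Data <- data.frame(
  id   = c("10", "2", "1", "2", "10"),
  city = c("Oslo", "Bern", "Oslo", "Rome", "Bern"),
  n    = c(3, 1, 3, 2, 1),
  stringsAsFactors = FALSE
)

# Data[1] is a one-column data frame, not a vector; the thread shows
# sort(unique(Data[1])) failing with "undefined columns selected".
is_df <- is.data.frame(Data[1])

# Data[[1]] (list notation) and Data[, 1] (matrix notation) both return
# the column as a plain vector, so sort(unique()) works on either.
by_list   <- sort(unique(Data[[1]]))
by_matrix <- sort(unique(Data[, 1]))

# Text sorts character by character; convert first if numeric order is wanted.
as_numbers <- sort(unique(as.numeric(Data[[1]])))

# The same recipe for every column; the result is a list of sorted vectors.
sorted_unique <- lapply(Data, function(x) sort(unique(x)))
print(sorted_unique)
```

The result is a list rather than a data frame because each column can have a different number of distinct values.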

Duncan Murdoch

2021-Dec-21 18:09 UTC

[R] Adding SORT to UNIQUE

On 21/12/2021 12:53 p.m., Duncan Murdoch wrote:
> On 21/12/2021 12:29 p.m., Jeff Newmiller wrote:
>> It is a very rational choice, not a design flaw. I don't like every
>> choice they have made for that class, but this one is very solid, and treating
>> data frames as lists of columns consistently helps all of us.
>
> I think outlawing matrix notation is a really bad idea.  It makes code
> harder to read, and makes it much harder to switch to matrices, which
> sometimes gives a huge speed boost to code.
>
> For example, John Fox posted an example that showed that operations on
> whole columns of dataframes is about twice as fast using list notation
> as using matrix notation.  But for operating on whole rows,

... or on individual elements ...

> matrices are about 100 times faster than dataframes.  You shouldn't use
> notation that makes the switch to matrices more difficult.
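
The element-wise case can be tried with a sketch like the one below. The 200 x 20 size and the helper name sum_cells are assumptions chosen for illustration, and the measured times will differ between machines.

```r
# Hypothetical data: the same values held as a matrix and as a data frame.
set.seed(2)
m  <- matrix(runif(200 * 20), nrow = 200)
df <- as.data.frame(m)

# Sum every cell by indexing one element at a time with matrix notation.
sum_cells <- function(x) {
  total <- 0
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(ncol(x))) {
      total <- total + x[i, j]
    }
  }
  total
}

# Elapsed seconds for each container; the sums are kept for comparison.
elapsed <- c(
  data_frame = system.time(s_df  <- sum_cells(df))[["elapsed"]],
  matrix     = system.time(s_mat <- sum_cells(m))[["elapsed"]]
)
print(elapsed)

# The function runs unchanged on both objects and gives the same total.
results_agree <- isTRUE(all.equal(s_df, s_mat))
```

Since sum_cells only uses `x[i, j]`, switching the input from a data frame to a matrix requires no edits to the function.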
> 
> Duncan Murdoch
> 
