On 21/12/2021 11:59 a.m., Jeff Newmiller wrote:
> Intuitive, perhaps, but noticeably slower. And it doesn't work on
> tibbles by design. Data frames are lists of columns.

That's just one of the design flaws in tibbles, but not the worst one.

Duncan Murdoch

>
> On December 21, 2021 8:38:35 AM PST, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>> On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
>>> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
>>>> Thanks for the reply.
>>>>
>>>> sort(unique(Data[1]))
>>>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
>>>> decreasing)) :
>>>>   undefined columns selected
>>>
>>> That's the wrong syntax: Data[1] is not "column one of Data". Use
>>> Data[[1]] for that, so
>>>
>>> sort(unique(Data[[1]]))
>>
>> Actually, I'd probably recommend
>>
>> sort(unique(Data[, 1]))
>>
>> instead. This treats Data as a matrix rather than as a list.
>> Dataframes are lists that look like matrices, but to me the matrix
>> aspect is usually more intuitive.
>>
>> Duncan Murdoch
>>
>>>
>>> I think Rui already pointed out the typo in the quoted text below...
>>>
>>> Duncan Murdoch
>>>
>>>>
>>>> The recommended syntax did not work, as listed above.
>>>>
>>>> What I want is the sort of distinct column output. Again, the column may
>>>> be text or numbers. This is a huge analysis effort with data coming at
>>>> me from many different sources.
>>>>
>>>>
>>>> *Stephen Dawson, DSL*
>>>> /Executive Strategy Consultant/
>>>> Business & Technology
>>>> +1 (865) 804-3454
>>>> http://www.shdawson.com <http://www.shdawson.com>
>>>>
>>>>
>>>> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
>>>>> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
>>>>>> Thanks everyone for the replies.
>>>>>>
>>>>>> It is clear one either needs to write a function or put the unique
>>>>>> entries into another dataframe.
>>>>>>
>>>>>> It seems odd R cannot sort a list of unique column entries with ease.
>>>>>> Python and SQL can do it with ease.
>>>>>
>>>>> I've seen several responses that looked pretty simple. It's hard to
>>>>> beat sort(unique(x)), though there's a fair bit of confusion about
>>>>> what you actually want. Maybe you should post an example of the code
>>>>> you'd use in Python?
>>>>>
>>>>> Duncan Murdoch
>>>>>
>>>>>>
>>>>>> QUESTION
>>>>>> Is there a simpler means other than the unique function to capture
>>>>>> distinct column entries, then sort that list?
>>>>>>
>>>>>>
>>>>>> On 12/20/21 5:53 PM, Rui Barradas wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> Inline.
>>>>>>>
>>>>>>> At 21:18 on 20/12/21, Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> sort(unique(Data[[1]]))
>>>>>>>>
>>>>>>>> This syntax provides row numbers, not column values.
>>>>>>>
>>>>>>> This is not right.
>>>>>>> The syntax Data[1] extracts a sub-data.frame, the syntax Data[[1]]
>>>>>>> extracts the column vector.
>>>>>>>
>>>>>>> As for my previous answer, it was not addressing the question, I
>>>>>>> misinterpreted it as being a question on how to sort by numeric order
>>>>>>> when the data is not numeric. Here is a, hopefully, complete answer.
>>>>>>> Still with package stringr.
>>>>>>>
>>>>>>>
>>>>>>> cols_to_sort <- 1:4
>>>>>>>
>>>>>>> Data2 <- lapply(Data[cols_to_sort], \(x){
>>>>>>>   stringr::str_sort(unique(x), numeric = TRUE)
>>>>>>> })
>>>>>>>
>>>>>>>
>>>>>>> Or using Avi's suggestion of writing a function to do all the work and
>>>>>>> simplify the lapply loop later,
>>>>>>>
>>>>>>>
>>>>>>> unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
>>>>>>> Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
>>>>>>>
>>>>>>>
>>>>>>> Hope this helps,
>>>>>>>
>>>>>>> Rui Barradas
>>>>>>>
>>>>>>>
>>>>>>>> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Running a simple syntax set to review entries in dataframe columns.
>>>>>>>>> Here is the working code.
>>>>>>>>>
>>>>>>>>> Data <- read.csv("./input/Source.csv", header = TRUE)
>>>>>>>>> describe(Data)
>>>>>>>>> summary(Data)
>>>>>>>>> unique(Data[1])
>>>>>>>>> unique(Data[2])
>>>>>>>>> unique(Data[3])
>>>>>>>>> unique(Data[4])
>>>>>>>>>
>>>>>>>>> I would like to sort the unique entries. The data in the various
>>>>>>>>> columns are not only numbers, but also text. I realize 1 and
>>>>>>>>> 10 will not sort properly, as the column is not defined as a number,
>>>>>>>>> but want to see what I have in the columns viewed as sorted.
>>>>>>>>>
>>>>>>>>> QUESTION
>>>>>>>>> What is the best process to sort unique output, please?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> ______________________________________________
>>>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>> PLEASE do read the posting guide
>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
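
The three indexing forms debated in this message can be compared side by side on a small example. This is only a sketch: the data frame and its column names are invented as a stand-in for the poster's Source.csv, which is not shown in the thread.

```r
# Hypothetical stand-in for the poster's CSV; column names are made up.
Data <- data.frame(
  id    = c("10", "2", "1", "2"),
  fruit = c("pear", "apple", "pear", "fig"),
  stringsAsFactors = FALSE
)

one_col_df <- Data[1]     # list-style single bracket: still a data frame
as_list    <- Data[[1]]   # list-style double bracket: the column vector
as_matrix  <- Data[, 1]   # matrix-style: also the column vector

print(is.data.frame(one_col_df))      # TRUE, which is why sort() fails on it
print(identical(as_list, as_matrix))  # TRUE

# Text columns sort as text, so "10" comes before "2".
sorted_ids <- sort(unique(Data[[1]]))
print(sorted_ids)                     # "1" "10" "2"

# Converting first gives numeric order, if the column really holds numbers.
sorted_ids_num <- sort(unique(as.numeric(Data[[1]])))
print(sorted_ids_num)                 # 1 2 10

# Sorted distinct values for every column at once, as a named list.
sorted_all <- lapply(Data, function(x) sort(unique(x)))
print(sorted_all)
```

The result of lapply() is a list rather than a data frame because each column can have a different number of distinct values.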
It is a very rational choice, not a design flaw. I don't like every
choice they have made for that class, but this one is very solid, and
treating data frames as lists of columns consistently helps all of us.

On December 21, 2021 9:02:56 AM PST, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>On 21/12/2021 11:59 a.m., Jeff Newmiller wrote:
>> Intuitive, perhaps, but noticeably slower. And it doesn't work on
>> tibbles by design. Data frames are lists of columns.
>
>That's just one of the design flaws in tibbles, but not the worst one.
>
>Duncan Murdoch

-- 
Sent from my phone. Please excuse my brevity.
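
The difference under discussion can be seen on a toy example. This is a hedged sketch, assuming the tibble package is installed; the object names df and tb and the column x are invented for illustration.

```r
# Runs only if the tibble package is available.
if (requireNamespace("tibble", quietly = TRUE)) {
  df <- data.frame(x = c("b", "a", "b"), stringsAsFactors = FALSE)
  tb <- tibble::as_tibble(df)

  # Matrix-style indexing drops to a vector for a plain data frame...
  df_matrix_style <- df[, 1]
  # ...but stays a one-column tibble for a tibble.
  tb_matrix_style <- tb[, 1]

  # List-style indexing returns the column vector in both cases.
  df_list_style <- df[[1]]
  tb_list_style <- tb[[1]]

  print(is.character(df_matrix_style))             # TRUE
  print(is.data.frame(tb_matrix_style))            # TRUE
  print(identical(df_list_style, tb_list_style))   # TRUE

  # So the list-style form works the same way on both classes.
  sorted_tb <- sort(unique(tb[[1]]))
  print(sorted_tb)                                 # "a" "b"
}
```

Code that has to accept either class is therefore safer written with [[ ]] than with [, ].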
Duncan,

Let's not get into the trouble with tibbles when the question was
how to do things in more native R.

In practice, tibbles used within the tidyverse often rely on somewhat
different ways to select the columns you want, including some quite
sophisticated ones like:

select(mydf, wed:fri, ends_with(".xyz"), everything())

So columns are often not selected by number, although you can do that
too. What you are talking about is the [] notation, which is often not
needed because you use verbs like filter and select independently.

I often find it far more intuitive to solve things the dplyr way, but I
agree you sometimes want to convert tibbles back to data.frames before
using base R techniques on them.
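
For comparison, here is a rough sketch of what the dplyr route to sorted distinct values might look like. It assumes the dplyr package is installed; mydf and all of its columns are made up to match the select() call above and do not come from the original poster's data.

```r
# Runs only if the dplyr package is available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  # Hypothetical data; column names chosen to fit the select() call above.
  mydf <- data.frame(
    id      = c("10", "2", "1", "2"),
    wed     = c(1, 2, 3, 4),
    thu     = c(5, 6, 7, 8),
    fri     = c(9, 10, 11, 12),
    tot.xyz = c(15, 18, 21, 24),
    stringsAsFactors = FALSE
  )

  # Select a range of columns by name, then by suffix, then the rest.
  reordered <- dplyr::select(
    mydf, wed:fri, dplyr::ends_with(".xyz"), dplyr::everything()
  )
  print(names(reordered))   # "wed" "thu" "fri" "tot.xyz" "id"

  # Distinct values of one column, sorted, pulled out as a plain vector.
  distinct_ids <- dplyr::distinct(mydf, id)
  sorted_ids   <- dplyr::pull(dplyr::arrange(distinct_ids, id), id)
  print(sorted_ids)         # "1" "10" "2" (text order, since id is character)
}
```

The base R equivalent of the last step is still sort(unique(mydf$id)).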
-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Duncan Murdoch
Sent: Tuesday, December 21, 2021 12:03 PM
To: Jeff Newmiller <jdnewmil at dcn.davis.ca.us>; r-help at r-project.org;
service at shdawson.com; Rui Barradas <ruipbarradas at sapo.pt>
Subject: Re: [R] Adding SORT to UNIQUE
On 21/12/2021 11:59 a.m., Jeff Newmiller wrote:
> Intuitive, perhaps, but noticeably slower. And it doesn't work on
> tibbles by design. Data frames are lists of columns.

That's just one of the design flaws in tibbles, but not the worst one.
Duncan Murdoch