Dear Jeff,
On 2021-12-21, 11:59 AM, "R-help on behalf of Jeff Newmiller" <r-help-bounces at r-project.org on behalf of jdnewmil at dcn.davis.ca.us> wrote:
Intuitive, perhaps, but noticeably slower.
I think that in most applications, one wouldn't notice the difference; for
example:
> D <- data.frame(matrix(rnorm(1000*1e6), 1e6, 1000))
> microbenchmark(D[, 1])
Unit: microseconds
expr min lq mean median uq max neval
D[, 1] 3.321 3.362 3.98561 3.444 3.5875 51.291 100
> microbenchmark(D[[1]])
Unit: microseconds
expr min lq mean median uq max neval
D[[1]] 1.722 1.763 1.99137 1.804 1.8655 17.876 100
Best,
John
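For readers without the microbenchmark package, here is a rough base-R timing sketch of the same comparison. The data size (100 rows by 1000 columns) and the repetition count are arbitrary assumptions chosen to keep it quick, not John's setup. Absolute timings depend on the machine, so it only reports per-call averages and checks that both forms return the same vector.

```r
# Base-R timing sketch; the data size and n_reps are arbitrary assumptions.
set.seed(1)
D <- data.frame(matrix(rnorm(100 * 1000), nrow = 100, ncol = 1000))

n_reps <- 10000
# system.time() returns user/system/elapsed times; keep the wall-clock part
t_matrix_style <- system.time(for (i in seq_len(n_reps)) D[, 1])[["elapsed"]]
t_list_style   <- system.time(for (i in seq_len(n_reps)) D[[1]])[["elapsed"]]

# Both forms extract the same column vector
stopifnot(identical(D[, 1], D[[1]]))

cat(sprintf("D[, 1]: %.2f microseconds per call\n", 1e6 * t_matrix_style / n_reps))
cat(sprintf("D[[1]]: %.2f microseconds per call\n", 1e6 * t_list_style / n_reps))
```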
And it doesn't work on tibbles, by design: on a tibble, D[, 1] returns a
one-column tibble rather than a vector. Data frames are lists of columns.
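To illustrate the point that data frames are lists of columns, here is a minimal base-R sketch on a made-up two-column data frame (the names id and code are hypothetical). It checks the list structure and uses lapply(), which walks a list element by element, to visit each column.

```r
# Made-up data frame; stringsAsFactors = FALSE keeps text as character
df <- data.frame(id = c(3, 1, 2, 1), code = c("b", "a", "c", "a"),
                 stringsAsFactors = FALSE)

is.list(df)               # TRUE: a data frame is a list ...
length(df) == ncol(df)    # TRUE: ... with one element per column
names(df)                 # the list names are the column names

# lapply() walks a list element by element, so here it visits each column
col_classes <- lapply(df, class)
str(col_classes)
```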
On December 21, 2021 8:38:35 AM PST, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>On 21/12/2021 11:31 a.m., Duncan Murdoch wrote:
>> On 21/12/2021 11:20 a.m., Stephen H. Dawson, DSL wrote:
>>> Thanks for the reply.
>>>
>>> sort(unique(Data[1]))
>>> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) :
>>>   undefined columns selected
>>
>> That's the wrong syntax: Data[1] is not "column one of Data". Use
>> Data[[1]] for that, so
>>
>> sort(unique(Data[[1]]))
>
>Actually, I'd probably recommend
>
> sort(unique(Data[, 1]))
>
>instead. This treats Data as a matrix rather than as a list.
>Dataframes are lists that look like matrices, but to me the matrix
>aspect is usually more intuitive.
>
>Duncan Murdoch
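The three extraction forms discussed above can be compared directly. This is a sketch on a small made-up data frame (the column names x and y are hypothetical). It shows which forms return a vector, which return a data frame, and how drop = FALSE stops matrix-style indexing from simplifying to a vector.

```r
# Hypothetical example data; stringsAsFactors = FALSE keeps x as character
Data <- data.frame(x = c("10", "2", "1", "2"), y = 1:4,
                   stringsAsFactors = FALSE)

a <- Data[1]      # list-style single bracket: a one-column data frame
b <- Data[[1]]    # list-style double bracket: the column vector itself
m <- Data[, 1]    # matrix-style: a single selected column is dropped to a vector

class(a)          # "data.frame"
class(b)          # "character"
identical(b, m)   # TRUE

# drop = FALSE keeps the data-frame shape in matrix-style indexing
class(Data[, 1, drop = FALSE])   # "data.frame"

# Either vector form can be passed to sort(unique(...))
sort(unique(m))
```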
>
>>
>> I think Rui already pointed out the typo in the quoted text below...
>>
>> Duncan Murdoch
>>
>>>
>>> The recommended syntax did not work, as listed above.
>>>
>>> What I want is the sorted list of distinct values in each column. Again,
>>> a column may be text or numbers. This is a huge analysis effort, with data
>>> coming at me from many different sources.
>>>
>>>
>>> *Stephen Dawson, DSL*
>>> /Executive Strategy Consultant/
>>> Business & Technology
>>> +1 (865) 804-3454
>>> http://www.shdawson.com <http://www.shdawson.com>
>>>
>>>
>>> On 12/21/21 11:07 AM, Duncan Murdoch wrote:
>>>> On 21/12/2021 10:16 a.m., Stephen H. Dawson, DSL via R-help wrote:
>>>>> Thanks everyone for the replies.
>>>>>
>>>>> It is clear one either needs to write a function or put the unique
>>>>> entries into another dataframe.
>>>>>
>>>>> It seems odd that R cannot sort a list of unique column entries with
>>>>> ease. Python and SQL can do it with ease.
>>>>
>>>> I've seen several responses that looked pretty simple. It's hard to
>>>> beat sort(unique(x)), though there's a fair bit of confusion about
>>>> what you actually want. Maybe you should post an example of the code
>>>> you'd use in Python?
>>>>
>>>> Duncan Murdoch
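Here is a small sketch of sort(unique(x)) on a made-up character vector. One detail worth knowing: sort() drops NA values by default (na.last = NA), so a column's missing values disappear from the result unless na.last = TRUE is given.

```r
# Made-up column values with a duplicate and a missing value
x <- c("pear", NA, "apple", "pear", "fig")

sort(unique(x))                   # "apple" "fig" "pear" (NA dropped: na.last = NA)
sort(unique(x), na.last = TRUE)   # "apple" "fig" "pear" NA
```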
>>>>
>>>>>
>>>>> QUESTION
>>>>> Is there a simpler means than the unique function to capture distinct
>>>>> column entries and then sort that list?
>>>>>
>>>>> On 12/20/21 5:53 PM, Rui Barradas wrote:
>>>>>> Hello,
>>>>>>
>>>>>> Inline.
>>>>>>
>>>>>> At 21:18 on 20/12/21, Stephen H. Dawson, DSL via R-help wrote:
>>>>>>> Thanks.
>>>>>>>
>>>>>>> sort(unique(Data[[1]]))
>>>>>>>
>>>>>>> This syntax provides row numbers, not column values.
>>>>>>
>>>>>> This is not right.
>>>>>> The syntax Data[1] extracts a sub-data.frame; the syntax Data[[1]]
>>>>>> extracts the column vector.
>>>>>>
>>>>>> As for my previous answer, it did not address the question: I
>>>>>> misinterpreted it as a question about how to sort in numeric order
>>>>>> when the data are not numeric. Here is, hopefully, a complete answer,
>>>>>> still using package stringr.
>>>>>>
>>>>>>
>>>>>> cols_to_sort <- 1:4
>>>>>>
>>>>>> Data2 <- lapply(Data[cols_to_sort], \(x) {
>>>>>>   stringr::str_sort(unique(x), numeric = TRUE)
>>>>>> })
>>>>>>
>>>>>>
>>>>>> Or, using Avi's suggestion of writing a function to do all the work and
>>>>>> simplify the lapply loop later,
>>>>>>
>>>>>>
>>>>>> unisort2 <- function(vec, ...) stringr::str_sort(unique(vec), ...)
>>>>>> Data2 <- lapply(Data[cols_to_sort], unisort2, numeric = TRUE)
>>>>>>
>>>>>>
>>>>>> Hope this helps,
>>>>>>
>>>>>> Rui Barradas
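Rui's version relies on the stringr package. As a base-R alternative, not a reproduction of stringr::str_sort(numeric = TRUE), here is a sketch built on a hypothetical helper unisort_base(). It sorts a column's distinct values numerically when every non-missing value parses as a number, and as text otherwise. Unlike str_sort(numeric = TRUE), it does not compare digit runs inside mixed strings (e.g. "a2" vs "a10") numerically. The example data frame is made up.

```r
# Hypothetical helper (not from any package): distinct values of a column,
# sorted numerically if every non-missing value parses as a number, otherwise
# sorted as text. Missing values are dropped, as sort() does by default.
unisort_base <- function(vec) {
  if (is.numeric(vec)) return(sort(unique(vec)))   # already numeric
  u <- unique(as.character(vec))                    # text or factor -> character
  num <- suppressWarnings(as.numeric(u))            # NA where not numeric-like
  if (all(is.na(num) == is.na(u))) {
    u[order(num, na.last = NA)]                     # all numeric-like: numeric order
  } else {
    sort(u)                                         # otherwise: character order
  }
}

# Made-up data standing in for the CSV columns
Data <- data.frame(
  a = c("10", "2", "1", "2"),
  b = c("pear", "apple", "pear", "fig"),
  c = c(3.5, 1, 3.5, 2),
  stringsAsFactors = FALSE
)

cols_to_sort <- 1:3
Data2 <- lapply(Data[cols_to_sort], unisort_base)
str(Data2)
```

Returning a list (rather than a data frame) matters here, because each column can have a different number of distinct values.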
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On 12/20/21 11:58 AM, Stephen H. Dawson, DSL via R-help wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>
>>>>>>>> I am running a simple set of commands to review entries in data frame
>>>>>>>> columns. Here is the working code.
>>>>>>>>
>>>>>>>> Data <- read.csv("./input/Source.csv", header = TRUE)
>>>>>>>> describe(Data)
>>>>>>>> summary(Data)
>>>>>>>> unique(Data[1])
>>>>>>>> unique(Data[2])
>>>>>>>> unique(Data[3])
>>>>>>>> unique(Data[4])
>>>>>>>>
>>>>>>>> I would like to sort the unique entries. The data in the various
>>>>>>>> columns are not only numbers but also text. I realize 1 and 10 will
>>>>>>>> not sort properly, as the column is not defined as a number, but I
>>>>>>>> want to see what I have in the columns viewed as sorted.
>>>>>>>>
>>>>>>>> QUESTION
>>>>>>>> What is the best process to sort unique output, please?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks.
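The 1-versus-10 problem described above comes from character comparison. Here is a minimal sketch with a made-up vector of numbers stored as text: sort() orders the strings character by character, while ordering by as.numeric() gives numeric order. Note that as.numeric() turns non-numeric text into NA with a warning, so this one-liner only suits columns whose values are all numeric-like.

```r
# Made-up values stored as text
x <- c("10", "2", "1", "2", "100")

u <- unique(x)
sort(u)                    # "1" "10" "100" "2": character-by-character order
u[order(as.numeric(u))]    # "1" "2" "10" "100": numeric order
```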
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
--
Sent from my phone. Please excuse my brevity.
When your brain is wired to treat a data frame like a matrix, you think
things like

for ( col in colnames( D ) ) {
  idx <- expr
  D[ idx, col ] <- otherexpr
}

are reasonable, when

for ( col in colnames( D ) ) {
  idx <- expr
  D[[ col ]][ idx ] <- otherexpr
}

actually runs significantly faster.
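Here is a runnable version of the two loops, with assumptions filling the placeholders: expr is taken to be "rows where the column is negative" and otherexpr to be 0, and the data frame is made up. The sketch only checks that both forms produce the same columns. It does not measure speed.

```r
# Made-up data; every column has at least one negative value
D <- data.frame(a = c(1, -2, 3), b = c(-1, 5, -6), c = c(0, 2, -3))

# Assumed placeholders: expr = "rows where the column is negative",
# otherexpr = 0
D_matrix <- D
for (col in colnames(D_matrix)) {
  idx <- D_matrix[[col]] < 0
  D_matrix[idx, col] <- 0          # matrix-style: D[rows, column]
}

D_list <- D
for (col in colnames(D_list)) {
  idx <- D_list[[col]] < 0
  D_list[[col]][idx] <- 0          # list-style: update the column vector
}

# Compare the columns (as.list() drops the row-name attribute)
stopifnot(identical(as.list(D_matrix), as.list(D_list)))
D_list
```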
On December 21, 2021 9:28:52 AM PST, "Fox, John" <jfox at mcmaster.ca> wrote:
--
Sent from my phone. Please excuse my brevity.