thr3ads.net - R help - [R] dplyr's arrange function [Jun 2016]

Muhuri, Pradip (AHRQ/CFACT)

2016-Jun-15 21:08 UTC

[R] dplyr's arrange function

Hello,

I am using dplyr's arrange() function to sort one of many data frames on a
character variable (named "prevalence").

Issue: I am not getting the desired output (line 7 is the problem; it should
be the very last line in the sorted data frame) because the sorted field is
character, not numeric.

The reproducible example and the output are appended below.

Is there any workaround to convert/treat this character variable (named
"prevalence" in the data frame below) as numeric before using the arrange()
function within the dplyr package?

Any hints will be appreciated.

Thanks,

Pradip Muhuri

# Reproducible Example 

library("readr")
library("dplyr")  # provides arrange() and desc()
testdata <- read_csv(
"indicator,  prevalence
1. Health check-up, 77.2 (1.19)
2. Blood cholesterol checked,  84.5 (1.14)
3. Recieved flu vaccine, 50.0 (1.33)
4. Blood pressure checked, 88.7 (0.88)
5. Aspirin use-problems, 11.7 (1.02)
6.Colonoscopy, 60.2 (1.41)
7. Sigmoidoscopy,  6.1 (0.61)
8. Blood stool test, 14.6 (1.00)
9.Mammogram,  72.6 (1.82)
10. Pap Smear test, 73.3 (2.37)")

# Sort on the character variable in descending order
arrange(testdata, desc(prevalence))

# Results from Console

                      indicator  prevalence
                          (chr)       (chr)
1     4. Blood pressure checked 88.7 (0.88)
2  2. Blood cholesterol checked 84.5 (1.14)
3            1. Health check-up 77.2 (1.19)
4            10. Pap Smear test 73.3 (2.37)
5                   9.Mammogram 72.6 (1.82)
6                 6.Colonoscopy 60.2 (1.41)
7              7. Sigmoidoscopy  6.1 (0.61)
8       3. Recieved flu vaccine 50.0 (1.33)
9           8. Blood stool test 14.6 (1.00)
10      5. Aspirin use-problems 11.7 (1.02)


Pradip K. Muhuri,  AHRQ/CFACT
 5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564

Daniel Nordlund

2016-Jun-15 22:37 UTC

The problem is that you are sorting a character variable.

> testdata$prevalence
 [1] "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" "11.7 (1.02)"
 [6] "60.2 (1.41)" "6.1 (0.61)"  "14.6 (1.00)" "72.6 (1.82)" "73.3 (2.37)"

Notice that the 7th element is "6.1 (0.61)". The first CHARACTER is a "6", so
it is going to sort BEFORE the "50.0 (1.33)" (in descending order). If you want
the character value of line 7 to sort last, it would need to be "06.1 (0.61)"
or " 6.1 (0.61)" (notice the leading space).
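
To make the character-versus-numeric point concrete, here is a small base R
sketch (not part of the original reply). It re-enters the prevalence values by
hand, sorts them as strings, and then sorts a space-padded copy. The names
prev, char_sorted, padded and padded_sorted are made up for illustration, and
method = "radix" is used only so that the comparison happens in the C locale
regardless of the session's settings.

```r
# Prevalence strings copied from the example data
prev <- c("77.2 (1.19)", "84.5 (1.14)", "50.0 (1.33)", "88.7 (0.88)",
          "11.7 (1.02)", "60.2 (1.41)", "6.1 (0.61)", "14.6 (1.00)",
          "72.6 (1.82)", "73.3 (2.37)")

# Character sort: strings are compared left to right, one character at a time,
# so "6.1 ..." lands ahead of "50.0 ..." in descending order
char_sorted <- sort(prev, decreasing = TRUE, method = "radix")
print(char_sorted)

# Left-pad every string with spaces to a common width; a leading space sorts
# below any digit, so " 6.1 (0.61)" now falls to the end
padded <- formatC(prev, width = max(nchar(prev)))
padded_sorted <- sort(padded, decreasing = TRUE, method = "radix")
print(padded_sorted)
```

Padding only repairs the ordering of the strings; the values are still
character, so converting to numeric is usually the more robust fix.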

Hope this is helpful,

Dan

Daniel Nordlund
Port Townsend, WA USA

Jim Lemon

2016-Jun-15 23:14 UTC

Hi Pradip,
I'll assume that you are reading the data from a file:

pm.df <- read.csv("pmdat.txt", stringsAsFactors = FALSE)
# create a vector of numeric values of prevalence
numprev <- as.numeric(sapply(strsplit(trimws(pm.df$prevalence), " "), "[", 1))
# order the data frame by that vector (ascending; use order(-numprev) for descending)
pm.df[order(numprev), ]
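
For the dplyr workflow asked about in the original post, one possible approach
is to compute the numeric key inside arrange() itself. The sketch below assumes
every prevalence value has the form "estimate (standard error)"; prev_to_num is
a hypothetical helper, not part of dplyr, and the data frame is rebuilt with
data.frame() so that the example stands alone.

```r
library(dplyr)

testdata <- data.frame(
  indicator = c("1. Health check-up", "2. Blood cholesterol checked",
                "3. Recieved flu vaccine", "4. Blood pressure checked",
                "5. Aspirin use-problems", "6.Colonoscopy", "7. Sigmoidoscopy",
                "8. Blood stool test", "9.Mammogram", "10. Pap Smear test"),
  prevalence = c("77.2 (1.19)", "84.5 (1.14)", "50.0 (1.33)", "88.7 (0.88)",
                 "11.7 (1.02)", "60.2 (1.41)", "6.1 (0.61)", "14.6 (1.00)",
                 "72.6 (1.82)", "73.3 (2.37)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop everything from the opening parenthesis onward,
# then convert the remaining estimate to numeric
prev_to_num <- function(x) as.numeric(sub("\\s*\\(.*$", "", trimws(x)))

# Sort on the numeric estimate, descending; the prevalence column itself
# stays character, so the printed table is unchanged apart from row order
sorted <- arrange(testdata, desc(prev_to_num(prevalence)))
print(sorted)
```

Keeping the conversion inside arrange() avoids adding a helper column to the
data frame; if the numeric estimate is needed later, mutate() could store it
instead.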

Jim



David Winsemius

2016-Jun-15 23:16 UTC

Although the prevalence column is not really mixed numeric/alpha, it can still
be sorted quite easily with the very handy gtools::mixedorder function:

> require(gtools)
Loading required package: gtools
> testdata[ mixedorder(testdata$prevalence), ]
                      indicator  prevalence
7              7. Sigmoidoscopy  6.1 (0.61)
5       5. Aspirin use-problems 11.7 (1.02)
8           8. Blood stool test 14.6 (1.00)
3       3. Recieved flu vaccine 50.0 (1.33)
6                 6.Colonoscopy 60.2 (1.41)
9                   9.Mammogram 72.6 (1.82)
10           10. Pap Smear test 73.3 (2.37)
1            1. Health check-up 77.2 (1.19)
2  2. Blood cholesterol checked 84.5 (1.14)
4     4. Blood pressure checked 88.7 (0.88)

The mixedorder function splits the strings at the space boundaries and tests for
numeric or alpha.
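
Since the original question asked for descending order, the index vector
returned by mixedorder() can simply be reversed. This is only a sketch: it
assumes the gtools package is installed and that mixedorder() orders these
strings in the ascending sequence shown in the output above. The names prev,
idx_desc and prev_desc are illustrative.

```r
library(gtools)

# Prevalence strings copied from the example data
prev <- c("77.2 (1.19)", "84.5 (1.14)", "50.0 (1.33)", "88.7 (0.88)",
          "11.7 (1.02)", "60.2 (1.41)", "6.1 (0.61)", "14.6 (1.00)",
          "72.6 (1.82)", "73.3 (2.37)")

# mixedorder() returns the indices for ascending order;
# rev() flips them so the largest estimate comes first
idx_desc <- rev(mixedorder(prev))
prev_desc <- prev[idx_desc]
print(prev_desc)
```

The same index vector can be used to reorder the whole data frame, for example
testdata[idx_desc, ].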
-- 

David Winsemius
Alameda, CA, USA
