Muhuri, Pradip (AHRQ/CFACT)
2016-Jun-16 13:12 UTC
[R] dplyr's arrange function - 3 solutions received - 1 New Question
Hello,

I got 3 solutions to my earlier code. Thanks to the contributors. May I bring your attention to a new question below (with respect to David's solution)?

1) Thanks to Daniel Nordlund for the tip: replacing the leading space with a 0 in the data.

2) Thanks to David Winsemius for his solution with the gtools::mixedorder function. I have added an argument to his.

    mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), ]

3) Thanks to Jim Lemon for his solution. I have prepended a minus sign to reverse the order.

    numprev <- as.numeric(sapply(strsplit(trimws(mydata$prevalence_c), " "), "[", 1))
    mydata[order(-numprev), ]

(New) Question for solution 2:

I want to keep only 2 variables (say, indicator and prevalence_c) in the output. Where to insert the additional code? Why does the following code fail?

> mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), c(mydata$indicator, mydata$prevalence_c) ]
Error in `[.data.frame`(mydata, mixedorder(mydata$prevalence_c, decreasing = TRUE), :
  undefined columns selected

********************
> str(mydata)
Classes 'tbl_df', 'tbl' and 'data.frame': 10 obs. of 10 variables:
 $ indicator   : chr "1. Health check-up" "2. Blood cholesterol checked " "3. Recieved flu vaccine" "4. Blood pressure checked" ...
 $ subgroup    : chr "Both sexes, ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""| __truncated__ "Both sexes, ages =35 yrs""| __truncated__ ...
 $ n           : num 2117 2127 2124 2135 1027 ...
 $ prevalence_c: chr "74.7 (1.20)" "90.3 (0.89)" "51.7 (1.35)" "93.2 (0.70)" ...
 $ prevalence_p: chr "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" ...
 $ sensitivity : chr "87.4 (1.10)" "99.2 (0.27)" "97.0 (0.62)" "99.0 (0.27)" ...
 $ specificity : chr "68.3 (2.80)" "58.2 (3.72)" "93.5 (0.90)" "52.7 (3.90)" ...
 $ ppv         : chr "90.4 (0.94)" "92.8 (0.85)" "93.7 (0.87)" "94.3 (0.63)" ...
 $ npv         : chr "61.5 (3.00)" "92.8 (2.27)" "96.9 (0.63)" "87.5 (3.27)" ...
 $ kappa       : chr "0.536 (0.029)" "0.676 (0.032)" "0.905 (0.011)" "0.626 (0.035)" ...

Pradip K. Muhuri, AHRQ/CFACT
5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Daniel Nordlund
Sent: Wednesday, June 15, 2016 6:37 PM
To: r-help at r-project.org
Subject: Re: [R] dplyr's arrange function

On 6/15/2016 2:08 PM, Muhuri, Pradip (AHRQ/CFACT) wrote:
> Hello,
>
> I am using the dplyr's arrange() function to sort one of the many data frames on a character variable (named "prevalence").
>
> Issue: I am not getting the desired output (line 7 is the problem, which should be the very last line in the sorted data frame) because the sorted field is character, not numeric.
>
> The reproducible example and the output are appended below.
>
> Is there any work-around to convert/treat this character variable (named "prevalence" in the data frame below) as numeric before using the arrange() function within the dplyr package?
>
> Any hints will be appreciated.
>
> Thanks,
>
> Pradip Muhuri
>
> # Reproducible Example
>
> library("readr")
> library("dplyr")
> testdata <- read_csv(
> "indicator, prevalence
> 1. Health check-up, 77.2 (1.19)
> 2. Blood cholesterol checked, 84.5 (1.14)
> 3. Recieved flu vaccine, 50.0 (1.33)
> 4. Blood pressure checked, 88.7 (0.88)
> 5. Aspirin use-problems, 11.7 (1.02)
> 6.Colonoscopy, 60.2 (1.41)
> 7. Sigmoidoscopy, 6.1 (0.61)
> 8. Blood stool test, 14.6 (1.00)
> 9.Mammogram, 72.6 (1.82)
> 10. Pap Smear test, 73.3 (2.37)")
>
> # Sort on the character variable in descending order
> arrange(testdata, desc(prevalence))
>
> # Results from Console
>
>    indicator                     prevalence
>    (chr)                         (chr)
> 1  4. Blood pressure checked     88.7 (0.88)
> 2  2. Blood cholesterol checked  84.5 (1.14)
> 3  1. Health check-up            77.2 (1.19)
> 4  10. Pap Smear test            73.3 (2.37)
> 5  9.Mammogram                   72.6 (1.82)
> 6  6.Colonoscopy                 60.2 (1.41)
> 7  7. Sigmoidoscopy              6.1 (0.61)
> 8  3. Recieved flu vaccine       50.0 (1.33)
> 9  8. Blood stool test           14.6 (1.00)
> 10 5. Aspirin use-problems       11.7 (1.02)
>
>
> Pradip K. Muhuri, AHRQ/CFACT
> 5600 Fishers Lane # 7N142A, Rockville, MD 20857
> Tel: 301-427-1564
>

The problem is that you are sorting a character variable.

> testdata$prevalence
 [1] "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" "11.7 (1.02)"
 [6] "60.2 (1.41)" "6.1 (0.61)" "14.6 (1.00)" "72.6 (1.82)" "73.3 (2.37)"
>

Notice that the 7th element is "6.1 (0.61)". The first CHARACTER is a "6", so it is going to sort BEFORE the "50.0 (1.33)" (in descending order). If you want the character value of line 7 to sort last, it would need to be "06.1 (0.61)" or " 6.1 (0.61)" (notice the leading space).

Hope this is helpful,

Dan

Daniel Nordlund
Port Townsend, WA USA

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
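
Dan's point about character comparison can be checked with a small base-R sketch. It uses four of the prevalence strings from the example; the variable names are illustrative only, and method = "radix" is an assumption made here so that the character sort uses C-locale collation regardless of the session's locale. The numeric sort pulls out the value before the first space, in the same spirit as Jim Lemon's strsplit() approach.

```r
# Four prevalence strings taken from the example data in the thread.
prevalence <- c("77.2 (1.19)", "50.0 (1.33)", "6.1 (0.61)", "14.6 (1.00)")

# Character sort compares strings left to right, one character at a time,
# so "6.1 ..." lands ahead of "50.0 ..." in descending order.
# method = "radix" pins the collation to the C locale.
char_sorted <- sort(prevalence, decreasing = TRUE, method = "radix")

# Extract the number before the first space and sort on that instead.
point_estimate <- as.numeric(sub(" .*$", "", trimws(prevalence)))
num_sorted <- prevalence[order(point_estimate, decreasing = TRUE)]

print(char_sorted)
print(num_sorted)
```

With the numeric key, "6.1 (0.61)" moves to the end, which is the ordering asked for in the original question.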
David Winsemius
2016-Jun-16 16:54 UTC
[R] dplyr's arrange function - 3 solutions received - 1 New Question
> On Jun 16, 2016, at 6:12 AM, Muhuri, Pradip (AHRQ/CFACT) <Pradip.Muhuri at ahrq.hhs.gov> wrote:
>
> (New)Question for solution 2:
>
> I want to keep only 2 variables (say, indicator and prevalence_c) in the output. Where to insert the additional code? Why does the following code fail?
>
>> mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), c(mydata$indicator, mydata$prevalence_c) ]

Try instead just a vector of names for the second argument to "[":

    mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), c("indicator", "prevalence_c") ]

> Error in `[.data.frame`(mydata, mixedorder(mydata$prevalence_c, decreasing = TRUE), :
>   undefined columns selected

David Winsemius
Alameda, CA, USA
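
A minimal sketch of why the first attempt fails, assuming a three-row stand-in for mydata (built here only for illustration) and using base order() on the extracted number in place of gtools::mixedorder() so that it runs without extra packages. The expression c(mydata$indicator, mydata$prevalence_c) produces the cell values of both columns, and "[" then looks for columns with those names.

```r
# Hypothetical three-row stand-in for mydata, with values from the thread.
mydata <- data.frame(
  indicator    = c("1. Health check-up", "2. Blood cholesterol checked",
                   "3. Recieved flu vaccine"),
  prevalence_c = c("74.7 (1.20)", "90.3 (0.89)", "51.7 (1.35)"),
  n            = c(2117, 2127, 2124),
  stringsAsFactors = FALSE
)

# Assumption: order() on the leading number stands in for gtools::mixedorder().
row_order <- order(as.numeric(sub(" .*$", "", trimws(mydata$prevalence_c))),
                   decreasing = TRUE)

# This is a vector of six cell values, not of column names.
bad_selector <- c(mydata$indicator, mydata$prevalence_c)

# Record whether the selection raises an error instead of stopping the script.
failed <- tryCatch(
  { mydata[row_order, bad_selector]; FALSE },
  error = function(e) TRUE
)

# A character vector of column names selects the two columns.
result <- mydata[row_order, c("indicator", "prevalence_c")]

print(failed)
print(result)
```

Since none of the six values matches a column name, the first selection raises an error, while the quoted names return the sorted two-column data frame.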
Muhuri, Pradip (AHRQ/CFACT)
2016-Jun-16 18:06 UTC
[R] dplyr's arrange function - 3 solutions received - 1 New Question
Hello David,

Your revisions to the earlier code have given me the desired results.

    library("gtools")
    mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), c("indicator", "prevalence_c") ]

Thanks,

Pradip

Pradip K. Muhuri, AHRQ/CFACT
5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564