thr3ads.net - R help - [R] Help with Kmeans output and using broom to tidy etc.. [May 2020]

Poling, William

2020-May-11 15:58 UTC

[R] Help with Kmeans output and using broom to tidy etc..

#RStudio Version 1.2.1335 (need this one --> 1.2.5019)
sessionInfo() 
# R version 4.0.0 Patched (2020-05-03 r78349)
#Platform: x86_64-w64-mingw32/x64 (64-bit)
#Running under: Windows 10 x64 (build 17763)

Hello:

I have data that I am trying to manipulate for Kmeans clustering.

Original data looks like this

str(geo1) 
# 'data.frame':	2352 obs. of  5 variables:
# $ ID: Factor w/ 2352 levels "101040199600",..: 590 908 976 509 1674 690 1336 86 726 1702 ...
# $ state           : Factor w/ 41 levels "AL","AR","AZ",..: 32 10 25 11 9 32 13 31 12 12 ...
# $ city            : Factor w/ 1337 levels "ABBOTTSTOWN",..: 932 156 230 698 965 1330 515 727 1127 1304 ...
# $ latitude        : num  40.4 31.2 40.8 42.1 26.8 ...
# $ longitude       : num  -79.9 -81.5 -74 -91.6 -82.1 ...

I created a subset adding column prop_of_total 
str(trnd1_tbl)
tibble [1,457 x 5] (S3: tbl_df/tbl/data.frame)
 $ city         : Factor w/ 1337 levels "ABBOTTSTOWN",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ state        : Factor w/ 41 levels "AL","AR","AZ",..: 32 36 10 28 12 36 10 11 26 38 ...
 $ Basecountsum : num [1:1457] 2352 2352 2352 2352 2352 ...
 $ Basecount2   : num [1:1457] 1 1 1 1 1 2 1 1 2 1 ...
 $ prop_of_total: num [1:1457] 0.000425 0.000425 0.000425 0.000425 0.000425 ...


Then I spread it

trnd2_tbl <- trnd1_tbl %>% 
    dplyr::select(city, state, prop_of_total) %>% 
    spread(key = city, value = prop_of_total, fill = 0) # remove the NA's with fill

str(trnd2_tbl)#tibble [41 x 1,338] (S3: tbl_df/tbl/data.frame)
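
Editor's note: the spread step above turns every level of city into its own column, which matters for the later select() call. The sketch below illustrates this on a small made-up table; toy_tbl, toy_wide and all values in them are hypothetical stand-ins, not the poster's data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for trnd1_tbl (values are made up)
toy_tbl <- tibble::tibble(
  city          = factor(c("ATLANTA", "CUMMING", "DALLAS", "HOUSTON")),
  state         = factor(c("GA", "GA", "TX", "TX")),
  prop_of_total = c(0.4, 0.1, 0.2, 0.3)
)

# Same reshaping as in the post: one row per state, one column per city
toy_wide <- toy_tbl %>%
  dplyr::select(city, state, prop_of_total) %>%
  tidyr::spread(key = city, value = prop_of_total, fill = 0)

# The column names are now "state" plus the city names themselves
names(toy_wide)

# There is no longer a column literally called "city"
"city" %in% names(toy_wide)
```

With the real data this would correspond to the 41 x 1,338 tibble reported above: 41 states as rows, and state plus 1,337 city columns.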

Then I run a Kmeans

kmeans_obj1 <- trnd2_tbl  %>% 
  dplyr::select(- state) %>% 
  kmeans(centers = 20, nstart = 100)

str(kmeans_obj1)
List of 9
 $ cluster     : int [1:41] 11 11 9 11 11 4 11 11 16 2 ...
 $ centers     : num [1:20, 1:1337] 0 0 0 0 0 0 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:20] "1" "2" "3" "4" ...
  .. ..$ : chr [1:1337] "ABBOTTSTOWN" "ABILENE" "ACWORTH" "ADAMS" ...
 $ totss       : num 0.00158
 $ withinss    : num [1:20] 0 0 0 0 0 0 0 0 0 0 ...
 $ tot.withinss: num 0.0000848
 $ betweenss   : num 0.0015
 $ size        : int [1:20] 1 1 1 1 1 1 1 1 1 1 ...
 $ iter        : int 3
 $ ifault      : int 0
 - attr(*, "class")= chr "kmeans"

Then I go and try to tidy:

#Tidy, glance, augment
#Just makes it easier to use or view the obj's in the obj list
  
broom::tidy(kmeans_obj1) %>% glimpse()

broom::glance(kmeans_obj1)
## A tibble: 1 x 4
#     totss tot.withinss betweenss  iter
#     <dbl>        <dbl>     <dbl> <int>
# 1 0.00158    0.0000848   0.00150     3

However, when I run this piece I get an error:

broom::augment(kmeans_obj1, trnd2_tbl) %>% 
  dplyr::select(city, .cluster)             

#Error: Must subset columns with a valid subscript vector.
# x The subscript has the wrong type `data.frame<
#     u: double
#     x: double
#   >`.
# i It must be numeric or character.

Here is the back trace:

rlang::last_error()

# Backtrace:
#   1. broom::augment(kmeans_obj1, trnd2_tbl)
# 9. dplyr::select(., city, .cluster)
# 11. tidyselect::vars_select(tbl_vars(.data), !!!enquos(...))
# 12. tidyselect:::eval_select_impl(...)
# 20. tidyselect:::vars_select_eval(...)
# 21. tidyselect:::walk_data_tree(expr, data_mask, context_mask)
# 22. tidyselect:::eval_c(expr, data_mask, context_mask)
# 23. tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
# 24. tidyselect:::walk_data_tree(new, data_mask, context_mask)
# 25. tidyselect:::as_indices_sel_impl(...)
# 26. tidyselect:::as_indices_impl(x, vars, strict = strict)
# 27. vctrs::vec_as_subscript(x, logical = "error")

I am not sure what I am supposed to fix?

Maybe someone has had a similar error and can advise me please?

Thank you.

WHP
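
Editor's note: one possible reading of the error, offered as a guess rather than a confirmed diagnosis. After spread(), trnd2_tbl has no column named city, so select(city, .cluster) cannot find it in the data. The `data.frame<u: double, x: double>` in the message is consistent with tidyselect then picking up some other object named city from the search path; the city dataset in the boot package, for example, has exactly the columns u and x. The thread does not say which packages were attached, so that part is an assumption. The sketch below uses made-up data (toy_tbl, toy_wide, toy_km are hypothetical names) and selects state, which does exist after the reshape.

```r
library(dplyr)
library(tidyr)
library(broom)

set.seed(1)

# Hypothetical toy data standing in for trnd1_tbl (values are made up)
toy_tbl <- tibble::tibble(
  city          = factor(c("ATLANTA", "CUMMING", "DALLAS", "HOUSTON", "ERIE", "BROOKLYN")),
  state         = factor(c("GA", "GA", "TX", "TX", "PA", "NY")),
  prop_of_total = c(0.30, 0.05, 0.10, 0.25, 0.10, 0.20)
)

# One row per state, one column per city, as in the post
toy_wide <- toy_tbl %>%
  tidyr::spread(key = city, value = prop_of_total, fill = 0)

# Cluster the rows (states) on the city columns
toy_km <- toy_wide %>%
  dplyr::select(-state) %>%
  kmeans(centers = 2, nstart = 10)

# Shows where, if anywhere, an object called "city" is visible on the search path
find("city")

# augment() adds .cluster to the wide table; select a column that exists there
clustered <- broom::augment(toy_km, toy_wide) %>%
  dplyr::select(state, .cluster)

clustered
```

Because kmeans() clusters rows, the cluster labels here belong to states. Getting a label per city would need a table with cities as rows instead.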








Eric Berger

2020-May-12 13:39 UTC

[R] Help with Kmeans output and using broom to tidy etc..

Can you create a reproducible example?
Your question involves objects that are unknown to us. (geo1, trnd1_tbl)
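
Editor's note: a common way to provide such an example on R-help is to post the output of dput() for a small slice of the data, so that others can recreate the object by pasting it. The sketch below uses a made-up data frame (geo_demo and its values are hypothetical, shaped loosely like the geo1 described in the question).

```r
# Hypothetical stand-in for the poster's data; values are made up
geo_demo <- data.frame(
  state     = factor(c("ME", "ME", "GA")),
  city      = factor(c("ORONO", "UNION", "ATLANTA")),
  latitude  = c(44.87, 44.20, 33.75),
  longitude = c(-68.68, -69.26, -84.39)
)

# dput() prints R code that recreates the object; paste this into the email
dput(head(geo_demo, 3))

# Round trip: capture the dput() text and rebuild the object from it
txt     <- paste(capture.output(dput(geo_demo)), collapse = "\n")
rebuilt <- eval(parse(text = txt))

# Check that the rebuilt object matches the original
identical(rebuilt, geo_demo)
```

Unlike a printed table, the dput() text preserves column types such as factor versus character, which matters for questions like this one.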


Poling, William

2020-May-12 16:10 UTC

[R] Help with Kmeans output and using broom to tidy etc..

Hello Eric, thank you so much for your consideration.

Here are snippets of data that I hope will be helpful.

WHP 

geo1a <- geo1[, c(2:5)] # eliminating ID, which is not useful for my purposes anyway

#This is for R-Help use
geo1a <- geo1a %>% top_n(25)

state           city latitude longitude
1     ME      FAIRFIELD 44.64485 -69.65948
2     ME      JONESPORT 44.57935 -67.56743
3     ME        CASWELL 46.97529 -67.83023
4     ME      ELLSWORTH 44.52916 -68.38717
5     ME     VASSALBORO 44.45095 -69.60629
6     ME          UNION 44.20059 -69.26123
7     ME        PALERMO 44.45142 -69.41115
8     ME          ORONO 44.87426 -68.68327
9     ME    SANGERVILLE 45.10138 -69.33580
10    ME      ISLESBORO 44.29015 -68.90812
11    ME        TOPSHAM 43.93600 -69.96565
12    ME       FREEPORT 43.84089 -70.11160
13    ME      SKOWHEGAN 44.76687 -69.71644
14    ME    MILLINOCKET 45.65501 -68.70261
15    ME      ORRINGTON 44.72417 -68.74026
16    ME     ST. GEORGE 43.96726 -69.20827
17    ME FORT FAIRFIELD 46.80911 -67.88079
18    ME      MARS HILL 46.56580 -67.89006
19    ME       FREEPORT 43.85302 -70.03726
20    ME         EASTON 46.64143 -67.91203
21    ME     WATERVILLE 44.53621 -69.65913
22    ME      BRUNSWICK 43.87771 -69.96297
23    ME      BRUNSWICK 43.91719 -69.89905
24    ME      BUCKSPORT 44.60665 -68.81892
25    ME        FAYETTE 44.46380 -70.12047


trnd1_tbla <- trnd1_tbl %>% top_n(25)
print(trnd1_tbla)
head(trnd1_tbla,n=25)

# A tibble: 25 x 5
   city      state Basecountsum Basecount2 prop_of_total
   <fct>     <fct>        <dbl>      <dbl>         <dbl>
 1 ATLANTA   GA            2352         12       0.00510
 2 BRADENTON FL            2352          8       0.00340
 3 BROOKLYN  NY            2352         30       0.0128 
 4 CHARLOTTE NC            2352          8       0.00340
 5 CHICAGO   IL            2352         17       0.00723
 6 COLUMBUS  OH            2352         11       0.00468
 7 CUMMING   GA            2352          8       0.00340
 8 DALLAS    TX            2352          8       0.00340
 9 ERIE      PA            2352         12       0.00510
10 HOUSTON   TX            2352         12       0.00510
# ... with 15 more rows

WHP
