thr3ads.net - R help - [R] find unique and summerize [Feb 2018]

If this information is useful, please help other people find it:
Share via:

Val

2018-Feb-03 03:00 UTC

[R] find unique and summerize

Hi all,

I have a data set  need to be summarized by unique ID (count and sum of a
variable)
A unique individual ID (country name  Abbreviation  followed by an integer
numbers)  may  have observation in several countries. Then the  ID was
changed by adding the country code as a prefix  and  new ID was constructed
or recorded like (country code, + the original unique ID  Example
original ID   "CAN1540164" , if this ID has an observation in CANADA
then
the ID was changed to    "1CAN1540164".   From this new ID I want get
out
the country code  get the  original unique ID  and   summarize the data by
unique ID and country code

The data set look like
mydata <- read.table(textConnection("GR ID iflag Y
A 1CAN1540164 1 20
A 1CAN1540164 1 12
A 1CAN1540164 1 15
A 44CAN1540164 1 30
A 44CAN1540164 1 24
A 44CAN1540164 1 25
A 44CAN1540164 1 11
A 33CAN1540164 1 12
A 33CAN1540164 1 23
A 33CAN1540164 1 65
A 33CAN1540164 1 41
A 358CAN1540164 1 28
A 358CAN1540164 1 32
A 358CAN1540164 1 41
A 358CAN1540164 1 54
A 358CAN1540164 1 29
A 358CAN1540164 1 64
B 1USA1540165 1 125
B 1USA1540165 1 165
B 44USA1540165 1 171
B 33USA1540165 1 254
B 33USA1540165 1 241
B 33USA1540165 1 262
B 358USA1540165 1 321
C 358FIN1540166 1 225 "),header = TRUE ,stringsAsFactors = FALSE)
>From the above data there are three unique IDs and  four country codes (1,44, 33 and 358)

I want the following two tables

Table 1. count  the  unique ID by country code
                          1   44   33   358     TOT
CAN1540164     3    4     4      6        17
USA1540165      2   1      3     1          7
FIN1540166       -     -       -      1         1
           TOT         5    5      7      8       25


Table 2  Sum of Y variable by unique ID and country. code

                          1       44       33      358      TOT
CAN1540164    47     90      141      248       526
USA1540165   290   171      757      321     1539
FIN1540166        -        -         -         225       225
            TOT      337     261      898    794     2290


How do I do it in R?

 The first step is to get the unique country codes unique ID by splitting
the new ID

Thank you in advance

	[[alternative HTML version deleted]]

Rui Barradas

2018-Feb-03 05:26 UTC

head link

[R] find unique and summerize

Hello,

Thanks for the reproducible example.
See if the following does what you want.

IDNum <- sub("^(\\d+).*", "\\1", mydata$ID)
Country <- sub("^\\d+(.*)", "\\1", mydata$ID)

tbl1 <- table(Country, IDNum)
addmargins(tbl1)

tbl2 <- xtabs(Y ~ Country + IDNum, mydata)
addmargins(tbl2)


Hope this helps,

Rui Barradas

On 2/3/2018 3:00 AM, Val wrote:> Hi all,
> 
> I have a data set  need to be summarized by unique ID (count and sum of a
> variable)
> A unique individual ID (country name  Abbreviation  followed by an integer
> numbers)  may  have observation in several countries. Then the  ID was
> changed by adding the country code as a prefix  and  new ID was constructed
> or recorded like (country code, + the original unique ID  Example
> original ID   "CAN1540164" , if this ID has an observation in
CANADA then
> the ID was changed to    "1CAN1540164".   From this new ID I want
get out
> the country code  get the  original unique ID  and   summarize the data by
> unique ID and country code
> 
> The data set look like
> mydata <- read.table(textConnection("GR ID iflag Y
> A 1CAN1540164 1 20
> A 1CAN1540164 1 12
> A 1CAN1540164 1 15
> A 44CAN1540164 1 30
> A 44CAN1540164 1 24
> A 44CAN1540164 1 25
> A 44CAN1540164 1 11
> A 33CAN1540164 1 12
> A 33CAN1540164 1 23
> A 33CAN1540164 1 65
> A 33CAN1540164 1 41
> A 358CAN1540164 1 28
> A 358CAN1540164 1 32
> A 358CAN1540164 1 41
> A 358CAN1540164 1 54
> A 358CAN1540164 1 29
> A 358CAN1540164 1 64
> B 1USA1540165 1 125
> B 1USA1540165 1 165
> B 44USA1540165 1 171
> B 33USA1540165 1 254
> B 33USA1540165 1 241
> B 33USA1540165 1 262
> B 358USA1540165 1 321
> C 358FIN1540166 1 225 "),header = TRUE ,stringsAsFactors = FALSE)
> 
>  From the above data there are three unique IDs and  four country codes (1,
> 44, 33 and 358)
> 
> I want the following two tables
> 
> Table 1. count  the  unique ID by country code
>                            1   44   33   358     TOT
> CAN1540164     3    4     4      6        17
> USA1540165      2   1      3     1          7
> FIN1540166       -     -       -      1         1
>             TOT         5    5      7      8       25
> 
> 
> Table 2  Sum of Y variable by unique ID and country. code
> 
>                            1       44       33      358      TOT
> CAN1540164    47     90      141      248       526
> USA1540165   290   171      757      321     1539
> FIN1540166        -        -         -         225       225
>              TOT      337     261      898    794     2290
> 
> 
> How do I do it in R?
> 
>   The first step is to get the unique country codes unique ID by splitting
> the new ID
> 
> Thank you in advance
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Val

2018-Feb-03 16:42 UTC

head link

[R] find unique and summerize

Thank you so much Rui.

1. How do I export this table to excel file?
I used this
      tbl1 <- table(Country, IDNum)
      tbl2=addmargins(tbl1)
      write.xlsx(tbl2,"tt1.xlsx"),sheetName="summary",
row.names=FALSE)
The above did not give me that table.


2. I want select those unique Ids that do have records in all countries.
 From the above data set, this ID  "FIN1540166"  should be excluded
from
the summary table and the table looks like as follow

                IDNum
Country         1   33  358   44  Sum
  CAN1540164   47  141  248   90  526
  USA1540165  290  757  321  171 1539
  Sum         337  898  569  261 2065

Thank you again


On Fri, Feb 2, 2018 at 11:26 PM, Rui Barradas <ruipbarradas at sapo.pt>
wrote:
> Hello,
>
> Thanks for the reproducible example.
> See if the following does what you want.
>
> IDNum <- sub("^(\\d+).*", "\\1", mydata$ID)
> Country <- sub("^\\d+(.*)", "\\1", mydata$ID)
>
> tbl1 <- table(Country, IDNum)
> addmargins(tbl1)
>
> tbl2 <- xtabs(Y ~ Country + IDNum, mydata)
> addmargins(tbl2)
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> On 2/3/2018 3:00 AM, Val wrote:
>
>> Hi all,
>>
>> I have a data set  need to be summarized by unique ID (count and sum of
a
>> variable)
>> A unique individual ID (country name  Abbreviation  followed by an
integer
>> numbers)  may  have observation in several countries. Then the  ID was
>> changed by adding the country code as a prefix  and  new ID was
>> constructed
>> or recorded like (country code, + the original unique ID  Example
>> original ID   "CAN1540164" , if this ID has an observation in
CANADA then
>> the ID was changed to    "1CAN1540164".   From this new ID I
want get out
>> the country code  get the  original unique ID  and   summarize the data
by
>> unique ID and country code
>>
>> The data set look like
>> mydata <- read.table(textConnection("GR ID iflag Y
>> A 1CAN1540164 1 20
>> A 1CAN1540164 1 12
>> A 1CAN1540164 1 15
>> A 44CAN1540164 1 30
>> A 44CAN1540164 1 24
>> A 44CAN1540164 1 25
>> A 44CAN1540164 1 11
>> A 33CAN1540164 1 12
>> A 33CAN1540164 1 23
>> A 33CAN1540164 1 65
>> A 33CAN1540164 1 41
>> A 358CAN1540164 1 28
>> A 358CAN1540164 1 32
>> A 358CAN1540164 1 41
>> A 358CAN1540164 1 54
>> A 358CAN1540164 1 29
>> A 358CAN1540164 1 64
>> B 1USA1540165 1 125
>> B 1USA1540165 1 165
>> B 44USA1540165 1 171
>> B 33USA1540165 1 254
>> B 33USA1540165 1 241
>> B 33USA1540165 1 262
>> B 358USA1540165 1 321
>> C 358FIN1540166 1 225 "),header = TRUE ,stringsAsFactors = FALSE)
>>
>>  From the above data there are three unique IDs and  four country codes
>> (1,
>> 44, 33 and 358)
>>
>> I want the following two tables
>>
>> Table 1. count  the  unique ID by country code
>>                            1   44   33   358     TOT
>> CAN1540164     3    4     4      6        17
>> USA1540165      2   1      3     1          7
>> FIN1540166       -     -       -      1         1
>>             TOT         5    5      7      8       25
>>
>>
>> Table 2  Sum of Y variable by unique ID and country. code
>>
>>                            1       44       33      358      TOT
>> CAN1540164    47     90      141      248       526
>> USA1540165   290   171      757      321     1539
>> FIN1540166        -        -         -         225       225
>>              TOT      337     261      898    794     2290
>>
>>
>> How do I do it in R?
>>
>>   The first step is to get the unique country codes unique ID by
splitting
>> the new ID
>>
>> Thank you in advance
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posti
>> ng-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
	[[alternative HTML version deleted]]

Maybe Matching Threads

Search for more reasonably related threads

R help - Feb 2018 - find unique and summerize

[R] find unique and summerize

[R] find unique and summerize

[R] find unique and summerize

Maybe Matching Threads