Thank you so much Rui. 1. How do I export this table to excel file? I used this tbl1 <- table(Country, IDNum) tbl2=addmargins(tbl1) write.xlsx(tbl2,"tt1.xlsx"),sheetName="summary", row.names=FALSE) The above did not give me that table. 2. I want select those unique Ids that do have records in all countries. From the above data set, this ID "FIN1540166" should be excluded from the summary table and the table looks like as follow IDNum Country 1 33 358 44 Sum CAN1540164 47 141 248 90 526 USA1540165 290 757 321 171 1539 Sum 337 898 569 261 2065 Thank you again On Fri, Feb 2, 2018 at 11:26 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:> Hello, > > Thanks for the reproducible example. > See if the following does what you want. > > IDNum <- sub("^(\\d+).*", "\\1", mydata$ID) > Country <- sub("^\\d+(.*)", "\\1", mydata$ID) > > tbl1 <- table(Country, IDNum) > addmargins(tbl1) > > tbl2 <- xtabs(Y ~ Country + IDNum, mydata) > addmargins(tbl2) > > > Hope this helps, > > Rui Barradas > > > On 2/3/2018 3:00 AM, Val wrote: > >> Hi all, >> >> I have a data set need to be summarized by unique ID (count and sum of a >> variable) >> A unique individual ID (country name Abbreviation followed by an integer >> numbers) may have observation in several countries. Then the ID was >> changed by adding the country code as a prefix and new ID was >> constructed >> or recorded like (country code, + the original unique ID Example >> original ID "CAN1540164" , if this ID has an observation in CANADA then >> the ID was changed to "1CAN1540164". From this new ID I want get out >> the country code get the original unique ID and summarize the data by >> unique ID and country code >> >> The data set look like >> mydata <- read.table(textConnection("GR ID iflag Y >> A 1CAN1540164 1 20 >> A 1CAN1540164 1 12 >> A 1CAN1540164 1 15 >> A 44CAN1540164 1 30 >> A 44CAN1540164 1 24 >> A 44CAN1540164 1 25 >> A 44CAN1540164 1 11 >> A 33CAN1540164 1 12 >> A 33CAN1540164 1 23 >> A 33CAN1540164 1 65 >> A 33CAN1540164 1 41 >> A 358CAN1540164 1 28 >> A 358CAN1540164 1 32 >> A 358CAN1540164 1 41 >> A 358CAN1540164 1 54 >> A 358CAN1540164 1 29 >> A 358CAN1540164 1 64 >> B 1USA1540165 1 125 >> B 1USA1540165 1 165 >> B 44USA1540165 1 171 >> B 33USA1540165 1 254 >> B 33USA1540165 1 241 >> B 33USA1540165 1 262 >> B 358USA1540165 1 321 >> C 358FIN1540166 1 225 "),header = TRUE ,stringsAsFactors = FALSE) >> >> From the above data there are three unique IDs and four country codes >> (1, >> 44, 33 and 358) >> >> I want the following two tables >> >> Table 1. count the unique ID by country code >> 1 44 33 358 TOT >> CAN1540164 3 4 4 6 17 >> USA1540165 2 1 3 1 7 >> FIN1540166 - - - 1 1 >> TOT 5 5 7 8 25 >> >> >> Table 2 Sum of Y variable by unique ID and country. code >> >> 1 44 33 358 TOT >> CAN1540164 47 90 141 248 526 >> USA1540165 290 171 757 321 1539 >> FIN1540166 - - - 225 225 >> TOT 337 261 898 794 2290 >> >> >> How do I do it in R? >> >> The first step is to get the unique country codes unique ID by splitting >> the new ID >> >> Thank you in advance >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posti >> ng-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >>[[alternative HTML version deleted]]
Hello, As for the first question, instead of writing a xlsx file, maybe it is easier to write a csv file and then open it with Excel. tbl2 <- addmargins(tbl1) write.csv(tbl2, "tt1.csv") As for the second question, the following does it. inx <- apply(tbl1, 1, function(x) all(x != 0)) tbl1b <- addmargins(tbl1[inx, ]) tbl1b Hope this helps, Rui Barradas On 2/3/2018 4:42 PM, Val wrote:> Thank you so much Rui. > > 1. How do I export this table to excel file? > I used this > ????? tbl1 <- table(Country, IDNum) > ????? tbl2=addmargins(tbl1) > ????? write.xlsx(tbl2,"tt1.xlsx"),sheetName="summary", row.names=FALSE) > The above did not give me that table. > > > 2. I want select those unique Ids that do have records in all countries. > ?From the above data set, this ID? "FIN1540166"? should be excluded > from the summary table and the table looks like as follow > > IDNum Country 1 33 358 44 Sum CAN1540164 47 141 248 90 526 USA1540165 > 290 757 321 171 1539 Sum 337 898 569 261 2065 > > Thank you again > > > On Fri, Feb 2, 2018 at 11:26 PM, Rui Barradas <ruipbarradas at sapo.pt > <mailto:ruipbarradas at sapo.pt>> wrote: > > Hello, > > Thanks for the reproducible example. > See if the following does what you want. > > IDNum <- sub("^(\\d+).*", "\\1", mydata$ID) > Country <- sub("^\\d+(.*)", "\\1", mydata$ID) > > tbl1 <- table(Country, IDNum) > addmargins(tbl1) > > tbl2 <- xtabs(Y ~ Country + IDNum, mydata) > addmargins(tbl2) > > > Hope this helps, > > Rui Barradas > > > On 2/3/2018 3:00 AM, Val wrote: > > Hi all, > > I have a data set? need to be summarized by unique ID (count and > sum of a > variable) > A unique individual ID (country name? Abbreviation? followed by > an integer > numbers)? may? have observation in several countries. Then the > ID was > changed by adding the country code as a prefix? and? new ID was > constructed > or recorded like (country code, + the original unique ID? Example > original ID? ?"CAN1540164" , if this ID has an observation in > CANADA then > the ID was changed to? ? "1CAN1540164".? ?From this new ID I > want get out > the country code? get the? original unique ID? and? ?summarize > the data by > unique ID and country code > > The data set look like > mydata <- read.table(textConnection("GR ID iflag Y > A 1CAN1540164 1 20 > A 1CAN1540164 1 12 > A 1CAN1540164 1 15 > A 44CAN1540164 1 30 > A 44CAN1540164 1 24 > A 44CAN1540164 1 25 > A 44CAN1540164 1 11 > A 33CAN1540164 1 12 > A 33CAN1540164 1 23 > A 33CAN1540164 1 65 > A 33CAN1540164 1 41 > A 358CAN1540164 1 28 > A 358CAN1540164 1 32 > A 358CAN1540164 1 41 > A 358CAN1540164 1 54 > A 358CAN1540164 1 29 > A 358CAN1540164 1 64 > B 1USA1540165 1 125 > B 1USA1540165 1 165 > B 44USA1540165 1 171 > B 33USA1540165 1 254 > B 33USA1540165 1 241 > B 33USA1540165 1 262 > B 358USA1540165 1 321 > C 358FIN1540166 1 225 "),header = TRUE ,stringsAsFactors = FALSE) > > ?From the above data there are three unique IDs and? four > country codes (1, > 44, 33 and 358) > > I want the following two tables > > Table 1. count? the? unique ID by country code > ? ? ? ? ? ? ? ? ? ? ? ? ? ?1? ?44? ?33? ?358? ? ?TOT > CAN1540164? ? ?3? ? 4? ? ?4? ? ? 6? ? ? ? 17 > USA1540165? ? ? 2? ?1? ? ? 3? ? ?1? ? ? ? ? 7 > FIN1540166? ? ? ?-? ? ?-? ? ? ?-? ? ? 1? ? ? ? ?1 > ? ? ? ? ? ? TOT? ? ? ? ?5? ? 5? ? ? 7? ? ? 8? ? ? ?25 > > > Table 2? Sum of Y variable by unique ID and country. code > > ? ? ? ? ? ? ? ? ? ? ? ? ? ?1? ? ? ?44? ? ? ?33? ? ? 358? ? ? TOT > CAN1540164? ? 47? ? ?90? ? ? 141? ? ? 248? ? ? ?526 > USA1540165? ?290? ?171? ? ? 757? ? ? 321? ? ?1539 > FIN1540166? ? ? ? -? ? ? ? -? ? ? ? ?-? ? ? ? ?225? ? ? ?225 > ? ? ? ? ? ? ?TOT? ? ? 337? ? ?261? ? ? 898? ? 794? ? ?2290 > > > How do I do it in R? > > ? The first step is to get the unique country codes unique ID > by splitting > the new ID > > Thank you in advance > > ? ? ? ? [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org <mailto:R-help at r-project.org> mailing list > -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > <https://stat.ethz.ch/mailman/listinfo/r-help> > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > <http://www.R-project.org/posting-guide.html> > and provide commented, minimal, self-contained, reproducible code. > >
Thank you so much Rui! On Sun, Feb 4, 2018 at 12:20 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:> Hello, > > Please always cc the list. > > As for the question, I believe the following does it. > > a <- strsplit(mydata$ID, "[[:alpha:]]+") > b <- strsplit(mydata$ID, "[[:digit:]]+") > > a <- sapply(a, `[`, 1) > c <- sapply(a, `[`, 2) > b <- sapply(b, function(x) x[x != ""]) > > c2 <- sprintf("%010d", as.integer(c)) > > newID <- paste0(a, b, c2) > > > Hope this helps, > > Rui Barradas > > On 2/4/2018 2:01 AM, Val wrote: > >> Thank you so much again for your help! >> >> I have one more question related to this. >> >> 1. How do I further split this "358USA1540165 " into three parts. >> a) 358 >> b) USA >> c) 1540165 >> >> I want to add leading zeros to the third part like "0001540165" >> and then combine b and c to get this USA1540165 >> so USA1540165 changed to USA1540165 >> >> The other one is that the data set has several country codes and if I >> want to limit my data set to only certain country codes , how do I do that. >> >> Thank you again >> >> >> >> >> On Sat, Feb 3, 2018 at 1:05 PM, Rui Barradas <ruipbarradas at sapo.pt >> <mailto:ruipbarradas at sapo.pt>> wrote: >> >> Hello, >> >> As for the first question, instead of writing a xlsx file, maybe it >> is easier to write a csv file and then open it with Excel. >> >> tbl2 <- addmargins(tbl1) >> write.csv(tbl2, "tt1.csv") >> >> As for the second question, the following does it. >> >> inx <- apply(tbl1, 1, function(x) all(x != 0)) >> tbl1b <- addmargins(tbl1[inx, ]) >> tbl1b >> >> >> Hope this helps, >> >> Rui Barradas >> >> On 2/3/2018 4:42 PM, Val wrote: >> >> Thank you so much Rui. >> >> 1. How do I export this table to excel file? >> I used this >> tbl1 <- table(Country, IDNum) >> tbl2=addmargins(tbl1) >> write.xlsx(tbl2,"tt1.xlsx"),sheetName="summary", >> row.names=FALSE) >> The above did not give me that table. >> >> >> 2. I want select those unique Ids that do have records in all >> countries. >> From the above data set, this ID "FIN1540166" should be >> excluded from the summary table and the table looks like as follow >> >> IDNum Country 1 33 358 44 Sum CAN1540164 47 141 248 90 526 >> USA1540165 290 757 321 171 1539 Sum 337 898 569 261 2065 >> >> Thank you again >> >> >> On Fri, Feb 2, 2018 at 11:26 PM, Rui Barradas >> <ruipbarradas at sapo.pt <mailto:ruipbarradas at sapo.pt> >> <mailto:ruipbarradas at sapo.pt <mailto:ruipbarradas at sapo.pt>>> >> wrote: >> >> Hello, >> >> Thanks for the reproducible example. >> See if the following does what you want. >> >> IDNum <- sub("^(\\d+).*", "\\1", mydata$ID) >> Country <- sub("^\\d+(.*)", "\\1", mydata$ID) >> >> tbl1 <- table(Country, IDNum) >> addmargins(tbl1) >> >> tbl2 <- xtabs(Y ~ Country + IDNum, mydata) >> addmargins(tbl2) >> >> >> Hope this helps, >> >> Rui Barradas >> >> >> On 2/3/2018 3:00 AM, Val wrote: >> >> Hi all, >> >> I have a data set need to be summarized by unique ID >> (count and >> sum of a >> variable) >> A unique individual ID (country name Abbreviation >> followed by >> an integer >> numbers) may have observation in several countries. >> Then the ID was >> changed by adding the country code as a prefix and >> new ID was >> constructed >> or recorded like (country code, + the original unique >> ID Example >> original ID "CAN1540164" , if this ID has an >> observation in >> CANADA then >> the ID was changed to "1CAN1540164". From this new >> ID I >> want get out >> the country code get the original unique ID and >> summarize >> the data by >> unique ID and country code >> >> The data set look like >> mydata <- read.table(textConnection("GR ID iflag Y >> A 1CAN1540164 1 20 >> A 1CAN1540164 1 12 >> A 1CAN1540164 1 15 >> A 44CAN1540164 1 30 >> A 44CAN1540164 1 24 >> A 44CAN1540164 1 25 >> A 44CAN1540164 1 11 >> A 33CAN1540164 1 12 >> A 33CAN1540164 1 23 >> A 33CAN1540164 1 65 >> A 33CAN1540164 1 41 >> A 358CAN1540164 1 28 >> A 358CAN1540164 1 32 >> A 358CAN1540164 1 41 >> A 358CAN1540164 1 54 >> A 358CAN1540164 1 29 >> A 358CAN1540164 1 64 >> B 1USA1540165 1 125 >> B 1USA1540165 1 165 >> B 44USA1540165 1 171 >> B 33USA1540165 1 254 >> B 33USA1540165 1 241 >> B 33USA1540165 1 262 >> B 358USA1540165 1 321 >> C 358FIN1540166 1 225 "),header = TRUE >> ,stringsAsFactors = FALSE) >> >> From the above data there are three unique IDs and >> four >> country codes (1, >> 44, 33 and 358) >> >> I want the following two tables >> >> Table 1. count the unique ID by country code >> 1 44 33 358 TOT >> CAN1540164 3 4 4 6 17 >> USA1540165 2 1 3 1 7 >> FIN1540166 - - - 1 1 >> TOT 5 5 7 8 25 >> >> >> Table 2 Sum of Y variable by unique ID and country. code >> >> 1 44 33 >> 358 TOT >> CAN1540164 47 90 141 248 526 >> USA1540165 290 171 757 321 1539 >> FIN1540166 - - - 225 >> 225 >> TOT 337 261 898 794 >> 2290 >> >> >> How do I do it in R? >> >> The first step is to get the unique country codes >> unique ID >> by splitting >> the new ID >> >> Thank you in advance >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org <mailto:R-help at r-project.org> >> <mailto:R-help at r-project.org <mailto:R-help at r-project.org>> >> mailing list >> -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> <https://stat.ethz.ch/mailman/listinfo/r-help> >> <https://stat.ethz.ch/mailman/listinfo/r-help >> <https://stat.ethz.ch/mailman/listinfo/r-help>> >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> <http://www.R-project.org/posting-guide.html> >> <http://www.R-project.org/posting-guide.html >> <http://www.R-project.org/posting-guide.html>> >> and provide commented, minimal, self-contained, >> reproducible code. >> >> >> >>[[alternative HTML version deleted]]