The R translation teams have done a great job in making R usable for
people who do not have English as their mother tongue. However, even
within English-speaking countries, there are groups which have trouble
with the language, and it may be valuable to support the Sociolects of
these groups too.
Thanks to a generous contribution from Lars Polifo, these features will
be made available in an upcoming version of R.

As it turns out, there are some particularly interesting challenges that
need to be addressed. Consider for instance the translation of the t
test in the locale en_SF_US.UTF8 (notice the interjection of the code
"SF" to denote "San Fernando Valley")

t.test(extra ~ group, oh, baby, data = sleep)

        Welch Two Sample t-test

data:  extra by group
t = -1.8608, like, df = 17.776, like, wow, p-value = 0.0794
alternative hypothesis: true difference in means is like, ya know, not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean in group 1 mean in group 2
           0.75            2.33

Notice that in addition to the simple message string modifications, it
has been necessary to modify the parser so as to delete obviously
superfluous arguments such as "oh" or "baby" (a particular issue here is
that the argument "like" might actually be intended to mean likelihood).

Similarly, for se_KC_SE.UTF8 (KC for "kitchen") we have alternate
spellings of arguments like "data":

t.test(ixtra ~ gruoop, deta = sleep)

        Velch Tvu Semple-a t-test

deta:  ixtra by gruoop
t = -1.8608, dff = 17.776, p-felooe-a = 0.0794
elterneteefe-a hypuzeesees: trooe-a deefffference-a in meuns is nut iqooel tu 0
95 percent cunffeedence-a interfel:
 -3.3654832  0.2054832
semple-a isteemetes:
meun in gruoop 1 meun in gruoop 2
            0.75             2.33

Canadian English poses particular problems, which have not yet been
resolved. If we are to do it properly, it would entail modifications to
the R language itself. For instance we'd have to introduce a "four" loop
and change the end-brace to the four-character string "eh?}".

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
Groovy!!!

Charles Annis, P.E.

Charles.Annis at StatisticalEngineering.com
phone: 561-352-9699
eFax: 614-455-3265
http://www.StatisticalEngineering.com

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Peter Dalgaard
Sent: Tuesday, April 01, 2008 10:19 AM
To: R help
Subject: [R] NEW: Sociolects in R

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
I can see that the R-help mailing list always maintains the same quality
and educational interest, even on certain recurring dates.
Keep up the good work, eh.

Best wishes,

Eric

2008/4/1, Peter Dalgaard <P.Dalgaard@biostat.ku.dk>:

--
Eric Lecoutre
Consultant - Business & Decision
Business Intelligence & Customer Intelligence
Dear Peter,

congratulations. Looks very impressive. Seems like you guys in Denmark
are very productive this time of the year.

This brings me to my actual problem: isn't Lars Polifo a close relative
of Rolf Poalis? Has there been any recent progress with the 'sas2r' parser?

http://tolstoy.newcastle.edu.au/R/help/04/04/0009.html

Best,
Roland
On Tuesday 01 April 2008 04:18:55 pm Peter Dalgaard wrote:
PD> The R translation teams have done a great job in making R usable for
PD> people who do not have English as their mother tongue. However, even
PD> within English speaking countries, there are groups which have trouble
PD> with the language, and it may be valuable to support the Sociolects of
PD> these groups too.

Great news! I would love to see something like a German Saxon accent:

d.dässd(ägschdro ~ grubbe, doodn = schlofm)

        Wälsch Zwou Sämbel d-dässd

Doodn:  ägschdro bro grubbe
t = -1.8608, Froiheedgroode = 17.776, b-Wärd = 0.0794
Aldärnadivve Hippothääse: Där Undärschidd da Durschnidde is näsch Null
Fümneunzsch Brodzend Gonfiddenzindärwall:
 -3.3654832  0.2054832
Sämbel Schäddzung;
Dorchschnidd in Grubbe 1 Dorchschnidd in Grubbe 2
                    0.75                     2.33

;-)

Stefan