R 3.2.0
OS X

Colleagues,

Earlier today, I initiated a series of emails regarding SASxport (which was removed from CRAN). David Winsemius proposed downloading the source code and installing it with the following command:

    install.packages('~/Downloads/SASxport_1.5.0.tar.gz', repos = NULL, type = "source")

That works, and I am grateful to David for his recommendation. However, the package fails on some of the many objects that I attempted to write with write.xport. The error message was:

    Error in nchar(var) : invalid multibyte string 3157

One work-around would be to edit out multibyte strings. Is there a simple way to find and replace them? Or is there some other clever approach that bypasses the problem?

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com
On Sep 25, 2015, at 2:23 PM, Dennis Fisher wrote:

> The error message was:
> Error in nchar(var) : invalid multibyte string 3157

Consider using traceback() to see which section of code is actually reporting the error. The error reported in your earlier message indicated a problem with a particular word starting with DIARRH and ending in ?????A. When I try to drop that word, unquoted, into an R console line I get:

    > DIARRH??????A
    Error: unexpected input in "DIARRH?"

My word processor tells me that the little comma-like glyph is a cedilla. However, I'm not sure this is a reproducible problem, since I am unable to produce a similar error with the toy file built by the code on the write.xport help page:

    > abc <- data.frame( x=c(1, 2, NA, NA ), y=c('a', 'DIARRH??????A', NA, '*' ) )
    > abc
       x             y
    1  1             a
    2  2 DIARRH??????A
    3 NA          <NA>
    4 NA             *
    > SASformat(abc$x) <- 'date7.'
    > label(abc$y) <- 'character variable'
    > label(abc) <- 'Simple example'
    > SAStype(abc) <- 'MYTYPE'
    > str(abc)
    'data.frame':   4 obs. of  2 variables:
     $ x: atomic  1 2 NA NA
      ..- attr(*, "SASformat")= chr "date7."
     $ y: Factor w/ 3 levels "*","a","DIARRH??????A": 2 3 NA 1
      ..- attr(*, "label")= chr "character variable"
     - attr(*, "label")= chr "Simple example"
     - attr(*, "SAStype")= chr "MYTYPE"
    > write.xport( abc, file="xxx.dat" )
    > abc <- data.frame( x=c(1, 2, NA, NA ), y=c('a', 'DIARRH??????A', NA, '*' ) )
    > abc
       x             y
    1  1             a
    2  2 DIARRH??????A
    3 NA          <NA>
    4 NA             *
    > SASformat(abc$x) <- 'date7.'
    > label(abc$y) <- '"DIARRH??????A"'
    > label(abc) <- 'Simple example'
    > SAStype(abc) <- 'MYTYPE'
    > str(abc)
    'data.frame':   4 obs. of  2 variables:
     $ x: atomic  1 2 NA NA
      ..- attr(*, "SASformat")= chr "date7."
     $ y: Factor w/ 3 levels "*","a","DIARRH??????A": 2 3 NA 1
      ..- attr(*, "label")= chr "\"DIARRH??????A\""
     - attr(*, "label")= chr "Simple example"
     - attr(*, "SAStype")= chr "MYTYPE"
    > write.xport( abc, file="xxx.dat" )

> One work-around would be to edit out multibyte strings. Is there a simple way to find and replace them?

On a Mac I have used the Zap Gremlins option in TextWrangler.app. Note that it would change the spelling of words that were originally constructed using ligature characters.

Best of luck;
David.

> Or is there some other clever approach that bypasses the problem?
>
> Dennis
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
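As a scripted alternative to Zap Gremlins, here is a minimal R sketch that lists which cells in character or factor columns contain bytes outside ASCII. Those cells are the likely triggers of the "invalid multibyte string" error. The helper name find_non_ascii() is hypothetical and is not part of SASxport. The sketch matches on raw bytes (perl = TRUE, useBytes = TRUE), so strings that are invalid in the current UTF-8 locale are reported instead of raising an error. Column names and Hmisc labels would need the same check.

```r
# Hypothetical helper (not part of SASxport): locate cells of a data
# frame whose character or factor values contain non-ASCII bytes.
find_non_ascii <- function(df) {
  # TRUE for elements containing any byte >= 0x80. With useBytes = TRUE
  # the pattern is matched byte by byte, so strings that are invalid in
  # the current locale are reported rather than raising an error.
  # grepl() returns FALSE for NA elements.
  has_non_ascii <- function(x) {
    grepl("[^\\x00-\\x7F]", x, perl = TRUE, useBytes = TRUE)
  }
  cols <- character(0)
  rows <- integer(0)
  for (col in names(df)) {
    x <- df[[col]]
    if (is.factor(x)) x <- as.character(x)
    if (!is.character(x)) next   # numeric columns cannot hold such bytes
    idx <- which(has_non_ascii(x))
    cols <- c(cols, rep(col, length(idx)))
    rows <- c(rows, idx)
  }
  # One row per offending cell: column name and row number.
  data.frame(column = cols, row = rows, stringsAsFactors = FALSE)
}
```

Calling find_non_ascii() on each data set before write.xport() should point at the values that need editing.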
Dennis,

The invalid multibyte issue is almost certainly a symptom of being in a UTF-8 locale and trying to handle strings that aren't in UTF-8. (UTF-8 uses particular 8-bit patterns to say that the following bytes together encode a Unicode value outside ASCII, whereas other "8-bit ASCII" encodings, like Latin-1, just use the extra 128 character codes for special characters. Treating the latter as the former causes errors; the other way around just looks weird.)

So perhaps you should try diddling your locale settings and/or look for encoding arguments for the functions that you use. Then again, the XPT format may not be happy with non-ASCII characters, whatever the encoding, in which case you may need to massage the input data sets and change variable names and factor labels (iconv() should be your friend).

By the way, I don't think the FDA "requests" XPT files. As far as I recall, they say somewhere that they _accept_ them (possibly defending themselves against the platform-specific SAS files that once abounded), but I think even Excel files will do for submissions. The important thing is that they can get at the actual data reasonably easily. I can see the attraction of taking the well-trodden path, though.

-pd

> On 25 Sep 2015, at 23:23, Dennis Fisher <fisher at plessthan.com> wrote:
> [...]

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
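Following the iconv() suggestion, here is a minimal R sketch. It assumes that the offending strings are Latin-1 and that plain ASCII is the goal for XPT. If the data came from another source, the `from` argument would need changing, for example to "CP1252"; iconvlist() shows the names your platform accepts. The helper to_ascii_df() is hypothetical. It re-encodes character columns, factor levels and column names, and replaces characters that have no ASCII equivalent with `sub`. With the default `sub = ""` those characters are simply dropped, so accents are removed rather than transliterated.

```r
# Hypothetical helper (not part of SASxport): re-encode character
# columns, factor levels and column names from an assumed source
# encoding to plain ASCII. Characters with no ASCII equivalent are
# replaced by `sub` (the empty string simply drops them).
to_ascii_df <- function(df, from = "latin1", sub = "") {
  conv <- function(x) iconv(x, from = from, to = "ASCII", sub = sub)
  for (col in names(df)) {
    x <- df[[col]]
    if (is.factor(x)) {
      # Replacing the levels keeps the other attributes;
      # levels that become identical are merged.
      levels(x) <- conv(levels(x))
    } else if (is.character(x)) {
      # x[] <- keeps attributes such as labels or SAS formats.
      x[] <- conv(x)
    }
    df[[col]] <- x
  }
  names(df) <- conv(names(df))
  df
}
```

Passing `sub = "byte"` instead marks each dropped byte as `<xx>`, which makes the changes easy to spot before deciding on a replacement.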
Peter,

Thanks for the explanation. One further comment on something you wrote:

> I don't think the FDA "requests" XPT files

In fact, they do make such a request. Here is the actual language received this week (and repeatedly in the past):

> Program/script files should be submitted using text files (*.TXT) and the data should be submitted using SAS transport files (*.XPT).

Dennis

> On Sep 26, 2015, at 5:52 AM, peter dalgaard <pdalgd at gmail.com> wrote:
> [...]