thr3ads.net - similar to: "How to print UTF-8 encoded strings from a C routine to R's output?"

Displaying 20 results from an estimated 1000 matches similar to: "How to print UTF-8 encoded strings from a C routine to R's output?"

R 2.7.0, match() and strings containing \0 - bug?

2008 Apr 28

Hi, A piece of my code that uses readBin() to read a certain file type is behaving strangely with R 2.7.0. This seems to be because of a failure to match() strings after using rawToChar() when the original was terminated with a "\0" character. Direct equality testing with == still works as expected. I can reproduce this as follows: > x <- "foo" > y <-

rawToChar(raw(0))

2008 May 21

Hi, right now we have (on R v2.7.0 patched (2008-04-23 r45466)) that: > rawToChar(raw(0)) [1] "" > rawToChar(raw(0), multiple=TRUE) character(0) Is this intended or should both return character(0)? Personally, I would prefer that an empty input vector returns an empty output vector. Same should then apply to charToRaw(), but right now we get: > x <- character(0) >

In C, a fast way to slice a vector?

2009 May 10

Hello, Suppose in the following code, PROTECT(sr = R_tryEval( .... )) sr is a RAWSXP vector. I wish to return another RAWSXP starting at position 13 onwards (base=0). I could create another RAWSXP of the correct length and then memcpy the required bytes and length to this new one. However is there a more efficient method? Regards Saptarshi Guha
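The copy-based approach described in this post can be sketched in plain C. This is only an illustration under stated assumptions: an ordinary byte buffer stands in for the RAWSXP payload, and the helper name `slice_from` is hypothetical. In R's C API the destination would instead come from `allocVector(RAWSXP, n)` and be accessed through `RAW()`, which is left out here so the sketch stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy src[start .. len-1] into a new buffer.
 * Returns NULL if start is past the end or if allocation fails.
 * On success *out_len holds the number of bytes copied; the caller
 * is responsible for calling free() on the result. */
unsigned char *slice_from(const unsigned char *src, size_t len,
                          size_t start, size_t *out_len)
{
    assert(src != NULL && out_len != NULL);
    if (start > len)
        return NULL;

    size_t n = len - start;
    /* malloc(0) may legally return NULL, so request at least one byte. */
    unsigned char *dst = malloc(n > 0 ? n : 1);
    if (dst == NULL)
        return NULL;

    memcpy(dst, src + start, n);
    *out_len = n;
    return dst;
}
```

For a 20-byte input and `start = 13` (zero-based, as in the post), the result holds the last 7 bytes. The copy is a single `memcpy`, so its cost is linear in the size of the slice.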

subRaw?

2012 Jul 20

Hello, All: Do you know of any capability to substitute more than one byte in an object of class Raw? Consider the following: > let4 <- paste(letters[1:4], collapse='') > (let4Raw <- charToRaw(let4)) [1] 61 62 63 64 > (let. <- sub('bc', '--', let4Raw)) [1] "61" "62" "63" "64" > # no

memDecompress and zlib compressed base64 encoded string

2010 Jan 14

Hi, I have zlib compressed strings (example is attached) and would like to decompress them using memDecompress ... I try this: > connection <- file("compressed.txt","r") > compressed <- readLines(connection) > memDecompress(as.raw(compressed),type="g") Error in memDecompress(as.raw(compressed), type = "g") : internal error -3 in

getting corrupted data when using readBin() after seek() on a gzfile connection

2013 May 08

Hi, I'm running into more issues when reading data from a gzfile connection. If I read the data sequentially with successive calls to readBin(), the data I get looks ok. But if I call seek() between the successive calls to readBin(), I get corrupted data. Here is a (hopefully) reproducible example. See my sessionInfo() at the end (I'm not on Windows, where, according to the man page,

Consistency of serialize(): please enlighten me

2007 Aug 31

Hi, I am puzzled by serialize(). It comes down to generating identical hash codes for (apparently) identical objects using digest::digest(), which in turn relies on serialize(). Here is an example illustrating the issue: ser <- function(object, ...) { list( names = names(object), namesRaw = charToRaw(names(object)), ser = serialize(names(object), connection=NULL, ascii=FALSE)

Bug Report: read.table with UTF-8 encoded file imports infinity symbol as Integer 8

2019 Feb 07

I can confirm that it doesn't happen on Ubuntu 18.04.1, so Peter is most likely correct; it looks like it's Windows specific. On Thu, 7 Feb 2019 at 12:55, peter dalgaard <pdalgd at gmail.com> wrote: > > This doesn't seem to be happening on MacOS, neither in Terminal nor RStudio, (R 3.5.1, R-devel, R-patched). So probably Windows specific. > > -pd > > > On 7 Feb

Compress string memCompress/Decompress

2010 Jul 09

Hello, I would like to compress a long string (character vector), store the compressed string in the text field of a SQLite database (using RSQLite), and then load the text back into memory and decompress it back into the original string. My character vector can be compressed considerably using standard gzip/bzip2 compression. In theory it should be much faster for me to compress/decompress

Match() on raw objects ?

2010 Jul 27

An embedded text encoded in an unknown character set was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20100727/2e19110f/attachment.pl>

String processing - is there a better way

2010 Jul 21

I have a two-part question. Part 1) I am trying to remove characters in a string based on the position of a key character in another string. I have a solution that works but it requires a for-loop. A vectorized way of doing this has eluded me. CleanRead<-function(x,y) { if (!is.character(x)) x <- as.character(x) if (!is.character(y)) y <- as.character(y)

Bug Report: read.table with UTF-8 encoded file imports infinity symbol as Integer 8

2019 Feb 08

I can reproduce this behavior on my Windows 10 system in RGui (cp1252): when I paste the Unicode infinity symbol into the console, it is treated as number 8. This is caused by Windows "best fit" default behavior in conversion of unicode characters to characters in the current native encoding: at some point in the past, 8 has been chosen as a good fit for infinity in Windows. In my

How to include the documentation of a function in a Sweave document?

2008 Feb 25

Dear R-help, I would like to include the documentation of an R function in an *.rnw document processed by Sweave. Because I'm sharing my *.rnw files with colleagues under Linux and Windows (I'm on Mac OS X), I would like a pure R solution. The naive approach doesn't work, because Sweaving this *.rnw file: -------- tmp.rnw -------- \documentclass{article} \begin{document}

rm() deletes 'c' if c('a','b') is the argument (PR#9399)

2006 Nov 29

Full_Name: Lixin Han Version: 2.4.0 OS: Windows 2000 Submission from: (NULL) (155.94.110.222) A character vector c('a','b') is supplied to rm(). As a result, 'c' is deleted unintentionally. > a <- 1:5 > b <- 'abc' > c <- letters > ls() [1] "a" "b" "c" > rm(c('a','b')) > ls() character(0)

Embedded nuls in strings

2007 Aug 07

Hi, ?rawToChar 'rawToChar' converts raw bytes either to a single character string or a character vector of single bytes. (Note that a single character string could contain embedded nuls.) Allowing embedded nuls in a string might be an interesting experiment but it seems to cause some troubles to most of the string manipulation functions. A string with an embedded 0:

writeLines argument useBytes = TRUE still making conversions

2018 Feb 15

On Thu, Feb 15, 2018 at 11:19 AM, Kevin Ushey <kevinushey at gmail.com> wrote: > I suspect your UTF-8 string is being stripped of its encoding before > write, and so assumed to be in the system native encoding, and then > re-encoded as UTF-8 when written to the file. You can see something > similar with: > > > tmp <- '?' > > tmp <- iconv(tmp,

Output mis-encoded on Windows w/ RGui 3.5.1 in strange case

2018 Jul 16

Given the following R script: x <- 1 print(list()) save(x, file = tempfile()) output <- encodeString("apple") print(output) If I source this script from RGui on Windows, I see the output: > source("encoding.R") list() [1] "\002??apple\003??" That is, it's as though R has injected what looks like byte order marks around the

untar() error

2013 May 03

Dear List, I have a list of 600+ *.gz files that I would like to extract and read the geotiffs contained within them. I tried using the untar() function to simplify this task but I am stumped by an error. I've combed the Internet for a solution without luck. The details are below, and any help in solving this matter is appreciated. > files = list.files(path = "J:/GIMMS/NDVI",

writeLines argument useBytes = TRUE still making conversions

2018 Feb 17

Of course, right after writing this e-mail I tested on my Windows machine and did not see what I expected: > charToRaw(before) [1] c3 a9 > charToRaw(after) [1] e9 so obviously I'm misunderstanding something as well. Best, Kevin On Sat, Feb 17, 2018 at 2:19 PM, Kevin Ushey <kevinushey at gmail.com> wrote: > From my understanding, translation is implied in this line of ?file
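The two byte sequences quoted in this post, c3 a9 and e9, are the UTF-8 and Latin-1 encodings of the same character (é, U+00E9). A minimal C sketch of that byte-level relationship follows. The function name `latin1_to_utf8` is hypothetical, and the sketch only illustrates the mapping between the two encodings; it makes no claim about what writeLines() does internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode one Latin-1 byte as UTF-8.
 * Latin-1 byte values equal their Unicode code points (U+0000..U+00FF),
 * so the result is either 1 byte (ASCII range) or 2 bytes.
 * Writes into out and returns the number of bytes written. */
size_t latin1_to_utf8(unsigned char c, unsigned char out[2])
{
    assert(out != NULL);
    if (c < 0x80) {
        out[0] = c;                               /* ASCII is unchanged */
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (c >> 6));    /* lead byte: 110xxxxx */
    out[1] = (unsigned char)(0x80 | (c & 0x3F));  /* continuation: 10xxxxxx */
    return 2;
}
```

Passing the Latin-1 byte 0xE9 yields the two bytes 0xC3 0xA9, which matches the pair of charToRaw() results shown in the post.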

problem with white space

2008 Mar 30

Hi, I need to resample characters from a dataset that consists of an extremely long string that is written over hundreds of thousands of lines, each of length 50 characters. I am currently doing this by first inserting a space after each character in the dataset and then using the following commands: y <- as.matrix(read.table("data.txt"), stringsAsFactors=FALSE) bstrap <-
