On 12-04-13 6:20 PM, Lee Hachadoorian wrote:> I'm use RPostgreSQL to access data on a Postgres server. I would like
to
> keep my SQL statements in external files, as they're easier to write
and
> debug in pgAdmin, then I use readLines to bring them into R and feed to
> dbGetQuery.
>
> Here's the problem. When I create a SQL script with pgAdmin, then load
it
> in R, the ensuing script fails when I feed it to dbGetQuery. When I inspect
> the string in R on Linux, it *looks* OK, but fails. When I inspect the
> string in R on Windows, each file begins with "???". (It took me
a while to
> figure this out since I usually run R on Linux and the characters
weren't
> displaying there.)
>
> I haven't tested a large number of applications but it appears to be a
> pgAdmin problem. Characters appear in SQL scripts created by pgAdmin on
> both Linux and Windows, do not appear in scripts created in Notepad on
> Windows or gedit on Linux. On Windows I think I can clean the string
because
>
> sql = readLines("SELECT 1.sql")
> sql
>
> [1] "???SELECT 1;"
>
> But on Linux
>
> sql = readLines("SELECT 1.sql")
> sql
>
> [1] "SELECT 1;"
>
> So how do I remove something that isn't even there? And yet the query
fails.
>
> So I would like to know if someone knows why this is happening and how to
> avoid it, or how to clean these characters when working in R on Linux where
> they aren't visible in the string. Also, when I open the
pgAdmin-created
> files in Notepad or gedit, I don't see anything unusual. But they are
still
> there, because if I edit a pgAdmin-created file in Notepad or gedit, then
> save and read into R with readLines, they are there.
That's a byte-order mark. Unicode standards say it should be invisible
when printed, so it's probably there in Linux, but you can't see it.
Windows is reading the file as though it is Latin1 when really it's
UTF-8, so you see junk characters.
I believe it would be represented in a string in R as \uFEFF, and you
should be able to do
sql <- sub("\uFEFF", "", sql)
to get rid of it.
Duncan Murdoch