I'm trying to insert the rows of a data.frame into a database table, or to
update existing records where the key fields of a row already exist in the
table. I've come up with a possible solution below, but would like to hear if
anyone has a better one.
# The problem demonstrated:
library(RODBC)
# Create a data.frame with test values
tbl <- data.frame(
    key1 = rep(1:3, each = 2),
    key2 = rep(LETTERS[1:2], 3),
    somevalue = rnorm(6)
)
# Create the table in the database using the following SQL:
CREATE TABLE tbl
(
    key1 integer NOT NULL,
    key2 character varying(1) NOT NULL,
    somevalue double precision,
    CONSTRAINT pktbl PRIMARY KEY (key1, key2)
)
# Continue in R
pg <- odbcConnect("testdb")
sqlSave(pg, tbl[1:2, ], "tbl", append = TRUE, rownames = FALSE)
sqlSave(pg, tbl[3, ], "tbl", append = TRUE, rownames = FALSE)
tbl[1, 3] <- 1
sqlUpdate(pg, tbl[1:4, ], "tbl", index = c("key1", "key2")) # Fails: row 4 is not in the table yet
# The failing sqlUpdate can be replaced with:
sqlUpdate(pg, tbl[1:3, ], "tbl", index = c("key1", "key2"))
sqlSave(pg, tbl[4, ], "tbl", append = TRUE, rownames = FALSE)
# Proposed solution:
tbl[1, 3] <- 0
tmp <- tbl
# Key combinations already present in the table
present <- sqlQuery(pg, "SELECT key1, key2 FROM tbl", as.is = TRUE)
for (i in seq_along(present$key1)) {
    # Update the rows whose keys already exist ...
    sqlUpdate(pg, tmp[tmp$key1 == present$key1[i] & tmp$key2 == present$key2[i], ],
              "tbl", index = c("key1", "key2"))
    # ... and drop them from tmp, leaving only the new rows
    tmp <- tmp[!(tmp$key1 == present$key1[i] & tmp$key2 == present$key2[i]), ]
}
# Insert the remaining rows, i.e. those with new key combinations
sqlSave(pg, tmp, "tbl", append = TRUE, rownames = FALSE)
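The same split can also be done without the loop by matching on the pasted
key columns. This is only a sketch (the names present and in.db are mine,
and using paste() as a compound key assumes the key values contain no
embedded spaces), but it gets by with a single sqlUpdate and a single sqlSave:
# Sketch: split tbl into rows whose keys are already in the table and
# rows that are new, then update the former and insert the latter.
present <- sqlQuery(pg, "SELECT key1, key2 FROM tbl", as.is = TRUE)
in.db <- paste(tbl$key1, tbl$key2) %in% paste(present$key1, present$key2)
if (any(in.db))
    sqlUpdate(pg, tbl[in.db, ], "tbl", index = c("key1", "key2"))
if (any(!in.db))
    sqlSave(pg, tbl[!in.db, ], "tbl", append = TRUE, rownames = FALSE)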
Either version is fine for small tables where updates are frequent and there
is no risk of anyone else doing the same thing at the same time. For a big
table with rare updates, though, it seems like quite an overhead for what
would essentially be inserts. Does anyone have a more rational way of doing
this with big data sets where updates are rare, e.g. only doing the update
pass if sqlSave fails?
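What I have in mind is roughly the sketch below. It assumes that a failed
sqlSave raises an R error (whether it does may depend on the ODBC driver),
and with autocommit on, sqlSave may already have committed some rows before
hitting the duplicate key; the fallback pass would then simply update those
rows again.
# Sketch: try the plain bulk insert first and fall back to the full
# update/insert pass only if it fails, e.g. on a duplicate key.
# Assumes a failed sqlSave raises an R error.
inserted <- tryCatch({
    sqlSave(pg, tbl, "tbl", append = TRUE, rownames = FALSE)
    TRUE
}, error = function(e) FALSE)
if (!inserted) {
    # Duplicate keys (or some other failure):
    # run the update/insert pass from the proposed solution above.
}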
Is it possible to put a lock on the database while doing the updates and
inserts to avoid problems with concurrency?
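For the lock, perhaps something along the lines of the sketch below would
work, using RODBC's transaction control (odbcSetAutoCommit and odbcEndTran).
Note that LOCK TABLE and its lock modes are PostgreSQL-specific, so this
part is not generic:
# Sketch: run the whole update/insert pass inside one transaction and
# hold a table lock for its duration (PostgreSQL-specific).
odbcSetAutoCommit(pg, autoCommit = FALSE)
sqlQuery(pg, "LOCK TABLE tbl IN SHARE ROW EXCLUSIVE MODE")
# ... the sqlUpdate and sqlSave calls go here ...
odbcEndTran(pg, commit = TRUE)   # commit and release the lock
odbcSetAutoCommit(pg, autoCommit = TRUE)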
I'm working with PostgreSQL, but the example should be generic.
Thanks in advance
Mikkel