>Hi there,
>
>I am looking for R-packages that can help me visualize properties on
>nucleotide sequences. I want to display sequences in the 1-100K base range
>as lines and plot features above and below those lines.
>
>Any ideas would be welcome.
>
>Thanks,
>
>Bernd
Hi Bernd,
not sure to understand what you mean by "properties". Let's say
the
property you are interested in is the GC-skew. You can try something
along these lines:
#
# Load DNA sequence data:
#
library(seqinr)
myseq <- read.fasta(as.string = TRUE)
#
# Define a function to compute the GC skew:
#
gcskew <- function(x) {
if (!is.character(x) || length(x) > 1)
stop("single string expected")
tmp <- tolower(s2c(x))
nC <- sum(tmp == "c")
nG <- sum(tmp == "g")
if (nC + nG == 0) return(NA)
return(100 * (nC - nG)/(nC + nG))
}
#
# Moving window along the sequence:
#
step <- 10000
wsize <- 10000
starts <- seq(from = 1, to = nchar(myseq), by = step)
starts <- starts[-length(starts)]
n <- length(starts)
result <- numeric(n)
for (i in seq_len(n)) {
result[i] <- gcskew(substr(myseq, starts[i], starts[i] + wsize - 1))
}
#
# Plot the result:
#
xx <- starts/1000 # in Kbp
yy <- result
n <- length(result)
hline <- 0
plot(yy ~ xx, type = "n", axes = FALSE, ann = FALSE)
polygon(c(xx[1], xx, xx[n]), c(min(yy), yy, min(yy)),
col = "black", border = NA)
usr <- par("usr")
rect(usr[1], usr[3], usr[2], hline, col = "white", border = NA)
lines(xx, yy)
abline(h = hline)
box()
axis(1, at = seq(0, 1000, by = 200))
axis(2, las = 1)
title(xlab = "position (Kbp)", ylab = "(C-G)/(C+G)
[percent]",
main = expression(paste("GC skew in ",
italic(Chlamydia~trachomatis))))
arrows(800, 5.5, 720, 0.5, length = 0.1, lwd = 2)
text(800, 5.5, "origin of\nreplication", pos = 4)
HTH,
Jean
--
Jean R. Lobry (lobry at biomserv.univ-lyon1.fr)
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I,
43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE
allo : +33 472 43 27 56 fax : +33 472 43 13 88
http://pbil.univ-lyon1.fr/members/lobry/