Suraaga Kulkarni
2008-Mar-24 17:40 UTC
[Rd] resampling from string when it runs across multiple lines
Hi,

I need to resample from a long string, which is written in many lines with carriage-return marks at the end of each line. Perhaps because the data looks like a matrix, using the code: sample(data, 25, replace=T) gives me 25 columns of characters from the data because it is resampling whole columns. What I would like it to do is to treat the data as a vector that has just been spread across many lines, and pick single characters from random positions in randomly chosen lines.

I am reproducing a sample dataset, the command and the output here:

> y
     X..1. X..2. X..3. X..4. X..5. X..6. X..7. X..8. X..9. X..10.
[1,] A     C     G     T     T     G     C     A     G     C
[2,] A     C     G     F     F     F     F     F     F     G
[3,] A     C     G     S     S     S     S     S     G     A
[4,] A     C     G     T     T     G     C     A     G     G
[5,] A     B     B     B     B     B     B     A     G     T

> sample(y, 20, replace=T)
     X..9. X..4. X..2. X..7. X..9..1 X..3. X..3..1 X..9..2 X..9..3 X..4..1 X..3..2 X..8. X..9..4 X..3..3 X..6. X..7..1
[1,] G     T     C     C     G       G     G       G       G       T       G       A     G       G       G     C
[2,] F     F     C     F     F       G     G       F       F       F       G       F     F       G       F     F
[3,] G     S     C     S     G       G     G       G       G       S       G       S     G       G       S     S
[4,] G     T     C     C     G       G     G       G       G       T       G       A     G       G       G     C
[5,] G     B     B     B     G       B     B       G       G       B       B       A     G       B       B     B
     X..6..1 X..3..4 X..7..2 X..10.
[1,] G       G       C       C
[2,] F       G       F       G
[3,] S       G       S       A
[4,] G       G       C       G
[5,] B       B       B       T

I wanted to try the bootstrap approach (since that's what I am doing - resampling with replacement) but that requires a "statistic" and I don't know what sense that makes for character data.

Any help will be greatly appreciated.

Thanks,
S.
Dimitris Rizopoulos
2008-Mar-24 18:10 UTC
[Rd] resampling from string when it runs across multiple lines
Try this:

# read the lines into a character matrix, one cell per character
y <- as.matrix(read.table(textConnection(
"A C G T T G C A G C
A C G F F F F F F G
A C G S S S S S G A
A C G T T G C A G G
A B B B B B B A G T"
), stringsAsFactors = FALSE))

# draw 20 cell positions with replacement, then look up the characters
ind <- sample(length(y), 20, TRUE)
y[ind]

I hope it helps.

Best,
Dimitris

P.S. It would be best to send this kind of e-mail to R-help rather than R-devel; see http://www.r-project.org/mail.html for more information on the different R mailing lists.

----
Dimitris Rizopoulos
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm
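
For readers wondering why sample() returned whole columns in the question: a data frame is a list of columns, so length() counts columns and sample() indexes them. The sketch below assumes the poster's y was a data frame (the thread does not say how it was read in). It uses read.table(text = ...) with the default column names V1..V10, which are not the poster's names, and colClasses = "character" as a precaution so that letters such as T and F are never converted to logical values. Flattening with unlist() is shown as one possible alternative to the as.matrix() approach in the reply.

```r
# Assumption: the data were read into a data frame, one character per cell.
txt <- "A C G T T G C A G C
A C G F F F F F F G
A C G S S S S S G A
A C G T T G C A G G
A B B B B B B A G T"
df <- read.table(text = txt, colClasses = "character")

# A data frame is a list of columns, so its length is the column count.
n_cols <- length(df)

# sample() therefore picks whole columns, as seen in the question.
cols <- sample(df, 20, replace = TRUE)
dim(cols)

# Flatten to a plain character vector first, then sample single characters.
chars <- unlist(df, use.names = FALSE)
draw <- sample(chars, 20, replace = TRUE)
draw
```

Indexing the matrix with sampled positions, as in the reply, and sampling the flattened vector directly are equivalent ways of drawing single characters; which one reads better is a matter of taste.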