Hi

I have made many attempts but can't get the loop right, as I am not a strong programmer. What I am basically trying to do is compare 2 spreadsheets. The problem is that one of them contains only a portion of the overall data (TESTSAMP), whereas the other has the full dataset (FULLSAMP). From the complete set I would like to remove the rows of data which are not in TESTSAMP. Column 1 contains the sample numbers, which can be used to identify samples. Does anyone have any suggestions?

I have tried various things like double loops and so on, but I am sure there is an easier way or function to do this.

I tried this method, but I'm not sure how to keep looping only until a match is found. I don't understand how repeat loops work in R.

for (i in 1:length(FULLSAMP[,1])) {

  if (FULLSAMP[i,1] != TESTSAMP[i,1]) {
    FULLSAMP <- FULLSAMP[-i,]
  }
}

Thanks in advance
You don't need a loop for this, I think. Since you don't provide an example it's hard to know how your data are set up, but look at this:

> FULLSAMP <- data.frame(A = 1:10, B=letters[1:10])
> TESTSAMP <- data.frame(A = c(2,4,5,8), C=1:4)
> FULLSAMP
    A B
1   1 a
2   2 b
3   3 c
4   4 d
5   5 e
6   6 f
7   7 g
8   8 h
9   9 i
10 10 j
> TESTSAMP
  A C
1 2 1
2 4 2
3 5 3
4 8 4
> FULLSAMP[FULLSAMP$A %in% TESTSAMP$A,]
  A B
2 2 b
4 4 d
5 5 e
8 8 h

Sarah

On Thu, May 13, 2010 at 10:49 AM, Amit Patel <amitrhelp at yahoo.co.uk> wrote:
> for (i in 1:length(FULLSAMP[,1])) {
>
>   if (FULLSAMP[i,1] != TESTSAMP[i,1]) {
>     FULLSAMP <- FULLSAMP[-i,]
>   }
> }

--
Sarah Goslee
http://www.functionaldiversity.org
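As a rough sketch of why the `%in%` subsetting above works, the snippet below splits it into two steps using the same toy data. The names `keep`, `kept` and `dropped` are made up here for illustration and are not part of the original reply.

```r
# Toy data, same as in the reply above
FULLSAMP <- data.frame(A = 1:10, B = letters[1:10])
TESTSAMP <- data.frame(A = c(2, 4, 5, 8), C = 1:4)

# %in% returns one TRUE/FALSE per row of FULLSAMP:
# TRUE where that row's sample number also occurs in TESTSAMP
keep <- FULLSAMP$A %in% TESTSAMP$A

# Index rows with the logical vector; negating it shows what is removed
kept    <- FULLSAMP[keep, ]
dropped <- FULLSAMP[!keep, ]

print(keep)
print(kept)
print(dropped)
```

Because the whole comparison is done in one vectorised step, no row is deleted while the data frame is still being iterated over, which is where the original loop runs into trouble.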
Hi,

On Thu, May 13, 2010 at 10:49 AM, Amit Patel <amitrhelp at yahoo.co.uk> wrote:
> for (i in 1:length(FULLSAMP[,1])) {
>
>   if (FULLSAMP[i,1] != TESTSAMP[i,1]) {
>     FULLSAMP <- FULLSAMP[-i,]
>   }
> }

You want to avoid for loops as much as possible.

Imagine your samples are identified by letters, so FULLSAMP[,1] will be the letters A..Z, and TESTSAMP[,1] will be some random 15 letters.

Now the job is to match the rows in TESTSAMP to the rows in FULLSAMP, and remove any "extra" rows in FULLSAMP that don't appear in TESTSAMP.

## Making some data
R> fullsamp <- data.frame(id=LETTERS, something=sample(1:100, length(letters)),
                          stringsAsFactors=FALSE)
R> testsamp <- data.frame(id=sample(LETTERS, 15), something=sample(1:100, 15),
                          stringsAsFactors=FALSE)

## Let's find where the "testsamp" rows appear in "fullsamp"
R> xref <- match(testsamp[,1], fullsamp[,1])

## Now reduce fullsamp to have only the data corresponding to testsamp
## (and in the same order)
R> fullsamp.sub <- fullsamp[xref,]

Notice that fullsamp.sub now has only rows with IDs appearing in testsamp, and they are also in the same order as testsamp.
Now go ahead and read the help you'll find in ?match

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
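One caveat with the match() approach above: it assumes every ID in testsamp is also present in fullsamp. The sketch below covers the case where that is not true. The ID "ZZ" is a hypothetical missing sample added purely for illustration, and set.seed() is there only to make the random data reproducible.

```r
set.seed(1)  # only so the random "something" column is reproducible

fullsamp <- data.frame(id = LETTERS,
                       something = sample(1:100, length(LETTERS)),
                       stringsAsFactors = FALSE)

# Hypothetical test set: "ZZ" does not occur in fullsamp
testsamp <- data.frame(id = c("Q", "C", "ZZ", "A"),
                       stringsAsFactors = FALSE)

# match() gives the row position in fullsamp for each testsamp ID,
# or NA when the ID is not found
xref <- match(testsamp$id, fullsamp$id)

# Drop the NAs before indexing; indexing with NA would return an all-NA row
fullsamp.sub <- fullsamp[xref[!is.na(xref)], ]

print(xref)
print(fullsamp.sub)
```

The remaining rows still follow the order of testsamp, as in the original example.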
On May 13, 2010, at 10:49 AM, Amit Patel wrote:
> for (i in 1:length(FULLSAMP[,1])) {
>
>   if (FULLSAMP[i,1] != TESTSAMP[i,1]) {
>     FULLSAMP <- FULLSAMP[-i,]
>   }
> }

Abandon the loop. Use merge.

... or the %in% function.

--
David Winsemius, MD
West Hartford, CT
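The merge suggestion above comes without an example, so here is a minimal sketch of one way it could look. It assumes the ID column has the same name ("A") in both data frames, borrowing the toy data from earlier in the thread; `merged` and `ids_only` are illustrative names.

```r
FULLSAMP <- data.frame(A = 1:10, B = letters[1:10])
TESTSAMP <- data.frame(A = c(2, 4, 5, 8), C = 1:4)

# By default merge() keeps only rows whose key occurs in both data frames
# (an inner join) and appends the other columns of TESTSAMP
merged <- merge(FULLSAMP, TESTSAMP, by = "A")

# To keep only FULLSAMP's own columns, merge against the ID column alone
ids_only <- merge(FULLSAMP, TESTSAMP["A"], by = "A")

print(merged)
print(ids_only)
```

Note that merge() sorts the result by the key column by default, so the row order may differ from the input; the `%in%` form leaves FULLSAMP's order untouched.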