Bansal, Vikas
2011-Jul-24 21:32 UTC
[R] Deleting rows and store the deleted rows in new data frame
Dear all,

I am using grep, but I am doing something wrong and do not understand the problem. Please help me. I am using this code:

    sf=data.frame(sapply(df[],function(x) grep('\\.&\\,', df[,9])))

I have a data frame (df) like this:

    10  135349467  g  G   4  0  0  5  ,,,.,
    10  135349468  t  T   2  0  0  5  ,,c.,
    10  135349469  g  G   7  0  0  5  ,,a.,
    10  135349470  c  C   8  0  0  5  ,,,.,
    10  135349471  a  A  10  0  0  5  ,,,.,
    10  135349472  g  G   7  0  0  6  aa,.,,
    10  135349473  g  G   7  0  0  6  ,,c.,,
    10  135349474  g  G   4  0  0  6  ,,,.,,
    10  135349475  a  A   8  0  0  6  ,,,.,,
    10  135349476  t  T   1  0  0  6  g,,.,,
    10  135349477  a  A   7  0  0  6  ,,,.,,
    10  135349478  a  A  11  0  0  6  ,,,.,,

I want to delete the rows that contain only . and , in column 9, and I want to store those rows in a new data frame, sf.

So my output should be:

df

    10  135349468  t  T   2  0  0  5  ,,c.,
    10  135349469  g  G   7  0  0  5  ,,a.,
    10  135349472  g  G   7  0  0  6  aa,.,,
    10  135349473  g  G   7  0  0  6  ,,c.,,
    10  135349476  t  T   1  0  0  6  g,,.,,

sf

    10  135349467  g  G   4  0  0  5  ,,,.,
    10  135349470  c  C   8  0  0  5  ,,,.,
    10  135349471  a  A  10  0  0  5  ,,,.,
    10  135349474  g  G   4  0  0  6  ,,,.,,
    10  135349475  a  A   8  0  0  6  ,,,.,,
    10  135349477  a  A   7  0  0  6  ,,,.,,
    10  135349478  a  A  11  0  0  6  ,,,.,,

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
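For reference, a minimal sketch of why the pattern in the question finds nothing: in a regular expression `&` is an ordinary character, not an "and" operator, so `'\\.&\\,'` searches for the literal three-character string `.&,`. The vector `x` is a hypothetical stand-in for column 9, using values from the post.

```r
# Hypothetical stand-in for column 9 of df (values taken from the post).
x <- c(",,,.,", ",,c.,", "aa,.,,", ",,,.,,")

# '&' is a literal character in a regex, so this searches for ".&,".
grep("\\.&\\,", x)     # integer(0): no element contains ".&,"

# A character class anchored at both ends matches strings that are
# made up only of '.' and ','.
grep("^[.,]+$", x)     # 1 4
```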
Steven Kennedy
2011-Jul-24 22:28 UTC
[R] Deleting rows and store the deleted rows in new data frame
Hi Vikas,

The following works (I'm not very good with sapply, but the for loop is OK if your data set is not huge).

    > df <- read.table("test.txt",stringsAsFactors=FALSE)
    > df
       V1        V2 V3 V4 V5 V6 V7 V8     V9
    1  10 135349467  g  G  4  0  0  5  ,,,.,
    2  10 135349468  t  T  2  0  0  5  ,,c.,
    3  10 135349469  g  G  7  0  0  5  ,,a.,
    4  10 135349470  c  C  8  0  0  5  ,,,.,
    5  10 135349471  a  A 10  0  0  5  ,,,.,
    6  10 135349472  g  G  7  0  0  6 aa,.,,
    7  10 135349473  g  G  7  0  0  6 ,,c.,,
    8  10 135349474  g  G  4  0  0  6 ,,,.,,
    9  10 135349475  a  A  8  0  0  6 ,,,.,,
    10 10 135349476  t  T  1  0  0  6 g,,.,,
    11 10 135349477  a  A  7  0  0  6 ,,,.,,
    12 10 135349478  a  A 11  0  0  6 ,,,.,,
    >
    > df.rows <- c()
    > counter <- 0
    >
    > for (i in 1:dim(df)[1]){
    +   if(grepl('[[:alpha:]]',df[i,9])){
    +     counter <- counter + 1
    +     df.rows[counter] <- i
    +   }
    + }
    > sf <- df[-df.rows,]
    > sf
       V1        V2 V3 V4 V5 V6 V7 V8     V9
    1  10 135349467  g  G  4  0  0  5  ,,,.,
    4  10 135349470  c  C  8  0  0  5  ,,,.,
    5  10 135349471  a  A 10  0  0  5  ,,,.,
    8  10 135349474  g  G  4  0  0  6 ,,,.,,
    9  10 135349475  a  A  8  0  0  6 ,,,.,,
    11 10 135349477  a  A  7  0  0  6 ,,,.,,
    12 10 135349478  a  A 11  0  0  6 ,,,.,,
    > df <- df[df.rows,]
    > df
       V1        V2 V3 V4 V5 V6 V7 V8     V9
    2  10 135349468  t  T  2  0  0  5  ,,c.,
    3  10 135349469  g  G  7  0  0  5  ,,a.,
    6  10 135349472  g  G  7  0  0  6 aa,.,,
    7  10 135349473  g  G  7  0  0  6 ,,c.,,
    10 10 135349476  t  T  1  0  0  6 g,,.,,

Steve
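The loop above can also be sketched without the explicit counter, because grepl() accepts a whole column and returns one TRUE/FALSE per row. This is only a sketch under assumptions: the small data frame is a hypothetical stand-in built from a few rows of the post (only the position and column 9 are kept, renamed `pos` and `bases`), and `has.alpha` is a name introduced here.

```r
# Hypothetical stand-in for the data in the post (a few rows, two columns).
df <- data.frame(
  pos   = c(135349467, 135349468, 135349472, 135349474),
  bases = c(",,,.,", ",,c.,", "aa,.,,", ",,,.,,"),
  stringsAsFactors = FALSE
)

# One logical value per row: TRUE where the column contains a letter.
has.alpha <- grepl("[[:alpha:]]", df$bases)

sf <- df[!has.alpha, ]   # rows with only '.' and ','
df <- df[has.alpha, ]    # rows that contain at least one letter

print(sf)
print(df)
```

A logical index also behaves sensibly when no row matches, since an all-FALSE or all-TRUE mask simply selects nothing or everything.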
Phil Spector
2011-Jul-24 23:29 UTC
[R] Deleting rows and store the deleted rows in new data frame
There's no need to use sapply or loops with grep -- it's already vectorized. So you can find the rows you're interested in with

    > wh = grep('^[.,]+$',df[,9])

store them with

    > sf = df[wh,]

and delete them with

    > df = df[-wh,]

    - Phil Spector
      Statistical Computing Facility
      Department of Statistics
      UC Berkeley
      spector at stat.berkeley.edu
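One caveat with the negative index, sketched below for the case where no row matches: grep() then returns integer(0), and `df[-wh, ]` selects zero rows rather than all of them. A length check is one way to guard against that. The two-row data frame and the name `kept` are hypothetical and used only for illustration.

```r
# Hypothetical stand-in: no row consists only of '.' and ','.
df <- data.frame(
  pos   = c(135349468, 135349472),
  bases = c(",,c.,", "aa,.,,"),
  stringsAsFactors = FALSE
)

wh <- grep("^[.,]+$", df$bases)   # integer(0): nothing matched
nrow(df[-wh, ])                   # 0, every row is dropped

# Guard: only drop rows when there is something to drop.
sf   <- df[wh, ]
kept <- if (length(wh) > 0) df[-wh, ] else df
nrow(kept)                        # 2
```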