I've been looking for a way to change a certain percentage of the values in a data frame in R, but I've struggled to find any information on it.

For example:

#################example data##############
> data
      V1    V2    V3  V4 V5  V6 V7
1   chr1   500   500 CHH  0 0.5  +
2   chr1   550   550 CHH  0 0.0  +
3   chr2   700   700 CHH  0 0.0  +
4   chr2  1000  1000 CHH  0 0.0  +
5   chr3   100   100 CHH  0 0.0  +
6   chr4   450   450  CG  0 0.0  +
7   chr5   450   450 CHH  0 0.0  +
8   chr5 50034 50034 CHG  0 0.0  +
9   chr7 50055 50055 CHG  0 0.0  +
10 chr10 50063 50063 CHH  0 0.0  +

> dput(data)
structure(list(V1 = structure(c(1L, 1L, 3L, 3L, 4L, 5L, 6L, 6L,
7L, 2L), .Label = c("chr1", "chr10", "chr2", "chr3", "chr4",
"chr5", "chr7"), class = "factor"), V2 = c(500L, 550L, 700L,
1000L, 100L, 450L, 450L, 50034L, 50055L, 50063L), V3 = c(500L,
550L, 700L, 1000L, 100L, 450L, 450L, 50034L, 50055L, 50063L),
    V4 = structure(c(3L, 3L, 3L, 3L, 3L, 1L, 3L, 2L, 2L, 3L), .Label = c("CG",
    "CHG", "CHH"), class = "factor"), V5 = c(0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L), V6 = c(0.5, 0, 0, 0, 0, 0, 0, 0,
    0, 0), V7 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L), .Label = "+", class = "factor")), .Names = c("V1", "V2",
"V3", "V4", "V5", "V6", "V7"), class = "data.frame", row.names = c(NA,
-10L))
############################

Say, for instance, I'd like to change 20% of the values in column 6 to 1, replacing zero or whatever value is currently present. How would I approach this?

I am working with a large data frame, and I need to replace the values in one of its columns for 10-20% of the entire dataset. I hope what I am trying to convey is understandable.

--
View this message in context: http://r.789695.n4.nabble.com/replacing-percentage-of-values-in-data-frame-tp3920484p3920484.html
Sent from the R help mailing list archive at Nabble.com.
Henrique Dallazuanna
2011-Oct-19 23:44 UTC
[R] replacing percentage of values in data frame
Try this:

data$V6[sample(nrow(data), ceiling(length(data$V6) * 0.2))] <- 1

On Wed, Oct 19, 2011 at 9:38 PM, a217 <ajn21 at case.edu> wrote:
> Say for instance, I'd like to change 20% of values in column 6 to '1'
> instead of zero or whatever value may be currently present. How would I
> approach this?
>
> I am working with a large data frame and I need to replace values in one of
> the columns for 10-20% of the entire dataset.
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
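
For anyone who wants to reuse the one-liner above with other columns or percentages, here is a minimal sketch of the same idea wrapped in a function. The name replace_fraction and its arguments are hypothetical (they do not appear in the thread), and the small data frame is only a stand-in for the poster's data. It picks rows at random without replacement, so a different set of rows changes on each run unless a seed is set.

```r
# Hypothetical helper (not from the thread): overwrite a random fraction
# of the entries in one column of a data frame with a given value.
replace_fraction <- function(df, column, fraction, value) {
  stopifnot(fraction >= 0, fraction <= 1)
  n <- nrow(df)
  k <- ceiling(n * fraction)   # number of rows to overwrite, rounded up
  idx <- sample.int(n, k)      # k distinct row indices, drawn without replacement
  df[[column]][idx] <- value
  df
}

# Stand-in for the poster's data: 10 rows, V6 is 0.5 followed by zeros.
data <- data.frame(V1 = paste0("chr", 1:10),
                   V6 = c(0.5, rep(0, 9)))

set.seed(42)                   # assumption: only here to make the draw repeatable
changed <- replace_fraction(data, "V6", 0.2, 1)
sum(changed$V6 == 1)           # 2 of the 10 rows now hold 1
```

Because the function returns a modified copy, the original data frame is left untouched; assign the result back to data if the change should be permanent.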