Dear all,

I would like to ask whether anyone has experience with the problem below.

I want to select a subset of the sample (see data below) so that each level (1,2,3,4 in the example) for every variable (v1,v2,v3,v4 in the example) is shown at least once in the subset. I also want the sample size of the subset to be as small as possible. Any help on it is greatly appreciated.

      Id v1 v2 v3 v4
 [1,]  1  1  2  4  3
 [2,]  2  2  1  3  4
 [3,]  3  4  2  4  2
 [4,]  4  1  1  2  3
 [5,]  5  3  2  3  4
 [6,]  6  3  1  1  1
 [7,]  7  3  4  3  1
 [8,]  8  4  4  4  4
 [9,]  9  1  2  2  1
[10,] 10  4  1  1  2
[11,] 11  2  4  3  2
[12,] 12  1  4  2  3
[13,] 13  2  3  3  4
[14,] 14  4  3  1  2
[15,] 15  3  2  1  2
[16,] 16  2  3  2  3
[17,] 17  1  4  1  4
[18,] 18  2  3  4  3
[19,] 19  4  1  4  1
[20,] 20  3  3  2  1

Thanks,
Peter
I think there are multiple solutions that match your criteria. Here is one:

dat <- structure(list(Id = 1:20, v1 = c(1L, 2L, 4L, 1L, 3L, 3L, 3L,
4L, 1L, 4L, 2L, 1L, 2L, 4L, 3L, 2L, 1L, 2L, 4L, 3L), v2 = c(2L,
1L, 2L, 1L, 2L, 1L, 4L, 4L, 2L, 1L, 4L, 4L, 3L, 3L, 2L, 3L, 4L,
3L, 1L, 3L), v3 = c(4L, 3L, 4L, 2L, 3L, 1L, 3L, 4L, 2L, 1L, 3L,
2L, 3L, 1L, 1L, 2L, 1L, 4L, 4L, 2L), v4 = c(3L, 4L, 2L, 3L, 4L,
1L, 1L, 4L, 1L, 2L, NA, 3L, 4L, NA, 2L, 3L, 4L, 3L, 1L, 1L)), .Names = c("Id",
"v1", "v2", "v3", "v4"), class = "data.frame", row.names = c(NA,
-20L))
# Note: in this dput, v4 is NA for Id 11 and 14, where the posted table shows 2.

# For each variable, flag the first row in which each level occurs,
# then count how many variables flag each row.
keep <- rowSums(apply(dat[,-1], 2, function(x) !duplicated(x)))
# Keep every row that introduces a new level in at least one variable.
dat.sub <- dat[keep > 0 ,]

Best,
Ista

On Sun, Jan 23, 2011 at 12:43 PM, Wei Yang <peterwyang1 at gmail.com> wrote:
> I want to select a subset of the sample (see data below) so that each level
> (1,2,3,4 in the example) for every variable (v1,v2,v3,v4 in the example) is
> shown at least once in the subset. I also want the sample size of the
> subset to be as small as possible. Any help on it is greatly appreciated.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
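The first-occurrence subset above covers every level, but nothing in it aims at the smallest possible size. As a rough sketch only, the code below rebuilds the table from the original post (assuming no missing values), checks coverage, and then drops any row whose removal still leaves every level present. The helpers `covers_all` and `prune` are hypothetical names introduced here, not part of the thread, and the pruned result depends on row order, so it is irreducible but not guaranteed to be the minimum.

```r
# Data as posted in the original question (assumption: no missing values).
vals <- c(1,2,4,3, 2,1,3,4, 4,2,4,2, 1,1,2,3, 3,2,3,4,
          3,1,1,1, 3,4,3,1, 4,4,4,4, 1,2,2,1, 4,1,1,2,
          2,4,3,2, 1,4,2,3, 2,3,3,4, 4,3,1,2, 3,2,1,2,
          2,3,2,3, 1,4,1,4, 2,3,4,3, 4,1,4,1, 3,3,2,1)
m <- matrix(vals, ncol = 4, byrow = TRUE,
            dimnames = list(NULL, paste0("v", 1:4)))
dat <- data.frame(Id = 1:20, m)
vars <- paste0("v", 1:4)

# TRUE if every level seen in `full` also appears in `sub`, for each variable.
covers_all <- function(sub, full, vars) {
  all(vapply(vars,
             function(v) all(unique(full[[v]]) %in% sub[[v]]),
             logical(1)))
}

# First-occurrence subset: keep a row if it introduces a new level
# in at least one variable.
keep <- rowSums(sapply(dat[vars], function(x) !duplicated(x))) > 0
dat.sub <- dat[keep, ]

# Backward pruning (hypothetical helper): walk from the last row to the
# first and drop a row whenever the remaining rows still cover all levels.
prune <- function(sub, full, vars) {
  i <- nrow(sub)
  while (i >= 1) {
    if (nrow(sub) > 1 && covers_all(sub[-i, , drop = FALSE], full, vars)) {
      sub <- sub[-i, , drop = FALSE]
    }
    i <- i - 1
  }
  sub
}

dat.small <- prune(dat.sub, dat, vars)
dat.small
```

Pruning in a different row order can leave a different, possibly smaller, subset, so this is a quick improvement rather than a guarantee.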
Maybe that:

# Frequency table of the levels for each of v1..v4
su <- lapply(dat[2:5],function(x)table(x))
su
mode(su)
# Bind the per-variable tables side by side into one data frame
myBYdata <- data.frame( do.call(cbind,lapply(su, as.data.frame)) )
myBYdata

On Sun, 23/01/2011 at 07:43 -0500, Wei Yang wrote:
> I want to select a subset of the sample (see data below) so that each level
> (1,2,3,4 in the example) for every variable (v1,v2,v3,v4 in the example) is
> shown at least once in the subset. I also want the sample size of the
> subset to be as small as possible. Any help on it is greatly appreciated.
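Since the question asks for the smallest subset, one option for a table this small is plain exhaustive search. The sketch below is an illustration under assumptions, not a method proposed in the thread: it rebuilds the posted table (assuming no missing values) and tries every combination of rows, starting from the smallest size that could possibly work. `covers_all` and `smallest_cover` are hypothetical names. The general problem is a form of set cover, so this brute-force approach is only practical for small data.

```r
# Data as posted in the original question (assumption: no missing values).
vals <- c(1,2,4,3, 2,1,3,4, 4,2,4,2, 1,1,2,3, 3,2,3,4,
          3,1,1,1, 3,4,3,1, 4,4,4,4, 1,2,2,1, 4,1,1,2,
          2,4,3,2, 1,4,2,3, 2,3,3,4, 4,3,1,2, 3,2,1,2,
          2,3,2,3, 1,4,1,4, 2,3,4,3, 4,1,4,1, 3,3,2,1)
m <- matrix(vals, ncol = 4, byrow = TRUE,
            dimnames = list(NULL, paste0("v", 1:4)))
dat <- data.frame(Id = 1:20, m)
vars <- paste0("v", 1:4)

# Number of distinct levels per variable. The largest count is a lower
# bound on the subset size, because a row supplies one level per variable.
n_levels <- vapply(dat[vars], function(x) length(unique(x)), integer(1))

# TRUE if the given row indices show every level of every variable.
covers_all <- function(rows) {
  all(vapply(vars,
             function(v) length(unique(dat[[v]][rows])) == n_levels[[v]],
             logical(1)))
}

# Try subset sizes in increasing order and return the first covering
# combination found (hypothetical helper; exhaustive, so small data only).
smallest_cover <- function() {
  for (k in max(n_levels):nrow(dat)) {
    combos <- combn(nrow(dat), k)          # one candidate subset per column
    hit <- which(apply(combos, 2, covers_all))
    if (length(hit) > 0) return(combos[, hit[1]])
  }
}

rows <- smallest_cover()
dat[rows, ]
```

Several subsets of the same minimal size may exist; this returns only the first one in the order `combn` generates them.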