Dear all,
I would like to ask whether anyone has experience with the problem below.
I want to select a subset of the sample (see data below) so that each level
(1, 2, 3, 4 in the example) of every variable (v1, v2, v3, v4 in the example)
appears at least once in the subset. I also want the subset to contain as
few rows as possible. Any help is greatly appreciated.
      Id v1 v2 v3 v4
 [1,]  1  1  2  4  3
 [2,]  2  2  1  3  4
 [3,]  3  4  2  4  2
 [4,]  4  1  1  2  3
 [5,]  5  3  2  3  4
 [6,]  6  3  1  1  1
 [7,]  7  3  4  3  1
 [8,]  8  4  4  4  4
 [9,]  9  1  2  2  1
[10,] 10  4  1  1  2
[11,] 11  2  4  3  2
[12,] 12  1  4  2  3
[13,] 13  2  3  3  4
[14,] 14  4  3  1  2
[15,] 15  3  2  1  2
[16,] 16  2  3  2  3
[17,] 17  1  4  1  4
[18,] 18  2  3  4  3
[19,] 19  4  1  4  1
[20,] 20  3  3  2  1
Thanks,
Peter
I think there are multiple solutions that match your criteria. Here is one:

dat <- structure(list(Id = 1:20,
    v1 = c(1L, 2L, 4L, 1L, 3L, 3L, 3L, 4L, 1L, 4L,
           2L, 1L, 2L, 4L, 3L, 2L, 1L, 2L, 4L, 3L),
    v2 = c(2L, 1L, 2L, 1L, 2L, 1L, 4L, 4L, 2L, 1L,
           4L, 4L, 3L, 3L, 2L, 3L, 4L, 3L, 1L, 3L),
    v3 = c(4L, 3L, 4L, 2L, 3L, 1L, 3L, 4L, 2L, 1L,
           3L, 2L, 3L, 1L, 1L, 2L, 1L, 4L, 4L, 2L),
    v4 = c(3L, 4L, 2L, 3L, 4L, 1L, 1L, 4L, 1L, 2L,
           2L, 3L, 4L, 2L, 2L, 3L, 4L, 3L, 1L, 1L)),
    names = c("Id", "v1", "v2", "v3", "v4"),
    class = "data.frame", row.names = c(NA, -20L))

# For each variable, flag the first row in which each level occurs;
# keep every row that holds at least one such first occurrence.
keep <- rowSums(apply(dat[, -1], 2, function(x) !duplicated(x)))
dat.sub <- dat[keep > 0, ]
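
The first-occurrence filter above always covers every level, but it is not designed to keep the subset small. As a rough alternative, here is a minimal greedy set-cover sketch. The function name `greedy_cover` and the "v1=3"-style labels are illustrative choices, not something from the thread, and greedy selection is only a heuristic: it is not guaranteed to find the smallest possible subset.

```r
# Data as posted in the question
dat <- data.frame(
  Id = 1:20,
  v1 = c(1, 2, 4, 1, 3, 3, 3, 4, 1, 4, 2, 1, 2, 4, 3, 2, 1, 2, 4, 3),
  v2 = c(2, 1, 2, 1, 2, 1, 4, 4, 2, 1, 4, 4, 3, 3, 2, 3, 4, 3, 1, 3),
  v3 = c(4, 3, 4, 2, 3, 1, 3, 4, 2, 1, 3, 2, 3, 1, 1, 2, 1, 4, 4, 2),
  v4 = c(3, 4, 2, 3, 4, 1, 1, 4, 1, 2, 2, 3, 4, 2, 2, 3, 4, 3, 1, 1)
)

# Hypothetical helper: greedy set cover over (variable, level) pairs.
greedy_cover <- function(dat, vars) {
  # One label per cell, e.g. "v1=3", so levels of different variables
  # are never confused with each other.
  labels <- do.call(cbind, lapply(vars, function(v) paste(v, dat[[v]], sep = "=")))
  uncovered <- unique(as.vector(labels))
  chosen <- integer(0)
  while (length(uncovered) > 0) {
    # Number of still-uncovered labels each row would add
    gain <- apply(labels, 1, function(r) sum(r %in% uncovered))
    best <- which.max(gain)  # ties go to the earliest row
    chosen <- c(chosen, best)
    uncovered <- setdiff(uncovered, labels[best, ])
  }
  dat[sort(chosen), , drop = FALSE]
}

vars <- c("v1", "v2", "v3", "v4")
dat.greedy <- greedy_cover(dat, vars)
dat.greedy
```

Comparing `nrow(dat.greedy)` with `nrow(dat.sub)` shows how much the greedy pass saves on a given data set.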
Best,
Ista
On Sun, Jan 23, 2011 at 12:43 PM, Wei Yang <peterwyang1 at gmail.com>
wrote:
> [snip]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
Maybe something like this:

# Count how often each level occurs in v1..v4
su <- lapply(dat[2:5], function(x) table(x))
su
mode(su)
# Bind the per-variable count tables side by side into one data frame
myBYdata <- data.frame(do.call(cbind, lapply(su, as.data.frame)))
myBYdata

On Sun, 23/01/2011 at 07:43 -0500, Wei Yang wrote:
> [snip]
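
The level counts above show four levels per variable, and since each row contributes exactly one level per variable, no subset of fewer than four rows can cover everything. With only 20 rows, the true minimum can be found by brute force. The sketch below is one way to do that; the helper `covers_all` and the variable names are illustrative, and the approach assumes the data set is small enough that enumerating row combinations with `combn` is affordable.

```r
# Data as posted in the question
dat <- data.frame(
  Id = 1:20,
  v1 = c(1, 2, 4, 1, 3, 3, 3, 4, 1, 4, 2, 1, 2, 4, 3, 2, 1, 2, 4, 3),
  v2 = c(2, 1, 2, 1, 2, 1, 4, 4, 2, 1, 4, 4, 3, 3, 2, 3, 4, 3, 1, 3),
  v3 = c(4, 3, 4, 2, 3, 1, 3, 4, 2, 1, 3, 2, 3, 1, 1, 2, 1, 4, 4, 2),
  v4 = c(3, 4, 2, 3, 4, 1, 1, 4, 1, 2, 2, 3, 4, 2, 2, 3, 4, 3, 1, 1)
)
vars <- c("v1", "v2", "v3", "v4")

# Hypothetical helper: TRUE if rows `idx` show every observed level
# of every variable.
covers_all <- function(idx) {
  all(vapply(vars, function(v) setequal(dat[[v]][idx], dat[[v]]), logical(1)))
}

# Lower bound on the subset size: the largest number of levels in any variable
lower <- max(vapply(vars, function(v) length(unique(dat[[v]])), integer(1)))

# Try sizes from the lower bound upwards; stop at the first size that works
smallest <- NULL
for (k in lower:nrow(dat)) {
  combos <- combn(nrow(dat), k)            # one candidate subset per column
  ok <- which(apply(combos, 2, covers_all))
  if (length(ok) > 0) {
    smallest <- dat[combos[, ok[1]], ]     # first minimal subset found
    break
  }
}
smallest
```

Several subsets of the minimal size may exist; this returns only the first one found. For much larger data the number of combinations grows too quickly, and a heuristic or an integer-programming formulation would be more practical.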