Jay
2008-Oct-13 17:06 UTC
[R] split data, but ensure each level of the factor is represented
Hello,

I'll use part of the iris dataset for an example of what I want to do.

> data(iris)
> iris<-iris[1:10,1:4]
> iris
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1           5.1         3.5          1.4         0.2
2           4.9         3.0          1.4         0.2
3           4.7         3.2          1.3         0.2
4           4.6         3.1          1.5         0.2
5           5.0         3.6          1.4         0.2
6           5.4         3.9          1.7         0.4
7           4.6         3.4          1.4         0.3
8           5.0         3.4          1.5         0.2
9           4.4         2.9          1.4         0.2
10          4.9         3.1          1.5         0.1

Now if I want to split this data using the vector

> a<-c(3, 3, 3, 2, 3, 1, 2, 3, 2, 3)
> a
 [1] 3 3 3 2 3 1 2 3 2 3

Then the function split works fine

> split(iris,a)
$`1`
  Sepal.Length Sepal.Width Petal.Length Petal.Width
6          5.4         3.9          1.7         0.4

$`2`
  Sepal.Length Sepal.Width Petal.Length Petal.Width
4          4.6         3.1          1.5         0.2
7          4.6         3.4          1.4         0.3
9          4.4         2.9          1.4         0.2

$`3`
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1           5.1         3.5          1.4         0.2
2           4.9         3.0          1.4         0.2
3           4.7         3.2          1.3         0.2
5           5.0         3.6          1.4         0.2
8           5.0         3.4          1.5         0.2
10          4.9         3.1          1.5         0.1

My problem is when the vector lacks one of the values from 1:n. For example if the vector is

> a<-c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3)
> a
 [1] 3 3 3 2 3 2 2 3 2 3

then split will return a list without a $`1`. I would like to have the $`1` be a vector of 0's with the same length as the number of columns in the dataset. In other words I want to write a function that returns

> mysplit(iris,a)
$`1`
[1] 0 0 0 0 0

$`2`
  Sepal.Length Sepal.Width Petal.Length Petal.Width
4          4.6         3.1          1.5         0.2
6          5.4         3.9          1.7         0.4
7          4.6         3.4          1.4         0.3
9          4.4         2.9          1.4         0.2

$`3`
   Sepal.Length Sepal.Width Petal.Length Petal.Width
1           5.1         3.5          1.4         0.2
2           4.9         3.0          1.4         0.2
3           4.7         3.2          1.3         0.2
5           5.0         3.6          1.4         0.2
8           5.0         3.4          1.5         0.2
10          4.9         3.1          1.5         0.1

Thank you for your time,

Jay
Henrique Dallazuanna
2008-Oct-13 17:14 UTC
[R] split data, but ensure each level of the factor is represented
Try this:

a<-factor(c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3), levels = 1:3)
split(iris, a)
lapply(split(iris, a), dim)

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
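
For anyone following along, here is a minimal sketch of what the suggestion above produces, assuming the 10-row, 4-column subset from the original post. The names `dat` and `parts` are chosen for this sketch only, so that the built-in iris is not overwritten. The unused level 1 is kept in the result, but as a zero-row data frame rather than the vector of zeros that was asked for.

```r
# Rebuild the 10-row, 4-column example from the original post.
dat <- datasets::iris[1:10, 1:4]

# Grouping vector containing no 1s; declaring levels = 1:3 keeps the empty level.
a <- factor(c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3), levels = 1:3)

# split() keeps unused factor levels by default (drop = FALSE).
parts <- split(dat, a)

names(parts)          # "1" "2" "3": all three levels are present
sapply(parts, nrow)   # 0, 4 and 6 rows respectively
sapply(parts, ncol)   # every piece, including the empty one, has 4 columns
```

An empty piece can then be detected with nrow(parts[["1"]]) == 0 and handled however the downstream code needs.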
Gabor Grothendieck
2008-Oct-13 17:20 UTC
[R] split data, but ensure each level of the factor is represented
Try this:

split(iris, factor(a, levels = 1:3))
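
If a vector of zeros really is wanted for the empty level, one possible sketch wraps the same factor(a, levels = ...) idea. The name mysplit comes from the question; the body, the `levels` argument and its default of seq_len(max(g)) are assumptions made for illustration, and the helper expects g to hold positive integers. It returns one zero per column (four for this subset), which follows the wording of the question; the mock output in the question shows five.

```r
# Hypothetical helper: the name is from the question, the body is an assumption.
# x: a data frame; g: a vector of positive integer group labels.
mysplit <- function(x, g, levels = seq_len(max(g))) {
  # Declaring the levels keeps groups that do not occur in g.
  parts <- split(x, factor(g, levels = levels))
  # Swap each zero-row piece for a numeric zero vector, one entry per column.
  lapply(parts, function(p) if (nrow(p) == 0L) numeric(ncol(p)) else p)
}

dat <- datasets::iris[1:10, 1:4]
a <- c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3)

res <- mysplit(dat, a)
res$`1`   # numeric vector of four zeros
res$`2`   # data frame with rows 4, 6, 7 and 9
```

Mixing vectors and data frames in one list means later code has to check the type of each element, so keeping the zero-row data frames from plain split() may be simpler in practice.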