Hi R-Experts,

I have a data.frame like this:

> head(map)
  chr snp   poscm   posbp    dist
1   1  M1 2.99043 3249189      NA
2   1  M2 3.06457 3273096 0.07414
3   1  M3 3.17018 3307151 0.10561
4   1  M4 3.20892 3319643 0.03874
5   1  M5 3.28120 3342947 0.07228
6   1  M6 3.29624 3347798 0.01504

I need to split this into chunks of 250 rows (there will usually be a last chunk with < 250 rows).

If I only had to extract one 250-line chunk, it would be easy:

map1 <- map[1:250, ]

or using subset().

I tried to make it a loop iterating through num and using beg and nd for starting and ending indices, but I couldn't figure out how to reference all the variables I needed in this:

> chunks
    beg   nd let num
1     1  250   a   1
2   251  500   b   2
3   501  750   c   3
4   751 1000   d   4
5  1001 1250   e   5
6  1251 1500   f   6
7  1501 1750   g   7
8  1751 2000   h   8
9  2001 2250   i   9
10 2251 2500   j  10
...

Remembering that loops are not always the best answer in R, I looked at other options like split, following this example but not being able to adapt it from a vector to a data.frame version:
http://stackoverflow.com/questions/3318333/split-a-vector-into-chunks-in-r
(Yes, I've reviewed the language documentation.) I checked out ddply and data.table, but couldn't find a way to use them with index positions instead of column values.

Thanks,
Liz
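The loop described above can be driven directly by a table of start and end indices. What follows is only a minimal sketch: the real `map` is not shown in full, so a hypothetical 1100-row stand-in is used, and `size`, `starts`, and `pieces` are illustrative names that do not appear in the original post. The `chunks` table is rebuilt here with just the `beg`, `nd`, and `num` columns.

```r
# Hypothetical stand-in for the poster's data; only the row count matters here.
set.seed(1)
map <- data.frame(chr = 1,
                  snp = paste0("M", 1:1100),
                  poscm = sort(runif(1100, 0, 100)))

size   <- 250
starts <- seq(1, nrow(map), by = size)
chunks <- data.frame(beg = starts,
                     nd  = pmin(starts + size - 1, nrow(map)),  # clip the last chunk
                     num = seq_along(starts))

# One data frame per row of `chunks`, collected in a list.
pieces <- lapply(chunks$num, function(i) map[chunks$beg[i]:chunks$nd[i], ])

sapply(pieces, nrow)   # 250 250 250 250 100
```

Collecting the pieces in a list, rather than creating map1, map2, ... as separate variables, sidesteps the question of how to reference a differently named object on each pass through the loop.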
On Jun 10, 2015, at 5:39 AM, Liz Hare wrote:

> I need to split this into chunks of 250 rows (there will usually be a last chunk with < 250 rows).

split( map, trunc( 0:(nrow(map)-1 )/nrow(map) ) )

Untested. Designed to return a list with indices starting at "0".

> trunc( 0:19/5)
 [1] 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
> On Jun 10, 2015, at 7:39 AM, Liz Hare <doggene at earthlink.net> wrote:
>
> I need to split this into chunks of 250 rows (there will usually be a last chunk with < 250 rows).

Hi,

map.split <- split(x, (as.numeric(rownames(map)) - 1) %/% 250)

That will create a list of data frames comprised of subsets of "map", each of which will have 250 records except, of course, for the last one.

Essentially, you are creating a grouping variable based upon the numeric row names, integer-divided (%/%) by the length of the chunks that you want. For example, using the built-in "iris" dataset, which has 150 rows:

> (as.numeric(rownames(iris)) - 1) %/% 50
  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [34] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [67] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[100] 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[133] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

iris.split <- split(iris, (as.numeric(rownames(iris)) - 1) %/% 50)

> length(iris.split)
[1] 3

> lapply(iris.split, nrow)
$`0`
[1] 50

$`1`
[1] 50

$`2`
[1] 50

> lapply(iris.split, head)
$`0`
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

$`1`
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
51          7.0         3.2          4.7         1.4 versicolor
52          6.4         3.2          4.5         1.5 versicolor
53          6.9         3.1          4.9         1.5 versicolor
54          5.5         2.3          4.0         1.3 versicolor
55          6.5         2.8          4.6         1.5 versicolor
56          5.7         2.8          4.5         1.3 versicolor

$`2`
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
101          6.3         3.3          6.0         2.5 virginica
102          5.8         2.7          5.1         1.9 virginica
103          7.1         3.0          5.9         2.1 virginica
104          6.3         2.9          5.6         1.8 virginica
105          6.5         3.0          5.8         2.2 virginica
106          7.6         3.0          6.6         2.1 virginica

Regards,

Marc Schwartz
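One caveat: the row-name expression only works while the row names are still the default 1..n. After subsetting or reordering, or with character row names, as.numeric(rownames(...)) no longer gives row positions. Below is a sketch of a position-based variant, assuming the goal is simply consecutive chunks in the current row order. The names `sub`, `size`, `grp`, and `sub.split` are illustrative and not from the thread.

```r
# Subsetting keeps the original row names, so they no longer run 1..n.
sub <- iris[iris$Sepal.Length > 5.5, ]
head(rownames(sub))

# Group by row position instead of row name.
size <- 40
grp  <- (seq_len(nrow(sub)) - 1) %/% size
sub.split <- split(sub, grp)

sapply(sub.split, nrow)   # full chunks of 40, then whatever rows remain
```

Because seq_len(nrow(sub)) depends only on how many rows there are, the same line works unchanged on a data frame with default row names.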
> On Jun 10, 2015, at 2:21 PM, Marc Schwartz <marc_schwartz at me.com> wrote:
>
> map.split <- split(x, (as.numeric(rownames(map)) - 1) %/% 250)

Shoot, typo in the above, it should be "map", not "x":

map.split <- split(map, (as.numeric(rownames(map)) - 1) %/% 250)

Marc
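Once the data frame is split, each chunk is an element of a list and can be referenced by name or processed in one pass. The sketch below uses the built-in iris data and labels the chunks with letters, echoing the `let` column in the original question. The letter naming is an illustrative choice, not something the replies proposed.

```r
iris.split <- split(iris, (seq_len(nrow(iris)) - 1) %/% 50)
names(iris.split) <- letters[seq_along(iris.split)]   # "a", "b", "c"

nrow(iris.split[["b"]])                                # 50

# Apply the same computation to every chunk.
sapply(iris.split, function(d) mean(d$Sepal.Length))
```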
On Jun 10, 2015, at 12:18 PM, David Winsemius wrote:

> split( map, trunc( 0:(nrow(map)-1 )/nrow(map) ) )
>
> Untested. Designed to return a list with indices starting at "0".

Looking at Marc Schwartz' answer (a smarter man than I), I see this should have been:

split( map, trunc( 0:(nrow(map)-1 )/250) )

-- 
David.
David Winsemius
Alameda, CA, USA
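As a quick check of the two corrected answers, the sketch below compares their grouping vectors on the built-in iris data (150 rows, chunk size 50) rather than on the poster's `map`. The names `g.david`, `g.marc`, and `g.first` are illustrative. It also shows why the first version could not work: dividing by the row count makes every quotient less than 1, so trunc() places all rows in group 0.

```r
n    <- nrow(iris)   # 150
size <- 50

g.david <- trunc(0:(n - 1) / size)                     # corrected: divide by chunk size
g.marc  <- (as.numeric(rownames(iris)) - 1) %/% size   # relies on default row names 1..n
g.first <- trunc(0:(n - 1) / n)                        # original: divide by nrow

all(g.david == g.marc)   # TRUE
unique(g.first)          # 0
```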