thr3ads.net - R help - [R] Split data frame into 250-row chunks [Jun 2015]


Liz Hare

2015-Jun-10 12:39 UTC

[R] Split data frame into 250-row chunks

Hi R-Experts,

I have a data.frame like this:
> head(map)
  chr snp   poscm   posbp    dist
1   1  M1 2.99043 3249189      NA
2   1  M2 3.06457 3273096 0.07414
3   1  M3 3.17018 3307151 0.10561
4   1  M4 3.20892 3319643 0.03874
5   1  M5 3.28120 3342947 0.07228
6   1  M6 3.29624 3347798 0.01504

I need to split this into chunks of 250 rows (there will usually be a last chunk
with < 250 rows).

If I only had to extract one 250-line chunk, it would be easy:

map1 <- map[1:250, ]

or using subset().

I tried to make it a loop iterating through num and using beg and nd for
starting and ending indices, but I couldn't figure out how to reference all the
variables I needed in this:
> chunks
   beg   nd let num
1     1  250   a   1
2   251  500   b   2
3   501  750   c   3
4   751 1000   d   4
5  1001 1250   e   5
6  1251 1500   f   6
7  1501 1750   g   7
8  1751 2000   h   8
9  2001 2250   i   9
10 2251 2500   j  10
...
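A loop over start and end indices along the lines of the chunks table might look like the sketch below. It assumes a chunk size of 250 and uses a hypothetical 1001-row data frame, map_demo, in place of map. seq() builds the starting rows, and pmin() clamps the final end index so the short last chunk is handled.

```r
# Hypothetical stand-in for 'map': 1001 rows, so the last chunk has 1 row
map_demo <- data.frame(snp = paste0("M", 1:1001),
                       poscm = seq(0, 10, length.out = 1001))

chunk_size <- 250
n <- nrow(map_demo)

# Start and end row indices for each chunk, like the 'beg' and 'nd' columns
beg <- seq(1, n, by = chunk_size)
nd  <- pmin(beg + chunk_size - 1, n)   # clamp the last chunk to the final row

# Extract each chunk in turn and store it in a list
chunk_list <- vector("list", length(beg))
for (num in seq_along(beg)) {
  chunk_list[[num]] <- map_demo[beg[num]:nd[num], ]
}

sapply(chunk_list, nrow)   # 250 250 250 250 1
```

Precomputing beg and nd as vectors avoids having to look up several columns of a separate table inside the loop.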

Remembering that loops are not always the best answer in R, I looked at other
options like split, following this example, but I was not able to adapt it from a
vector to a data.frame version:
http://stackoverflow.com/questions/3318333/split-a-vector-into-chunks-in-r
(Yes, I've reviewed the language documentation.) I checked out ddply and
data.table, but couldn't find a way to use them with index positions instead of
column values.
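One way to carry a vector-splitting idea over to a data frame is to split the row positions rather than the data frame itself, and then use each group of positions to index rows. This is a minimal sketch, again assuming a chunk size of 250 and a hypothetical 1001-row map_demo; ceiling() of position / 250 gives label 1 to rows 1-250, label 2 to rows 251-500, and so on.

```r
# Hypothetical stand-in for 'map'
map_demo <- data.frame(snp = paste0("M", 1:1001),
                       poscm = seq(0, 10, length.out = 1001))
chunk_size <- 250

# Split the row positions (a plain vector) into groups of 250
positions  <- seq_len(nrow(map_demo))
idx_groups <- split(positions, ceiling(positions / chunk_size))

# Use each vector of row positions to pull out the matching rows
map_chunks <- lapply(idx_groups, function(rows) map_demo[rows, ])

length(map_chunks)
```

Because the labels come from positions, this does not depend on the row names of the data frame.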

Thanks,
Liz




David Winsemius

2015-Jun-10 19:18 UTC


[R] Split data frame into 250-row chunks

On Jun 10, 2015, at 5:39 AM, Liz Hare wrote:
> I need to split this into chunks of 250 rows (there will usually be a last
> chunk with < 250 rows).
split( map, trunc( 0:(nrow(map)-1 )/nrow(map) ) )

Untested. Designed to return a list with indices starting at "0".
> trunc( 0:19/5)
[1] 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3


> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA

Marc Schwartz

2015-Jun-10 19:21 UTC


[R] Split data frame into 250-row chunks

> On Jun 10, 2015, at 7:39 AM, Liz Hare <doggene at earthlink.net> wrote:
> 
> I need to split this into chunks of 250 rows (there will usually be a last
> chunk with < 250 rows).

Hi,

  map.split <- split(x, (as.numeric(rownames(map)) - 1) %/% 250)

That will create a list of data frames made up of subsets of 'map', each of
which will have 250 records except, of course, for the last one.

Essentially, you are creating a grouping variable from the numeric row names,
integer-divided (%/%) by the length of the chunks that you want. For example,
using the built-in 'iris' dataset, which has 150 rows:
> (as.numeric(rownames(iris)) - 1) %/% 50  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [34] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [67] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[100] 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[133] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
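The grouping depends on integer division (%/%), which gives each run of consecutive positions the same label. The remainder operator (%%) would instead cycle through 0 to chunk size minus 1 and interleave rows across groups. A small sketch with 10 zero-based positions and a chunk size of 4, numbers chosen only for illustration:

```r
pos <- 0:9                 # zero-based row positions

blocks <- pos %/% 4        # integer division: 0 0 0 0 1 1 1 1 2 2 (block labels)
cycles <- pos %%  4        # remainder:        0 1 2 3 0 1 2 3 0 1 (cycles, not blocks)

blocks
cycles
```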

iris.split <- split(iris, (as.numeric(rownames(iris)) - 1) %/% 50)
> length(iris.split)
[1] 3

> lapply(iris.split, nrow)
$`0`
[1] 50

$`1`
[1] 50

$`2`
[1] 50

> lapply(iris.split, head)
$`0`
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

$`1`
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
51          7.0         3.2          4.7         1.4 versicolor
52          6.4         3.2          4.5         1.5 versicolor
53          6.9         3.1          4.9         1.5 versicolor
54          5.5         2.3          4.0         1.3 versicolor
55          6.5         2.8          4.6         1.5 versicolor
56          5.7         2.8          4.5         1.3 versicolor

$`2`
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
101          6.3         3.3          6.0         2.5 virginica
102          5.8         2.7          5.1         1.9 virginica
103          7.1         3.0          5.9         2.1 virginica
104          6.3         2.9          5.6         1.8 virginica
105          6.5         3.0          5.8         2.2 virginica
106          7.6         3.0          6.6         2.1 virginica
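Once the list exists, each chunk can be processed in turn, and do.call(rbind, ...) is one way to confirm that no rows were lost. This is a sketch using the same iris.split construction; the per-chunk mean is only an example of per-chunk work, not something from the original answer.

```r
iris.split <- split(iris, (as.numeric(rownames(iris)) - 1) %/% 50)

# Example per-chunk work: mean Sepal.Length of each 50-row chunk
sapply(iris.split, function(chunk) mean(chunk$Sepal.Length))

# Stack the chunks back together (row names become "0.1", "0.2", ...)
recombined <- do.call(rbind, iris.split)
nrow(recombined) == nrow(iris)
```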



Regards,

Marc Schwartz

Marc Schwartz

2015-Jun-10 19:23 UTC


[R] Split data frame into 250-row chunks

> On Jun 10, 2015, at 2:21 PM, Marc Schwartz <marc_schwartz at me.com> wrote:
> 
> Hi,
> 
>  map.split <- split(x, (as.numeric(rownames(map)) - 1) %/% 250)


Shoot, typo in the above; it should be 'map', not 'x':

   map.split <- split(map, (as.numeric(rownames(map)) - 1) %/% 250)
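A caveat, offered as a sketch and not part of the answer above: as.numeric(rownames(map)) equals 1, 2, ..., n only when map still has its default row names. After subsetting or sorting, rows keep their old names, so the groups can come out uneven. Building the grouping from row positions with seq_len(nrow(map)) avoids this. Below, a hypothetical map_sub keeps every other row of a toy 1000-row frame.

```r
# Hypothetical data: keep every other row, so row names are 1, 3, 5, ..., 999
map_demo <- data.frame(snp = paste0("M", 1:1000),
                       poscm = seq(0, 10, length.out = 1000))
map_sub <- map_demo[seq(1, 1000, by = 2), ]   # 500 rows

# Grouping from row names: labels follow the old numbering
by_names <- split(map_sub, (as.numeric(rownames(map_sub)) - 1) %/% 250)

# Grouping from row positions: labels follow the current order
by_pos <- split(map_sub, (seq_len(nrow(map_sub)) - 1) %/% 250)

sapply(by_names, nrow)   # 4 groups of 125 rows
sapply(by_pos, nrow)     # 2 groups of 250 rows
```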

Marc



David Winsemius

2015-Jun-10 19:33 UTC


[R] Split data frame into 250-row chunks

On Jun 10, 2015, at 12:18 PM, David Winsemius wrote:
> 
> split( map, trunc( 0:(nrow(map)-1 )/nrow(map) ) )
> 
> Untested. Designed to return a list with indices starting at "0".
Looking at Marc Schwartz's answer (a smarter man than I), I see this should
have been:

split( map, trunc( 0:(nrow(map)-1 )/250) )
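Why the divisor matters: with nrow(map) as the divisor, every value of 0:(nrow(map)-1)/nrow(map) is below 1, so trunc() returns 0 for every row and split() yields a single group. Dividing by the chunk size instead changes the label every 250 rows. A quick check on a hypothetical 1001-row frame, map_demo:

```r
# Hypothetical stand-in for 'map'
map_demo <- data.frame(snp = paste0("M", 1:1001),
                       poscm = seq(0, 10, length.out = 1001))
n <- nrow(map_demo)

# Original divisor: all labels are 0, so everything lands in one group
one_group <- split(map_demo, trunc(0:(n - 1) / n))

# Corrected divisor: labels 0, 1, 2, ... change every 250 rows
by_250 <- split(map_demo, trunc(0:(n - 1) / 250))

length(one_group)       # 1
sapply(by_250, nrow)    # 250 250 250 250 1, named "0" to "4"
```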

-- 
David.
