thr3ads.net - R help - [R] Matrix Manipulation R [Jul 2015]

Alex Kim

2015-Jul-04 10:09 UTC

[R] Matrix Manipulation R

Hi guys,

Suppose I have an extremely large data frame with 2 columns and 0.5 million
rows. For example, the last 7 rows may look like this:
.
..
...
89         100
93         120
95         125
101        NA
115        NA
123        NA
124        NA

I would like to manipulate this data frame to output a data frame that
looks like:

100        89, 93, 95
120        101, 115
125        123, 124

What would be the absolute quickest way to do this, given that there are
many rows? Currently I have this:

# m is the large two column data frame
end <- na.omit(m[,'V2'])
out <- data.frame(
  End = end,
  Start = unname(sapply(
    split(m[,'V1'], findInterval(m[,'V1'], end))[as.character(0:(length(end)-1))],
    paste, collapse='.')))

However this is taking a little bit too long.

Thank you for your help!

	[[alternative HTML version deleted]]

David Winsemius

2015-Jul-04 16:40 UTC

[R] Matrix Manipulation R

> On Jul 4, 2015, at 3:09 AM, Alex Kim <dumboisverydumb at gmail.com> wrote:
> 
> Hi guys,
> 
> Suppose I have an extremely large data frame with 2 columns and 0.5 million
> rows. For example, the last 7 rows may look like this:
> .
> ..
> ...
> 89         100
> 93         120
> 95         125
> 101        NA
> 115        NA
> 123        NA
> 124        NA
> 
> I would like to manipulate this data frame to output a data frame that
> looks like:
> 
> 100        89, 93, 95
> 120        101, 115
> 125        123, 124
> 
> What would be the absolute quickest way to do this, given that there are
> many rows? Currently I have this:
> 
> # m is the large two column data frame
> end <- na.omit(m[,'V2']);
> out <- data.frame(End=end,
> Start=unname(sapply(split(m[,'V1'],findInterval(m[,'V1'],end))[as.character(0:c(length(end)-1))],paste,collapse='.')))
> 
This might be a little faster. It skips some of the steps in your version:

dput(m)
structure(list(V1 = c(89, 93, 95, 101, 115, 123, 124),
               V2 = c(100, 120, 125, NA, NA, NA, NA)),
          .Names = c("V1", "V2"), row.names = c(NA, -7L),
          class = "data.frame")

end <- na.omit(m[,'V2'])
# findInterval() will only work if that vector (end) is sorted
data.frame(End = end,
           Start = sapply( split( m$V1, 
                                 findInterval(m$V1, c(-Inf, end))), 
                          paste,collapse="," ) )
  End    Start
1 100 89,93,95
2 120  101,115
3 125  123,124
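
A slightly more defensive variant of the same idea, offered only as a sketch: it assumes m has the columns V1 and V2 shown in the dput() output above and that the non-NA part of V2 is already sorted. It uses vapply() instead of sapply() and fixes the factor levels so that an End value with no matching V1 entries still gets a row. I have not timed it on 0.5 million rows, so treat any speed difference as unverified.

```r
# Example data, copied from the dput() output above
m <- data.frame(V1 = c(89, 93, 95, 101, 115, 123, 124),
                V2 = c(100, 120, 125, NA, NA, NA, NA))

# Plain numeric vector of end points (avoids the attributes na.omit() adds)
end <- m$V2[!is.na(m$V2)]
stopifnot(!is.unsorted(end))   # findInterval() requires sorted break points

# Group 1 holds V1 < end[1], group 2 holds end[1] <= V1 < end[2], and so on
grp <- findInterval(m$V1, c(-Inf, end))

# Fixed levels keep empty groups; V1 values at or above the last End
# get an NA level and are dropped by split()
groups <- split(m$V1, factor(grp, levels = seq_along(end)))

out <- data.frame(End = end,
                  Start = vapply(groups, paste, character(1), collapse = ","),
                  stringsAsFactors = FALSE)
out
```

Note that with these break points a V1 value exactly equal to an End value lands in the following group, the same as in the version above.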

> However this is taking a little bit too long.
> 
> Thank you for your help!
> 
> 	[[alternative HTML version deleted]]
This is a plain-text mailing list and posting triplicate questions is poor form.

Do read the posting guide.
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
David Winsemius, MD
Alameda, CA, USA
