thr3ads.net - R help - [R] Using NA as a break point for indicator variable? [May 2012]

Max Brondfield

2012-May-23 20:42 UTC

[R] Using NA as a break point for indicator variable?

Hi all,
I am working with a spatial data set for which I am only interested in high
concentration values ("leaks"). The low values (< 90th percentile) have
already been turned into NA's, leaving me with a matrix like this:

> CH4_leak

         lon      lat      CH4
1  -71.11954 42.35068 2.595834
2  -71.11954 42.35068 2.595688
3         NA       NA       NA
4         NA       NA       NA
5         NA       NA       NA
6  -71.11948 42.35068 2.435762
7  -71.11948 42.35068 2.491003
8         NA       NA       NA
9  -71.11930 42.35068 2.464475
10 -71.11932 42.35068 2.470865

Every time an NA comes up, it means the "leak" is gone, and the next valid
value would represent a different leak (at a different location). My goal
is to tag all of the remaining values with an indicator variable to
spatially distinguish the leaks. I am envisioning a simple numeric
indicator such as:

         lon      lat      CH4 leak_num
1  -71.11954 42.35068 2.595834        1
2  -71.11954 42.35068 2.595688        1
3         NA       NA       NA       NA
4         NA       NA       NA       NA
5         NA       NA       NA       NA
6  -71.11948 42.35068 2.435762        2
7  -71.11948 42.35068 2.491003        2
8         NA       NA       NA       NA
9  -71.11930 42.35068 2.464475        3
10 -71.11932 42.35068 2.470865        3

Does anyone have any thoughts on how to code this, perhaps using the NA
values as a "break point"? The data set is far too large to do this
manually, and I must admit I'm completely at a loss. Any help would be much
appreciated! Best,

Max

Rui Barradas

2012-May-24 12:52 UTC

Hello,

Assuming that 'd' is your original data.frame and that you've set entire
rows to NA, try this:


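d$leak_num <- NA
ix <- !is.na(d[, 1])  # any column will do, entire row is NA
## alternative, if other rows may have NAs due to something else:
## treat a row as a break only when lon, lat and CH4 are all NA
#ix <- !apply(d[, c("lon", "lat", "CH4")], 1, function(x) all(is.na(x)))
r <- rle(ix)           # runs of valid (TRUE) and NA (FALSE) rows
v <- cumsum(r$values)  # running count of valid runs
d$leak_num[ix] <- rep(v[r$values], r$lengths[r$values])
d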
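
For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same rle() idea, run on the ten example rows from the question. The data frame name 'd' follows the post; the values are copied from the question, and the use of NA_integer_ for the new column is just a choice made for this sketch.

```r
# Rebuild the ten example rows from the question (rows 3-5 and 8 are all NA).
d <- data.frame(
  lon = c(-71.11954, -71.11954, NA, NA, NA,
          -71.11948, -71.11948, NA, -71.11930, -71.11932),
  lat = c(42.35068, 42.35068, NA, NA, NA,
          42.35068, 42.35068, NA, 42.35068, 42.35068),
  CH4 = c(2.595834, 2.595688, NA, NA, NA,
          2.435762, 2.491003, NA, 2.464475, 2.470865)
)

ix <- !is.na(d$lon)     # TRUE for rows that hold a measurement
r  <- rle(ix)           # runs: TRUE x2, FALSE x3, TRUE x2, FALSE x1, TRUE x2
v  <- cumsum(r$values)  # 1 1 2 2 3: number of TRUE runs seen so far

d$leak_num <- NA_integer_
# Keep only the counts belonging to TRUE runs and repeat each by its run length.
d$leak_num[ix] <- rep(v[r$values], r$lengths[r$values])
d$leak_num
# 1 1 NA NA NA 2 2 NA 3 3
```

Because the counter is built from the runs rather than from individual NA rows, a gap of any length advances the leak number by exactly one.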


Hope this helps,

Rui Barradas


William Dunlap

2012-May-24 14:59 UTC

> Does anyone have any thoughts on how to code this, perhaps using the NA
> values as a "break point"?
You can count the cumulative number of NA breakpoints in a vector
with cumsum(is.na(vector)), as in

  > cbind(d, LeakNo=with(d, cumsum(is.na(lon)|is.na(lat)|is.na(CH4))))
           lon      lat      CH4 LeakNo
  1  -71.11954 42.35068 2.595834      0
  2  -71.11954 42.35068 2.595688      0
  3         NA       NA       NA      1
  4         NA       NA       NA      2
  5         NA       NA       NA      3
  6  -71.11948 42.35068 2.435762      3
  7  -71.11948 42.35068 2.491003      3
  8         NA       NA       NA      4
  9  -71.11930 42.35068 2.464475      4
  10 -71.11932 42.35068 2.470865      4

Add 1 if you want to start with 1.  If you only want to increase the count
after each sequence of NA's then you could use rle() or
  > na <- with(d, is.na(lon)|is.na(lat)|is.na(CH4))
  > cbind(d, LeakNo=cumsum(c(TRUE, na[-1] < na[-length(na)])))
           lon      lat      CH4 LeakNo
  1  -71.11954 42.35068 2.595834      1
  2  -71.11954 42.35068 2.595688      1
  3         NA       NA       NA      1
  4         NA       NA       NA      1
  5         NA       NA       NA      1
  6  -71.11948 42.35068 2.435762      2
  7  -71.11948 42.35068 2.491003      2
  8         NA       NA       NA      2
  9  -71.11930 42.35068 2.464475      3
  10 -71.11932 42.35068 2.470865      3
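
The second counter above still assigns a number to the NA rows themselves, whereas the question asked for NA there. A small sketch of one way to get that, assuming the same data frame 'd' with columns lon, lat and CH4 (rebuilt here from the example rows so the code runs on its own):

```r
# Example rows from the question; rows 3-5 and 8 are all NA.
d <- data.frame(
  lon = c(-71.11954, -71.11954, NA, NA, NA,
          -71.11948, -71.11948, NA, -71.11930, -71.11932),
  lat = c(42.35068, 42.35068, NA, NA, NA,
          42.35068, 42.35068, NA, 42.35068, 42.35068),
  CH4 = c(2.595834, 2.595688, NA, NA, NA,
          2.435762, 2.491003, NA, 2.464475, 2.470865)
)

na <- with(d, is.na(lon) | is.na(lat) | is.na(CH4))

# FALSE < TRUE in R, so na[-1] < na[-length(na)] is TRUE exactly where a
# non-NA row follows an NA row, i.e. where a new leak begins.
LeakNo <- cumsum(c(TRUE, na[-1] < na[-length(na)]))

# Blank out the gap rows so only real measurements carry a leak number.
LeakNo[na] <- NA
cbind(d, LeakNo)
LeakNo
# 1 1 NA NA NA 2 2 NA 3 3
```

One caveat: the leading TRUE counts row 1 as the start of leak 1, so if the data began with NA rows the first real leak would be numbered 2. This sketch does not handle that case.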

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
