Within a very large matrix composed of a mix of values and NAs, e.g., matrix A:

     [,1] [,2] [,3]
[1,]    1   NA   NA
[2,]    3   NA   NA
[3,]    3   10   17
[4,]    4   12   18
[5,]    6   16   19
[6,]    6   22   20
[7,]    5   11   NA

I need to be able to consecutively number, in new columns, the non-NA values within each column (i.e. A[1,1], A[3,2] and A[3,3] would all be set to one, and subsequent values in those columns would increase by one, until the last non-NA value is reached, if any).

Any ideas?
Thanks

Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740
Dimitris Rizopoulos
2009-Nov-21 19:50 UTC
[R] consecutive numbering of elements in a matrix
If I understand correctly what you want, then one approach is:

    A <- matrix(sample(50, 21), 7, 3)
    A[sample(21, 5)] <- NA
    A
    row(A) - apply(is.na(A), 2, cumsum)

I hope it helps.

Best,
Dimitris

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
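As a hedged illustration, the same expression can be applied to the example matrix from the original question instead of random data. The masking step at the end is an addition that is not part of the reply above: without it, NA positions still carry a number (0 before the first value in a column, the last count after the final value).

```r
# The example matrix from the original question.
A <- cbind(c(1, 3, 3, 4, 6, 6, 5),
           c(NA, NA, 10, 12, 16, 22, 11),
           c(NA, NA, 17, 18, 19, 20, NA))

# Row index minus the running count of NAs in each column
# equals the running count of non-NA values in that column.
B <- row(A) - apply(is.na(A), 2, cumsum)

# Assumed extra step: blank out the cells that are NA in A,
# so that only real values are numbered.
B[is.na(A)] <- NA
B
```

After masking, column 1 is numbered 1 to 7, column 2 starts at 1 in row 3, and column 3 runs from 1 to 4 in rows 3 to 6.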
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Jim Bouldin
> Sent: Saturday, November 21, 2009 10:34 AM
> To: r-help at r-project.org
> Subject: [R] consecutive numbering of elements in a matrix
>
> I need to be able to consecutively number, in new columns, the non-NA
> values within each column (i.e. A[1,1] A[3,2] and A[3,3]
> would all be set
> to one, and subsequent values in those columns would increase
> by one, until
> the last non-NA value is reached, if any).

Is this what you are looking for?

    > numberNonNAsInColumn <- function (A) {
          for (i in seq_len(ncol(A))) {
              # TRUE for the rows of column i that hold a value
              isNotNA <- !is.na(A[, i])
              # overwrite those values with 1, 2, ..., leaving NAs in place
              A[isNotNA, i] <- seq_len(sum(isNotNA))
          }
          A
      }
    > numberNonNAsInColumn(A)
         [,1] [,2] [,3]
    [1,]    1   NA   NA
    [2,]    2   NA   NA
    [3,]    3    1    1
    [4,]    4    2    2
    [5,]    5    3    3
    [6,]    6    4    4
    [7,]    7    5   NA
    > numberNonNAsInColumn(cbind(c(101,NA,102,103,NA,NA,104),
                                 c(1001,1002,1003,NA,1004,1005,1006)))
         [,1] [,2]
    [1,]    1    1
    [2,]   NA    2
    [3,]    2    3
    [4,]    3   NA
    [5,]   NA    4
    [6,]   NA    5
    [7,]    4    6

I didn't know what you wanted to do if there were NA's
in the middle of a column.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
Thank you and apologies--I did not make it clear that there are no NAs mixed in with the valid values. Rather, they all occur consecutively, either toward the beginning or end of the column.

Jim

> I didn't know what you wanted to do if there were NA's
> in the middle of a column.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
Many thanks to Dimitris, William and David for very helpful answers which solved my problem. Being a relative newb, I am confused by something in the solutions by Dimitris and David.

#Create a matrix A as follows:

    > A <- matrix(sample(50, 21), 7, 3)
    > A[sample(21, 5)] <- NA;A
         [,1] [,2] [,3]
    [1,]   36   38   24
    [2,]    6   33   13
    [3,]   12   42   10
    [4,]    7   NA   NA
    [5,]   48   NA   NA
    [6,]    3   NA   47
    [7,]   29   23    4
    > B = row(A) - apply(is.na(A), 2, cumsum);B
         [,1] [,2] [,3]
    [1,]    1    1    1
    [2,]    2    2    2
    [3,]    3    3    3
    [4,]    4    3    3
    [5,]    5    3    3
    [6,]    6    3    4
    [7,]    7    4    5

#But:

    > B = row(A) - apply(!is.na(A), 2, cumsum);B
         [,1] [,2] [,3]
    [1,]    0    0    0
    [2,]    0    0    0
    [3,]    0    0    0
    [4,]    0    1    1
    [5,]    0    2    2
    [6,]    0    3    2
    [7,]    0    3    2

This seems exactly backwards to me. The is.na(A) command should be cumulatively summing the NA values and !is.na(A) should be doing so on the non-NA values. But the opposite is the case. I'm glad I have a solution but this apparent backwardness of expected logic has me worried.

I do have another, tougher question if anyone has the time, which is, given a resulting matrix like B below:

    > is.na(B) <- is.na(A);B
         [,1] [,2] [,3]
    [1,]    1    1    1
    [2,]    2    2    2
    [3,]    3    3    3
    [4,]    4   NA   NA
    [5,]    5   NA   NA
    [6,]    6   NA    4
    [7,]    7    4    5

how can I rearrange all the columns so that equal values are in the same row, i.e. in the case above, the NA values are removed from columns 2 and 3 and all non-NA values that had been below them are moved up to replace them?

Thanks again for your help.
Jim
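The apparent reversal comes from the subtraction, not from is.na() itself. A small sketch on the matrix printed above may make this concrete; the names na_so_far and value_so_far are illustrative only and do not appear in the thread.

```r
# Jim's matrix as printed in the message above.
A <- cbind(c(36, 6, 12, 7, 48, 3, 29),
           c(38, 33, 42, NA, NA, NA, 23),
           c(24, 13, 10, NA, NA, 47, 4))

na_so_far    <- apply(is.na(A), 2, cumsum)   # NAs seen down to each row
value_so_far <- apply(!is.na(A), 2, cumsum)  # non-NA values seen down to each row

# Every cell is either NA or not, so the two running counts
# always add up to the row index.
all(na_so_far + value_so_far == row(A))   # TRUE

# Subtracting the NA count from the row index leaves the value count ...
all(row(A) - na_so_far == value_so_far)   # TRUE
# ... and subtracting the value count leaves the NA count.
all(row(A) - value_so_far == na_so_far)   # TRUE
```

So cumsum over is.na(A) does count NAs; it is the row index minus that count which yields the numbering of the non-NA values.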
> And think about the fact that row(A) and apply(!is.na(A), 2, cumsum)
> will be identical in the case where there are no NAs, so their
> difference would be a zero matrix. Double negativism strikes again....
> not(is.na) == "is"

OK I see it now--thanks. I was interpreting the apply function incorrectly in terms of what it was summing.

> You cannot have unequal length columns in a matrix. Only a list is
> able to handle that task. So we need a more clear description of what
> you expect, preferably typed out in full so we can "see" it.

Given a matrix B like before, which has NAs mixed with integers in all columns, where those NAs may occur anywhere within the columns, and where the integers within a column are always consecutive and increasing:

    > B
         [,1] [,2] [,3] ...etc
    [1,]    1    1    1
    [2,]    2    2    2
    [3,]    3    3    3
    [4,]    4   NA   NA
    [5,]    5   NA   NA
    [6,]    6   NA    4
    [7,]   NA    4    5
    etc

I would like to create a new matrix, in which all NAs that occur BETWEEN consecutive integers are removed, and the integers which follow such NAs are moved "up" in the column to replace them. NAs which occur near the bottom of each column, and are NOT followed by more integers, can be retained without problem. Empty spaces that might result from this process, near the column bottoms as the integers are moved up, would need to be replaced by NAs so that equal numbers of entries are maintained in each row, hence still allowing a matrix to exist.

If B above were in fact the complete matrix, the desired result would thus be:

         [,1] [,2] [,3] etc
    [1,]    1    1    1
    [2,]    2    2    2
    [3,]    3    3    3
    [4,]    4    4    4
    [5,]    5   NA    5
    [6,]    6   NA   NA
    [7,]   NA   NA   NA
    etc

In other words, all integers of a particular value in the original matrix need to be placed on the same row of a new matrix, and all "empty" values replaced with NA. I hope that explains it well enough, but will try again if not. Thanks again for any help.
Jim
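Read column by column, the request amounts to "keep the non-NA entries in order, then pad with NAs back to the original length". A minimal sketch under that reading follows; shift_up is a hypothetical helper name, not something from the thread.

```r
# The matrix B described above.
B <- cbind(c(1:6, NA),
           c(1:3, NA, NA, NA, 4),
           c(1:3, NA, NA, 4, 5))

# Hypothetical helper: non-NA entries first (original order kept),
# followed by as many NAs as were removed.
shift_up <- function(x) c(x[!is.na(x)], rep(NA, sum(is.na(x))))

# apply() over columns returns a matrix of the same shape,
# because every call returns a vector of length nrow(B).
result <- apply(B, 2, shift_up)
result
```

Because the helper preserves the existing order rather than sorting, it does not rely on the values within a column being increasing.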
Dimitris Rizopoulos
2009-Nov-22 19:00 UTC
[R] consecutive numbering of elements in a matrix
One approach is the following:

    B <- cbind(c(1:6, NA), c(1:3, NA,NA,NA, 4), c(1:3, NA,NA, 4,5))
    matrix(B[order(col(B), B)], nrow(B), ncol(B))

I hope it helps.

Best,
Dimitris
Thank you Dimitris, that solves it exactly! I continue to be amazed at how a single line of code can be so powerful in R, containing so much information. Hard as hell to interpret though (for me).

Jim
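For readers who find the one-liner equally hard to parse, here is one way to unpack it step by step. This is a sketch of how the expression evaluates, with intermediate names (idx, shifted) introduced only for illustration.

```r
B <- cbind(c(1:6, NA),
           c(1:3, NA, NA, NA, 4),
           c(1:3, NA, NA, 4, 5))

# col(B) holds the column number of every cell. Read as a vector
# (column by column) it is 1,1,...,1, 2,2,...,2, 3,3,...,3.
col(B)

# order() sorts cell positions by column first and by value second.
# By default (na.last = TRUE) NAs sort after all real values, so within
# each column the numbers come first and the NAs last.
idx <- order(col(B), B)

# Indexing B with idx gives the cells in that order as a plain vector ...
B[idx]

# ... and matrix() fills column-wise, restoring the original shape.
shifted <- matrix(B[idx], nrow(B), ncol(B))
shifted
```

Note that this sorts the values within each column, so it gives the requested result only because the integers in each column are already increasing, as stated in the question.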