Hi R Experts,

About the data:
My data consists of people (ID) with years of service (YoS) for each year. An ID can appear multiple times. The data is sorted by ID, then by Year.

Problem:
I need to extract the data for every ID that has non-sequential YoS rows. In the example below, that would be all rows for IDs 33 and 16, since each has a YoS that increases by more than 1 between consecutive rows.

To accomplish this I figured I could create a column called 'CheckVal' that holds the current row's YoS minus the previous row's YoS. The first row for each ID will be 0. 'CheckVal' in the data set below was created in Excel; I want to know how to do this in R. Is there a package, or a specific function or set of functions, that I can use to accomplish this?

# My data looks like:
> testSeq
   ID Year YoS CheckVal dept
1  12 2010 1.1      0.0    A
2  12 2011 2.1      1.0    A
3  44 2009 1.4      0.0    C
4  44 2010 2.4      1.0    C
5  44 2011 3.4      1.0    B
6  33 2009 2.3      0.0    A
7  33 2010 4.4      2.1    A
8  16 2009 1.6      0.0    B
9  16 2010 2.6      1.0    B
10 16 2011 5.6      3.0    C
11 16 2012 6.6      1.0    A

# Here is the dput of the data for R:
structure(list(ID = c(12, 12, 44, 44, 44, 33, 33, 16, 16, 16,
16), Year = c(2010, 2011, 2009, 2010, 2011, 2009, 2010, 2009,
2010, 2011, 2012), YoS = c(1.1, 2.1, 1.4, 2.4, 3.4, 2.3, 4.4,
1.6, 2.6, 5.6, 6.6), CheckVal = c(0, 1, 0, 1, 1, 0, 2.1, 0, 1,
3, 1), dept = structure(c(1L, 1L, 3L, 3L, 2L, 1L, 1L, 2L, 2L,
3L, 1L), .Label = c("A", "B", "C"), class = "factor")), .Names = c("ID",
"Year", "YoS", "CheckVal", "dept"), row.names = c(NA, 11L), class = "data.frame")

Dan
Workforce Analyst
LLNL
Dan,

Does this do it?

    ## where dt is the data
    tmp <- split(dt, dt$ID)
    foo <- lapply(tmp, function(x) any(diff(x$YoS) > 1))
    foo <- data.frame(ID = names(foo), gap = unlist(foo))

Note that I ignored dept.

Little hard to see how YoS can increase by more than one when the year increases by only one ... unless this is a search for erroneous data.

-Don

-- 
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062

On 11/21/13 3:32 PM, "Lopez, Dan" <lopez235 at llnl.gov> wrote:
> [...]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
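The split/lapply approach above yields one TRUE/FALSE flag per ID, while the original question asks for all rows of the flagged IDs. A minimal sketch of that last step follows. The names `tol`, `has_gap`, `gap_ids` and `gap_rows` are illustrative and not from the thread, and the small tolerance is an added assumption to guard against floating-point noise in the differences.

```r
# Example data from the original post (CheckVal left out on purpose)
dt <- data.frame(
  ID   = c(12, 12, 44, 44, 44, 33, 33, 16, 16, 16, 16),
  Year = c(2010, 2011, 2009, 2010, 2011, 2009, 2010, 2009, 2010, 2011, 2012),
  YoS  = c(1.1, 2.1, 1.4, 2.4, 3.4, 2.3, 4.4, 1.6, 2.6, 5.6, 6.6),
  dept = factor(c("A", "A", "C", "C", "B", "A", "A", "B", "B", "C", "A"))
)

tol <- 1e-8  # assumed tolerance, not part of the original code

# One logical flag per ID: does any consecutive YoS difference exceed 1?
has_gap <- vapply(split(dt$YoS, dt$ID),
                  function(yos) any(diff(yos) > 1 + tol),
                  logical(1))

# IDs with a gap, then every row belonging to those IDs
gap_ids  <- names(has_gap)[has_gap]
gap_rows <- dt[as.character(dt$ID) %in% gap_ids, ]
print(gap_rows)
```

Like the code it builds on, this assumes the rows are already sorted by Year within each ID.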
Hi,

You may try:

    with(testSeq, ave(YoS, ID, FUN = function(x) c(0, diff(x))))
    # [1] 0.0 1.0 0.0 1.0 1.0 0.0 2.1 0.0 1.0 3.0 1.0

    testSeq[!!(with(testSeq, ave(YoS, ID, FUN = function(x) any(c(0, diff(x)) > 1)))), ]

A.K.

On Thursday, November 21, 2013 6:55 PM, "Lopez, Dan" <lopez235 at llnl.gov> wrote:
> [...]
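For anyone who wants the result stored as a column, as in the Excel version, here is a rough sketch that attaches the ave() output to the data frame and checks it against the posted CheckVal values. The column name `CheckVal_R` is made up for illustration, and all.equal() is used instead of `==` because differences such as 4.4 - 2.3 need not be exactly 2.1 in floating point.

```r
# Example data from the original post, including the Excel-made CheckVal
testSeq <- data.frame(
  ID       = c(12, 12, 44, 44, 44, 33, 33, 16, 16, 16, 16),
  Year     = c(2010, 2011, 2009, 2010, 2011, 2009, 2010, 2009, 2010, 2011, 2012),
  YoS      = c(1.1, 2.1, 1.4, 2.4, 3.4, 2.3, 4.4, 1.6, 2.6, 5.6, 6.6),
  CheckVal = c(0, 1, 0, 1, 1, 0, 2.1, 0, 1, 3, 1)
)

# Within each ID: 0 for the first row, then current YoS minus previous YoS.
# ave() returns the values in the original row order.
testSeq$CheckVal_R <- ave(testSeq$YoS, testSeq$ID,
                          FUN = function(x) c(0, diff(x)))

# Compare with the Excel column, allowing for floating-point rounding
matches_excel <- isTRUE(all.equal(testSeq$CheckVal_R, testSeq$CheckVal))
print(matches_excel)
```

This relies on the rows being sorted by Year within each ID, as stated in the original post.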
David Reiner
2013-Nov-22 16:16 UTC
[R] How do I identify non-sequential data?
Similar to Don MacQueen's:

    unsplit(lapply(split(DF, DF$ID), transform, cv = c(0, diff(YoS))), DF$ID)

-- David Reiner

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Lopez, Dan
Sent: Thursday, November 21, 2013 6:38 PM
To: MacQueen, Don
Cc: R help (r-help at r-project.org)
Subject: Re: [R] How do I identify non-sequential data?

Hi Don,

Yes, I am error checking a dataset produced by a query. Most likely a problem with the query but wanted to assess the problem first.

BTW Arun provided another solution which is similar to yours but uses the function ave instead:

    testSeq[!!(with(testSeq, ave(YoS, ID, FUN = function(x) any(c(0, diff(x)) > 1)))), ]

I appreciate your response on this.

Dan

-----Original Message-----
From: MacQueen, Don
Sent: Thursday, November 21, 2013 3:58 PM
To: Lopez, Dan; R help (r-help at r-project.org)
Subject: Re: [R] How do I identify non-sequential data?

[...]
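Since this turned out to be an error check, one possible variation is to compare the change in YoS with the change in Year rather than with a fixed threshold of 1. This is only a sketch under the assumption that YoS should advance by exactly the number of calendar years elapsed; that rule is not stated anywhere in the thread. The names `d_yos`, `d_year`, `mismatch` and `bad_ids` are illustrative.

```r
# Example data from the original post
DF <- data.frame(
  ID   = c(12, 12, 44, 44, 44, 33, 33, 16, 16, 16, 16),
  Year = c(2010, 2011, 2009, 2010, 2011, 2009, 2010, 2009, 2010, 2011, 2012),
  YoS  = c(1.1, 2.1, 1.4, 2.4, 3.4, 2.3, 4.4, 1.6, 2.6, 5.6, 6.6)
)

# Row-to-row change within each ID; the first row of each ID gets 0
lagdiff <- function(x) c(0, diff(x))
d_yos   <- ave(DF$YoS,  DF$ID, FUN = lagdiff)
d_year  <- ave(DF$Year, DF$ID, FUN = lagdiff)

# Assumed rule: YoS should change by the same amount as Year.
# The tolerance absorbs floating-point rounding in the YoS differences.
DF$mismatch <- abs(d_yos - d_year) > 1e-8

# IDs with at least one inconsistent row, then all rows for those IDs
bad_ids <- unique(DF$ID[DF$mismatch])
print(DF[DF$ID %in% bad_ids, ])
```

Unlike a `> 1` test, this would also flag a YoS that decreases or increases too slowly, which may or may not be wanted.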