Hello,

my data is sorted by start.ens (see below). I would now like to extract all rows (so-called *defined rows*) with type==Expression - subset (df, type==Expression) - and the preceding type==DNase HS row (which is not necessarily row n-1, assuming that the defined row is n). I don't know how to add this to my subset command.

Is that possible?

Thanks
Hermann

> df
   start.ens fc.trans        type  end.ens peak end.grcm38 dpeak
1    9191942   0.9379 Expresssion       NA   NA         NA    NA
2    9191942   0.9741 Expresssion       NA   NA         NA    NA
3    9191942   0.9748 Expresssion       NA   NA         NA    NA
4    9195570       NA    DNase HS       NA   NA    9195792   109
5    9579854       NA    DNase HS       NA   NA    9580110   131
6   11088023       NA        p300 11088523    7         NA    NA
7   11113787       NA    DNase HS       NA   NA   11114262   279
8   11114744   0.9803 Expresssion       NA   NA         NA    NA
9   11114744   0.9904 Expresssion       NA   NA         NA    NA
10  11114850       NA    DNase HS       NA   NA   11115400   210
11  11455056       NA    DNase HS       NA   NA   11455381   175
12  11461513       NA    DNase HS       NA   NA   11462571   508
13  11462408   1.0129 Expresssion       NA   NA         NA    NA
14  11462408   1.0074 Expresssion       NA   NA         NA    NA
15  11489266   1.0019 Expresssion       NA   NA         NA    NA

My (test)data:

> dput (df)
structure(list(start.ens = c(9191942L, 9191942L, 9191942L, 9195570L,
9579854L, 11088023L, 11113787L, 11114744L, 11114744L, 11114850L,
11455056L, 11461513L, 11462408L, 11462408L, 11489266L), fc.trans = c(0.9379,
0.9741, 0.9748, NA, NA, NA, NA, 0.9803, 0.9904, NA, NA, NA, 1.0129,
1.0074, 1.0019), type = structure(c(2L, 2L, 2L, 1L, 1L, 3L, 1L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("DNase HS", "Expresssion",
"p300"), class = "factor"), end.ens = c(NA, NA, NA, NA, NA, 11088523L,
NA, NA, NA, NA, NA, NA, NA, NA, NA), peak = c(NA, NA, NA, NA,
NA, 7L, NA, NA, NA, NA, NA, NA, NA, NA, NA), end.grcm38 = c(NA,
NA, NA, 9195792L, 9580110L, NA, 11114262L, NA, NA, 11115400L,
11455381L, 11462571L, NA, NA, NA), dpeak = c(NA, NA, NA, 109L,
131L, NA, 279L, NA, NA, 210L, 175L, 508L, NA, NA, NA)), .Names = c("start.ens",
"fc.trans", "type", "end.ens", "peak", "end.grcm38", "dpeak"), row.names = c(NA,
-15L), class = "data.frame")

On Thu, Nov 1, 2012 at 10:28 AM, Hermann Norpois <hnorpois at googlemail.com> wrote:
> Hello,
>
> my data is sorted by start.ens (see below). I would now like to extract
> all rows (so-called *defined rows*) with type==Expression - subset (df,
> type==Expression) - and the preceding type==DNase HS row (which is not
> necessarily row n-1, assuming that the defined row is n). I don't know how
> to add this to my subset command.
>
> Is that possible?

With enough money and manpower, everything is possible. This one is possible even without a whole lot of manpower or money :)

First, get rid of all rows that are neither Expression nor DNase HS, since you don't seem to want those:

df1 = df[ df$type %in% c("Expresssion", "DNase HS"), ]

Then select all Expression rows and the immediately preceding DNase HS rows:

keep.expr = df1$type == "Expresssion"
n = nrow(df1)
# TRUE where a DNase HS row is directly followed by an Expression row;
# the last row has no successor, hence the trailing FALSE
keep.DNase = c(df1$type[-1] == "Expresssion" & df1$type[-n] == "DNase HS", FALSE)

# This is the result you want
result = df1[keep.expr | keep.DNase, ]

Applied to your example:

   start.ens fc.trans        type end.ens peak end.grcm38 dpeak
1    9191942   0.9379 Expresssion      NA   NA         NA    NA
2    9191942   0.9741 Expresssion      NA   NA         NA    NA
3    9191942   0.9748 Expresssion      NA   NA         NA    NA
7   11113787       NA    DNase HS      NA   NA   11114262   279
8   11114744   0.9803 Expresssion      NA   NA         NA    NA
9   11114744   0.9904 Expresssion      NA   NA         NA    NA
12  11461513       NA    DNase HS      NA   NA   11462571   508
13  11462408   1.0129 Expresssion      NA   NA         NA    NA
14  11462408   1.0074 Expresssion      NA   NA         NA    NA
15  11489266   1.0019 Expresssion      NA   NA         NA    NA

I have to say though, the programming would be easier if you didn't spell expression with a triple s :)

HTH,

Peter

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
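
Peter's two steps (filter to the two types, then keep each DNase HS row that is directly followed by an Expression row) can be wrapped into one function. The following is only a sketch on made-up toy data: the name keep_with_preceding, its arguments, and the toy values are hypothetical and do not come from the thread.

```r
# Toy data mimicking the structure in the thread (values are made up).
toy <- data.frame(
  start = c(10L, 20L, 30L, 40L, 50L, 60L, 70L),
  type  = c("Expresssion", "DNase HS", "DNase HS", "p300",
            "DNase HS", "Expresssion", "Expresssion"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep all target rows plus the partner row that sits
# directly before a target row once all other row types are dropped.
keep_with_preceding <- function(d, target = "Expresssion",
                                partner = "DNase HS") {
  d <- d[d$type %in% c(target, partner), , drop = FALSE]
  if (nrow(d) == 0L) return(d)
  is_target  <- d$type == target
  is_partner <- d$type == partner
  # Shift the target flags up by one row; the last row has no successor.
  next_is_target <- c(is_target[-1], FALSE)
  d[is_target | (is_partner & next_is_target), , drop = FALSE]
}

res <- keep_with_preceding(toy)
print(res)
```

Because the other types are removed first, an intervening p300 row does not hide the DNase HS row before an Expression row, which is the "not necessarily row n-1" case from the question.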

Hello,

A bit confusing:
"I would like to extract all rows (so-called *defined rows*) with type==Expression - subset (df, type==Expression) - and the preceding type==DNase HS row (which is not necessarily row n-1, assuming that the defined row is n)"

In the dataset, the column "type" contains "Expresssion", not "Expression".

If you want to subset all the rows having "Expresssion" or "DNase HS":

res <- subset(df, type == "Expresssion" | type == "DNase HS")
head(res)
#  start.ens fc.trans        type end.ens peak end.grcm38 dpeak
#1   9191942   0.9379 Expresssion      NA   NA         NA    NA
#2   9191942   0.9741 Expresssion      NA   NA         NA    NA
#3   9191942   0.9748 Expresssion      NA   NA         NA    NA
#4   9195570       NA    DNase HS      NA   NA    9195792   109
#5   9579854       NA    DNase HS      NA   NA    9580110   131
#7  11113787       NA    DNase HS      NA   NA   11114262   279

If you don't want those rows:

subset(df, type != "Expresssion" & type != "DNase HS")
#  start.ens fc.trans type  end.ens peak end.grcm38 dpeak
#6  11088023       NA p300 11088523    7         NA    NA

A.K.

----- Original Message -----
From: Hermann Norpois <hnorpois at googlemail.com>
To: r-help at r-project.org
Cc:
Sent: Thursday, November 1, 2012 1:28 PM
Subject: [R] subset a defined row plus the aforegoing

Hello,

my data is sorted by start.ens (see below). I would now like to extract all rows (so-called *defined rows*) with type==Expression - subset (df, type==Expression) - and the preceding type==DNase HS row (which is not necessarily row n-1, assuming that the defined row is n). I don't know how to add this to my subset command.

Is that possible?

Thanks
Hermann
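
The subset() call above keeps every DNase HS row. If only the nearest preceding DNase HS row of each Expression row is wanted, one possible sketch works on row positions with cummax() and does not drop the other types first. It assumes the rows are already sorted and that type contains no NA; the toy data and variable names are made up for illustration.

```r
# Toy data (made up); row 3 is an unrelated type between DNase HS and Expresssion.
toy <- data.frame(
  start = c(10L, 20L, 30L, 40L, 50L),
  type  = c("DNase HS", "DNase HS", "p300", "Expresssion", "Expresssion"),
  stringsAsFactors = FALSE
)

is_expr  <- toy$type == "Expresssion"
is_dnase <- toy$type == "DNase HS"

# For every row: position of the most recent DNase HS row at or before it
# (0 if none has occurred yet). Assumes no NA in toy$type.
last_dnase <- cummax(ifelse(is_dnase, seq_len(nrow(toy)), 0L))

# Keep all Expresssion rows plus the DNase HS row each of them points to.
keep <- sort(unique(c(which(is_expr), last_dnase[is_expr])))
keep <- keep[keep > 0]   # drop the "no preceding DNase HS" marker
res2 <- toy[keep, ]
print(res2)
```

Here row 2 is kept as the nearest DNase HS row before rows 4 and 5, while row 1 (an earlier DNase HS row) and row 3 (p300) are dropped.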