thr3ads.net - R help - [R] subset a defined row plus the aforegoing [Nov 2012]

Hermann Norpois

2012-Nov-01 17:28 UTC

[R] subset a defined row plus the aforegoing

Hello,

my data is sorted by start.ens (see below). Now I would like to extract
all rows with type==Expression (the so-called *defined rows*), as in
subset(df, type==Expression), together with the preceding type==DNase HS
row (which is not necessarily row n-1, assuming that the defined row is n).
I don't know how to add this to my subset command.

Is that possible?
Thanks Hermann
> df
   start.ens fc.trans        type  end.ens peak end.grcm38 dpeak
1    9191942   0.9379 Expresssion       NA   NA         NA    NA
2    9191942   0.9741 Expresssion       NA   NA         NA    NA
3    9191942   0.9748 Expresssion       NA   NA         NA    NA
4    9195570       NA    DNase HS       NA   NA    9195792   109
5    9579854       NA    DNase HS       NA   NA    9580110   131
6   11088023       NA        p300 11088523    7         NA    NA
7   11113787       NA    DNase HS       NA   NA   11114262   279
8   11114744   0.9803 Expresssion       NA   NA         NA    NA
9   11114744   0.9904 Expresssion       NA   NA         NA    NA
10  11114850       NA    DNase HS       NA   NA   11115400   210
11  11455056       NA    DNase HS       NA   NA   11455381   175
12  11461513       NA    DNase HS       NA   NA   11462571   508
13  11462408   1.0129 Expresssion       NA   NA         NA    NA
14  11462408   1.0074 Expresssion       NA   NA         NA    NA
15  11489266   1.0019 Expresssion       NA   NA         NA    NA

My (test)data:
> dput(df)
structure(list(start.ens = c(9191942L, 9191942L, 9191942L, 9195570L,
9579854L, 11088023L, 11113787L, 11114744L, 11114744L, 11114850L,
11455056L, 11461513L, 11462408L, 11462408L, 11489266L), fc.trans = c(0.9379,
0.9741, 0.9748, NA, NA, NA, NA, 0.9803, 0.9904, NA, NA, NA, 1.0129,
1.0074, 1.0019), type = structure(c(2L, 2L, 2L, 1L, 1L, 3L, 1L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("DNase HS", "Expresssion",
"p300"), class = "factor"), end.ens = c(NA, NA, NA, NA, NA, 11088523L,
NA, NA, NA, NA, NA, NA, NA, NA, NA), peak = c(NA, NA, NA, NA,
NA, 7L, NA, NA, NA, NA, NA, NA, NA, NA, NA), end.grcm38 = c(NA,
NA, NA, 9195792L, 9580110L, NA, 11114262L, NA, NA, 11115400L,
11455381L, 11462571L, NA, NA, NA), dpeak = c(NA, NA, NA, 109L,
131L, NA, 279L, NA, NA, 210L, 175L, 508L, NA, NA, NA)), .Names = c("start.ens",
"fc.trans", "type", "end.ens", "peak", "end.grcm38", "dpeak"), row.names = c(NA,
-15L), class = "data.frame")

Peter Langfelder

2012-Nov-01 17:48 UTC

On Thu, Nov 1, 2012 at 10:28 AM, Hermann Norpois
<hnorpois at googlemail.com> wrote:
> Hello,
>
> my data is sorted by start.ens (see below). Now I would like to extract
> all rows with type==Expression (the so-called *defined rows*), as in
> subset(df, type==Expression), together with the preceding type==DNase HS
> row (which is not necessarily row n-1, assuming that the defined row is n).
> I don't know how to add this to my subset command.
>
> Is that possible?

With enough money and manpower, everything is possible.

This one is possible even without a whole lot of manpower or money :)
First, get rid of all rows that are neither Expression nor DNase HS,
since you don't seem to want those:

df1 = df[df$type %in% c("Expresssion", "DNase HS"), ]

# Then select all Expression rows and the immediately preceding DNase HS rows:

keep.expr = df1$type == "Expresssion"
n = nrow(df1)
# Row i is kept if it is DNase HS and row i+1 is Expression;
# the last row has no successor, hence the trailing FALSE.
keep.DNase = c(df1$type[-1] == "Expresssion" &
               df1$type[-n] == "DNase HS", FALSE)

# This is the result you want
result = df1[keep.expr | keep.DNase, ]

# Applied to your example:
   start.ens fc.trans        type end.ens peak end.grcm38 dpeak
1    9191942   0.9379 Expresssion      NA   NA         NA    NA
2    9191942   0.9741 Expresssion      NA   NA         NA    NA
3    9191942   0.9748 Expresssion      NA   NA         NA    NA
7   11113787       NA    DNase HS      NA   NA   11114262   279
8   11114744   0.9803 Expresssion      NA   NA         NA    NA
9   11114744   0.9904 Expresssion      NA   NA         NA    NA
12  11461513       NA    DNase HS      NA   NA   11462571   508
13  11462408   1.0129 Expresssion      NA   NA         NA    NA
14  11462408   1.0074 Expresssion      NA   NA         NA    NA
15  11489266   1.0019 Expresssion      NA   NA         NA    NA
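
The same selection can also be sketched without dropping the other rows first, by looking up, for each Expression row, the last DNase HS row above it. This is only an illustrative alternative, not part of the code above: the names expr.idx, dnase.idx, pos, prev.dnase and result2 are made up here, and the data frame is a reduced copy of the example data with two columns. On this example it should pick the same rows as the result shown above.

```r
# Reduced copy of the example data: only the two columns needed here.
df <- data.frame(
  start.ens = c(9191942L, 9191942L, 9191942L, 9195570L, 9579854L,
                11088023L, 11113787L, 11114744L, 11114744L, 11114850L,
                11455056L, 11461513L, 11462408L, 11462408L, 11489266L),
  type = factor(c("Expresssion", "Expresssion", "Expresssion", "DNase HS",
                  "DNase HS", "p300", "DNase HS", "Expresssion",
                  "Expresssion", "DNase HS", "DNase HS", "DNase HS",
                  "Expresssion", "Expresssion", "Expresssion"))
)

expr.idx  <- which(df$type == "Expresssion")   # positions of the defined rows
dnase.idx <- which(df$type == "DNase HS")      # positions of candidate rows

# For each Expression row, findInterval() gives the index (into dnase.idx)
# of the last DNase HS row above it; 0 means there is none.
pos <- findInterval(expr.idx, dnase.idx)
prev.dnase <- unique(dnase.idx[pos[pos > 0]])

# Expression rows plus their preceding DNase HS rows, in original order.
result2 <- df[sort(c(expr.idx, prev.dnase)), ]
rownames(result2)
```

Because the lookup works on row positions in the full data frame, rows of other types (such as p300) lying between a DNase HS row and the next Expression row do not need to be removed beforehand.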


I have to say though, the programming would be easier if you didn't
spell expression with a triple s :)
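
If the misspelled label gets in the way, one possible fix is to rename the factor level once, up front. A minimal sketch on a made-up factor (the vector below is hypothetical, not the poster's data); after such a rename, the comparisons in the code above would have to use "Expression" instead:

```r
# Hypothetical stand-in for df$type
type <- factor(c("Expresssion", "DNase HS", "p300", "Expresssion"))

# Rename the misspelled level in place; the underlying codes are untouched.
levels(type)[levels(type) == "Expresssion"] <- "Expression"

table(type)
```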

HTH,

Peter

arun

2012-Nov-01 17:50 UTC

Hello,

A bit confusing:
"I would like to extract
all rows (so called *defined rows*) with type==Expression - subset (df,
type==Expression) - and the aforegoing type==DNase HS (which is not
necessarily row n-1 - assuming that the defined row is n)"


In the dataset, the value in column "type" is spelled "Expresssion". If
you want to subset all the rows having "Expresssion" or
"DNase HS":

res <- subset(df, type == "Expresssion" | type == "DNase HS")
head(res)
#  start.ens fc.trans        type end.ens peak end.grcm38 dpeak
#1   9191942   0.9379 Expresssion      NA   NA         NA    NA
#2   9191942   0.9741 Expresssion      NA   NA         NA    NA
#3   9191942   0.9748 Expresssion      NA   NA         NA    NA
#4   9195570       NA    DNase HS      NA   NA    9195792   109
#5   9579854       NA    DNase HS      NA   NA    9580110   131
#7  11113787       NA    DNase HS      NA   NA   11114262   279


If you don't want those rows:
subset(df, type != "Expresssion" & type != "DNase HS")
#  start.ens fc.trans type  end.ens peak end.grcm38 dpeak
#6  11088023       NA p300 11088523    7         NA    NA
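
Both subsets can also be written with %in%, which stays readable when more than two values are involved. A small sketch on made-up data; toy, keep, wanted and unwanted are illustrative names, not from the thread:

```r
# Hypothetical toy data standing in for df
toy <- data.frame(id = 1:5,
                  type = c("Expresssion", "DNase HS", "p300",
                           "DNase HS", "Expresssion"))
keep <- c("Expresssion", "DNase HS")

wanted   <- subset(toy, type %in% keep)      # rows whose type is in 'keep'
unwanted <- subset(toy, !(type %in% keep))   # all remaining rows
```

One difference to be aware of: if type contained NA, the negated %in% form would keep those rows, whereas the != comparisons would yield NA and subset() would drop them.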
A.K.




