thr3ads.net - R help - [R] subsets [Jan 2011]

Den

2011-Jan-20 08:53 UTC

[R] subsets

Dear R people,
Could you please help?

Basically, there are two variables in my data set. Each patient ('id')
may have one or more diseases ('diagnosis'). It looks like this:

id	diagnosis
1	ah
2	ah
2	ihd
2	im
3	ah
3	stroke
4	ah
4	ihd
4	angina
5	ihd
..............
Q: How can I make three data sets:
	1. Patients with ah and ihd
	2. Patients with ah but no ihd
	3. Patients with ihd but no ah?

If you have any ideas, could you just point me to what I should look for? Is it
subset, aggregate, loops, or something else? I am a bit lost. (F1
F1 F1 !!!:)
Thank you

Ivan Calandra

2011-Jan-20 09:10 UTC

[R] subsets

Hi!

I think you should read the intro to R, as well as ?"[" and ?subset. It
should help you to understand.

Let's say your data is in a data.frame called df:
# 1. ah and ihd
# The "|" is the boolean OR (you want one OR the other). Note the last comma.
df_ah_ihd <- df[df$diagnosis=="ah" | df$diagnosis=="ihd", ]

#2. ah
df_ah <- df[df$diagnosis=="ah", ]

#3. ihd
df_ihd <- df[df$diagnosis=="ihd", ]

You could do the same using subset() if you feel better with this function.
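
For reference, here is a minimal sketch of the same three filters written with subset(). The data frame df is rebuilt from the ten rows shown in the question, which is an assumption since the real data set is longer. Note that these filters work row by row, so they keep matching rows rather than whole patients.

```r
# Example data rebuilt from the ten rows in the question (assumed; the real data is longer)
df <- data.frame(
  id = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                "ah", "ihd", "angina", "ihd"),
  stringsAsFactors = FALSE
)

# Rows whose diagnosis is either "ah" or "ihd"
df_ah_ihd <- subset(df, diagnosis == "ah" | diagnosis == "ihd")

# Rows whose diagnosis is "ah"
df_ah <- subset(df, diagnosis == "ah")

# Rows whose diagnosis is "ihd"
df_ihd <- subset(df, diagnosis == "ihd")
```

Inside subset() the column names can be used directly, without the df$ prefix.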

HTH,
Ivan

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

-- 
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra at uni-hamburg.de

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php

Taras Zakharko

2011-Jan-20 10:05 UTC

[R] subsets

Hello Den,

Your problem is not as simple as it may seem, so Ivan's suggestion is only a
partial answer. I see that each patient can have more than one diagnosis, and I
take it that you want to isolate patients based on particular conditions.
Thus, simply looking for "ah" or "ihd", as Ivan suggests, will yield the rows of
patients who have either of those, but not necessarily patients who have both.

Instead, one must apply the condition to the whole set of diagnoses
associated with each patient. I think that this is best done with the aggregate
function. This function splits the data according to some factor (in our case,
the patient id) and performs a routine on each subset (in our case, a
condition test):

# Use one of the three calls, depending on the group you want:
# 1. ah and ihd
ids <- aggregate(diagnosis ~ id, df, function(x) "ah" %in% x && "ihd" %in% x)
# 2. ah but no ihd
ids <- aggregate(diagnosis ~ id, df, function(x) "ah" %in% x && !"ihd" %in% x)
# 3. ihd but no ah
ids <- aggregate(diagnosis ~ id, df, function(x) !"ah" %in% x && "ihd" %in% x)

Now, ids will contain a data frame like this (shown for the second call,
ah but no ihd):

id	diagnosis
1	TRUE
2	FALSE
3	TRUE
...

which shows which patients have the set of diagnoses you asked for. You can then
use these ids to filter the original data with something like:

subset(df, id %in% subset(ids, diagnosis == TRUE)$id)

This first extracts from the 'ids' data frame only the patients for which the
condition holds, and then extracts their complete sets of diagnoses from the
original 'df' data frame.
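
The two steps above can be put together as follows. This is only a sketch: the helper name patients_where is hypothetical, and df is rebuilt from the ten rows shown in the question rather than taken from the real data.

```r
# Example data rebuilt from the ten rows in the question (assumed; the real data is longer)
df <- data.frame(
  id = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                "ah", "ihd", "angina", "ihd"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return all rows of the patients whose complete
# set of diagnoses satisfies the test function 'cond'
patients_where <- function(data, cond) {
  # One logical value per patient id
  ids <- aggregate(diagnosis ~ id, data, cond)
  # Keep every row that belongs to a patient flagged TRUE
  data[data$id %in% ids$id[ids$diagnosis], ]
}

both     <- patients_where(df, function(x) "ah" %in% x && "ihd" %in% x)
ah_only  <- patients_where(df, function(x) "ah" %in% x && !("ihd" %in% x))
ihd_only <- patients_where(df, function(x) !("ah" %in% x) && "ihd" %in% x)
```

Each result keeps every diagnosis of the selected patients, so a patient in 'both' still shows rows such as "im" or "angina".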

Hope it helps,

Taras

Henrique Dallazuanna

2011-Jan-20 11:42 UTC

[R] subsets

Try this:

lapply(list(c('ah', 'ihd'), 'ah', 'ihd'),
       function(x) subset(aDF, diagnosis %in% x))
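
A note on comparing a column against more than one value: == recycles the shorter vector along the column, whereas %in% tests each row for membership. The sketch below illustrates the difference on data rebuilt from the ten rows in the question (an assumption; the names by_equal and by_in are made up for the example).

```r
# Example data rebuilt from the ten rows in the question (assumed; the real data is longer)
df <- data.frame(
  id = c(1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                "ah", "ihd", "angina", "ihd"),
  stringsAsFactors = FALSE
)

x <- c("ah", "ihd")

# '==' recycles x: row 1 is compared with "ah", row 2 with "ihd", row 3 with "ah", ...
by_equal <- df$diagnosis == x

# '%in%' checks, for every row, whether its diagnosis is one of the values in x
by_in <- df$diagnosis %in% x

# The two selections do not agree
identical(by_equal, by_in)
```

With == the outcome depends on the order of the rows, so %in% is the safer choice whenever x may hold more than one value.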




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

Petr Savicky

2011-Jan-20 13:29 UTC

[R] subsets

This may be understood as a two-step procedure:
1. Split the ids into disjoint groups according to the above criteria.
2. Split the data cases into the groups from step 1.

If this is what you want, then the function table() may be used to
collect information on each id.

  df <- structure(list(id = c(1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 5L),
      diagnosis = structure(c(1L, 1L, 3L, 4L, 1L, 5L, 1L, 3L, 2L, 3L),
      .Label = c("ah", "angina", "ihd", "im", "stroke"), class = "factor")),
      .Names = c("id", "diagnosis"), class = "data.frame", row.names = c(NA, -10L))

  # One row per id, one column per diagnosis; the cells are counts
  tab <- table(df$id, df$diagnosis)

Then, for example, the data cases for "2. Patients with ah but no ihd"
may be obtained as follows:

  sel <- tab[, "ah"] != 0 & tab[, "ihd"] == 0
  ah.noihd <- dimnames(tab)[[1]][sel] # [1] "1" "3"
  df[df$id %in% ah.noihd, ]
  #   id diagnosis
  # 1  1        ah
  # 5  3        ah
  # 6  3    stroke
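
The same table can serve all three groups at once. The following is a sketch under the assumption that df holds the ten rows shown in the question; the names has_ah, has_ihd, groups and datasets are chosen only for this illustration.

```r
# Example data rebuilt from the ten rows in the question (assumed; the real data is longer)
df <- data.frame(
  id = c(1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 5L),
  diagnosis = c("ah", "ah", "ihd", "im", "ah", "stroke",
                "ah", "ihd", "angina", "ihd"),
  stringsAsFactors = FALSE
)

# Counts of each diagnosis per patient id
tab <- table(df$id, df$diagnosis)

# Step 1: one logical flag per id for each of the two diagnoses of interest
has_ah  <- tab[, "ah"]  > 0
has_ihd <- tab[, "ihd"] > 0
ids <- rownames(tab)

groups <- list(
  ah_and_ihd = ids[has_ah & has_ihd],
  ah_no_ihd  = ids[has_ah & !has_ihd],
  ihd_no_ah  = ids[!has_ah & has_ihd]
)

# Step 2: split the data cases according to the groups of ids
datasets <- lapply(groups, function(g) df[df$id %in% g, ])
```

The three groups are disjoint by construction; patients with neither ah nor ihd fall into none of them.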

I hope this helps.

Petr Savicky.
