thr3ads.net - R help - [R] subsetting like in SAS [Jan 2005]


Denis Chabot

2005-Jan-13 10:52 UTC

[R] subsetting like in SAS

Hi,

Being in the process of translating some of my SAS programs to R, I 
encountered one difficulty. I have a solution, but it is not elegant 
(and not pleasant to implement).

I have a large dataset with many variables needed to identify the 
origin of a sample, many to describe sample characteristics, others to 
describe site characteristics.

I want only a (shorter) list of sites and their characteristics.

If "origin", "ship_cat", "ship_nb", "trip" and "set" are needed to
identify a site, in SAS you'd sort on those variables, then read the
data with:

data sites;
	set alldata;
	by origin ship_cat ship_nb trip set;
	if first.set;
	keep list-of-variables-detailing-sites;
run;
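A minimal base-R sketch of what this SAS step does, using a small hypothetical data frame (the `weight` and `depth` columns, and the `keys` and `site_vars` vectors, are made up for illustration). It sorts by the key variables with order(), then uses !duplicated() on the key columns to flag the first row of each key combination, which plays the role of `first.set`:

```r
# Hypothetical toy data: two samples from set 1 and one from set 2
alldata <- data.frame(
  origin   = c("A", "A", "A"),
  ship_cat = c(1, 1, 1),
  ship_nb  = c(10, 10, 10),
  trip     = c(1, 1, 1),
  set      = c(2, 1, 1),
  weight   = c(5.1, 3.2, 4.7),  # hypothetical sample-level variable
  depth    = c(120, 85, 85)     # hypothetical site-level variable
)
keys <- c("origin", "ship_cat", "ship_nb", "trip", "set")
site_vars <- c(keys, "depth")   # hypothetical stand-in for the keep list

# Sort step: order rows by all five key variables
sorted <- alldata[do.call(order, alldata[keys]), ]

# TRUE on the first row of each key combination (like first.set)
first_set <- !duplicated(sorted[keys])

# Keep those rows and only the site-level columns (like keep)
sites <- sorted[first_set, site_vars]
print(sites)
```

order() leaves ties in their original order, so the row kept for each site is the first one it had in the unsorted data.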

In R I did this with the Lag function of Hmisc, and the original data 
set also needs to be sorted first:

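library(Hmisc)   # for Lag()
# alldata must already be sorted by origin, ship_cat, ship_nb, trip, set
same <- with(alldata,
             origin == Lag(origin) & ship_cat == Lag(ship_cat) &
             ship_nb == Lag(ship_nb) & trip == Lag(trip) &
             set == Lag(set))
# Comparisons against a missing lag (e.g. row 1) give NA, which subset()
# would drop; treat them as the start of a new site instead
same[is.na(same)] <- FALSE
sites <- subset(alldata, !same,
                select=c(list-of-variables-detailing-sites))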
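The same lag comparison can also be sketched in base R without Hmisc, by comparing each key column with itself shifted down one row. The toy data and the `depth` column below are hypothetical, and the data are assumed to be sorted by the keys already, as in the Lag() version:

```r
# Hypothetical toy data, already sorted by the key variables
alldata <- data.frame(
  origin   = c("A", "A", "A", "B"),
  ship_cat = c(1, 1, 1, 1),
  ship_nb  = c(10, 10, 10, 10),
  trip     = c(1, 1, 1, 1),
  set      = c(1, 1, 2, 2),
  depth    = c(85, 85, 120, 60)  # hypothetical site-level variable
)
keys <- c("origin", "ship_cat", "ship_nb", "trip", "set")

# For each key column, TRUE where a row differs from the row before it;
# OR-ing the columns together flags rows where any key changed
changed <- Reduce(`|`, lapply(alldata[keys],
                              function(x) x[-1] != x[-length(x)]))

# Row 1 has no predecessor, so it always starts a new site
first_set <- c(TRUE, changed)

sites <- alldata[first_set, c(keys, "depth")]
print(sites)
```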

Could I do better than this?

Thanks in advance,

Denis Chabot

Petr Pikal

2005-Jan-13 13:23 UTC


Hi Denis

maybe unique() can pick out the unique entries from your data set
without any need for sorting.
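A minimal sketch of this suggestion on hypothetical toy data (`weight` stands in for a sample-level variable and `depth` for a site-level one). unique() only collapses rows that are identical in every column, so the sample-level columns have to be dropped first, and the site-level columns must be constant within each site:

```r
# Hypothetical toy data, deliberately not sorted
alldata <- data.frame(
  origin   = c("A", "B", "A"),
  ship_cat = c(1, 1, 1),
  ship_nb  = c(10, 10, 10),
  trip     = c(1, 1, 1),
  set      = c(1, 1, 1),
  weight   = c(5.1, 3.2, 4.7),  # sample-level: differs within a site
  depth    = c(85, 60, 85)      # site-level: constant within a site
)
site_vars <- c("origin", "ship_cat", "ship_nb", "trip", "set", "depth")

# With every column kept, each row is still unique (weight differs)
n_all <- nrow(unique(alldata))

# Dropping the sample-level columns first leaves one row per site,
# keeping the first occurrence of each
sites <- unique(alldata[site_vars])
print(sites)
```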

Cheers
Petr

Petr Pikal
petr.pikal at precheza.cz

Denis Chabot

2005-Jan-17 21:01 UTC


I want to thank Petr Pikal, Robert Balshaw and Na Li for suggesting the
use of unique() or !duplicated() on a subset of my data from which the
unwanted variables had been removed. This worked perfectly.

Denis Chabot