thr3ads.net - R help - [R] Optimization problem: selecting independent rows to maximize the mean [Mar 2006]

If this information is useful, please help other people find it:
Share via:

Mark

2006-Mar-01 20:40 UTC

[R] Optimization problem: selecting independent rows to maximize the mean

Dear R community,

I have a dataframe with 500,000 rows and 102 columns. The rows
represent spatial polygons, some of which overlap others (i.e., not
all rows are independent of each other).

Given a particular row, the first column contains a unique "RowID".
The second column contains the "Variable" of interest. The remaining
100 columns ("Overlap1" ... "Overlap100") each contain a row
ID that
overlaps this row (but if this row overlaps fewer than 100 other rows
then the remainder of the columns "OL1...OL100" contain NA).

Here's the problem: I need to select the subset of 500 independent
rows that maximizes the mean and minimizes the stdev of "Variable".

Clearly this requires iterative selection and comparison of rows,
because each newly-selected row must be compared to rows already
selected to ensure it does not overlap them. At each step, a row
already selected might be removed from the subset if it can be
replaced with another that increases the mean and/or reduces the
stdev.

The above description is a simplification of my problem, but it's a start.

As I am new to R (and programming in general) I'm not sure how to
start thinking about this, or even where to look. I'd appreciate any
ideas that might help.

Thank you, Mark

Berton Gunter

2006-Mar-01 21:07 UTC

head link

[R] Optimization problem: selecting independent rows to maximizethe mean

This sounds either easy via a greedy algorithm or NP-hard. Moreover, it is
not clear to me that

1) A subset of 500 indpendent rows exists, where I presume
"independent"
means pairwise nonoverlapping;

2) That the mean and sd can be simultaneously optimized as you describe--
what if the subset with maximum mean also has bigger than minimal sd?


-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
 
 
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Mark
> Sent: Wednesday, March 01, 2006 12:40 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Optimization problem: selecting independent rows 
> to maximizethe mean
> 
> Dear R community,
> 
> I have a dataframe with 500,000 rows and 102 columns. The rows
> represent spatial polygons, some of which overlap others (i.e., not
> all rows are independent of each other).
> 
> Given a particular row, the first column contains a unique
"RowID".
> The second column contains the "Variable" of interest. The
remaining
> 100 columns ("Overlap1" ... "Overlap100") each contain
a row ID that
> overlaps this row (but if this row overlaps fewer than 100 other rows
> then the remainder of the columns "OL1...OL100" contain NA).
> 
> Here's the problem: I need to select the subset of 500 independent
> rows that maximizes the mean and minimizes the stdev of
"Variable".
> 
> Clearly this requires iterative selection and comparison of rows,
> because each newly-selected row must be compared to rows already
> selected to ensure it does not overlap them. At each step, a row
> already selected might be removed from the subset if it can be
> replaced with another that increases the mean and/or reduces the
> stdev.
> 
> The above description is a simplification of my problem, but 
> it's a start.
> 
> As I am new to R (and programming in general) I'm not sure how to
> start thinking about this, or even where to look. I'd appreciate any
> ideas that might help.
> 
> Thank you, Mark
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
>

Gabor Grothendieck

2006-Mar-01 21:32 UTC

head link

[R] Optimization problem: selecting independent rows to maximize the mean

Package lpSolve might help.

On 3/1/06, Mark <mtb954 at gmail.com> wrote:> Dear R community,
>
> I have a dataframe with 500,000 rows and 102 columns. The rows
> represent spatial polygons, some of which overlap others (i.e., not
> all rows are independent of each other).
>
> Given a particular row, the first column contains a unique
"RowID".
> The second column contains the "Variable" of interest. The
remaining
> 100 columns ("Overlap1" ... "Overlap100") each contain
a row ID that
> overlaps this row (but if this row overlaps fewer than 100 other rows
> then the remainder of the columns "OL1...OL100" contain NA).
>
> Here's the problem: I need to select the subset of 500 independent
> rows that maximizes the mean and minimizes the stdev of
"Variable".
>
> Clearly this requires iterative selection and comparison of rows,
> because each newly-selected row must be compared to rows already
> selected to ensure it does not overlap them. At each step, a row
> already selected might be removed from the subset if it can be
> replaced with another that increases the mean and/or reduces the
> stdev.
>
> The above description is a simplification of my problem, but it's a
start.
>
> As I am new to R (and programming in general) I'm not sure how to
> start thinking about this, or even where to look. I'd appreciate any
> ideas that might help.
>
> Thank you, Mark
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
>

nojhan

2006-Mar-02 08:28 UTC

head link

[R] Optimization problem: selecting independent rows to maximizethe mean

Le Wed, 01 Mar 2006 13:07:07 -0800, Berton Gunter a
?crit?:> 2) That the mean and sd can be simultaneously optimized as you describe--
> what if the subset with maximum mean also has bigger than minimal sd?
Then you have two choices :
1) balance the two objectives with weights, according to the importance
you give to each one
2) get a list of non-dominated solutions (a "Pareto front")

Does R have packages for such multi-objectives optimization problems ?

Moreover, does it have a package for "difficult" (i.e. NP-hard)
problems ?

-- 
nojhan

Reasonably Related Threads

Search for more maybe matching threads

R help - Mar 2006 - Optimization problem: selecting independent rows to maximize the mean

[R] Optimization problem: selecting independent rows to maximize the mean

[R] Optimization problem: selecting independent rows to maximizethe mean

[R] Optimization problem: selecting independent rows to maximize the mean

[R] Optimization problem: selecting independent rows to maximizethe mean

Reasonably Related Threads