thr3ads.net - R help - [R] Analogues to my data and prediction problem [Aug 2013]


Ben Harrison

2013-Aug-26 05:50 UTC

[R] Analogues to my data and prediction problem

Hello, I am quite a novice when it comes to predictive modelling, so I
would like to see where my particular problem lies in the spectrum of
problems that you have collectively seen in your experience.

Background: I have been handed a piece of software that uses a Kohonen
SOM network to analyse and predict data in which missing values are
common, but I want to compare its results with other forms of modelling
and prediction (e.g. multi-layer perceptrons, random forests?).

My data is a conglomeration of borehole data from hundreds of boreholes. 
Some measurements were made during the drilling of the boreholes (more 
or less continuous 'tool responses': geophysical well-logs), and some in
the laboratory on discrete samples of 10 cm up to metre-length scales.

The data could be considered ordered series to some extent, though 
changes in rock types with depth can result in 'step' changes in tool 
responses.

My problem is not classifying the rocks, but modelling and predicting a 
physical attribute of the rocks---thermal conductivity, which is a lab 
measurement, and hard to come by / expensive. I want to use the more 
common well-log responses to predict this attribute.

Some boreholes have different sets of well-log data though. For example,
one might have measurements from the A and B tools, while another might
have the A, B, and C tools, and a third the B and C tools. I can
construct a decent database of about 70,000 observations of a common set
of 5 tool responses, which have about 100 measurements of thermal
conductivity associated with them. I am fairly confident that the
relationship between the well-log responses and thermal conductivity is
non-linear; linear regression has not proven accurate.
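
To make the data layout described above concrete, here is a minimal sketch in Python (chosen purely for illustration). The tool names A, B and C follow the example in the text; the borehole labels, depths, values and the helper functions `complete_cases` and `coverage` are all hypothetical. It shows how differing tool sets lead to missing values, and how restricting to a common tool set trades columns for rows.

```python
# Hypothetical records: every value below is made up for illustration.
# None marks a tool that was not run in that borehole.
records = [
    {"borehole": "BH1", "depth_m": 100.0, "A": 2.1, "B": 55.0, "C": None},
    {"borehole": "BH2", "depth_m": 250.0, "A": 2.4, "B": 61.0, "C": 0.18},
    {"borehole": "BH3", "depth_m": 80.0, "A": None, "B": 47.0, "C": 0.25},
]


def complete_cases(rows, tools):
    """Keep only the rows in which every tool in `tools` has a value."""
    return [r for r in rows if all(r.get(t) is not None for t in tools)]


def coverage(rows, tools):
    """Fraction of rows that have a measurement, per tool."""
    return {t: sum(r.get(t) is not None for r in rows) / len(rows) for t in tools}


common_ab = complete_cases(records, ["A", "B"])       # smaller tool set, more rows
all_three = complete_cases(records, ["A", "B", "C"])  # larger tool set, fewer rows

print(len(common_ab), len(all_three))  # prints "2 1"
print(coverage(records, ["A", "B", "C"]))
```

The same trade-off applies at full scale: a smaller common tool set keeps more observations, while a larger one discards every borehole that lacks any of its tools.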

What 'sort' of problem is this?

Have you seen problems like this, and what did you use to solve it?

I have papers by people using other ANN type techniques (MLP in 
particular) to model and predict thermal conductivity, but wondered if 
there was something else I could try.

Some other questions I would like a little guidance on:

Are 100 samples of the 'target' attribute enough for confident modelling
and prediction?

How would I quantify the certainty of the modelling results?

The well-log data is extensive, but if I look at the complete set of
tool responses, there is a LOT of missing data (because there is no
common tool set). Is there a way I can still use the less common tool
responses?

Is discretisation of the 100 measured thermal conductivities a silly
idea? How many 'bins' can I construct?
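
On the question of quantifying certainty with only about 100 labelled samples, one common option (not something proposed in this thread) is leave-one-out cross-validation: predict each labelled sample from all the others and summarise the errors. The sketch below is in Python for illustration only. It uses a simple k-nearest-neighbour regressor as a stand-in for whatever non-linear model is eventually chosen; the function names and the synthetic data are assumptions, not real well-log values.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points nearest to `query`."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def loo_rmse(xs, ys, k=3):
    """Leave-one-out RMSE: predict each sample from all the other samples."""
    sq_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        pred = knn_predict(train_x, train_y, xs[i], k)
        sq_errors.append((pred - ys[i]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Synthetic stand-in for ~100 labelled samples with two predictors and a
# non-linear target. These are not real measurements.
xs = [(i / 10.0, (i % 7) / 7.0) for i in range(100)]
ys = [1.5 + math.sin(x[0]) + 0.5 * x[1] ** 2 for x in xs]

print(loo_rmse(xs, ys, k=3))
```

The resulting RMSE is in the units of the target, so it can be compared directly between candidate models. Because neighbouring samples from the same borehole are likely to be correlated, leaving out whole boreholes at a time may give a more honest estimate than leaving out single samples.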

Thanks for reading!
Ben.

Ista Zahn

2013-Aug-26 13:02 UTC


[R] Analogues to my data and prediction problem

Hi Ben,

This question apparently has nothing to do with R and is therefore
off-topic for this list. You should post this question on a statistics
forum, or seek local help.

Best,
Ista


