Hi Ben,
This question apparently has nothing to do with R and is therefore
off-topic for this list. You should post this question on a statistics
forum, or seek local help.
Best,
Ista
On Mon, Aug 26, 2013 at 1:50 AM, Ben Harrison
<harb@student.unimelb.edu.au> wrote:
> Hello, I am quite a novice at predictive modelling, so I would like to see
> where my particular problem lies in the spectrum of problems that you have
> collectively seen in your experience.
>
> Background: I have been handed a piece of software that uses a Kohonen
> self-organising map (SOM) to analyse and predict data in which missing
> values are common, but I want to compare its results with other forms of
> modelling and prediction (e.g. multi-layer perceptrons, perhaps random
> forests?).
>
> My data is a conglomeration of borehole data from hundreds of boreholes.
> Some measurements were made during drilling (more or less continuous
> 'tool responses': geophysical well-logs), and some were made in the
> laboratory on discrete samples at scales from 10 cm up to a metre.
>
> The data could be considered ordered series to some extent, though changes
> in rock type with depth can cause 'step' changes in the tool responses.
>
> My problem is not classifying the rocks, but modelling and predicting a
> physical attribute of the rocks: thermal conductivity, which is a lab
> measurement and is scarce and expensive to obtain. I want to use the more
> common well-log responses to predict this attribute.
>
> Different boreholes have different sets of well-log data, though. For
> example, one might have measurements from the A and B tools, another from
> the A, B and C tools, and a third from the B and C tools. I can construct a
> decent database of about 70,000 observations of a common set of 5 tool
> responses, with about 100 associated measurements of thermal conductivity.
> I am mostly confident that the relationship between the well-log responses
> and thermal conductivity is non-linear; linear regression has not proven
> accurate.
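A minimal sketch of the 'common tool set' construction described above, in Python (the thread itself contains no code). It assumes, hypothetically, that each borehole's tool inventory is a set of tool names and each observation is a dict mapping tool name to a reading, with None for a missing reading. The names A, B and C are the placeholders from the example, not real logging tools, and the readings are made up purely for illustration. It shows why the common set shrinks as boreholes are added: here only B is shared by all three.

```python
from functools import reduce

# Hypothetical tool inventories, using the placeholder tool names from the example.
inventories = {
    "bh1": {"A", "B"},
    "bh2": {"A", "B", "C"},
    "bh3": {"B", "C"},
}


def common_tools(inventories):
    """Return the set of tools that were run in every borehole."""
    return reduce(set.intersection, inventories.values())


def complete_rows(rows, tools):
    """Keep only observations with a non-missing reading for every tool in `tools`.

    Each returned row contains just those tools, so the result is a dense table.
    """
    return [
        {t: row[t] for t in sorted(tools)}
        for row in rows
        if all(row.get(t) is not None for t in tools)
    ]


# Hypothetical illustrative observations; None marks a missing reading.
rows = [
    {"A": 1.0, "B": 2.0},
    {"A": None, "B": 3.0},
    {"B": 4.0, "C": 5.0},
]

print(common_tools(inventories))        # {'B'}
print(complete_rows(rows, {"A", "B"}))  # [{'A': 1.0, 'B': 2.0}]
```

Requiring more tools in the common set leaves fewer complete rows, which is the trade-off behind the question about using the less common tool responses.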
>
> What 'sort' of problem is this?
>
> Have you seen problems like this, and what did you use to solve it?
>
> I have papers by people using other ANN-type techniques (MLPs in
> particular) to model and predict thermal conductivity, but wondered whether
> there was something else I could try.
>
> Some other questions I would like a little guidance on:
> Are 100 samples of the 'target' attribute enough for confident modelling
> and prediction?
> How would I quantify the uncertainty of the modelling results?
> The well-log data are extensive, but if I look at the complete set of tool
> responses, there is a LOT of missing data (because there is no common tool
> set). Is there a way I can still use the less common tool responses?
> Is discretising the 100 measured thermal conductivities a silly idea? If
> not, how many 'bins' can I construct?
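If the measured conductivities were discretised, one common option is equal-frequency (quantile) bins, so that each class holds roughly the same number of the ~100 samples; the number of bins k then directly fixes how many samples each class gets (about 100/k). The Python sketch below illustrates that option only, not a recommendation that discretising is the right choice; the function name is hypothetical and the input is a synthetic stand-in, not measured data.

```python
from collections import Counter


def equal_frequency_bins(values, k):
    """Label each value with a bin index 0..k-1 so that bins hold (nearly) equal counts.

    Values are ranked; ties are broken by their original order, so equal values
    can land in adjacent bins.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * k // n  # ranks 0..n-1 map onto bins 0..k-1
    return labels


# Synthetic stand-in for 100 measurements: each of k = 4 bins gets 100 // 4 samples.
labels = equal_frequency_bins(list(range(100)), 4)
print(sorted(Counter(labels).items()))  # [(0, 25), (1, 25), (2, 25), (3, 25)]
```

Binning discards the ordering of values within each bin, so with only ~100 targets, every extra bin trades resolution for fewer examples per class.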
>
> Thanks for reading!
> Ben.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>