Good morning,

My compliments to all. Since I'm a newbie in R, I was wondering if you
could help me to achieve a small project that I think is possible with R
(I can't seem to find a similar tool elsewhere).

I have a data file with about 2000 value lines, organized like this:

x;y;z;j;
...

I want to find different correlations (linear regression with
Levenberg-Marquardt or least squares) between the x values and a y or z
pair; for instance, between x and y.

So, what I'm trying to do is:

1) Load the file (is there a limit on the load size? If yes, can I load
   it in sequence, by parts?)
2) Define 100 sets of 20 values each (also in sequence, from x1 to xn:
   first from x1 to x20, next from x21 to x40, etc.), or process one set
   at a time in case of file limits in 1)
3) Define a fitting function
4) Use the same function model to find the best fit for each set
5) Save the coefficients of those fits in a file

Can this be done accurately with R? It would save me a lot of
programming. The files will soon have about 1 million lines, which is a
lot to process.

I would appreciate it very much if someone could help me.

Kind regards,

Kepler
Dear Kepler,

Yes, R can do all of this. But this list is here to help you when you get
stuck, not to do all the work for you... You are asking basic stuff, so
any introductory book on R should contain enough information to get you
going. So please do read one of those first.

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician
Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of. ~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data. ~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data. ~ John Tukey
Pointers inline below:

> > 1) Load the file (is there a limit on the load size? If yes, can I
> > load it in sequence, by parts?)

See ?read.table and note that you can define a separator: read.table()
with sep=";" should work (a sketch follows below). Load limits are set by
memory size; I have read 800,000 lines on a 4 GB system.

> > 2) Define 100 sets of 20 values each (also in sequence, from x1 to
> > xn: first from x1 to x20, next from x21 to x40, etc.), or process one
> > set at a time in case of file limits in 1)

You can say something like mydata[i:(i + 19), ] to get row-wise slices of
your data, but an R user would perhaps consider setting up an ancillary
variable with mydata$chunks <- gl(100, 20) and using a variant of
aggregate() or ddply() to apply a function to each subset (second sketch
below).

> > 3) Define a fitting function

Er... anything you can write, either as an expression or a function.

> > 4) Use the same function model to find the best fit for each set

Look at, for example, lm() for linear models (including polynomials),
nls() or nlm() for non-linear models, and a decent book on R for a much,
much, much wider range, including splines, generalised additive models,
generalised linear models, mixed effects models (linear and otherwise) ...

> > 5) Save the coefficients of those fits in a file

Something like sapply() or ddply() should be able to give you a table of
coefficients, especially if you write a wrapper function along the lines
of

    mywrap <- function(d) coef(nls(y ~ fitfun(x, a, b), data = d,
                                   start = list(a = 1, b = 1)))

to return a vector of coefficients from a chunk d (third sketch below).

> > Can this be done accurately with R?

Yes; R has well-characterised, numerically stable core functions, which
is more than can be said for most spreadsheets.

> > It would save me a lot of programming.

You'll still have to do some of that, but doing it in R will be a lot
faster than doing it in C.

> > The files will soon have about 1 million lines, which is a lot to
> > process.

If you can't load it all at once, you can use read.table() with its skip=
and nrows= arguments to read the file in parts (last sketch below), or
you can push the whole lot into a database and use any of R's database
packages, RMySQL and the like, to read from that.
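To make the pointers above concrete, here is a minimal sketch of step 1.
The file name "values.txt" and the absence of a header row are
assumptions; adjust as needed (use header = TRUE if the first line really
holds column names):

    ## read a semicolon-separated file; the trailing ";" on each line
    ## produces an empty fifth column, which is dropped here
    mydata <- read.table("values.txt", sep = ";", header = FALSE)
    mydata <- mydata[, 1:4]
    names(mydata) <- c("x", "y", "z", "j")
    str(mydata)  # expect 2000 observations of 4 variables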
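A sketch of step 2, assuming the 2000 rows divide exactly into 100 chunks
of 20. gl(100, 20) builds a factor with levels 1..100, each repeated 20
times, so its length matches the number of rows:

    mydata$chunk <- gl(100, 20)            # chunk label for each row
    chunks <- split(mydata, mydata$chunk)  # a list of 100 data frames
    nrow(chunks[[1]])                      # 20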
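A sketch of steps 3 to 5 together, assuming a straight line y ~ x as the
fitting function; swap in nls() with a start list for a genuinely
non-linear model. The output file name is hypothetical:

    fit_one <- function(d) coef(lm(y ~ x, data = d))
    ## non-linear alternative (hypothetical model y = a * exp(b * x)):
    ## fit_one <- function(d)
    ##   coef(nls(y ~ a * exp(b * x), data = d, start = list(a = 1, b = 0)))

    coefs <- t(sapply(chunks, fit_one))  # one row of coefficients per chunk
    write.csv(coefs, "fit_coefficients.csv")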
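And a sketch of the piecewise approach for when the file is too big to
load at once, using read.table()'s skip= and nrows= arguments. File and
output names are again hypothetical, and n_chunks would grow to
nrow / 20 for the million-line files:

    n_chunks <- 100
    for (i in seq_len(n_chunks) - 1) {
      ## read only rows (i * 20 + 1) .. (i * 20 + 20)
      d <- read.table("values.txt", sep = ";", skip = i * 20, nrows = 20)
      names(d)[1:4] <- c("x", "y", "z", "j")
      cf <- coef(lm(y ~ x, data = d))
      write.table(t(cf), "fit_coefficients.csv", sep = ",", append = TRUE,
                  col.names = FALSE, row.names = FALSE)
    }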