Jack Su
2009-Aug-15 02:27 UTC
[R] How to use R to perform prediction based on history data
Say I have a CSV file in which each row contains several fields, one of which records whether the row was a success. In the history data I have all the fields, including the result. In the future data I have only the fields, without the result.

For example, history data:

Field1  Field2  Field3  Field4  ResultField
1231    CA      TRUE    443     TRUE
23231   NC      TRUE    123     FALSE
1231    CA      FALSE   243     TRUE

The future data:

Field1  Field2  Field3  Field4
23231   NC      TRUE    123

I am a newbie in R and statistics; I just feel R should have some mechanism to give the probability of success based on history data.

I tried to read in the CSV data and call "factor" on the list, but I am seeing this error message:

Error in sort.list(unique.default(x), na.last = TRUE) :

Any ideas are highly welcome. Thanks in advance.
Daniel Malter
2009-Aug-15 15:25 UTC
[R] How to use R to perform prediction based on history data
Please have a look at the posting guide. As for your problem loading the data, we do not know what you have done and therefore cannot even guess at the reason for the error message. At a minimum we need to see what you did in R (the code). Second, the posting guide asks for minimal, self-contained code: in other words, an example that we can simply paste at the R prompt to reproduce your problem. Often the attempt to create such an example leads you to the source of the error yourself.

As for the modeling question: yes, R can produce predictions from many kinds of fitted models. However, this is "dangerous" if you are new to statistics and don't really know what you are doing. You should ask a local statistician or econometrician to help you model your data. Anything we could offer from a distance would be vague, especially given the vague description of your data.

Best,
Daniel

------------------------- cuncta stricte discussurus -------------------------

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
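A minimal, self-contained example of the kind the posting guide asks for might look like the sketch below. It rebuilds the three history rows inline, so nothing depends on an external file, and converts a single column with factor(). The names `csv_text` and `history` are illustrative, and the idea that the error came from passing a whole data frame (a list of columns) to factor() is only an assumption, since the thread never shows the code that failed.

```r
# Inline copy of the history rows from the question (no external file needed).
csv_text <- "Field1,Field2,Field3,Field4,ResultField
1231,CA,TRUE,443,TRUE
23231,NC,TRUE,123,FALSE
1231,CA,FALSE,243,TRUE"

# read.csv() can parse a string via 'text'; keep Field2 as character for now.
history <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# factor() expects an atomic vector, so convert one column at a time
# rather than the whole data frame (assumed cause of the reported error).
history$Field2 <- factor(history$Field2)

# Inspect the resulting column types.
str(history)
```

Anyone can paste this at the R prompt as-is, which is what makes a question easy to answer on the list.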
Petr PIKAL
2009-Aug-18 13:27 UTC
[R] Re: How to use R to perform prediction based on history data
Hi,

Well, the first idea is that you could buy or borrow an introductory statistics book and look into some of the R introductory documents (they are in the doc folder of your R installation). There are also introductory statistics books that use R as the programming language.

If you do not know much about statistics and R, you could possibly toss a coin and fill in your ResultField accordingly (it could be quicker and maybe more foolproof :-).

I would not call myself an expert in statistics, but with data like that you could try ?lm, ?glm or another modelling procedure that can predict for future data, and/or a tree-based package such as ??mvpart.

Regards
Petr
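The fit-then-predict pattern behind ?glm can be sketched on a dataset that ships with R. This is only an illustration under stated assumptions: `mtcars` and its 0/1 column `am` stand in for a history table with a success flag, and the names `fit`, `new_cars` and `p` are hypothetical.

```r
# mtcars ships with R; 'am' (0/1) stands in for a success flag here.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# New observations must use the same predictor column names as the training data.
new_cars <- data.frame(wt = c(2.5, 3.5))

# type = "response" returns probabilities instead of log-odds.
p <- predict(fit, newdata = new_cars, type = "response")
print(p)
```

The same two steps, a model-fitting call followed by predict() with `newdata`, apply to lm() and to most other modelling functions that provide a predict method.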
Gabor Grothendieck
2009-Aug-18 13:38 UTC
[R] How to use R to perform prediction based on history data
Please read the last line of every message on r-help and note the request to provide reproducible code. Anyways, try this:

Lines <- "Field1 Field2 Field3 Field4 ResultField
1231 CA TRUE 443 TRUE
23231 NC TRUE 123 FALSE
1231 CA FALSE 243 TRUE
23231 NC TRUE 123 NA"

DF <- read.table(textConnection(Lines), header = TRUE)

# logistic regression using first 3 rows
mod <- glm(ResultField ~ ., DF[1:3, ], family = binomial)

# prediction using 4th row
predict(mod, DF[4, 1:4], type = "response")

and also have a look at the caret package.
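With only three rows the fit above can do little more than reproduce the training data, so a sketch on a larger table may show the workflow more clearly. Everything below is an assumption made for illustration: the 200 history rows are simulated, the rule that generates ResultField is invented, Field1 is left out, and the names `history_path`, `future` and `prob` are hypothetical.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated stand-in for a real history file, using the question's column names.
n <- 200
history <- data.frame(
  Field2 = sample(c("CA", "NC"), n, replace = TRUE),
  Field3 = sample(c(TRUE, FALSE), n, replace = TRUE),
  Field4 = round(runif(n, 100, 500))
)
# Invented rule for the outcome: success becomes more likely as Field4 grows.
history$ResultField <- runif(n) < plogis((history$Field4 - 300) / 100)

# Round-trip through a temporary CSV file, mirroring the question's setup.
history_path <- tempfile(fileext = ".csv")
write.csv(history, history_path, row.names = FALSE)
history <- read.csv(history_path, stringsAsFactors = TRUE)

# A future row has the same fields but no ResultField; reuse the training levels.
future <- data.frame(
  Field2 = factor("NC", levels = levels(history$Field2)),
  Field3 = TRUE,
  Field4 = 123
)

# Logistic regression on the history, then a predicted success probability.
mod <- glm(ResultField ~ Field2 + Field3 + Field4,
           data = history, family = binomial)
prob <- predict(mod, newdata = future, type = "response")
print(prob)
```

Building the future row's factor from the training levels keeps predict() from failing when a future file happens to contain only some of the categories.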
Thank you very much, it works cool!