Kristi Glover
2012-Aug-04 14:22 UTC
[R] how to assing unique ID in a table and do regression
Hi R users,

I have a very large data set (5000 rows). I want to create classes based on one column of that table, which holds continuous data. Once the values are converted into classes, each class would serve as a unique ID, and I want to run a regression for each ID.

For example, I have this data set:

> dput(dat)
structure(list(ID = c(0.1, 0.8, 0.1, 1.5, 1.1, 0.9, 1.8, 2.5,
2, 2.5, 2.8, 3, 3.1, 3.2, 3.9, 1, 4, 4.7, 4.3, 4.9, 2.1, 2.4),
S = c(4L, 7L, 9L, 10L, 10L, 8L, 8L, 8L, 17L, 18L, 13L, 13L,
11L, 1L, 10L, 20L, 22L, 20L, 18L, 16L, 7L, 20L), en2 = c(-2.5767,
-2.5767, -2.5767, -2.5767, -2.5767, -2.5767, -2.5767, -2.5347,
-2.5347, -2.5347, -2.5347, -2.5347, -2.5347, -2.4939, -2.4939,
-2.4939, -2.4939, -2.4939, -2.4939, -2.4939, -2.4543, -2.4543
), en3 = c(-1.1785, -0.6596, -0.6145, -0.6437, -0.6593, -0.7811,
-1.1785, -1.1785, -1.1785, -0.6596, -0.6145, -0.6437, -0.6593,
-1.1785, -0.1342, -0.2085, -0.4428, -0.5125, -0.8075, -1.1785,
-1.1785, -0.1342), en4 = c(-1.4445, -1.3645, -1.1634, -0.7735,
-0.6931, -1.1105, -1.4127, -1.5278, -1.4445, -1.3645, -1.1634,
-0.7735, -0.6931, -1.0477, -0.8655, -0.1759, 0.1203, -0.2962,
-0.4473, -1.0436, -0.9705, -0.8953), en5 = c(-0.4783, -0.3296,
-0.2026, -0.3579, -0.5154, -0.5726, -0.6415, -0.3996, -0.4529,
-0.5762, -0.561, -0.6891, -0.7408, -0.6287, -0.4337, -0.4586,
-0.5249, -0.6086, -0.7076, -0.7114, -0.4952, 0.1091)), .Names = c("ID",
"S", "en2", "en3", "en4", "en5"), class = "data.frame", row.names = c(NA,
-22L))

Here ID has continuous values. I want to make groups with values 0-1, 1-2, 2-3, 3-4 from the column ID. Then I want to run a regression of S (dependent variable) on en2 (independent variable), then a regression of S on en3, and so on. After that, I want a table with the r2 and p-value of each regression.

Could you help me with how to do this? I tried doing it manually, but it took too much time, so I thought I would write to ask for your help.

Thanks for your help.
Kristi
Rui Barradas
2012-Aug-04 15:15 UTC
[R] how to assing unique ID in a table and do regression
Hello,

Try the following.

# Bin ID into the right-closed intervals (0,1], (1,2], ...
id.groups <- with(dat, cut(ID, breaks=0:ceiling(max(ID))))
# One data frame per interval
sp <- split(dat, id.groups)
# Column positions of en2, en3, en4, en5
regressors <- grep("en", names(dat))
# For each group, fit S against each regressor separately
models <- lapply(sp, function(.df)
    lapply(regressors, function(x) lm(.df[["S"]] ~ .df[[x]])))

mod.summ <- lapply(models, function(x) lapply(x, summary))
# First R2
mod.r2 <- lapply(mod.summ, function(x) lapply(x, `[[`, "r.squared"))
mod.r2

# Now p-values (4th column of each coefficient table)
mod.coef <- lapply(mod.summ, function(x) lapply(x, coef))
mod.pvalue <- lapply(mod.coef, function(x) lapply(x, `[`, , 4))
# p-values in matrix form, columns are 'en2', 'en3', etc
#lapply(mod.pvalue, function(x) do.call(cbind, x))

Hope this helps,

Rui Barradas
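The nested lists above hold one r2 and one set of p-values per group and regressor. To get the single table asked for in the original post, one option is to collect one row per fit into a data frame. This is only a minimal sketch under assumptions: the data are simulated stand-ins that reuse the column names of `dat`, the helper `fit_one` and the result name `results` are hypothetical, and the regressors are selected by name with `"^en"` and `value = TRUE` rather than by position as in the reply.

```r
# Hypothetical stand-in data with the same column names as `dat` in the thread.
set.seed(1)
n <- 60
dat <- data.frame(
  ID  = runif(n, min = 0.05, max = 4),
  S   = rpois(n, lambda = 10),
  en2 = rnorm(n),
  en3 = rnorm(n),
  en4 = rnorm(n),
  en5 = rnorm(n)
)

# Bin the continuous ID into (0,1], (1,2], ... as in the reply above.
dat$group <- cut(dat$ID, breaks = 0:ceiling(max(dat$ID)))
regressors <- grep("^en", names(dat), value = TRUE)

# Fit S ~ <regressor> within one group and return a one-row data frame.
fit_one <- function(df, group, x) {
  sm <- summary(lm(reformulate(x, response = "S"), data = df))
  cf <- coef(sm)
  data.frame(
    group     = group,
    regressor = x,
    n         = nrow(df),
    r.squared = sm$r.squared,
    # The slope row is absent when the regressor is constant within the group.
    p.value   = if (x %in% rownames(cf)) cf[x, "Pr(>|t|)"] else NA_real_,
    stringsAsFactors = FALSE
  )
}

# Loop over non-empty groups and regressors, then stack the rows.
sp <- split(dat, dat$group, drop = TRUE)
rows <- list()
for (g in names(sp)) {
  for (x in regressors) {
    rows[[length(rows) + 1L]] <- fit_one(sp[[g]], g, x)
  }
}
results <- do.call(rbind, rows)
results
```

The result has one row per group and regressor, so it can be sorted, filtered, or written out with write.csv() like any other data frame. The p-value reported is the one for the slope, which for a single-regressor model matches the overall F-test.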
Rui Barradas
2012-Aug-06 16:30 UTC
[R] how to assing unique ID in a table and do regression
Sorry, forgot to Cc the list.

On 06-08-2012 17:29, Rui Barradas wrote:
> Hello,
>
> I'm glad it helped.
>
> The result of function cut() is a factor variable, so you can coerce it
> to integer, giving more "normal" names, or, if you want to keep track
> of the intervals the adjusted r2 belong to, go straight to the last
> two lines in the following code.
>
> #dat1$groups <- as.integer( cut( ...etc... ) )
>
> [...rest of your code... ]
>
> adj <- summary(lin.temp1)$adj.r.squared
> class(adj) <- "list"
>
> That's it. It has as names the intervals produced by cut that appear
> in the output you've posted.
>
> Rui Barradas
>
> On 06-08-2012 17:07, Kristi Glover wrote:
>> Dear Rui,
>> Thanks for the help. I really appreciate it; it helped me out.
>> I modified some of the script you gave me because I found that the
>> package 'nlme' can also do it. But I do use the script you gave me to
>> split the data:
>> dat1$groups<-cut(dat1$LATITUDE, seq(-56,79, by=2.5))
>> lin.temp1<-lmList(S~mean_temp|groups,data=dat1)
>> Could you please give me an idea of how I can extract the adjusted r2
>> values and put them in a table?
>> I called summary, which gave me the adjusted r2 for each group, but I
>> don't know how to put the adjusted r2 in a table (like: group, r2,
>> adjusted r2).
>> > summary(lin.temp1)$adj.r.squared
>> (-56,-53.5] :
>> [1] 0.2565786
>> (-53.5,-51] :
>> [1] 0.0715485
>> (-51,-48.5] :
>> [1] 0.2265334
>>
>> Thanks
>> Kristi
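For the follow-up question about a table of group, r2 and adjusted r2, one way that does not depend on the layout of the lmList summary is to fit the per-group models with base R and read both values from each summary.lm object. This is a sketch under stated assumptions, not the nlme approach itself: the data frame is simulated (only the column names LATITUDE, mean_temp, S and groups come from the thread), split() plus lm() is swapped in for lmList(), and the names `summ` and `r2.table` are hypothetical.

```r
# Simulated stand-in for dat1; only the column names come from the thread.
set.seed(42)
n <- 200
dat1 <- data.frame(
  LATITUDE  = runif(n, min = -55, max = -41),
  mean_temp = rnorm(n, mean = 10, sd = 3)
)
dat1$S <- 5 + 0.8 * dat1$mean_temp + rnorm(n)

# Same 2.5-degree latitude bands as in the follow-up message.
dat1$groups <- cut(dat1$LATITUDE, seq(-56, 79, by = 2.5))

# One summary.lm per non-empty band (drop = TRUE skips empty bands).
summ <- lapply(split(dat1, dat1$groups, drop = TRUE),
               function(d) summary(lm(S ~ mean_temp, data = d)))

# Collect group label, r2 and adjusted r2 into one table.
r2.table <- data.frame(
  group         = names(summ),
  r.squared     = vapply(summ, function(s) s$r.squared, numeric(1),
                         USE.NAMES = FALSE),
  adj.r.squared = vapply(summ, function(s) s$adj.r.squared, numeric(1),
                         USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
r2.table
```

The group column keeps the interval labels produced by cut(), so each row can be traced back to its latitude band.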