Dear Adrian,

Yes, it is a cyclical data set, and theoretically it should repeat this interval until 61327. The data set itself is divided into two parts:
1. Product category (column 10)
2. Number of stores participating (column 01)

Overall there are 22 different products, and in each you have 19 different stores participating. Theoretically, each store should have a 1 - 157 week interval for each product category.

The part I am struggling with is how to run a loop over the whole data set while checking whether all stores participated for 157 weeks across the different products.

So far I have come up with this:

n = 61327 # Generate matrix to check for values
Control = matrix(
  0,
  nrow = n,
  ncol = 1)

s <- seq(from = 1, to = 157, by = 1)
CW = matrix(
  s,
  nrow = 157,
  ncol = 1
)

colnames(CW)[1] <- 's'

CW = as.data.frame(CW)

for (i in 1:nrow(FD)) { # Run through all the rows
  for (j in 1:157) {
    if (FD$WEEK[j] == CW$s[j]) {
      Control[i] = 1 # corresponding control row = 1
    } else {
      Control[i] = 0 # corresponding control row = 0
    }
  }
}

I coded an MRE and attached a sample of my data set.

MRE:

#MRE

dat <- data.frame(
  Store = c(rep(8, times = 157), rep(12, times = 157)), # store IDs
  WEEK = rep(seq(from = 1, to = 157, by = 1), times = 2)
)
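For the two-store MRE above, the completeness check can be sketched without an explicit loop over rows. This is only a minimal sketch, assuming the columns are named Store and WEEK as in the MRE; the helper name has_all_weeks is hypothetical and not part of the original post.

```r
# MRE data: two stores, each with weeks 1..157
dat <- data.frame(
  Store = c(rep(8, times = 157), rep(12, times = 157)),
  WEEK  = rep(seq_len(157), times = 2)
)

# Hypothetical helper: TRUE if every week 1..n_weeks occurs in `weeks`
has_all_weeks <- function(weeks, n_weeks = 157) {
  all(seq_len(n_weeks) %in% weeks)
}

# One logical value per store: did that store report all 157 weeks?
complete <- tapply(dat$WEEK, dat$Store, has_all_weeks)
print(complete)

# Drop one week for store 12 to see the check fail for that store only
dat_gap <- dat[!(dat$Store == 12 & dat$WEEK == 50), ]
complete_gap <- tapply(dat_gap$WEEK, dat_gap$Store, has_all_weeks)
print(complete_gap)
```

With a product column in the real data, the same idea would presumably extend by grouping on both store and product, for example by passing a list of both columns as the second argument of tapply.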
Hi,

see inline.

> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Christoph Puschmann
> Sent: Sunday, October 9, 2016 1:27 AM
> To: Adrian Dușa <dusa.adrian at unibuc.ro>
> Cc: r-help at r-project.org; Christoph Puschmann <c.puschmann at student.unsw.edu.au>
> Subject: Re: [R] Loop to check for large dataset
>
> Dear Adrian,
>
> Yes it is a cyclical data set and theoretically it should repeat this interval until
> 61327. The data set itself is divided into 2 Parts:
> 1. Product category (column 10)
> 2. Number of Stores Participating (column 01)
> Overall there are 22 different products and in each you have 19 different
> stores participating. And theoretically each store over each product category
> should have a 1 - 157 week interval.

Not much clearer, and definitely not reproducible.

From what I understand, you have 22 * 19 = 418 combinations of product and store. How do you want to put these 418 combinations into 157 rows?

It seems to me that this can somehow be done with the aggregate function, but without a small reproducible example we are fishing in murky water.

Try to post data with, let's say, 3 stores and 4 products to explain how your data is structured and what is or is not correct.

Cheers
Petr

> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

________________________________
This e-mail and any documents attached to it may be confidential and are intended only for its intended recipients.
If you received this e-mail by mistake, please immediately inform its sender. Delete the contents of this e-mail with all attachments and its copies from your system. If you are not the intended recipient of this e-mail, you are not authorized to use, disseminate, copy or disclose this e-mail in any manner. The sender of this e-mail shall not be liable for any possible damage caused by modifications of the e-mail or by delay with transfer of the email. In case that this e-mail forms part of business dealings: - the sender reserves the right to end negotiations about entering into a contract in any time, for any reason, and without stating any reasoning. - if the e-mail contains an offer, the recipient is entitled to immediately accept such offer; The sender of this e-mail (offer) excludes any acceptance of the offer on the part of the recipient containing any amendment or variation. - the sender insists on that the respective contract is concluded only upon an express mutual agreement on all its aspects. - the sender of this e-mail informs that he/she is not authorized to enter into any contracts on behalf of the company except for cases in which he/she is expressly authorized to do so in writing, and such authorization or power of attorney is submitted to the recipient or the person represented by the recipient, or the existence of such authorization is known to the recipient of the person represented by the recipient.
Dear Petr,

I attached a sample file, which contains the first 4 products.

It is rather that I have 157 weeks, 19 different stores and 22 products: 157 * 19 * 22 = 65,626 rows. As I stated, I have roughly 63,127 rows, so some have to be missing.

All the best,
Christoph
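The row counts quoted above can be checked with a few lines of R. This is just the arithmetic from the message; the variable names are illustrative, and 63127 is the approximate figure given there, not a count taken from the actual file.

```r
# Dimensions as stated in the thread
n_weeks    <- 157
n_stores   <- 19
n_products <- 22

# Size of the full store x product x week grid
expected_rows <- n_weeks * n_stores * n_products

# Approximate row count quoted in the message (assumption: taken at face value)
observed_rows <- 63127

missing_rows <- expected_rows - observed_rows
cat("expected:", expected_rows, " missing:", missing_rows, "\n")
```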
This is an example of what reproducible code looks like, assuming you have three columns in your dataset named S (store), P (product) and W (week), and also assuming they have integer values from 1 to 19, 1 to 22 and 1 to 157 respectively:

#########
mydata <- expand.grid(seq(19), seq(22), seq(157))
names(mydata) <- c("S", "P", "W")

# randomly delete 65626 - 63127 = 2499 rows
set.seed(12345) # make it replicable
mydata <- mydata[-sample(seq(nrow(mydata)), nrow(mydata) - 63127), ]
#########

Now the data frame mydata contains exactly 63127 rows, just as in your case.
The task is to find which weeks are missing, from which store and for which product.

Below is one possible way to do that. Given that you have a small number of stores and products, I'll keep it simple and stupid by using for loops:

#########
result <- matrix(nrow = 0, ncol = 3)

for (i in seq(19)) {
    for (j in seq(22)) {
        miss <- setdiff(seq(157), mydata$W[mydata$S == i & mydata$P == j])
        if (length(miss) > 0) {
            result <- rbind(result, cbind(S = i, P = j, W = miss))
        }
    }
}

# The result matrix contains the 2499 rows that are missing.

> head(result)
     S P   W
[1,] 1 1  10
[2,] 1 1  11
[3,] 1 1  82
[4,] 1 1 100
[5,] 1 1 117
[6,] 1 1 148
#########

In this example, for S(tore) number 1 and P(roduct) number 1, you are missing W(eek) 10, 11, 82 and so on.

Hoping you can adapt this code to your particular example,
Adrian

--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania
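The same missing combinations can also be found by comparing the data against the complete grid, which avoids growing a matrix with rbind inside the double loop. The following is a minimal sketch under the same assumptions as the message above (integer columns S, P and W); the helper key and the names full and missing_rows are illustrative only.

```r
# Simulated data, built the same way as in the message above
mydata <- expand.grid(S = seq(19), P = seq(22), W = seq(157))
set.seed(12345)
mydata <- mydata[-sample(seq(nrow(mydata)), nrow(mydata) - 63127), ]

# Complete grid of every store x product x week combination
full <- expand.grid(S = seq(19), P = seq(22), W = seq(157))

# Illustrative helper: one string key per row, e.g. "1-1-10"
key <- function(d) paste(d$S, d$P, d$W, sep = "-")

# Rows of the full grid whose key never appears in the data
missing_rows <- full[!key(full) %in% key(mydata), ]
missing_rows <- missing_rows[order(missing_rows$S, missing_rows$P, missing_rows$W), ]

nrow(missing_rows)
head(missing_rows)
```

Because the comparison is against a fixed grid, this variant would also report a week that is absent for every store and product, provided the grid dimensions are specified correctly.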
Hi,

Given this example data, you can get the same answer with less typing and without loops.

res <- xtabs(~W+P+S, mydata)
res1 <- which(res == 0, arr.ind = TRUE)
head(res1)
      W P S
10   10 1 1
11   11 1 1
82   82 1 1
100 100 1 1
117 117 1 1
148 148 1 1

Cheers
Petr

From: dusa.adrian at gmail.com [mailto:dusa.adrian at gmail.com] On Behalf Of Adrian Dușa
Sent: Monday, October 10, 2016 12:26 PM
To: Christoph Puschmann <c.puschmann at student.unsw.edu.au>
Cc: r-help at r-project.org; PIKAL Petr <petr.pikal at precheza.cz>
Subject: Re: [R] Loop to check for large dataset
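One caveat with the xtabs approach above may be worth keeping in mind: for plain integer columns, xtabs only creates cells for values that actually occur, and which(..., arr.ind = TRUE) returns positions rather than labels. The sketch below assumes the same simulated mydata as earlier in the thread and declares the full set of levels explicitly, so that a store, product or week that is absent altogether would still get a cell; the names res and miss are illustrative.

```r
# Simulated data, built the same way as earlier in the thread
mydata <- expand.grid(S = seq(19), P = seq(22), W = seq(157))
set.seed(12345)
mydata <- mydata[-sample(seq(nrow(mydata)), nrow(mydata) - 63127), ]

# Declare all expected levels so values that never occur still get a cell
mydata$S <- factor(mydata$S, levels = seq(19))
mydata$P <- factor(mydata$P, levels = seq(22))
mydata$W <- factor(mydata$W, levels = seq(157))

# Three-way contingency table of counts: weeks x products x stores
res <- xtabs(~ W + P + S, data = mydata)

# Long format with labels instead of array positions; keep the empty cells
miss <- as.data.frame(res, responseName = "n")
miss <- miss[miss$n == 0, c("S", "P", "W")]

nrow(miss)
head(miss)
```

In the long-format result the S, P and W columns are factors, so they would need converting (for example via as.integer(as.character(...))) before any numeric comparison.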