Hey PIKAL,

It's not homework; it is a real dataset. I have signed an NDA with my
company, so I cannot share the original data file. I'm working on a market
basket analysis task, but I am not able to convert my existing data table to
the appropriate format so that I can apply the Apriori algorithm in R. This
is very important for me to get done, because I'm an intern and if I don't
get it done they are not going to hire me as a full-time employee. I tried
everything by myself but was not able to get it done. Your precious 10-15
minutes can save my upcoming years, so please help me through this if you can.

I want another dataset based on the first two datasets I have mentioned.

Thanks

On 30 August 2017 at 12:49, PIKAL Petr <petr.pikal at precheza.cz> wrote:

> Hi
>
> It seems to me like homework, and there is a no-homework policy on this
> help list.
>
> What do you want to do with your table 3? It seems to me futile.
>
> Anyway, some combination of melt, merge, cast and regular expressions
> could be employed in such a task, but it could be rather tricky.
>
> But be aware that "Suger" does not match "sugar" (and I wonder whether
> sugar is a dairy product), and that you mix uppercase and lowercase
> letters, which could also be problematic when matching words.
>
> Cheers
> Petr
>
> > -----Original Message-----
> > From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Hemant Sain
> > Sent: Wednesday, August 30, 2017 8:28 AM
> > To: r-help at r-project.org
> > Subject: [R] Dataframe Manipulation
> >
> > I want to do a market basket analysis, and I'm trying to create a
> > dataset for that. I have two tables. The first table contains the daily
> > product transactions; each row shows the items purchased by one
> > customer. The second table contains the parent groups that those
> > products fall under; for example, the fruit category contains several
> > fruits such as mango, banana, apple, etc.
> >
> > I want to create a third table in which the parent groups, extracted
> > from Table 2, are the headers and each row represents a transaction
> > with the product names. If a transaction has nothing in a parent
> > category, the cell should be filled with NA. Please help me with R or
> > C/C++ code (R would be preferred). I'm attaching all three tables for
> > reference: I have the first two tables and I want to get a table like
> > table 3.
> >
> > Tables are explained in the attached doc.
> >
> > --
> > hemantsain.com

--
hemantsain.com

Hi Hemant,
Does this help you along?
table_1 <- textConnection("Item_1;Item_2;Item_3
1KG banana;300ML milk;1kg sugar
2Large Corona_Beer;2pack Fries;
2 Lux_Soap;1kg sugar;")
table_1 <- read.csv(table_1, sep = ";", na.strings = "",
                    stringsAsFactors = FALSE, check.names = FALSE)

table_2 <- textConnection("Toiletries;Fruits;Beverages;Snacks;Vegetables;Clothings;Dairy Products
Soap;banana;Corona_Beer;King Burger;Pumpkin;Adidas Sport Tshirt XL;milk
Shampoo;Mango;Red Label Whisky;Fries;Potato;Nike Shorts Black L;Butter
Showergel;Oranges;grey Cocktail;cheese pizza;Tomato;Puma Jersy red M;sugar
Lux_Soap;;2 Large corona Beer;;Cheese;Toothpaste")

table_2 <- read.csv(table_2, sep = ";", na.strings = "",
                    stringsAsFactors = FALSE, check.names = FALSE)

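library(tidyr)
library(dplyr)

# Reshape the category table to long format: one row per (Category, Item)
table_2 <- gather(table_2, "Category", "Item")

# Reshape the transactions to long format and drop the empty cells;
# Foo holds the original column name (Item_1, Item_2, Item_3)
table_1 <- gather(table_1, "Foo", "Item") %>%
  filter(!is.na(Item))

# Split each entry at the space into a quantity and an item name
table_1 <- separate(table_1, col = "Item", into = c("Quantity", "Item"),
                    sep = " ")

# Look up the category of each item, then restore the original label
table_3 <- left_join(table_1, table_2, by = "Item") %>%
  mutate(Item = paste(Quantity, Item)) %>%
  select(-Quantity)

# One row per Foo and one column per category, items collapsed per cell
table_3 %>%
  group_by(Foo, Category) %>%
  summarise(Item = paste(Item, collapse = ", ")) %>%
  spread(key = "Category", value = "Item")
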
You need to figure out how to handle words written with different cases and
how to get the quantity in a universal way. For the code above, I
corrected these things by hand in the example data.
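
One possible way to approach both points in base R is sketched below. It is not from the original thread: the helper name normalise_item and the column name Item_key are made up for illustration, and the regular expression assumes that a quantity is always a leading number, optionally with a unit or size word glued to it (as in "1KG", "300ML" or "2pack"). Entries such as "2 Large corona Beer", where the size is a separate word, would still need manual handling.

```r
# Hypothetical helper (not from the original thread): split a raw entry such as
# "1KG banana" or "2 Lux_Soap" into a quantity part and a normalised item name.
normalise_item <- function(x) {
  x <- trimws(x)
  # Assumption: the quantity is a leading number, optionally followed by
  # letters glued to it, and separated from the item name by whitespace.
  pattern <- "^([0-9]+[[:alpha:]]*)[[:space:]]+(.*)$"
  has_qty <- grepl(pattern, x)
  quantity <- ifelse(has_qty, sub(pattern, "\\1", x), NA_character_)
  item <- ifelse(has_qty, sub(pattern, "\\2", x), x)
  data.frame(
    Quantity = quantity,
    # Lower-case key intended only for matching against the category table
    Item_key = tolower(item),
    stringsAsFactors = FALSE
  )
}

normalise_item(c("1KG banana", "2 Lux_Soap", "Mango", "300ML Milk"))
```

Applying tolower() to the Item column of the category table as well would let the join match on Item_key regardless of how the words were capitalised. Spelling differences such as "Suger" versus "sugar" are not addressed by this sketch.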
HTH
Ulrik
By using these two tables we have to create a third table in this format,
where the categories are on the top and the transactions are in the rows.

On 30 August 2017 at 16:42, Hemant Sain <hemantsain55 at gmail.com> wrote:

> Hello Ulrik,
>
> Can you please check this code once again on the following data set? It
> doesn't give me the same output as on the previous demo data set, because
> the splitting is done on the basis of the quantity, and in the real data
> set the quantity is missing. So please use the following data set and
> help me out. Please consider this my final email; I won't bother you
> again, but it's about my job, so please help me.
>
> Note: the file I'm attaching is very confidential.
>
> --
> hemantsain.com

--
hemantsain.com

Hi Hemant,
The solution is really quite similar, and the logic is identical:
library(readr)
library(dplyr)
library(stringr)
library(tidyr)
data_help <- read_csv("data_help.csv")
cat_help <- read_csv("cat_help.csv")
# Helper function to split the items and create a tibble
split_items <- function(items) {
  x <- items$Items_purchased_on_Receipts %>%
    str_split(pattern = ",") %>%
    unlist(use.names = FALSE)
  # tibble() replaces the deprecated data_frame(); Purchase_ID is recycled
  tibble(Item = x, Purchase_ID = items$Purchase_ID)
}

# Give each receipt an ID, then expand it to one row per purchased item
data_help <- data_help %>%
  mutate(Purchase_ID = 1:n()) %>%
  group_by(Purchase_ID) %>%
  do(split_items(.))

# Long category table, joined to the purchases, one column per category
cat_help %>%
  gather("Foo", "Item") %>%
  filter(!is.na(Item)) %>%
  left_join(data_help, by = "Item") %>%
  group_by(Foo, Purchase_ID) %>%
  summarise(Item = paste(Item, collapse = ", ")) %>%
  spread(key = "Foo", value = "Item")
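
Since data_help.csv and cat_help.csv are not available here, the same logic can be tried on made-up data. What follows is only a sketch under stated assumptions: the column name Items_purchased_on_Receipts is taken from the code above, while the sample receipts, the category table and the names receipts, categories and lookup are invented for illustration. It swaps in separate_rows() and pivot_longer()/pivot_wider() from tidyr (version 1.0 or later) for the helper function and gather()/spread(), and it starts the join from the purchases rather than from the categories, so that every receipt gets exactly one row.

```r
library(dplyr)
library(tidyr)

# Made-up stand-ins for data_help.csv and cat_help.csv (assumed layout)
receipts <- data.frame(
  Items_purchased_on_Receipts = c("banana,milk,sugar", "Fries,Soap"),
  stringsAsFactors = FALSE
)
categories <- data.frame(
  Fruits     = c("banana", "Mango"),
  Dairy      = c("milk", "sugar"),
  Snacks     = c("Fries", NA),
  Toiletries = c("Soap", NA),
  stringsAsFactors = FALSE
)

# Long lookup table: one row per (Category, Item), empty cells dropped
lookup <- categories %>%
  pivot_longer(everything(), names_to = "Category", values_to = "Item") %>%
  filter(!is.na(Item))

table_3 <- receipts %>%
  mutate(Purchase_ID = row_number()) %>%
  # One row per purchased item
  separate_rows(Items_purchased_on_Receipts, sep = ",") %>%
  rename(Item = Items_purchased_on_Receipts) %>%
  mutate(Item = trimws(Item)) %>%
  # Attach the category of each item
  left_join(lookup, by = "Item") %>%
  # Collapse items that share a receipt and a category into one cell
  group_by(Purchase_ID, Category) %>%
  summarise(Item = paste(Item, collapse = ", "), .groups = "drop") %>%
  # One column per category; categories absent from a receipt become NA
  pivot_wider(names_from = Category, values_from = Item)

table_3
```

An item that is missing from the category table would end up in a column named NA with this sketch, which can serve as a quick check for spelling or case mismatches between the two files.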
HTH
Ulrik