Dear R users,

I have the following problem. I have imported two data sets into R: one contains price information, the other contains volume information. However, I noticed that the words in the product names are ordered differently in the two sets. For instance, one data set has

"GALAXY ACE S 5830"

and the other has

"S 5830 GALAXY ACE"

Both represent the same product. How do I map the two names onto one in R? Many product names have this problem, so I hope there is some mechanism that can map them automatically. Thanks for your help.

Kind regards,
Tammy
It may be easy or difficult depending on what your data are like.

"GALAXY ACE S 5830" vs "S 5830 GALAXY ACE"

One easy and reasonably general way would be to split each name into its "words" (four here) and then check whether name 2 contains all the words of name 1, possibly in a different order.

x1 <- "GALAXY ACE S 5830"
x2 <- "S 5830 GALAXY ACE"
x3 <- "S 5830 GALAXY ZOMBIE"

# split a single name into its space-separated words
divide <- function(x) strsplit(x, " ")[[1]]
# TRUE if every word of x also occurs in y
check <- function(x, y) all(divide(x) %in% divide(y))

check(x1, x2) # [1] TRUE
check(x1, x3) # [1] FALSE

Or you could try reading in your data in a different way so that "S", "GALAXY", "ACE", and "5830" end up in different variables (if all product names have an identical structure, i.e. four elements; or is S 5830 supposed to be the price?). Or build a catalogue of all possible product names and then compare each name to it, etc.

htmh

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
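One caveat about the check() approach: `%in%` only tests one direction, so a name whose words are a subset of another name's words would also count as a match. The following is a minimal sketch of a symmetric comparison using base R's setequal(). The helper name same_words and the shorter name x4 are made up for illustration, and the sketch assumes words are separated by single spaces and that repeated words do not matter.

```r
# Split a single product name into its space-separated words.
divide <- function(x) strsplit(x, " ", fixed = TRUE)[[1]]

# Hypothetical helper: TRUE only if both names contain the same set of words.
same_words <- function(x, y) setequal(divide(x), divide(y))

x1 <- "GALAXY ACE S 5830"
x2 <- "S 5830 GALAXY ACE"
x4 <- "GALAXY ACE"   # made-up shorter name, for illustration only

all(divide(x4) %in% divide(x1))  # TRUE: the one-directional test accepts a subset
same_words(x4, x1)               # FALSE: the word sets differ
same_words(x1, x2)               # TRUE: same words, different order
```

Whether the stricter test is wanted depends on the data; if one source routinely drops a word, the one-directional test may be the more useful one.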
Hi Tammy,

I think we need more information. Are the names always four parts? Does the fix always involve moving two parts from the back to the front? For that matter, which of the two you gave is correct? Or does it matter what order the parts are in as long as it's consistent?

Sorting them would be easiest, and would work regardless of number of parts and how they were entered. That could be done, for instance, with strsplit(), sort() and paste().

Sarah

--
Sarah Goslee
http://www.stringpage.com
http://www.sarahgoslee.com
http://www.functionaldiversity.org
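The strsplit(), sort() and paste() idea might look roughly like this when applied to a whole vector of names at once. This is only a sketch: the function name canonical_name is hypothetical, and it assumes the parts are separated by spaces and that the order of the parts carries no meaning.

```r
# Hypothetical helper: put the words of each name into sorted order so that
# names differing only in word order collapse to the same string.
canonical_name <- function(x) {
  parts <- strsplit(trimws(x), " +")   # split on one or more spaces
  vapply(parts, function(p) paste(sort(p), collapse = " "), character(1))
}

keys <- canonical_name(c("GALAXY ACE S 5830", "S 5830 GALAXY ACE"))
keys[1] == keys[2]   # TRUE: both names reduce to the same canonical form
```

The exact sorted order depends on the locale's collation rules, but it is the same for both inputs within one session, which is all the comparison needs.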
Hi,

Try this:

vec1 <- "GALAXY ACE S 5830"
vec2 <- "S 5830 GALAXY ACE"
vec3 <- "R GALAXY 5812 ACE"

# sort the words of each name and paste them back into a single key
vec11 <- paste(sort(unlist(strsplit(vec1, " "))), collapse = "_")
vec22 <- paste(sort(unlist(strsplit(vec2, " "))), collapse = "_")
vec11
#[1] "5830_ACE_GALAXY_S"
vec22
#[1] "5830_ACE_GALAXY_S"
identical(vec11, vec22)
#[1] TRUE
vec33 <- paste(sort(unlist(strsplit(vec3, " "))), collapse = "_")
identical(vec11, vec33)
#[1] FALSE

A.K.

----- Original Message -----
From: Tammy Ma <metal_licaling at live.com>
To: "r-help at r-project.org" <r-help at r-project.org>
Sent: Wednesday, September 26, 2012 5:04 AM
Subject: [R] map two names into one
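To go from comparing single strings to joining the two data sets, the same sort-and-paste key can be computed for every row and then used with merge(). The sketch below uses made-up data: the data frame names, column names, prices, volumes and the second spelling "GALAXY R ACE 5812" are all assumptions for illustration, as is the helper name make_key.

```r
# Made-up example data; the column names and values are assumptions.
price <- data.frame(product = c("GALAXY ACE S 5830", "R GALAXY 5812 ACE"),
                    price   = c(199, 149),
                    stringsAsFactors = FALSE)
volume <- data.frame(product = c("S 5830 GALAXY ACE", "GALAXY R ACE 5812"),
                     volume  = c(1200, 800),
                     stringsAsFactors = FALSE)

# Hypothetical helper: order-independent key built from the sorted words.
make_key <- function(x) {
  vapply(strsplit(x, " ", fixed = TRUE),
         function(p) paste(sort(p), collapse = "_"), character(1))
}

price$key  <- make_key(price$product)
volume$key <- make_key(volume$product)

# Join on the key; the two original spellings are kept side by side.
merged <- merge(price, volume, by = "key", suffixes = c(".price", ".volume"))
merged
```

Keeping both original product columns in the result makes it easy to eyeball whether any unrelated products were matched by accident.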