I know that the basic approach to appending or merging data frames is to use the rbind and merge commands. However, if I understand things correctly, both commands need quite a bit of additional programming to get, e.g., the behavior of the Stata append and merge commands, or to achieve some things which I think users need quite frequently.

E.g. for appending, the data frames must have identical column names. Renaming columns, or adding columns with missing values where necessary, requires additional programming.
For merging, all matches get combined, so it is not easily possible to check for 1:1 or 1:n matches or to limit the join to those kinds of matches, is it?
Those are just examples; there are a number of additional details that would be useful to be able to control for merging/appending (maybe at the expense of restricting the operation to just data frames).

So my question is: are there any packages or existing utility functions which would provide append and merge functionality at a slightly higher (user-friendly) level?

Although I am quite a noob, I would be prepared to give it a try and program these myself, but I have the feeling that this must be so common that maybe it would mean re-inventing the wheel?
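The two behaviors described in the question can be reproduced with a small base R sketch. The data frames `a` and `b` and their columns are made up for illustration; only `merge()`, `rbind()` and `tryCatch()` are used.

```r
# Two toy data frames; all names here are illustrative only.
a <- data.frame(id = c(1, 1, 2), x = c("a1", "a2", "a3"))
b <- data.frame(id = c(1, 1, 3), y = c("b1", "b2", "b3"))

# merge() pairs every matching row: id 1 occurs twice on each side,
# so the inner join contains 2 * 2 = 4 rows and raises no warning.
m <- merge(a, b, by = "id")
nrow(m)

# rbind() on data frames stops when the column names differ;
# tryCatch() captures the error message instead of aborting.
res <- tryCatch(rbind(a, b), error = function(e) conditionMessage(e))
res
```

Neither call reports that the key is duplicated or that the columns differ in a recoverable way, which is the gap the question is about.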
Hello, John,

as a start take a look at

?merge

And to (maybe) get a bit overwhelmed at first sight use

RSiteSearch( "merge")

Hth -- Gerrit

On Thu, 7 Feb 2013, John Smith wrote:

> So my question is: are there any packages or existing utility functions
> which would provide append and merge functionality at a slightly higher
> (user-friendly) level?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Hi Gerrit,

as I said in my original email, I already know both the merge and the rbind commands, but I think that many standard situations (I have given just a few) require rather clumsy ad-hoc programming. So I was wondering if there are any packages or existing code that would make it easier to handle the diverse append/merge tasks that tend to occur rather frequently.
For at least some of these tasks, other stats packages, like Stata, do provide out-of-the-box commands or command options.
Again, I just want to know if there is already some package or code for this out there somewhere, especially for appending tasks (automatically taking care of missing variables, renaming, data types etc.).
The RSiteSearch() function is helpful and came up with slightly different versions of merge but nothing for append.

John

> -----Original Message-----
> From: gerrit.eichner at math.uni-giessen.de
> Sent: Thu, 7 Feb 2013 16:57:13 +0100 (MET)
> To: johsmi9933 at inbox.com
> Subject: Re: [R] appending and merging data frames
>
> Hello, John,
>
> as a start take a look at
>
> ?merge
>
> And to (maybe) get a bit overwhelmed at first sight use
>
> RSiteSearch( "merge")
>
> Hth -- Gerrit
On Feb 7, 2013, at 9:12 AM, John Smith wrote:

> Again, I just want to know if there is already some
> package or code for this out there somewhere, especially
> for appending tasks (automatically taking care of
> missing variables, renaming, data types etc.)
> The RSiteSearch() function is helpful and came up
> with slightly different versions of merge but nothing
> for append.

The sqldf package provides an interface to popular database drivers. I do not think your question provides enough specificity to go much further. You say Stata provides ... something... but you do not really explain what that something is. My efforts to understand the Stata documentation for the egen command left me shaking my head in disbelief at its opacity, and caused me to appreciate further the efforts of the R developers to make our help system available. The most productive approach would be to present a simple example in R code.

--
David.

David Winsemius, MD
Alameda, CA, USA
I gave a small number of examples in my first email.

For append, in Stata any variable that is missing in the other dataset is automatically inserted with all values set to the missing value. In R, one would first have to compare the columns in both data frames and generate columns appropriately.

For merge, when merging dataset A with dataset B on some set of key variables, it is possible to specify if the join should be done 1:1, 1:n, n:1 or n:m. Stata creates a data structure that contains information about which observations could or could not be matched and which matches did not fit the 1:1/1:n or n:1 join pattern.

Again, I know that in R it is not hard to program all that yourself (while programming in Stata is something different altogether). I just wondered if for these operations, which are probably needed quite often, there are some additional packages or existing code that already deal with most of the situations one encounters when appending/merging.

The background is that I am making the move from other stats packages (including Stata) to R myself and I also want to motivate others to do it.

Also, before I go about implementing my own utility functions for this, I wanted to make sure that something much better isn't already out there (it usually is, especially when one is a beginner).

> -----Original Message-----
> From: dwinsemius at comcast.net
> Sent: Thu, 7 Feb 2013 09:46:02 -0800
> To: johsmi9933 at inbox.com
> Subject: Re: [R] appending and merging data frames
>
> You say Stata provides ... something... but you do not really
> explain what that something is. [...] The most productive
> approach would be to present a simple example in R code.
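The append behavior described above (a column missing from one dataset is created and filled with missing values) can be sketched in base R roughly as follows. `append_fill` is a hypothetical helper name invented for this illustration, not a function from any package, and the sketch handles only two data frames with no renaming or type reconciliation beyond what `rbind()` does itself.

```r
# Hypothetical helper (not from any package): row-bind two data frames,
# creating any column that is missing on one side and filling it with NA.
append_fill <- function(x, y) {
  stopifnot(is.data.frame(x), is.data.frame(y))
  all_cols <- union(names(x), names(y))
  # Add the columns each side lacks, filled with NA
  for (col in setdiff(all_cols, names(x))) x[[col]] <- rep(NA, nrow(x))
  for (col in setdiff(all_cols, names(y))) y[[col]] <- rep(NA, nrow(y))
  # Put both sides in the same column order before binding
  rbind(x[all_cols], y[all_cols])
}

# Toy data, made up for illustration
a <- data.frame(id = 1:2, x = c(10, 20))
b <- data.frame(id = 3:4, y = c("u", "v"), stringsAsFactors = FALSE)
ab <- append_fill(a, b)
ab
```

Ready-made functions with similar fill-with-NA behavior exist in contributed packages, for example `rbind.fill()` in plyr and `bind_rows()` in dplyr; the sketch above only shows how little base R code the basic case needs.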
One rather extreme limitation in Stata is that there is always only one "active" or loaded dataset. So all the commands that involve more than one dataset operate on the loaded "master" dataset and one or more other datasets that are stored somewhere.

Merge in Stata joins the loaded master dataset with another one on the hard disk. There are many options that influence the behavior, but roughly this is what can be done:

When one specifies one of the join options 1:1, 1:n or n:1, an error is shown if the key variables do not uniquely identify observations on the "1" side (so with regard to your question it is (b)).

After the merge, a column is added to the resulting dataset that indicates whether the observation was matched, occurred in the master dataset only, or occurred in the other dataset only.

Additional options restrict what will be kept in the merge, e.g. only observations that occur in the master or in both, and in the same way one can restrict which pairings are allowed without raising an error.

Another set of options specifies how variables that occur in both datasets but are not key variables are updated: e.g. always from the master, always from the other dataset, or the master gets updated only if the current value is missing.

A merge can also be done purely on observation number (no key variables).

Additional options control how value labels (how factors are represented as readable strings) and variable notes are merged.

> -----Original Message-----
> From: wdunlap at tibco.com
> Sent: Thu, 7 Feb 2013 18:18:26 +0000
> To: johsmi9933 at inbox.com, dwinsemius at comcast.net
> Subject: RE: [R] appending and merging data frames
>
>> For merge, when merging dataset A with dataset B on some
>> set of key variables, it is possible to specify if the
>> join should be done 1:1, 1:n, n:1 or n:m. Stata creates
>> a data structure that contains information about which
>> observations could or could not be matched and which
>> matches did not fit the 1:1/1:n or n:1 join pattern.
>
> I don't know Stata and am curious about the above.
> If you ask for a 1:n join but your key column[s] in your
> first input have duplicates, what does Stata do? Does it
> (a) use the first of the duplicates to produce an answer
> and also return an object describing the problem
> (b) refuse to do the merge and return an object describing
> the problem
> or something else? How is the data structure containing
> information about problems in the merge connected to
> the output of merge?
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>> On Behalf Of John Smith
>> Sent: Thursday, February 07, 2013 9:59 AM
>> To: David Winsemius
>> Cc: r-help at r-project.org
>> Subject: Re: [R] appending and merging data frames
>>
>> I gave a small number of examples in my first email.
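Two of the Stata merge features described above, the uniqueness check on the "1" side and the match indicator column, can be approximated on top of base `merge()`. This is a minimal sketch under stated assumptions: `merge_checked`, its `type` argument and the `.merge` column name are hypothetical names chosen for this illustration, both data frames are held in memory (unlike Stata's master/on-disk setup), and the update, keep and label options are not covered.

```r
# Hypothetical helper (not from any package): outer merge with a
# Stata-like key uniqueness check and a match indicator column.
merge_checked <- function(x, y, by, type = c("1:1", "1:n", "n:1", "n:m")) {
  type <- match.arg(type)
  # anyDuplicated() returns 0 when the key columns identify rows uniquely
  dup_x <- anyDuplicated(x[by]) > 0
  dup_y <- anyDuplicated(y[by]) > 0
  if (type %in% c("1:1", "1:n") && dup_x)
    stop("key does not uniquely identify rows in x")
  if (type %in% c("1:1", "n:1") && dup_y)
    stop("key does not uniquely identify rows in y")
  # Marker columns record which side each output row came from
  x$.in_x <- TRUE
  y$.in_y <- TRUE
  out <- merge(x, y, by = by, all = TRUE)
  in_x <- !is.na(out$.in_x)
  in_y <- !is.na(out$.in_y)
  out$.merge <- ifelse(in_x & in_y, "both", ifelse(in_x, "x_only", "y_only"))
  out$.in_x <- NULL
  out$.in_y <- NULL
  out
}

# Toy data, made up for illustration: id is unique in master, not in using
master <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
using  <- data.frame(id = c(2, 3, 3, 4), y = c("p", "q", "r", "s"))

res <- merge_checked(master, using, by = "id", type = "1:n")
table(res$.merge)

# Asking for 1:1 refuses the merge, corresponding to option (b) above
err <- tryCatch(merge_checked(master, using, by = "id", type = "1:1"),
                error = function(e) conditionMessage(e))
err
```

The check runs before the join, so a violated 1:1, 1:n or n:1 assumption stops with an error rather than silently producing all pairings.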