Hey all,

We ran into some problems last night with the repos up at github. We initially thought it was a git-submodules bug. We learned that it was something else, but in the process of trying to find the source of the bug, we learned a few things about git-submodules and have decided that it creates an unfortunate set of complications and dependencies for a development effort like ours.

To that end, we have simplified. No more submodules. To update, do the following:

  cd rspec-dev
  git pull
  rake git:update

See http://github.com/dchelimsky/rspec-dev/wikis/contributingpatches for more info.

Cheers,
David
Hi David,

Is there any chance you could elaborate on the problems please, so that other projects can make an informed decision whether to use submodules or not?

Cheers,
Jon

On Wed, 2008-04-16 at 22:39 -0400, David Chelimsky wrote:
> To that end, we have simplified. No more submodules.
> _______________________________________________
> rspec-users mailing list
> rspec-users at rubyforge.org
> http://rubyforge.org/mailman/listinfo/rspec-users

-- 
Jonathan Leighton
http://jonathanleighton.com/
On Apr 17, 2008, at 5:20 AM, Jonathan Leighton <j at jonathanleighton.com> wrote:
> Hi David,
>
> Is there any chance you could elaborate on the problems please, so
> that other projects can make an informed decision whether to use
> submodules or not?

There is always a chance :) I'll be at a computer slightly bigger than my phone in a little while and will follow up.
On Apr 17, 2008, at 7:11 AM, David Chelimsky wrote:
> On Apr 17, 2008, at 5:20 AM, Jonathan Leighton
> <j at jonathanleighton.com> wrote:
>
>> Is there any chance you could elaborate on the problems please, so
>> that other projects can make an informed decision whether to use
>> submodules or not?
>
> There is always a chance :)

What I learned was that submodules are great for things like consuming plugins in your Rails projects, but not so great for a development effort in which multiple people are pushing and pulling to multiple repositories with dependencies and no multi-project transaction support.

The parent repository depends on specific versions of the subs. As you're making changes in your local repos, the last thing you do is commit the parent with the references to the new versions of the subs. Every time you make a change to any of the subs you have to commit a change to the parent. These changes are useful as documentation if you're updating a plugin to the latest release. They are not that useful if the log is polluted with one of these for every commit to every submodule.

When you pull, you pull the parent first, and then use git-submodule to pull the correct versions of the subs. The parent is in control of the situation and it somewhat guarantees that you're getting all the right stuff. This is GREAT for consumers, but problematic for contributors. And even then, if consumers are pulling from a development branch while developers are pushing to it, then consumers might run into problems.

Let's say you're doing a pull while I'm doing a push. If I push the parent first, and my push lands before your pull, there is a chance that when you go to pull the subs those versions of the subs might not be there yet. Conversely, if I push the subs first and you grab an old parent, you'll be pulling old subs. No problem when you're pulling, but it's going to create problems when you go to push because you're that much further back in the history.

Of course, these problems exist even when you're dealing with a single repository on a team that believes in frequent commits, continuous integration, etc. And just the fact that we have several repos with dependencies means that we're going to run into conflicts now and then. It just seems that the explicit references from parent to children add a layer of complexity to this for both consumers and developers.

This all make sense?
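To make the push-ordering problem above a bit more concrete, here is a minimal sketch in Python. It is a toy model, not how git stores anything: the parent is assumed to be a plain mapping from submodule name to pinned commit id, and the ids (a1, a2, b1) and the helper name missing_pins are made up for illustration.

```python
# Toy model (not git itself): the parent repo pins one commit id per submodule.
from typing import Dict, List, Set


def missing_pins(parent_pins: Dict[str, str],
                 published: Dict[str, Set[str]]) -> List[str]:
    """Return the submodules whose pinned commit is not on the server yet."""
    return sorted(
        name for name, commit in parent_pins.items()
        if commit not in published.get(name, set())
    )


# Server state before anyone pushes (hypothetical commit ids).
published = {"rspec": {"a1"}, "rspec-rails": {"b1"}}
old_parent = {"rspec": "a1", "rspec-rails": "b1"}
new_parent = {"rspec": "a2", "rspec-rails": "b1"}  # a2 exists only locally so far

# Parent pushed first: someone pulling now is told to fetch a2, which is absent.
parent_first = missing_pins(new_parent, published)

# Subs pushed first: the old parent still resolves, but it pins the older a1.
published["rspec"].add("a2")
subs_first = missing_pins(old_parent, published)

# Once both pushes have landed, the new parent resolves cleanly.
after_both = missing_pins(new_parent, published)

print(parent_first, subs_first, after_both)
```

The window between the two pushes is the whole problem: with no transaction spanning both repositories, one of the two orderings is always briefly visible to someone pulling.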
Great information, David. Sounds like a useful blog post!

On Thu, Apr 17, 2008 at 7:49 AM, David Chelimsky <dchelimsky at gmail.com> wrote:
> This all make sense?

-- 
Bryan Ray
http://www.bryanray.net

"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning."
On Thu, 2008-04-17 at 08:49 -0400, David Chelimsky wrote:
> This all make sense?

Ok, I have to confess I haven't been paying that much attention to the way things are or were set out on github, so let me see if I'm fully understanding what you're saying...

Was "rspec" previously split up into several repositories, with a "parent" repository which contained the other repositories as submodules? So you are essentially saying that it is a bad idea to split one single project into a number of pieces and manage that project through submodules? However, you do consider submodules to be a good idea if you are using and wish to track third-party upstream code, for example plugins in a Rails project?

Cheers

-- 
Jonathan Leighton
http://jonathanleighton.com/
On Thu, Apr 17, 2008 at 9:01 AM, Jonathan Leighton <j at jonathanleighton.com> wrote:
> Was "rspec" previously split up into several repositories, with a
> "parent" repository which contained the other repositories as
> submodules? So you are essentially saying that it is a bad idea to split
> one single project into a number of pieces and manage that project
> through submodules? However, you do consider submodules to be a good
> idea if you are using and wish to track third-party upstream code, for
> example plugins in a Rails project?

RSpec was split into four repos...and it still is, actually. But originally the rspec-dev project was a superproject that included the other three as submodules.

The problem with submodules arises when two people are making changes to the submodules at the same time.

Let's say I work on the rspec submodule, and my final commit is abc123. You work on the rspec submodule as well and your final commit is def456. The superproject tracks the head of each submodule, meaning we each need to commit a reference to the head of rspec. At some point you pull from me...and the incoming commits say that the head is abc123, but you say def456. Merge conflict. Not a big deal, since you have all the latest code, so you can safely point it at def456. But it's a bit of a hassle because you have to do that every single time. I don't actually know what all the potential problems are, but beyond just the hassle, it seems very easy for someone to make a mistake, causing a lot of headaches.

We still have stuff split up, but we realized there's no reason for the rspec-dev repo to track the others as submodules. We wrote a rake task to check out all the other repos beneath the rspec-dev dir. It's basically the exact same setup, but without the submodule tracking. And it avoids any problems with submodules, because it's all just standard git push/pull/merge stuff.

Pat
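The abc123/def456 conflict can be sketched as a three-way merge over a single pinned commit id. This is a simplified Python model under stated assumptions, not git's merge code: merge_pointer and the base id 000aaa are hypothetical, and it ignores any smarter handling a given git version may apply to submodule entries.

```python
# Toy model: the superproject records exactly one commit id for the submodule,
# so a merge has to pick one id or give up.
from typing import Optional


def merge_pointer(base: str, ours: str, theirs: str) -> Optional[str]:
    """Three-way merge of a single pinned commit id; None means conflict."""
    if ours == theirs:
        return ours
    if ours == base:
        return theirs  # only the other side moved the pin
    if theirs == base:
        return ours    # only our side moved the pin
    return None        # both sides moved the pin to different commits


# Only one person committed to the submodule: the pin merges cleanly.
only_pat = merge_pointer("000aaa", "abc123", "000aaa")

# Both people committed: the pin conflicts even if the code itself merges fine.
both = merge_pointer("000aaa", "abc123", "def456")

print(only_pat, both)
```

The second case is the hassle described above: the conflict is over the reference, not over any file contents, so it shows up every time both people have touched the same submodule.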
With mercurial I nearly did a similar thing, working on my own but committing from two different machines. Luckily mercurial gave me a warning that allowed me to make sense of what I was doing. Not sure how this works with git but here goes.

1. I push from laptop1 to my central server. All is good.
2. I push from laptop2 to my central server. Mercurial doesn't allow this and warns me that the remote repo will have two heads (which is allowed but probably not what I want). I can override this with --force.
3. Oh silly sod - of course I committed from the other machine.
4. I pull from the central server, merge locally and commit, creating a new single head representing the merge.
5. I then push the result, meaning there is only ever a single head/tip/edge/whatever in the repository.
6. I realise that this is what I always do with subversion anyway - update, merge, [run tests], commit.

It seems git doesn't protect you from yourself like hg does - which is understandable, it's designed for and used by scarier people!

Could a pull-merge-commit before pushing have avoided this, and should we make that our endorsed way of working? Or am I missing something else about how dscm works?

Cheers,
Dan

On 18/04/2008, Pat Maddox <pergesu at gmail.com> wrote:
> And it avoids any problems with submodules, because it's all just
> standard git push/pull/merge stuff.
On Fri, Apr 18, 2008 at 12:43 AM, Dan North <tastapod at gmail.com> wrote:
> Could a pull-merge-commit before pushing have avoided this, and should we
> make that our endorsed way of working? Or am I missing something else about
> how dscm works?

I'm still fuzzy on the details of exactly what happened. I believe it was the result of a "commit -f" which forced the remote repository to rewrite the history when there were branched histories that needed resolving.

I believe that pull-merge-commit would work fine. I experimented locally to understand the effects of handling submodule reference merge conflicts. As I mentioned before, it is just a bit of a hassle to have to do. David also pointed out that even without the conflicts, you still have to commit the reference, leading to lots of "updated rspec-rails" type commits in rspec-dev.

pull-merge-commit is probably a good workflow (and indeed the only one, because otherwise it's push-REJECTED-pull-merge-commit). The main advantage of not using submodules is that the only time you'll have to merge by hand is when git can't intelligently merge the repos, rather than every time two repositories have different HEADs.

Pat
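The push-REJECTED-pull-merge-commit cycle comes down to one check: the server's current tip has to be an ancestor of what you are pushing, unless you force it. A rough Python sketch of that rule follows. The commit names A, B, C, M and the functions is_ancestor and push_allowed are invented for illustration, and this models the idea only, not git's or mercurial's actual implementation.

```python
# Toy model of the "would this push be a fast-forward?" check.
from typing import Dict, List

Parents = Dict[str, List[str]]  # commit id -> ids of its parent commits


def is_ancestor(graph: Parents, old: str, new: str) -> bool:
    """True if `old` is reachable from `new` by following parent links."""
    stack, seen = [new], set()
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return False


def push_allowed(graph: Parents, remote_tip: str, local_tip: str,
                 force: bool = False) -> bool:
    """A push is accepted if it only adds history on top of the remote tip."""
    return force or is_ancestor(graph, remote_tip, local_tip)


# B was pushed from laptop1; C was committed on laptop2 on top of the same A.
graph = {"A": [], "B": ["A"], "C": ["A"]}
rejected = push_allowed(graph, remote_tip="B", local_tip="C")

# Pull and merge locally: M has both C and B as parents, so B is now an ancestor.
graph["M"] = ["C", "B"]
accepted = push_allowed(graph, remote_tip="B", local_tip="M")

# Forcing instead skips the check, and B drops out of the published history.
forced = push_allowed(graph, remote_tip="B", local_tip="C", force=True)

print(rejected, accepted, forced)
```

Merging first is what turns the rejected push into an accepted one; forcing gets the same "accepted" answer while leaving the other person's commit behind.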
On Apr 18, 2008, at 3:43 AM, Dan North wrote:
> With mercurial I nearly did a similar thing, working on my own but
> committing from two different machines. Luckily mercurial gave me a
> warning that allowed me to make sense of what I was doing. Not sure
> how this works with git but here goes.
>
> 1. I push from laptop1 to my central server. All is good.
> 2. I push from laptop2 to my central server. Mercurial doesn't allow
> this and warns me that the remote repo will have two heads (which is
> allowed but probably not what I want). I can override this with
> --force.
> 3. Oh silly sod - of course I committed from the other machine.

This is actually what happened. Two people were doing work at the same time and one got the warning from the repo and did a "push --force." We've all learned a lesson from this and it won't happen again.

In my opinion, even if you are allowed to force a push, the repo should maintain some reachable history somewhere of the commits that you are "hiding." So the public "view" removes those commits but they can be retrieved.

> 4. I pull from the central server, merge locally and commit,
> creating a new single head representing the merge
> 5. I then push the result, meaning there is only ever a single
> head/tip/edge/whatever in the repository.
> 6. I realise that this is what I always do with subversion anyway -
> update, merge, [run tests], commit.
>
> It seems git doesn't protect you from yourself like hg does - which
> is understandable, it's designed for and used by scarier people!

It actually does in much the same way. You get a warning, but you can still force the push.

> Could a pull-merge-commit before pushing have avoided this, and
> should we make that our endorsed way of working? Or am I missing
> something else about how dscm works?

I do think this should be the way we do things. We have some rake tasks that manage these bits one step at a time. I'll add one that combines them. You'll still be able to do them one at a time, and you'll still need to pull/merge again if the central repo warns you on commit.

Cheers,
David
On 18/4/2008, at 14:16, David Chelimsky <dchelimsky at gmail.com> wrote:
> This is actually what happened. Two people were doing work at the same
> time and one got the warning from the repo and did a "push --force."
> We've all learned a lesson from this and it won't happen again.
>
> In my opinion, even if you are allowed to force a push, the repo
> should maintain some reachable history somewhere of the commits that
> you are "hiding." So the public "view" removes those commits but they
> can be retrieved.

This is a potentially painful lesson that I think we all learn once we start using Git. If you're lucky, you learn it on an unimportant repo or one which you fully control and can dig around in.

By way of counter-opinion, the ability to force a push like that may be considered a feature. Example case: you accidentally include confidential company files when you push to a remote repo. If you fix the mistake by altering the history in your local repo and then try pushing again, Git will correctly warn you that this won't be a fast-forward merge, but the ability to force the push anyway allows you to "remove" the unwanted history from the remote repo.

Note that the history should still be there, it's just that it won't be "reachable" when someone clones the public repo. Does GitHub provide shell access so that you can get into the remote repo and run stuff like "git fsck" on it? The missing commits should still be in the object database, unless the GitHub crew are doing overly aggressive automatic repacking and pruning on the repos (or unless there is something about the way bare repos work which I don't know; quite possible!).

If GitHub doesn't provide shell access, then it's really like pushing into a black-hole drop box, with no way to do any "archaeology" on the object database.

> I do think this should be the way we do things. We have some rake
> tasks that manage these bits one step at a time. I'll add one that
> combines them. You'll still be able to do them one at a time, and
> you'll still need to pull/merge again if central repo warns you on
> commit.

If you get into the habit of rebasing before committing rather than merging, you'll get a much nicer history. It will basically look linear:

  A--B--C--D--E--F--G--etc

Rather than full of crisscrosses from multiple small merges:

                E'--F'
               /      \
  A--B--C--D--E--F--G--H--I--J
      \     /    \           /
       B'--C'     E''--F''--G'--H'

Sure, Git can easily do the merges, but the resulting history is harder to analyse and less "bisectable" (with "git bisect").

Cheers,
Wincent
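The "still there, just not reachable" point can be pictured with a tiny Python model. It assumes an object store that only ever grows and a single branch ref named master; the commit names A, B, C and the reachable helper are made up, and nothing here is git's storage format or GitHub's behaviour.

```python
# Toy model: a forced push moves a ref but does not delete any stored objects.
from typing import Dict, List, Set


def reachable(objects: Dict[str, List[str]], tips: List[str]) -> Set[str]:
    """Collect every commit that can be reached from the given ref tips."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(objects[commit])
    return seen


# Object store on the server: commit id -> parent ids (hypothetical history).
objects = {"A": [], "B": ["A"], "C": ["A"]}
refs = {"master": "B"}

# Someone forces a push of C, which was not built on top of B.
refs["master"] = "C"

visible = reachable(objects, list(refs.values()))
dangling = set(objects) - visible  # stored, but no ref leads to it any more

print(sorted(visible), sorted(dangling))
```

In this model B is exactly the kind of object one would hope to dig out with something like "git fsck", as long as nothing has pruned the store in the meantime.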