diff mbox series

[RFC] Rough draft document on merging and rebasing

Message ID 20190530135317.3c8d0d7b@lwn.net
State New
Headers show
Series [RFC] Rough draft document on merging and rebasing | expand

Commit Message

Jonathan Corbet May 30, 2019, 7:53 p.m. UTC
This is a first attempt at following through on last month's discussion
about common merging and rebasing errors.  The hope here is to document
existing best practices rather than trying to define new ones.  I've
certainly failed somewhere along the way; please set me straight and I'll
try to do better next time.

Thanks,

jon

-------------
docs: Add a document on repository management

Every merge window seems to involve at least one episode where subsystem
maintainers don't manage their trees as Linus would like.  Document the
expectations so that at least he has something to point people to.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>

---
 Documentation/maintainer/index.rst        |   1 +
 Documentation/maintainer/repo-hygiene.rst | 195 ++++++++++++++++++++++
 2 files changed, 196 insertions(+)
 create mode 100644 Documentation/maintainer/repo-hygiene.rst

-- 
2.21.0

Comments

Randy Dunlap May 31, 2019, 12:45 a.m. UTC | #1
On 5/30/19 12:53 PM, Jonathan Corbet wrote:
> +  git merge v5.2-rc1^0


That line is presented in my email client (Thunderbird) as

     git merge v5.2-rc1{superscript 0}

Could you escape/quote it to prevent that?

> +

> +The "^0" will cause Git to do a fast-forward merge (which should be

> +possible in this situation), thus avoiding the addition of a spurious merge

> +commit.



-- 
~Randy
Linus Torvalds May 31, 2019, 7:49 p.m. UTC | #2
On Thu, May 30, 2019 at 12:53 PM Jonathan Corbet <corbet@lwn.net> wrote:
>

> This is a first attempt at following through on last month's discussion

> about common merging and rebasing errors.


Looks good to me,

                Linus
David Rientjes May 31, 2019, 9:14 p.m. UTC | #3
On Thu, 30 May 2019, Jonathan Corbet wrote:

> docs: Add a document on repository management

> 

> Every merge window seems to involve at least one episode where subsystem

> maintainers don't manage their trees as Linus would like.  Document the

> expectations so that at least he has something to point people to.

> 

> Signed-off-by: Jonathan Corbet <corbet@lwn.net>


We're in the process of defining the pros and cons of merging vs rebasing 
as part of our workflows internally and this is a great doc that does that 
with less words and more clarity than I did it, so very happy to add

Acked-by: David Rientjes <rientjes@google.com>
Jonathan Corbet May 31, 2019, 11:36 p.m. UTC | #4
On Thu, 30 May 2019 17:45:23 -0700
Randy Dunlap <rdunlap@infradead.org> wrote:

> On 5/30/19 12:53 PM, Jonathan Corbet wrote:

> > +  git merge v5.2-rc1^0  

> 

> That line is presented in my email client (Thunderbird) as

> 

>      git merge v5.2-rc1{superscript 0}

> 

> Could you escape/quote it to prevent that?


So I'm a wee bit confused.  That's a literal string that one needs to
type to obtain the needed effect; if thunderbird is doing weird things
with it, I think that the problem does not lie with the document...?  What
change would you have me make here?

Thanks,

jon
Randy Dunlap May 31, 2019, 11:52 p.m. UTC | #5
On 5/31/19 4:36 PM, Jonathan Corbet wrote:
> On Thu, 30 May 2019 17:45:23 -0700

> Randy Dunlap <rdunlap@infradead.org> wrote:

> 

>> On 5/30/19 12:53 PM, Jonathan Corbet wrote:

>>> +  git merge v5.2-rc1^0  

>>

>> That line is presented in my email client (Thunderbird) as

>>

>>      git merge v5.2-rc1{superscript 0}

>>

>> Could you escape/quote it to prevent that?

> 

> So I'm a wee bit confused.  That's a literal string that one needs to

> type to obtain the needed effect; if thunderbird is doing weird things

> with it, I think that the problem does not lie with the document...?  What

> change would you have me make here?


I dunno.  I just won't depend on my email client.
I'll read it some other way (like a text editor).

cheers.
-- 
~Randy
Theodore Ts'o June 1, 2019, 3:42 p.m. UTC | #6
On Thu, May 30, 2019 at 01:53:17PM -0600, Jonathan Corbet wrote:
> +Rebasing

> +========

> +

> +"Rebasing" is the process of changing the history of a series of commits

> +within a repository.  At its simplest, a rebase could change the starting

> +point of a patch series from one point to another.  Other uses include

> +fixing (or deleting) broken commits, adding tags to commits, or changing

> +the order in which commits are applied.  Used properly, rebasing can yield

> +a cleaner and clearer development history; used improperly, it can obscure

> +that history and introduce bugs.


Rebasing is being used in two senses here.  The first is where the
diffs don't change (much), but the starting point of the changes is
being changed.  The second is where a broken commit is dropped, adding
a signed-off-by, etc.

Both have the property that they can used for good or ill, but the
details when this is an issue can change a bit.  More below...

> +There are a few rules of thumb that can help developers to avoid the worst

> +perils of rebasing:

> +

> + - History that has been exposed to the world beyond your private system

> +   should not be rebased.  Others may have pulled a copy of your tree and

> +   built on it; rebasing your tree will create pain for them.  If work is

> +   in need of rebasing, that is usually a sign that it is not yet ready to

> +   be committed to a public repository.


That seems to be a bit too categorical.  It's been recommended, and
some people do, push patches to a branch so the zero-day bot will pick
it up and test for potential problems.  And broken commits *do* get
dropped from candidate stable kernel releases.  And, of course,
there's linux-next, which is constantly getting rebased.

And there have been people who have pushed out RFC patche series onto
a git branch, and publicized it on LKML for the convenience of
reviewers.  (Perhaps because there is a patch which is so big it
exceeds the LKML size restrictions.)

I think it's more about whether people know that a branch is
considered unstable from a historical perspective or not.  No one
builds on top of linux-next because, well, that would be silly.

Finally, I'm bit concerned about anything which states absolutes,
because there are people who tend to be real stickler for the rules,
and if they see something stated in absolute terms, they fail to
understand that there are exceptions that are well understood, and in
use for years before the existence of the document which is trying to
codify best practices.

> + - Realize the rebasing a patch series changes the environment in which it

> +   was developed and, likely, invalidates much of the testing that was

> +   done.  A rebased patch series should, as a general rule, be treated like

> +   new code and retested from the beginning.


In this paragraph, "rebasing" is being used in the change the base
commit on top of which a series of changes were based upon.  And it's
what most people think of when they see the word "rebase".

However "git rebase" is also used to rewrite git history when dropping
a bad commit, or fixing things so the branch is bisectable, etc.  But
in the definitions above, the word "rebase" was defined to include
both "changing the base of a patch stack", and "rewriting git
history".  I wonder if that helps more than it hinders understanding.

At least in my mind, "rebasing" is much more often the wrong thing,
where as "rewriting git history" for a leaf repository can be more
often justifable.  And of course, if someone is working on a feature
for which the review and development cycle takes several kernel
releases, I'd claim that "rebasing" in that case always makes sense,
even if they *have* exposed their patch series via git for review /
commenting purposes.

Regards,

						- Ted
Geert Uytterhoeven June 4, 2019, 10:29 a.m. UTC | #7
Hi Jon,

On Thu, May 30, 2019 at 9:54 PM Jonathan Corbet <corbet@lwn.net> wrote:
> This is a first attempt at following through on last month's discussion

> about common merging and rebasing errors.  The hope here is to document

> existing best practices rather than trying to define new ones.  I've

> certainly failed somewhere along the way; please set me straight and I'll

> try to do better next time.

>

> Thanks,

>

> jon

>

> -------------

> docs: Add a document on repository management

>

> Every merge window seems to involve at least one episode where subsystem

> maintainers don't manage their trees as Linus would like.  Document the

> expectations so that at least he has something to point people to.

>

> Signed-off-by: Jonathan Corbet <corbet@lwn.net>


Thanks!

> --- /dev/null

> +++ b/Documentation/maintainer/repo-hygiene.rst


> +One thing to be aware of in general is that, unlike many other projects,

> +the kernel community is not scared by seeing merge commits in its

> +development history.  Indeed, given the scale of the project, avoiding

> +merges would be nearly impossible.  Some problems encountered by

> +maintainers results from a desire to avoid merges, while others come from


result

> +merging a little too often.


[...]

> + - Realize the rebasing a patch series changes the environment in which it


Realize that

> +   was developed and, likely, invalidates much of the testing that was

> +   done.  A rebased patch series should, as a general rule, be treated like

> +   new code and retested from the beginning.


> +Finally

> +=======

> +

> +It is relatively common to merge with the mainline toward the beginning of

> +the development cycle in order to pick up changes and fixes done elsewhere

> +in the tree.  As always, such a merge should pick a well-known release

> +point rather than some random spot.  If your upstream-bound branch has

> +emptied entirely into the mainline during the merge window, you can pull it

> +forward with a command like::

> +

> +  git merge v5.2-rc1^0

> +

> +The "^0" will cause Git to do a fast-forward merge (which should be

> +possible in this situation), thus avoiding the addition of a spurious merge

> +commit.


I usually use

     git rebase v5.2-rc1 <mybranch>

_after_ verifying everything has been merged, i.e.

    git cherry -v v5.2-rc1 <mybranch>

did not give any output.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Jonathan Corbet June 4, 2019, 7:08 p.m. UTC | #8
On Sat, 1 Jun 2019 11:42:48 -0400
"Theodore Ts'o" <tytso@mit.edu> wrote:

> Finally, I'm bit concerned about anything which states absolutes,

> because there are people who tend to be real stickler for the rules,

> and if they see something stated in absolute terms, they fail to

> understand that there are exceptions that are well understood, and in

> use for years before the existence of the document which is trying to

> codify best practices.


Hence the "there are exceptions" text at the bottom of the document :)

Anyway, I'll rework it to try to take your comments into account.  Maybe
we should consistently say "rebasing" for changing the parent commit of a
patch set, and "history modification" for the other tricks...?

Thanks for taking a look,

jon
Geert Uytterhoeven June 4, 2019, 7:32 p.m. UTC | #9
Hi Jon,

On Tue, Jun 4, 2019 at 9:09 PM Jonathan Corbet <corbet@lwn.net> wrote:
> On Sat, 1 Jun 2019 11:42:48 -0400

> "Theodore Ts'o" <tytso@mit.edu> wrote:

> > Finally, I'm bit concerned about anything which states absolutes,

> > because there are people who tend to be real stickler for the rules,

> > and if they see something stated in absolute terms, they fail to

> > understand that there are exceptions that are well understood, and in

> > use for years before the existence of the document which is trying to

> > codify best practices.

>

> Hence the "there are exceptions" text at the bottom of the document :)

>

> Anyway, I'll rework it to try to take your comments into account.  Maybe

> we should consistently say "rebasing" for changing the parent commit of a

> patch set, and "history modification" for the other tricks...?


Or just "reworking a branch" for the other tricks?

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Theodore Ts'o June 4, 2019, 8:43 p.m. UTC | #10
On Tue, Jun 04, 2019 at 01:08:37PM -0600, Jonathan Corbet wrote:
> On Sat, 1 Jun 2019 11:42:48 -0400

> "Theodore Ts'o" <tytso@mit.edu> wrote:

> 

> > Finally, I'm bit concerned about anything which states absolutes,

> > because there are people who tend to be real stickler for the rules,

> > and if they see something stated in absolute terms, they fail to

> > understand that there are exceptions that are well understood, and in

> > use for years before the existence of the document which is trying to

> > codify best practices.

> 

> Hence the "there are exceptions" text at the bottom of the document :)

> 

> Anyway, I'll rework it to try to take your comments into account.  Maybe

> we should consistently say "rebasing" for changing the parent commit of a

> patch set, and "history modification" for the other tricks...?


Those names sound good to me.   Thanks!!

							- Ted
diff mbox series

Patch

diff --git a/Documentation/maintainer/index.rst b/Documentation/maintainer/index.rst
index 2a14916930cb..48c1d56253d8 100644
--- a/Documentation/maintainer/index.rst
+++ b/Documentation/maintainer/index.rst
@@ -10,5 +10,6 @@  additions to this manual.
    :maxdepth: 2
 
    configure-git
+   repo-hygiene
    pull-requests
 
diff --git a/Documentation/maintainer/repo-hygiene.rst b/Documentation/maintainer/repo-hygiene.rst
new file mode 100644
index 000000000000..0a738ea51d65
--- /dev/null
+++ b/Documentation/maintainer/repo-hygiene.rst
@@ -0,0 +1,195 @@ 
+.. SPDX-License-Identifier: GPL-2.0
+
+========================================
+Repository hygiene: rebasing and merging
+========================================
+
+Maintaining a subsystem, as a general rule, requires a familiarity with the
+Git source-code management system.  Git is a powerful tool with a lot of
+features; as is often the case with such tools, there are right and wrong
+ways to use those features.  This document looks in particular at the use
+of rebasing and merging.  Maintainers often get in trouble when they use
+those tools incorrectly, but avoiding problems is not actually all that
+hard.
+
+One thing to be aware of in general is that, unlike many other projects,
+the kernel community is not scared by seeing merge commits in its
+development history.  Indeed, given the scale of the project, avoiding
+merges would be nearly impossible.  Some problems encountered by
+maintainers results from a desire to avoid merges, while others come from
+merging a little too often.
+
+Rebasing
+========
+
+"Rebasing" is the process of changing the history of a series of commits
+within a repository.  At its simplest, a rebase could change the starting
+point of a patch series from one point to another.  Other uses include
+fixing (or deleting) broken commits, adding tags to commits, or changing
+the order in which commits are applied.  Used properly, rebasing can yield
+a cleaner and clearer development history; used improperly, it can obscure
+that history and introduce bugs.
+
+There are a few rules of thumb that can help developers to avoid the worst
+perils of rebasing:
+
+ - History that has been exposed to the world beyond your private system
+   should not be rebased.  Others may have pulled a copy of your tree and
+   built on it; rebasing your tree will create pain for them.  If work is
+   in need of rebasing, that is usually a sign that it is not yet ready to
+   be committed to a public repository.
+
+ - Do not rebase a branch that contains history created by others.  If you
+   have pulled changes from another developer's repository, you are now a
+   custodian of their history.  You should not change it.
+
+ - Do not rebase without a good reason to do so.  Just being on a newer
+   base or avoiding a merge with an upstream repository is not generally a
+   good reason.
+
+ - If you must rebase a repository, do not pick some random kernel commit
+   as the new base.  The kernel is often in a relatively unstable state
+   between release points; basing development on one of those points
+   increases the chances of running into surprising bugs.  When a patch
+   series must move to a new base, pick a stable point (such as one of
+   the -rc releases) to move to.
+
+ - Realize the rebasing a patch series changes the environment in which it
+   was developed and, likely, invalidates much of the testing that was
+   done.  A rebased patch series should, as a general rule, be treated like
+   new code and retested from the beginning.
+
+A frequent cause of merge-window trouble is when Linus is presented with a
+patch series that has clearly been rebased, often to a random commit,
+shortly before the pull request was sent.  The chances of such a series
+having been adequately tested are relatively low - as are the chances of
+the pull request being acted upon.
+
+If, instead, rebasing is limited to private trees, commits are based on a
+well-known starting point, and they are well tested, the potential for
+trouble is low.
+
+Merging
+=======
+
+Merging is a common operation in the kernel development process; the 5.1
+development cycle included 1,126 merge commits - nearly 9% of the total.
+Kernel work is accumulated in over 100 different subsystem trees, each of
+which may contain multiple topic branches; each branch is usually developed
+independently of the others.  So naturally, at least merge will be required
+before any given branch finds its way into an upstream repository.
+
+Many projects require that branches in pull requests be based on the
+current trunk so that no merge commits appear in the history.  The kernel
+is not such a project; any rebasing of branches to avoid merges will, as
+described above, lead to certain trouble.
+
+Subsystem maintainers find themselves having to do two types of merges:
+from lower-level subsystem trees and from others, either sibling trees or
+the mainline.  The best practices to follow differ in those two situations.
+
+Merging from lower-level trees
+------------------------------
+
+Larger subsystems tend to have multiple levels of maintainers, with the
+lower-level maintainers sending pull requests to the higher levels.  Acting
+on such a pull request will almost certainly generate a merge commit; that
+is as it should be.  In fact, subsystem maintainers may want to use
+the --no-ff flag to force the addition of a merge commit in the rare cases
+where one would not normally be created so that the reasons for the merge
+can be recorded.  The changelog for the merge should, for any kind of
+merge, say *why* the merge is being done.  For a lower-level tree, "why" is
+usually a summary of the changes that will come with that pull.
+
+Maintainers at all levels should be using signed tags on their pull
+requests, and upstream maintainers should verify the tags when pulling
+branches.  Failure to do so threatens the security of the development
+process as a whole.
+
+As per the rules outlined above, once you have merged somebody else's
+history into your tree, you cannot rebase that branch, even if you
+otherwise would be able to.
+
+Merging from sibling or upstream trees
+--------------------------------------
+
+While merges from downstream are common and unremarkable, merges from other
+trees tend to be a red flag when it comes time to push a branch upstream.
+Such merges need to be carefully thought about and well justified, or
+there's a good chance that a subsequent pull request will be rejected.
+
+It is natural to want to merge the master branch into a repository; it can
+help to make sure that there are no conflicts with parallel development and
+generally gives a warm, fuzzy feeling of being up-to-date.  But this
+temptation should be avoided almost all of the time.
+
+Why is that?  Merges with upstream will muddy the development history of
+your own branch.  They will significantly increase your chances of
+encountering bugs from elsewhere in the community and make it hard to
+ensure that the work you are managing is stable and ready for upstream.
+Frequent merges can also obscure problems with the development process in
+your tree; they can hide interactions with other trees that should not be
+happening (often) in a well-managed branch.
+
+One of the most frequent causes of merge-related trouble is when a
+maintainer merges with the upstream in order to resolve merge conflicts
+before sending a pull request.  Again, this temptation is easy enough to
+understand, but it should absolutely be avoided.  This is especially true
+for the final pull request: Linus is adamant that he would much rather see
+merge conflicts than unnecessary back merges.  Seeing the conflicts lets
+him know where potential problem areas are.  He does a lot of merges (382
+in the 5.1 development cycle) and has gotten quite good at conflict
+resolution - often better than the developers involved.
+
+So what should a maintainer do when there is a conflict between their
+subsystem branch and the mainline?  The most important step is to warn
+Linus in the pull request that the conflict will happen; if nothing else,
+that demonstrates an awareness of how your branch fits into the whole.  For
+especially difficult conflicts, create and push a *separate* branch to show
+how you would resolve things.  Mention that branch in your pull request,
+but the pull request itself should be for the unmerged branch.
+
+Even in the absence of known conflicts, doing a test merge before sending a
+pull request is a good idea.  It may alert you to problems that you somehow
+didn't see from linux-next and helps to understand exactly what you are
+asking upstream to do.
+
+Another reason for doing merges of upstream or another subsystem tree is to
+resolve dependencies.  These dependency issues do happen at times, and
+sometimes a cross-merge with another tree is the best way to resolve them;
+as always, in such situations, the merge commit should explain why the
+merge has been done.  Take a moment to do it right; people will read those
+changelogs.
+
+Often, though, dependency issues indicate that a change of approach is
+needed.  Merging another subsystem tree to resolve a dependency risks
+bringing in other bugs.  If that subsystem tree fails to be pulled
+upstream, whatever problems it had will block the merging of your tree as
+well.  Possible alternatives include agreeing with the maintainer to carry
+both sets of changes in one of the trees or creating a special branch
+dedicated to the dependent commits.  If the dependency is related to major
+infrastructural changes, the right solution might be to hold the dependent
+commits for one development cycle so that those changes have time to
+stabilize in the mainline.
+
+Finally
+=======
+
+It is relatively common to merge with the mainline toward the beginning of
+the development cycle in order to pick up changes and fixes done elsewhere
+in the tree.  As always, such a merge should pick a well-known release
+point rather than some random spot.  If your upstream-bound branch has
+emptied entirely into the mainline during the merge window, you can pull it
+forward with a command like::
+
+  git merge v5.2-rc1^0
+
+The "^0" will cause Git to do a fast-forward merge (which should be
+possible in this situation), thus avoiding the addition of a spurious merge
+commit.
+
+The guidelines laid out above are just that: guidelines.  There will always
+be situations that call out for a different solution, and these guidelines
+should not prevent developers from doing the right thing when the need
+arises.  But one should always think about whether the need has truly
+arisen and be prepared to explain why something abnormal needs to be done.