Add post about stacked PRs (#5)

2025-10-21 15:48:40 +00:00 · 2023-10-18 21:10:33 +01:00
2 changed files with 53 additions and 58 deletions
--- a/hugo/content/posts/2023-10-08-guix-bootstrap.md
+++ b/hugo/content/posts/2023-10-08-guix-bootstrap.md
@@ -1,58 +0,0 @@
----
-lastmod: "2023-10-08T11:43:00.0000000+01:00"
-author: patrick
-categories:
-- programming
-date: "2023-10-18T11:43:00.0000000+01:00"
-title: The GUIX bootstrap
-summary: "Notes for a talk I gave at work on the GUIX bootstrap."
----
-
-This is simply an outline, with no actual content.
-
-# Why bootstrap?
-
-* Auditing and security
-  * Seminal paper: [Reflections on Trusting Trust](https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf)
-
-# How is a system normally installed?
-
-* Massive binary blob (250MB of gcc, binutils etc) to start a bootstrap
-* Or an even massiver blob (Windows installer)
-
-# Necessary tools
-
-* A C compiler e.g. [TCC](https://bellard.org/tcc/)
-* Text manipulation e.g. `sed`
-
-# GUIX Full-Source Bootstrap
-
-[Official docs](https://guix.gnu.org/en/manual/devel/en/html_node/Full_002dSource-Bootstrap.html).
-
-## Stage-0
-
-[Main repo](https://github.com/oriansj/stage0-posix); [kaemfile](https://github.com/oriansj/stage0-posix-x86/blob/e86bf7d304bae5ce5ccc88454bb60cf0837e941f/mescc-tools-mini-kaem.kaem#L97).
-
-The kernel is trusted; eventually they would like to make no syscalls at all and run on bare metal.
-(GNU Guix bootstrap kernel is still 25MB.)
-
-* Base: a [tiny self-hosted assembler](https://github.com/oriansj/bootstrap-seeds/blob/master/POSIX/x86/hex0_x86.hex0) of 357 bytes, incredibly strict language, human-verifiable
-* [hex1](https://github.com/oriansj/stage0-posix-x86/blob/e86bf7d304bae5ce5ccc88454bb60cf0837e941f/hex1_x86.hex0): a slightly more powerful assembler, better hex parsing, single-character labels and some jumps
-* [hex2](https://github.com/oriansj/stage0-posix-x86/blob/e86bf7d304bae5ce5ccc88454bb60cf0837e941f/hex2_x86.hex1): an assembler with labels and absolute memory addresses
-* [catm](https://github.com/oriansj/stage0-posix-x86/blob/e86bf7d304bae5ce5ccc88454bb60cf0837e941f/catm_x86.hex2): an implementation of `cat`
-* [M0](https://github.com/oriansj/stage0-posix-x86/blob/e86bf7d304bae5ce5ccc88454bb60cf0837e941f/M0_x86.hex2): a C-style preprocessor and a bona-fide assembler which recognises a language you might recognise
-* [cc-x86](https://github.com/oriansj/stage0-posix-x86/blob/e86bf7d304bae5ce5ccc88454bb60cf0837e941f/cc_x86.M1): a C compiler! (only a subset of C though)
-* M2-Planet: a slightly better C compiler
-* [blood-elf-0](https://github.com/oriansj/mescc-tools/blob/master/blood-elf.c): writes [DWARF](https://en.wikipedia.org/wiki/DWARF) stubs for debug tables (but no actual implementation of those stubs)
-* M1: a better C compiler which is debuggable and implements some optimisations (TODO: example?)
-* Rebuild earlier inputs now that we have an optimising compiler
-* blood-elf again: provides implementations for the stubs `blood-elf-0` wrote (TODO: is that true? Understand the nature of the stubs and implementation)
-* A variety of nice things like `sha256sum`, `mkdir`, `untar`, primitive `cp`, `chmod`
-* [kaem](https://github.com/oriansj/kaem/tree/master): a tiny build system (anagram of `make`)
-
-## GNU Mes
-
-[Mes](https://www.gnu.org/software/mes/) is an intertwined pair of a C compiler and Scheme interpreter; its source is mirrored [on GitHub](https://github.com/gnu-mirror-unofficial/mes).
-It can be built with `kaem`, and the resulting C compiler can build [TCC](https://bellard.org/tcc/), which can then build early GCC, which can bootstrap later GCCs and hence support for other languages and architectures.
-
-As of a few years ago, they were experimenting with using the Mes Scheme compiler to compile [Gash](https://savannah.nongnu.org/projects/gash), an interpreted Scheme POSIX shell which could replace some of the binary blob.
--- a/hugo/content/posts/2023-10-18-squash-stacked-prs.md
+++ b/hugo/content/posts/2023-10-18-squash-stacked-prs.md
@@ -0,0 +1,53 @@
+---
+lastmod: "2023-10-18T20:36:00.0000000+01:00"
+author: patrick
+categories:
+- programming
+date: "2023-10-18T20:36:00.0000000+01:00"
+title: Squashed stacked PRs workflow
+summary: "How to handle stacked pull requests in a repository which requires squashing all history on merge."
+---
+
+Recall that the "stacked PRs" Git workflow deals with a chain of changes, each dependent on the last (say `C` depends on `B`, which itself depends on `A`), all going into some base branch which I'll call `main` for the sake of this note.
+The workflow represents this set of changes as a collection of pull requests: `A` into `main`, `B` into `A`, and `C` into `B`.
+
+# Problem statement
+
+The stacked PRs workflow is fine as long as we *merge* each pull request into its target, because then Git's standard merge algorithms are "sufficiently associative" that the sequence of merges tends to do the right thing.
+(Of course, Git's standard merge algorithms are not associative; see [the Pijul manual](https://pijul.org/manual/why_pijul.html) for concrete examples and discussion of why this is inherently true.)
+
+But if we *squash* each pull request into its target, then the only way we can merge the entire stack is to merge `C` into `B`, then `B+C` into `A`, then `A+B+C` into `main`.
+In any other order, the history rewrite performed by the squash makes Git compute a merge base for source and target that is very different from the one we know to be correct, and this almost always makes the merge wildly conflicted.
+
+For example, if we squash-merge `A` into `main` (which, for the sake of argument, would be a fast-forward merge if we weren't squashing), then we construct a new commit `squash(A)` whose tree is the same as `A`'s and whose parent is `main`; then we set `main` to point to `squash(A)`.
+If we hadn't squashed, `main` would now be `A` itself, and the merge base of `B` with `main` would simply be `A`. But `A` is not in the history of `squash(A)`, so the merge base is actually `main^` (i.e. `main` as it was before the squash-merge). Relative to `main^`, both `B` and `squash(A)` contain `A`'s changes, so wherever `B` further edited lines that `A` touched, Git sees two different edits to the same region, and the merge of `squash(A)` and `B` with a base of `main^` is liable to be gruesome.
+So we can't cleanly merge `B` into `main = squash(A)`.
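+
+The failure can be reproduced in a throwaway repository. This is a minimal sketch, assuming that `git merge --squash` followed by a commit on `main` is a fair stand-in for a hosting service's squash-merge button; the file name and its contents are invented for the demo. After the squash, Git reports the old `main` as the merge base of `B` and `main`, and merging `main` into `B` conflicts because `B` edited the line that `A` introduced.
+
+```shell
+set -eu
+# Throwaway repository; the identity settings only keep `git commit` happy.
+repo=$(mktemp -d)
+cd "$repo"
+git init -q
+git symbolic-ref HEAD refs/heads/main
+git config user.name demo
+git config user.email demo@example.com
+git config commit.gpgsign false
+
+# main has one commit; A changes a line; B (stacked on A) edits that same line again.
+echo base > file.txt
+git add file.txt
+git commit -q -m "initial"
+old_main=$(git rev-parse main)
+git checkout -q -b A
+echo "line from A" > file.txt
+git commit -q -am "A"
+git checkout -q -b B
+echo "line from A, edited by B" > file.txt
+git commit -q -am "B"
+
+# Squash-merge A into main: a new commit with A's tree whose only parent is the old main.
+git checkout -q main
+git merge -q --squash A
+git commit -q -m "squash(A)"
+
+# Git's merge base for B and main is now the old main, not A.
+base=$(git merge-base B main)
+echo "merge base: $base (old main: $old_main)"
+
+# So merging main into B conflicts on the line that both A and B changed.
+git checkout -q B
+if git merge -q --no-edit main; then conflicted=no; else conflicted=yes; fi
+git merge --abort 2>/dev/null || true
+echo "conflicted: $conflicted"
+```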
+
+The clean problem statement, then, is:
+
+> How do I squash-merge the stack in the order `A -> main`, `B -> main + A`, `C -> main + A + B`, without having to resolve conflicts at each step?
+
+# Solution
+
+Since we're squashing into `main` anyway, we should feel free to make a complete mess of history on our branches.
+
+* Squash-merge `A` into `main`.
+* Merge into `B` the tip of `A` as it was when `A` was squashed into `main`. (This should be clean unless you made changes to `A` that genuinely conflict with `B`, in which case this is work you should really have done in preparation for the review of `B` anyway.)
+* Fetch `origin/main` locally, and merge into `B` the parent of `squash(A)`, i.e. the `main` commit immediately before the squash of `A`. (This should be clean if you've been hygienically keeping your branches up to date with `main` by merging `main -> A`, `A -> B`, `B -> C`. If it's not clean, again this is work that you would have to do anyway even in a non-squashing world.)
+
+Now `B` is up to date with both `main` and `A` as they were immediately before `A` was squashed into `main`, so merging `main + A` into `B` should be a no-op: it should not change the tree of `B`.
+However, we aren't merging `main + A`.
+We're instead merging the squashed `main + squash(A)` for some single commit `squash(A)` which Git thinks is completely unrelated to `A`, but which in fact has the same *tree* as `A`.
+
+So the last step is:
+
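+* Merge the squashed `main + squash(A)` commit into `B` with the `ours` strategy: `git merge $COMMIT_HASH --strategy=ours`, where `$COMMIT_HASH` is `squash(A)` itself. That is: since we know `B` has the right *tree*, but its history is woefully incompatible with that of `main + squash(A)`, we just do a dummy no-op merge that leaves `B`'s tree untouched and makes their histories compatible again. (The `ours` strategy discards every change from the other side, so use the hash of `squash(A)` rather than the current tip of `main`; anything that has landed on `main` since should come in afterwards through an ordinary merge.)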
+
+(Then propagate this up the stack by merging the new `B` into `C`.)
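+
+The whole procedure can be sketched in a throwaway repository. This is a minimal sketch, assuming `git merge --squash` plus a commit stands in for the hosting service's squash button; apart from the branch names `main`, `A`, `B` and `C`, everything (file names, contents, an unrelated commit on `main`, a review fix on `A`) is invented so that the two preparatory merges have something to do. The final step is the `git merge $COMMIT_HASH --strategy=ours` from the text, with `$COMMIT_HASH` being `squash(A)`.
+
+```shell
+set -eu
+# Throwaway repository; the identity settings only keep `git commit` happy.
+repo=$(mktemp -d)
+cd "$repo"
+git init -q
+git symbolic-ref HEAD refs/heads/main
+git config user.name demo
+git config user.email demo@example.com
+git config commit.gpgsign false
+
+# Build the stack main <- A <- B <- C, where B edits a line that A introduced.
+echo base > file.txt
+git add file.txt
+git commit -q -m "initial"
+git checkout -q -b A
+echo "line from A" > file.txt
+git commit -q -am "A"
+git checkout -q -b B
+echo "line from A, edited by B" > file.txt
+git commit -q -am "B"
+git checkout -q -b C
+echo C > c.txt
+git add c.txt
+git commit -q -m "C"
+
+# Meanwhile main moves on, A merges main to stay current, and A gets a review fix.
+git checkout -q main
+echo other > other.txt
+git add other.txt
+git commit -q -m "unrelated change on main"
+git checkout -q A
+git merge -q --no-edit main
+echo fix > review.txt
+git add review.txt
+git commit -q -m "review fix on A"
+
+# Remember A's tip and main's tip as they are just before the squash.
+a_tip=$(git rev-parse A)
+main_before=$(git rev-parse main)
+
+# Step 1: squash-merge A into main.
+git checkout -q main
+git merge -q --squash A
+git commit -q -m "squash(A)"
+squash_a=$(git rev-parse main)
+
+# Steps 2 and 3: bring B up to date with A and main as they were before the squash.
+git checkout -q B
+git merge -q --no-edit "$a_tip"
+git merge -q --no-edit "$main_before"
+tree_before=$(git rev-parse 'B^{tree}')
+
+# Step 4: record squash(A) in B's history without touching B's tree.
+git merge -q --no-edit --strategy=ours "$squash_a"
+tree_after=$(git rev-parse 'B^{tree}')
+
+# Propagate up the stack.
+git checkout -q C
+git merge -q --no-edit B
+
+echo "B's tree before/after the ours-merge: $tree_before / $tree_after"
+echo "merge base of B and main: $(git merge-base B main) (squash(A) is $squash_a)"
+```
+
+On a real repository the two pre-squash commits come from elsewhere: the pre-squash `main` is the parent of `squash(A)` (`$COMMIT_HASH^`), and `A`'s tip is whatever your local `A` branch still points at, assuming you had fetched its final state.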
+
+## The state after performing this procedure
+
+* `A` has been squashed into `main`.
+* `B`'s tree is as if `A` were merged into `main` and then the resulting `main + A` were merged into `B`.
+* `B`'s history contains the squashed `main + squash(A)`, so Git now uses `squash(A)` as the merge base, and subsequent merges of `main` into `B` or `B` into `main` should be clean.
+* `B`'s history looks a bit mad, but we shrug and move on.