It's common practice for tools to cache by commit. Tree hashes are a fundamental but oft-untaught part of git. Many of you might have a passing familiarity with cases where the [*empty* tree hash](https://stackoverflow.com/questions/9765453/is-gits-semi-secret-empty-tree-object-reliable-and-why-is-there-not-a-symbolic) is useful. Tree hashes are associated with a particular *content* of a repo. Each folder has its own tree hash, and they nest. If you want to cache the evaluation or build outputs of something that varies over the repo contents and you use the commit hash: ```mmd graph TD subgraph "Before reword" A1["add tests
commit: b9cfea tree: 8a3f91"] B1["fix path
commit: e7cf3a tree: 2b1c07"] C1["rpath fix
commit: 235ea6 tree: f4e882"] D1["bake LLVM
commit: 478191 tree: c73f15"] A1 --> B1 --> C1 --> D1 end subgraph "After rewording commit 2" A2["add tests
commit: b9cfea tree: 8a3f91"] B2["fix /opt/rocm path
commit: fa83b1 tree: 2b1c07"] C2["rpath fix
commit: 91de44 tree: f4e882"] D2["bake LLVM
commit: 3f7721 tree: c73f15"] A2 --> B2 --> C2 --> D2 end style B2 fill:#d94444,color:#fff style C2 fill:#d94444,color:#fff style D2 fill:#d94444,color:#fff ``` You have to eat missing cache entries for all unchanged commits! If we use tree hashes, squashing, rewording, or reordering can't invalidate caches for unchanged content. Reordering two commits changes their intermediate tree hashes but once both have landed the tree is identical; everything subsequent is a cache hit again. We can even use the tree hash of a subdir, like `lib/` if only changes in that folder need us to recompute. ```console $ git rev-parse HEAD^{tree} 6bfefc1b09b11351ab4b7bfa6f3a765f463aa6aa $ git rev-parse HEAD:./ # might be easier to remember 6bfefc1b09b11351ab4b7bfa6f3a765f463aa6aa $ git show HEAD^{tree} tree HEAD^{tree} .editorconfig .envrc content/ flake.lock flake.nix og-template/ static/ templates/ $ git rev-parse HEAD:content f7ec97a728eaf53d6489b8e579938b66af3022c4 $ git show HEAD:content tree HEAD:content _index.html articles/ oss-contributions.md qrh/ siteroll.md $ git rev-parse 7716e72^{tree} 6d1d979c892cc766282f140a8e49d974702161d5 $ git rev-parse 7716e72:content b56eec0376e65a6e34d5a9af2adab5bce799b193 ``` [github:LunNova/nix-eval-diff](https://github.com/LunNova/nix-eval-diff) takes a commit range like `origin/staging…HEAD` and evaluates each commit in the range, showing changed/added/removed packages & flagging if a commit fails eval. This protects me from reordering commits in a way that leaves some non-evaluating ones which is bad practice & lets me notice if a commit has more rebuilds than expected which might indicate I messed up the split. ```c 5f83595ff1ea (+0 -0 ~79 ) rocmPackages.llvm: use clang config file instead of deprecated GCC_INSTALL_PREFIX 691e49259092 (+0 -0 ~92 ) rocmPackages: 7.1.1 -> 7.2.0 331070f87dcc (+0 -0 ~82 ) rocmPackages.llvm: revert loop unrolling pessimisation 3376d1efb517 (+0 -0 ~0 ) rocmPackages.miopen: add updateScript for DVC S3 kdb files 3bc0e6a4c5fc (+0 -0 ~0 ) rocmPackages: sha256 = -> hash = ``` nix-eval-diff is much faster with tree hashing as it can confidently only eval on actual changed content in an extremely cheap git-native way, including when I squash intermediate commits or reorder.