Artisanal Handcrafted Git Repositories | drew’s dev blog

Date:

Share:

There’s no love and care put into crafting our git repositories nowadays.

Let’s change that.

I’m going to talk about how to handmake your git repositories without using these silly git commands.

You might also learn a bit more about how git works under the hood during the process, or whatever.

If you’re so inclined, you might also take it as an opportunity to appreciate how the power of git comes not from the complexity of its code but from the simplicity and elegance of its design. I mean, if that’s your thing.

Git refers to the user-friendly commands as “porcelain” and to the internals as “plumbing”, so you can think of this as an introductory lesson in git plumbing. In fact, git has “plumbing” commands which we’re not even using, so this is more like an introductory course in git fluid dynamics. In other words, it’s pretty silly.

#

Pre-requisites

I’m going to assume you are familiar with git and are comfortable in a shell environment, otherwise this probably won’t make much sense to you.

#

Let’s get started

The first thing we’d do normally is run git init, but where’s the care and attention in that? We’re going to do it the old-fashioned way, like the pilgrims of yore would’ve done.

# This is where git stores all the information for a repository, from branches to

# Git expects these folders to exist, but we don't need to add anything into them yet.

$ mkdir -p .git/hooks .git/info .git/objects/info .git/objects/pack .git/refs/heads .git/refs/remotes .git/refs/tags .git/logs

# This is just a standard repo-specific config file – these are the default values on my

$ cat <<EOF > .git/config

repositoryformatversion = 0

Now we only have to add one more file before we’ve got a valid git repository, and that’s the HEAD. You might’ve come across this before – HEAD just means “what is your repository pointing to right now?”. It’s a text file and it’s either got a reference in (the normal state of a repository) or just a plain commit hash – this is what’s referred to as a “detached head” (sounds painful, I know).

We’re going to point git to our default branch, called main.

$ echo "ref: refs/heads/main" > .git/HEAD

You might be thinking – wait a minute, where is this main branch? I’ve not created any branches yet. And you haven’t, but that’s fine because we don’t have any commits yet. We can run git status to check whether git is happy with our handiwork so far.

nothing to commit (create/copy files and use "git add" to track)

If you get the error “fatal: This operation must be run in a work tree”, it’s probably just because you’re running it from inside the .git folder, which won’t work. Just cd one level up and you should be good to go.

Nice! Now, let’s get committing.

#

Content Addressable Storage

Before we start hand-crafting our commits, we need to spend a bit of time refining our craftwork by studying the lore.

First, we need to understand what an “object” is in git, what they are and how git stores them.

Git needs to store lots of things. If you create a commit in git, git then needs to store everything about the commit itself, including the commit description, which files it includes, who committed it, the timestamp, and the contents of the files.

All of these things are stored as “objects”. But what do these look like? Let’s take another repository which already has a bunch of data in it.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓

commit 84eae65b6780129486768e6497736c38bfdf9b3d (HEAD -> main, origin/main, origin/HEAD) ┃

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Date: Mon Sep 30 21:39:55 2024 +0100

Improve py 3.13 graphs and add extra section on scaling.

This is the last commit in the repo for this blog, at the time of writing. We can see that the commit is identified by its SHA-1 hash, 84eae65b (I can’t be bothered to type out the whole thing).

This is where things start to get clever – the way git stores all these things like commits, files, etc.1 is not based on some location separate from the data itself like a filename or a key (which would require knowing this filename/key outside of the data itself, e.g. by querying a central listing) but based on the contents of the thing itself. Do a SHA-1 hash of the object and that is the “key” that you use to find the object on disk.

This is called Content Addressable Storage (CAS) or sometimes fixed-content storage and once you recognise it, you start seeing it all over the place. CAS and content addressing in general are used by, among other things:

A side bonus of using content addressable storage is that if you ever have duplicate files, you don’t need to store them twice – they’ll have the same hash so they’ll be stored in the same location.

#

Commit objects

To find commit 84eae65b we need to look in .git/objects/84/eae65b6780129486768e6497736c38bfdf9b3d. The data is compressed using zlib so we have to pipe it into a decompressor (I’m using pigz here):

$ cat .git/objects/84/eae65b6780129486768e6497736c38bfdf9b3d | pigz -d

commit 1136tree 5a0be7720e65417e08034a64bc257bc56a60b4b3

parent e345662c7d53408eb2638cf0fdbae442fe6b68f4

author Drew Silcock 1727728795 +0100

committer Drew Silcock 1727728795 +0100

gpgsig -----BEGIN PGP SIGNATURE-----

iQIzBAABCAAdFiEEaZwozZ5d++BpkqZmtEW8+mMmNyAFAmb7DJsACgkQtEW8+mMm

NyAZeBAAs2I1rodxTBpOnFUgNnl5Slf2o03VZlc7kvbw2miCUP5CkO40REHzGXXE

K3sJSUhObttTrKr0GjUChcvzBZBKoigawP+h3IeY07whhhTcnNaBXjQqzpcl+G5A

ryEVkQXdCqVRWAk3I/6Z3hFlfUogzbxihGoEKvjyMZtmfy0di0WAOJ+PLlTIEwJJ

SQYcUaA7l01ocIWy85MezGJHZEpurcBjzu5nkYCMGRw85u9tXXqjzaYh6Fu7WVEH

rHmBO8tEFF/WcQC1FonVggrOQOAsssuaMxwxKV/p4HRxP9lHGmzFCGfbKAY1bQ82

2dWgROwMAp6jtvLSX6OLu6i0O3+m6NAwTtKcOFDU+Jae4h2m1GC3/8qDukhK7o+e

5LJCLAZPtTvpai43COLRnF9iteV15H267WOxpIvXqbMBwIFcaaHepFMLA0Y39Kr3

FHd1JAaaE6fiUe4rjNP5Wx6ZVLKdEYznjbxgxiRkr9dcemR5SUQtreHjaaLTo0E9

m6bEE1huZp+gu/dy9e7hgNORiwmkUP49r4/WPbNwwKrMxr5lD1ZwQk6DKEi6jAyq

BduJ4fdtamFlngnbJtoW0LHsdxROMwHkqs1Pz4zxpmeOZZEv0p0pzFhM30ta+YjQ

ogBAyoRGHAZG2cze5uI8Cg7fr1A+uTqGmBAXexYN+/ok4+Bf/5g=

-----END PGP SIGNATURE-----

Improve py 3.13 graphs and add extra section on scaling.

If we pipe this into hexyl we can see the hidden null byte:

$ cat .git/objects/84/eae65b6780129486768e6497736c38bfdf9b3d | pigz -d | hexyl

┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐

│00000000│ 63 6f 6d 6d 69 74 20 31 31 33 36 00 74 72 65 65 │commit 1┊136⋄tree│

│00000010│ 20 35 61 30 62 65 37 37 32 30 65 36 35 34 31 37 5a0be77┊20e65417│

│00000020│ 65 30 38 30 33 34 61 36 34 62 63 32 35 37 62 63 │e08034a6┊4bc257bc│

│00000030│ 35 36 61 36 30 62 34 62 33 0a 70 61 72 65 6e 74 │56a60b4b┊3_parent│

│00000040│ 20 65 33 34 35 36 36 32 63 37 64 35 33 34 30 38 e345662┊c7d53408│

│00000050│ 65 62 32 36 33 38 63 66 30 66 64 62 61 65 34 34 │eb2638cf┊0fdbae44│

So this follows the format commit (content length)\x00(commit contents) which is the format used by all the other types of objects too.

#

Tree objects

We can see the hash of another object here by looking at the first info after the null byte: tree 5a0be7720e65417e08034a64bc257bc56a60b4b3. This tells us that the “tree” object with hash 5a0be772 contains the files included in the commit.

$ cat .git/objects/5a/0be7720e65417e08034a64bc257bc56a60b4b3 | pigz -d

tree 678100644 .gitignore� H@�L�������BO��p�100644 .prettierignore�Y��Vt$��n!]- ]�100644 .prettierrc.mjs�2iY��n�$3Q%�w��40000 .vscode

3���i(�[�Ə��D��100644 LICENCE���+�����rW4�^�;��r100644 README.md]%��h�

�+�100644 TODO.txt��E+1�n�[�� ��ڏ�100644 astro.config.mjs(

��c�Xp����jl�5A40000 fonts�0�A�����'���y�}40000 images��[z��CY��9������100644 logo.svg���

100644 package-lock.json�����,!���ԥ��1��100644 package.jsonH�jCi��� i�za�c;�b�40000 public�D�PD0en/�%1��0ư��40000 src�OF;�0� ���)

�a�100644 tsconfig.jso?�s����V�v 4ice>y⏎

We can clearly see that this contains some useful information like filenames and some 644s which smells of unix permissions/file modes, but it looks like a mixed binary-text format. If we look at the hex, we can see a bit clearer what’s going on:

$ cat .git/objects/5a/0be7720e65417e08034a64bc257bc56a60b4b3 | pigz -d | hexyl

┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐

│00000000│ 74 72 65 65 20 36 37 38 00 31 30 30 36 34 34 20 │tree 678┊⋄100644

│00000010│ 2e 67 69 74 69 67 6e 6f 72 65 00 d6 20 48 19 40 │.gitigno┊re⋄× H•@│

│00000020│ d7 4c de f6 ff ba be c0 d2 42 4f a7 b5 70 f6 31 │×L××××××┊×BO××p×1│

│00000030│ 30 30 36 34 34 20 2e 70 72 65 74 74 69 65 72 69 │00644 .p┊rettieri│

│00000040│ 67 6e 6f 72 65 00 dd 59 c0 f4 8e 56 74 24 ef 9b │gnore⋄×Y┊×××Vt$××│

│00000050│ e7 6e 21 1d 5d 2d 20 5d e2 80 31 30 30 36 34 34 │×n!•]- ]┊××100644│

│00000060│ 20 2e 70 72 65 74 74 69 65 72 72 63 2e 6d 6a 73 .pretti┊errc.mjs│

│00000070│ 00 b8 32 69 59 18 96 10 b9 6e eb 24 33 51 25 89 │⋄×2iY•ו┊×n×$3Q%×│

│00000080│ 77 04 02 b4 eb 34 30 30 30 30 20 2e 76 73 63 6f │w••××400┊00 .vsco│

│00000090│ 64 65 00 0c 33 c7 c3 c0 69 28 c9 5b e8 c6 8f 9f │de⋄_3×××┊i(×[××××│

│000000a0│ 15 a5 08 c8 44 a3 dc 31 30 30 36 34 34 20 4c 49 │•ו×D××1┊00644 LI│

│000000b0│ 43 45 4e 43 45 00 e2 18 cd e0 2b a8 bf d9 ea c7 │CENCE⋄ו┊××+×××××│

│000000c0│ 72 57 34 f4 5e 95 3b b7 87 72 31 30 30 36 34 34 │rW4×^×;×┊×r100644│

│000000d0│ 20 52 45 41 44 4d 45 2e 6d 64 00 5d 25 dd e9 68 README.┊md⋄]%××h│

│000000e0│ b1 0a 7e 6e a3 2a 51 a7 14 77 0c c8 2b 06 b0 31 │×_~n×*Q×┊•w_×+•×1│

│000000f0│ 30 30 36 34 34 20 54 4f 44 4f 2e 74 78 74 00 be │00644 TO┊DO.txt⋄×│

│00000100│ 1c e0 45 2b 31 bd 6e fd 5b a2 e3 9f 14 09 e9 e8 │•×E+1×n×┊[××ו_××│

│00000110│ da 8f a0 31 30 30 36 34 34 20 61 73 74 72 6f 2e │×××10064┊4 astro.│

│00000120│ 63 6f 6e 66 69 67 2e 6d 6a 73 00 28 0c 93 c5 63 │config.m┊js⋄(_××c│

│00000130│ bd 58 70 16 16 b0 e6 1e f9 ab 6a 6c ef 35 41 34 │×Xp••×ו┊××jl×5A4│

│00000140│ 30 30 30 30 20 66 6f 6e 74 73 00 ef 30 1e e8 41 │0000 fon┊ts⋄×0•×A│

│00000150│ 8d c7 18 d7 c1 16 ed 27 1c 8d b6 a4 79 d0 7d 34 │×ו×ו×'┊•×××y×}4│

│00000160│ 30 30 30 30 20 69 6d 61 ┊ 67 65 73 00 1c 10 b7 b5 │0000 ima┊ges⋄••××│

│00000170│ 5b 7a 99 9c 43 59 18 e7 ┊ d9 39 fa 8f c8 ff d2 ca │[z××CY•×┊×9××××××│

│00000180│ 31 30 30 36 34 34 20 6c ┊ 6f 67 6f 2e 73 76 67 00 │100644 l┊ogo.svg⋄│

│00000190│ 95 07 b5 d8 0b 69 3a 4c ┊ 96 eb 06 36 b6 a7 0b be │ו×וi:L┊×ו6×ו×│

│000001a0│ a6 83 29 0d 31 30 30 36 ┊ 34 34 20 70 61 63 6b 61 │××)_1006┊44 packa│

│000001b0│ 67 65 2d 6c 6f 63 6b 2e ┊ 6a 73 6f 6e 00 9e 8f bc │ge-lock.┊json⋄×××│

│000001c0│ b0 d9 7f 11 2c 21 f7 a9 ┊ 8e d4 a5 be d8 31 10 86 │×ו•,!××┊×××××1•×│

│000001d0│ e4 31 30 30 36 34 34 20 ┊ 70 61 63 6b 61 67 65 2e │×100644 ┊package.│

│000001e0│ 6a 73 6f 6e 00 48 97 6a ┊ 43 69 f1 f4 e4 20 69 91 │json⋄H×j┊Ci××× i×│

│000001f0│ 02 7a 61 ec 63 3b 80 62 ┊ b9 34 30 30 30 30 20 70 │•za×c;×b┊×40000 p│

│00000200│ 75 62 6c 69 63 00 b0 44 ┊ d0 50 44 17 30 65 6e 2f │ublic⋄×D┊×PD•0en/│

│00000210│ 90 25 31 97 d1 30 c6 b0 ┊ 90 a8 34 30 30 30 30 20 │×%1××0××┊××40000 │

│00000220│ 73 72 63 00 cb 4f 46 3b ┊ b9 02 30 0e 8d 0a aa d6 │src⋄×OF;┊ו0•×_××│

│00000230│ 84 03 44 c6 40 2d b9 5d ┊ 31 30 30 36 34 34 20 74 │וD×@-×]┊100644 t│

│00000240│ 61 69 6c 77 69 6e 64 2e ┊ 63 6f 6e 66 69 67 2e 6d │ailwind.┊config.m│

│00000250│ 6a 73 00 6b 3c 62 76 4e ┊ 97 2a f4 50 e1 e4 3d d8 │js⋄k

│00000260│ d6 b9 35 5f b8 07 f2 34 ┊ 30 30 30 30 20 74 69 6e │××5_ו×4┊0000 tin│

│00000270│ 61 00 60 6f b2 72 1f d5 ┊ df 74 c9 81 c1 f6 36 42 │a⋄`o×r•×┊×t××××6B│

│00000280│ 38 23 0d d0 61 ae 31 30 ┊ 30 36 34 34 20 74 73 63 │8#_×a×10┊0644 tsc│

│00000290│ 6f 6e 66 69 67 2e 6a 73 ┊ 6f 6e 00 08 3f fe 73 de │onfig.js┊on⋄•?×s×│

│000002a0│ c8 f0 8b 56 e3 76 20 00 ┊ 05 34 69 63 65 3e 79 │×××V×v ⋄┊•4ice>y │

└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘

Ok maybe it’s not that much clearer, but it’s obvious once I explain it:

(file mode) (file name)\x00(binary hash)(file mode) (file name)\x00(binary hash)...

The file mode is in plaintext so here we see that the file .gitignore has mode 100644 which (according to Greg Bacon on StackOverflow) means a regular file with permissions 644. (Side note: I don’t know what git does on Windows where unix file modes and stat don’t exist, leave a comment below if you do!)

This also tells us that the hash of the .gitignore file is d6 20 48 19 40 d7 4c de f6 ff ba be c0 d2 42 4f a7 b5 70 f6 – we don’t need a null terminator at the end of this because SHA-1 hashes are always precisely 20 bytes, so we can go straight into the file mode of the next file in the tree without any bytes separating the two files in the tree.

Because git uses this content-addressing system, we can then look up the file itself using this hash:

$ cat .git/objects/d6/20481940d74cdef6ffbabec0d2424fa7b570f6|pigz -d

# jetbrains setting folder

Again, we see the object type, in this case “blob” indicating file contents, followed by the length of the file contents, followed by a null byte (not shown by terminal here), followed by the file contents (and a very exciting example it is).

There’s actually some helper commands, which git calls “plumbing” commands, for printing out object info: git cat-file -p (sha-1 hash) will pretty print the object contents (the -t flag will display the object type and -s will display the size). We haven’t covered annotated tags, which are also objects, but they’re quite similar to commits and not massively interesting for our discussion. Lightweight tags (without messages) are just references to commits hashes, not objects, so are stored in the refs/ folder alongside branches.

Neat!

#

But wait, there’s no diffs/delta here! I thought commits were showing changes to files?

Commits do not store deltas or differences between files but rather store the whole before and the whole after. This was actually one of the big selling points of git compared to competing version control systems, back before git became the de-facto standard2.

What git does when you do git diff is get the before contents, get the after contents, then run a differ on these two versions to display to the user. You can even change the differ that git uses. Personally, I use delta, but I’ve heard good things about difftastic and diff-so-fancy.

You might be thinking that this will surely make repositories mahoosive, right? We’ll get onto that in just a second.

#

What’s up with the first two letters of the hash being the folder?

Good question. Apparently, there are some filesystems which don’t allow you to have more than a certain n# files in any particular directory, and others which use a sequential scan to find a file within a directory, which wouldn’t be good if you were trying to commit to the Linux kernel, which has some ~4.5 million objects or so.

SHA-1 hashes are well distributed so you can expect as many hashes starting with 00 as 1e as 8f and every other possible combination. Git takes advantage of this by checking whether there are 27 or more files in any of these subdirectories3. If so, it would indicate that there are more than 6,700 objects in total, which tells git that it’s time to “pack” those “loose” objects up into packfiles, to save on space.

#

Time to pack it up

If you look in the .git/objects folder, there are a couple of non-hex folders that you might be wondering about. One is called info/ and one is called pack/.

The “info” folder is super interesting and not something I encountered before researching for this blog post – turns out multiple git repositories can share a single object database to reduce on storage size which is pretty neat. I think the “info” folder is only used for very specific, niche use cases to be honest.

The “pack” folder is very commonly used, and it’s where git puts its packfiles. What’s a packfile, I hear you ask? Well, as I said above, when git decides that there are too many “loose” objects in the .git/objects folder (apparently, 6,700 by default), it pulls multiple objects together into a single packfile. This allows multiple objects to be compressed together within the same file, improving compression ratios.

This does introduce the problem of how to find objects once they’ve been packed up. Git supports per-pack index files and also multi-index files, which is a more recent introduction that allows using a single index for all your packs. If you’re using multiple pack index files, you can try some tactics like looking at the most recently changed packfiles first as potential speedups, but ultimately you do have to search through all the packfile index files to find your object.

#

Creating our first commit

Now that we understand git objects, we can make our first commit.

$ echo -e "Has spring come indeed?\nOn that nameless mountain lie\nThin layers of mist.\n\n - Matsuo Bashō" > haiku.txt

(use "git add ..." to include in what will be committed)

nothing added to commit but untracked files present (use "git add" to track)

Okay, so normally we would need to add the file to our staging area before we can commit it. But we don’t really need to do that because we’re handcrafting our artisanal repository. Plus, the index is a binary file located in .git/index which makes it a) not massively interesting and b) not particularly easy to manipulate using just command line tools.

If you’re interested in learning more about the index file format and enjoy sitting down with a cup of hot mocha and an internal documentation page describing binary file formats, take a look over at the git docs reference page on the index format: https://git-scm.com/docs/index-format[^me-disclaimer].

#

Creating our file blob

Okay, so we need to construct the commit object from scratch. Let’s start with the blob, then use that to construct the tree, then we can build the commit object from that tree.

# Note: SHA-1 is done before applying zlib compression.

$ echo -e "blob 94\x00$(cat haiku.txt)" | sha1sum

e5d59773e77daf9f9b9129781ca77d475a451831 -

$ mkdir -p .git/objects/e5

$ echo -e "blob 94\x00$(cat haiku.txt)" | pigz -z > .git/objects/e5/d59773e77daf9f9b9129781ca77d475a451831

# Check whether we messed it up yet

$ git cat-file -p e5d59773e77daf9f9b9129781ca77d475a451831

On that nameless mountain lie

#

Creating our tree blob

Phew, so now let’s use that file blob object to create a tree blob with just that file in it, using file mode 100644 again:

# Let's make a helper function for the hex to bytes conversion.

# Note: I use fish, where this works but you need to add another `\\` before the `x&` in

$ function hex-to-bytes() { printf "$(printf "$1" | sed 's/../\\x&/g')"; }

Note for Fish shell users: your hex-to-bytes() function will look like this:

function hex-to-bytes; printf "$(printf "$argv[1]" | sed 's/../\\\\x&/g')"; end

Back to creating our tree object:

$ printf "100644 haiku.txt\x00$(hex-to-bytes e5d59773e77daf9f9b9129781ca77d475a451831)" | hexyl

┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐

│00000000│ 31 30 30 36 34 34 20 68 61 69 6b 75 2e 74 78 74 │100644 h┊aiku.txt│

│00000010│ 00 e5 d5 97 73 e7 7d af 9f 9b 91 29 78 1c a7 7d │⋄×××s×}×┊×××)x•×}│

│00000020│ 47 5a 45 18 31 0a │GZE•1_

└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘

$ printf "100644 haiku.txt\x00$(hex-to-bytes e5d59773e77daf9f9b9129781ca77d475a451831)" | wc -c

$ printf "tree 37\x00100644 haiku.txt\x00$(hex-to-bytes e5d59773e77daf9f9b9129781ca77d475a451831)" | sha1sum

4aff48f6390a65b88d343ea5d23c03007646b5c2 -

$ mkdir -p .git/objects/4a

$ printf "tree 37\x00100644 haiku.txt\x00$(hex-to-bytes e5d59773e77daf9f9b9129781ca77d475a451831)" | pigz -z > .git/objects/4a/ff48f6390a65b88d343ea5d23c03007646b5c2

#

Creating our commit

Now that we have our tree, we can create the commit! (Finally…)

# This will vary on your machine, obviously

# Feel free to replace with your name/email and timestamp/timezone – just bear in mind

# that the SHA-1 hash will be different.

# Note: Commits usually have a "parent" field but as this is our first commit, it has no

$ cat <<EOF > /tmp/my-commit

tree 4aff48f6390a65b88d343ea5d23c03007646b5c2

author Drew Silcock 1752659127 +0100

committer Drew Silcock 1752659127 +0100

Initial artisanal commit.

# Optional – I like to sign my commits.

$ cat /tmp/my-commit | gpg --armor --detach-sign

-----BEGIN PGP SIGNATURE-----

iQIzBAABCAAdFiEEaZwozZ5d++BpkqZmtEW8+mMmNyAFAmh3diEACgkQtEW8+mMm

NyDomRAAvWYhK9Eg+ZjmChFR2ZX9bB/KZH+H3ksziy2UHp8LiaHgOb3Ira02rpSm

LvVQjxmgurzYBd3nl1e/8E+V3TH1kGOzmvaoCcjJkSUj6togvD7+eImulc1/xkri

q/qqPXxvj2UoRMbSc4cVy/8SZ/MTxNWtCJsuFRe6iKLRiqk67h3PY+gvebCuJteC

TevKxWV/ra+NRX2Q0w52SEUpGTVcnnxYPyMEi28Kmd9VZUsOvuC43RMm/p7u/eiC

kAzJ3GKN4oQvN/3Xz8akb09VX66M/xbMYNv/J0pbSdeIGofMDfLA3oKeZzhrvUVf

zsrpiJ9kq2CTGIuZMJZQvPc8aEEMbr/PAHgSnSTicayon7JLoi5aaoyhZLCD+pgK

Sd6OMMjrKs61UL2qxelVVde2tZumfOL4GmILrhxQgqZbZsdfDUvPMee9yFkEZVam

re8ekkUlYmlmckTqJ0yQ7VTLYhdxPN+0DRynuiKKQaQlsCHhQi2MTYEn9l+mfbrO

7gUAe697+kDwo2VECs4Z7wtPG9F+kGNFpsC0CnGMWjKRR8ZBV9BBiLyc/SZgNd9Q

MZLQWPPvnMw0YlS59rHbUk3VwebxBKx8vX2WBt1NPFtmzkFRr73yL+e69JczspoM

t6FRuXH0bTtF+uVf7qD0saFXCC9lphLYFe5PuyzpIKwWbazGDFA=

-----END PGP SIGNATURE-----

# We need to insert this signature into the commit in the format `gpgsig `.

# The single space at the start of each line is important.

# NOTE: My auto-formatter is removing the whitespace on the line after "BEGIN PGP

# SIGNATURE" – it should have a single space. You need that single space.

$ cat <<EOF > /tmp/my-commit-signed

tree 4aff48f6390a65b88d343ea5d23c03007646b5c2

author Drew Silcock 1752659127 +0100

committer Drew Silcock 1752659127 +0100

gpgsig -----BEGIN PGP SIGNATURE-----

iQIzBAABCAAdFiEEaZwozZ5d++BpkqZmtEW8+mMmNyAFAmh3diEACgkQtEW8+mMm

NyDomRAAvWYhK9Eg+ZjmChFR2ZX9bB/KZH+H3ksziy2UHp8LiaHgOb3Ira02rpSm

LvVQjxmgurzYBd3nl1e/8E+V3TH1kGOzmvaoCcjJkSUj6togvD7+eImulc1/xkri

q/qqPXxvj2UoRMbSc4cVy/8SZ/MTxNWtCJsuFRe6iKLRiqk67h3PY+gvebCuJteC

TevKxWV/ra+NRX2Q0w52SEUpGTVcnnxYPyMEi28Kmd9VZUsOvuC43RMm/p7u/eiC

kAzJ3GKN4oQvN/3Xz8akb09VX66M/xbMYNv/J0pbSdeIGofMDfLA3oKeZzhrvUVf

zsrpiJ9kq2CTGIuZMJZQvPc8aEEMbr/PAHgSnSTicayon7JLoi5aaoyhZLCD+pgK

Sd6OMMjrKs61UL2qxelVVde2tZumfOL4GmILrhxQgqZbZsdfDUvPMee9yFkEZVam

re8ekkUlYmlmckTqJ0yQ7VTLYhdxPN+0DRynuiKKQaQlsCHhQi2MTYEn9l+mfbrO

7gUAe697+kDwo2VECs4Z7wtPG9F+kGNFpsC0CnGMWjKRR8ZBV9BBiLyc/SZgNd9Q

MZLQWPPvnMw0YlS59rHbUk3VwebxBKx8vX2WBt1NPFtmzkFRr73yL+e69JczspoM

t6FRuXH0bTtF+uVf7qD0saFXCC9lphLYFe5PuyzpIKwWbazGDFA=

-----END PGP SIGNATURE-----

Initial artisanal commit.

# Now we're ready to add the commit as an object into git's loose object storage.

$ cat /tmp/my-commit-signed | wc -c

$ echo -e "commit 1027\x00$(cat /tmp/my-commit-signed)" | sha1sum

d62016426c1b7b4125d47bad267aeaaa78bb817c -

$ mkdir -p .git/objects/d6

$ echo -e "commit 1027\x00$(cat /tmp/my-commit-signed)" | pigz -z > .git/objects/d6/2016426c1b7b4125d47bad267aeaaa78bb817c

Okay, so now we’ve created our commit, but we haven’t told git that our main branch points to this commit yet. That’s fine, we just need to create the main branch reference file.

#

Okay, but what is a reference?

Well, an object is something that has data within it – it has a type and size and gets stored in .git/objects as either loose or packed files. A reference is just a hash – no contents. References live in .git/refs and there are 3 common types:

  • local branches, that live in .git/refs/heads/, e.g. .git/refs/heads/main for the “main” branch
  • remote branches, that live in .git/refs/remotes, e.g. .git/refs/remotes/origin/main for the “main” branch in the remote called “origin”
  • lightweight tags, that live in .git/refs/tags, e.g. .git/refs/tags/v1.2.3 for the “v1.2.3” tag (as mentioned before, annotated tags are objects because they contain messages).

When you do git fetch, git checks the refs on the server to see whether they match the ones you’ve got in your .git/refs/remotes folder, and updates your local folder accordingly. When git says Your branch is up to date with 'origin/main'., it’s just saying that your local branch has the same reference in it than the remote branch, i.e. .git/refs/heads/main is identical to .git/refs/remotes/origin/main.

So what’s in one of these reference files? It’s just a commit hash! We already know our commit hash so let’s make our main branch and point it to our artisanally created commit:

echo "d62016426c1b7b4125d47bad267aeaaa78bb817c" > .git/refs/heads/main

#

Checking our work

We can use our good-old porcelain commands to check our handiwork:

nothing to commit, working tree clean

$ git log --show-signature

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓

commit d62016426c1b7b4125d47bad267aeaaa78bb817c (HEAD -> main) ┃

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

gpg: Signature made Wed 16 Jul 10:51:29 2025 BST

gpg: using RSA key 699C28CD9E5DFBE06992A666B445BCFA63263720

gpg: Good signature from "Drew Silcock " [unknown]

gpg: WARNING: This key is not certified with a trusted signature!

gpg: There is no indication that the signature belongs to the owner.

Primary key fingerprint: 699C 28CD 9E5D FBE0 6992 A666 B445 BCFA 6326 3720

Date: Wed Jul 16 10:45:27 2025 +0100

Initial artisanal commit.

Nice! We did it 🎉🎊

#

Troubleshooting

If you get stuck with any of these points or get an error like error: bad tree object HEAD, try running git fsck --full and it’s likely to tell you what’s gone wrong.

#

Future Topics

It’s taken long enough to get to this point, but there’s a bunch of really interesting stuff that we didn’t have a chance to talk about – leave a comment, send me an email or shout into the wind which of the following you’d like to hear in a follow-up post:

  • Stashes – how do they work? (Spoiler: they’re basically just commits)
  • Reflog – what is it and why should I care?
  • Packfile format and indices – I want to hear more about these packfiles and how git looks through them.
  • Index file format – I am deeply upset that you skipped over this and demand a full blog post covering the binary format in full detail bit-by-bit, otherwise I will be seeking legal action.
  • Networking – how does git communicate with the server? What’s the actual on-the-wire difference between using https://github.com/... vs. [email protected]:/... vs. [email protected]:.... I actually have no idea about this so it’d be interesting to explore.

#

Conclusion

While it was fun to learn how all these internals work, it’s not a good idea to do this in an actual repository – you will break things, and it will make you sad.

Hopefully, you found this interesting. To be honest, if you make it this far, you either a) skipped over the rest to see whether the conclusion said anything interesting (sorry to disappoint) or b) are the kind of person who reads all the way through highly technical articles about artisanal hand-crafted git repositories. Either way, I hope you enjoyed the read and maybe even learned something.

If there’s one thing to take away from this, it is the elegance of the design of git, and how it’s actually not that complicated once you understand the underlying file formats. Yes, I know there’s rebases and reflogs and all these more complicated things but git is not magic, and implementing a git clone (pun intended) from scratch in a language of your choice actually wouldn’t be that hard! At least, if you ignore the more complex things.

#

Further reading

The main inspiration for this was reading Julia Evans’ blog posts about git, so if you found this interesting, check our her posts about git. Then check out all the others too, they’re all good.

Source link

Subscribe to our magazine

━ more like this

Is AI Just Software? – Feld Thoughts

I decided to ask Claude (Opus 4), Gemini (2.5 Pro), and ChatGPT a few personal questions. I’m tired of the excessive anthropomorphizing of “AI...

Representative Line: National Exclamations

Carlos and Claire found themselves supporting a 3rd party logistics package, called IniFreight. Like most "enterprise" software, it was expensive, unreliable, and incredibly complicated....

25 On-Trend Summer Items from Banana Republic, Free People, and More

Walk into our NYC office on any given day this summer and you'll spot editors wearing at least one of the following items: beaded...

Ozempic Butt, a New Side Effect of the Drug

By now, you've probably heard all about Ozempic. The popular drug is a brand name for semaglutide, which is primarily used to help people...

Dead City’ Renewed for Season 3

<!-- _ _ _ ____ ...