Digging into Git objects

Keywords: Git

Git is a decentralized version control system (as opposed to SVN, which is centralized). It can be used purely locally or together with a server through which different users collaborate.

In a local git project there are different areas:

Every commit creates two special objects:

The current branch (the one HEAD points to) is then updated with the value of this new pointer to the commit object.
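You can watch this happen with a minimal sketch. It assumes git is installed and that it is fine to create a throwaway repository in a temporary directory, and the committer name and email are placeholders. HEAD normally stores a symbolic reference to the current branch. Each commit moves that branch to the new commit object, whose parent is the previous commit.

```bash
#!/usr/bin/env bash
# Sketch: watch the branch pointer move after a commit (throwaway repository).
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"              # placeholder identity
git config user.email "demo@example.com"

echo "first" > lorem.txt
git add lorem.txt
git commit -q -m "First commit"

# .git/HEAD normally contains "ref: refs/heads/<branch>"
echo "HEAD: $(cat .git/HEAD)"
branch_ref=$(git symbolic-ref HEAD)
first=$(git rev-parse HEAD)

echo "second" >> lorem.txt
git commit -q -am "Second commit"
second=$(git rev-parse HEAD)

# The branch ref now holds the hash of the new commit object...
echo "$branch_ref -> $(git rev-parse "$branch_ref")"
# ...and the new commit records the previous one as its parent
echo "parent of $second: $(git rev-parse HEAD^)"
```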

git log
# Outputs a list of pointers to commit objects
commit ad7d95b28ffc6515cde8dc00542b4623225853a9 (HEAD -> branch)
Author: ME
Date:   DATE
#git cat-file -p HASH-OF-POINTER-TO-COMMIT
git cat-file -p ad7d95b28ffc6515cde8dc00542b4623225853a9
tree dda1543ead42cd7e1f8d3a9a6012f991facb4c72
parent cff8ee38faf009a4f67521eabce4c9e58403acc3
author ME
committer ME

Message provided in the commit
#git cat-file -p HASH-OF-TREE
git cat-file -p dda1543ead42cd7e1f8d3a9a6012f991facb4c72
100644 blob 86fd1fdb58c162c79346d84fad97aa71704a85e0	lorem.txt
040000 tree 2b297e643c551e76cfa1f93810c50811382f9117	prueba

We see that there is a file lorem.txt and a directory named prueba.
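You can do the same walk from commit to tree to blob without copying hashes by hand. The sketch below creates a throwaway repository that mimics lorem.txt and prueba/test.txt, with made-up contents. It uses the revision syntax `<rev>^{tree}` and `<rev>:<path>` together with `git cat-file -t`, which prints an object's type.

```bash
#!/usr/bin/env bash
# Sketch: follow commit -> tree -> subtree -> blob with plumbing commands.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"              # placeholder identity
git config user.email "demo@example.com"

mkdir prueba
echo "lorem ipsum" > lorem.txt
echo "test" > prueba/test.txt
git add .
git commit -q -m "Add files"

commit=$(git rev-parse HEAD)
# <rev>^{tree} resolves the tree the commit points to (the "tree" line)
tree=$(git rev-parse "$commit^{tree}")
# <rev>:<path> resolves the object stored at that path in the commit
subtree=$(git rev-parse "$commit:prueba")
blob=$(git rev-parse "$commit:prueba/test.txt")

# Print each hash together with its object type
for obj in "$commit" "$tree" "$subtree" "$blob"; do
    echo "$obj $(git cat-file -t "$obj")"
done
```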

git cat-file -p 2b297e643c551e76cfa1f93810c50811382f9117
100644 blob 9daeafb9864cf43055ae93beb0afd6c7d144bfa4	test.txt

The directory contains the file test.txt.
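The blob hashes in the tree listings above are not arbitrary. In a repository that uses the default SHA-1 object format, a blob's name is the SHA-1 of the header `blob <size>`, followed by a NUL byte and then the file content. This sketch compares `git hash-object` with a manual computation. It assumes `sha1sum` from GNU coreutils is available; on macOS, `shasum` is the usual substitute.

```bash
#!/usr/bin/env bash
# Sketch: a blob's hash is SHA-1("blob <size>\0" + content).
# Assumes the default SHA-1 object format and GNU sha1sum.
set -euo pipefail

file=$(mktemp)
printf 'hello\n' > "$file"

# Git's own computation (without -w nothing is written to a repository)
git_hash=$(git hash-object "$file")

# Manual computation: header "blob <size>", a NUL byte, then the content
size=$(wc -c < "$file" | tr -d ' ')
manual_hash=$( { printf 'blob %s\0' "$size"; cat "$file"; } | sha1sum | cut -d' ' -f1)

echo "git hash-object: $git_hash"
echo "manual:          $manual_hash"
```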

With that knowledge, we can write a toy script that extracts a commit into a directory. It only extracts the files in the top-level directory, not subdirectories and their contents. Handling those requires a bit more thought (a recursive algorithm, or perhaps an explicit stack):

# Use in a test repository
# This is the pointer to a commit (provide yours)
POINTER=ad7d95b28ffc6515cde8dc00542b4623225853a9
DIR=/tmp/test3
mkdir -p "$DIR"

# Obtain the tree (first line of the commit object: "tree <hash>")
tree=$(git cat-file -p "$POINTER" | head -n1 | cut -d' ' -f2)

# Each tree entry has the form "<mode> <type> <hash><TAB><name>"
while IFS=$'\t' read -r meta fname; do
    read -r mode type obj <<< "$meta"
    # Only extract files (blobs) in the top-level directory; subtrees are skipped
    if [ "$type" = "blob" ]; then
        echo "Object: $obj"
        echo "Saved to file: $DIR/$fname"
        git cat-file -p "$obj" > "$DIR/$fname"
    fi
done < <(git cat-file -p "$tree")
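One way to handle subdirectories is a recursive function that writes out each blob and calls itself for every subtree. This is only a sketch, and `extract_tree` is a hypothetical name. File modes are not preserved: executable bits are lost, and symbolic links are written as regular files containing the link target. Entries of other types, such as submodule commits, are skipped.

```bash
#!/usr/bin/env bash
# Sketch: recursively extract a tree object into a directory.
set -euo pipefail

# extract_tree TREE-HASH TARGET-DIR  (hypothetical helper)
extract_tree() {
    local tree=$1 target=$2
    local meta name mode type obj
    mkdir -p "$target"
    # Each entry is "<mode> <type> <hash><TAB><name>"
    while IFS=$'\t' read -r meta name; do
        read -r mode type obj <<< "$meta"
        case "$type" in
            blob) git cat-file -p "$obj" > "$target/$name" ;;
            tree) extract_tree "$obj" "$target/$name" ;;          # recurse into subdirectory
            *)    echo "Skipping $type entry: $name" >&2 ;;        # e.g. submodule commits
        esac
    done < <(git cat-file -p "$tree")
}

# Usage (inside a repository):
#   extract_tree "$(git rev-parse 'HEAD^{tree}')" /tmp/test3
```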

Fortunately, it is much simpler to use git archive:

git archive --format zip --output /tmp/out.zip POINTER-TO-COMMIT
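`git archive` can also write a tar stream to standard output, because tar is its default format when no `--format` or `--output` is given. You can unpack that stream directly, subdirectories included. Here is a minimal sketch in a throwaway repository with made-up file contents:

```bash
#!/usr/bin/env bash
# Sketch: extract a commit's full tree with git archive piped into tar.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"              # placeholder identity
git config user.email "demo@example.com"
mkdir prueba
echo "lorem ipsum" > lorem.txt
echo "test" > prueba/test.txt
git add .
git commit -q -m "Add files"

out=$(mktemp -d)
# With no --format, git archive writes a tar stream to stdout; tar -x unpacks it
git archive HEAD | tar -x -C "$out"
find "$out" -type f
```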

Some considerations about objects and files:

We can show this with the help of two diagrams. Let's assume that we have a commit with two files:

git-pointers-trees-objects_1.png

Now, if we modify file2.txt in our working directory and commit, we will have:

git-pointers-trees-objects_2.png
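A small experiment reproducing this scenario (two files, then a change to file2.txt) lets you check which objects are reused. The file contents in this sketch are made up. `HEAD~1:path` and `HEAD:path` resolve to the blob stored for that path in each commit.

```bash
#!/usr/bin/env bash
# Sketch: compare the objects of two consecutive commits (throwaway repository).
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"              # placeholder identity
git config user.email "demo@example.com"

echo "content 1" > file1.txt
echo "content 2" > file2.txt
git add file1.txt file2.txt
git commit -q -m "Two files"

# Modify only file2.txt and commit again
echo "more content" >> file2.txt
git commit -q -am "Change file2.txt"

file1_old=$(git rev-parse HEAD~1:file1.txt)
file1_new=$(git rev-parse HEAD:file1.txt)
file2_old=$(git rev-parse HEAD~1:file2.txt)
file2_new=$(git rev-parse HEAD:file2.txt)

echo "file1.txt: $file1_old -> $file1_new"   # same blob, not copied
echo "file2.txt: $file2_old -> $file2_new"   # new blob
# The root trees differ too, because one of their entries changed
echo "trees: $(git rev-parse 'HEAD~1^{tree}') -> $(git rev-parse 'HEAD^{tree}')"
```

Because objects are named by the hash of their content, the unchanged file1.txt points to the same blob in both trees. The modified file2.txt and the root tree get new objects.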


Date: 15/02/2021

Author: Juan Gutiérrez Aguado

Emacs 27.1 (Org mode 9.4.4)
