So much of the foundation for what we do we as developers relies on effective version control. But for many, working with GitHub, not to mention git itself, is a mysterious and at times intimidating experience.
Working with git shouldnât feel like wrestling with some arcane language. In this article, I review the four fundamental data structures that git uses to store and manage your project history: blob
, tree
, commit
, and tag
.
In this article:
- Exploring the four main git object types
- Helpful git commands so your code is the best version of itself
Blobs
Each version of a file is represented as a blob. Blob, a contraction of binary large object, is a term thatâs commonly used in computing to refer to some variable or file that can contain any data and whose internal structure is ignored by the program[1], [3], [4].
Trees
A tree object represents one level of directory information. It records blob identifiers, path names, and a bit of metadata for all the files in one directory. It can also recursively reference other (sub)tree objects and thus build a complete hierarchy of files and subdirectories[1], [3], [4].
Commits
A commit object holds metadata for each change introduced into the repository, including the author, committer, commit date, and log message. Each commit points to a tree object that captures, in one complete snapshot, the state of the repository at the time the commit was performed[1], [3], [4].
Tags
A tag object assigns an arbitrary yet presumably human readable name to a specific object, usually a commit. Although the string of hexadecimal characters that git spit out references an easily accessible commit, a more familiar tag name like âfixed-user-loginâ
might make more sense[1], [3], [4].
These four objects constitute the foundation behind Gitâs higher level data structures[3]. Letâs create a quick demonstration to get a better understanding.
- Create
mkdir <directory-name>
and navigatecd directory-name
into a directory to demonstrate our learning - Create a test file with some dummy text by running
echo âhello testâ > test.txt
- Run
git hash-object test.txt
and review the deterministic output - Run
git cat-file -t <hash-object>
to see the type of object - Run
git cat-file -p <hash-object>
to read the object information
Objects in git have a specific hash made with the SHA1
algorithm. Roughly speaking, a hash algorithm is a mathematical function that takes in an input of arbitrary and variable length and returns a unique and fixed size representation of the input in the form of a string of charactersâfor our purposes, hexadecimal.
Whereas many traditional file systems organize data hierarchically and access files by their name and directory path, git stores and retrieves data based on the cryptographic hash of the content itself. In other words, when you create a branch, or commit a change, git uses these four objects to represent the state and version history of your code.
To access your repoâs hidden .git foldersâspecifically taking a peak at the objects folder usecmd + shft + .
on mac or run ls .git/objects
in the terminal.
Fun Commands:
git log âgraph
provides visual representation of commit timeline in terminalgit grep <search-term>
searches across commits your repo for referenced search valuecat .git/HEAD
=> ref: refs/heads/main demonstrates that head is a reference pointer
TL;DR:
Git uses the SHA-1
hash algorithm and four fundamental data structures to manage the file contents, directories, and commit information in your project, creating a content-addressable file system.
References
[1] "Git Documentation," in git-scm.com, git-scm.com/doc, September 28, 2023.
[2] E. Xie. âDissecting Gitâs Guts, Emily Xie â Git Merge 2016,â YouTube, [Online]. Available: youtu.be/Y2Msq90ZknI?si=9b0PtiiPhBRqnHXf. Accessed: September 28, 2023.
[3] J. Leoliger, M McCullough, Version Control with Git, 2nd Edition. [Online]. Available: oreilly.com/library/view/version-control-wi... Accessed: September 28, 2023.
[4] B. Staschuk. âGit and GitHub-the complete guide,â [Online]. Available: stashchuk.com/git-and-github-complete-guide. Accessed: September 28, 2023.