This git is lit 🔥

So much of the foundation for what we do as developers relies on effective version control. But for many, working with GitHub, not to mention git itself, is a mysterious and at times intimidating experience.

Working with git shouldn’t feel like wrestling with some arcane language. In this article, I review the four fundamental data structures that git uses to store and manage your project history: blob, tree, commit, and tag.


In this article:

  • Exploring the four main git object types
  • Helpful git commands so your code is the best version of itself

Blobs

Each version of a file's contents is represented as a blob. Blob, short for binary large object, is a term commonly used in computing for a variable or file that can contain any data and whose internal structure the program ignores. A git blob holds only the contents; the file's name and permissions are recorded in a tree[1], [3], [4].
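To make the blob idea concrete, here is a minimal Python sketch of how git derives a blob's ID. It assumes the loose-object format described in the Git documentation: a header made of the word "blob", a space, the content size in bytes, and a NUL byte is prepended to the file's bytes before hashing. The helper name blob_id is hypothetical and not part of any git API.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 ID git would give these bytes when stored as a blob."""
    # Header: object type, a space, the size in bytes, then a NUL byte.
    # The file name is never hashed; only the contents are.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Because the name is not part of the hash, two files with identical contents in different folders end up sharing a single blob.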

Trees

A tree object represents one level of directory information. It records blob identifiers, path names, and a bit of metadata for all the files in one directory. It can also recursively reference other (sub)tree objects and thus build a complete hierarchy of files and subdirectories[1], [3], [4].
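A tree can be sketched the same way. The Python below assumes git's binary tree-entry layout (mode, a space, the name, a NUL byte, then the 20 raw bytes of the referenced object's ID), with mode 100644 for a regular file and 40000 for a subdirectory. The helper tree_id and the sample file names are hypothetical.

```python
import hashlib


def tree_id(entries: list[tuple[str, str, str]]) -> str:
    """Hash one directory level given (mode, name, hex object ID) entries."""

    def sort_key(entry: tuple[str, str, str]) -> bytes:
        mode, name, _ = entry
        # Git orders entries by name, comparing a subdirectory as if it ended in "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, oid in sorted(entries, key=sort_key):
        # Each entry: "<mode> <name>", a NUL byte, then the 20 raw bytes of the ID.
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    header = b"tree " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


blob = "d670460b4b4aece5915caf5c68d12f560a9fe3e4"  # blob ID of "test content\n"
docs = tree_id([("100644", "notes.txt", blob)])       # a subdirectory
root = tree_id([("100644", "test.txt", blob), ("40000", "docs", docs)])
print(tree_id([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the empty tree)
```

The root tree stores only the ID of docs, not its contents, which is how a chain of small tree objects can describe an entire directory hierarchy.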

Commits

A commit object holds metadata for each change introduced into the repository, including the author, committer, dates, log message, and the ID of its parent commit (or commits, for a merge). Each commit points to a tree object that captures, in one complete snapshot, the state of the repository at the time the commit was performed[1], [3], [4].
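A commit object is plain text, so it is easy to assemble by hand. The Python sketch below follows the layout git shows with git cat-file -p on a commit: a tree line, one parent line per parent, author and committer lines (name, email, Unix timestamp, time-zone offset), a blank line, then the message. The helper names, identity, and timestamps are hypothetical, and for brevity it assumes author and committer are the same person.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    # Every git object is hashed as "<type> <size>\0" followed by its body.
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()


def commit_body(tree: str, parents: list[str], ident: str,
                timestamp: int, tz: str, message: str) -> bytes:
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # a root commit has no parent lines
    # Assumption: the same person authored and committed, at the same moment.
    lines.append(f"author {ident} {timestamp} {tz}")
    lines.append(f"committer {ident} {timestamp} {tz}")
    return ("\n".join(lines) + "\n\n" + message).encode()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
ident = "Ada Example <ada@example.com>"  # hypothetical identity

first = commit_body(EMPTY_TREE, [], ident, 1695859200, "+0000", "Initial commit\n")
first_id = hash_object("commit", first)
second = commit_body(EMPTY_TREE, [first_id], ident, 1695859260, "+0000", "Second commit\n")
print(second.decode())
```

Because each commit embeds its parent's ID, changing anything in an earlier commit changes the ID of every commit that descends from it.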

Tags

A tag object assigns an arbitrary yet presumably human-readable name to a specific object, usually a commit. Although the string of hexadecimal characters git prints identifies a commit exactly, a familiar tag name like ‘fixed-user-login’ is far easier to remember and share[1], [3], [4].
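Only annotated tags (created with git tag -a) produce a tag object; a lightweight tag is just a ref holding a commit ID. The Python sketch below assembles an annotated tag in the layout git cat-file -p shows for one: the target object's ID and type, the tag name, a tagger line, a blank line, and a message. The commit ID, tagger, and timestamp are hypothetical placeholders.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    # Same "<type> <size>\0" header scheme used for blobs, trees, and commits.
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()


def tag_body(target: str, target_type: str, name: str, tagger: str,
             timestamp: int, tz: str, message: str) -> bytes:
    lines = [
        f"object {target}",  # the object being named, usually a commit
        f"type {target_type}",
        f"tag {name}",
        f"tagger {tagger} {timestamp} {tz}",
    ]
    return ("\n".join(lines) + "\n\n" + message).encode()


commit = "1" * 40  # hypothetical commit ID, for illustration only
body = tag_body(commit, "commit", "fixed-user-login",
                "Ada Example <ada@example.com>", 1695859200, "+0000",
                "Fix the user login bug\n")
tag_id = hash_object("tag", body)
print(body.decode())
```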

These four objects constitute the foundation behind Git’s higher-level data structures[3]. Let’s create a quick demonstration to get a better understanding.

  • Create a directory with mkdir <directory-name>, move into it with cd <directory-name>, and run git init to turn it into a repository
  • Create a test file with some dummy text by running echo "hello test" > test.txt
  • Run git hash-object -w test.txt and review the output: a 40-character hash that is always the same for the same content (the -w flag also writes the blob into the object database; without it, git only prints the hash)
  • Run git cat-file -t <hash> to see the type of object (blob)
  • Run git cat-file -p <hash> to print the object's contents
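The steps above can be mimicked in Python to show roughly what happens under the hood. This sketch assumes git's loose-object format (a zlib-compressed "<type> <size>\0<content>" stream stored under a path derived from the hash) and writes into a temporary directory standing in for .git/objects, so it never touches a real repository. write_object and read_object are hypothetical helpers that only loosely parallel git hash-object -w and git cat-file -t/-p for blobs.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_object(objects_dir: Path, obj_type: str, body: bytes) -> str:
    """Roughly what git hash-object -w does: hash, compress, and store one object."""
    data = f"{obj_type} {len(body)}\0".encode() + body
    oid = hashlib.sha1(data).hexdigest()
    # Loose objects live at <objects>/<first 2 hex chars>/<remaining 38 chars>.
    path = objects_dir / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(data))
    return oid


def read_object(objects_dir: Path, oid: str) -> tuple[str, bytes]:
    """Roughly what git cat-file -t (type) and -p (content, for blobs) report."""
    data = zlib.decompress((objects_dir / oid[:2] / oid[2:]).read_bytes())
    header, _, body = data.partition(b"\0")
    obj_type, _size = header.decode().split(" ")
    return obj_type, body


with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp)  # stand-in for .git/objects
    oid = write_object(objects, "blob", b"hello test\n")  # echo adds the newline
    print(oid)                        # the same 40-character ID on every run
    print(read_object(objects, oid))  # ('blob', b'hello test\n')
```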

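Every object in git is identified by a SHA-1 hash computed from its contents (plus a short header giving the object's type and size). Roughly speaking, a hash function takes an input of arbitrary length and returns a fixed-size representation of it, for our purposes a string of 40 hexadecimal characters. The same input always produces the same hash, and different inputs produce different hashes with overwhelming probability, so in practice each hash serves as a unique ID.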
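The fixed-size, deterministic behavior is easy to see with Python's hashlib. The sample inputs below are arbitrary. These are hashes of raw bytes, so they will not match git's object IDs for the same text.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as 40 hexadecimal characters (160 bits)."""
    return hashlib.sha1(data).hexdigest()


# Inputs of very different lengths all produce 40-character digests,
# and a one-character change produces an unrelated-looking digest.
for sample in [b"abc", b"hello test", b"hello test!", b"x" * 1_000_000]:
    print(f"{len(sample):>9} bytes -> {sha1_hex(sample)}")
```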

Whereas many traditional file systems organize data hierarchically and access files by their name and directory path, git stores and retrieves data by the cryptographic hash of the content itself. When you commit a change, git records the state of your code as blobs, trees, and a new commit object, and the chain of commits forms your version history. A branch is simply a named pointer to one of those commits.
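A toy Python model makes the contrast with name-based storage clearer. ContentStore is a hypothetical class, not git's implementation: it keys raw bytes by their SHA-1 hash (git would also hash a type header and compress the data), which is enough to show that identical content is stored only once and is looked up by its hash rather than by a path.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: each value is keyed by the SHA-1 of its bytes."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        # Identical bytes always yield the same key, so they are stored only once.
        self._objects.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def __len__(self) -> int:
        return len(self._objects)


store = ContentStore()
readme = store.put(b"hello test\n")  # e.g. the contents of one file
copy = store.put(b"hello test\n")    # the same bytes saved under another name
print(readme == copy, len(store))    # True 1
```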

To peek at your repo’s hidden .git folder, and specifically its objects folder, press Cmd + Shift + . in Finder on a Mac to show hidden files, or run ls .git/objects in the terminal.
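Inside .git/objects, git keeps each loose object in a subdirectory named after the first two hex characters of its ID, in a file named after the remaining 38, so the full ID is the directory name plus the file name. The Python sketch below rebuilds those IDs; list_loose_objects is a hypothetical helper, and it only sees loose objects, not those git has compressed into packfiles under objects/pack. The demo runs against a throwaway directory rather than a real repository.

```python
import tempfile
from pathlib import Path


def list_loose_objects(objects_dir: Path) -> list[str]:
    """Rebuild full object IDs from the directory/file layout of an objects folder."""
    ids = []
    for subdir in sorted(objects_dir.iterdir()):
        # Loose objects sit in two-character directories; this skips info/ and pack/.
        if subdir.is_dir() and len(subdir.name) == 2:
            ids += [subdir.name + f.name for f in sorted(subdir.iterdir())]
    return ids


# Demo on a throwaway directory that mimics .git/objects.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    for name in ("info", "pack", "d6"):
        (demo / name).mkdir()
    (demo / "d6" / "70460b4b4aece5915caf5c68d12f560a9fe3e4").touch()
    print(list_loose_objects(demo))  # ['d670460b4b4aece5915caf5c68d12f560a9fe3e4']

# In a real repository: list_loose_objects(Path(".git/objects"))
```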

Fun Commands:

  • git log --graph draws the commit history as a text-based graph in the terminal
  • git grep <search-term> searches the tracked files in your working tree for the search term; add a commit, as in git grep <search-term> <commit>, to search that snapshot instead
  • cat .git/HEAD => ref: refs/heads/main shows that HEAD is a symbolic reference pointing to the current branch
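HEAD is an ordinary text file, so interpreting it takes only a few lines. The Python sketch below (describe_head is a hypothetical helper) distinguishes the usual symbolic form from a detached HEAD, where the file holds a commit ID directly. In a real repository you could pass it the result of Path(".git/HEAD").read_text().

```python
def describe_head(head_text: str) -> str:
    """Interpret the contents of .git/HEAD."""
    text = head_text.strip()
    if text.startswith("ref: "):
        # Symbolic ref: HEAD names a branch, and the branch names a commit.
        ref = text[len("ref: "):]
        return "on branch " + ref.removeprefix("refs/heads/")
    # Detached HEAD: the file contains a commit ID instead of a ref.
    return "detached at " + text


print(describe_head("ref: refs/heads/main\n"))  # on branch main
```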

TL;DR:

Git uses the SHA-1 hash algorithm and four fundamental data structures to manage the file contents, directories, and commit information in your project, creating a content-addressable file system.

References

[1] “Git Documentation,” git-scm.com. [Online]. Available: git-scm.com/doc. Accessed: September 28, 2023.

[2] E. Xie. “Dissecting Git’s Guts, Emily Xie – Git Merge 2016,” YouTube, [Online]. Available: youtu.be/Y2Msq90ZknI?si=9b0PtiiPhBRqnHXf. Accessed: September 28, 2023.

[3] J. Loeliger and M. McCullough, Version Control with Git, 2nd ed. [Online]. Available: oreilly.com/library/view/version-control-wi... Accessed: September 28, 2023.

[4] B. Stashchuk, “Git and GitHub-the complete guide,” [Online]. Available: stashchuk.com/git-and-github-complete-guide. Accessed: September 28, 2023.
