How to implement your own Git
How to implement your own version control system based on Git (based on UCB CS 61B)
This post is based entirely on UCB CS 61B (Data Structures, Spring 2021). For more, see the official website.
Git is a powerful, distributed VCS (version control system) widely used in software development. Created by Linus Torvalds in 2005, Git enables developers to track changes, collaborate seamlessly, and manage their codebase efficiently.
1. Git Basics
Git is a distributed version control system that allows multiple developers to work on a project simultaneously without overwriting each other’s changes. It helps track changes, revert to previous states, and collaborate efficiently. For a more detailed introduction, see MIT Missing Semester, Version Control (Git).
Key Concepts
- Repository: A repository (or repo) is a directory that contains all of your project files and the history of their changes.
- Commit: A commit is a snapshot of your repository at a specific point in time. Each commit has a unique ID.
o <-- o <-- o <-- o (main)
- Branch: A branch is a separate line of development. The default branch is commonly named main (master in older repositories).
o <-- o <-- o <-- o (main)
            \
             \
              o <-- o (new_feature)
- Merge: Merging is the process of combining the changes from one branch into another.
o <-- o <-- o <-- o <---- o (main)
            ^            /
             \          v
              --- o <-- o (new_feature)
- Staging Area (Index): The staging area is where you place changes that you want to include in your next commit.
- HEAD: HEAD is a pointer to the current commit or the tip of the current branch.
- Checkout: Checkout switches to another branch (or a specific commit) and updates the working directory to match.
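To make the staging area and commit concepts above more concrete, here is a minimal in-memory sketch in Java. The class name StagingSketch and its methods are hypothetical, and file contents are plain strings for brevity. It only shows the idea that a commit snapshot is the previous snapshot plus whatever was staged.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a staging area feeding commit snapshots (illustration only).
class StagingSketch {
    // File name -> content staged for the next commit.
    private final Map<String, String> staged = new TreeMap<>();
    // File name -> content as of the most recent commit.
    private Map<String, String> lastSnapshot = Collections.emptyMap();

    // Places a change in the staging area; nothing is committed yet.
    void add(String fileName, String content) {
        staged.put(fileName, content);
    }

    // Builds the next snapshot from the previous one plus the staged changes,
    // then clears the staging area. Earlier snapshots are never modified.
    Map<String, String> commit() {
        if (staged.isEmpty()) {
            throw new IllegalStateException("nothing staged");
        }
        Map<String, String> next = new TreeMap<>(lastSnapshot);
        next.putAll(staged);
        staged.clear();
        lastSnapshot = Collections.unmodifiableMap(next);
        return lastSnapshot;
    }

    boolean isClean() {
        return staged.isEmpty();
    }

    public static void main(String[] args) {
        StagingSketch repo = new StagingSketch();
        repo.add("a.txt", "v1");
        repo.commit();
        repo.add("b.txt", "v1");
        System.out.println(repo.commit()); // {a.txt=v1, b.txt=v1}
    }
}
```

Copying the previous snapshot into a new map keeps every earlier commit intact, which is the property that makes reverting to a previous state possible.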
Git Terminology
// a file is a bunch of bytes
type blob = array<byte>

// a directory contains named files and directories
type tree = map<string, tree | blob>

// a commit has parents, metadata, and the top-level tree
type commit = struct {
    parents: array<commit>
    author: string
    message: string
    snapshot: tree
}

// an object is a blob, tree, or commit
type object = blob | tree | commit

// all objects are content-addressed by their SHA-1 hash
objects = map<string, object>

def store(object):
    id = sha1(object)
    objects[id] = object

def load(id):
    return objects[id]
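The store and load pseudocode above translates fairly directly into Java. The following is a minimal sketch, assuming objects are already serialized to bytes. The class name ObjectStoreSketch is hypothetical, and the store lives in memory rather than on disk. Note that it hashes the raw bytes only, whereas real Git hashes a small type-and-length header together with the content, so the ids here will not match Git's.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory, content-addressed object store (illustration only).
class ObjectStoreSketch {
    // SHA-1 hex id -> serialized object bytes.
    private final Map<String, byte[]> objects = new HashMap<>();

    // Returns the 40-character hex SHA-1 of the given bytes.
    static String sha1(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    // Stores the bytes under their own hash and returns that id.
    String store(byte[] data) {
        String id = sha1(data);
        objects.put(id, data.clone());
        return id;
    }

    // Returns a copy of the stored bytes, or null if the id is unknown.
    byte[] load(String id) {
        byte[] data = objects.get(id);
        return data == null ? null : data.clone();
    }

    int size() {
        return objects.size();
    }

    public static void main(String[] args) {
        ObjectStoreSketch store = new ObjectStoreSketch();
        String id1 = store.store("hello".getBytes(StandardCharsets.UTF_8));
        String id2 = store.store("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(id1.equals(id2)); // true: identical content, identical id
        System.out.println(new String(store.load(id1), StandardCharsets.UTF_8)); // hello
    }
}
```

Because the id is derived from the content, storing the same bytes twice yields a single entry, which is how identical files can share storage across commits.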
2. Overview of Gitlet
Let’s call our version Gitlet, which mimics some basic features of Git. A version-control system helps manage and track changes in projects, allowing you to save, restore, and view different versions of files. Key functionalities in Gitlet include:
- Committing: Saving the state of a directory of files.
- Checking out: Restoring a version of one or more files, or an entire commit.
- Logging: Viewing the history of commits.
- Branching: Maintaining separate sequences of commits.
- Merging: Integrating changes from one branch into another.
Gitlet models history as a linked list of commits, each pointing to its parent, and the HEAD pointer marks the current position in that history. Gitlet also supports branching, which lets you maintain and switch between different versions of your project; with branches, the history can be visualized as a commit tree. Once created, a commit is immutable, so past states are preserved and cannot be accidentally changed or deleted.
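The linked-list view of history can be sketched in a few lines of Java. This is only an in-memory illustration under my own assumptions: the names HistorySketch, branch, checkout, and log are hypothetical, a commit carries just a message and a parent, and merge commits with two parents are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of commits, branches, and HEAD (illustration only).
class HistorySketch {
    // Immutable commit node: every field is final and set exactly once.
    static final class Commit {
        final String message;
        final Commit parent; // null for the initial commit

        Commit(String message, Commit parent) {
            this.message = message;
            this.parent = parent;
        }
    }

    // Branch name -> commit at the tip of that branch.
    private final Map<String, Commit> branches = new HashMap<>();
    // HEAD is modeled as the name of the current branch.
    private String head = "main";

    HistorySketch() {
        branches.put("main", new Commit("initial commit", null));
    }

    // Creates a commit whose parent is the current tip; only the branch pointer moves.
    void commit(String message) {
        branches.put(head, new Commit(message, branches.get(head)));
    }

    // Creates a new branch pointing at the current commit.
    void branch(String name) {
        if (branches.containsKey(name)) {
            throw new IllegalArgumentException("branch already exists: " + name);
        }
        branches.put(name, branches.get(head));
    }

    // Switches HEAD to another existing branch.
    void checkout(String name) {
        if (!branches.containsKey(name)) {
            throw new IllegalArgumentException("no such branch: " + name);
        }
        head = name;
    }

    // Walks parent pointers from the current commit back to the initial commit.
    List<String> log() {
        List<String> messages = new ArrayList<>();
        for (Commit c = branches.get(head); c != null; c = c.parent) {
            messages.add(c.message);
        }
        return messages;
    }

    public static void main(String[] args) {
        HistorySketch repo = new HistorySketch();
        repo.commit("add a.txt");
        repo.branch("new_feature");
        repo.checkout("new_feature");
        repo.commit("feature work");
        System.out.println(repo.log()); // [feature work, add a.txt, initial commit]
    }
}
```

Since a branch is just a name pointing at a commit, creating one costs nothing more than a map entry, and the two branches share all commits up to the point where they diverge.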
3. .gitlet Directory Structure
Our .gitlet directory is similar to the official .git directory, which is created when you initialize a Git repository with git init. It contains all the metadata and the object database for your repository.
.gitlet
├── HEAD
├── stageIndex
├── objects
└── refs
    └── heads
        └── main
- HEAD: Points to the current branch reference. It contains the path to the current branch in refs/heads/.
  HEAD -> refs/heads/main
- stageIndex: A binary file that keeps track of the staged files for the next commit. It’s also called the staging area.
- objects/: Contains all the objects (blobs, trees, and commits) that make up the history of the repository.
  objects/
  ├── 1234567890abcdef...
  ├── 34567890abcdef12...
  └── ...
- refs/: Stores references to commit objects. It contains heads/ for local branches.
  refs/
  └── heads/
      └── main
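As a rough illustration of how this layout could be created and read, here is a small Java sketch using java.nio.file. The class GitletDirSketch and its method names are hypothetical. It also makes a few assumptions the description above does not settle: HEAD stores the relative path refs/heads/main as plain text, each branch file stores a commit id as plain text, and both stageIndex and the branch file start out empty.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for the .gitlet layout (illustration only).
class GitletDirSketch {
    // Creates .gitlet with HEAD, stageIndex, objects/, and refs/heads/main under workDir.
    static Path init(Path workDir) {
        Path gitlet = workDir.resolve(".gitlet");
        if (Files.exists(gitlet)) {
            throw new IllegalStateException("a repository already exists here");
        }
        try {
            Files.createDirectories(gitlet.resolve("objects"));
            Files.createDirectories(gitlet.resolve("refs").resolve("heads"));
            // Assumption: an empty file stands in for an empty staging area.
            Files.createFile(gitlet.resolve("stageIndex"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // HEAD records which branch file is current.
        write(gitlet.resolve("HEAD"), "refs/heads/main");
        // Assumption: the branch file is empty until a commit id is written to it.
        write(gitlet.resolve("refs").resolve("heads").resolve("main"), "");
        return gitlet;
    }

    // Follows HEAD to the current branch file and returns the commit id stored there.
    static String headCommitId(Path gitlet) {
        String ref = read(gitlet.resolve("HEAD"));
        return read(gitlet.resolve(ref));
    }

    // Points the current branch at a new commit id; HEAD itself is unchanged.
    static void updateHead(Path gitlet, String commitId) {
        String ref = read(gitlet.resolve("HEAD"));
        write(gitlet.resolve(ref), commitId);
    }

    private static void write(Path file, String text) {
        try {
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String read(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path gitlet = init(Files.createTempDirectory("gitlet-demo"));
        updateHead(gitlet, "1234567890abcdef"); // placeholder id
        System.out.println(headCommitId(gitlet)); // 1234567890abcdef
    }
}
```

Keeping HEAD as an indirection to a branch file means a new commit only rewrites one small file under refs/heads/, and switching branches only rewrites HEAD.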
4. Project Structure (Java)
Gitlet
├── Main.java
├── Commands.java
├── Branch.java
├── Commit.java
└── Utils.java
For more, see my implementation.
Now it is done!