A Deep Dive into How Git Works Internally
Jul 26, 2025, 01:40 AM

1. Git is a content-addressable file system. Each object is uniquely identified by its SHA-1 (or SHA-256) hash, identical content is stored only once, and objects are immutable.
2. There are four object types: blob (file content), tree (directory structure), commit (points to a tree and carries metadata), and tag (annotates another object, usually a commit).
3. git add stores the file content as a compressed blob; git commit creates tree and commit objects and records a complete snapshot rather than a difference.
4. A branch is essentially a pointer to a commit, stored under .git/refs/heads/.
5. A merge compares three trees (the common ancestor, the current branch, and the target branch) and combines them automatically where it can; conflicts must be resolved manually.

Understanding these mechanisms helps you debug, write tools, recover lost commits, and grasp how history rewriting works. The core of Git is the chain content → object → pointer → history.
Git might seem like magic when you're just starting out. How does it track changes so efficiently? Why do commits feel like snapshots, not diffs? What's actually happening when you run git add or git commit?

Let's pull back the curtain and explore how Git works internally. This isn't just for curiosity: it helps you debug issues, understand branching better, and write smarter scripts or tools around Git.
Git Is a Content-Addressable Filesystem
At its core, Git is a key-value store. Every object (blob, tree, commit, or tag) is stored under the SHA-1 hash of its content (or the SHA-256 hash, in repositories created with the newer SHA-256 object format). That means:

- Identical content = same hash = stored once
- No duplication
- Immutable objects—you can't change something without changing its hash
When you run git add file.txt, Git:
- Prepends a header ("blob <size>\0") to the file content
- Hashes the header plus content → this becomes the object ID
- Compresses the result using zlib
- Stores it in .git/objects/<first 2 chars>/<remaining chars>
- Records the file path and object ID in the index (the staging area)
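Example:

echo "hello world" | git hash-object --stdin
# Output: 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
That hash is the "address" of your data blob. (Without the -w flag, git hash-object only computes the ID; with -w it also writes the object into the object store.)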
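To make the hashing step concrete, here is a minimal Python sketch that reproduces a blob ID without calling Git at all. It assumes a SHA-1 repository; the function names (git_blob_id, loose_object_path, loose_object_bytes) are made up for illustration and are not part of any Git API.

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # The hash covers the header AND the content, before any compression.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Relative path of a loose object: the first two hex chars name the directory."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def loose_object_bytes(content: bytes) -> bytes:
    """Bytes of a loose blob on disk: header plus content, zlib-compressed."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return zlib.compress(header + content)


data = b"hello world\n"  # exactly what `echo "hello world"` sends to stdin
oid = git_blob_id(data)
print(oid)                     # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(loose_object_path(oid))  # .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Note that a plain SHA-1 of the file (for example from sha1sum) gives a different value, because it leaves out the "blob <size>\0" header.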
The Four Types of Objects
Git stores four internal object types:
- Blob: Raw file data (no filename, just content)
- Tree: Like a directory; maps filenames (with file modes) to blob or subtree hashes
- Commit: Points to one tree (the project state), plus metadata (author, committer, message, parent commits)
- Tag: An annotated tag; points to another object (usually a commit) and adds a tagger, date, and message (often used for releases)
You can inspect any object with:
git cat-file -p <hash>
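Try it on a commit hash (for example, the output of git rev-parse HEAD): you'll see the tree hash, author and committer info, and, for any commit other than the first, the parent commit hash.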
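If you're curious what git cat-file has to undo first, the sketch below decodes a loose object by hand: decompress, split off the header, and check the size. It is a simplified illustration that works on an in-memory sample rather than a real .git/objects file, and parse_object is a hypothetical helper name. Packed objects use a different on-disk format and are not covered.

```python
import zlib


def parse_object(compressed: bytes) -> tuple[str, bytes]:
    """Decompress a loose object and split it into (type, body)."""
    raw = zlib.decompress(compressed)
    # Layout of the decompressed bytes: b"<type> <size>\0<body>"
    header, _, body = raw.partition(b"\0")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type.decode("ascii"), body


# Build a sample object in memory instead of reading from .git/objects.
sample = zlib.compress(b"blob 12\0hello world\n")
obj_type, body = parse_object(sample)
print(obj_type)  # blob
print(body)      # b'hello world\n'
```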
How a Commit Actually Works
When you run git commit:
- Git takes the current staging area (built via git add, which has already written each file's content as a blob)
- It builds tree objects representing the directory structure, one per directory
- It creates a commit object pointing to the top-level tree
- The commit also stores:
  - Pointer(s) to parent commit(s) (none for the first commit, one for normal commits, two or more for merges)
  - Author/committer info
  - Commit message
This is why Git commits are snapshots, not diffs: each one captures the full state at that moment. Diffs are computed on demand by comparing trees.
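Here is a rough sketch of what the tree and commit objects look like as bytes, and how their IDs fall out of the same header-plus-hash scheme used for blobs. It assumes SHA-1, a single flat directory, and an invented identity string; object_id, tree_body, and commit_body are illustrative names, not Git functions. Real Git also handles subdirectories, several file modes, and optional headers such as signatures.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    """Hash b"<type> <size>\\0<body>" the way Git does for every object type."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex object ID) entries for one flat directory."""
    out = b""
    # Entries are sorted by name; each stores the ID as 20 raw bytes, not hex.
    for mode, name, oid in sorted(entries, key=lambda e: e[1].encode("utf-8")):
        out += f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(oid)
    return out


def commit_body(tree: str, parents: list[str], ident: str, message: str) -> bytes:
    """Serialize a commit: tree, zero or more parents, identities, then the message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return ("\n".join(lines) + "\n").encode("utf-8")


blob = object_id("blob", b"hello world\n")
tree = object_id("tree", tree_body([("100644", "hello.txt", blob)]))
# Hypothetical identity and timestamp, purely for illustration.
ident = "A U Thor <author@example.com> 1700000000 +0000"
commit = object_id("commit", commit_body(tree, [], ident, "Initial commit"))
print(tree)
print(commit)
```

Because the commit body contains the tree ID, and the tree body contains the blob IDs, changing a single byte in any file changes every ID above it. That is what makes each commit a tamper-evident snapshot.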
Branches Are Just Pointers
A branch in Git is typically just a file in .git/refs/heads/ containing a single commit hash. (After git gc or git pack-refs, refs may instead be listed in .git/packed-refs.)
cat .git/refs/heads/main
# Output: 9a8d2... (a commit hash)
When you make a new commit, Git:
- Creates the new commit object
- Updates the branch file to point to the new commit
No copying, no moving files—just updating a pointer.
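The pointer update can be mimicked in a few lines of Python. This is only a sketch against a throwaway directory that imitates the refs/heads layout; it is not a real repository, the commit IDs are placeholders, and read_branch and move_branch are hypothetical helpers. Real Git also takes a lock file while updating a ref and may keep refs in packed-refs, both of which are ignored here.

```python
import tempfile
from pathlib import Path


def read_branch(git_dir: Path, branch: str) -> str:
    """Return the commit ID a branch file currently points to."""
    return (git_dir / "refs" / "heads" / branch).read_text().strip()


def move_branch(git_dir: Path, branch: str, commit_id: str) -> None:
    """Point a branch at a commit by rewriting its one-line file."""
    ref = git_dir / "refs" / "heads" / branch
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_text(commit_id + "\n")


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"  # stand-in directory, not a real repository
    move_branch(git_dir, "main", "a" * 40)  # placeholder ID of the first commit
    move_branch(git_dir, "main", "b" * 40)  # "committing" only moves the pointer
    print(read_branch(git_dir, "main"))
```

Creating a branch is the same operation with a new file name, which is why branching in Git is so cheap.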
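What Happens During a Merge?
Git merges commits, not individual files in isolation: it compares the whole trees of the commits involved.
When you run git merge feature:
- Git finds the common ancestor (the merge base) by walking the commit graph
- Compares the three trees: ancestor, current branch, and feature branch
- Takes a change automatically when only one side made it; if both branches modified the same lines differently, you get a conflict
- Creates a new merge commit with two parents
If the current branch has no commits of its own since the common ancestor, Git by default just moves the branch pointer forward (a fast-forward) and creates no merge commit.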
This is why Git can handle complex histories—it's all about the graph of commits.
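The two key ideas, finding the common ancestor and comparing three trees, can be sketched on toy data. The code below models commits as a dict of parent lists and trees as dicts from path to blob ID; all names and IDs are invented for illustration. It is deliberately coarser than Git: when both sides change the same file it reports a conflict straight away, whereas Git first attempts a line-level merge of the file contents.

```python
def ancestors(parents: dict[str, list[str]], start: str) -> set[str]:
    """All commits reachable from start by following parent links (including start)."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def merge_bases(parents: dict[str, list[str]], a: str, b: str) -> set[str]:
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


def merge_trees(base: dict, ours: dict, theirs: dict) -> tuple[dict, list]:
    """File-level three-way merge of {path: blob_id} maps."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (or neither touched the file)
            pick = o
        elif o == b:      # only their side changed it
            pick = t
        elif t == b:      # only our side changed it
            pick = o
        else:             # both changed it differently
            conflicts.append(path)
            continue
        if pick is not None:  # None means the file was deleted
            merged[path] = pick
    return merged, conflicts


history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}  # C = main, D = feature
print(merge_bases(history, "C", "D"))  # {'B'}

base = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
ours = {"a.txt": "2", "b.txt": "1", "c.txt": "2"}
theirs = {"a.txt": "1", "b.txt": "3", "c.txt": "4"}
print(merge_trees(base, ours, theirs))  # ({'a.txt': '2', 'b.txt': '3'}, ['c.txt'])
```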
Why This Matters
Understanding these internals helps you:
- Fix corrupted repos (check .git/objects)
- Write better Git hooks or tools
- Recover lost commits (git reflog)
- Grasp why rebasing rewrites history (new commits with different hashes)
It also demystifies commands like git fsck (checks object integrity) or git gc (garbage collects unreachable objects).
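"Unreachable" has a precise meaning: no ref leads to the object by following parent links. The toy model below shows the idea for commits only, using an invented history in which x1 is the old version of a commit left behind after a rebase. The names are hypothetical, and real Git is more careful: it also walks trees and blobs, treats reflog entries as starting points, and keeps recent unreachable objects for a grace period, which is why git reflog can usually still find x1.

```python
def reachable(parents: dict[str, list[str]], tips) -> set[str]:
    """Commits reachable from any of the given starting commits."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def unreachable(parents: dict[str, list[str]], refs: dict[str, str]) -> set[str]:
    """Commits in the store that no ref leads to."""
    return set(parents) - reachable(parents, refs.values())


# c3 replaced x1 during a rebase; both still sit in the object store.
store = {"c1": [], "c2": ["c1"], "c3": ["c2"], "x1": ["c2"]}
refs = {"refs/heads/main": "c3"}
print(unreachable(store, refs))  # {'x1'}
```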
Git isn't magic—it's elegant design. Once you see how it stores content, builds trees, and links commits, the whole system clicks.
Basically: content → objects → pointers → history.
That's all there is.