Git stores snapshots, not changesets, using a content-addressable file system with four object types: 1. Blobs store file content without filenames or other metadata, identified by the SHA-1 hash of the content plus a short header; 2. Trees represent directories, containing references to blobs and subtrees with file modes and names; 3. Commits point to a root tree and include metadata like author, timestamp, message, and parent commits, forming a history chain; 4. Tags are named, optionally signed, references to objects, typically commits, for marking versions. Branches are movable pointers to commits stored in .git/refs/heads/, automatically updated on commit, while tags remain fixed. Each object is stored under its SHA-1 hash in .git/objects/, ensuring deduplication and integrity. When committing, Git takes the blobs recorded in the index, builds directory trees, creates a commit pointing to the root tree and parent, and updates the branch reference. The commit history forms a directed acyclic graph (DAG) via parent links. All objects are immutable, and Git uses compression to stay efficient. Understanding this model clarifies commands like git rebase, git reset, and git reflog as operations that manipulate pointers or create new objects, revealing Git’s core as a system of hashes, trees, and chains for versioning files.
Git’s internal data model is simpler than most people think — once you understand the core ideas. At its heart, Git is a content-addressable file system with a few higher-level tools to manage versioned code. It doesn’t store changesets like traditional VCS tools; instead, it stores snapshots of your entire project at different points in time. Let’s break down how this works.
The Four Main Object Types
Git uses four types of objects internally: blobs, trees, commits, and tags. All of these are stored in .git/objects/ and are identified by their SHA-1 hash.
1. Blobs (Binary Large Objects)
- Represents the content of a file.
- Does not store filenames, permissions, or metadata — just raw file content.
- Example: If you have a file hello.txt with content "Hello, world!", Git creates a blob object for that content.
- The key is the SHA-1 hash of the content prefixed with a short header (the word blob, the content length in bytes, and a NUL byte):
echo -n "Hello, world!" | git hash-object --stdin
→ gives you the blob’s ID.
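To make the hashing rule concrete, here is a minimal Python sketch that reproduces what git hash-object computes for a blob. The function name git_blob_id is made up for illustration, and the sketch assumes a repository using SHA-1 (the default) rather than SHA-256.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content."""
    # Git hashes a header ("blob", a space, the byte length, a NUL byte)
    # followed by the raw file content. The filename is not part of the input.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b"Hello, world!"))
```

Because the filename never enters the hash, two files with identical content in different directories map to the same blob.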
2. Trees
- Acts like a directory — it stores references to blobs and other trees (subdirectories).
- Contains a list of entries: file mode, object type, filename, and SHA-1 of the blob/tree.
- Trees represent the structure of your project at a point in time.
- You can think of a tree as a snapshot of a directory.
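As a rough sketch of what a tree looks like on the inside, the Python below serializes a list of entries the way Git does (mode, a space, the name, a NUL byte, then the 20 raw bytes of the entry's ID) and hashes the result. The helper names tree_body and git_object_id are invented for this example, and the sketch assumes SHA-1 object IDs.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash "<type> <size>\\0<body>" the way Git names its objects."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Serialize (mode, name, hex_id) tuples into the body of a tree object."""

    def sort_key(entry):
        mode, name, _ = entry
        # Git sorts entries by name, comparing directories (mode 40000)
        # as if their name ended with "/".
        return name.encode("utf-8") + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(hex_id)  # IDs are stored as 20 raw bytes, not hex
    return body


# A directory holding one empty regular file named hello.txt.
EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
print(git_object_id("tree", tree_body([("100644", "hello.txt", EMPTY_BLOB)])))
```

Note that the filename and mode live in the tree entry, which is why blobs can stay free of metadata.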
3. Commits
- Points to a single tree (the root directory of your project at that moment).
- Contains metadata: author, committer, timestamp, log message, and crucially — zero or more parent commits.
- This is what creates the history chain.
- A commit with one parent is a normal commit; two or more parents means a merge; the very first commit in a repository has no parent.
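A commit object is plain text, so its layout is easy to sketch. The Python below builds the body of a commit (tree line, parent lines, author, committer, blank line, message) and hashes it. The author identity and timestamps are invented placeholders, commit_body is a hypothetical helper, and optional headers such as signatures or encoding are left out.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash "<type> <size>\\0<body>" the way Git names its objects."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree_id, parent_ids, author, committer, message):
    """Build the text of a commit object from its fields."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parent_ids]  # zero, one, or several parents
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# Placeholder identity: "Name <email> <unix time> <timezone offset>".
who = "Example Author <author@example.com> 1700000000 +0000"

root = commit_body(EMPTY_TREE, [], who, who, "Initial commit\n")
root_id = git_object_id("commit", root)

child = commit_body(EMPTY_TREE, [root_id], who, who, "Second commit\n")
child_id = git_object_id("commit", child)

print(child.decode("utf-8"))
```

Since the parent's ID is part of the hashed text, changing any ancestor changes the ID of every commit after it.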
4. Tags
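- A named reference to another object (usually a commit).
- Used to mark specific points in history (e.g., v1.0.0).
- Can be signed for authenticity.
- Only annotated tags are stored as tag objects; a lightweight tag is just a ref that points directly at a commit.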
How Git Builds History: The Commit Chain
When you make a commit:
- Git takes the current state of your staging area (index).
- Reuses the blobs recorded there; git add already created a blob for each staged file.
- Builds a tree (or nested trees) representing the directory structure.
- Creates a commit object pointing to that root tree.
- The commit also references its parent(s) — the previous commit(s).
Because each commit points back to its parent, Git forms a directed acyclic graph (DAG) of history.
Example:
A ← B ← C ← D
Each letter is a commit. D points to C, C to B, and so on. The chain is built via the parent field in commit objects.
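The chain above can be modeled as a dictionary from each commit to its list of parents. The sketch below walks those links to collect every commit reachable from a starting point. The one-letter IDs stand in for real hashes, and the function only finds reachable commits; it does not try to reproduce the ordering that git log uses.

```python
def ancestors(commit, parents):
    """Return the commit and everything reachable from it via parent links."""
    seen, order, stack = set(), [], [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a commit can be reachable along several paths after a merge
        seen.add(current)
        order.append(current)
        stack.extend(parents.get(current, []))
    return order


# A <- B <- C <- D, written as child -> list of parents.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
print(ancestors("D", parents))  # ['D', 'C', 'B', 'A']
```

Links only ever point from child to parent, so a commit knows its ancestors but nothing about its descendants.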
References: Branches and Tags
Git uses references (refs) to make it easier to refer to commits.
- Branches are just pointers to a commit. git branch feature creates a file .git/refs/heads/feature containing a commit hash.
- When you commit, the branch reference automatically moves forward.
- HEAD is a special pointer that says which branch (or commit) you’re on.
Tags are also refs but usually don’t move — they’re fixed labels.
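To show how little is involved in resolving a ref, here is a sketch that follows HEAD to a commit hash in a throwaway directory laid out like .git. It assumes refs are stored as loose files; packed refs (.git/packed-refs) and other storage backends are not handled. The hashes are fake placeholders, and resolve_ref and write_file are names made up for this example.

```python
import os
import tempfile


def resolve_ref(git_dir, name="HEAD"):
    """Follow symbolic refs ("ref: <path>") until a commit hash is reached."""
    for _ in range(10):  # guard against cycles between symbolic refs
        with open(os.path.join(git_dir, name), encoding="ascii") as f:
            value = f.read().strip()
        if not value.startswith("ref: "):
            return value  # a plain hash: either a branch file or a detached HEAD
        name = value[len("ref: "):]
    raise ValueError("too many levels of symbolic refs")


def write_file(git_dir, name, text):
    """Create a ref file, including any missing parent directories."""
    path = os.path.join(git_dir, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="ascii") as f:
        f.write(text)


git_dir = tempfile.mkdtemp()  # stands in for a real .git directory
OLD, NEW = "1" * 40, "2" * 40  # fake commit hashes

write_file(git_dir, "HEAD", "ref: refs/heads/feature\n")
write_file(git_dir, "refs/heads/feature", OLD + "\n")
before = resolve_ref(git_dir)

# Committing on the branch rewrites the branch file; HEAD itself is unchanged.
write_file(git_dir, "refs/heads/feature", NEW + "\n")
after = resolve_ref(git_dir)
print(before, after)
```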
Content-Addressable Storage: Everything is Named by Hash
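Every object is stored under a filename derived from the SHA-1 hash of its content (plus a short type-and-size header). The first two hex characters name a subdirectory and the remaining 38 name the file: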
.git/objects/de/adbabe12... → a blob, tree, or commit
This means:
- Identical content → same hash → stored only once (deduplication).
- Any change in content → completely different hash.
- Tampering is detectable — the hash would change.
You can inspect objects using:
git cat-file -p <hash>
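The storage scheme is small enough to imitate. The sketch below writes and reads loose objects in a temporary directory using the same layout (two-character subdirectory, zlib-compressed header plus content). It is an illustration under those assumptions, not a replacement for git cat-file; packfiles are ignored, and write_object and read_object are hypothetical names.

```python
import hashlib
import os
import tempfile
import zlib


def object_path(objects_dir, object_id):
    """Map an object ID to <objects_dir>/<first 2 hex chars>/<remaining 38>."""
    return os.path.join(objects_dir, object_id[:2], object_id[2:])


def write_object(objects_dir, obj_type, body):
    """Store an object under its own hash and return that hash."""
    data = f"{obj_type} {len(body)}".encode("ascii") + b"\0" + body
    object_id = hashlib.sha1(data).hexdigest()
    path = object_path(objects_dir, object_id)
    if not os.path.exists(path):  # identical content is stored only once
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(data))
    return object_id


def read_object(objects_dir, object_id):
    """Load an object, verify its hash, and return (type, body)."""
    with open(object_path(objects_dir, object_id), "rb") as f:
        data = zlib.decompress(f.read())
    if hashlib.sha1(data).hexdigest() != object_id:
        raise ValueError("object is corrupt: hash mismatch")
    header, _, body = data.partition(b"\0")
    obj_type = header.split(b" ")[0].decode("ascii")
    return obj_type, body


objects_dir = tempfile.mkdtemp()  # stands in for .git/objects
first = write_object(objects_dir, "blob", b"Hello, world!")
second = write_object(objects_dir, "blob", b"Hello, world!")
print(first == second, read_object(objects_dir, first))
```

The hash check in read_object is the integrity property in miniature: if the stored bytes change, the recomputed hash no longer matches the name.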
What Happens When You Commit?
Here’s a simplified view:
- Files in the index are hashed and stored as blobs (this already happens when you stage them with git add).
- Git builds a tree for each directory, referencing blobs and subtrees.
- A commit object is created pointing to the root tree and parent commit.
- The current branch reference is updated to point to this new commit.
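Conceptually there are no deltas, just snapshots. Even though it sounds inefficient, Git compresses every object and packs them (via git gc) to stay lean. Packfiles may store similar objects as deltas internally, but that is a storage detail and does not change the snapshot model.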
Key Takeaways
- Git stores snapshots, not diffs.
- Everything is immutable — once created, objects never change.
- The SHA-1 hash acts as a unique ID and integrity check.
- Branches are just moveable pointers to commits.
- The history is a graph built from commit parent links.
Understanding this model helps demystify commands like git rebase, git reset, and git reflog, because you realize they’re just moving pointers or creating new objects.
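A toy model makes the "pointers versus new objects" idea tangible. The sketch below is not how Git implements these commands; it is a simplified illustration with hypothetical one-letter commit IDs, a dictionary of immutable commits, a branch pointer, and a log of where the pointer has been.

```python
# Toy model: real commit IDs are hashes, and real commits carry trees and metadata.
commits = {"A": None, "B": "A", "C": "B"}  # commit -> parent; entries are never edited
refs = {"main": "C"}
reflog = []  # (ref name, old target, new target)


def move_ref(name, target):
    """Point a ref at another commit and remember where it was."""
    reflog.append((name, refs[name], target))
    refs[name] = target


def history(commit):
    """List a commit and its ancestors by following parent links."""
    chain = []
    while commit is not None:
        chain.append(commit)
        commit = commits[commit]
    return chain


# Reset-like step: only the pointer moves; commit C still exists.
move_ref("main", "B")

# Rebase-like step: replaying C onto A creates a new commit C2; C is untouched.
commits["C2"] = "A"
move_ref("main", "C2")

print(history(refs["main"]))  # ['C2', 'A']
print(reflog)  # [('main', 'C', 'B'), ('main', 'B', 'C2')]
```

In this model the old commit C is still in the dictionary after both steps, which mirrors why a record of previous pointer positions is enough to get back to it.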
Basically, Git is a smart way to version files using hashes, trees, and chains. Once you see it, it clicks.
The above is the detailed content of Understanding Git's Internal Data Model. For more information, please follow other related articles on the PHP Chinese website!
