How does Git use SHA-1 hashes to guarantee that no part of a project's history can be changed without detection?
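Git uses SHA-1 (Secure Hash Algorithm 1) hashes as fixed-size fingerprints for every object it stores, so any alteration to a project's history changes the identifiers involved and can be detected. A SHA-1 hash is a 160-bit value, written as a 40-character hexadecimal string, computed from the exact bytes of the data. Three properties matter here: determinism (the same input always produces the same output), collision resistance (it should be computationally infeasible to find two different inputs that produce the same hash), and preimage resistance (it should be infeasible to find an input that matches a given hash). Practical collisions against SHA-1 were demonstrated in 2017 (the SHAttered attack), so current Git versions use a collision-detecting SHA-1 implementation, and newer versions can also create repositories that use SHA-256. The mechanism described here is the same with either hash function.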
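As a small illustration of these properties, the sketch below hashes a few byte strings with Python's standard hashlib module. The helper name sha1_hex is only an illustrative choice, and this is plain SHA-1 over raw bytes, not yet the exact way Git names its objects.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as a 40-character hex string."""
    return hashlib.sha1(data).hexdigest()


a = sha1_hex(b"hello world\n")
b = sha1_hex(b"hello world\n")   # same input again
c = sha1_hex(b"hello world!\n")  # one extra character

print(len(a))   # 40
print(a == b)   # True: the hash is deterministic
print(a == c)   # False: a small change gives a different hash
```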
Git stores three main types of objects, each identified by the SHA-1 hash of its content (prefixed with a short header that records the object's type and size):
1. Blob objects: These store the exact content of a file. If even a single byte within a file changes, its blob object's SHA-1 hash will be completely different.
2. Tree objects: These represent a directory snapshot. A tree object contains a list of entries, each pointing to a blob object (for a file) or another tree object (for a subdirectory), along with its name and permissions. The hash of a tree object is calculated from its entire content—the list of its entries. If any file or subdirectory content, name, or permission within that directory changes, the tree object's hash changes.
3. Commit objects: These represent a specific point in the project's history. A commit object stores: a pointer (SHA-1 hash) to its top-level tree object (which represents the entire project's file and directory structure at that moment), pointers (SHA-1 hashes) to its parent commit(s) (linking it to previous history), the author, committer, date, and the commit message. The commit object's own SHA-1 hash is calculated from all this information combined.
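The three object types can be sketched in Python as follows. This is a minimal approximation, assuming the usual on-disk layout (a header of the form "type size" followed by a NUL byte, then the body). The function names git_object_id, tree_body, and commit_body are hypothetical helpers, and the entry sorting is simplified compared with what Git actually does for directories.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash '<type> <size>\\0<body>', which is how Git derives an object ID."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """Serialize (mode, name, hex_id) entries; sorting by name is a simplification."""
    out = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return out


def commit_body(tree_id, parent_ids, author, committer, message) -> bytes:
    """Serialize the fields a commit records: tree, parents, identities, message."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parent_ids]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return "\n".join(lines).encode()


blob_id = git_object_id("blob", b"hello\n")
print(blob_id)  # ce013625030ba8dba906f756967f9e9ca394464a

tree_id = git_object_id("tree", tree_body([("100644", "hello.txt", blob_id)]))

# The identity string below is made-up example data.
who = "A U Thor <author@example.com> 1700000000 +0000"
commit_id = git_object_id(
    "commit", commit_body(tree_id, [], who, who, "Initial commit\n")
)
print(tree_id, commit_id)
```

The blob ID printed here should match what git hash-object reports for a file containing the single line "hello", since both hash the same header and bytes.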
This system guarantees detection of changes through a chain of dependencies. If any part of a project's history is altered:
Changing a file's content: This changes the blob object's SHA-1 hash. Because the blob's hash is referenced by its containing tree object, the tree object's hash also changes. This propagates up to the top-level tree object, changing its hash.
Changing a directory's structure, file name, or permissions: This directly changes the tree object's content, thus changing its hash, which propagates up to the top-level tree object.
Changing the top-level tree or any metadata (author, message, date) of a specific commit: Since the commit object's hash is calculated from all its contents, including the top-level tree pointer and its own metadata, any such change will result in a different SHA-1 hash for that commit.
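The propagation described in these cases can be sketched with a tiny, hypothetical project made of a README and a src/main.py file. The helpers object_id, tree_id, and snapshot are illustrative names rather than Git APIs, and the tree serialization is a simplified approximation of Git's format.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    """Hash '<type> <size>\\0<body>' the way Git names objects."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(entries) -> str:
    """ID of a tree built from (mode, name, hex_id) entries (simplified sorting)."""
    body = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return object_id("tree", body)


def snapshot(main_py: bytes, mode: str = "100644"):
    """Return (blob, src tree, root tree) IDs for the hypothetical project."""
    readme = object_id("blob", b"demo\n")
    blob = object_id("blob", main_py)
    src = tree_id([(mode, "main.py", blob)])
    root = tree_id([("100644", "README", readme), ("40000", "src", src)])
    return blob, src, root


before = snapshot(b"print('hi')\n")
after_edit = snapshot(b"print('hi!')\n")                # file content changed
after_chmod = snapshot(b"print('hi')\n", mode="100755")  # only the mode changed

# A content edit changes the blob, its tree, and the top-level tree.
print([x != y for x, y in zip(before, after_edit)])   # [True, True, True]
# A mode change leaves the blob alone but still changes both trees.
print([x != y for x, y in zip(before, after_chmod)])  # [False, True, True]
```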
Crucially, because each commit object includes the SHA-1 hash of its parent commit(s) as part of its own content, any change to an old commit's SHA-1 hash (due to changes in its files, directories, or metadata) causes the SHA-1 hashes of all subsequent commits that trace back to it to change as well. This creates an unbroken, cryptographically verifiable chain. If a Git repository's history is ever altered, the SHA-1 hashes of the affected commit and all its descendants will no longer match the originally recorded hashes. When Git transfers history (e.g., during a fetch or clone operation), it derives each object's ID from the content it receives, so altered content cannot keep its old ID. The rewritten commits show up as a diverging history with different hashes, and git fsck reports any stored object whose content no longer matches its ID.
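The chaining effect can be sketched by building two short commit chains that differ only in the message of the middle commit. This is an approximation of Git's commit format under stated assumptions: commit_id and build_chain are hypothetical helpers, the author line is made-up example data, and every commit points at Git's well-known empty tree to keep the example small.

```python
import hashlib

# ID of the empty tree object in SHA-1 repositories.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# Made-up identity and timestamp, used for both author and committer.
AUTHOR = "A U Thor <author@example.com> 1700000000 +0000"


def commit_id(tree: str, parents, message: str) -> str:
    """Hash a commit body that embeds the tree ID and the parent IDs."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {AUTHOR}", f"committer {AUTHOR}", "", message]
    body = "\n".join(lines).encode()
    header = f"commit {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def build_chain(messages):
    """Create a linear history; each commit records its predecessor's ID."""
    ids, parents = [], []
    for msg in messages:
        cid = commit_id(EMPTY_TREE, parents, msg + "\n")
        ids.append(cid)
        parents = [cid]
    return ids


original = build_chain(["first", "second", "third"])
tampered = build_chain(["first", "second (edited)", "third"])

# Position by position: which commit IDs still match?
print([o == t for o, t in zip(original, tampered)])  # [True, False, False]
```

The third commit has the same tree, message, and metadata in both chains, yet its ID differs because the parent ID it records has changed.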