Hash Collision
A hash collision occurs when two distinct inputs produce the same output hash value from a cryptographic hash function, potentially undermining the integrity guarantees that the hash function is designed to provide.
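Cryptographic hash functions are designed to be collision-resistant, meaning it should be computationally infeasible to find two different inputs that produce the same hash output. However, due to the pigeonhole principle (an effectively unbounded set of possible inputs mapped to a finite output space), collisions are mathematically inevitable. The security question is whether an attacker can find them in a practical timeframe. For a hash with an n-bit output, a generic birthday attack is expected to find a collision after roughly 2^(n/2) hash evaluations, so a function is considered broken when collisions can be found with substantially less work than that.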
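The inevitability of collisions is easy to observe once the output space is made small. The following is a minimal sketch, not an attack on any real hash function: it truncates SHA-256 to a few bytes (an assumption made purely for illustration) and searches sequential inputs until two of them share the same truncated digest. The function names truncated_hash and find_truncated_collision are hypothetical helpers introduced here.

```python
import hashlib
from itertools import count
from typing import Dict, Tuple


def truncated_hash(data: bytes, n_bytes: int) -> bytes:
    """Return the first n_bytes of SHA-256(data), acting as a toy short hash."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_truncated_collision(n_bytes: int = 3) -> Tuple[bytes, bytes]:
    """Search sequential inputs until two share the same truncated digest.

    With an output of 8 * n_bytes bits, a collision is expected after
    roughly 2 ** (4 * n_bytes) attempts (the birthday bound).
    """
    seen: Dict[bytes, bytes] = {}  # truncated digest -> first input that produced it
    for i in count():
        msg = str(i).encode()
        digest = truncated_hash(msg, n_bytes)
        if digest in seen:
            return seen[digest], msg
        seen[digest] = msg
    raise AssertionError("unreachable: count() never terminates")


if __name__ == "__main__":
    a, b = find_truncated_collision(3)
    print(a, b, truncated_hash(a, 3).hex())
```

At 24 bits the search finishes after a few thousand attempts. Each additional byte of output multiplies the expected work by about 16, which is why the same brute-force approach is infeasible against the full 256-bit digest.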
Hash collision attacks have had significant real-world impact. Practical collision attacks against the MD5 algorithm were published in 2004, and researchers later demonstrated that these attacks could be used to forge digital certificates. Similarly, the SHA-1 algorithm was theoretically weakened by 2005, and researchers at CWI Amsterdam and Google demonstrated a practical collision (the SHAttered attack) in 2017, accelerating its deprecation for security-sensitive applications.
The consequences of collision vulnerabilities are severe. An attacker who can produce collisions may be able to forge digital signatures, create fraudulent certificates, or substitute malicious files that pass integrity checks. Modern security standards require the use of collision-resistant hash functions such as SHA-256 or SHA-3. Organizations should audit their systems to ensure they are not relying on deprecated hash functions like MD5 or SHA-1 for security-critical operations such as certificate validation or code signing. Password storage is a related but separate concern: its security depends on preimage resistance and deliberately slow hashing rather than on collision resistance, so it calls for a dedicated password-hashing function such as bcrypt, scrypt, or Argon2 instead of a fast general-purpose hash.
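As an illustration of an integrity check built on a collision-resistant function, the sketch below computes a file's SHA-256 digest with Python's standard hashlib module and compares it to an expected value. The helper names sha256_hex and verify_file are hypothetical, and the sketch assumes the expected digest was obtained through a trusted channel; it is not a complete verification scheme.

```python
import hashlib
import hmac


def sha256_hex(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 digest of a file as a lowercase hex string."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path: str, expected_hex: str) -> bool:
    """Check a file against an expected SHA-256 digest (hex, any case)."""
    # compare_digest performs the comparison in constant time.
    return hmac.compare_digest(sha256_hex(path), expected_hex.strip().lower())
```

A call such as verify_file("release.tar.gz", published_digest) returns True only when the file's digest matches the published one; both the file name and published_digest are placeholders here.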