Birthday Paradox

If you asked how many people there need to be in a (post-covid) room for two people to have the same birthday, the correct answer would be 366, applying the pigeonhole principle. Well, technically 367, with Feburary 29th. In any case, the point stands, that 366 people in a single room could potentially all have different birthdays, and we would need one more person to ensure that two people would have the same birthday.

This all seems to make sense. Then, how many people does it take for there to be a 50% chance for two people to have the same birthday?

If you guessed 183, you're in for a surprise. Actually, it only takes 23 people in a room for there to be a 50% chance for any two people to have the same birthday. 23!! And actually, it only takes 70 people in a room for there to be 99.9% chance that two people have the same birthday!

How is this possible? This is known as the birthday paradox. It's not technically a paradox, but it rather an unintuitive idea. The key idea is this: we don't care about any single birthday, but we compare every combination of birthdays possible, so we actually have more comparisons than meets the eye. For instance, when we check if any of the 23 people share a birthday, we make 23 C 2 = 23!/(21! * 2!) = 253 comparisons.

A standard method of calculating probabilities is by calculating the chance that some event will not happen, and then subtract that from 1 (or 100%). In this case, to calculate the probability that two people will have the same birthday, we can just calculate the chance that nobody will have the same birthday. Let's walk through this example again with 23 people. When we select people at random, the first person will have a 100% chance not to share the same birthday with anybody else, by default. Then the next person will have a 364/365 chance not to share the same birthday as the first person. Then the next person will have a 363/365 chance to not have the same birthday as the first or second person. Then a 362/365 probability. Then 361/365. And so on. Therefore we get that the proability that 23 people will all have different birthdays is:

1 * 364/365 * 363/365 * ... 343/365 = 0.4927.

So, the chance that two people will have the same birthday is, magically, 1 - 0.4927 = 50.73, or, 50.73%!

Similarly, we can use this calculation to find that with 70 people, we have an astounding 99.91% chance for two people to have the same birthday.

A sample function to calculate the chance that two people will have the same birthday given n people can be written in Haskell as below:

              
                calcBdaySame:: Int -> Float
                calcBdaySame n | n > 365 = 1
                calcBdaySame n = 1 - foldr (\x acc-> acc * x/365) 1 [365,364..(365-n'+1)]
                  where n' = fromIntegral n

Or, if you prefer Python:

              
                def calc_bday_same(num_people):
                   if num_people > 365:
                       return 1
                   bday_not_same = 1    
                   while num_people:
                       bday_not_same *= (365 - num_people + 1)/365
                       num_people -= 1
                   return 1 - bday_not_same

The implication of the birthday paradox is that brute-force securtiy attacks can actually be a lot easier than it seems. For hash collision security attacks, it seems impossible to brute-force 2^128 GUIDs, but in reality, we only need to use about 2^64 until we have a 50% chance of a successful attack. Since the attack may only need to succeed once, having a 50% chance of success -- or in a different perspective, only having a 50% to defend -- is a very dangerous concept.

Hope you got a fun kick out of this article. Next time you are in a room with 22 other people (possibly after vaccination, or in zoom), ask them their birthdays -- the result may surprise you!