DIONE
Here we demonstrate our DIONE Bayesian network learning program by solving a probability puzzle.
Observed data typically do not have deterministic relations to the possible causes that may explain them. In most real-world data analysis problems, data are related to their causes only probabilistically. This is why probability is an essential concept for data analysis and needs to be well understood to make correct inferences from observed data.
In this blog we will examine probability puzzles and paradoxes and their possible solutions, with the aim of better understanding probability and its application to data analysis.
Solutions to probability puzzles can be very counter-intuitive. This is often related to conditional probabilities and Bayesian reasoning, as in the famous Monty Hall Problem.
Monty Hall Problem
Suppose you’re on a game show, and you’re given the choice of three doors. Behind one door is a car, behind the others, goats. You pick a door, say #1, and the host, who knows what’s behind the doors, opens another door, say #3, which has a goat. He says to you, "Do you want to pick door #2?" Is it to your advantage to switch your choice of doors? (vos Savant 1990)
Intuitive Solution
"If one door is shown to be a loser, that information changes the probability of either remaining choice, neither of which has any reason to be more likely, to 1/2", according to Robert Sachs, Ph.D. of George Mason University, in (vos Savant 1990).
Other Solution
"Yes; you should switch. The first door has a 1/3 chance of winning, but the second door has a 2/3 chance. Here’s a good way to visualize what happened. Suppose there are a million doors, and you pick door #1. Then the host, who knows what’s behind the doors and will always avoid the one with the prize, opens them all except door #777,777. You’d switch to that door pretty fast, wouldn’t you?", according to Marilyn vos Savant in (vos Savant 1990).
Who is right?
We can solve this problem with DIONE by loading a data set that represents the probability space of all possible games and learning a Bayesian network from these data. The data look like this:
Assuming the doors are numbered 1, 2 and 3, variable CarDoor represents the door that hides the car, GuessedDoor represents the door that is guessed by the player, and OpenedDoor represents the door that is opened by the host after the initial guess. Variables Stay and Switch indicate whether the player wins the car after staying with their initial guess and after switching respectively.
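As a rough illustration of what such a data set could contain, the sketch below enumerates every possible game in Python. It is not DIONE's input format: the function name `enumerate_games`, the boolean encoding of Stay and Switch, and the `Weight` column (the probability of each row) are assumptions made for this example only.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)

def enumerate_games():
    """Return one row per possible game, with its probability as a weight."""
    rows = []
    for car, guess in product(DOORS, DOORS):
        # The host may only open a door that is neither guessed nor hides the car.
        options = [d for d in DOORS if d != guess and d != car]
        for opened in options:
            # The door the player holds after switching: the one still closed.
            switched = next(d for d in DOORS if d not in (guess, opened))
            rows.append({
                "CarDoor": car,
                "GuessedDoor": guess,
                "OpenedDoor": opened,
                "Stay": guess == car,        # True if staying wins the car
                "Switch": switched == car,   # True if switching wins the car
                # Assumed column: 1/9 per (car, guess) pair, split evenly
                # over the doors the host is allowed to open.
                "Weight": Fraction(1, 9) / len(options),
            })
    return rows

rows = enumerate_games()
for row in rows:
    print(row)
```

The rows where the guess is correct appear twice, once per door the host could open, each with half the weight, so the weights of all rows sum to 1.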
After learning the network structure, we edit the network to reflect the given dependency structure of the problem, as shown on the right-hand side of this diagram. At the start of the show, there is a car behind one of the doors, represented by variable CarDoor, and the player makes an initial guess, represented by variable GuessedDoor. The door opened by the host, represented by variable OpenedDoor, depends on the door hiding the car and the player's initial guess. Depending on the door opened by the host, the player then decides to stay with the initial guess or to switch to the other still closed door, with variables Stay and Switch representing win or loss for each case. We calculate conditional probabilities of each node, as shown here for the nodes that represent the door hiding the car (CarDoor) and the door guessed by the player (GuessedDoor):
At the start of the show, we do not know which door hides the car, so each door has an equal probability of 1/3 of hiding it. We also do not know the player's initial guess, so each guess has a probability of 1/3.
If the initial guess was the door with the car, the game host opens one of the other two doors at random to show a goat. If the initial guess was a door with a goat, the host opens the other door with a goat, as shown here for the conditional probabilities of OpenedDoor after an initial guess by the player of Door 1:
The first line in the conditional probabilities table shows that, if the car is behind Door 1 and the player correctly guesses Door 1, the host opens Door 2 or Door 3 with equal probability 1/2. If the player guesses Door 1 and the car is behind Door 2, the host opens Door 3 with probability 1, and if the player guesses Door 1 and the car is behind Door 3, the host opens Door 2 with probability 1.
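The conditional probabilities described above can also be written down and checked by hand. The following Python sketch encodes the table for OpenedDoor and applies Bayes' rule to get the probability of each door hiding the car, given the guess and the opened door. It is plain arithmetic, not DIONE's inference procedure, and the names `p_opened` and `posterior_car` are hypothetical helpers introduced for this example.

```python
from fractions import Fraction

DOORS = (1, 2, 3)
# Prior P(CarDoor): each door hides the car with probability 1/3.
PRIOR = {d: Fraction(1, 3) for d in DOORS}

def p_opened(opened, car, guess):
    """P(OpenedDoor = opened | CarDoor = car, GuessedDoor = guess)."""
    # The host never opens the guessed door or the door hiding the car.
    allowed = [d for d in DOORS if d != car and d != guess]
    return Fraction(1, len(allowed)) if opened in allowed else Fraction(0)

def posterior_car(guess, opened):
    """P(CarDoor | GuessedDoor = guess, OpenedDoor = opened) by Bayes' rule."""
    # The guess is independent of the car's position, so its prior cancels.
    joint = {car: PRIOR[car] * p_opened(opened, car, guess) for car in DOORS}
    total = sum(joint.values())
    return {car: p / total for car, p in joint.items()}

# Rows of the OpenedDoor table for an initial guess of Door 1.
for car in DOORS:
    print("CarDoor", car, [str(p_opened(o, car, 1)) for o in DOORS])

# Posterior over CarDoor after guessing Door 1 and seeing Door 3 opened.
print(posterior_car(guess=1, opened=3))
```

For a guess of Door 1 and an opened Door 3, this arithmetic gives probability 1/3 for Door 1 and 2/3 for Door 2.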
References
Selvin, S. (1975). A Problem in Probability. The American Statistician. https://www.tandfonline.com/doi/abs/10.1080/00031305.1975.10479121
vos Savant, M. (1990). Game Show Problem. https://web.archive.org/web/20130121183432/http://marilynvossavant.com/game-show-problem/
Wikipedia. Monty Hall problem.