Some clarification for Chapter 8 Observer Bias model formulation

Chapter 8 makes an interesting point about observer bias on the Red Line, but it took me a while to understand why the distribution of wait times observed by passengers is shifted higher than the true distribution. After some thought, it turns out I was assuming a more complicated model than the text does. I don't think either model is unreasonable; my intuition just wasn't on the same page, and I didn't find an explicit reason in the text to rule out my model. The correct model might be obvious to most, but perhaps the clarification below will help someone in the future:

The text reads:

> The average time between trains, as seen by a random passenger, is substantially higher than the true average.
> Why? Because a passenger is more like (sic) to arrive during a large interval than a small one. Consider a simple example: suppose that the time between trains is either 5 minutes or 10 minutes with equal probability. In that case the average time between trains is 7.5 minutes.
> But a passenger is more likely to arrive during a 10 minute gap than a 5 minute gap; in fact, twice as likely. If we surveyed arriving passengers, we would find that 2/3 of them arrived during a 10 minute gap, and only 1/3 during a 5 minute gap. So the average time between trains, as seen by an arriving passenger, is 8.33 minutes.

For this to be true, I believe we have to assume that a passenger arriving 0 minutes after the previous train has the same observed waiting time as a passenger arriving any `n > 0` minutes after it. In other words, a passenger who just missed the previous train and waited the full gap is treated the same as a passenger who only just made it onto the next train.
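
To make that assumption concrete, here is a minimal sketch of the 5/10 minute example from the quoted passage. It assumes passengers arrive at a constant rate and that every passenger reports the full length of the gap they arrived in; the names `gaps`, `biased`, and `observed_mean` are mine, not the book's.

```python
from fractions import Fraction

# Gap lengths in minutes and the probability of each, from the quoted example.
gaps = {5: Fraction(1, 2), 10: Fraction(1, 2)}

# True average gap: weight each gap by how often it occurs.
true_mean = sum(gap * p for gap, p in gaps.items())

# Assumption: passengers arrive at a constant rate, so the chance of landing
# in a gap is proportional to (gap length) * (how often that gap occurs).
total = sum(gap * p for gap, p in gaps.items())
biased = {gap: gap * p / total for gap, p in gaps.items()}

# Every passenger reports the full gap, no matter when they arrived in it.
observed_mean = sum(gap * w for gap, w in biased.items())

print(float(true_mean))        # 7.5
print(biased[10], biased[5])   # 2/3 1/3
print(float(observed_mean))    # 8.333...
```

Under these assumptions the numbers match the text: 2/3 of passengers land in a 10 minute gap and the observed average is 25/3, about 8.33 minutes.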

My intuition was as follows: in reality, a passenger can arrive at the 9th minute of a 10 minute gap or the 4th minute of a 5 minute gap. Both passengers wait 1 minute. If you model it this way, the biased distribution actually shifts to the left. Why? Let's say there are two passengers arriving per minute (`lam = 2`). For a 2 minute gap, you might have the following wait times for 4 passengers: `[0, 0, 1, 1]`. For a 3 minute gap, you might have the following wait times for 6 passengers: `[0, 0, 1, 1, 2, 2]`. A passenger who waits 0 has arrived just before the train departs. For an `n` minute gap, wait time `n-1` indicates the passenger arrived within the first minute after the previous train departed. From the 2 minute and 3 minute gaps above, you can deduce that across all trains `P(wait n) <= P(wait n-1)`. That is, there is always a chance for a passenger to wait 0 minutes, but for a 5 minute gap, for example, it's impossible to wait 6 minutes.
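
As a quick check of that counting argument, the sketch below pools the waits from one 2 minute gap and one 3 minute gap and tabulates how often each wait occurs. The helper `waits_for_gap` is a hypothetical name of mine, and it assumes exactly `lam` passengers arrive in every whole minute of a gap.

```python
from collections import Counter

lam = 2  # passengers arriving per minute, as in the example above

def waits_for_gap(gap_minutes, lam):
    """Whole-minute waits for one gap, assuming lam arrivals in every minute."""
    return [w for w in range(gap_minutes) for _ in range(lam)]

# One 2 minute gap and one 3 minute gap, pooled together.
waits = waits_for_gap(2, lam) + waits_for_gap(3, lam)
counts = Counter(waits)
total = len(waits)

for w in sorted(counts):
    print(w, counts[w] / total)
# 0 0.4
# 1 0.4
# 2 0.2
```

The probabilities never increase as the wait gets longer: every gap contributes passengers who wait 0 minutes, but only the longer gap contributes passengers who wait 2.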

Here is some code to simulate the process and the resulting histogram.
```python
from math import floor

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(0)

n = 50000  # Number of trains.
lam = 2    # Passengers arriving per minute.
T = np.random.normal(10, 2, n)  # True time between trains (minutes).
W1 = []    # Passengers' observed waiting time (my initial formulation).
W2 = []    # Passengers' observed waiting time (Think Bayes formulation).

for t in T:
    size = int(floor(t * lam))  # This many passengers will end up on the next train.
    W1 += list(np.random.uniform(0, floor(t), size))  # Each passenger waits part of the gap.
    W2 += list(np.ones(size) * t)                     # Each passenger reports the full gap.

bins = int(T.max() - T.min())
# `density=True` replaces `normed=True`, which newer Matplotlib releases no longer accept.
# Raw strings keep `\mu` from being read as an escape sequence.
plt.hist(T, color='red', bins=bins, alpha=0.3, density=True, label=r'True wait $\mu=%.3f$' % T.mean())
plt.hist(W1, color='blue', bins=bins, alpha=0.3, density=True, label=r'Observed wait $\mu=%.3f$' % np.mean(W1))
plt.hist(W2, color='green', bins=bins, alpha=0.3, density=True, label=r'Observed wait simplified $\mu=%.3f$' % np.mean(W2))
plt.legend(fontsize=8)
plt.show()
```

![figure_1](https://user-images.githubusercontent.com/8015228/34655951-13ef059e-f3e0-11e7-8aa6-f7bbd2a9ee3c.png)
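
For a rough sanity check on the two observed means, the sketch below skips the passenger loop and uses the standard length-biased formulas instead: mean observed gap `E[T^2] / E[T]`, and, assuming arrivals are uniform within a gap, mean wait `E[T^2] / (2 E[T])`. This is my own addition, not something from the book, and it uses NumPy's `default_rng`, so the sampled gaps differ from the simulation above.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.normal(10, 2, 50000)  # same gap distribution as the simulation above

# Length-biased mean gap: each gap is weighted by its own length.
biased_gap_mean = np.mean(T**2) / np.mean(T)

# Assumption: arrivals are uniform within a gap, so the average wait is
# half of the length-biased gap.
mean_wait = biased_gap_mean / 2

print(T.mean(), biased_gap_mean, mean_wait)
```

For a mean of 10 and standard deviation of 2, the formulas give about 10.4 for the observed gap and about 5.2 for the wait. The simulated `W1` mean should come out a little lower than that, since it draws waits from `floor(t)` rather than `t`.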

