I've also worked on matching Fatal Encounters (FE) data to FARS data for police pursuit-related fatalities in WA State. In the process I have had numerous conversations with the person who runs the research department at the WA Traffic Safety Commission (WTSC), our state agency that collects and cleans the WA State crash data and sends it to NHTSA for FARS. Like you, I used the police involvement flag to identify the cases in FARS.
The matching I ran produced a Venn diagram: about 50% of the cases were identified in both the public (FE) and official (FARS) datasets, and the other half were split roughly evenly between cases found in only one or the other. All of the FE cases are backed by media articles, so they're verified. I then hand-checked the FE cases not found in FARS, first for exact date/location matches that the police involvement flag had missed (I found quite a few), and then the remaining cases for nearby dates/locations, since the media sometimes get these details wrong (only a couple of these seemed likely matches).
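For anyone wanting to reproduce this kind of two-pass comparison, here is a minimal sketch: an exact date/location pass, then a pass with a small date tolerance whose hits still need hand-checking. It assumes each record has already been reduced to a hypothetical `Crash` with an ID, a crash date, and a coarse location key. Real FE and FARS fields differ and would need to be mapped first, and the greedy one-to-one matching here is a simplification, not the method I actually used.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Crash:
    # Hypothetical minimal record; these are NOT actual FE or FARS column names.
    record_id: str
    crash_date: date
    location: str  # coarse key such as a city or county name (an assumption)


def match_records(fe, fars, max_days=0):
    """Greedy one-to-one matching of FE records to FARS records.

    A pair matches when the location keys are equal and the dates differ by
    at most max_days. Returns (matched_pairs, fe_only, fars_only).
    """
    unmatched_fars = list(fars)
    matched, fe_only = [], []
    for f in fe:
        candidates = [
            r for r in unmatched_fars
            if r.location == f.location
            and abs((r.crash_date - f.crash_date).days) <= max_days
        ]
        if candidates:
            # Prefer the FARS record with the closest date when several qualify.
            best = min(candidates, key=lambda r: abs((r.crash_date - f.crash_date).days))
            unmatched_fars.remove(best)
            matched.append((f, best))
        else:
            fe_only.append(f)
    return matched, fe_only, unmatched_fars
```

Running it once with `max_days=0` and again with a tolerance of a day or two separates the clear matches from the "nearby date" candidates. The lengths of the three returned lists are the three regions of the Venn diagram.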
On the public/FE side, the reason for missing cases is clear: if there's no digital media trace to be scraped, the pursuit will be invisible.
But on the FARS side it's not obvious why cases would be missed.
When I asked the WTSC research director about the missing cases, she said that the police involvement flag is not a reliable indicator, for several reasons. One is that the definition is very strict (I'd have to pull out my notes to detail this; happy to do so if you're interested). Another is that the system was not originally designed to capture this information, so it often just isn't entered. All of the data entry for this system is voluntary, and the effort made to verify this particular field varies from state to state. It is probably minimal, and more focused on removing flagged cases that don't meet the definition than on adding cases that were never flagged.
I raise all of this because your README states: "We excluded fatalities identified by other research organizations if a) we could not find news reports or other public records indicating a pursuit occurred, and b) we could not find a match in NHTSA’s “pursuit-involved” fatal crash data in FARS." Because each dataset is missing cases that the other one has, it's possible that some cases appear in neither dataset.
There is a statistical method for estimating this unobserved fraction: "mark-recapture," also known as "capture-recapture." If anyone in your organization is interested in working on something like this, please let me know. I think there could be both an academic paper and a media article in this, and it might even provide a methodology the Feds would approve for estimating this on a regular basis.
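To make the idea concrete, here is a minimal two-source sketch, treating FE and FARS as the two "captures." It uses the classic Lincoln-Petersen estimator and Chapman's bias-corrected version with its usual approximate variance. The function names are hypothetical, and the counts in the usage note are invented only to mirror the rough 50/25/25 split described above; they are not real data.

```python
import math


def lincoln_petersen(n1, n2, m):
    """Classic two-sample estimate of total population size: n1 * n2 / m."""
    if m == 0:
        raise ValueError("need at least one case found in both sources")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's bias-corrected estimator and its approximate variance."""
    n_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    var = ((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)) / ((m + 1) ** 2 * (m + 2))
    return n_hat, var


def estimate_unobserved(in_both, fe_only, fars_only):
    """Estimate how many fatalities appear in neither source (hypothetical helper)."""
    n1 = in_both + fe_only      # cases "captured" by FE
    n2 = in_both + fars_only    # cases "captured" by FARS
    observed = in_both + fe_only + fars_only
    n_hat, var = chapman(n1, n2, in_both)
    return {
        "observed": observed,
        "n_hat": n_hat,
        "unobserved": n_hat - observed,
        "se": math.sqrt(var),
    }
```

With hypothetical counts of 50 in both, 25 FE-only, and 25 FARS-only, Lincoln-Petersen gives 75 × 75 / 50 = 112.5 total, or about 12 cases missed by both sources. The key assumption is that the two sources find cases independently. If high-profile pursuits are more likely both to make the news and to be flagged in FARS, that positive dependence inflates the overlap and biases the estimate downward. Adding a third source would allow log-linear models that can partly account for such dependence.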