Master (Patient) Index and training dataset #2200

Answered by RobinL
niquola asked this question in Q&A

👋

Question 1 - what dataset should be used to train the model

Your intuition is correct: the m probabilities of the model measure characteristics of the data amongst truly matching records. If there are no matches within the training dataset, they cannot be estimated correctly.

You therefore probably want to estimate the m probabilities using a link_type=link_only job, matching one or more of your 'incoming record' datasets to the master patient index. For a generalisable model that works against many different 'incoming record' sources, you might want to use a dataset that includes records from a variety of incoming sources, to 'average out' the parameter estimates.
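To make the point about m probabilities concrete, here is a minimal toy sketch in plain Python. It is not Splink code: the records, the `person_id` ground-truth column and the `estimate_m` helper are all hypothetical, and it reads m off known labels purely for illustration, whereas a real model has to estimate it without them. It only shows why a training set with no true matches gives nothing to estimate m from, while linking incoming records to the master index does.

```python
from itertools import combinations, product

# Hypothetical toy records: (person_id, first_name, dob).
# person_id is an assumed ground-truth label, used here only for illustration.
master_index = [
    (1, "anna", "1990-01-01"),
    (2, "boris", "1985-05-05"),
    (3, "chen", "1978-09-09"),
]
incoming = [
    (1, "anna", "1990-01-01"),
    (2, "boris", "1985-05-06"),  # same person as master record 2, typo in dob
    (4, "dina", "2001-02-02"),   # not present in the master index
]


def estimate_m(pairs):
    """Return P(field agrees | pair is a true match) per field.

    Returns None when the pairs contain no true matches, because there is
    then nothing to estimate the m probabilities from.
    """
    matches = [(a, b) for a, b in pairs if a[0] == b[0]]
    if not matches:
        return None
    n = len(matches)
    return {
        "first_name": sum(a[1] == b[1] for a, b in matches) / n,
        "dob": sum(a[2] == b[2] for a, b in matches) / n,
    }


# Comparing the (already deduplicated) master index with itself: no true matches.
dedupe_pairs = list(combinations(master_index, 2))
m_from_master_only = estimate_m(dedupe_pairs)

# Comparing the master index with incoming records: two true matches.
link_pairs = list(product(master_index, incoming))
m_from_link = estimate_m(link_pairs)

print(m_from_master_only)  # None
print(m_from_link)         # {'first_name': 1.0, 'dob': 0.5}
```

With several incoming sources, the same calculation over pairs drawn from all of them would give the 'averaged out' estimates described above.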

Conversely, you don…

Answer selected by niquola