Duplicate data bias #889

Answered by MaxHalford
occoder asked this question in Q&A
Mar 21, 2022 · 1 comments · 1 reply

Hello.

It's an interesting question. This sounds a tad application specific. Why would your application be generating duplicate data? Do you have an example in mind? Or is this something you have encountered?

There are several ways to skip observations that have already been seen. You could hash each observation and maintain a lookup table to check whether it has been seen before. You could also cap the memory of this lookup table so it doesn't grow too big.
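To make the hashing idea concrete, here is a minimal sketch in plain Python. It assumes each observation is a JSON-serializable dict of features. `SeenFilter`, `is_duplicate`, and `max_size` are hypothetical names made up for this illustration, not part of any library. The table evicts its least recently seen hash once it is full, which is only one possible way to bound memory.

```python
import hashlib
import json
from collections import OrderedDict


class SeenFilter:
    """Hypothetical helper: remembers hashes of recently seen observations."""

    def __init__(self, max_size=10_000):
        self.max_size = max_size      # upper bound on stored hashes
        self._seen = OrderedDict()    # hash -> None, oldest entry first

    @staticmethod
    def _hash(x):
        # Sort keys so that feature order does not change the hash.
        payload = json.dumps(x, sort_keys=True, default=str).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def is_duplicate(self, x):
        key = self._hash(x)
        if key in self._seen:
            self._seen.move_to_end(key)      # mark as recently seen
            return True
        self._seen[key] = None
        if len(self._seen) > self.max_size:
            self._seen.popitem(last=False)   # evict the least recently seen hash
        return False


stream = [
    {"a": 1.0, "b": 2.0},
    {"b": 2.0, "a": 1.0},   # same observation, different key order
    {"a": 3.0, "b": 4.0},
]
dedup = SeenFilter(max_size=2)
fresh = [x for x in stream if not dedup.is_duplicate(x)]
```

In a streaming loop, you would only update the model on observations for which `is_duplicate` returns `False`. Note that a bounded table can forget old observations, so a duplicate that arrives long after the original may still get through.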

I would also like to point out that "model bias" is a bit vague there. In fact, I don't think the term means anything in this context. One could argue that seeing the same observation several times co…

1 reply
@smastelini

Answer selected by MaxHalford
3 participants