Duplicate data bias #889
-
The streaming data source may deliver duplicate data that the online model has already seen.
Replies: 1 comment 1 reply
-
Hello.
It's an interesting question, though it sounds a tad application-specific. Why would your application be generating duplicate data? Do you have an example in mind, or is this something you have encountered?
There are a number of things you could do to skip observations that have already been seen. You could hash each observation and maintain a lookup table to check whether it has been seen before. You could also cap the size of this lookup table so it doesn't grow too big.
I would also like to point out that "model bias" is a bit vague here. In fact, I don't think the term means anything in this context. One could argue that seeing the same observation several times could be helpful. This is the premise of the …
I hope this helps!
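For what it's worth, here is a minimal sketch of the hash-plus-bounded-lookup-table idea mentioned above. It assumes each observation is a dict of JSON-serializable features; the class name `BoundedDeduplicator` and the `max_size` parameter are made up for illustration and do not come from any library.

```python
import hashlib
import json
from collections import OrderedDict


class BoundedDeduplicator:
    """Hypothetical helper: remembers hashes of the last `max_size` distinct observations."""

    def __init__(self, max_size=10_000):
        self.max_size = max_size
        self._seen = OrderedDict()  # hash -> None, least recently seen first

    @staticmethod
    def _hash(x):
        # Sort keys so that feature order does not change the hash.
        payload = json.dumps(x, sort_keys=True, default=str).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def is_duplicate(self, x):
        """Return True if `x` was seen recently; otherwise record it and return False."""
        key = self._hash(x)
        if key in self._seen:
            self._seen.move_to_end(key)  # refresh, so repeated items stay remembered
            return True
        self._seen[key] = None
        if len(self._seen) > self.max_size:
            self._seen.popitem(last=False)  # evict the least recently seen hash
        return False


# Tiny table on purpose, to show what happens once an entry is evicted.
dedup = BoundedDeduplicator(max_size=2)
stream = [{"a": 1}, {"a": 2}, {"a": 1}, {"a": 3}, {"a": 2}]
fresh = [x for x in stream if not dedup.is_duplicate(x)]
```

In a training loop you would only update the model on observations for which `is_duplicate` returns `False`. Note the trade-off: with a capped table, a duplicate that arrives after its hash has been evicted is treated as new, which is what happens to the final `{"a": 2}` above. Whether the label should be part of the hash depends on your application.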