Online mean calculation (piece-by-piece)
const Mean = require('online-mean')
// Create a new mean object calling Mean() or new Mean()
// Each of these object stores inner variables
// So we need this extra step to be able to calculate means of multiple data flows
const m1 = Mean()
const m2 = new Mean() // same as const m2 = Mean()
// Update calling mean objects directly, as a function:
;[1, 2, 3, 4, 5].forEach(v => { m1(v) })
// Or via .fit() method. These two ways are identical!
;[4, 5, 6, 7, 8].forEach(v => { m2.fit(v) })
// Using with arrays:
m1([9, 10, 11]) // Keep in mind - this line updates m1 with 3 new data values, not just averages the array
// Get mean value calling mean object with no arguments:
console.log('m1:', m1()) // -> 5.625
// Or via .value getter
console.log('m2:', m2.value) // -> 6
// Total number of observations:
console.log('m2 n:', m2.n) // -> 5
// Merge multiple mean objects:
const m3 = Mean.merge(m1, m2)
console.log('m3:', m3.value) // -> ~5.769
// 'n' and 'value' are not just object keys, but object getters.
// If you really need to change their values use .setN() and .setValue() methods:
m2.setValue(0)
m2.setN(0)
console.log(m2.value, m2.n) // -> 0 0
A classical formula of a sample mean is defined as follows:
ān = (Σi=1..n xi) / n
It basically iterates over all values of x
, sums them and divides on total number of samples. That works when the length of x
is small and you don't need to update the mean when new values added. To calculate mean ā
in on-line fashion we will use a recursive formula:
ān = ān-1 + (xn - ān-1) / n
It's still based on the classical formula, however instead of summing all values we calculate ā
recursively. Derivation:
ān = (Σi=1..n xi) / n
ān = (Σi=1..n-1 xi + xn) / n
ān = ((n-1) × ān-1 + xn) / n
ān = ān-1 + (-ān-1 + xn) / n
ān = ān-1 + (xn - ān-1) / n
To merge multiple means without iterating over all values of x, we will expand the formula a little:
ānm = (Σi=1..n xi + Σj=1..m xj) / (n + m)
ānm = (n * ān + m * ām) / (n + m)
Such multiplications in the numerator could produce quite big numbers, so it's better to change the expression:
ānm = (n * ān + m * ān - m * ān + m * ām) / (n + m)
ānm = ān + (- m * ān + m * ām) / (n + m)
ānm = ān + (ām - ān) * m / (n + m)
Much better! :)