Fix bug in dpMean$release to fix "variable 'fun' not found" error #49

MeganFantes · 2019-06-05T16:49:30Z

Bug found 6/4

error:

Running the dp-mean.Rmd vignette fails at the line boot_mean$release(PUMS5extract10000) with the error:

Error in formals(targetFunc) : object 'fun' not found

source:

mechanism-bootstrap.R, line 41:

mechanismBootstrap$methods(
    bootStatEval = function(xi) {
        fun.args <- getFuncArgs(fun, inputList=list(...), inputObject=.self)
        input.vals = c(list(x=x), fun.args)
        stat <- do.call(boot.fun, input.vals)
        return(stat)
})

bootStatEval passes a variable called fun to getFuncArgs, but there is no fun parameter in the method signature.
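For reference, the same kind of error can be reproduced outside the package. This is a minimal sketch with plain functions: get_arg_names is a hypothetical stand-in for getFuncArgs (it only calls formals, as the error message suggests), and broken_eval / fixed_eval are illustrative names, not package code.

```r
# Hypothetical stand-in for getFuncArgs: it only inspects the target's formals.
get_arg_names <- function(targetFunc) names(formals(targetFunc))

# Mirrors the broken method: `fun` is neither a parameter nor a local variable,
# so R looks it up in the enclosing environments and fails.
broken_eval <- function(xi) get_arg_names(fun)

# One possible direction for a fix: make `fun` an explicit parameter.
fixed_eval <- function(xi, fun) get_arg_names(fun)

# Capture the error message instead of stopping the script.
msg <- tryCatch(broken_eval(1:3), error = function(e) conditionMessage(e))
print(msg)

# With `fun` supplied, the formals of the statistic can be inspected.
print(fixed_eval(1:3, sd))
```

The error is only raised when the argument is actually evaluated inside formals(), which is why the message names formals(targetFunc) rather than bootStatEval.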


MeganFantes · 2019-06-06T21:05:15Z

tracing the bug:

the boot_mean object is created in dp-mean.Rmd:

boot_mean <- dpMean$new(mechanism='mechanismBootstrap', var.type='numeric', 
                        variable='income', n=10000, epsilon=0.1, rng=c(0, 750000), 
                        n.boot=n.boot)

Then we have the call:

boot_mean$release(PUMS5extract10000)

which refers to the release method of a dpMean object called boot_mean. This throws the error above.

release method of dpMean object in statistic_mean.R:

dpMean$methods(
    release = function(data, ...) {
        x <- data[, variable]
        sens <- diff(rng) / n
        .self$result <- export(mechanism)$evaluate(mean, x, sens, .self$postProcess, ...)
})
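As a quick check of the sens line, here is the arithmetic with the values from the dpMean$new call above (rng = c(0, 750000), n = 10000). Reading this as the largest change in the mean caused by changing one record is my interpretation, not something stated in the package code shown here.

```r
# Values taken from the dpMean$new(...) call in the vignette.
rng <- c(0, 750000)
n <- 10000

# diff(rng) is the width of the allowed range; dividing by n gives 75.
sens <- diff(rng) / n
print(sens)
```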

export(mechanism) exports all fields and methods of the mechanism passed into the object.
In this case, the mechanism is the mechanismBootstrap, which has the method evaluate

evaluate method of the mechanismBootstrap class in mechanism-bootstrap.R:

mechanismBootstrap$methods(
    evaluate = function(fun, x, sens, postFun) {
        x <- censordata(x, .self$var.type, .self$rng)
        x <- fillMissing(x, .self$var.type, .self$impute.rng[0], .self$impute.rng[1])
        epsilon.part <- epsilon / .self$n.boot
        release <- replicate(.self$n.boot, bootstrap.replication(x, n, sens, epsilon.part, fun=.self$bootStatEval))
        std.error <- .self$bootSE(release, .self$n.boot, sens)
        out <- list('release' = release, 'std.error' = std.error)
        out <- postFun(out)
        return(out)
})

Interesting to note that release passes the ... operator into evaluate, but ... is not in the method signature of evaluate.
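A small sketch of why this can go unnoticed, using plain functions rather than Reference class methods (all names here are illustrative assumptions). When the forwarded ... is empty the call succeeds; it only fails once an extra argument is actually supplied.

```r
# Without `...` in the signature, extra arguments are rejected.
evaluate_no_dots <- function(fun, x) fun(x)

# With `...` in the signature, extra arguments are forwarded to the statistic.
evaluate_with_dots <- function(fun, x, ...) fun(x, ...)

# Plays the role of release(data, ...): it forwards whatever it received.
caller <- function(x, ...) evaluate_no_dots(mean, x, ...)

# Empty `...`: nothing extra is passed, so there is no error.
ok <- caller(c(1, 2, 3))

# Non-empty `...`: the callee has no place for na.rm, so the call fails.
err <- tryCatch(caller(c(1, 2, NA), na.rm = TRUE),
                error = function(e) conditionMessage(e))

# The version with `...` passes na.rm through to mean().
fwd <- evaluate_with_dots(mean, c(1, 2, NA), na.rm = TRUE)

print(ok)
print(err)
print(fwd)
```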

According to the stack trace from the error, the failure surfaces inside the replicate() call, which runs bootstrap.replication n.boot times. The problem is in bootstrap.replication.

bootstrap.replication function in mechanism-bootstrap.R:

bootstrap.replication <- function(x, n, sensitivity, epsilon, fun) {
    partition <- rmultinom(n=1, size=n, prob=rep(1 / n, n))
    max.appearances <- max(partition)
    probs <- sapply(1:max.appearances, dbinom, size=n, prob=(1 / n))
    stat.partitions <- vector('list', max.appearances)
    for (i in 1:max.appearances) {
        variance.i <- (i * probs[i] * (sensitivity^2)) / (2 * epsilon)
        stat.i <- fun(x[partition == i])
        noise.i <- dpNoise(n=length(stat.i), scale=sqrt(variance.i), dist='gaussian')
        stat.partitions[[i]] <- i * stat.i + noise.i
    }
    stat.out <- do.call(rbind, stat.partitions)
    return(apply(stat.out, 2, sum))
}

fun(x[partition == i]) calls the function that was passed in, which was bootStatEval

Here, I wanted to figure out what x[partition == i] actually means.
x is a vector of values indicating the income of each person in the original dataset.
partition == i is a vector of booleans.
x[partition == i] is the subset of x at the indices where partition == i is TRUE. The values at FALSE indices are dropped rather than set to 0, so the result contains only the records drawn exactly i times and is usually shorter than x.
(I figured this out using many print statements throughout the code)
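For anyone retracing this step, here is a small stand-alone check of what partition and x[partition == i] look like in base R. The data values and the hand-picked counts vector are made up for illustration; only the rmultinom call is copied from bootstrap.replication. In base R, logical indexing keeps the elements at TRUE positions and drops the rest.

```r
n <- 6
x <- c(11, 22, 33, 44, 55, 66)  # made-up stand-in for the income column

# Same call as in bootstrap.replication: one multinomial draw,
# returned as an n x 1 matrix of counts that sums to n.
set.seed(1)
partition <- rmultinom(n = 1, size = n, prob = rep(1 / n, n))
print(dim(partition))
print(sum(partition))

# Hand-picked counts so the subsetting is easy to check by eye.
counts <- c(0, 2, 1, 0, 1, 2)
once <- x[counts == 1]   # records drawn exactly once
twice <- x[counts == 2]  # records drawn exactly twice
print(once)
print(twice)
```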

bootStatEval method of the mechanismBootstrap class in mechanism-bootstrap.R:

mechanismBootstrap$methods(
    bootStatEval = function(xi) {
        fun.args <- getFuncArgs(fun, inputList=list(...), inputObject=.self)
        input.vals = c(list(x=x), fun.args)
        stat <- do.call(boot.fun, input.vals)
        return(stat)
})

I think I found the problem:

The mean function is passed into evaluate() and then nothing is done with it. Instead, the function passed into replicate() is set to fun=.self$bootStatEval.

Then, in bootstrap.replication, the function applied to x[partition == i] is bootStatEval.

In the replicate() call, we do not want to repeat bootStatEval n.boot times; we want to calculate the mean n.boot times, which is what bootstrapping is.

I think the call to bootStatEval should be hard-coded somewhere in bootstrap.replication, because bootStatEval is (I think) a sanity check, not a parameter that needs to be passed around. mean, as the function we are interested in bootstrapping, is a parameter we do want to pass around, because we will eventually want to bootstrap other statistics.

The error happens because bootStatEval expects a parameter called fun, but no such parameter is passed in. I think this fun is the mean function passed into evaluate().

(Similarly, a ... operator is passed into evaluate() and then never used, because evaluate does not have ... in its method signature. bootStatEval later looks for ... and will not find it; I think the ... from the evaluate() call is the one it needs.)

MeganFantes · 2019-06-06T22:20:22Z

fixed the bug:

in mechanism-bootstrap.R:

changed evaluate = function(fun, x, sens, postFun) {
to: evaluate = function(fun, x, sens, postFun, ...) {

Add the ... operator to the method signature, so we can pass it to getFuncArgs() later

changed release <- replicate(.self$n.boot, bootstrap.replication(x, n, sens, epsilon.part, fun=.self$bootStatEval))
to: replicate(.self$n.boot, bootstrap.replication(x, n, sens, epsilon.part, fun=fun, inputObject = .self, ...))

Change the fun to the input function (in this case, mean)
Add a parameter inputObject to pass the bootstrap mechanism object to bootstrap.replication so we can call bootStatEval later
Add the ... operator to pass to bootStatEval later

changed bootstrap.replication <- function(x, n, sensitivity, epsilon, fun) {
to: bootstrap.replication <- function(x, n, sensitivity, epsilon, fun, inputObject, ...) {
and added: @param inputObject the Bootstrap mechanism object on which the input function will be evaluated

Add a parameter inputObject so we can call bootStatEval
Add the ... operator
Add a line to the documentation noting the new input parameter

changed stat.i <- fun(x[partition == i])
to: stat.i <- inputObject$bootStatEval(x[partition == i], fun, ...)

Add input parameters that will be passed to bootStatEval

changed bootStatEval = function(xi) {
to: bootStatEval = function(xi, fun, ...) {

Add input parameters to the method signature that will be passed to bootStatEval

changed input.vals = c(list(x=x), fun.args)
to: input.vals = c(list(x=xi), fun.args)

Update the variable name to xi instead of x

changed stat <- do.call(boot.fun, input.vals)
to: stat <- do.call(fun, input.vals)

Update the variable name to fun instead of boot.fun
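Put together, the changes above thread the statistic and ... through every call. Below is a minimal stand-alone sketch of that shape, not the package code: ToyMechanism and toy_replication are hypothetical names, and getFuncArgs, censoring, imputation, and noise are all left out. The sketch uses vapply with an anonymous function rather than replicate, because the R documentation for replicate cautions about what ... refers to inside its expr argument.

```r
library(methods)

# Hypothetical stand-in for mechanismBootstrap; not the package's class.
ToyMechanism <- setRefClass(
  "ToyMechanism",
  fields = list(n.boot = "numeric"),
  methods = list(
    # The statistic and its extra arguments arrive as explicit parameters.
    bootStatEval = function(xi, fun, ...) {
      do.call(fun, c(list(x = xi), list(...)))
    },
    # `fun` and `...` are forwarded instead of being dropped.
    evaluate = function(fun, x, ...) {
      vapply(seq_len(n.boot), function(b) {
        toy_replication(x, fun = fun, inputObject = .self, ...)
      }, numeric(1))
    }
  )
)

# Hypothetical stand-in for bootstrap.replication: resample with replacement,
# then let the mechanism object evaluate the statistic.
toy_replication <- function(x, fun, inputObject, ...) {
  xi <- sample(x, length(x), replace = TRUE)
  inputObject$bootStatEval(xi, fun, ...)
}

mech <- ToyMechanism$new(n.boot = 5)
set.seed(1)
res <- mech$evaluate(mean, c(1, 2, 3, NA), na.rm = TRUE)
print(res)
```

The point of the sketch is only the argument flow: evaluate receives fun and ..., hands both to the replication function together with the object, and the object's bootStatEval finally applies fun to the resampled data.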

MeganFantes · 2019-06-06T22:20:35Z

Now the dp-mean vignette runs, but the bootstrapped mean will occasionally return NaN as the result

MeganFantes · 2019-06-07T00:01:14Z

The NaNs being produced are from when the partition vector is created in bootstrap.replication. Sometimes when the partition vector is created, one partition is empty.

Also fix a bug that becomes apparent after fixing the original bug: not all partitions in the bootstrap replication are necessarily filled, which yields NaN values when the statistic is calculated. Add data validation to ensure the statistic is only calculated on partitions that contain values.

MeganFantes · 2019-06-07T03:01:26Z

Fixed all problems. Added validation in bootstrap.replication to ensure it is only calculating a statistic for a partition that contains values.
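To illustrate where the NaN comes from and one way such a guard could look: this is a sketch under my own assumptions, with hand-picked counts and a simple skip, and it is not necessarily the validation that was committed.

```r
x <- c(10, 20, 30)
partition <- c(0, 3, 0)  # hand-picked: the second record was drawn three times

# No record was drawn exactly once, so the subset is empty and mean() gives NaN.
empty_stat <- mean(x[partition == 1])
print(empty_stat)

# Guarded loop: only compute the statistic for partitions that contain values.
stats <- list()
for (i in 1:max(partition)) {
  idx <- partition == i
  if (!any(idx)) next  # skip empty partitions
  stats[[as.character(i)]] <- mean(x[idx])
}
print(stats)
```

Only the partition for i = 3 is non-empty here, so the guarded loop produces a single statistic and no NaN.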

MeganFantes added the bug label Jun 5, 2019

MeganFantes changed the title ~~Fix bug in bootStatEval in mechanismBootstrap to fix variable 'fun' not found error~~ Fix bug in dpMean$release to fix "variable 'fun' not found error" Jun 5, 2019

MeganFantes changed the title ~~Fix bug in dpMean$release to fix "variable 'fun' not found error"~~ Fix bug in dpMean$release to fix "variable 'fun' not found" error Jun 5, 2019

MeganFantes closed this as completed Jun 7, 2019

MeganFantes added a commit that referenced this issue Jun 7, 2019

Merge pull request #50 from privacytoolsproject/MF_Issue49_bugFix_boo…

c303813

…tStatEval Fix bug in Issue #49

MeganFantes added a commit that referenced this issue Jun 7, 2019

Add 2 options to handle empty partitions in Issue #49

5845582

Will discuss with Ira which option is best

MeganFantes reopened this Jun 7, 2019

MeganFantes referenced this issue Jun 7, 2019

Change the return statement of bootstrap.replication to return the me…

32d61d5

…an of the partition means (instead of the sum)

MeganFantes added a commit that referenced this issue Jun 7, 2019

create test file for Issue #49

74b99fe

MeganFantes added a commit that referenced this issue Jun 7, 2019

Change function applied to partitions in bootstrap.replication to mea…

e2fb65c

…n instead of sum. This addresses the huge standard error that results from bootstrapping. This was another bug found after fixing Issue #49.

sktran assigned MeganFantes Jun 11, 2019

globusharris added this to the "Soft" V1 Release milestone Jul 22, 2019

MeganFantes added the Open PR label Sep 17, 2019
