Fix bug in dpMean$release to fix "variable 'fun' not found" error #49

Open
MeganFantes opened this issue Jun 5, 2019 · 5 comments

@MeganFantes
Contributor

Bug found 6/4

error:

When running the dp-mean.Rmd vignette, the line boot_mean$release(PUMS5extract10000) fails with the error:

Error in formals(targetFunc) : object 'fun' not found

source:

mechanism-bootstrap.R, line 41:

mechanismBootstrap$methods(
    bootStatEval = function(xi) {
        fun.args <- getFuncArgs(fun, inputList=list(...), inputObject=.self)
        input.vals = c(list(x=x), fun.args)
        stat <- do.call(boot.fun, input.vals)
        return(stat)
})

getFuncArgs uses a variable called fun, but there is no fun parameter in the method signature.
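
The same failure can be reproduced outside the package. This is only a minimal sketch: brokenEval is a hypothetical stand-in for bootStatEval, not the package's code, and it assumes a session where no object named fun is defined.

```r
# Hypothetical stand-in for bootStatEval: `fun` is used in the body
# but is neither a parameter nor defined in any enclosing scope.
brokenEval <- function(xi) {
    formals(fun)
}

# Capture the error message instead of stopping the script.
msg <- tryCatch(
    brokenEval(c(1, 2, 3)),
    error = function(e) conditionMessage(e)
)
print(msg)
```

R only looks up fun when the body is evaluated, so the method can be defined without complaint and only fails when it is called.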

@MeganFantes MeganFantes added the bug label Jun 5, 2019
@MeganFantes MeganFantes changed the title Fix bug in bootStatEval in mechanismBootstrap to fix variable 'fun' not found error Fix bug in dpMean$release to fix "variable 'fun' not found error" Jun 5, 2019
@MeganFantes MeganFantes changed the title Fix bug in dpMean$release to fix "variable 'fun' not found error" Fix bug in dpMean$release to fix "variable 'fun' not found" error Jun 5, 2019
@MeganFantes
Contributor Author

tracing the bug:

the boot_mean object is created in dp-mean.Rmd:

boot_mean <- dpMean$new(mechanism='mechanismBootstrap', var.type='numeric', 
                        variable='income', n=10000, epsilon=0.1, rng=c(0, 750000), 
                        n.boot=n.boot)

Then we have the call:

boot_mean$release(PUMS5extract10000)

which calls the release method of the dpMean object boot_mean. This throws the error above.

release method of dpMean object in statistic_mean.R:

dpMean$methods(
    release = function(data, ...) {
        x <- data[, variable]
        sens <- diff(rng) / n
        .self$result <- export(mechanism)$evaluate(mean, x, sens, .self$postProcess, ...)
})

export(mechanism) exports all fields and methods of the mechanism passed into the object.
In this case, the mechanism is mechanismBootstrap, which has the method evaluate.

evaluate method of the mechanismBootstrap class in mechanism-bootstrap.R:

mechanismBootstrap$methods(
    evaluate = function(fun, x, sens, postFun) {
        x <- censordata(x, .self$var.type, .self$rng)
        x <- fillMissing(x, .self$var.type, .self$impute.rng[0], .self$impute.rng[1])
        epsilon.part <- epsilon / .self$n.boot
        release <- replicate(.self$n.boot, bootstrap.replication(x, n, sens, epsilon.part, fun=.self$bootStatEval))
        std.error <- .self$bootSE(release, .self$n.boot, sens)
        out <- list('release' = release, 'std.error' = std.error)
        out <- postFun(out)
        return(out)
})

Note that release passes ... into evaluate, but ... is not in evaluate's method signature.
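
A small sketch of why the missing ... matters. The two functions below are hypothetical stand-ins for evaluate, not the package's methods. If the caller's ... happens to be empty, the mismatch goes unnoticed, but as soon as an extra argument is supplied the version without ... rejects it.

```r
# Hypothetical stand-ins for the evaluate method, with and without `...`.
evaluateNoDots <- function(fun, x) {
    fun(x)
}
evaluateWithDots <- function(fun, x, ...) {
    fun(x, ...)   # extra arguments are forwarded to fun
}

x <- c(1, 2, NA)

# Without `...` in the signature, an extra argument is rejected.
err <- tryCatch(
    evaluateNoDots(mean, x, na.rm = TRUE),
    error = function(e) conditionMessage(e)
)

# With `...`, the extra argument reaches mean().
ok <- evaluateWithDots(mean, x, na.rm = TRUE)
```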

According to the stack trace, the error surfaces inside the replicate() call, which runs bootstrap.replication n.boot times. So the problem is in bootstrap.replication.

bootstrap.replication function in mechanism-bootstrap.R:

bootstrap.replication <- function(x, n, sensitivity, epsilon, fun) {
    partition <- rmultinom(n=1, size=n, prob=rep(1 / n, n))
    max.appearances <- max(partition)
    probs <- sapply(1:max.appearances, dbinom, size=n, prob=(1 / n))
    stat.partitions <- vector('list', max.appearances)
    for (i in 1:max.appearances) {
        variance.i <- (i * probs[i] * (sensitivity^2)) / (2 * epsilon)
        stat.i <- fun(x[partition == i])
        noise.i <- dpNoise(n=length(stat.i), scale=sqrt(variance.i), dist='gaussian')
        stat.partitions[[i]] <- i * stat.i + noise.i
    }
    stat.out <- do.call(rbind, stat.partitions)
    return(apply(stat.out, 2, sum))
}

fun(x[partition == i]) calls the function that was passed in, which was bootStatEval.

Here, I wanted to figure out what x[partition == i] actually means.
x is a vector of values giving the income of each person in the original dataset.
partition holds the counts from rmultinom: how many times each observation was drawn in this replicate. partition == i is therefore a vector of booleans.
x[partition == i] is a subset of x: it keeps only the values at indices where partition == i is TRUE, that is, the observations drawn exactly i times. Values at FALSE indices are dropped rather than set to 0, so the result is usually shorter than x. No differentially private noise is involved at this step; the noise is added afterwards by dpNoise.
(I figured this out using many print statements throughout the code)
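
To make the indexing concrete, here is a small sketch with made-up data (the five values stand in for the income column, and the seed is arbitrary). It only uses base R, so it can be run without the package.

```r
set.seed(42)                      # arbitrary seed, only for repeatability
x <- c(10, 20, 30, 40, 50)        # hypothetical stand-in for the income column
n <- length(x)

# Each entry counts how many times that observation was drawn.
partition <- rmultinom(n = 1, size = n, prob = rep(1 / n, n))

sizes <- integer(0)
for (i in seq_len(max(partition))) {
    subset.i <- x[partition == i]     # observations drawn exactly i times
    sizes[i] <- length(subset.i)
    cat("i =", i, "->", subset.i, "\n")
}
```

Whatever the draw, the subset sizes weighted by i add up to n, because every one of the n draws lands on exactly one observation.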

bootStatEval method of the mechanismBootstrap class in mechanism-bootstrap.R:

mechanismBootstrap$methods(
    bootStatEval = function(xi) {
        fun.args <- getFuncArgs(fun, inputList=list(...), inputObject=.self)
        input.vals = c(list(x=x), fun.args)
        stat <- do.call(boot.fun, input.vals)
        return(stat)
})

I think I found the problem:

The mean function is passed into evaluate() and then never used. Instead, the call inside replicate() passes fun=.self$bootStatEval to bootstrap.replication.

Then, in bootstrap.replication, the function applied to x[partition == i] is bootStatEval.

In the replicate() call, we do not want to repeat bootStatEval n.boot times. We want to calculate the mean n.boot times, because that is what bootstrapping is.
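
For reference, this is roughly what I mean by bootstrapping a statistic. It is a plain, non-private sketch with made-up data and a hypothetical helper name (bootstrapSketch); it does not do the partitioning or noise that bootstrap.replication does.

```r
set.seed(1)                       # arbitrary seed, only for repeatability
x <- c(12, 15, 9, 20, 14, 11)     # hypothetical data
n.boot <- 200

# Plain (non-private) bootstrap: resample with replacement,
# then recompute the statistic on each resample.
bootstrapSketch <- function(x, fun, n.boot) {
    replicate(n.boot, fun(sample(x, size = length(x), replace = TRUE)))
}

boot.means <- bootstrapSketch(x, mean, n.boot)
boot.se <- sd(boot.means)         # bootstrap estimate of the standard error
```

The statistic (mean here) is the argument that varies between uses, which is why it should be the thing passed down as fun.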

I think the call to bootStatEval should be hard-coded somewhere in bootstrap.replication, because bootStatEval is (I think) a sanity check, not a parameter that needs to be passed around. The mean function, as the statistic we want to bootstrap, is a parameter we do want to pass around, because we will eventually want to bootstrap other statistics.

The error happens because bootStatEval refers to a variable called fun, but nothing with that name is passed in or otherwise in scope. I think this fun is meant to be the mean function passed into evaluate().

(Similarly, release passes ... into evaluate(), but evaluate() has no ... in its method signature, so those arguments are never used. bootStatEval will eventually look for ... and not find it; I think the ... from the evaluate() call is what it needs.)

@MeganFantes
Contributor Author

fixed the bug:

in mechanism-bootstrap.R:

changed evaluate = function(fun, x, sens, postFun) {
to: evaluate = function(fun, x, sens, postFun, ...) {

  • Add the ... operator to the method signature, so we can pass it to getFuncArgs() later

changed release <- replicate(.self$n.boot, bootstrap.replication(x, n, sens, epsilon.part, fun=.self$bootStatEval))
to: release <- replicate(.self$n.boot, bootstrap.replication(x, n, sens, epsilon.part, fun=fun, inputObject = .self, ...))

  • Change the fun to the input function (in this case, mean)
  • Add a parameter inputObject to pass the bootstrap mechanism object to bootstrap.replication so we can call bootStatEval later
  • Add the ... operator to pass to bootStatEval later

changed bootstrap.replication <- function(x, n, sensitivity, epsilon, fun) {
to: bootstrap.replication <- function(x, n, sensitivity, epsilon, fun, inputObject, ...) {
and added: @param inputObject the Bootstrap mechanism object on which the input function will be evaluated

  • Add a parameter inputObject so we can call bootStatEval
  • Add the ... operator
  • Add a line to the documentation noting the new input parameter

changed stat.i <- fun(x[partition == i])
to: stat.i <- inputObject$bootStatEval(x[partition == i], fun, ...)

  • Add input parameters that will be passed to bootStatEval

changed bootStatEval = function(xi) {
to: bootStatEval = function(xi, fun, ...) {

  • Add fun and ... to the method signature so bootStatEval can receive them from bootstrap.replication

changed input.vals = c(list(x=x), fun.args)
to: input.vals = c(list(x=xi), fun.args)

  • Update the variable name to xi instead of x

changed stat <- do.call(boot.fun, input.vals)
to: stat <- do.call(fun, input.vals)

  • Update the variable name to fun instead of boot.fun
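
Put together, the fixed bootStatEval behaves roughly like the sketch below. This is a simplified stand-alone function, not the actual method: it is an assumption that the extra arguments can be taken straight from list(...), whereas the real method goes through getFuncArgs and the mechanism object.

```r
# Simplified stand-in for the fixed bootStatEval (hypothetical name).
# Assumption: extra arguments come straight from `...`; the package
# method resolves them with getFuncArgs instead.
bootStatEvalSketch <- function(xi, fun, ...) {
    input.vals <- c(list(x = xi), list(...))
    do.call(fun, input.vals)
}

# The statistic and its extra arguments are now supplied by the caller.
mean.result <- bootStatEvalSketch(c(1, 2, 3, NA), mean, na.rm = TRUE)
median.result <- bootStatEvalSketch(c(5, 1, 9), median)
```

Because fun is now a parameter, the same helper works for any statistic whose first argument is named x.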

@MeganFantes
Contributor Author

Now the dp-mean vignette runs, but the bootstrapped mean occasionally returns NaN as the result.

@MeganFantes
Contributor Author

MeganFantes commented Jun 7, 2019

The NaNs come from the partition vector created in bootstrap.replication. For some values of i, no observation is drawn exactly i times, so x[partition == i] is empty and the mean of an empty vector is NaN.
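
A tiny illustration with a hand-picked partition (hypothetical values, chosen so that no observation is drawn exactly twice):

```r
x <- c(10, 20, 30, 40)            # hypothetical data
partition <- c(3, 0, 1, 0)        # hand-picked counts that sum to length(x)

# No entry of partition equals 2, so this subset is empty.
empty.subset <- x[partition == 2]
empty.stat <- mean(empty.subset)  # mean of a zero-length vector

print(length(empty.subset))
print(empty.stat)
```

Once a single NaN enters stat.partitions, the column sum at the end of bootstrap.replication is NaN as well.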

MeganFantes added a commit that referenced this issue Jun 7, 2019
Also fix a bug that becomes apparent after fixing the original bug:
Not all partitions in the bootstrap replication are necessarily filled, yielding NaN values when the statistic is calculated. Add data validation to ensure the statistic is only calculated on partitions that contain values.
@MeganFantes
Contributor Author

Fixed all problems. Added validation in bootstrap.replication to ensure a statistic is only calculated for partitions that contain values.
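
The validation amounts to something like the sketch below. It is not the committed code: partitionStatsSketch is a hypothetical helper, the noise step (dpNoise) is left out, and the partition is hand-picked so that one value of i has no observations.

```r
# Hypothetical helper showing the guard only; the noise from dpNoise
# that bootstrap.replication adds is deliberately omitted here.
partitionStatsSketch <- function(x, partition, fun) {
    max.appearances <- max(partition)
    stat.partitions <- vector('list', max.appearances)
    for (i in seq_len(max.appearances)) {
        xi <- x[partition == i]
        if (length(xi) == 0) next     # skip partitions with no values
        stat.partitions[[i]] <- i * fun(xi)
    }
    # Drop the entries that were skipped before combining.
    stat.partitions <- Filter(Negate(is.null), stat.partitions)
    do.call(rbind, stat.partitions)
}

x <- c(10, 20, 30, 40)                # hypothetical data
partition <- c(3, 0, 1, 0)            # nothing is drawn exactly twice
out <- partitionStatsSketch(x, partition, mean)
```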

MeganFantes added a commit that referenced this issue Jun 7, 2019
MeganFantes added a commit that referenced this issue Jun 7, 2019
Will discuss with Ira which option is best
@MeganFantes MeganFantes reopened this Jun 7, 2019
MeganFantes referenced this issue Jun 7, 2019
…an of the partition means (instead of the sum)
MeganFantes added a commit that referenced this issue Jun 7, 2019
MeganFantes added a commit that referenced this issue Jun 7, 2019
…n instead of sum.

This addresses the huge standard error that results from bootstrapping.
This was another bug found after fixing Issue #49.
@globusharris globusharris added this to the "Soft" V1 Release milestone Jul 22, 2019