Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipe messing with the environment #146

Open
Enchufa2 opened this issue May 11, 2017 · 10 comments
Open

Pipe messing with the environment #146

Enchufa2 opened this issue May 11, 2017 · 10 comments
Labels

Comments

@Enchufa2
Copy link

Enchufa2 commented May 11, 2017

Sorry in advance because the minimal reprex is not that minimal: it involves Rcpp, and the issue I found is that some objects are not being garbage-collected because of magrittr (I'm adding @eddelbuettel to the loop, just in case I'm mistaken and this has something to do with Rcpp instead). Let's consider the following mini-module:

#include <Rcpp.h>
using namespace Rcpp;

class Foo {
public:
  ~Foo() { warning("foo destructor called"); }
  void add(Function fun) { funcs.push_back(fun); }
private:
  std::vector<Function> funcs;
};

//[[Rcpp::export]]
SEXP Foo__new() { return XPtr<Foo>(new Foo()); }

//[[Rcpp::export]]
SEXP Foo__add(SEXP foo_, Function fun) {
  XPtr<Foo> foo(foo_);
  foo->add(fun);
  return foo_;
}

There is a class Foo that stores pointers to Bar, and Bar stores an R function, which is received through Foo::add. This code can be compiled with Rcpp::sourceCpp(code='<code_here>'). Then, let's play around:

foo <- Foo__new()
foo <- Foo__add(foo, function() 1)
rm(foo)
gc()
# Warning: foo destructor called

Foo's destructor is correctly called and we see the warning. Perfect. But now,

library(magrittr)

foo <- Foo__new() %>%
  Foo__add(function() 1)
rm(foo)
gc()

No warning! This is very serious, because the memory is not being deallocated. My guess is that magrittr could be messing with the environments and there is still a reference to foo somewhere, hidden, so that R is not removing the object. Is it possible?

@Enchufa2
Copy link
Author

Also note that it works correctly with a named function:

func <- function() 1

foo <- Foo__new() %>%
  Foo__add(func)
rm(foo)
gc()
# Warning: foo destructor called

I'm unable to replicate this with R6 classes or reg.finalizer().

@kevinushey
Copy link
Contributor

In this call:

library(magrittr)

foo <- Foo__new() %>%
  Foo__add(function() 1)
rm(foo)
gc()

The parent environment of the function created through function() 1 appears to be not what one would expect -- e.g. I see

Browse[2]> environment(fun)
<environment: 0x110d74518>

rather than R_GlobalEnv. Doing some debugging, that appears to be the same environment associated with function_list[[k]](value) in the call stack.

Browse[2]> as.data.frame(cbind(sys.calls(), sys.frames()))
                                                   V1                         V2
1               Foo__new() %>% Foo__add(function() 1) <environment: 0x103d504a8>
2 withVisible(eval(quote(`_fseq`(`_lhs`)), env, env)) <environment: 0x110d74ce0>
3              eval(quote(`_fseq`(`_lhs`)), env, env) <environment: 0x110d74f10>
4                           eval(expr, envir, enclos) <environment: 0x103d50860>
5                                     `_fseq`(`_lhs`) <environment: 0x110d750d0>
6                    freduce(value, `_function_list`) <environment: 0x110d742e8>
7              withVisible(function_list[[k]](value)) <environment: 0x110d74438>
8                           function_list[[k]](value) <environment: 0x110d74518>
9                           Foo__add(., function() 1) <environment: 0x110d746d8>

My guess is that this ultimately comes down to something going wrong with how magrittr modifies the environment of functions it executes.

@Enchufa2
Copy link
Author

Moreover:

Browse[2]> ls(envir = environment(fun), all.names=TRUE)
[1] "."
Browse[2]> environment(fun)$.
<pointer: 0x3369e20>
Browse[2]> foo_
<pointer: 0x3369e20>

There it is the other copy of the pointer! That's why the destructor is not being called. Thanks, @kevinushey, for your help.

@smbache
Copy link
Member

smbache commented May 16, 2017

Is it the same issue with the update branch? Has a simpler approach...

@Enchufa2
Copy link
Author

Just checked and yes, same issue.

@smbache
Copy link
Member

smbache commented May 16, 2017

Hmm it's not clear to me what's going on - one of the Rcpp experts may know... @romainfrancois ?

@Enchufa2
Copy link
Author

The gc destroys an object when it loses all the references to it. In the first and third examples (no pipe or named function), the function's environment is the R_GlobalEnv. Therefore, you remove foo and the gc destroys the object. In the second one (pipe + anonymous function), the function's environment is a new one created by magrittr, and this environment stores another reference to the object. The function is stored inside the object. Thus, when you remove foo, there is this other reference alive, and the object cannot be destroyed.

@eddelbuettel
Copy link

Seems to be an unhappy intersection of @Enchufa2 having code that would really like the object to be destroyed so that gc() is called, and the non-standard approach to evaluating things requiring another copy preventing exactly this.

Sometimes one can fix the data type to add an explicit 'I am done, please unwind' member function to manually do some of the work of the destructor (at the cost of some local code complexity) -- but I am not sure one could do that here with Modules.

@Enchufa2
Copy link
Author

Seems to be an unhappy intersection of @Enchufa2 having code that would really like the object to be destroyed

Yes, basically, I have users' reports about machines running out of memory after many replications of a simulation model with simmer. I recommend them to reset the simulation environment, calling explicitly a function that deallocates resources, but I cannot rely on that.

Nevertheless, knowing what's the problem, I think I can implement in my package a workaround to transparently remove the reference that magrittr creates until this issue is solved.

@Enchufa2
Copy link
Author

Ok, I finally managed to find an example not involving Rcpp (sorry, guys, and thanks for your help). Consider the following not-so-useful function:

store <- function(env1, env2, func) {
  env2$func <- func
  env1
}

Then, as expected,

a <- new.env()
b <- new.env()
reg.finalizer(a, function(e) print("deleting 'a'"))
a <- store(a, b, function() 1)
rm(a); gc()
# deleting 'a'
rm(b); gc()

but

library(magrittr)

a <- new.env()
b <- new.env()
reg.finalizer(a, function(e) print("deleting 'a'"))
a <- a %>% store(b, function() 1)
rm(a); gc()
rm(b); gc()
# deleting 'a'

That's not ok. Finally, a workaround:

library(magrittr)

a <- new.env()
b <- new.env()
reg.finalizer(a, function(e) print("deleting 'a'"))
a <- a %>% store(b, function() 1)
rm(".", envir=environment(b$func)) # the trick
rm(a); gc()
# deleting 'a'
rm(b); gc()

That's ok again.

@hadley hadley added the bug an unexpected problem or unintended behavior label Jul 2, 2018
@lionel- lionel- added wontfix and removed bug an unexpected problem or unintended behavior labels Nov 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants