Replies: 6 comments
-
Hi Theodor! 👋 I like the idea and propose the following. We keep the existing function and add one or two new variants. Variant 1 is simple but only works with Java's `Map`. Variant 2 is universal and does not assume a specific cache type. By contract, the cache needs to be mutable; we can't check that at function call time. This works the same way for `Function1..N`; `Function0` does not need a cache argument because it only provides one value.

```java
// variant 1
Function2<T1, T2, R> memoized() {
    return memoized(new HashMap<>());
}

// variant 2
Function2<T1, T2, R> memoized() {
    var cache = new HashMap<Tuple2<T1, T2>, R>();
    return memoized(cache::get, cache::put);
}

// variant 1, e.g. works with Java's WeakHashMap
Function2<T1, T2, R> memoized(Map<Tuple2<T1, T2>, R> cache) {
    ...
}

// variant 2, e.g. works with Guava's LoadingCache
Function2<T1, T2, R> memoized(Function<Tuple2<T1, T2>, R> get, BiConsumer<Tuple2<T1, T2>, R> put) {
    ...
}
```
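To make the variant-2 contract concrete, here is a minimal sketch of how such a memoizer could be implemented for a one-argument function. It uses plain `java.util.function` types instead of Vavr's `Function1`/`Tuple2` so it is dependency-free; the class and method names are illustrative, not Vavr's actual API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Illustrative sketch only, not Vavr's real implementation.
public class Memoizer {

    // Variant 2: the caller supplies the cache as a get/put pair,
    // so any backing store (HashMap, WeakHashMap, Guava, ...) works.
    public static <T, R> Function<T, R> memoized(
            Function<T, R> f,
            Function<T, R> cacheGet,
            BiConsumer<T, R> cachePut) {
        return t -> {
            R cached = cacheGet.apply(t);
            if (cached != null) {
                return cached; // cache hit: skip recomputation
            }
            R result = f.apply(t);
            cachePut.accept(t, result);
            return result;
        };
    }

    // Variant 1-style convenience overload backed by a HashMap.
    public static <T, R> Function<T, R> memoized(Function<T, R> f) {
        Map<T, R> cache = new HashMap<>();
        return memoized(f, cache::get, cache::put);
    }
}
```

Note one simplification in this sketch: a `null` result is indistinguishable from a cache miss and would be recomputed on every call; a production version would need a sentinel or `Optional` wrapping to cache `null` properly.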
-
Hi Daniel 😄 I like your second proposal. Actually, both variants look good to me, but personally I'd use the second one more (through function composition I could do a lot of fun things). However, let me clarify this: are you saying that you prefer to externalize the responsibility for providing a caching solution rather than creating our own mutable data structure for caching? Maybe we could use the second variant and also provide a module, such as vavr-mutable-ds, containing our default implementation for local caching.
-
Hi :)
When a cache removes items after a certain time, memoization will not work as expected. Example: we memoize a function that returns random values. The memoized version will return constant values if the cache keeps items forever; if the cache loses items, the random number will probably change the next time we ask the memoized function. However, Vavr is built with immutability in mind, so by default we expect that re-calculated values are the same as before, and the user needs to keep in mind that they might change if their function isn't pure. Therefore, it is perfectly ok if we implement a mutable cache. What do you think?
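The point above can be demonstrated with a small sketch: once a result is cached, even an impure "random" function behaves as a constant function for a given argument, for as long as the cache keeps the entry. This is an illustrative snippet, not Vavr's implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;

// Illustrative demo: memoization pins an impure function's results.
public class ImpureMemoDemo {

    public static <T, R> Function<T, R> memoized(Function<T, R> f) {
        Map<T, R> cache = new HashMap<>();
        return t -> cache.computeIfAbsent(t, f);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        Function<Integer, Integer> impure = seed -> rng.nextInt();
        Function<Integer, Integer> memo = memoized(impure);

        int first = memo.apply(42);
        int second = memo.apply(42);
        // Same argument, same cached value, even though the underlying
        // function is impure. A cache that evicts entries would break
        // this invariant the next time the entry is recomputed.
        System.out.println(first == second); // prints "true"
    }
}
```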
-
I strongly agree. Here we are focusing on pure data structures and pure functions, and memoization itself cannot make sense if a function is not pure. Furthermore, forcing the usage of any memoization technique on an impure function leads to a completely different behavior anyway. Considering your example, if we memoize a random generator we get the behavior of a constant function.
Unfortunately I don't. During the past week I've been studying the implementation of Guava's LocalCache. I'm going to draft our version during the Christmas holidays.
-
Please be aware that soft references are problematic as a caching approach. They are collected in LRU order, which may mean a flood of low-value objects forces out more useful ones. They also had a tendency to cause GC thrashing by filling the old gen: a stop-the-world event recovering just enough to unpause, with the cycle repeating. Newer collectors are more aggressive, which means less caching. There is still a higher cost for the GC to handle soft-reference processing, so the overall penalty is probably higher than expected. Guava's cache was originally designed and optimized for this approach. That looked great in microbenchmarks but suffered in practice. It was non-trivial to inherit and refocus it towards a size-based approach, which is the recommended configuration. Caffeine optimizes for that, at the cost of soft-reference support being a little more expensive footprint-wise. I'd recommend against soft references in practice and suggest considering other approaches (weak is okay).
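For reference, the size-based eviction recommended above can be sketched with nothing but the JDK: a `LinkedHashMap` in access order with `removeEldestEntry` overridden gives a simple LRU cache that could be handed to the `memoized(Map)` overload proposed earlier. The class name and `maximumSize` parameter are illustrative; production code would more likely reach for Caffeine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal size-bounded LRU cache, sketched on the JDK's LinkedHashMap.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maximumSize;

    public LruCache(int maximumSize) {
        super(16, 0.75f, true); // accessOrder = true gives LRU ordering
        this.maximumSize = maximumSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least-recently-accessed entry once the bound is hit.
        return size() > maximumSize;
    }
}
```

Note this sketch is not thread-safe; concurrent use would need external synchronization or a dedicated caching library.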
-
Thanks for the input @ben-manes, that's interesting! This might be the reason there isn't a SoftHashMap in Java's standard library. I agree, we should provide an API (as suggested above) that is independent of the cache implementation. We will describe possible caching solutions in the Javadoc, with hints about their drawbacks.
-
I'd like to discuss the possibility of improving the current memoization.
Right now we are relying on java.util.HashMap for storing the values computed by a memoized function. Thus, we are using hard references to store each pair <$argument, $result>, so if the function is called with a wide range of arguments it will lead to a huge amount of memory being consumed. This is even more critical when multiple memoized functions are used in the same codebase.
I'd like to address this by proposing an alternative solution, or a new API alongside the current one.
We could create an API
or a new module vavr-memoization having an API such as
A soft-memoized function is a memoized function using a Map<SoftReference<$argument>, SoftReference<$result>>. Using soft references allows the GC to manage the memoization's memory consumption by removing some pairs <$argument, $result> when needed.
Also, for the soft-memoization we could implement something like Guava's LocalCache with softValues.
A soft-memoization is also useful for scenarios where, during a given continuous time window Tn, the function is called with arguments from a limited set of possible arguments Sn, and in the next time window Tn+1 the set Sn+1 is composed of other values. Sooner or later, the GC will remove from memory all the previously cached values which are no longer used.
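One way to sketch the soft-memoization idea is shown below. The sketch keeps keys strongly referenced and soft-wraps only the values, since `SoftReference` has no value-based `equals`/`hashCode` and would defeat map lookups if used as a key. Class and method names are hypothetical, not a proposed Vavr API.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch of soft-value memoization: results are held via
// SoftReference, so the GC may reclaim them under memory pressure, in
// which case the function is simply recomputed on the next call.
public class SoftMemoizer {

    public static <T, R> Function<T, R> softMemoized(Function<T, R> f) {
        Map<T, SoftReference<R>> cache = new HashMap<>();
        return t -> {
            SoftReference<R> ref = cache.get(t);
            R cached = (ref == null) ? null : ref.get();
            if (cached != null) {
                return cached; // still cached, GC has not reclaimed it
            }
            R result = f.apply(t); // first call, or recompute after GC
            cache.put(t, new SoftReference<>(result));
            return result;
        };
    }
}
```

Even this sketch illustrates ben-manes's point above: eviction timing is entirely up to the GC, whereas a size-based policy makes the memory bound explicit.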