add some comments on common pitfalls #4240

Closed
wants to merge 2 commits
1 change: 1 addition & 0 deletions README.md
@@ -96,6 +96,7 @@ Thank you for taking time to contribute to Joern! Here are a few guidelines to e
* Remember to:
- Remember to format your code, i.e. run `sbt scalafmt Test/scalafmt`
- Add a unit test to verify your change.
* We have started to collect some general [best practices](./changelog/generalBestPractices.md) for common pitfalls, especially with respect to performance and concurrency.

### IDE setup

143 changes: 143 additions & 0 deletions changelog/generalBestPractices.md
@@ -0,0 +1,143 @@
# Summary
This document collects some best practices and general remarks about the performance
of common tasks, as well as guidelines for Joern development.

It exists because we have repeatedly run into these issues in practice.

Some points apply to programming in general -- but if something is observed as a common
pitfall inside Joern, then it deserves to be addressed here.


## Folds

Suppose we have `input: Iterable[List[T]]` and want to construct the concatenation.

A natural way could be
```
val res = input.foldLeft(List.empty[T]) { _ ++ _ }
```
This is a catastrophe: it boils down to
```
( ((Nil ++ input(0)) ++ input(1)) ++ input(2) ) ++ ...
```
However, the runtime of `A ++ B` scales with `A.length` for linked lists, so this can
end up costing on the order of `res.length ** 2` operations, i.e. quadratic runtime.

Instead, the correct way is
```
val res = input.foldRight(List.empty[T]) { _ ++ _ }
```
This is, however, contingent on the internals of the `List` implementation! If we
instead collected into an `ArrayBuffer`, the correct way would be
```
input.foldLeft(mutable.ArrayBuffer.empty[T]){case (acc, items) => acc.appendAll(items)}
```
This is because `List` supports O(1) prepend, while `ArrayBuffer` supports (amortized) O(1) append.

Do not write `buf.appendedAll` -- that creates a copy, and the runtime will always be quadratic!

Now, if you write `input.fold`, the order in which elements are combined is unspecified. This means that
you must always assume the worst possible execution order!
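
To make this concrete, here is a minimal sketch (the helper names are illustrative, not existing Joern APIs) that picks the fold direction to match the collection's cheap operation:
```
import scala.collection.mutable

// List supports cheap prepend, so fold from the right.
def concatToList[T](input: Iterable[List[T]]): List[T] =
  input.foldRight(List.empty[T]) { _ ++ _ }

// ArrayBuffer supports cheap append, so fold from the left.
def concatToBuffer[T](input: Iterable[List[T]]): mutable.ArrayBuffer[T] =
  input.foldLeft(mutable.ArrayBuffer.empty[T]) { (acc, items) => acc.appendAll(items) }
```
Both variants run in time linear in the total number of elements.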

### Fundamental Theorem
The fundamental theorem for getting the right complexity class for your folds is the following:

Assume that each item has a weight, that
`combine(left, right).weight == left.weight + right.weight`, and that the runtime of
`combine(left, right)` is bounded (up to a constant factor) by `min(left.weight, right.weight)`.

Then, up to constant factors, the total runtime of the accumulation is upper bounded by
`input.map{_.weight}.sum * log2(input.map{_.weight}.sum)`.
For example, if all `n` items have weight 1, this gives the familiar `n * log2(n)` bound known from merge sort.

Proof: Consider the function `F(a) = a*log2(a)` and track the evolution of
```
time_already_spent - remaining_work.map{item => F(item.weight)}.sum
```
We will show that this quantity never increases. Since it is initially non-positive
(no time has been spent yet!), it must still be non-positive once we are done. At that point the
remaining work is the single output, whose weight equals `input.map{_.weight}.sum`, which gives the desired inequality
```
time_spent - F(output.weight) <= 0
```
To prove the invariant, suppose one step combines two items with weights `a <= b`. Comparing
the tracked quantity before and after this step, we obtain
```
Delta == time_already_spent_after - remaining_work_after.map{item => F(item.weight)}.sum
       - time_already_spent_before + remaining_work_before.map{item => F(item.weight)}.sum
     <= a - F(a+b) + F(a) + F(b)
     == a - a*log2(a+b) - b*log2(a+b) + a*log2(a) + b*log2(b)
     == a - a*log2(1 + b/a) - b*log2(1 + a/b)
     <  a - a*log2(1 + b/a)
     <= 0
```
The first inequality uses that the combine step costs at most `a == min(a, b)` and replaces the two items of
weights `a` and `b` by one item of weight `a+b`. The strict inequality drops the positive term `b*log2(1 + a/b)`,
and the last inequality holds because `a <= b` implies `log2(1 + b/a) >= 1`.
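
In practice, the precondition "combining costs no more than the smaller side" is met by always merging the smaller
structure into the larger one -- the same trick the `ArrayDeque`-based combine below uses. A minimal sketch
(illustrative, not an existing Joern API) for mutable sets:
```
import scala.collection.mutable

// Merge the smaller set into the larger one, so each combine costs
// time proportional to the smaller operand.
def combineSets[T](left: mutable.Set[T], right: mutable.Set[T]): mutable.Set[T] =
  if (left.size < right.size) right.addAll(left)
  else left.addAll(right)
```
(For sets the combined weight can be smaller than the sum when elements overlap, which only improves the bound.)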

### More examples

A good example of what not to do is Java's `Collectors.toList()`. It is intended for
use on parallel streams and accumulates into an internal `ArrayList`. However, `ArrayList` only permits
fast append, not fast prepend -- so an unlucky combine order yields quadratic runtime, and the combine
order depends on scheduling races.

How it should be done can be seen in `ForkJoinParallelCpgPass`. Morally speaking, the relevant
function is
```
def combine[T](left: mutable.ArrayDeque[T], right: mutable.ArrayDeque[T]): mutable.ArrayDeque[T] = {
  if (left.size < right.size) right.prependAll(left)
  else left.appendAll(right)
}
```
This is the fundamental point: If you want `fold` to be fast, then you must handle both cases.

Another example that works is `++` on Scala's `Vector`. It is a fun exercise to step through the
code and see that both `prependedAll` and `appendedAll` are fast!

With respect to performance, it is recommended to use mutable data structures by default, and
to reach for immutable ones only when necessary: that is, when you would otherwise need locks,
or when your algorithm requires snapshots.

### Java stream collect
It is worthwhile to take another look at the Java stream collector. It is used in `CpgPass` like this:

```
//parts:Array[T]
externalBuilder.absorb(
java.util.Arrays
.stream(parts)
.parallel()
.collect(
new Supplier[DiffGraphBuilder] {
override def get(): DiffGraphBuilder =
new DiffGraphBuilder
},
new BiConsumer[DiffGraphBuilder, AnyRef] {
override def accept(builder: DiffGraphBuilder, part: AnyRef): Unit =
runOnPart(builder, part.asInstanceOf[T])
},
new BiConsumer[DiffGraphBuilder, DiffGraphBuilder] {
override def accept(leftBuilder: DiffGraphBuilder, rightBuilder: DiffGraphBuilder): Unit =
leftBuilder.absorb(rightBuilder)
}
)
)
```
The stream collect API is, at its core, a glorified parallel fold. Noteworthy things are:
1. We don't supply a single accumulator; we supply an accumulator factory. This is important for parallelism -- otherwise we would degrade to a `foldLeft`!
2. Ideally we get only one accumulator per CPU core. Each accumulator uses its `runOnPart` to absorb the output of the next `part`. In particular, we do not
   allocate an accumulator (i.e. a new DiffGraphBuilder) for every `part` -- that would only pay off when each `part` is large (e.g. each
   `method`) and would be bad for the case of many cheap `parts`.
3. Note how we collect everything into an array before building the stream. We do _not_ use a generic Spliterator like `java.util.Spliterators.spliteratorUnknownSize`.
   Read the code of that function and consider whether that would be a good idea in the context of CpgPass.
4. The code for merging, i.e. `leftBuilder.absorb(rightBuilder)` has already been discussed above (especially the fact that it is ArrayDeque-based).

### API considerations: Beware of Iterator!
You should not offer a generic API that works on (lazy) Iterator. If you do, collect the
Iterator eagerly on a single thread before passing it on to complex higher-order functions like fold or stream-collect.

The reason is that iterators are lazy. You don't know what side effects and computations are hidden in them! And if execution of the iterator is triggered from inside
complex higher-order functions, that may very well introduce data races!

Or look at some code that we found in Joern: `iter.map{item => executer.submit{() => expensive_function(item)}}.map{_.get()}.toList`, where `iter` was an Iterator.
This code wanted to run `expensive_function` in parallel on all items of the iterator and collect the results into a list.

In fact this code was single-threaded, because iterators and maps of iterators are lazy. Nothing runs until `toList` tries to collect the first
output; at that point the first computation is scheduled and its result awaited, and only then is the second computation scheduled.
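
A minimal sketch of the fix (`expensiveFunction` and the executor are illustrative stand-ins): force the iterator and submit every task before awaiting any result.
```
import java.util.concurrent.{Callable, ExecutorService, Future}

// Illustrative sketch: submit all tasks first, then block on the results.
def runInParallel[A, B](iter: Iterator[A], executor: ExecutorService)(expensiveFunction: A => B): List[B] = {
  // `.toList` forces the lazy iterator, so every task is queued on the
  // executor before we wait on any of them.
  val futures: List[Future[B]] = iter.map { item =>
    executor.submit(new Callable[B] { override def call(): B = expensiveFunction(item) })
  }.toList
  // Only now block on the results, one after another.
  futures.map(_.get())
}
```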

TLDR: Iterators are scary. Collect them eagerly.
@@ -19,7 +19,7 @@ type NamespaceToTypeMap = Map[String, Set[CSharpType]]
* [[CSharpProgramSummary.jsonToInitialMapping]] for generating initial mappings.
*/
class CSharpProgramSummary(initialMappings: List[NamespaceToTypeMap] = List.empty) extends ProgramSummary[CSharpType] {

//fixme: Use mutable datatypes, otherwise folds are quadratic
override val namespaceToType: NamespaceToTypeMap = initialMappings.reduceOption(_ ++ _).getOrElse(Map.empty)
def findGlobalTypes: Set[CSharpType] = namespaceToType.getOrElse(Constants.Global, Set.empty)

@@ -5,6 +5,7 @@ import io.joern.x2cpg.Defines as XDefines
import scala.annotation.targetName

class RubyProgramSummary(initialMap: Map[String, Set[RubyType]] = Map.empty) extends ProgramSummary[RubyType] {
//fixme: Use mutable datatypes, otherwise folds are quadratic

override val namespaceToType: Map[String, Set[RubyType]] = initialMap

@@ -43,6 +43,7 @@ object ProgramSummary {
/** Combines two namespace-to-type maps.
*/
def combine[T <: TypeLike[_, _]](a: Map[String, Set[T]], b: Map[String, Set[T]]): Map[String, Set[T]] = {
//fixme: Use mutable datatypes, otherwise folds are quadratic
Collaborator:

Is this suggesting that we'd want something more like DiffGraphBuilder.absorb?

Contributor Author:

Yes!

The reason is described in the generalBestPractices.md: We parallelize via map-reduce / fold style, with a combine function.

In order to avoid quadratic runtime with arbitrary folds, the cost of combining is only allowed to scale linearly (or log-linearly) in the smaller of the two operands being merged.

Alternatively we can (and maybe should?) do something simpler: use a parallel map to construct all the ProgramSummaries, and then use a sequential foldLeft to combine them.

If we do that, we can use a simpler pattern like having a ProgramSummaryBuilder that uses mutable structures and has two functions: summaryBuilder.addSubSummary(summary: ProgramSummary): this.type and summaryBuilder.build(): ProgramSummary.

This simpler pattern doesn't work well for the general CpgPass, because some of our passes parallelize over a generateParts() that creates millions of parts; but it works well if we only parallelize over e.g. files or methods.
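
A hypothetical sketch of that simpler builder pattern (the `ProgramSummaryBuilder` name and its methods are taken from the comment above and do not exist in the codebase):
```
import scala.collection.mutable

// Hypothetical sketch: accumulate into mutable structures on a single thread,
// e.g. via a sequential fold over summaries that were produced in parallel.
class ProgramSummaryBuilder[T] {
  private val namespaceToType = mutable.HashMap.empty[String, mutable.Set[T]]

  def addSubSummary(summary: Map[String, Set[T]]): this.type = {
    summary.foreach { case (namespace, types) =>
      namespaceToType.getOrElseUpdate(namespace, mutable.Set.empty[T]).addAll(types)
    }
    this
  }

  def build(): Map[String, Set[T]] =
    namespaceToType.view.mapValues(_.toSet).toMap
}
```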

val accumulator = mutable.HashMap.from(a)

b.keySet.foreach(k =>
@@ -3,7 +3,7 @@ package io.joern.x2cpg.utils
import java.util
import java.util.concurrent.{Callable, Executors}
import java.util.stream.{Collectors, StreamSupport}
import java.util.{Collections, Spliterator, Spliterators}
import java.util.{Collections, Spliterator, Spliterators, stream}
import scala.jdk.CollectionConverters.*
import scala.util.Try

@@ -50,13 +50,13 @@ object ConcurrentTaskUtil {
* @return
* an array of the executed tasks as either a success or failure.
*/
def runUsingSpliterator[V](tasks: Iterator[() => V]): List[Try[V]] = {
StreamSupport
.stream(Spliterators.spliteratorUnknownSize(tasks.asJava, Spliterator.NONNULL), /* parallel */ true)
def runUsingSpliterator[V](tasks: Iterator[() => V]): Seq[Try[V]] = {
scala.collection.immutable.ArraySeq.ofRef(
java.util.Arrays
.stream(tasks.toArray)
.parallel()
.map(task => Try(task.apply()))
.collect(Collectors.toList())
.asScala
.toList
.toArray).asInstanceOf[scala.collection.immutable.ArraySeq[Try[V]]]
}

}