Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a union of N disjunctive abstract values #139

Open
arthaud opened this issue Jul 11, 2019 · 2 comments
Open

Implement a union of N disjunctive abstract values #139

arthaud opened this issue Jul 11, 2019 · 2 comments
Labels
C-feature-request Category: Feature Request P-medium Priority: Medium

Comments

@arthaud
Copy link
Member

arthaud commented Jul 11, 2019

We could implement an abstract domain containing a union of at most N disjunctive abstract values. This should be implemented as an option to the analyzer, so that the user can specify N.

This could be used to unroll loops and fix false positives, for instance #39 or #102. This is somewhat similar to #137.

There are still questions on how to implement the abstract operators such as join, meet, widening and narrowing.

@arthaud arthaud added C-feature-request Category: Feature Request P-medium Priority: Medium labels Jul 11, 2019
@yakobowski
Copy link

I would suggest you think twice about implementing this. This is basically what Frama-C/Value used to do for trace partitioning and loop unrolling, under the name -slevel. The argument of this option was the number of states to propagate in parallel.

Over the years, we ran into tons of problems caused by the partitioning strategy. It is utterly impossible to predict where the states will be consumed, even in simple looking function. The worst offender is

for (...) {  if (c) { ...  } }

where c is not entirely determined by the loop counter. You are going to spend an exponential amount of states to go through the loop, meaning that even very small loops will not get properly unrolled. (And the analysis time will increase dramatically.)

Other problems include, in no particular order:

  • the fact that users see -slevel as a silver bullet, and specify values that are too high;
  • the instability of the analysis results: as soon as you change something in your propagation algorithm, the precision of the analysis will be impacted (in both directions, of course);
  • the impossibility to "guess" what the analysis will do, because there is no specification on where -slevel will get consumed;
  • you will be adding algorithmic complexity if you try avoid re-propagating states that you already saw once.

My former colleagues at CEA finally managed to implement other partitioning strategies (1) to unroll loops; (2) to propagate disjuncts according to a clear criterion. My understanding is that they are much happier with this approach. Of course, the dataflow iterator is quite a bit more complex, but this is definitely worth it.

Feel free to ask questions here or in private if you're not convinced. I can also put you in contact with my former team, which has first-hand experience with the new engine.

@arthaud
Copy link
Member Author

arthaud commented Jul 19, 2019

Thanks for your feedback @yakobowski!

I was not aware of all these issues.


Over the years, we ran into tons of problems caused by the partitioning strategy. It is utterly impossible to predict where the states will be consumed, even in simple looking function. The worst offender is

for (...) {  if (c) { ...  } }

where c is not entirely determined by the loop counter. You are going to spend an exponential amount of states to go through the loop, meaning that even very small loops will not get properly unrolled. (And the analysis time will increase dramatically.)

Interesting. Did you see this pattern a lot in real codes?

LLVM has an optimization called loop invariant code motion that can move the if outside the loop, if it is sound. Maybe we could try that.


* the fact that users see `-slevel` as a silver bullet, and specify values that are too high;

I know a few Frama-C users and they all use -slevel. It seems to improve the precision a lot. If you are concerned about users using too high values, maybe we could just write this down in the documentation?


* the instability of the analysis results: as soon as you change something in your propagation algorithm, the precision of the analysis will be impacted (in both directions, of course);
* the impossibility to "guess" what the analysis will do, because there is no specification on where `-slevel` will get consumed;

Agreed!


* you will be adding algorithmic complexity if you try avoid re-propagating states that you already saw once.

I didn't really think about this. Did you get a big performance win after implementing this optimization?


My former colleagues at CEA finally managed to implement other partitioning strategies (1) to unroll loops; (2) to propagate disjuncts according to a clear criterion. My understanding is that they are much happier with this approach. Of course, the dataflow iterator is quite a bit more complex, but this is definitely worth it.

I'm actually focusing on #137 first.

The main idea is to propagate disjunctions for the different return statements of a function, to avoid unions between the success states and the failure states (see #136 for instance). I'm still thinking about how to implement this. We could use the return value as the disjunction criteria. This probably works well when returning an integer, but what about other cases?

Is it something you have implemented in Frama-C? I saw the -eva-split-return parameter. I would like to talk to you about this. We can communicate here, by email or by Skype, as you want.

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: Feature Request P-medium Priority: Medium
Projects
None yet
Development

No branches or pull requests

2 participants