Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems with multiple allocation sites #215

Open
peckto opened this issue Jul 18, 2023 · 4 comments
Open

Problems with multiple allocation sites #215

peckto opened this issue Jul 18, 2023 · 4 comments
Labels
C-feature-request Category: Feature Request L-c Language: C

Comments

@peckto
Copy link

peckto commented Jul 18, 2023

The following c code has two allocation sites for the memory referenced by pointer data, with different size values:

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

void foo()
{
    char * data;
    if ((rand() % 2) == 0)
    {
        data = malloc(11);
//        if (data == NULL) {exit(-1);}
    }
    else
    {
        data = malloc(10);
//        if (data == NULL) {exit(-1);}
    }
    char source[11];
    memcpy(data, source, sizeof(source));
    free(data);
}

int main() {
    foo();
    return 0;
}

When analyzing this code with IKOS, the buffer overflow bug in case of the else branch (malloc(10)) is not detected.
Instead, IKOS reports a possible use after free and a possible double free bug:

# Results
main2.c: In function 'foo':
main2.c:19:9: warning: possible use after free, pointer 'data' points to:
                       	* dynamic memory allocated at 'foo:10:16', which might be released
                       	* dynamic memory allocated at 'foo:15:16', which might be released
        memcpy(data, source, sizeof(source));
        ^
main2.c: In function 'foo':
main2.c:20:9: warning: possible double free, pointer 'data' points to:
                       	* dynamic memory allocated at 'foo:10:16', which might be already released
                       	* dynamic memory allocated at 'foo:15:16', which might be already released
        free(data);
        ^

The listed allocations sites are correct, but the memory is not freed.
(Is there any way, that IKOS reports where it thinks the free happens?)

I tried all available abstract domains (including APRON), with the same results, except congruence domain.
With the congruence domain, additionally the following error is reported:

main2.c: In function 'foo':
main2.c:19:9: warning: possible buffer overflow, pointer '&source[0]' accesses 11 bytes at offset 0 bytes of local variable 'source' of size 11 bytes
        memcpy(data, source, sizeof(source));
        ^

Either the finding is indeed related to data instead of source and the message is wrong, or the finding is false positive.

A similar issue is #52, but in my case no pointer to int conversion happens.

When I modify the example to have only one allocation site, with two possible size values, IKOS successfully detects the bug:

#include <string.h>
#include <stdio.h>
#include <stdlib.h>


void foo()
{
    char *data;
    int len;
    if ((rand() % 2) == 0)
    {
        len = 11;
    }
    else
    {
        len = 10;
    }
    data = malloc(len);
    char source[11];
    memcpy(data, source, sizeof(source));
    free(data);
}

int main() {
    foo();
    return 0;
}
# Results
main4.c: In function 'foo':
main4.c:20:5: warning: pointer 'data' might be null
    memcpy(data, source, len);
    ^
main4.c: In function 'foo':
main4.c:20:5: warning: possible buffer overflow, pointer 'data' accesses 11 bytes at offset 0 bytes of dynamic memory allocated at 'foo:18:12' of size at most 11 bytes
    memcpy(data, source, len);
    ^

I know, dynamically allocated memory is not the main priority of IKOS,
still I'm confused why it works in the one case and not in the other.
Can you describe, why this is the case?
Is this a known limitation, or a bug?
Do I need some additional analyzer options?
Or have I missed something else?

@arthaud
Copy link
Member

arthaud commented Jul 19, 2023

Hi @peckto, thanks for reaching out.

Unfortunately, this is a limitation of the analysis. If I remember correctly, this is because we store the size and whether the memory is allocated in hidden variables, so from a low level perspective we see this:

if (cond) {
  data = &dynamic_alloc_line_10;
  size_dynamic_alloc_line_10 = 11;
  dynamic_alloc_line_10_is_allocated = true;
} else {
  data = &dynamic_alloc_line_15;
  size_dynamic_alloc_line_15 = 10;
   dynamic_alloc_line_15_is_allocated = true;
}
memcpy(data, source, sizeof(source));

Those implicit variables are then considered like normal variables by the analysis.
We perform a join between the the two branch and since those variables are only defined on one side, we lose the information, i.e we get:

  size_dynamic_alloc_line_10 -> [0, +oo]
  dynamic_alloc_line_10_is_allocated -> [true, false]
  size_dynamic_alloc_line_15 -> [0, +oo]
   dynamic_alloc_line_15_is_allocated -> [true, false]

We could fix this by treating those variables differently than "traditional" variables.

The reason you get a possible use after free warning is because the buffer overflow checker performs the checks in this order:

  • check if the pointer is null
  • check if the pointer points to allocated memory
  • check if it is a buffer overflow

Here we fail at step 2 because we cannot prove that dynamic_alloc_line_10_is_allocated and dynamic_alloc_line_15_is_allocated are true, we don't report anything for the next steps.

Hence there are 2 things we could improve:

  • Special case those variables so we keep the invariant if there are only defined on one branch (or better, have disjunctive invariants, i.e "variable is uninitialized OR variable is between [a, b]" - but this is complicated with relational domains)
  • Improve the user report, which mentions a use after free although there could be a buffer overflow.

@peckto
Copy link
Author

peckto commented Jul 19, 2023

Hi @arthaud, thank you for the detailed explanation!
Now I better understand the issue.

Taking your example:

if (cond) {
  data = &dynamic_alloc_line_10;
  size_dynamic_alloc_line_10 = 11;
  dynamic_alloc_line_10_is_allocated = true;
} else {
  data = &dynamic_alloc_line_15;
  size_dynamic_alloc_line_15 = 10;
   dynamic_alloc_line_15_is_allocated = true;
}
memcpy(data, source, sizeof(source));

Does this mean, that after join, data points to [dynamic_alloc_line_10, dynamic_alloc_line_15]?
And you determine the size and allocation state based on the related __is_allocated and size_ variables?
Is this something that is part of the abstract domain, or does this happen independent of the domain?

I understand why you do the join the way you do, it's sound.
If you do it this way and lose the precession of the allocation state, the sound answer for the buffer overflow checker should probably be use-after-free and buffer-overflow, because we cannot distinguish.

Regarding the join operation, could you preserve a little bit more precession if you merge only with the values of the branches?

size_dynamic_alloc_line_10 -> [10, 11]
dynamic_alloc_line_10_is_allocated -> [true]
size_dynamic_alloc_line_15 -> [10, 11]
dynamic_alloc_line_15_is_allocated -> [true]
data -> [dynamic_alloc_line_10, dynamic_alloc_line_15]

In this case, we don't have path sensitivity, but the approximation should still be sound?
This would mean to join two variables which are only indirectly connected via the data pointer, not sure if this is possible.

@arthaud
Copy link
Member

arthaud commented Jul 20, 2023

Does this mean, that after join, data points to [dynamic_alloc_line_10, dynamic_alloc_line_15]?

Yes.

And you determine the size and allocation state based on the related _is_allocated and size variables?

Yes.

Is this something that is part of the abstract domain, or does this happen independent of the domain?

The transfer function is done by the NumericalExecutionEngine:
https://github.com/NASA-SW-VnV/ikos/blob/master/analyzer/include/ikos/analyzer/analysis/execution_engine/numerical.hpp#L252
The abstract domain is the value domain:
https://github.com/NASA-SW-VnV/ikos/blob/master/core/include/ikos/core/domain/memory/value.hpp#L95
which wraps a scalar domain, called the composite domain since it is the product of the uninitialized domain, integer domain, points-to domain:
https://github.com/NASA-SW-VnV/ikos/blob/master/core/include/ikos/core/domain/scalar/composite.hpp#L93

Regarding the join operation, could you preserve a little bit more precession if you merge only with the values of the branches?

Yes that's what we should do, but that requires changing a bit our domains.
Right now the uninitialized domain (a map from Variable -> {Bottom, Top, Initialized, Uninitialized}) and the integer domain (say if you use intervals, that's a map Variable -> [LowerBound, UpperBound] are stored into a domain product, and the join is the regular join on domain products:
https://github.com/NASA-SW-VnV/ikos/blob/master/core/include/ikos/core/domain/scalar/composite.hpp#L293-L296
When joining branches you will have:

# left branch
dynamic_alloc_line_10_is_allocated  -> Initialized
dynamic_alloc_line_10_is_allocated  -> [1] # 1=true
# right branch
dynamic_alloc_line_10_is_allocated  -> Uninitialized
dynamic_alloc_line_10_is_allocated  -> [0, +oo]
# join
dynamic_alloc_line_10_is_allocated = Join(Initialized, Uninitialized) = Top (unknown)
dynamic_alloc_line_10_is_allocated = Join([1], [0, +oo]) = [0, +oo]

If we want to fix this, we need to either:

  • Have some hacky logic where we first join the uninitialized domain, find variables that are initialized only on one side and change the logic of the join on the integer domain
  • Change the structure to have a mapping Variable -> (InitializedDomain, IntegerDomain) instead of Variable -> InitializedDomain x Variable -> IntegerDomain. The problem is that this doesn't work with relational numerical abstract domains (they are not a mapping Variable -> SomeInfo)

Now that I'm writing this, I realized we also have another problem, we only set the information for those _is_allocated and _size variables on one branch, so we would actually get dynamic_alloc_line_10_is_allocated -> Top instead of dynamic_alloc_line_10_is_allocated -> Uninitialized in the uninitialized domain.
To fix this, we need to perform some kind of scope analysis so that we initialize those variables before the if/then/else condition.

@peckto
Copy link
Author

peckto commented Jul 24, 2023

Ah ok, that's why the problem is independent of the abstract domain.
I more and more understand how IKOS works under the hood :)

If changing the domain value space is not possible, it sounds like the "hacky logic" to handle the special situation is the best option.
Always good, if one issue leads to another ^^
Maybe the scope analysis goes hand in hand with the "hacky logic" from above?

@ivanperez-keera ivanperez-keera added C-feature-request Category: Feature Request L-c Language: C labels Nov 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: Feature Request L-c Language: C
Projects
None yet
Development

No branches or pull requests

3 participants