Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash in decompression, strlen(NULL) #47

Open
moneromooo-monero opened this issue Aug 14, 2022 · 18 comments
Open

Crash in decompression, strlen(NULL) #47

moneromooo-monero opened this issue Aug 14, 2022 · 18 comments

Comments

@moneromooo-monero
Copy link

The following program dies dereferencing USX_TEMPLATES[4], which is NULL.

#include <stdio.h>
#include <string.h>
#include "unishox2.h"

int main()
{
  static const char *input = "\252!\355\347;멠<\322\336\346\070\205X\200v\367b\002\332l\213\022\n\003P\374\267\002\266e\207.\210r:\021\225\224\243\353\204\305\352\255\017L/(HH4i\223~\270-\223\206\221\246\212\261\221e\254\375\341\350\037\240X\211lj\325\330u\365\303ʂ\200гM\236&\375\377\071%'?V\025\070\374\026\346s\323$\276\350F\224\r-\226\347ɋ\317\344\214\v\032U\303\353\215\335GX\202\371B\302\355\a\247\273\356C\372\a-\262\006\\\343\"ZH|\357\034\001";
  char out[4096];
  const size_t len = strlen(input); // no zeroes in it

  unishox2_decompress(input, len, out, /*4096,*/ USX_HCODES_DFLT, USX_HCODE_LENS_DFLT, USX_FREQ_SEQ_TXT, USX_TEMPLATES);

  return 0;
}
@siara-cc
Copy link
Owner

Hi, I agree the decompress code should check for NULL, but is the input valid? Was it obtained from unishox2_compress?

@moneromooo-monero
Copy link
Author

The input is invalid. In my case, it may come from untrusted parties.

@siara-cc
Copy link
Owner

ok, I will fix the code to check NULL. Even if it might fix this and not crash, I am not sure if there will be other cases where it might crash for invalid input. I will check those too, but validating the input using a checksum mechanism may be the safest option.

@moneromooo-monero
Copy link
Author

The input is unknown so a malicious attacker can generate any checksum they like along with a malicious input.

@siara-cc
Copy link
Owner

:-o
ok, understood!

@moneromooo-monero
Copy link
Author

Also crashes with input 310a (two bytes with those hex values) in a different place.
Do you want to make this code safe against arbitrary inputs, or do you think it'd take too much work ?

@moneromooo-monero
Copy link
Author

moneromooo-monero commented Aug 14, 2022

If the former, you can build unishox2 with -fsanitize=address, and buikd this test program:

#include <stdint.h>
#include <stdio.h>
#include <malloc.h>
#include "common/unishox2.h"

int LLVMFuzzerTestOneInput(const uint8_t *buf, size_t len)
{
  static char out[16 * 65536];
  unishox2_decompress((const char*)buf, len, out, sizeof(out), USX_HCODES_DFLT, USX_HCODE_LENS_DFLT, USX_FREQ_SEQ_TXT, USX_TEMPLATES);
  return 0;
}

int main(int argc, const char **argv)
{
  FILE *f;
  long pos;
  size_t len;
  char *buf;

  if (argc < 2)
  {
    printf("usage: %s <filename>\n", argv[0]);
    return 1;
  }

  f = fopen(argv[1], "r");
  if (!f) return 1;
  if (fseek(f, 0, SEEK_END) < 0) return 1;
  pos = ftell(f);
  if (pos < 0) return 1;
  len = pos;
  if (fseek(f, 0, SEEK_SET) < 0) return 1;

  buf = malloc(len);
  if (!buf) return 1;
  if (fread(buf, 1, len, f) != (size_t)len) return 1;
  
  return LLVMFuzzerTestOneInput((const uint8_t*)buf, len);
}

Then run:

mkdir -p fuzz-data && echo '       ' > fuzz-data/SEED && afl-fuzz -m none -i fuzz-data -o fuzz-out ./a.out @@

This will leave all crashing inputs in the fuzz-data directory for you to look at.

@moneromooo-monero
Copy link
Author

You need to install ASAN (for -fsanitize=address) and American fuzzy lop (for afl-fuzz).

@siara-cc
Copy link
Owner

This is very nice! 🥇 I will run this and fix the issues. Thanks!

@siara-cc
Copy link
Owner

Hi, I have fixed the issues that showed up on AFL but it is still running and does not look like it will finish anytime soon. But I have gone through the code and think there aren't any more issues. I have made a release with the fixes 1.0.3.

image

@moneromooo-monero
Copy link
Author

That was fast, thanks!

AFL will run for a loooong time. It may find more crashes as it finds data that exercises different code paths. It's an exploration with a very short view range.

I notice it only executes 6 times a second. That is very slow, the test program executes more than 100 times a second for me, on a not too recent laptop. This makes finding new crashes
much faster.

@siara-cc
Copy link
Owner

Hm.. I am running it on Macbook Air 2017. I tried compiling it with -O3 and on a Ramdisk but not much improvement. looks like I will have to find a better machine to run it on. If there are more crashes I will update here again. Thanks !!

@moneromooo-monero
Copy link
Author

Possibly runtime linking ? Loading/linking at every process start is quite heavy.
Running ldd on the test binary should have as few dynamically loaded objects as possible.
My test program has 7 libs.
Running that program here, I get about 1k runs a second.

@siara-cc
Copy link
Owner

Unishox does not depend on other libraries. I ran the equivalent of ldd and got this:

% otool -L a.out   
a.out:
	@rpath/libclang_rt.asan_osx_dynamic.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)

and I could not find a way of statically linking them. Also it seems it won't make much difference even if I did so. So looks like the only choice to find a faster machine :)

@moneromooo-monero
Copy link
Author

Fair enough. Running on my laptop atm, I'm at 9.6 million execs and zero crashes so far, so it looks like you did a very good job :)

@moneromooo-monero
Copy link
Author

Forgot to mention it when I stopped, but I ran the test till about 116 million runs IIRC, no further issues. It did slow down a lot from the starting 1k/second unfortunately, as afl started generating large test cases (like 80kB).

@siara-cc
Copy link
Owner

siara-cc commented Sep 2, 2022

@moneromooo-monero Thats great news! Thanks for letting me know!!

This exercise was a good one also because I am working on other such projects, especially Unishox3, which is expected to surpass GZip for compression ratio and hopefully reach closer to its speed (I am no Jean-loup Gailly :)), but it would have some demands on RAM/Flash proportionate to desired compression ratio.

@kolinfluence
Copy link

will be great if u have some benchmarks on performance of unishox in time and memory used other than just the obvious storage savings like this just compare with lz4 and snappy will do i guess coz the rest we can reference here:
https://www.scylladb.com/2019/10/07/compression-in-scylla-part-two/
at least hv a guideline on where to use the rest e.g. > 1kb or something

but please do make unishox 3 available first coz seems exciting to have it as production ready

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants