Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change to get_value_str() to escape regexes broke capa2yara.py #1909

Open
ruppde opened this issue Dec 13, 2023 · 5 comments
Open

Change to get_value_str() to escape regexes broke capa2yara.py #1909

ruppde opened this issue Dec 13, 2023 · 5 comments
Labels
good first issue Good for newcomers

Comments

@ruppde
Copy link
Contributor

ruppde commented Dec 13, 2023

Description

With 58e94a3 the regexes returned by get_value_str() are escaped which breaks e.g.

bytesv = kid.get_value_str()

Steps to Reproduce

Run

python ./scripts/capa2yara.py  rules/host-interaction/file-system/reference-absolute-stream-path-on-windows.yml 2>&1 |grep x5D

The 2nd line shows the regex escaped, which is of no use in yara:

...
INFO:capa2yara:doing kids: [regex(string =~ /^(\\\\\?\\)?([\w]\:|\\)(\\((?![\<\>\"\/\|\*\?\:\\])[\x20-\x5B\x5D-\x7E])+)+\:\$?[a-zA-Z0-9_]+/)] - len: 1
INFO:capa2yara:doing regex: '/^(\\\\\\\\\\\\\\\\\\\\?\\\\\\\\)?([\\\\w]\\\\:|\\\\\\\\)(\\\\\\\\((?![\\\\<\\\\>\\\\\\"\\\\/\\\\|\\\\*\\\\?\\\\:\\\\\\\\])[\\\\x20-\\\\x5B\\\\x5D-\\\\x7E])+)+\\\\:\\\\$?[a-zA-Z0-9_]+/'
...

Expected behavior:

No escaping

Actual behavior:

See above

Versions

Most recent github version

Additional Information

How should we fix this? Introduce another function which returns the regex unescaped?

(capa2yara.py is the only script in scripts/ which uses the function, so shouldn't have broken more)

@williballenthin
Copy link
Collaborator

williballenthin commented Dec 14, 2023

here's the regex in the raw rule yaml:

- string: /^(\\\\\?\\)?([\w]\:|\\)(\\((?![\<\>\"\/\|\*\?\:\\])[\x20-\x5B\x5D-\x7E])+)+\:\$?[a-zA-Z0-9_]+/

https://github.com/mandiant/capa-rules/blob/57b3911a72462e0597ca0d6685f8b02b38857765/host-interaction/file-system/reference-absolute-stream-path-on-windows.yml#L17C1-L17C112

and as logged above, from get_value_str:

[regex(string =~ /^(\\\\\?\\)?([\w]\:|\\)(\\((?![\<\>\"\/\|\*\?\:\\])[\x20-\x5B\x5D-\x7E])+)+\:\$?[a-zA-Z0-9_]+/)]

and they look the same to me. So get_value_str returns the regex as provided in the capa rule. This can (almost directly) be passed to re.compile to create a regular expression instance. I assume it can be converted to a yara-compatible regex trivially, too.

Can you explain the problem in a little more detail?

@ruppde
Copy link
Contributor Author

ruppde commented Dec 14, 2023

The line after that is the problem. That's what ends up in the yara rule and it's escaped when it shouldn't:
...
INFO:capa2yara:doing regex: '/^(\\\\\\\\\\?\\\\)?([\\w]\\:|\\\\)(\\\\((?![\\<\\>\\\"\\/\\|\\*\\?\\:\\\\])[\\x20-\\x5B\\x5D-\\x7E])+)+\\:\\$?[a-zA-Z0-9_]+/'
...

@williballenthin
Copy link
Collaborator

is this possibly because the logging statement uses %r instead of %s? i don't think there's any extra escaping being done by capa.

@ruppde
Copy link
Contributor Author

ruppde commented Dec 15, 2023

Should be this line from the commit above:
image

@mr-tz mr-tz added the good first issue Good for newcomers label Feb 5, 2024
@acelynnzhang
Copy link
Contributor

I took a look at this a few days ago, but I couldn't figure out what was going on. Here's my output after reverting 58e94a3:

❯ python scripts/capa2yara.py -q rules/host-interaction/file-system/reference-absolute-stream-path-on-windows.yml > test.yar
...

import "pe"


private rule capa_pe_file : CAPA {
    meta:
        description = "Match in PE files. Used by other CAPA rules"
    condition:
        uint16be(0) == 0x4d5a
        or uint16be(0) == 0x558b
        or uint16be(0) == 0x5649
}


rule capa_reference_absolute_stream_path_on_Windows : CAPA  {
  meta:
 	description = "reference absolute stream path on Windows (converted from capa rule)"
	namespace = "host-interaction/file-system"
	static scope = "basic block"
	dynamic scope = "call"
	hash = "51828683DC26BFABD3994494099AE97D"
	reference = "This YARA rule converted from capa rule: https://github.com/mandiant/capa-rules/blob/master/rules/host-interaction/file-system/reference-absolute-stream-path-on-windows.yml"
	capa_nursery = "False"
	date = "2024-04-05"
	minimum_yara = "3.8"
	license = "Apache-2.0 License"

  strings:
 	$re_aaa = /\x00(\\\\\?\\)?([\w]\:|\\)(\\((?![\<\>\"\/\|\*\?\:\\])[\x20-\x5B\x5D-\x7E])+)+\:\$?[a-zA-Z0-9_]+/ ascii wide

  condition:
    capa_pe_file and
 (
			$re_aaa
	)
}

It seems like YARA doesn't like spaces in meta field names, so I had to add underscores to static scope and dynamic scope. Even after that, it still doesn't work with the converted rule. I'm also not sure where the null byte came from. I tried to play around with the original capa regex, but couldn't get that to work either.

❯ yara test.yar 51828683dc26bfabd3994494099ae97d.elf_
error: rule "capa_reference_absolute_stream_path_on_Windows" in test.yar(43): invalid regular expression "$re_aaa": syntax error

What did the regex look like when the script was working?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

4 participants