Know Your YARA Rules Series: #2 Rewrite Your Rules

In the second post of the Know Your YARA Rules series, we will continue to explore more hurdles that keep you from achieving the maximum performance from your YARA rules.

Today, we will demonstrate several examples where even a tiny change in your YARA rules can drastically improve the scanning performance.

Before we start, note that YARA is a live project that changes over time. What we recommend today might not be the best available solution next year, so we recommend keeping up with the current version and recommendations as the project changes. For that, you can, for example, follow this blog site, as we will continue to write about news from the YARA world.

A race between strings and condition

In the first post of this series, we discussed that we must be careful with the strings section because it is always evaluated before the condition section and cannot be skipped by checks such as filesize. This section discusses a special case in which you can move the check into the condition and use that evaluation order to your advantage. Let us say we want to match a file that has specific bytes at its beginning and is below a filesize limit, as shown below:

rule slow_rule
{
    strings: 
        $h00 = { 42 ?? ?? 00 00 61 62 } 
    condition: 
        filesize < 1KB and 
        $h00 at 0 
}

There are three reasons why this rule scans slowly. First, as we have already learned, the filesize condition does not limit which files are scanned for the string $h00; every file is searched. Second, restricting the match to offset 0 does not help either, because the position is checked only after YARA has collected all matches of the byte sequence in the file. Third, the string is too general: it can produce many candidate matches that must be verified and discarded during evaluation, which slows the whole process down. So what can we do to improve the situation?

If we know the exact position we want to match, we strongly recommend using the intXY and uintXY functions. You can write, for example, uint8(0) or uint32(3), where the number in the function name is how many bits are read, and the argument is the file offset or virtual address to read from. Be aware that the 16-bit and 32-bit integers are read as little-endian. For big-endian, use the intXYbe() or uintXYbe() functions. The rewritten rule then looks like the following:

rule faster_rule 
{ 
    condition: 
        filesize < 1KB and 
        uint8(0) == 0x42 and 
        uint32(3) == 0x62610000 
}

Notice that the strings section is completely gone. This is good because now we first check the size of the file, and if that condition is not met, the rest of the rule is not evaluated for that file. If the size is within the limit, we check offset 0 for the value 0x42. Only when that also matches are the four bytes at offset 3 checked. The two wildcard bytes at offsets 1 and 2 need no check at all.
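To make the byte order concrete, here is a minimal Python sketch that mimics the condition of faster_rule on an in-memory buffer. It is only an illustration of the logic, not how YARA evaluates rules internally; the function name and the sample bytes are assumptions made for this example.

```python
import struct

def faster_rule(data: bytes) -> bool:
    """Python approximation of the faster_rule condition (illustration only)."""
    # filesize < 1KB: the cheapest check comes first, so large inputs are rejected at once.
    if len(data) >= 1024:
        return False
    # uint8(0) == 0x42: a single byte at offset 0.
    if len(data) < 1 or data[0] != 0x42:
        return False
    # uint32(3) needs four bytes starting at offset 3.
    if len(data) < 7:
        return False
    # "<I" reads an unsigned 32-bit integer in little-endian order.
    (value,) = struct.unpack_from("<I", data, 3)
    return value == 0x62610000

# Hypothetical sample: 0x42, two arbitrary bytes (the ?? wildcards), then 00 00 61 62.
sample = bytes.fromhex("42 ff ff 00 00 61 62")
print(faster_rule(sample))
print(hex(struct.unpack_from("<I", sample, 3)[0]))  # little-endian view of 00 00 61 62
print(hex(struct.unpack_from(">I", sample, 3)[0]))  # big-endian view of the same bytes
```

The bytes 00 00 61 62 read as little-endian give 0x62610000, which is why the constant in the rule looks reversed compared to the hex string.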

Note that there is a fast scan mode that changes the way the rules are evaluated. But to keep things simple, we will skip the details and leave it for future posts in this series.

Help YARA figure out your strings 

YARA is generally very smart when scanning with your rules, but it has its limits. For this reason, it can pay off to rewrite your rules so that YARA understands them better, especially when they rely on short strings.

The most typical example of this is alternations, mainly in short strings.

rule slow_rule 
{ 
    strings: 
        $hex = { 44 (03|2E) 33 } 
    condition: 
        $hex 
}

It can be tempting to merge such short strings into one. The problem is that YARA does not handle this notation very well. Yes, YARA can work with alternations during the atom selection process. For instance, in a case such as /(hey|bye)\.\w*/, it would prefer the two atoms hey and bye over searching just for a dot (\w cannot be part of an atom). However, YARA does not concatenate the alternatives with the surrounding bytes, even though the resulting atoms would be longer. In this case, the alternation does not provide enough information, and the problem is made worse because the values outside the alternation are only one byte long each. We want atoms of four or more bytes for effective matching, but that is not always easy to achieve. Here, we want or need to work with these short strings. So, how can we write them so that YARA understands them better?

rule faster_rule 
{ 
    strings: 
        $hex0 = { 44 03 33 } 
        $hex1 = { 44 2E 33 } 
    condition: 
        $hex0 or $hex1 
}

Rewriting the string into two will improve the scanning speed as your rules speak to YARA more clearly.
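If you have many such strings, the expansion can be generated mechanically. The following Python sketch is one way to do it under simplifying assumptions: expand_alternation is a hypothetical helper written for this post, it handles only flat (non-nested) alternation groups, and it treats every token as opaque text without validating it as hex.

```python
import itertools
import re

def expand_alternation(hex_string: str) -> list:
    """Expand each (a|b|...) group of a YARA-style hex string into plain sequences."""
    # Split into literal parts and parenthesised groups, e.g. "44 ", "(03|2E)", " 33".
    parts = re.split(r"(\([^()]*\))", hex_string)
    choices = []
    for part in parts:
        if part.startswith("("):
            # One list entry per alternative inside the group.
            choices.append([alt.strip() for alt in part[1:-1].split("|")])
        elif part.strip():
            # A literal run of bytes has exactly one "alternative".
            choices.append([part.strip()])
    # The Cartesian product yields every concrete byte sequence.
    return [" ".join(combo) for combo in itertools.product(*choices)]

variants = expand_alternation("44 (03|2E) 33")
for i, variant in enumerate(variants):
    print(f"$hex{i} = {{ {variant} }}")
```

Keep in mind that the number of generated strings grows multiplicatively with each alternation group, so this only makes sense for a handful of short alternatives.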

Aim for the string you really need 

Sometimes, our rules are slow because we match more strings than we actually need to. In some cases, it is acceptable to scan for a more general version of the string and check the rest in the condition afterwards. In most cases, however, this approach costs performance.

Let us say we want to detect IPv6 addresses. You could match a general format of the addresses, as in the following rule:

rule slow_rule 
{ 
    strings: 
        $ipv6 = /([a-f0-9:]+:+)+[a-f0-9]+/ fullword nocase ascii 
    condition: 
        $ipv6 
}

The question is, do you absolutely need to check for this general format? What are you trying to match? In our case, when assisting malware analysts, we discovered that we need only global unicast addresses starting with the prefix 2001. That is fantastic news for YARA and us because it means that we can simplify our regular expression.

rule faster_rule 
{ 
    strings: 
        $ipv6 = /2001:([a-f0-9]{0,4}:){1,6}[a-f0-9]{0,4}/ fullword nocase ascii 
    condition: 
        $ipv6 
}

This version is not only faster (about 50% in our tests), it also no longer matches the many strings that are not unicast IPv6 addresses from the range we care about. And that should always be our goal.
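The difference in what the two expressions accept can be checked quickly outside YARA. The sketch below uses Python's re module, which is not YARA's regular expression engine, so treat it only as an approximation: re.IGNORECASE stands in for nocase, fullmatch on separate tokens stands in for fullword, and the sample tokens are made up for illustration.

```python
import re

# The two patterns from the rules above; re.IGNORECASE stands in for YARA's nocase.
GENERAL = re.compile(r"([a-f0-9:]+:+)+[a-f0-9]+", re.IGNORECASE)
NARROW = re.compile(r"2001:([a-f0-9]{0,4}:){1,6}[a-f0-9]{0,4}", re.IGNORECASE)

# Hypothetical sample tokens, chosen only for illustration.
samples = [
    "2001:db8::1",
    "2001:0DB8:85a3:0000:0000:8a2e:0370:7334",
    "fe80::1",
    "12:34:56",  # looks like a timestamp, not an address
    "ab:cd",
]

for token in samples:
    general_hit = GENERAL.fullmatch(token) is not None
    narrow_hit = NARROW.fullmatch(token) is not None
    print(f"{token!r}: general={general_hit} narrow={narrow_hit}")
```

The general pattern accepts every token in this list, including the timestamp-like one, while the narrowed pattern accepts only the two tokens that start with 2001.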

A similar case is when we want to find a string containing a filename that ends with the exe extension. It is tempting to write something like /*.exe/ or /[0-9a-z].exe/. The fastest variant, surprisingly for some, is the plain text string “.exe”. In this situation, the number of matches will probably not change much, but you speed up your YARA rules by 40%.

Conclusion 

In this post, we walked through more examples of how to improve the speed and quality of your YARA rules.

The main takeaways are:

  • Think twice about the strings you want to match. Are you describing exactly what you need, or are you creating a more general version? Your scans will likely be slower if you count on the condition to sort out the real matches.
  • If you know the position of the match, help YARA and use (u)intXY functions. You can skip the scanning of large files and optimize the use of short-circuit evaluation.
  • Help YARA understand what strings you are trying to match. Sometimes, with slight changes, YARA will surprise you with better scanning speed. In future posts, we will show even more examples.

And that is all for today! We wish you happy YARA rules writing!