YARA is a well-known tool in the security industry, used to classify and identify malware samples. At Avast, we try our best to share our improvements and tools with the open-source community. Examples of this are two recent blog posts in which we published two YARA-related tools: YARA Language Server and YaraNG.
Writing YARA rules can be a challenging task, but debugging them once they are written is even more challenging. Imagine you have a YARA rule, but it’s not matching the samples you expect. Something is clearly wrong, but how do you figure out what it is and fix it? What part of the condition is causing the bug? How should you, as an author, approach this situation?
The first thing that comes to mind is that you can try to comment out specific parts of the condition, effectively reducing its size and complexity to a more manageable chunk. You are already familiar with this technique from other programming languages, so you start playing with the condition and, after some time, you eventually comment out the exact expression causing the bug.
But at what cost? You had to do a lot of guessing, add lots of comments, go through lots of undos, and run lots of test scans. And what if the condition consists of many expressions in an `and` chain, or you are dealing with a chain of rules that depend on each other? This approach does not scale well.
If you don’t want to go this route, the second option is to use the `yara -D` option to dump module data structures. This gives you more information, but you need to know what you are looking for. There is also no support for function calls, so if you are looking for the result of `pe.imphash()`, you are out of luck. You have to construct the command by hand as well, which includes finding the paths to the sample and to the rule itself.
Then you learn that the output does not include information about strings, so you have to run another command with the `-s` switch, which shows you information about strings. Then you go back to your editor and have to manually search your terminal’s scrollback for the value of a particular symbol. This way you can get a solid idea of what some of the symbols used evaluate to, but you still have to work through the rest of the logic yourself (e.g., comparisons, logical operations, function calls, references to other rules). Finally, most users don’t even know about these features. Even though I have worked with YARA for a few years, I learned about them only recently.
Eventually, you learn about the recently added console module, which is great: you can now use debug prints to investigate expressions. However, you still have to edit your condition, and you risk forgetting the debug prints and leaving them in the final rule. You also need to work around the short-circuiting of expressions to make sure your logging statements are reached in the first place. Furthermore, all of the log functions always return `True`, so you need to be extra creative about where you put them. You can’t just wrap your existing expressions.
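The two pitfalls above can be illustrated with a short analogy. Python’s `and` also short-circuits, so the snippet below mimics a condition with a logging operand. `log` is a hypothetical stand-in for a console-style function that records a message and always returns `True`. It is not the console module itself.

```python
# Analogy only: Python's `and` short-circuits like the conditions described above.
messages = []

def log(message):
    """Hypothetical console-style helper: record the message, always return True."""
    messages.append(message)
    return True

def condition(filesize, has_magic):
    # The log call has to be an extra operand. Wrapping `has_magic` in it
    # would replace that operand's value with True.
    return filesize < 1000 and log("reached magic check") and has_magic

print(condition(5000, True))   # False: the first operand fails, so log() never runs
print(messages)                # []
print(condition(500, False))   # False, but this time the message was recorded
print(messages)                # ['reached magic check']
```

The first call returns before the log statement runs. This is why debug prints have to be placed with the evaluation order in mind.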
As you can see, debugging YARA rules is notably hard, especially for new users. There are no dedicated tools for this task, only a few guides, and most of the time you are on your own. What if we could make this better?
Introducing YARI – YARA Interactive. The new addition to our open-source family allows you to evaluate YARA expressions and provides integration with YLS for the best user experience. You can find the source code in the avast/yari repository released under the MIT license.
Basically, YARI takes a generic YARA expression as input and returns the evaluation result. For example, passing in the string `time.now()` returns `Integer(1663178204)`, which is what YARA sees when the condition is evaluated. However, we are not limited to such simple queries. YARI supports all built-in modules, including their constants, arrays, and dictionaries, and you can even call functions with custom parameters. Furthermore, you can check strings and evaluate more complex expressions involving, for example, binary operations.
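The integer shown for `time.now()` is a Unix timestamp, meaning the number of seconds since 1970-01-01 UTC. It can be turned into a readable date with the Python standard library. The value below is simply the one quoted above.

```python
from datetime import datetime, timezone

# Value quoted above as the result of evaluating time.now()
timestamp = 1663178204

# time.now() counts seconds since the Unix epoch, so interpret it in UTC
moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(moment.isoformat())  # 2022-09-14T17:56:44+00:00
```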
Internally, YARI uses the concept of a `Context` to keep track of the initial state in which expressions are evaluated. The `Context` contains information about the input sample, the current rule, and any supplied module data, which covers most of the things you can pass to YARA as arguments. Before evaluation, all this information is put together, and the final result is derived from it.
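The following deliberately simplified Python sketch makes the idea of a context more concrete. It is not YARI’s implementation, which is written in Rust on top of libyara. The class `ToyContext`, its fields, the dotted-name lookup, and the module values are all hypothetical and made up. The sketch only shows the general pattern: inputs are bundled first, and expressions are resolved against them afterwards.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class ToyContext:
    """Hypothetical stand-in for a debugging context (not YARI's API)."""
    sample_path: str                      # input sample to evaluate against
    rule_name: Optional[str] = None       # rule whose condition is being debugged
    module_data: Dict[str, Any] = field(default_factory=dict)  # module output

    def resolve(self, expression: str) -> Any:
        """Resolve a dotted identifier such as 'elf.number_of_sections'."""
        value: Any = self.module_data
        for part in expression.split("."):
            if not isinstance(value, dict) or part not in value:
                raise KeyError(f"unknown symbol: {expression}")
            value = value[part]
        return value

# Made-up inputs, for illustration only
ctx = ToyContext(
    sample_path="sample.bin",
    rule_name="example_rule",
    module_data={"elf": {"number_of_sections": 28}},
)
print(ctx.resolve("elf.number_of_sections"))  # 28
```

A real evaluator must also handle function calls, operators, strings, and references to other rules, which is why YARI relies on libyara instead of a lookup table like this one.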
Designing a workflow for a YARA debugger was not an easy task. In the end, we chose not to follow the route of a conventional debugger (e.g., `gdb`) because of YARA’s declarative nature. We wanted to provide a much higher-level experience: users can select expressions from the rule and evaluate them in the context of that rule. (Exploring a more conventional style of debugging, where users can step through the condition, might be an interesting topic for the future.) The goals of this project are the following:
- Really accurate results: no emulation and no reimplementation of YARA, only what is already available.
- No need to change the condition in any way. This means no forgotten debug statements.
- Ease of use.
YLS integration
YARA Language Server (YLS) is a tool that we introduced in one of our recent blog posts. To make debugging more accessible, we have prepared an integration with YLS. Your editor becomes your debugging tool, so there is no need to switch windows. First, check out this simple example:
Video: Showcase of the debugging workflow.
Rules in the clip use `hash` meta entries to document which samples should generate hits. This is a good practice to apply to your rules, and it also serves as the starting point for debugging. To start a debugging session, click “Select hash for context” above the `hash` meta entry. This initializes the debugging `Context`.
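Below is a rough, regex-based Python sketch of how `hash` meta values could be collected from rule source. It is only an illustration. The function name `hash_meta_values` is hypothetical, and it recognizes only simple `hash = "<hex>"` lines. A language server would use a proper YARA parser rather than a regular expression.

```python
import re

# Matches lines such as:  hash = "d41d8cd98f00b204e9800998ecf8427e"
# Hypothetical simplification: only double-quoted hex values are recognized.
HASH_META = re.compile(r'^[ \t]*hash[ \t]*=[ \t]*"([0-9a-fA-F]+)"', re.MULTILINE)

def hash_meta_values(rule_source):
    """Return all hash meta values in the order they appear."""
    return [match.group(1) for match in HASH_META.finditer(rule_source)]

RULE = '''
rule example_rule {
    meta:
        hash = "D41D8CD98F00B204E9800998ECF8427E"
        hash = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
    condition:
        filesize < 1MB
}
'''

print(hash_meta_values(RULE))
```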
Once the Context is ready, you can hover over the expressions in the condition section and the hover popup will contain the evaluation result. YLS will try to guess the best expression for evaluation, but if you want more control, simply select the exact part of the expression and hover over it.
Now you might be wondering how to tell YLS where to search for artifacts (samples or module data). In the YLS editor settings, you can set up a project directory that contains all of those files. YLS recursively searches this directory and all of its subdirectories for a file named after the hash, so samples have to be named with the hash of the file. A file name can also carry a suffix with a YARA module name, which instructs YLS to use that file as data for the module. For example, a file called `<hash>.cuckoo` will be used as cuckoo module data (ref.).
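The lookup rules above can be approximated in a few lines of Python. This is a sketch under assumptions, not YLS’s code. The helper `find_artifacts` is hypothetical. It treats a file named exactly `<hash>` as the sample and a file named `<hash>.<module>` as data for that module, and it searches the project directory recursively with `pathlib`.

```python
import tempfile
from pathlib import Path

def find_artifacts(root, sample_hash):
    """Return (sample_path, {module_name: data_path}) for one hash.

    Hypothetical sketch of the naming rules described above:
    '<hash>' is the sample itself, '<hash>.<module>' is data for that module.
    """
    sample = None
    module_data = {}
    # rglob searches the directory and all of its subdirectories
    for path in Path(root).rglob(f"{sample_hash}*"):
        if not path.is_file():
            continue
        if path.name == sample_hash:
            sample = path
        elif path.name.startswith(sample_hash + "."):
            module_name = path.name[len(sample_hash) + 1:]
            module_data[module_name] = path
    return sample, module_data

# Demo in a throwaway directory; HASH is the SHA-256 of the empty sample below
HASH = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "samples").mkdir()
    (Path(root) / "samples" / HASH).write_bytes(b"")
    (Path(root) / "reports").mkdir()
    (Path(root) / "reports" / f"{HASH}.cuckoo").write_text("{}")
    sample, modules = find_artifacts(root, HASH)
    found_sample_name = sample.name
    found_modules = sorted(modules)

print(found_sample_name == HASH, found_modules)  # True ['cuckoo']
```

Keeping samples and module reports in separate subdirectories works because the search is recursive. Only the file names matter.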
Installation is pretty straightforward. All you need to do is update your YLS VS Code extension and your YLS installation. VS Code should be able to update the extension automatically. If you run into problems, you can update it manually from the Extensions tab. To update YLS, run `pip install -U yls-yara` or, for more detailed steps, consult the YLS documentation.
Python package
If you also want to use the debugger programmatically, we have prepared Python bindings for YARI. We currently support the Linux and Windows platforms, but support for other platforms can be added. These bindings are also the core of the YLS integration. To use them, simply install the `yari-py` package using pip. The package also contains a compiled and linked `libyara`:
```shell
pip install yari-py
```
Once everything is installed, you will be able to use the following code to evaluate expressions (more info in the lib.rs definition of the Python module):
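```python
import yari

try:
    # Create an evaluation context for the sample /bin/sh
    context = yari.Context("/bin/sh")
    # Evaluate a YARA expression in that context and print the result
    print(context.eval("elf.number_of_sections"))
except yari.YariError as e:
    print(str(e))
```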
The expression results are converted to native Python types. For example, YARA arrays are converted to Python lists, so you can check their length using Python built-ins. The same applies to structures and dictionaries. If you want a boolean indicating whether YARA considers the expression true, use the `Context.eval_bool('expression')` function.
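Because `eval_bool` evaluates any expression against the same context, a script could check each operand of a long `and` chain without touching the rule. Below is a minimal sketch of that idea. `split_top_level_and` and `failing_operands` are hypothetical helpers, and the splitter only understands parentheses, brackets, and double-quoted strings. The dictionary of made-up results stands in for `context.eval_bool` so that the example runs without yari-py. With the package installed, you would presumably pass `context.eval_bool` instead.

```python
def split_top_level_and(condition):
    """Split a condition on `and` operators that are not nested inside
    parentheses/brackets or double-quoted strings (simplified on purpose)."""
    parts, depth, start, i = [], 0, 0, 0
    in_string = False
    while i < len(condition):
        ch = condition[i]
        if in_string:
            if ch == "\\":          # skip the escaped character
                i += 2
                continue
            if ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        elif (depth == 0 and condition.startswith("and", i)
              and i > 0 and condition[i - 1].isspace()
              and i + 3 < len(condition)
              and (condition[i + 3].isspace() or condition[i + 3] == "(")):
            parts.append(condition[start:i].strip())
            i += 3
            start = i
            continue
        i += 1
    parts.append(condition[start:].strip())
    return parts


def failing_operands(condition, evaluate):
    """Return the top-level operands for which `evaluate` returns a falsy value."""
    return [operand for operand in split_top_level_and(condition)
            if not evaluate(operand)]


CONDITION = ("uint16(0) == 0x5A4D and (pe.is_dll() or pe.number_of_sections > 3) "
             "and filesize < 200KB")

# Made-up results standing in for context.eval_bool on some sample
fake_eval_bool = {
    "uint16(0) == 0x5A4D": True,
    "(pe.is_dll() or pe.number_of_sections > 3)": True,
    "filesize < 200KB": False,
}.__getitem__

print(failing_operands(CONDITION, fake_eval_bool))  # ['filesize < 200KB']
```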
Command line shell
For power users, there is a command-line application that spawns an interactive shell (inspired by the Python shell). Users can enter YARA expressions, and the results are displayed. The debugger binary accepts parameters similar to those of the original `yara` binary.
Architecture
We have decided to implement the debugger in the Rust language. The current implementation consists of three crates:
- yari-sys: The core and bridge to libyara. The implemented `eval` functions and the `Context` are here.
- yari-cli: Command line interface for YARI.
- yari-py: Python bindings.
Yari-sys is a wrapper around libyara that adds the debugger functionality on top of it. FFI bindings are generated using bindgen. We also created a small parser for a subset of YARA expressions using nom. This core functionality is then exposed through different interfaces. Yari-cli uses rustyline to create an interactive shell where users can evaluate various expressions. Similarly, yari-py creates definitions for a new Python module using maturin and pyo3.
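yari-sys implements its parser in Rust with nom. The sketch below illustrates what parsing a subset of YARA expressions can involve, using a hand-written recursive-descent parser in Python for a hypothetical subset: identifiers, field access, indexing, function calls, and integer and string literals. The grammar, the tuple-based syntax tree, and the names `tokenize` and `parse` are assumptions for illustration only. YARI’s actual subset and data structures may differ.

```python
import re

# Hypothetical token set: integers, double-quoted strings, identifiers, punctuation
TOKEN = re.compile(
    r'\s*(?:(?P<num>\d+)|(?P<str>"(?:[^"\\]|\\.)*")'
    r'|(?P<name>[A-Za-z_]\w*)|(?P<op>[.()\[\],]))'
)

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected input at {pos}: {text[pos:]!r}")
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens

def parse(text):
    """Parse e.g. 'pe.sections[0].name' or 'pe.imphash()' into nested tuples."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def expect(value):
        nonlocal pos
        if peek()[1] != value:
            raise SyntaxError(f"expected {value!r}, got {peek()[1]!r}")
        pos += 1

    def primary():
        nonlocal pos
        kind, tok = peek()
        pos += 1
        if kind == "num":
            return ("int", int(tok))
        if kind == "str":
            return ("str", tok[1:-1])  # escape sequences are left as-is
        if kind == "name":
            return ("ident", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def postfix():
        # Field access, indexing, and calls bind left to right
        nonlocal pos
        node = primary()
        while True:
            tok = peek()[1]
            if tok == ".":
                pos += 1
                kind, name = peek()
                if kind != "name":
                    raise SyntaxError("expected a field name after '.'")
                pos += 1
                node = ("field", node, name)
            elif tok == "[":
                pos += 1
                index = postfix()
                expect("]")
                node = ("index", node, index)
            elif tok == "(":
                pos += 1
                args = []
                if peek()[1] != ")":
                    args.append(postfix())
                    while peek()[1] == ",":
                        pos += 1
                        args.append(postfix())
                expect(")")
                node = ("call", node, args)
            else:
                return node

    node = postfix()
    if pos != len(tokens):
        raise SyntaxError(f"trailing input: {tokens[pos:]}")
    return node

print(parse("pe.sections[0].name"))
```

Hex literals, operators, and string identifiers such as `$a` are deliberately left out. A nom-based parser would express similar rules as composable parser combinators.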
Conclusion
In this blog post, we have introduced a new tool called YARI that can be used to effectively debug YARA rules. Sources are available in the avast/yari repository. We would love to hear feedback from the community. If you have any ideas or encounter any issues, don’t hesitate to contact us. Feel free to open issues or pull requests — we really appreciate it.