Know Your YARA Rules Series: #4 YARA-X: The Next Generation

In the third post of the Know Your YARA Rules series, we mentioned that something was cooking in the YARA world. In this post, we look at what the future holds for the YARA tool and its users by introducing its next generation: YARA-X.

Motivation

YARA recently celebrated its 15th anniversary (counted from the first commit in the public repository, made by the main author, Victor M. Alvarez).

The YARA tool is constantly improving, with new features added by VirusTotal and the very active community.

At Gen, we have a long history of contributing new features and ideas to YARA. One example of this activity is our prototype of an alternative scanner, called YaraNG. You can read more details about this implementation in our previous blog post.

Recently, however, Victor opened a discussion about the future of YARA, where he described his desire to reimagine how YARA works internally on a much bigger scale. Rather than changing the code gradually, he suggested, maybe it is time to rewrite the YARA project completely. He expressed the goal of creating a new version of YARA that would better reflect current technology and the needs of its users, and he asked the community for feedback and ideas for future changes.

The discussion was more than productive, and the decision to create a new generation of our favorite tool was made.  

The initial comment in a discussion about the future of YARA from Victor Alvarez.

YARA-X is currently being developed by Victor and other volunteers. The code is available online, but be aware that the official release is still to come. Even though the code is being written from scratch, noticeable progress has already been made. We believe that the first release will come soon, and we will inform you about it when it happens.

We are also happy to be part of the creation process of the whole new generation of the YARA tool and contribute to the project from its beginning.  

Let us show you what we have already done and what the plans for future development are.

Rusted in C? Try Rust

The first noticeable change in the new project is the implementation language. While the original YARA was written in C, YARA-X is being developed in Rust.

Rust is becoming more and more popular, and we believe it is a perfect choice for the goals of the new generation of this tool. Rust adds safety guarantees, particularly around memory, while maintaining the speed of C code. This is the perfect combination for tools in the cybersecurity space.

YARA-X will be more modular and customizable, so users can choose which modules they need and which they don't.

There will be bindings for other languages like C, C++, and Go, and the goal is also to have a whole ecosystem of tools that will help us use it more effectively.

Modules

YARA already has modules, such as the Cuckoo module. However, their role will be more prominent in the new version. There will be better support for the implementation and extension of the modules, including improved testing possibilities. 

Mach-O Module

The first more extensive module that has been ported to YARA-X is the Mach-O module. At Gen, we have taken the responsibility to port this module and provide the first options to use YARA-X for parsing binary files. We have taken a different approach than the one used in the original YARA. The new Mach-O module is much more maintainable because we have used the ‘nom’ library for parsing binary files. It makes parsing much easier and contains built-in validation. The rest of the code mirrors the original Mach-O module, as we wanted to preserve full compatibility.
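To give a feel for what parsing with built-in validation means, here is a minimal sketch of reading the fixed-size Mach-O header. It deliberately uses only the Rust standard library so that it stays self-contained; the real module uses ‘nom’, and the names `MachHeader`, `le_u32`, and `parse_header` are our own illustrative assumptions, not the module's actual API. The sketch also handles only little-endian files.

```rust
/// Mach-O magic numbers as they appear when read in little-endian order.
const MH_MAGIC: u32 = 0xfeed_face; // 32-bit binaries
const MH_MAGIC_64: u32 = 0xfeed_facf; // 64-bit binaries

/// Hypothetical, simplified view of the fixed-size Mach-O header fields.
/// (The 64-bit header has one more reserved field, which is skipped here.)
#[derive(Debug, PartialEq)]
pub struct MachHeader {
    pub magic: u32,
    pub cputype: u32,
    pub cpusubtype: u32,
    pub filetype: u32,
    pub ncmds: u32,
    pub sizeofcmds: u32,
    pub flags: u32,
}

/// Reads one little-endian u32 and returns the remaining input together
/// with the value, similar in shape to a parser combinator. Returns None
/// instead of panicking when the input is too short.
fn le_u32(input: &[u8]) -> Option<(&[u8], u32)> {
    let b = input.get(..4)?;
    Some((&input[4..], u32::from_le_bytes([b[0], b[1], b[2], b[3]])))
}

/// Parses the header, rejecting truncated input and unknown magic values.
pub fn parse_header(input: &[u8]) -> Option<MachHeader> {
    let (rest, magic) = le_u32(input)?;
    if magic != MH_MAGIC && magic != MH_MAGIC_64 {
        return None;
    }
    let (rest, cputype) = le_u32(rest)?;
    let (rest, cpusubtype) = le_u32(rest)?;
    let (rest, filetype) = le_u32(rest)?;
    let (rest, ncmds) = le_u32(rest)?;
    let (rest, sizeofcmds) = le_u32(rest)?;
    let (_rest, flags) = le_u32(rest)?;
    Some(MachHeader {
        magic,
        cputype,
        cpusubtype,
        filetype,
        ncmds,
        sizeofcmds,
        flags,
    })
}
```

Every read either succeeds and advances the input or fails cleanly, so a truncated or malformed file cannot cause an out-of-bounds access. A parser-combinator library offers the same pattern with far less boilerplate.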

The main difference is the use of protobuf to store the module output after parsing. YARA rules work directly with this serialized format, instead of the plain C structures used in the original module. The `.proto` file can also be used later to easily dump the output of the module, which we will discuss later. The Mach-O module has also recently received outside contributions that have expanded it even further. After the Mach-O module, work has started on ELF and LNK modules, and many more are coming soon.

Testing framework

Another bigger change that comes with YARA-X and its modular approach is a completely new framework for testing modules. In addition to internal unit tests for individual functions, we have contributed to a testing framework that can test binary files and their parsed output while still being safe. In our initial discussions, Victor recommended that we use the Intel HEX representation for binary files. This representation makes it possible to safely store and manipulate even malicious files in the testing framework and the public GitHub repository.

This so-called ihex representation is converted back to binary directly in the tests. The whole file content is then passed to the module, and the resulting protobuf structure is compared with the expected content. This framework works out of the box for every new module that is added to the repository. The module author only has to provide an `.in` file and the corresponding `.out` file. We have enriched this testing experience by adding fuzz tests that exercise unexpected input and reveal potential weaknesses.
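The conversion from ihex text back to binary can be sketched as follows. This is a minimal illustration under our own assumptions, not the code used in the YARA-X test suite: the function name `decode_ihex` is hypothetical, data records are simply appended in file order with their address fields ignored (which is only correct for contiguous dumps), and record types other than data and end-of-file are skipped.

```rust
/// Decodes Intel HEX text into raw bytes.
/// Each record looks like `:LLAAAATT<data>CC`: byte count, 16-bit address,
/// record type, data bytes, and a checksum chosen so that all record bytes
/// sum to zero modulo 256.
pub fn decode_ihex(text: &str) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    for (idx, line) in text.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        let hex = line
            .strip_prefix(':')
            .ok_or_else(|| format!("line {}: missing ':'", idx + 1))?;
        // Validate first so that slicing by byte index below is safe.
        if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(format!("line {}: invalid hex digits", idx + 1));
        }
        let rec = (0..hex.len())
            .step_by(2)
            .map(|i| u8::from_str_radix(&hex[i..i + 2], 16))
            .collect::<Result<Vec<u8>, _>>()
            .map_err(|e| format!("line {}: {}", idx + 1, e))?;
        // Minimum record: count, two address bytes, type, checksum.
        if rec.len() < 5 || rec.len() != rec[0] as usize + 5 {
            return Err(format!("line {}: wrong record length", idx + 1));
        }
        if rec.iter().fold(0u8, |acc, b| acc.wrapping_add(*b)) != 0 {
            return Err(format!("line {}: checksum mismatch", idx + 1));
        }
        match rec[3] {
            0x00 => out.extend_from_slice(&rec[4..rec.len() - 1]), // data
            0x01 => return Ok(out),                                // end of file
            _ => {} // address records are ignored in this sketch
        }
    }
    Ok(out)
}
```

For example, the two lines `:04000000CFFAEDFE48` and `:00000001FF` decode to the four bytes `cf fa ed fe`, the little-endian magic of a 64-bit Mach-O file. Because the stored file is plain text with checksums, it cannot be executed by accident, which is what makes it suitable for keeping samples in a public repository.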

YARA-X dumper module 

As mentioned earlier, storing module output in protobuf format creates new opportunities for other tools to use it. We have decided to develop a YARA-X subcrate (that can also work as a standalone module) to make module output available to users in several formats and with several options. This module takes the stored protobuf data and transforms it into machine-readable formats such as JSON or YAML. Both formats provide human-friendly output with colors and pretty-print formatting.

The YAML output is extended with additional options to make it as human-readable as possible. Protobuf integer fields can be marked with several formatting options. These include adding a comment with a timestamp converted to a readable date, or using hexadecimal instead of decimal representation for fields where the user decides it makes sense.
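The effect of such per-field options can be sketched like this. Everything here is an illustrative assumption: the `IntFormat` enum, the `render_field` function, and the exact output layout are hypothetical and do not reflect the dumper's real option names or formatting. The timestamp branch uses a well-known days-to-date conversion and only supports dates from 1970 onwards.

```rust
/// Hypothetical formatting options for an integer field.
pub enum IntFormat {
    Decimal,
    Hex,
    Timestamp,
}

/// Converts seconds since 1970-01-01 00:00:00 UTC into a readable string.
/// Uses Howard Hinnant's days-to-civil algorithm, restricted to unsigned input.
fn utc_date_time(secs: u64) -> String {
    let (days, rem) = (secs / 86_400, secs % 86_400);
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z / 146_097; // 400-year cycles
    let doe = z % 146_097; // day of era
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, from March 1
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    format!(
        "{:04}-{:02}-{:02} {:02}:{:02}:{:02} UTC",
        year,
        month,
        day,
        rem / 3_600,
        rem % 3_600 / 60,
        rem % 60
    )
}

/// Renders one `name: value` line in a YAML-like style.
pub fn render_field(name: &str, value: u64, format: IntFormat) -> String {
    match format {
        IntFormat::Decimal => format!("{}: {}", name, value),
        IntFormat::Hex => format!("{}: {:#x}", name, value),
        // Keep the raw value and add the readable form as a comment.
        IntFormat::Timestamp => format!("{}: {}  # {}", name, value, utc_date_time(value)),
    }
}
```

With these assumptions, `render_field("magic", 0xfeedfacf, IntFormat::Hex)` yields `magic: 0xfeedfacf`, and a timestamp field keeps its raw numeric value so the output remains machine-readable while the comment helps the human reader.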

In the future, we also want to add an option to represent flags as a combination of the named values set for each bit. With the help of Victor and the YARA-X dumper initiative, you can now get the module output even without using the YARA scanner and creating fake rules to trigger parsing. This is a big milestone in making YARA(-X) work as a toolchain.
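As a rough idea of what the planned flag representation could look like, here is a small sketch that splits a flags value into named bits. The function name `describe_flags` and the output format are our own assumptions for illustration. The constants are a handful of real Mach-O header flag values, but the list is intentionally incomplete.

```rust
/// A few Mach-O header flag bits. The list is incomplete on purpose.
const FLAG_NAMES: [(u32, &str); 4] = [
    (0x1, "MH_NOUNDEFS"),
    (0x4, "MH_DYLDLINK"),
    (0x80, "MH_TWOLEVEL"),
    (0x20_0000, "MH_PIE"),
];

/// Renders a flags value as the names of its set bits joined by " | ".
/// Bits without a known name are reported together as one hexadecimal rest.
pub fn describe_flags(value: u32) -> String {
    let mut parts: Vec<String> = Vec::new();
    let mut unknown = value;
    for &(bit, name) in FLAG_NAMES.iter() {
        if value & bit != 0 {
            parts.push(name.to_string());
            unknown &= !bit; // this bit is accounted for
        }
    }
    if unknown != 0 {
        parts.push(format!("{:#x}", unknown));
    }
    if parts.is_empty() {
        "0".to_string()
    } else {
        parts.join(" | ")
    }
}
```

Under these assumptions, the value `0x00200085` would be shown as `MH_NOUNDEFS | MH_DYLDLINK | MH_TWOLEVEL | MH_PIE`, which is much easier to read than the raw number.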

Conclusion 

YARA has a long history, and it is no coincidence that, after all these years, it remains a very popular tool among malware analysts and many other users worldwide.

The main authors, as well as the YARA community, have done a lot of work to improve the tool as much as possible. We believe that the new generation, YARA-X, will be just as successful, if not more so.

As the development of the new tool continues, we will inform you about the progress, important news, and other helpful information. So, stay tuned!  

And that is all for today! We wish you happy YARA rules writing!