Making YARA better: Authenticode, .NET, Telfhash

YARA is a popular open-source tool for malware identification and classification. But if you are reading a blog post about YARA improvements, I will assume you are already familiar with what YARA is and what it offers. 

We have been using YARA extensively at Avast for several years. Over those years, we’ve added many new features and fixes, giving back to the community. Recently we’ve made large contributions to YARA, which further improve its malware identification capabilities:

  • PE signature parsing and verification.
  • Reconstruction of .NET user-defined types.
  • Calculation of the Telfhash value, as available on VirusTotal.

We will introduce these changes and show some of their use cases.

Authenticode parsing and verification

Until now, YARA could only extract the signer certificate information (issuer, subject, validity period, etc.) from the Authenticode signature. It could also walk all the possible signature locations, either by walking the certificate table or by extracting nested signatures as an unauthenticated attribute (often used due to the compatibility with both new and old systems). But that’s about it.

We have replaced existing Authenticode parsing with our open-source library based on OpenSSL, which offers additional information about signatures, countersignatures, and their verification. Verification allows for convenient checking for invalid signatures. In the case of malware, these are often just lazily ripped from a valid signed file and slapped onto a malicious binary in the hopes of evading detection. This is the case for the SigThief tool, for example.

Because the term verification can be confusing, I will explain what I mean in more detail. It doesn’t mean you can trust the signature. For a signature to be trusted, it needs a trust anchor. Trust anchors differ between systems, organizations, and browsers, and they change over time. Trust is typically passed from a root certificate through the signature’s certificate chain. To check whether a signature is trustworthy, we would need an exhaustive, continuously updated list of all root certificates, and even that wouldn’t be enough, because not everyone agrees on the same set of root certificates. Because of these problems, we only confirm that the signature satisfies all the formal constraints it should:

  • It contains everything necessary.
  • The hashes match.
  • The signer certificate exists.
  • The signer’s public key verifies the signature.

Verification does not, however, walk the certificate chain to a trust anchor, as there is none.
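
As a rough illustration of the formal verification described above, the following sketch flags PE files that carry at least one signature of which none verifies. This is what a signature copied from another file typically looks like. It assumes a YARA build that includes the new Authenticode parsing (with OpenSSL support) and relies on the pe.number_of_signatures and pe.is_signed fields; the rule name is made up for this example.

```yara
import "pe"

rule unverified_authenticode_signature {
    meta:
        description = "PE file with a signature present, but none passes formal verification"
    condition:
        // A signature is present in the file ...
        pe.number_of_signatures > 0 and
        // ... but no signature is formally valid (for example, the digest does not match).
        not pe.is_signed
}
```

A match only says that the signature is formally broken. It says nothing about whether a formally valid signature could be trusted.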

There is a large amount of extra information available in YARA now. For the complete list of data available, see the YARA documentation. For a quick peek, I’d like to point out the parsing of the signature certificate bundle, the signer chain, the program name, the file digest inside the signature, the actual digest, and the countersignatures.
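
To give a feel for the digest fields mentioned above, here is a minimal sketch that compares the digest stored inside each signature with the digest computed from the file itself. The field names digest and file_digest are taken from the pe module documentation as I understand it and should be checked against the YARA version in use; the rule name is hypothetical.

```yara
import "pe"

rule signature_digest_mismatch {
    meta:
        description = "Digest stored in the signature differs from the digest computed from the file"
    condition:
        for any signature in pe.signatures : (
            // Both values are hex strings; a mismatch means the signature
            // was not created for this exact file content.
            signature.digest != signature.file_digest
        )
}
```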

import "pe"

rule deceptive_program_name {
    meta:
        description = "Signed file with deceptive name in the UAC prompt"
    condition:
        // At least one signature must be formally verified
        pe.is_signed and
        for any signature in pe.signatures : (
            signature.signer_info.program_name contains "continue setu" or
            signature.signer_info.program_name contains "Click RUN/YES" or
            signature.signer_info.program_name contains "Click YES"
        )
}

The above code is an example of checking for a deceptive program_name that misleads users into automatically allowing the program to execute. The signature program_name is shown at the UAC prompt when the file is correctly signed.

These changes were made backward-compatible and will not interfere with any existing YARA rules.

Extraction of .NET user-defined types

The former dotnet module processed only part of the metadata: user strings in the #US stream and some information from the tilde (#~) stream, namely the typelib, resources, assembly references, and module references.

The tilde stream can contain dozens of tables with information about the program. The majority of these tables hold type information. We have extended the parsing of this stream to reconstruct all of the user-defined types from these tables. Any type defined inside the binary will be available in YARA: all the classes with their names, namespaces, parent classes, methods, return types, argument types, and modifiers.
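
As a small sketch of what the reconstructed types allow, the rule below looks for any class with a static method that returns a byte array. It is illustrative only: the rule name is made up, and the assumption that the return type is rendered as the string "byte[]" should be verified against the dotnet module documentation before relying on it.

```yara
import "dotnet"

rule dotnet_static_method_returning_bytes {
    meta:
        description = "Illustrative: class with a static method returning byte[]"
    condition:
        dotnet.is_dotnet and
        for any class in dotnet.classes : (
            for any method in class.methods : (
                // Method modifier and return type come from the reconstructed type tables
                method.static and
                method.return_type == "byte[]"  // assumed string form of the type
            )
        )
}
```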

import "dotnet"

rule prometheus_known_sequences {
    meta:
        description = "Prometheus"
        hash = "1ee679b712b0a9fbe705ff3dd8a30cca596a0486751c1e0e66cc2bb901e58df5"
        hash = "44d49b6b8f2d5cd2e21c813eef082dc4b1a7bc4fa980a85cc39483dc8f2aac4f"
        hash = "ada2fdee1a22d4ed1428f3471373aabdaef1c033fcc5d4f5c3596b18a3bddc0c"
    strings:
        $s01 = ")(/.;6" wide
        $s02 = "Selected compression algorithm is not supported." wide
        $s03 = "Unknown Header" wide
        $s04 = "FC216F5C5AE2947D800794ECD5F752EE8381073C2E5D0D095FDA040F541702F3"
        $s05 = "CC8CD41CEF907C4D216069122C4B89936211361F9050A717A1E37AD1862E952F"
        $s06 = "RijndaelManaged"
        $s07 = "AesManaged"
    condition:
        dotnet.is_dotnet and all of them and
        // At least 4 classes must contain a method with one of these names
        for 4 class in dotnet.classes : (
            for any method in class.methods : (
                method.name contains "CreateEncryptor" or
                method.name contains "WorkerCrypter2"
            )
        )
}

The dotnet module was enabled in VirusTotal Livehunt not long ago. Once a YARA release that includes these features is out, they can be used in the Livehunt service.

TLSH, Telfhash and import hash

We have integrated TLSH, a locality-sensitive hash used for similarity comparisons. The integration into YARA involved porting Trend Micro’s existing tlsh codebase from C++ to a separate C library. We used the hash function to compute Telfhash, a Trend Micro ELF hash meant for clustering and finding similar ELF files. It hashes the symbols in the ELF file using the TLSH function. It is now available as part of the YARA ELF module and has been part of VirusTotal for some time. While at it, we’ve also included a simple MD5 import symbol hash, similar to the popular PE Imphash.

import "elf"

rule xorddos {
    meta:
        description = "Xorddos malware"
        hash = "8e0feb43f2137013fbbe42258dcc118104f9237cf41bfa52d342211ac823fad2"
    condition:
        elf.telfhash() == "T12E3162E118BC0C860DE0AC145C7C3B82CA8B91771FA4961CAF99CD89714F125F67BC06"
}
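
The MD5 import symbol hash mentioned above can be matched in the same way as the Telfhash. The sketch below assumes the function is exposed as elf.import_md5(), as in the elf module documentation; the rule name is made up and the hash value is a placeholder, not the hash of any real sample.

```yara
import "elf"

rule elf_import_hash_placeholder {
    meta:
        description = "Illustrative only: match ELF files by import hash"
    condition:
        // Placeholder value; replace with the import hash of a known sample
        elf.import_md5() == "00000000000000000000000000000000"
}
```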


We made these changes to help analysts write more straightforward and robust YARA rules. We consistently improve YARA with new features and are glad to be part of the open-source community. The contributions over the years have made us part of the YARA authors list, for which we are grateful. We are also releasing many new YARA-related projects later this year, so stay tuned!

If you want to check out any of our other work, I recommend Dominika’s latest blog post about improving pattern matching in YARA. Plenty more is visible in the YARA GitHub repository.