VTDIFF - Automatic YARA rules

VTDIFF helps threat analysts create YARA rules by automating the identification of optimal patterns to detect groups of files (malware families, threat campaigns, threat actor toolsets). These patterns can be used in Livehunt YARA rules, Retrohunt jobs and VT GREP content searches.

What do you mean by optimal? VTDIFF takes into account binary sequence prevalence across the entire VirusTotal dataset to make sure it does not suggest patterns that are shared by a large number of files, e.g. the DOS stub message in portable executables: "This program cannot be run in DOS mode". In other words, VTDIFF tries to generate patterns that will not produce false positives.
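The prevalence idea can be illustrated with a minimal sketch. It is not VTDIFF's implementation: the function name `filter_by_prevalence`, the `max_files` threshold and the counts in the table are all hypothetical, chosen only to show how a very common sequence such as the DOS stub message would be discarded while a rare one survives.

```python
from typing import Dict, List


def filter_by_prevalence(
    candidates: List[bytes],
    prevalence: Dict[bytes, int],
    max_files: int,
) -> List[bytes]:
    """Keep candidates seen in at most `max_files` files of a reference corpus."""
    # A pattern missing from the table is treated as never seen (count 0).
    return [p for p in candidates if prevalence.get(p, 0) <= max_files]


dos_stub = b"This program cannot be run in DOS mode"
family_marker = b"\xde\xad\xbe\xef\x13\x37"  # placeholder bytes, not a real indicator

# Made-up counts for illustration only; real prevalence data lives in the backend.
counts = {dos_stub: 1_000_000_000, family_marker: 12}

kept = filter_by_prevalence([dos_stub, family_marker], counts, max_files=1000)
print(kept)
```

With these illustrative numbers only the placeholder marker is kept, because the DOS stub message exceeds the threshold.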

Answers to other frequently asked questions can be found in the VTDiff FAQ section.

Accessing VTDIFF

VTDIFF has two main entry points. Given a VT Enterprise file search, you can select a group of files by ticking their checkboxes; this will activate the VTDIFF button in the tools menu:

VTDiff send

You can also get to VTDIFF by using the left navigation bar in VT Enterprise:

VTDiff navigation bar

This will then take you to your historical VTDIFFs view, where you can set up a new session:

VTDiff new session

A session modal will open up, allowing you to specify a list of hashes to match and, to fine-tune your job, a list of hashes to exclude:

VTDiff send

VT Enterprise search list workflow

Let’s imagine our starting point is a single hash, a malicious document that drops a second-stage payload via macros. The hash is:


To build an optimal rule that detects this file and other variants of the same campaign/family, we first need to identify a collection of files belonging to that family.

We can pinpoint other variants by making use of the different clustering mechanisms that VT Intelligence provides (vhash, imphash, ssdeep, icon/thumbnail vizhash, antivirus labels, distinctive static properties, etc.). In this case we are going to pivot to other documents with the exact same visual layout by clicking on the file’s thumbnail:

VTDiff thumbnail

This will trigger the following search:


By hovering over the detection ratio you will see that the antivirus labels do indeed confirm that the matches belong to the same malware family. Similarly, the thumbnails of the files in the list clearly indicate that they have the very same visual layout.

Select the first 8 matches:

VTDiff checkboxes

When you do so, the VTDIFF icon in the tools menu bar becomes active:

VTDiff send

Click on it to set up a VTDIFF session for those hashes. Commonly abused file types such as documents do not require you to provide a list of hashes to exclude; the backend will set up an exclusion list automatically. If you want to refine the VTDIFF session and avoid false positives produced in previous VTDIFF jobs, you can use the exclusion list.

VTDiff exclusion list

Launch the session. A VTDIFF job generally takes under a minute to conclude. Note that this is a heavy process: VTDIFF iterates over all the files provided using sliding windows of different sizes, checking whether binary subsequences are common to the selection and whether they are not too noisy when considering the entire VirusTotal dataset.
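The sliding-window step can be pictured with a small, simplified sketch. It assumes the files fit in memory and that the "noisy" sequences are available as a plain set; the names `ngrams`, `common_patterns` and `noisy` are illustrative and say nothing about how the VTDIFF backend is actually built.

```python
from typing import Iterable, List, Set


def ngrams(data: bytes, size: int) -> Set[bytes]:
    """All byte subsequences of length `size`, taken with a sliding window."""
    return {data[i:i + size] for i in range(len(data) - size + 1)}


def common_patterns(
    files: List[bytes],
    sizes: Iterable[int],
    noisy: Set[bytes],
) -> Set[bytes]:
    """Subsequences present in every file, minus those flagged as noisy."""
    result: Set[bytes] = set()
    if not files:
        return result
    for size in sizes:
        # Start from the first file's windows and intersect with the others.
        shared = ngrams(files[0], size)
        for data in files[1:]:
            shared &= ngrams(data, size)
        result |= shared - noisy
    return result


samples = [b"xxMARKERyy", b"zzMARKERww", b"MARKER123"]
print(common_patterns(samples, sizes=[6], noisy=set()))
```

A real implementation would need an index rather than in-memory sets, which is part of why the job is described as heavy.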

A list of binary patterns will be produced:

VTDiff patterns

Note that you can click on the search icon next to each binary pattern in order to trigger an n-gram content search for it:

VTDiff vtgrep

This will allow you to understand the kind of files that match each pattern and whether they are prone to false positives. You can also easily search for an AND combination of some of them, which will probably be far more effective than an individual search. Select the first three patterns and click on the search icon in the top actions menu bar:

VTDiff patterns search

That will trigger a search for files that contain all three binary patterns. VirusTotal maintains a 5PB n-gram index to power lightning-fast content searches:


VTDiff thumbnails

Judging by the thumbnails, sizes, antivirus results, tags, etc., those three binary patterns have done a good job of finding other variants of the family/campaign. They can therefore be used to build a Livehunt YARA rule, to be notified of any future file upload that belongs to this family, or a Retrohunt job, to map out the entire historical campaign. Livehunt YARA rules and Retrohunt jobs can also be set up using the top actions menu in VTDIFF listings:

VTDiff hunting
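If you prefer to assemble a rule by hand from the suggested patterns, the sketch below shows one possible way to do it. The helper `build_yara_rule` is hypothetical, the hex patterns are placeholders rather than real VTDIFF output, and the rule name is assumed to be a valid YARA identifier. The `all of them` condition mirrors the AND combination of patterns used in the content search.

```python
from typing import List


def build_yara_rule(name: str, hex_patterns: List[str]) -> str:
    """Render a YARA rule requiring every hex pattern to be present."""
    lines = [f"rule {name}", "{", "    strings:"]
    for index, pattern in enumerate(hex_patterns):
        # Raises ValueError if the pattern is not plain hexadecimal bytes.
        bytes.fromhex(pattern)
        lines.append(f"        $p{index} = {{ {pattern} }}")
    lines += ["    condition:", "        all of them", "}"]
    return "\n".join(lines)


# Placeholder patterns for illustration only.
rule_text = build_yara_rule("family_sketch", ["DE AD BE EF", "13 37 C0 DE"])
print(rule_text)
```

The resulting text can be reviewed and adjusted before being used in a Livehunt ruleset or a Retrohunt job.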

Finally, note that the top actions menu also allows you to filter the patterns in order to focus exclusively on certain categories:

VTDiff filter


Hash list workflow

Sometimes you might want to create a VTDIFF session based on a list of pre-existing hashes, in which case your starting point will no longer be a VT Intelligence search. In these cases you will use the navigation sidebar to access VTDIFF directly:

VTDiff navbar

The landing page allows you to create a new VTDIFF session. This requires you to copy and paste the list of hashes to match into the inclusion text area.

VTDiff hashes
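Before pasting, it can help to clean up a hash list gathered from reports or spreadsheets. The following sketch is a local convenience, not part of VTDIFF. It assumes hashes are MD5, SHA-1 or SHA-256 hex strings separated by whitespace, commas or semicolons, and the helper name `parse_hash_list` is made up for this example.

```python
import hashlib
import re
from typing import List

# 32, 40 or 64 hex characters: MD5, SHA-1 or SHA-256 (an assumption of this sketch).
HASH_RE = re.compile(r"^(?:[0-9a-f]{32}|[0-9a-f]{40}|[0-9a-f]{64})$")


def parse_hash_list(text: str) -> List[str]:
    """Lowercase, validate and de-duplicate hashes, keeping their order."""
    seen = set()
    hashes = []
    for token in re.split(r"[\s,;]+", text.strip()):
        value = token.lower()
        if HASH_RE.match(value) and value not in seen:
            seen.add(value)
            hashes.append(value)
    return hashes


# Sample hashes computed locally so that no real sample is referenced.
sha1 = hashlib.sha1(b"example").hexdigest()
sha256 = hashlib.sha256(b"example").hexdigest()

pasted = f"{sha1}\n{sha256.upper()}, {sha1}\nnot-a-hash"
cleaned = parse_hash_list(pasted)
print("\n".join(cleaned))
```

The cleaned output has one hash per line, with duplicates and malformed entries dropped.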

Exclusion hash lists are only needed for less prevalent file types. However, even when exclusions are not required, you might often want to set up your own selection in order to filter out files that matched previous iterations of a VTDIFF job and caused false positives.
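Conceptually, an exclusion list removes any candidate pattern that also shows up in a file you do not want to match. The sketch below illustrates that idea on in-memory byte strings. The function name and sample data are hypothetical, and the real job works against hashes on the backend rather than local file contents.

```python
from typing import Iterable, List


def drop_patterns_in_exclusions(
    patterns: Iterable[bytes],
    excluded_files: Iterable[bytes],
) -> List[bytes]:
    """Keep only patterns that appear in none of the excluded files."""
    excluded = list(excluded_files)
    return [p for p in patterns if not any(p in data for data in excluded)]


candidates = [b"MARKER", b"COMMON"]
false_positive_files = [b"header COMMON trailer"]

remaining = drop_patterns_in_exclusions(candidates, false_positive_files)
print(remaining)
```

Here the pattern shared with the false-positive file is dropped, leaving only the one specific to the selection.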

Note that the VTDIFF landing page lists the VTDIFF jobs that you launched in the past. This allows you to continue working with the identified patterns at a later stage and can come in handy if you mistakenly closed the browser tab.

From this point onward the workflow is exactly the same as the one described in the previous section.

To learn more about VTDIFF please refer to the Frequently Asked Questions.