Transforming Data into Unique Identifiers


By:
Mauro Börner
(Security Automation Engineering)

References

- https://www.meshsecurity.io/blog/fuzzyhashing

- https://ssdeep-project.github.io/ssdeep/index.html

GENERAL Nov 21, 2023 • 67' READ


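Hashing transforms data of any size into a fixed-length identifier, known as a hash. A hash function takes an input and returns a fixed-length string of characters, usually written as a seemingly random combination of letters and numbers. Because the output length is fixed, the identifier is unique in practice rather than in theory: a good function makes it extremely unlikely that two different inputs share the same hash.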

Here is a brief description of how hashing works:

1. Data Input:

- A data set is taken, which can be of any length or type.

2. Hash Function:

- A hashing algorithm is applied to the data. This function takes the input and produces a fixed-length string.

3. Unique Identifier:

- The resulting string is the hash, the identifier associated with the original data.
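The three steps above can be sketched with Python's standard `hashlib` module. SHA-256 is used here only as an illustration, and the helper name `sha256_hex` and the sample inputs are assumptions made for this example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Step 1: inputs of very different lengths.
short_input = b"a"
long_input = b"a" * 1_000_000

# Steps 2 and 3: the hash function maps each input to a digest
# that is always 64 hex characters (256 bits) long.
for data in (short_input, long_input):
    digest = sha256_hex(data)
    print(len(data), len(digest), digest[:16] + "...")
```

Hashing the same input again always returns the same digest, which is what makes the hash usable as an identifier.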

Important Properties:

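- It should be computationally infeasible to find two different data sets that generate the same hash ("collision resistance" property). Collisions must exist in principle, because the output has a fixed length while the input does not.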

- Minimal changes in the data should result in significant changes in the hash ("avalanche" property).
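The "avalanche" property can be observed directly by counting how many bits differ between the hashes of two nearly identical inputs. This is a minimal sketch; the helper `differing_bits` and the sample messages are made up for illustration.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    if len(a) != len(b):
        raise ValueError("digests must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = b"transfer 100 EUR to account 12345"
modified = b"transfer 900 EUR to account 12345"  # a single character changed

d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(modified).digest()

changed = differing_bits(d1, d2)
print(f"{changed} of {len(d1) * 8} bits differ")
```

For a well-behaved hash function, roughly half of the output bits are expected to change, even though only one input character differs.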

Common Usage:

- Hashing algorithms are commonly used in database indexing, file integrity verification, and cryptography, among other areas.

Common hashing algorithms include MD5, SHA-1, and SHA-256. However, MD5 and SHA-1 are vulnerable to practical collision attacks and are considered obsolete for security purposes such as digital signatures. In security applications, use a modern algorithm such as SHA-256 or another member of the SHA-2 or SHA-3 families.

What is Fuzzy Hashing?

Fuzzy hashing, unlike the hashing described above, is designed so that similar inputs produce similar hashes. It is commonly used in anti-malware software and malware analysis to detect variants of malicious or suspicious files. By breaking a file into smaller fragments and calculating the hashes of those fragments, structural similarities between files can be identified, even if the files have undergone minor modifications.
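The idea of hashing fragments can be illustrated with a deliberately simplified sketch. It is not the algorithm used by ssdeep or any other real tool: the function names, the window size, the trigger value and the Jaccard comparison are all assumptions chosen for readability. Fragment boundaries are placed where the content itself hits a trigger value, so an insertion only disturbs the fragments around it.

```python
import hashlib


def split_on_content(data: bytes, window: int = 4, modulus: int = 16) -> list[bytes]:
    """Split data wherever the sum of the last `window` bytes hits a trigger value."""
    chunks, start = [], 0
    for i in range(len(data)):
        # Recomputed at every position for clarity; a real implementation
        # would update this sum incrementally as a rolling hash.
        window_sum = sum(data[max(0, i - window + 1): i + 1])
        if window_sum % modulus == modulus - 1:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing fragment without a trigger
    return chunks


def fragment_hashes(data: bytes) -> set[str]:
    """Hash every fragment and keep a truncated digest of each."""
    return {hashlib.sha256(c).hexdigest()[:16] for c in split_on_content(data)}


def similarity(a: bytes, b: bytes) -> float:
    """Jaccard similarity of the two fragment-hash sets, from 0.0 to 1.0."""
    ha, hb = fragment_hashes(a), fragment_hashes(b)
    if not ha and not hb:
        return 1.0
    return len(ha & hb) / len(ha | hb)


# A sample "file" and a variant with a few bytes inserted in the middle.
base = "".join(f"line {i}: the quick brown fox\n" for i in range(200)).encode()
middle = len(base) // 2
modified = base[:middle] + b"INJECTED" + base[middle:]

print(f"similarity: {similarity(base, modified):.2f}")
```

A conventional hash of `base` and `modified` would simply differ, whereas the fragment-based score stays high because most fragments are untouched by the insertion.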

This capability is especially useful in detecting "polymorphic malware," a type of malware that changes its appearance, for example the bytes of its code, without changing its underlying functionality.

By using fuzzy hashing techniques, anti-malware software can identify specific malware families and variants that share common characteristics, allowing for more effective detection. Fuzzy hashing applies not only to files but also to other types of data, such as text strings. Its use is not limited to cybersecurity: it also serves document version management, duplicate detection in databases, and, in general, any scenario where similar but not identical data needs to be compared.

Phishing Scenario

In a phishing scenario, fuzzy hashing is used to compare similar files or data in order to identify fake websites or phishing pages that visually mimic legitimate sites.

1. Creation of a Fake Website:

- An attacker creates a fake website that visually mimics a legitimate site, such as that of a bank. The goal is to make the fake site look as similar as possible to the original in order to trick victims.

2. Variant Generation:

- The attacker uses techniques to create variants of the fake website. This involves making slight changes to the source code, images, text or other elements of the site to create versions that are similar, but not identical.

3. Sending Phishing Emails:

- Phishing emails are sent to a group of recipients, making them believe that they need to perform some urgent action, such as verifying their bank account. The email includes links pointing to different variants of the fake site.

4. Fuzzy Hashing for Detection:

- Security systems, such as antivirus solutions or phishing detection tools, use fuzzy hashing algorithms to calculate hashes of web pages and compare them with databases of known hashes. If variants of the fake site have enough similarities, they can be detected as possible phishing pages.

5. Prevention and Alert:

- If a web page is found to have a hash similar to one known to be malicious, preventive measures can be taken, such as blocking access to the site, alerting the user or notifying the service provider.

The use of fuzzy hashing in this context helps to detect malicious websites that have been designed to be visually similar to legitimate sites, but have undergone slight modifications to circumvent conventional detection based on static hashes.
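Steps 4 and 5 of the scenario can be sketched as a lookup against a store of known-bad signatures. Everything here is hypothetical: the names `signature`, `score`, `check_page` and `Verdict`, the threshold of 50, and the sample pages. The signature is a standard-library stand-in (hashes of overlapping word sequences), not the output of ssdeep or TLSH, and a real system would first normalize the HTML.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def signature(page_text: str, k: int = 5) -> frozenset:
    """Stand-in fuzzy signature: hashes of every run of k consecutive words."""
    words = page_text.lower().split()
    if len(words) <= k:
        shingles = [" ".join(words)]
    else:
        shingles = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return frozenset(hashlib.sha256(s.encode("utf-8")).hexdigest()[:16] for s in shingles)


def score(a: frozenset, b: frozenset) -> int:
    """Similarity from 0 to 100, based on the overlap of two signatures."""
    if not a or not b:
        return 0
    return round(100 * len(a & b) / len(a | b))


@dataclass
class Verdict:
    label: Optional[str]  # best-matching known-bad entry, if any
    score: int            # similarity to that entry
    action: str           # "block" or "allow"


def check_page(page_text: str, known_bad: dict, threshold: int = 50) -> Verdict:
    """Compare a page with every known-bad signature and pick an action."""
    sig = signature(page_text)
    best_label, best_score = None, 0
    for label, bad_sig in known_bad.items():
        current = score(sig, bad_sig)
        if current > best_score:
            best_label, best_score = label, current
    if best_score >= threshold:
        return Verdict(best_label, best_score, "block")
    return Verdict(None, best_score, "allow")


known_page = (
    "welcome to example bank online banking please sign in with your customer "
    "number and password to verify your account we have detected unusual "
    "activity and your access will be suspended unless you confirm your "
    "details within twenty four hours thank you for banking with us"
)
known_bad = {"fake-bank-login": signature(known_page)}

# A variant with two words changed, as an attacker might produce in step 2.
variant_page = known_page.replace("customer", "client").replace("suspended", "blocked")
unrelated_page = (
    "our weekly gardening newsletter covers pruning roses composting kitchen "
    "scraps and planning a vegetable bed for the spring season"
)

print(check_page(variant_page, known_bad))
print(check_page(unrelated_page, known_bad))
```

The threshold is a trade-off: a lower value catches more heavily modified variants but also raises the risk of flagging legitimate pages.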

Tools

1. ssdeep:

- ssdeep is a tool (and a library) that implements the fuzzy hashing algorithm called context triggered piecewise hashing (CTPH). It is widely used in malware detection and identification of similar files.

2. sdhash:

- sdhash is another tool that relies on fuzzy hashing to identify similarities between datasets. It can be useful in forensic analysis and duplicate detection.

3. spamsum:

- spamsum is a tool that uses fuzzy hashing techniques to identify similarities between data sets. It was originally designed for spam detection, and ssdeep's algorithm is based on it, but it can be applied in a variety of contexts.

4. TLSH (Trend Micro Locality Sensitive Hash):

- TLSH is a fuzzy hashing algorithm developed by Trend Micro. It is designed to be fast and efficient in detecting similarities between data.

5. md5deep and hashdeep:

- md5deep and hashdeep are tools that compute and verify hashes for large sets of files. They are not fuzzy hashing tools: they produce conventional hashes, which identify exact matches against a known set rather than similar files.
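To make the contrast with fuzzy hashing concrete, the exact-match workflow that the last entry describes can be approximated in a few lines. This is only a conceptual sketch using Python's standard library: `hash_tree` and `audit` are hypothetical names, and the code does not reproduce the command-line interface or output format of md5deep or hashdeep.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            result[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return result


def audit(known: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare a current set of hashes against a previously recorded one."""
    return {
        "changed": sorted(p for p in known if p in current and known[p] != current[p]),
        "missing": sorted(p for p in known if p not in current),
        "new": sorted(p for p in current if p not in known),
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "docs" / "a.txt").write_text("alpha")
    (root / "b.txt").write_text("beta")
    known = hash_tree(root)                 # record the baseline

    (root / "b.txt").write_text("beta!")    # modify one file
    (root / "docs" / "a.txt").unlink()      # delete one file
    (root / "c.txt").write_text("gamma")    # add one file
    report = audit(known, hash_tree(root))

print(report)
```

Note that a one-character change is reported only as "changed": an exact hash says nothing about how similar the old and new contents are, which is the gap fuzzy hashing fills.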

Conclusion

Fuzzy hashing is an essential tool in threat detection and data management, since it identifies similarities between sets of information even after minor modifications. Tools such as ssdeep, sdhash and TLSH make malware detection, version management and forensic analysis more efficient. Useful as they are, these tools are only one part of a comprehensive approach to security, which also includes awareness, education and constant updating of protection measures.
