OWASP File Hash Repository
What is FHR?
Simply put, FHR is a repository of file hashes. But the idea is to go beyond just keeping a list of hashes: I want the repository to indicate when the file in question is (part of) malware or when it is recognized as benign. Thus, anyone could look up the hash of a file to see whether it corresponds to a known malware file or a known good file.
Aren't there already other sources for this information?
Yes, and one of the ideas of the project is to aggregate and leverage information from already existing sources. For example, NIST has the NSRL, which provides hashes of known benign files. The problem is that NIST provides this information as a text file whose download is over 1 GB in size. Other known sources are Team Cymru's MHR, the SANS Institute's hash database and VirusTotal. In addition to aggregating this information, one of the main goals of FHR is to allow free access to its database.
Isn't free access to a database that contains malware dangerous?
Yes, it's dangerous, but the project repository will not contain malware. The repository will only have the hashes of malware, which poses no danger.
Detecting malware using only hashes is not a good strategy.
Certainly, and the project is not intended to replace current antivirus scanners. However, creating hashes is easier and more efficient than creating generic virus detection algorithms, and hash lookup is already being used as a complement to traditional antivirus products. Several commercial products use cloud computing as part of their strategies. Unfortunately, the producers of these technologies do not allow queries to their hash databases. With FHR, the goal is to create a freely available database that everyone can use.
Will the FHR be integrated into antivirus systems?
We intend to develop clients for the FHR database that can scan workstations and query the database to try to identify malware. These clients will be created as a proof of concept and will be open source. It would be great if some antivirus vendors started supporting FHR, but only time will tell.
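As an illustration of the scanning half of such a client, here is a minimal sketch in Python. It is not part of the FHR codebase: the function names `sha1_of_file` and `hash_tree` are hypothetical, and the choice of SHA-1 with uppercase hex output is an assumption made to match the hash formats mentioned elsewhere on this page.

```python
import hashlib
import os
from typing import Iterator, Tuple


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-1 of a file as uppercase hex, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def hash_tree(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (path, sha1) for every readable regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                yield path, sha1_of_file(path)
            except OSError:
                # Skip files that vanish or cannot be read (hypothetical policy).
                continue
```

Each hash produced this way could then be submitted to whichever FHR query interface the client uses.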
Technically, how does the FHR work?
As expected, the core of the system is its database of hashes. Today this database runs on MySQL. Around this database, we can develop several query interfaces. Some candidate protocols for querying the FHR database are:
- web services
The current codebase includes a DNS-based query interface.
What data are available in the database?
We currently have a little more than 20 million files in the database. These come mainly from the NSRL; we also included several PE files from Windows Vista and other common software. For each registered file, we have the following information:
- date when the system saw the hash / file for the first time (not available for the files from NIST)
- status (GOOD, MALWARE, UNKNOWN, SUSPICIOUS)
- certainty (a percentage that indicates the degree of certainty about the status of the file).
Testing the system
To query the FHR, add the suffix .hash.sapao.net to the MD5 or SHA-1 hash (in hex format) of the file. For example:
dig TXT 84C0C5914FF0B825141BA2C6A9E3D6F4.hash.sapao.net
will query the database for the hash 84C0C5914FF0B825141BA2C6A9E3D6F4.
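The same lookup can be sketched in Python. This is only an illustration under stated assumptions: the helper names `fhr_query_name` and `query_fhr` are hypothetical, the code simply shells out to `dig` (which must be installed), and the raw TXT answer is returned unparsed because the response format is not described here.

```python
import re
import subprocess

FHR_SUFFIX = "hash.sapao.net"  # suffix given in the text above


def fhr_query_name(hex_hash: str) -> str:
    """Build the DNS name for an MD5 (32 hex chars) or SHA-1 (40 hex chars) hash."""
    cleaned = hex_hash.strip().upper()
    if not re.fullmatch(r"(?:[0-9A-F]{32}|[0-9A-F]{40})", cleaned):
        raise ValueError("expected an MD5 or SHA-1 hash in hex format")
    return f"{cleaned}.{FHR_SUFFIX}"


def query_fhr(hex_hash: str, timeout: float = 10.0) -> str:
    """Run `dig +short TXT <name>` and return its raw output (may be empty)."""
    result = subprocess.run(
        ["dig", "+short", "TXT", fhr_query_name(hex_hash)],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    return result.stdout.strip()
```

For example, `fhr_query_name("84C0C5914FF0B825141BA2C6A9E3D6F4")` yields the name used in the `dig` command above, and `query_fhr` with the same argument performs the query.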
The FHR database contains a single table, called File, described below:
mysql> show columns from File in FHR;
+-----------+------------+------+-----+---------+----------------+
| Field     | Type       | Null | Key | Default | Extra          |
+-----------+------------+------+-----+---------+----------------+
| idFile    | int(11)    | NO   | PRI | NULL    | auto_increment |
| SHA1      | char(40)   | YES  | MUL | NULL    |                |
| MD5       | char(32)   | YES  | MUL | NULL    |                |
| size      | mediumtext | YES  |     | NULL    |                |
| source    | char(10)   | YES  |     | NULL    |                |
| date      | date       | YES  |     | NULL    |                |
| status    | char(10)   | NO   |     | NULL    |                |
| certainty | float      | YES  |     | NULL    |                |
+-----------+------------+------+-----+---------+----------------+
8 rows in set (0.00 sec)
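To make the table layout concrete, the sketch below mirrors the File table in an in-memory SQLite database and looks up a hash by its length. Several things here are assumptions for illustration only: SQLite stands in for the MySQL server, the index names and the `lookup` helper are hypothetical, and hashes are assumed to be stored as uppercase hex.

```python
import sqlite3
from typing import Optional, Tuple

# SQLite approximation of the MySQL File table shown above.
SCHEMA = """
CREATE TABLE File (
    idFile    INTEGER PRIMARY KEY AUTOINCREMENT,
    SHA1      CHAR(40),
    MD5       CHAR(32),
    size      TEXT,
    source    CHAR(10),
    date      DATE,
    status    CHAR(10) NOT NULL,
    certainty FLOAT
);
CREATE INDEX idx_file_sha1 ON File (SHA1);
CREATE INDEX idx_file_md5 ON File (MD5);
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an empty in-memory database with the File table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def lookup(conn: sqlite3.Connection, hex_hash: str) -> Optional[Tuple[str, float]]:
    """Return (status, certainty) for an MD5 or SHA-1 hash, or None if absent."""
    column = {32: "MD5", 40: "SHA1"}.get(len(hex_hash))
    if column is None:
        raise ValueError("expected a 32- or 40-character hex hash")
    # The column name comes from the fixed mapping above, never from user input.
    query = f"SELECT status, certainty FROM File WHERE {column} = ?"
    return conn.execute(query, (hex_hash.upper(),)).fetchone()
```

A query interface such as the DNS one would presumably perform a similar indexed lookup on the MD5 or SHA1 column and report the status and certainty it finds.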