This site is the archived OWASP Foundation Wiki and is no longer accepting Account Requests.
To view the new OWASP Foundation website, please visit https://owasp.org

Difference between revisions of "Category:OWASP Favicon Database Project"

(Initial project filling)
(Added headings)

Revision as of 18:49, 26 August 2009

Main

The idea is to enumerate software via favicon.ico: take the MD5 hash of a server's favicon.ico and compare it against a database of known fingerprints. This article describes how to build that MD5 database from the most frequently seen favicon.ico files.
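The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the project's script: the function names, the sample icon bytes and the "ExampleCMS" label are all hypothetical, and the database here is just an in-memory dictionary.

```python
import hashlib


def favicon_md5(data: bytes) -> str:
    """Return the hex MD5 digest of raw favicon.ico bytes."""
    return hashlib.md5(data).hexdigest()


def identify(data: bytes, database: dict):
    """Look up a favicon in a {md5: software name} mapping; None if unknown."""
    return database.get(favicon_md5(data))


# Hypothetical stand-in for a real favicon.ico and for a real fingerprint entry.
sample_icon = b"placeholder icon bytes"
database = {favicon_md5(sample_icon): "ExampleCMS (hypothetical)"}

print(identify(sample_icon, database))        # the known icon is matched
print(identify(b"some other icon", database)) # an unknown icon yields None
```

A real database would map digests gathered by the scans described below to product names.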

I wrote an .nse script for nmap that enumerates software via favicon.ico. I noticed that the existing databases of favicon.ico MD5 fingerprints are very small, and that most current MD5 fingerprinting implementations cover only web servers, so I also added some popular CMSs, wikis, etc. I added some of them manually, but that is a tedious process. Fyodor suggested doing an Internet-wide scan (nmap -iR 0) to gather statistics and MD5 fingerprints of the most common favicon.ico files and to document them.

So I set out to collect statistics on the MD5 fingerprints of the most common favicon.ico files. The first problem was how to enumerate HTTP(S) hosts on the Internet. I identified two types of HTTP servers that I want to cover: HTTP servers on network devices and appliances, and ordinary web servers with virtual host support.

The first type is more or less straightforward to cover (with nmap -p80,443 -iR). The second type is problematic, because there is no direct way to learn all the virtual hosts served from a given IP address, so scanning random IPs with nmap -iR does not reach them. (Services such as mynetworkneighborhood or live.msn.com do not solve this; they only know what they have crawled.)

One idea was to extract all links from Wikipedia. In my view, Wikipedia is not a good source: it has started removing http:// references from ordinary articles and keeps only the top 5 (or some other number of) links in each article's External references section. After a little research I found that the best source would be the Open Directory Project (DMOZ). Conveniently, DMOZ publishes its whole directory as XML files, in a format that is easy to process.

Note that I did not want to gather only from DMOZ or only via nmap -iR. With DMOZ alone, I would miss favicons from network devices and appliances, as they are usually not listed in DMOZ. With nmap -iR alone, I would miss virtual hosts, as there is no easy way to enumerate all virtual hosts behind a specific IP. So I am doing both, to cover as many cases as possible.

Solution of gathering via nmap -iR

This solution gathers all favicons from port 80, as an example.

Gather the data using a modified version of favicon.nse:

    nmap -v -sT -iR 0 -p80 -n -PN --script=http-favicon-get.nse -oN nmap-p80-ir-favicon

Extract only the MD5 and IP from the nmap output:

    grep -i "http-favicon.*Unknown" nmap-p80-ir-favicon | awk -F':' '{print $4,",",$2; }' > content-p80.md5.url

Display a sorted list of the most frequent MD5s, each with the last IP seen:

    ./get-favicon-md5-count.pl < content-p80.md5.url | sort -r -n | less
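The source of get-favicon-md5-count.pl is not shown here, so the following is only a guess at the counting step, assuming input lines of the form "MD5 , source" as produced by the awk command above. The function name and the sample values (placeholder digests, documentation-range IPs) are hypothetical.

```python
def count_md5(lines):
    """Count occurrences of each MD5 and remember the last source seen for it.

    Each input line is expected to look like "MD5 , source" (IP or URL).
    Returns (count, md5, last_source) tuples, most frequent first.
    """
    counts = {}
    last_seen = {}
    for line in lines:
        if "," not in line:
            continue  # skip blank or malformed lines
        md5, source = (part.strip() for part in line.split(",", 1))
        if not md5:
            continue
        counts[md5] = counts.get(md5, 0) + 1
        last_seen[md5] = source
    return sorted(((counts[m], m, last_seen[m]) for m in counts), reverse=True)


# Placeholder digests and documentation-range addresses, not real scan results.
sample = ["aaaa , 192.0.2.1", "bbbb , 192.0.2.2", "aaaa , 192.0.2.3"]
for count, md5, source in count_md5(sample):
    print(count, md5, source)
```

Unlike the pipeline above, this sketch sorts its own output, so no separate sort -r -n step would be needed.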

...and that's it. But if you're brave enough, you can do it all at once. Note that a finite number must then be given to -iR, because sort produces no output until nmap has finished:

    nmap -v -sT -iR 100000 -p80 -n -PN --script=http-favicon-get.nse | grep -i "http-favicon.*Unknown" | awk -F':' '{print $4,",",$2; }' | ./get-favicon-md5-count.pl | sort -r -n | less

Solution of gathering via DMOZ

Grab the XML file from DMOZ, extract the URLs, make them unique and store them in content.url (one URL per line):

    wget -o /dev/null -O - http://rdf.dmoz.org/rdf/content.rdf.u8.gz | gunzip -dc | ./dmoz-extract-urls.pl -b -f - | sort | uniq > content.url
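As a rough illustration of the URL extraction step: the sketch below assumes that each listed site appears in the dump as an ExternalPage element whose about attribute holds the URL. That assumption, the function name and the sample lines are mine; the behaviour of dmoz-extract-urls.pl (including its -b option) is not documented here and may differ.

```python
import html
import re

# Assumed dump layout: <ExternalPage about="URL"> ... </ExternalPage>
EXTERNAL_PAGE = re.compile(r'<ExternalPage\s+about="([^"]+)"')


def extract_urls(lines):
    """Return the sorted, unique URLs found in ExternalPage elements."""
    urls = set()
    for line in lines:
        for match in EXTERNAL_PAGE.finditer(line):
            # Attribute values are XML-escaped (e.g. &amp;), so unescape them.
            urls.add(html.unescape(match.group(1)))
    return sorted(urls)


# Hypothetical sample input using reserved example domains.
sample_rdf = [
    '<ExternalPage about="http://example.com/?a=1&amp;b=2">',
    '  <d:Title>Example</d:Title>',
    '</ExternalPage>',
    '<ExternalPage about="http://example.com/?a=1&amp;b=2">',
    '<ExternalPage about="http://example.org/">',
]
for url in extract_urls(sample_rdf):
    print(url)
```

Reading line by line keeps memory use low for the large dump, although the set of unique URLs is still held in memory.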

For each URL, get the MD5 of its favicon (if found) and write it to content.md5.url:

    ./get-favicon-md5.rb < content.url > content.md5.url

Note that the Perl equivalent of this script did not work because of broken threads in Perl (even in Perl 5.10), and I badly need threads here for performance.
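The threaded fetching step might look roughly like the sketch below. It is not a port of get-favicon-md5.rb, whose source is not shown; all function names, the worker count and the timeout are assumptions, and only /favicon.ico at the server root is requested.

```python
import hashlib
import http.client
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit
from urllib.request import urlopen


def favicon_url(site_url):
    """Map any page URL to /favicon.ico at the root of the same server."""
    parts = urlsplit(site_url)
    return f"{parts.scheme}://{parts.netloc}/favicon.ico"


def fetch_bytes(url, timeout=10):
    """Download a URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def favicon_md5(site_url, fetch=fetch_bytes):
    """Return the MD5 of the site's root favicon, or None if it cannot be fetched."""
    try:
        data = fetch(favicon_url(site_url))
    except (OSError, ValueError, http.client.HTTPException):
        return None  # unreachable host, HTTP error, bad URL, broken response
    return hashlib.md5(data).hexdigest() if data else None


def scan(urls, fetch=fetch_bytes, workers=16):
    """Fetch favicons concurrently; return (md5, url) pairs for the hits."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = list(pool.map(lambda u: favicon_md5(u, fetch), urls))
    return [(digest, url) for digest, url in zip(digests, urls) if digest]
```

Calling scan() on the lines of content.url and printing each pair as "md5 , url" would give output in the same shape as content.md5.url. The fetch parameter exists only so the network call can be replaced, for example in tests.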

Display a sorted list of the most frequent MD5s, each with the last URL seen:

    ./get-favicon-md5-count.pl < content.md5.url | sort -r -n | less

...and that's it. But if you're brave enough, you can do it all at once:

    wget -o /dev/null -O - http://rdf.dmoz.org/rdf/content.rdf.u8.gz | gunzip -dc | ./dmoz-extract-urls.pl -b -f - | sort | uniq | ./get-favicon-md5.rb | ./get-favicon-md5-count.pl | sort -r -n | less

Notes

I have limited the gathering scripts to fetching only favicon.ico from the server root (i.e. /favicon.ico). The scripts do not parse the HTML to find the favicon's location. The reason is simplicity.

Project Identification

PROJECT INFO
What does this OWASP project offer you?
RELEASE(S) INFO
What does this OWASP project release offer you?
what is this project?
OWASP Favicon Database Project

Purpose: Software enumeration via favicon.ico

License: N/A

who is working on this project?
Project Leader: Vlatko Kosturjak @

Project Maintainer: Vlatko Kosturjak @

Project Contributor(s):

  • Fyodor
  • Brandon Enright
  • Kris Katterjohn
how can you learn more?
Project Pamphlet: N/A

3x slide Project Presentation: N/A

Mailing list: Subscribe or read the archives

Project Roadmap: N/A

Main links:

Project Health: Not Reviewed (Provisional)
To be reviewed under Assessment Criteria v2.0

Key Contacts
  • Contact Vlatko Kosturjak @ to contribute, review or sponsor this project
  • Contact the GPC to report a problem or concern about this project or to update information.
current release
First Release - Unknown Date - (no download available)

Release Leader: Vlatko Kosturjak @

Release details: N/A :

Rating: Not Reviewed
To be reviewed under Assessment Criteria v2.0


Project's License:
