
Category:OWASP Favicon Database Project

Revision as of 19:50, 26 August 2009

Main

The idea is to enumerate software via favicon.ico: take the MD5 hash of a server's favicon.ico and compare it against a database of known hashes. This article describes how to build an MD5 database of the most popular (most frequently seen) favicon.ico files.
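The lookup itself is simple. Below is a minimal Python sketch of the idea, not the project's implementation: KNOWN_FAVICONS is a hypothetical fingerprint table whose single entry is a placeholder, and the function names are illustrative. Hashes that are not in the table come back as "Unknown"; those are exactly the hashes the gathering procedures on this page are meant to collect and document.

```python
import hashlib

# Hypothetical fingerprint table: MD5 hex digest of favicon.ico -> software name.
# The entry below is a placeholder, not a real fingerprint.
KNOWN_FAVICONS = {
    hashlib.md5(b"example-favicon-bytes").hexdigest(): "Example CMS (placeholder)",
}


def favicon_md5(data):
    """Return the MD5 fingerprint (lowercase hex) of raw favicon.ico bytes."""
    return hashlib.md5(data).hexdigest()


def identify(data, known=KNOWN_FAVICONS):
    """Look the favicon's MD5 up in the fingerprint table."""
    return known.get(favicon_md5(data), "Unknown")
```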

I wrote an .nse script for Nmap that enumerates software via favicon.ico. I noticed that the existing database of favicon.ico MD5 fingerprints is very small and that most current MD5 fingerprinting implementations only identify web servers, so I also added some popular CMSs, wikis, etc. I added some of them manually, but that is a tedious process. Fyodor suggested that we do an Internet-wide scan (nmap -iR 0), gather statistics and MD5 fingerprints of the most common favicon.ico files, and document them.

So I set out to gather statistics on the MD5 fingerprints of the most common favicon.ico files. The first problem was how to enumerate HTTP(S) hosts on the Internet. I identified two types of HTTP servers that I wanted to cover: HTTP servers on network devices and appliances, and ordinary web servers with virtual host support.

The first type is fairly straightforward to cover (with nmap -p80,443 -iR). The second type is problematic: a scan by IP address only reaches each server's default site, and there is no straightforward way of learning all the virtual hosts behind an IP address (ignore services like mynetworkneighborhood or live.msn.com; they only know what they have crawled). So nmap -iR alone cannot cover them.

One idea was to extract all links from Wikipedia. In my view, Wikipedia is not a good source: editors have started removing http:// references from ordinary articles, keeping only the top five (or some similar number) links in the External links section. After a little research I found that the best source would be the Open Directory Project (DMOZ). Conveniently, DMOZ publishes XML files of its whole directory, in a format that is easy to process.

Note that I did not want to rely only on DMOZ gathering or only on nmap -iR gathering. With DMOZ alone, I would miss favicons from network devices and appliances, as these are usually not listed in DMOZ. With nmap -iR alone, I would miss virtual hosts, as there is no easy way of enumerating all virtual hosts behind a specific IP. So I do both, to cover as many cases as possible.

Solution of gathering via nmap -iR

As an example, this solution gathers favicons from port 80.

Gather the data using a modified version of favicon.nse:

nmap -v -sT -iR 0 -p80 -n -PN --script=http-favicon-get.nse -oN nmap-p80-ir-favicon 

Extract only the MD5 and IP from the nmap output:

grep -i "http-favicon.*Unknown" nmap-p80-ir-favicon | awk -F':' '{print $4,",",$2; }' > content-p80.md5.url 
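The awk step splits each matching line on colons and prints the fourth field, a comma, and the second field, separated by single spaces (awk's default output field separator). The following Python sketch reproduces exactly that filtering and field selection, to make the output format explicit; it assumes nothing about the modified script's output beyond what the grep and awk commands themselves imply. The function name extract is illustrative.

```python
import re

# Same filter as: grep -i "http-favicon.*Unknown"
PATTERN = re.compile(r"http-favicon.*Unknown", re.IGNORECASE)


def extract(lines):
    """Mimic: grep -i "http-favicon.*Unknown" | awk -F':' '{print $4,",",$2; }'

    awk splits each line on every ':' without trimming spaces, and print with
    commas joins its arguments with a single space. Missing fields become
    empty strings, as in awk.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not PATTERN.search(line):
            continue
        fields = line.split(":")
        # awk's $2 and $4 are Python indices 1 and 3.
        f2 = fields[1] if len(fields) > 1 else ""
        f4 = fields[3] if len(fields) > 3 else ""
        yield " ".join([f4, ",", f2])
```

Because awk does not trim fields, each line of content-p80.md5.url keeps the spaces around the MD5 and the IP, so whatever reads that file should strip whitespace.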

Display a sorted list of the most frequent MD5s, each with the last IP seen:

./get-favicon-md5-count.pl < content-p80.md5.url | sort -r -n | less 
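The source of get-favicon-md5-count.pl is not shown here. Judging from its description and the sort -r -n that follows it, it counts how often each MD5 occurs and prints the count first, together with the last IP or URL seen for that MD5. A Python sketch under those assumptions, reading one "md5,url" pair per line (the input format is also an assumption):

```python
import sys
from collections import Counter


def count_md5(lines):
    """Count how often each MD5 occurs and remember the last URL/IP seen with it.

    Assumed input: one "md5,url" pair per line; whitespace around either part
    (as produced by the awk step) is stripped. Lines without a comma are skipped.
    """
    counts = Counter()
    last_seen = {}
    for line in lines:
        md5, sep, where = line.partition(",")
        if not sep:
            continue
        md5, where = md5.strip(), where.strip()
        if not md5:
            continue
        counts[md5] += 1
        last_seen[md5] = where
    return counts, last_seen


def main():
    counts, last_seen = count_md5(sys.stdin)
    # Unsorted "count md5 last" lines; ordering is left to `sort -r -n`.
    for md5, n in counts.items():
        print(n, md5, last_seen[md5])


if __name__ == "__main__":
    main()
```

Used as, for example, python3 favicon_md5_count.py < content-p80.md5.url | sort -r -n | less (favicon_md5_count.py is a hypothetical file name). Printing the count as the first field is what lets sort -r -n put the most frequent favicons at the top.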

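...and that's it. If you're brave enough, you can do it all at once. In that case you must give -iR a finite number of targets: sort only prints its output after reading all of its input, so a never-ending -iR 0 scan would never display anything: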

nmap -v -sT -iR 100000 -p80 -n -PN --script=http-favicon-get.nse | grep -i "http-favicon.*Unknown" | awk -F':' '{print $4,",",$2; }' | ./get-favicon-md5-count.pl | sort -r -n | less 

Solution of gathering via DMOZ

Grab the XML file from DMOZ, extract the URLs, make them unique, and store them in content.url (one URL per line):

wget -o /dev/null -O - http://rdf.dmoz.org/rdf/content.rdf.u8.gz | gunzip -dc | ./dmoz-extract-urls.pl -b -f - | sort | uniq > content.url 
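dmoz-extract-urls.pl is not shown either, and the meaning of its -b option is not documented here. The sketch below is a hypothetical Python stand-in that assumes each listed site appears in content.rdf.u8 as an <ExternalPage about="URL"> element. Because the gathering scripts only fetch /favicon.ico from the server root, it also reduces every URL to scheme://host/ so that sort | uniq collapses pages from the same site; whether the original script does this is an assumption.

```python
import re
import sys
from urllib.parse import urlsplit

# Assumption: each listed site appears as <ExternalPage about="URL"> in the RDF dump.
EXTERNAL_PAGE = re.compile(r'<ExternalPage\s+about="([^"]+)"')


def base_url(url):
    """Reduce a URL to scheme://host[:port]/ (favicons are fetched from the root)."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    return f"{parts.scheme}://{parts.netloc}/"


def extract_urls(lines):
    """Yield the base URL of every ExternalPage entry, line by line (streaming)."""
    for line in lines:
        for match in EXTERNAL_PAGE.finditer(line):
            url = base_url(match.group(1))
            if url:
                yield url


if __name__ == "__main__":
    for url in extract_urls(sys.stdin):
        print(url)
```

It would slot into the same pipeline, e.g. ... | gunzip -dc | python3 dmoz_extract_urls.py | sort | uniq > content.url, where dmoz_extract_urls.py is a hypothetical file name. Reading line by line keeps memory use flat even though the uncompressed dump is large.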

For each URL, get the MD5 of its favicon (if found) and write it to content.md5.url:

./get-favicon-md5.rb < content.url > content.md5.url 

Note that a Perl equivalent of this script did not work because of broken threads in Perl (even in Perl 5.10), and this script badly needs threads for performance.
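A rough Python counterpart of get-favicon-md5.rb, written as a sketch rather than a port of the Ruby script: for each base URL it fetches /favicon.ico from the server root and prints "md5,url" for every favicon found (the output format is an assumption). A thread pool plays the role that threads play in the Ruby script: almost all of the time is spent waiting on the network, so many downloads are kept in flight at once. The worker count of 32 and the 10-second timeout are arbitrary choices.

```python
import hashlib
import sys
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urljoin


def favicon_url(url):
    """Favicons are fetched only from the server root, i.e. /favicon.ico."""
    return urljoin(url, "/favicon.ico")


def fetch(url, timeout=10):
    """Return the response body, or None on any network/HTTP error (e.g. 404)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except Exception:
        return None


def md5_line(url, fetcher=fetch):
    """Return "md5,url" for a site whose favicon was found, else None."""
    data = fetcher(favicon_url(url))
    if not data:
        return None
    return f"{hashlib.md5(data).hexdigest()},{url}"


def main(workers=32):
    urls = [line.strip() for line in sys.stdin if line.strip()]
    # Threads overlap the network waits, which dominate the run time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for line in pool.map(md5_line, urls):
            if line:
                print(line)


if __name__ == "__main__":
    main()
```

The fetcher parameter exists only so the hashing logic can be exercised without network access; in normal use the default urllib-based fetch is used.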

Display a sorted list of the most frequent MD5s, each with the last URL seen:

./get-favicon-md5-count.pl < content.md5.url | sort -r -n | less 

...and that's it. But if you're brave enough, you can do it all at once:

wget -o /dev/null -O - http://rdf.dmoz.org/rdf/content.rdf.u8.gz | gunzip -dc | ./dmoz-extract-urls.pl -b -f - | sort | uniq | ./get-favicon-md5.rb | ./get-favicon-md5-count.pl | sort -r -n | less 

Notes

I have limited the gathering scripts to fetching only favicon.ico from the server root (i.e. /favicon.ico), so they do not parse HTML directives such as <link rel="icon"> to find the favicon's location. The reason is simplicity.

We hope you find the information in the OWASP Favicon Database project useful. Please contribute back to the project by sending your comments, questions, and suggestions to the OWASP Favicon mailing list. Thanks!

To join the OWASP Favicon Database mailing list or view the archives, please visit the subscription page: https://lists.owasp.org/mailman/listinfo/owasp-favicon-database

Project Identification

what is this project?
OWASP Favicon Database Project

Purpose: Software enumeration via favicon.ico

License: N/A

who is working on this project?
Project Leader: Vlatko Kosturjak @

Project Maintainer: Vlatko Kosturjak @

Project Contributor(s):

  • Fyodor
  • Brandon Enright
  • Kris Katterjohn
how can you learn more?
Project Pamphlet: N/A

3x slide Project Presentation: N/A

Mailing list: Subscribe or read the archives

Project Roadmap: N/A

Main links:

Project Health: Not Reviewed (Provisional)
To be reviewed under Assessment Criteria v2.0

Key Contacts
  • Contact Vlatko Kosturjak @ to contribute, review or sponsor this project
  • Contact the GPC to report a problem or concern about this project or to update information.
current release
First Release - Unknown Date - (no download available)

Release Leader: Vlatko Kosturjak @

Release details: N/A

Rating: Not Reviewed
To be reviewed under Assessment Criteria v2.0


Project's License:
