Web Page URL:
http://www.mcluhan.utoronto.ca

Most web crawlers would see this page as being about
'Program', 'McLuhan', 'Culture', 'House', 'Coach' and 'Technology'.
This is how a web crawler would make that determination:


The web crawler visits the web page. Some web crawlers do not follow redirected URLs, and most do not read pages built with frames or JavaScript.

Our web crawler has just visited your web site.

The web crawler extracts information about the web page (its metadata). The HTML markup is then removed so that the crawler can evaluate just the text on the page; malformed HTML can confuse the crawler.
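A minimal sketch of this step, using Python's standard-library `html.parser`. The class name `PageTextExtractor`, the `extract` helper, and the choice to skip the contents of `<script>` and `<style>` are illustrative assumptions, not a description of any particular crawler.

```python
from html.parser import HTMLParser


class PageTextExtractor(HTMLParser):
    """Hypothetical helper: collects <meta name=... content=...> tags, the <title>, and visible text."""

    SKIP_TAGS = {"script", "style"}  # their contents are code, not page text

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}
        self.title = ""
        self.text_parts: list[str] = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name, content = attributes.get("name"), attributes.get("content")
        if tag == "meta" and name and content is not None:
            # e.g. <meta name="description" content="...">
            self.meta[name.lower()] = content
        elif tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside <script> or <style>
        if self._in_title:
            self.title += data
        else:
            self.text_parts.append(data)

    def body_text(self) -> str:
        # Collapse the runs of whitespace left behind once the markup is gone.
        return " ".join(" ".join(self.text_parts).split())


def extract(html: str) -> tuple[dict[str, str], str, str]:
    """Return (metadata, title, plain body text) for an HTML string."""
    parser = PageTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.meta, parser.title.strip(), parser.body_text()
```

Because `html.parser` is lenient, this sketch tolerates some broken markup, but an unclosed `<script>` or `<title>` tag would still swallow the rest of the page text, which is one way improper HTML can confuse a crawler.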

 


The page's description has already been saved, but to categorize the page, the crawler looks at the page text and filters out unimportant "noise" words such as "the", "we", and "are".
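Noise-word filtering (often called stop-word removal) can be sketched as below. The `NOISE_WORDS` set is a hypothetical, deliberately short list built around the examples above; it is not the crawler's actual list, and the tokenizing regular expression is likewise an assumption.

```python
import re

# Hypothetical, deliberately tiny noise-word list; real lists are much longer.
NOISE_WORDS = frozenset({"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "we"})

# A word is a run of letters/digits, optionally with one internal apostrophe ("page's").
WORD_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words."""
    return WORD_RE.findall(text.lower())


def filter_noise(text: str) -> list[str]:
    """Keep only the words that are not in the noise-word list."""
    return [word for word in tokenize(text) if word not in NOISE_WORDS]
```

For example, `filter_noise("We are the McLuhan Program in Culture and Technology")` keeps only `mcluhan`, `program`, `culture`, and `technology`.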


The words found on your page are counted, and the share of the total word count taken by each word (its word density) is calculated. This provides the best indicator of what your page is about.
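A minimal sketch of the word density calculation, assuming density is measured as each word's percentage of all words that remain after noise-word filtering. The helper names `word_density` and `top_terms`, and the default cutoff of six terms, are illustrative choices rather than the crawler's documented behavior.

```python
from collections import Counter


def word_density(words: list[str]) -> dict[str, float]:
    """Percentage of the total word count taken by each distinct word."""
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    return {word: 100.0 * n / total for word, n in counts.items()}


def top_terms(words: list[str], k: int = 6) -> list[str]:
    """The k densest words; ties are broken alphabetically so the order is stable."""
    density = word_density(words)
    return sorted(density, key=lambda w: (-density[w], w))[:k]
```

For instance, a word that appears 3 times among 8 filtered words has a density of 3 / 8 = 37.5%.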

 


Next, the crawler collects all the hyperlinks on your page and adds them to its list of pages to visit.
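Link extraction and the list of pages still to visit (often called the crawl frontier) might look like the sketch below. `LinkCollector` and `enqueue_links` are hypothetical names; resolving relative links against the page URL, dropping `#fragment` parts, keeping only http(s) links, and skipping URLs already seen are all assumptions about how a crawler would manage its list.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers absolute http(s) URLs from <a href=...> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links, then drop the #fragment (same page).
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute.startswith(("http://", "https://")):
                    self.links.append(absolute)


def enqueue_links(html: str, base_url: str, frontier: deque, seen: set) -> None:
    """Append every not-yet-seen link on the page to the frontier (FIFO order)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    for link in collector.links:
        if link not in seen:
            seen.add(link)
            frontier.append(link)
```

Using a `deque` as a first-in, first-out queue makes the crawl breadth-first, and the `seen` set keeps the crawler from visiting the same URL twice.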
