Web Page URL:
http://www.registrar.ualberta.ca/awards/
|
Most web crawlers would see this page as being about |
| The web page is visited by the web crawler. Some web crawlers do not follow redirected URLs. Most crawlers do not read frame-based pages or JavaScript. |
Our web crawler has just visited your web site.
|
The web crawler extracts the information about the web page (meta-data). HTML is removed so that the crawler can evaluate just the text on the page. Improper HTML coding can confuse the crawler. |
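The extraction step can be sketched with Python's standard `html.parser` module. This is only an illustration of the idea, not the code this crawler actually runs: the class name `TextAndMetaExtractor`, the helper `extract`, and the choice to skip `<script>` and `<style>` content are all assumptions.

```python
from html.parser import HTMLParser


class TextAndMetaExtractor(HTMLParser):
    """Collects the <title>, <meta name=...> values, and visible page text."""

    SKIPPED_TAGS = {"script", "style"}  # assumption: their content is not page text

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.text_parts = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            # Store meta-data such as the page description and keywords.
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style content
        if self._in_title:
            self.title += data
        elif data.strip():
            self.text_parts.append(data.strip())


def extract(html):
    """Return (title, meta dict, plain text) for one HTML document."""
    parser = TextAndMetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.meta, " ".join(parser.text_parts)
```

Calling `extract(page_source)` on a page returns the title, a dictionary of meta tags, and the text with all markup removed. Badly nested or unclosed tags can still leave stray text in the result, which is one way improper HTML confuses a crawler.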
| The Description of the page has already been saved, but to categorize the web page, the crawler looks at the page text and filters out the unimportant "noise" words like "the", "we", and "are". You may view the common noise words. |
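A minimal sketch of noise-word filtering follows. The `NOISE_WORDS` set here is a short hypothetical list made up for illustration (only "the", "we", and "are" come from the text above), and the function name `remove_noise_words` is likewise an assumption.

```python
import re

# Hypothetical noise-word list; a real crawler would use a much longer one.
NOISE_WORDS = {"the", "we", "are", "a", "an", "and", "of", "to", "in", "is"}


def remove_noise_words(text, noise_words=NOISE_WORDS):
    """Lowercase the text, split it into words, and drop the noise words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [word for word in words if word not in noise_words]
```

For example, `remove_noise_words("We are the Office of the Registrar")` keeps only `office` and `registrar`.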
| The words found on your web page are counted, and the ratio of each word's use on your page is checked (called word density). This provides the best indicator of what your page is about. |
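Word density can be sketched as each word's count divided by the total number of words counted. The exact formula a given crawler uses is not stated here, so this is an assumption, as are the function name `word_density`, the tiny default noise list, and the choice to leave noise words out of the total.

```python
import re
from collections import Counter


def word_density(text, noise_words=frozenset({"the", "we", "are"}), top=5):
    """Return the `top` most frequent words with their share of all counted words."""
    words = [
        word
        for word in re.findall(r"[a-z0-9']+", text.lower())
        if word not in noise_words
    ]
    total = len(words)
    if total == 0:
        return []  # nothing left to measure
    counts = Counter(words)
    # Density = occurrences of the word / total counted words (assumed definition).
    return [(word, count / total) for word, count in counts.most_common(top)]
```

The words with the highest density are the ones a crawler of this kind would treat as the subject of the page.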
|
| Now the crawler finds all the hyperlinks on your page and adds them to the list of pages to visit. |
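The link-gathering step might look like the sketch below, which uses only the Python standard library. The names `LinkCollector` and `enqueue_links`, the use of a `deque` as the list of pages to visit, and the decision to keep only `http` and `https` links are assumptions for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Records the href of every <a> tag on the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def enqueue_links(html, base_url, frontier, seen):
    """Add each new absolute link found in `html` to the `frontier` queue."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    for href in collector.hrefs:
        # Resolve relative links against the page URL and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")) and absolute not in seen:
            seen.add(absolute)
            frontier.append(absolute)
```

A crawler loop would pop the next URL from `frontier`, fetch it, and call `enqueue_links` again; the `seen` set keeps the same page from being queued twice.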
|
|