I am about to undertake a large proxy-type project. An important part of it is scraping websites for keywords in order to display relevant ads, similar to what Google AdSense does, but with advertisers' own ads that they bid on in certain categories.

I'm looking to brainstorm how to efficiently gather relevant keywords from a web page and then place that page into a premade category structure. That way, my "proxy" will display an advertisement from that category. Thoughts?

What I have for the database so far:

Domains
--id
--domain
--counter
--cat_id

Pages
--id
--domain_id
--counter
--cat_id

Categories
--id
--name
--parent_id

Keywords
--id
--keyword
--cat_id

When one page of a domain (usually the index) is categorised, all other pages on that domain are placed in the same category until they can be categorised themselves.
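To make the idea concrete, here is a minimal sketch of the categorisation step in Python, using only the standard library. It is one possible approach, not a finished design: the `KEYWORDS` dict is a hypothetical in-memory stand-in for the Keywords table (keyword to cat_id), the stop-word list is deliberately tiny, and the names `TextExtractor`, `word_counts` and `categorise` are made up for illustration. Scoring is plain word-frequency counting, and `domain_cat_id` models the fallback to the domain's category when a page matches nothing.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stand-in for the Keywords table: keyword -> cat_id.
KEYWORDS = {"guitar": 1, "amplifier": 1, "laptop": 2, "keyboard": 2}

# Small illustrative stop-word list; a real one would be much larger.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "is", "on"}


class TextExtractor(HTMLParser):
    """Collects text content, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def word_counts(html):
    """Return a Counter of lowercase words on the page, minus stop words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def categorise(html, domain_cat_id=None):
    """Pick the cat_id whose keywords occur most often on the page.

    Falls back to domain_cat_id (the domain's category) when no
    keyword from KEYWORDS appears on the page.
    """
    scores = Counter()
    for word, count in word_counts(html).items():
        cat_id = KEYWORDS.get(word)
        if cat_id is not None:
            scores[cat_id] += count
    if not scores:
        return domain_cat_id
    return scores.most_common(1)[0][0]


page = (
    "<html><head><script>var laptop = 1;</script></head>"
    "<body><h1>Guitar shop</h1><p>Buy a guitar and an amplifier.</p></body></html>"
)
print(categorise(page))  # prints 1 with the sample KEYWORDS above
```

Raw frequency is the simplest scoring choice. Weighting words found in the title or headings more heavily, or rolling scores up through parent_id in Categories, would be natural refinements, but those are assumptions about where this could go rather than part of the schema above.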