Just in case any of you are new to scraping: I've found that it relies heavily on your footprints and, almost more importantly, on your keywords list. You'll hear a lot of talk about footprints, but not so much about the keywords list, and I think it's negligent to overlook it.

When I'm creating a keywords list, I start with keywords that are common to almost every website:

"about us" "about" "contact us" "contact" "terms of service" "privacy policy"

^ These are great because almost all websites have them, or similar variations of them.

Then you can dig deeper and think about which keywords are common to registration pages:

"username" "username:" "password" "captcha"

^ These are arguably even better, because you can visit the kinds of sites you're trying to scrape and copy the exact words they use on their registration pages.

Repeat this same technique until you have a list of 1,000 - 3,000 keywords to scrape with. With good public proxies, you can scrape millions of sites/URLs in a few hours.
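Purely as an illustration of how the footprint + keyword combination multiplies into search queries, here's a minimal Python sketch. The footprints, the keyword sample, and the queries.txt filename are placeholders of my own, not part of any particular scraping tool:

```python
from itertools import product

# Hypothetical footprints -- swap in whatever platform footprints you actually target.
footprints = [
    '"powered by wordpress"',
    'inurl:register.php',
]

# A small sample of the kind of keyword list described above.
keywords = [
    "about us", "contact us", "terms of service", "privacy policy",
    "username", "password", "captcha",
]

# Every footprint + keyword pair becomes one search query.
queries = [f'{fp} "{kw}"' for fp, kw in product(footprints, keywords)]

# Dump them to a file you can feed into your scraper of choice.
with open("queries.txt", "w") as fh:
    fh.write("\n".join(queries))

print(f"{len(queries)} queries written")
```

With a 1,000 - 3,000 keyword list and a few dozen footprints, that product runs into the tens of thousands of queries, and since each query can return hundreds of results, that's how the URL counts climb into the millions.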