I haven’t started yet, but I plan to block every bot that doesn’t benefit my target audience. What’s my target audience? Western, English-speaking countries and the Philippines, where I live. There are more Filipinos who speak English than Filipinos who don’t.
Search engine crawlers catering to other regions don’t benefit my target audience. Search engine optimization (SEO) bots don’t benefit my target audience. Countries where fewer than 10 percent of people speak English as a second language aren’t included in my target audience.
My primary audience is the United States. I will obviously avoid blocking Google, Bing (including the MSN bot) and Yahoo! Which ones will I block? Here’s a short list:
I don’t like blocking entire countries, not even China. I’ll only do it if it seems like I don’t have a choice. I used to block the Baiduspider along with some other Chinese search engine bots and that took care of most of the Chinese traffic.
With China, however, I have to block certain IP address ranges used by bots that disguise themselves with web browser user agents. Here’s another short list:
There are other Chinese search engines but I haven’t seen the bots for them in more than a year.
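Blocking by IP address range, as described above for the cloaked bots, could be sketched roughly like this in Python. This is only an illustration: the function name `is_blocked_ip` and the `BLOCKED_RANGES` list are hypothetical, and the two ranges shown are reserved documentation addresses used as placeholders, not the ranges I actually block.

```python
import ipaddress

# Placeholder ranges taken from the reserved documentation blocks (RFC 5737).
# They are assumptions for illustration, NOT the real ranges from this post.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked_ip(address: str) -> bool:
    """Return True if the address falls inside any blocked range."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Not a valid IP address, so there is nothing to match against.
        return False
    return any(ip in network for network in BLOCKED_RANGES)
```

A request handler could call `is_blocked_ip` with the visitor's address and refuse the request when it returns `True`. In practice the same rule is usually written in the web server or firewall configuration instead.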
If I allow any SEO bot, I’m giving away my secrets. Paid optimizers will know what terms to target to rank higher than me for anything I write about. I don’t use SEO services at all. Here’s a short list of those I’ll block:
There are hundreds, perhaps thousands, of bots that seem to have confusing purposes. Some are research bots, some are archiving bots and some seem to have no purpose at all.
I will block any bot that identifies itself as a bot but doesn’t include a web page address in its user agent string. How am I supposed to know its purpose without one?
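The rule about bots with no web page in the user agent string could be sketched like this in Python. The keywords in `BOT_HINT`, the URL pattern and the function name `should_block` are assumptions made for illustration; real user agent strings vary, and a plain substring match on "bot" can also catch ordinary browsers on devices whose names happen to contain those letters.

```python
import re

# Words suggesting that a client announces itself as a bot (illustrative only).
BOT_HINT = re.compile(r"bot|crawler|spider", re.IGNORECASE)
# A web page reference such as "+http://example.com/bot.html".
URL_HINT = re.compile(r"https?://\S+", re.IGNORECASE)


def should_block(user_agent: str) -> bool:
    """Block self-identified bots whose user agent contains no web page."""
    looks_like_bot = BOT_HINT.search(user_agent) is not None
    has_web_page = URL_HINT.search(user_agent) is not None
    return looks_like_bot and not has_web_page
```

With this sketch, a user agent like `ExampleBot/1.0` would be blocked, while `ExampleBot/1.0 (+http://example.com/bot.html)` and an ordinary browser string would be let through.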
Some bots belong to advertising agencies. When I no longer need to use any advertising on my site (when I no longer want to make money from it), I’ll block them too.
I have to block these two Russian bots:
I don’t know what they’re used for. Tomorrow, I plan to start blocking them along with all the others I listed.