

Blocking Data Center IP Addresses is Bad for Web Business


Website owners want to keep the bad guys away. That’s a given. Unfortunately, we can’t block ranges of data center IP addresses like we could a few years ago. Well, we can, but then we have to suffer the consequences.

Data Center IP Address Usage

All kinds of people and entities use data center IP addresses. Many have to, especially businesses. Ordinary residential users rely on them now, even if they don’t realize it. VPN services abound, as do proxy services.

I use a VPN myself, on occasion, for privacy reasons. Not very often, though, because I don’t need it very often. If I want to get around country blocks, I can use the Epic Privacy Browser. It uses ranges of data center IP addresses in multiple countries. In fact, the first time I used it, I checked the settings and found out it was using a proxy range in France.

Blocking People as Well as Bots

I’ve been blocking bots for ages. My robots.txt allows specific bots and disallows the rest, like this:

User-agent: bingbot
Disallow:

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

Sitemap: https://www.rtcx.net/sitemap.xml

The actual blocking occurs with my Nginx application firewall routines. I used to have a bunch of data center IP addresses in it. I had a CIDR range for just about every data center in existence. I removed all of them and watched my web traffic start to increase daily. Without realizing it, I had been blocking real human visitors.
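To give an idea of what that looked like, here is a minimal sketch of that kind of deny list, included into an Nginx server block. The file name and the ranges below are placeholders (reserved documentation networks), not real data center addresses:

# blocklist.conf - included from the http or server block
# Each entry refuses an entire CIDR range before the request is processed.
deny 192.0.2.0/24;
deny 198.51.100.0/24;
deny 203.0.113.0/24;
allow all;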

I now have a few CIDR ranges in the file. Five, I think, without looking at it. They’re unidentified crawlers using plainclothes web browser user agents. Three are from Hetzner Online, based in Germany. Other than those, I block undesirable bots using their user agents. I particularly dislike the backlink checkers, like AhrefsBot, because you have to pay to use the information they get for free.
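Blocking by user agent can be handled with a map in the Nginx http block. This is only a sketch; the bots listed besides AhrefsBot are examples of the same kind of crawler, not my exact list:

map $http_user_agent $bad_bot {
    default          0;
    ~*AhrefsBot      1;   # backlink checker
    ~*SemrushBot     1;   # SEO/backlink crawler
    ~*MJ12bot        1;   # backlink crawler (Majestic)
}

server {
    # ... normal server settings ...
    if ($bad_bot) {
        return 403;       # refuse the request outright
    }
}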

Webmaster Diligence

If you don’t have access to your server’s access and error logs, you need to get it. Reviewing the logs every day is the only way to spot the bad actors. If you serve static pages like I do, you don’t have to worry as much as when you use a database server. If you use a database server, you have to cache your pages. You don’t have a choice. Well, unless you actually want your server to be inaccessible at times.
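For a database-backed site running behind something like PHP-FPM, the caching can live in Nginx itself. This is only a sketch with placeholder paths, zone name and timings, but it shows the idea of serving repeat hits from disk instead of the database:

fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=pagecache:10m inactive=60m;
fastcgi_cache_key "$scheme$request_method$host$request_uri";

server {
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/run/php/php-fpm.sock;

        fastcgi_cache pagecache;
        fastcgi_cache_valid 200 301 10m;                 # cache successful pages for 10 minutes
        fastcgi_cache_use_stale error timeout updating;  # serve stale copies if the backend struggles
    }
}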

Most vulnerability scans can be ignored as long as you keep your software up-to-date. If you use static pages or have a good caching scheme, you probably don’t have to worry about so many hits that they bring your server down. Again, you can’t spot the bad actors unless you diligently scan the logs every day.

Some services are now using IP addresses reserved for residential customers. You can’t win the Internet by blocking data center IP addresses anymore. Articles like this even tell people how to scrape without getting blacklisted.


RT Cunningham
January 12, 2019
Web Development