Website owners want to keep the bad guys away. That’s a given. Unfortunately, we can’t block ranges of data center IP addresses like we could a few years ago. Well, we can, but then we have to suffer the consequences.
All kinds of people and entities use data center IP addresses. Many have to, especially businesses. Regular residential users rely on them now, even if they don't know it. VPN and proxy services abound.
I use a VPN myself, on occasion, for privacy reasons. Not very often, though, because I don’t need it very often. If I want to get around country blocks, I can use the Epic Privacy Browser. It uses ranges of data center IP addresses in multiple countries. In fact, the first time I used it, I checked the settings and found out it was using a proxy range in France.
I’ve been blocking bots for ages. My robots.txt allows specific bots and disallows the rest.
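The original file isn't reproduced here, but the pattern it describes — allow a named few, disallow everyone else — looks like this (the bot names below are placeholders, not necessarily the ones actually allowed):

```
# Allow specific crawlers: an empty Disallow permits everything.
User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow:

# Everyone else gets nothing.
User-agent: *
Disallow: /
```

Keep in mind robots.txt is advisory: well-behaved crawlers honor it, bad actors ignore it.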
The actual blocking occurs with my Nginx application firewall routines. I used to have a bunch of data center IP addresses in it, with a CIDR range for just about every data center in existence. I removed all of them and watched my web traffic increase day after day. Without realizing it, I had been blocking real human visitors.
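A minimal sketch of that kind of CIDR block list in Nginx, using the `geo` module. The ranges shown are documentation-only examples, not the ranges from the post:

```nginx
# Map client addresses in listed CIDR ranges to a flag.
# 203.0.113.0/24 and 198.51.100.0/24 are RFC 5737 example ranges;
# substitute the data center ranges you actually want to block.
geo $block_dc {
    default          0;
    203.0.113.0/24   1;
    198.51.100.0/24  1;
}

server {
    listen 80;

    # Refuse flagged addresses before any other processing.
    if ($block_dc) {
        return 403;
    }
}
```

Using `geo` keeps the lookup cheap even with many ranges, and keeping the list short — as the post ends up doing — avoids sweeping up real visitors behind shared infrastructure.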
I now have a few CIDR ranges in the file. Five, I think, without looking at it. They’re unidentified crawlers using plain-clothes web browser user agents. Three are from Hetzner Online, based in Germany. Other than those, I block undesirable bots using their user agents. I particularly dislike the backlink checkers, like AhrefsBot, because you have to pay to use the information they get for free.
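User-agent blocking in Nginx can be sketched with a `map`. AhrefsBot comes from the post; the other names are common examples I've added, not necessarily on the author's list:

```nginx
# Flag requests whose User-Agent matches an unwanted crawler.
# ~* makes the match a case-insensitive regex.
map $http_user_agent $block_ua {
    default        0;
    ~*AhrefsBot    1;
    ~*SemrushBot   1;
    ~*MJ12bot      1;
}

server {
    listen 80;

    if ($block_ua) {
        return 403;
    }
}
```

This only stops bots that identify themselves honestly, which is why the handful of CIDR ranges above remain for crawlers hiding behind browser user agents.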
If you don’t have access to your server’s access and error logs, you need to get it. Reviewing the logs every day is the only way to spot the bad actors. If you serve static pages like I do, you don’t have to worry as much as when you use a database server. If you use a database server, you have to cache your pages. You don’t have a choice. Well, unless you actually want your server to be inaccessible at times.
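A daily log review can start with something as simple as ranking client IPs by request count. This sketch runs against a small sample log it creates itself; in practice you would point `LOG` at your real access log (often `/var/log/nginx/access.log` — the path depends on your setup):

```shell
# Sample combined-format log lines; replace with your real log path.
LOG=$(mktemp)
printf '%s\n' \
  '203.0.113.7 - - [01/Jan/2025:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"' \
  '203.0.113.7 - - [01/Jan/2025:00:00:01 +0000] "GET /about HTTP/1.1" 200 256 "-" "Mozilla/5.0"' \
  '198.51.100.9 - - [01/Jan/2025:00:00:02 +0000] "GET /feed HTTP/1.1" 200 128 "-" "python-requests/2.31"' \
  > "$LOG"

# Top client IPs by hit count: the first field of each line is the address.
awk '{print $1}' "$LOG" | sort | uniq -c | sort -rn | head -20
```

An address hammering the server hundreds of times a day from a plain browser user agent is exactly the kind of crawler worth a CIDR lookup.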
Most vulnerability scans can be ignored as long as you keep your software up-to-date. If you use static pages or have a good caching scheme, you probably don’t have to worry about so many hits that it brings your server down. Again, you can’t spot the bad actors unless you diligently scan the logs every day.
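Vulnerability scans tend to announce themselves by probing paths that don't exist on your site. A sketch of pulling those out of the log — the pattern list is illustrative, not exhaustive, and the sample log stands in for your real one:

```shell
# Sample log: one normal request, two scanner probes from the same IP.
LOG=$(mktemp)
printf '%s\n' \
  '203.0.113.7 - - "GET / HTTP/1.1" 200' \
  '198.51.100.9 - - "GET /wp-login.php HTTP/1.1" 404' \
  '198.51.100.9 - - "GET /.env HTTP/1.1" 404' \
  > "$LOG"

# Unique client IPs requesting common scanner-bait paths.
grep -E 'wp-login\.php|xmlrpc\.php|/\.env|phpmyadmin' "$LOG" | awk '{print $1}' | sort -u
```

On a static site those probes are harmless 404s, but the IPs they surface are worth watching for escalation.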
Some services are now using IP addresses reserved for residential customers. You can’t win the Internet by blocking data center IP addresses anymore. Articles like this even tell people how to scrape without getting blacklisted.