Blocking scanner traffic at Nginx for a leaner Statamic cache
Blocking scanner traffic at Nginx is the second-biggest win I have had auditing Statamic static caches, after stripping UTM tracking parameters. On a site with around 340 real pages, the cache had grown to 1,407 URLs. The bulk of the gap was bots probing for WordPress paths and PHP exploits, each one quietly cached as if it were a real page.
What was actually in there
I ran the same Statamic\Facades\StaticCache::driver()->getUrls() script as last time, but tweaked the grouping to bucket URLs by their first path segment instead of by query string:
php artisan tinker --execute='
    $urls = Statamic\Facades\StaticCache::driver()->getUrls();
    $bySegment = $urls->map(function ($u) {
        $path = parse_url($u, PHP_URL_PATH) ?? "/";
        $parts = array_values(array_filter(explode("/", $path)));
        return $parts[0] ?? "(root)";
    })->countBy()->sortDesc();
    echo "Top 30 path segments:\n";
    $bySegment->take(30)->each(fn($c, $s) => printf("%6d /%s\n", $c, $s));
'
The top of the list told the whole story:
Top 30 path segments:
103 /blog
96 /wp-includes
56 /wp-admin
48 /wp-content
44 /case-studies
41 /guides
28 /glossary
20 /__shared-errors
13 /services
13 /testimonials
12 /perch
12 /api
6 /admin
6 /config
5 /_ignition
5 /cgi-bin
5 /index.php
4 /vendor
WordPress probes, Perch CMS scanners, Laravel Ignition exploit attempts, random PHP shell filenames. None of them belong to anything the site actually serves. All of them had been generated, returned to a scanner, and quietly written to disk as cache entries.
I keep an eye on this kind of thing as part of website maintenance on the Statamic sites I look after.
Block at the edge, not at the cache
Statamic does ship a static_caching.exclude.urls config that keeps specific paths out of the cache. That works, but it is the wrong layer.
By the time Statamic checks an exclude rule, the request has already passed through Nginx, occupied a PHP-FPM worker, booted Laravel, and run through middleware. For a bot hitting fifty random URLs in a few seconds, that is a lot of work for requests you are going to throw away.
The cleaner fix is to never let those requests reach PHP.
The Nginx config
On a Ploi-managed site, custom rules live in the per-site Server include directory. In the Ploi NGINX management UI, that is the Server section in the right sidebar. Add a new file called scanner-blocks.conf:
# WordPress probes
location ~* ^/(wp-admin|wp-content|wp-includes|wp-login|wp-json|wp-config|xmlrpc) {
    access_log off;
    log_not_found off;
    return 444;
}
# Other CMS and control-panel probes
location ~* ^/(perch|cpanel|phpmyadmin|cgi-bin|administrator) {
    access_log off;
    log_not_found off;
    return 444;
}
# Laravel Ignition RCE scans
location ~* ^/_ignition {
    access_log off;
    log_not_found off;
    return 444;
}
# Credential and config file probes
location ~* /(database|config|sa-private-key|aws|credentials|secrets)\.(php|json|yml|yaml|sql|env|bak)$ {
    access_log off;
    log_not_found off;
    return 444;
}
Save in Ploi and it runs nginx -t then reloads automatically. If the config has a syntax error, Ploi shows you the error and refuses to apply it.
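Before trusting the rules, it can help to dry-run them against a list of paths, for example the output of the tinker script. The following is a minimal bash sketch, not part of the original setup: the function name classify_paths is made up for illustration, the patterns are copied from the location blocks, and grep -E -i stands in for Nginx's case-insensitive regex matching, which is close enough for patterns this simple. Note that Nginx matches against the path only, so strip query strings first.

```shell
# Dry-run sketch: which paths would the scanner-blocks.conf rules catch?
# Patterns are copied from the location blocks; grep -E -i approximates
# Nginx's case-insensitive (~*) regex matching for these simple patterns.

# Prefix rules, anchored at the start of the path.
PREFIX_RE='^/(wp-admin|wp-content|wp-includes|wp-login|wp-json|wp-config|xmlrpc|perch|cpanel|phpmyadmin|cgi-bin|administrator|_ignition)'
# File rule, anchored at the end of the path.
FILE_RE='/(database|config|sa-private-key|aws|credentials|secrets)\.(php|json|yml|yaml|sql|env|bak)$'

# Reads one path per line on stdin.
# Prints "BLOCK <path>" if either pattern matches, otherwise "PASS  <path>".
classify_paths() {
  local path
  while IFS= read -r path; do
    if printf '%s\n' "$path" | grep -Eiq -e "$PREFIX_RE" -e "$FILE_RE"; then
      printf 'BLOCK %s\n' "$path"
    else
      printf 'PASS  %s\n' "$path"
    fi
  done
}
```

Feed it one path per line, for example printf '%s\n' /blog/post /wp-admin/setup.php | classify_paths. Any real content path that comes back as BLOCK is a sign a pattern is too greedy.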
The return 444 directive is the interesting bit. 444 is a non-standard, Nginx-specific code that closes the connection without sending any response at all. No status line, no headers, no body. From the scanner's point of view, the request just dies. That is exactly what I want for hostile traffic. If you would rather keep a record of rejections for auditing, remove the access_log off lines and they will show up in the access log with status 444. Swap 444 for 404 if you also want scanners to receive an ordinary response.
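A quick way to confirm the block is live is to probe one of the paths with curl. This is a rough bash sketch under a few assumptions: probe_status and describe_probe are hypothetical helper names, and the URL in the usage note is a placeholder. With a 444, curl has no HTTP status to report, so it prints 000 and exits non-zero (I would expect exit code 52, "empty reply from server", over HTTP/1.1). A timeout or DNS failure lands in the same bucket, so check the exit code before celebrating.

```shell
# Sketch: confirm a probe path is dropped rather than answered.

# Prints "<curl exit code> <http status>" for a URL.
# --http1.1 keeps the failure mode predictable; -w prints 000 when
# no HTTP response was received at all.
probe_status() {
  local url=$1 code rc
  code=$(curl -s -o /dev/null --http1.1 --max-time 10 -w '%{http_code}' "$url")
  rc=$?
  printf '%s %s\n' "$rc" "$code"
}

# Turns that pair into a human-readable verdict.
describe_probe() {
  local rc=$1 code=$2
  if [ "$rc" -ne 0 ] && [ "$code" = "000" ]; then
    echo "dropped (curl exit $rc, no HTTP response)"
  else
    echo "answered (HTTP $code)"
  fi
}
```

Usage would look like describe_probe $(probe_status https://example.com/wp-login.php), with your own domain in place of example.com. A real page on the same site should still come back as answered.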
Clear the cache and watch the count fall
With Nginx now rejecting scanner traffic before it reaches Statamic, clear the polluted entries:
php please static:clear
Re-run the original tinker script a day later. On this site the URL count dropped from 1,407 to around 380. That number lines up with the actual content: collection entries, taxonomy pages, pagination, plus a handful of index pages.
If your static cache is bigger than your sitemap, scanners are probably the reason.
Why this matters
Two reasons, beyond the URL count itself.
First, every cached scanner URL is a file on disk. On a server hosting several Statamic sites, that turns into a lot of meaningless files in public/static/ to back up, rsync, and eventually clean up. Multiply by the number of scanner waves over a year and the disk impact is real.
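To see that disk footprint directly, you can count cached files per top-level directory. This is a small bash sketch that assumes the file-based driver is writing into public/static/ and that directories there mirror URL path segments. The function name cache_files_by_segment is made up for illustration, and files sitting directly in the cache root are deliberately ignored.

```shell
# Sketch: count cached files per top-level directory under a cache root.
# Assumes a file-based static cache where directories mirror URL paths.
# Usage: cache_files_by_segment [cache_root]   (defaults to public/static)
cache_files_by_segment() {
  local root=${1:-public/static}
  # find prints ./segment/.../file; field 2 of the "/"-split is the segment.
  ( cd "$root" && find . -mindepth 2 -type f ) \
    | cut -d/ -f2 \
    | sort | uniq -c | sort -rn
}
```

Run it from the site root, before and after clearing the cache. If wp-admin or wp-includes show up near the top, the scanners have been leaving files behind.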
Second, even cached responses to scanner traffic cost something. A polluted cache means more entries to invalidate when content changes, more files to walk during warming, and more PHP work overall. Blocking at Nginx drops the request in microseconds with no PHP touched. The scanner gets nothing useful back. The cache stays clean. The CP stays fast.
Same as the UTM fix, this is the kind of drift that does not show up until you go looking. The cache works, the site works, nobody complains. A year later the cache is full of /wp-admin entries and the framework cache directory is several gigabytes, all from traffic the site should never have answered.