Blocking scanner traffic at Nginx is the second-biggest win I have had auditing Statamic static caches, after stripping UTM tracking parameters. On a site with around 340 real pages, the cache had grown to 1,407 URLs. The bulk of the gap was bots probing for WordPress paths and PHP exploits, each one quietly cached as if it were a real page.

What was actually in there

I ran the same Statamic\Facades\StaticCache::driver()->getUrls() script as last time, but tweaked the grouping to bucket URLs by their first path segment instead of by query string:

php artisan tinker --execute='
$urls = Statamic\Facades\StaticCache::driver()->getUrls();

$bySegment = $urls->map(function ($u) {
    $path = parse_url($u, PHP_URL_PATH) ?? "/";
    $parts = array_values(array_filter(explode("/", $path)));
    return $parts[0] ?? "(root)";
})->countBy()->sortDesc();

echo "Top 30 path segments:\n";
$bySegment->take(30)->each(fn($c, $s) => printf("%6d  /%s\n", $c, $s));
'

The top of the list told the whole story:

Top 30 path segments:
   103  /blog
    96  /wp-includes
    56  /wp-admin
    48  /wp-content
    44  /case-studies
    41  /guides
    28  /glossary
    20  /__shared-errors
    13  /services
    13  /testimonials
    12  /perch
    12  /api
     6  /admin
     6  /config
     5  /_ignition
     5  /cgi-bin
     5  /index.php
     4  /vendor

WordPress probes, Perch CMS scanners, Laravel Ignition exploit attempts, random PHP shell filenames. None of them belong to anything the site actually serves. All of them had been generated, returned to a scanner, and quietly written to disk as cache entries.
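
If you would rather triage the list outside tinker, for example after dumping getUrls() to a text file, you can reproduce the same first-segment bucketing. This is a minimal Python sketch and is not part of Statamic. The SCANNER_SEGMENTS set is a hypothetical starting list taken from the output above, and the helper names are illustrative.

```python
from collections import Counter
from typing import Iterable, Tuple
from urllib.parse import urlsplit

# Hypothetical starting list, taken from the segments in the output above.
# Compared case-insensitively, since scanners vary the casing.
SCANNER_SEGMENTS = {"wp-admin", "wp-content", "wp-includes", "perch", "_ignition", "cgi-bin"}


def first_segment(url: str) -> str:
    """Mirror the tinker script: first non-empty path segment, or "(root)"."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    return parts[0] if parts else "(root)"


def bucket_urls(urls: Iterable[str]) -> Tuple[Counter, int]:
    """Count URLs per first segment and total the ones in SCANNER_SEGMENTS."""
    counts = Counter(first_segment(u) for u in urls)
    flagged = sum(n for seg, n in counts.items() if seg.lower() in SCANNER_SEGMENTS)
    return counts, flagged
```

Comparing flagged against sum(counts.values()) gives a rough share of the cache that belongs to scanners.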

I keep an eye on this kind of thing as part of website maintenance on the Statamic sites I look after.

Block at the edge, not at the cache

Statamic does ship a static_caching.exclude.urls config that keeps specific paths out of the cache. That works, but it is the wrong layer.

By the time Statamic checks an exclude rule, the request has already passed through Nginx, been handed to a PHP-FPM worker, booted Laravel, and run through middleware. For a bot hitting fifty random URLs in a few seconds, that is a lot of work for a request you are going to throw away.

The cleaner fix is to never let those requests reach PHP.

The Nginx config

On a Ploi-managed site, custom rules live in the per-site Server include directory. In the Ploi NGINX management UI, that is the Server section in the right sidebar. Add a new file called scanner-blocks.conf:

# WordPress probes
location ~* ^/(wp-admin|wp-content|wp-includes|wp-login|wp-json|wp-config|xmlrpc) {
    access_log off;
    log_not_found off;
    return 444;
}

# Other CMS and control-panel probes
location ~* ^/(perch|cpanel|phpmyadmin|cgi-bin|administrator) {
    access_log off;
    log_not_found off;
    return 444;
}

# Laravel Ignition RCE scans
location ~* ^/_ignition {
    access_log off;
    log_not_found off;
    return 444;
}

# Credential and config file probes
location ~* /(database|config|sa-private-key|aws|credentials|secrets)\.(php|json|yml|yaml|sql|env|bak)$ {
    access_log off;
    log_not_found off;
    return 444;
}
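
Before saving, it can be worth dry-running the patterns against real paths, such as the segments from the audit above or your own sitemap. The sketch below approximates nginx rather than running it. It copies the four regexes verbatim and uses Python's re with IGNORECASE to stand in for nginx's case-insensitive ~* match. That works for these simple patterns. The sketch does not model nginx's URI normalization, and it expects a path with the query string already removed. The function name is hypothetical.

```python
import re
from typing import Optional

# The four location patterns from scanner-blocks.conf, copied verbatim.
BLOCK_PATTERNS = [
    r"^/(wp-admin|wp-content|wp-includes|wp-login|wp-json|wp-config|xmlrpc)",
    r"^/(perch|cpanel|phpmyadmin|cgi-bin|administrator)",
    r"^/_ignition",
    r"/(database|config|sa-private-key|aws|credentials|secrets)\.(php|json|yml|yaml|sql|env|bak)$",
]
# nginx regex locations are unanchored, case-insensitive searches under ~*,
# so re.search with IGNORECASE is the closest Python equivalent.
COMPILED = [re.compile(p, re.IGNORECASE) for p in BLOCK_PATTERNS]


def blocking_pattern(path: str) -> Optional[str]:
    """Return the first pattern that would reject this path, or None."""
    for raw, rx in zip(BLOCK_PATTERNS, COMPILED):
        if rx.search(path):
            return raw
    return None
```

Two results are worth knowing about. blocking_pattern("/.env") returns None because the credential rule only catches the listed filenames. blocking_pattern("/perch-pricing") is rejected because these patterns match prefixes, so a legitimate page such as /administrators would be caught too.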

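Save in Ploi and it runs nginx -t, then reloads automatically. If the config has a syntax error, Ploi shows you the error and refuses to apply it. One thing nginx -t will not catch is ordering. nginx tries regex locations in the order they appear and stops at the first match, so these blocks need to sit above the site's location ~ \.php$ block. If they end up below it, a request for /wp-login.php or /xmlrpc.php matches the PHP block first and still reaches PHP-FPM.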

The return 444 directive is the interesting bit. 444 is an nginx-specific code that closes the connection without sending any response at all. No status code, no body, no headers. From the scanner's point of view, the request just dies. That is exactly what I want for hostile traffic. If you would rather keep a record of rejections for auditing, remove the access_log off lines. nginx then writes each blocked request to the access log with status 444. Swap 444 for 404 only if you want scanners to get an ordinary Not Found response instead.
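
To confirm the block from the outside, request one of the blocked paths and check that nothing comes back. curl reports this as an empty reply from the server. In Python's http.client it surfaces as RemoteDisconnected. The sketch below is a minimal checker built on that behavior. probe() is a hypothetical helper. It defaults to HTTPS on port 443 because a site that redirects plain HTTP would answer port 80 with a redirect before these rules ever run.

```python
import http.client
import ssl
from typing import Optional


def probe(host: str, path: str, port: int = 443, use_tls: bool = True,
          timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status for GET path, or None if the server closed the
    connection without any response, which is what nginx's `return 444` does."""
    cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = cls(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    except (ConnectionError, ssl.SSLEOFError):
        # RemoteDisconnected subclasses ConnectionResetError, so a silently
        # dropped connection lands here, as do resets and broken pipes; a TLS
        # connection cut without close_notify may surface as SSLEOFError.
        return None
    finally:
        conn.close()
```

Once the rules are live, probe("example.com", "/wp-login.php") should return None, while probe("example.com", "/") should still return a normal status such as 200.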

Clear the cache and watch the count fall

With Nginx now rejecting scanner traffic before it reaches Statamic, clear the polluted entries:

php please static:clear

Re-run the same tinker script a day later. On this site the URL count dropped from 1,407 to around 380, which lines up with the actual content: collection entries, taxonomy pages, pagination, plus a handful of index pages.

If your static cache is bigger than your sitemap, scanners are probably the reason.
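
One way to act on that rule of thumb is to diff the two lists. The sketch below is a minimal Python version. It assumes the site publishes a single standard sitemaps.org urlset rather than a sitemap index, and that you have dumped the cached URLs to a list. The helper names are hypothetical. It compares paths only, ignoring query strings and trailing slashes. Expect some legitimate leftovers, such as paginated index pages, which rarely appear in a sitemap.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Set
from urllib.parse import urlsplit

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def normalize_path(url: str) -> str:
    """Path only, without query string or trailing slash ("/" stays "/")."""
    return urlsplit(url).path.rstrip("/") or "/"


def sitemap_paths(xml_text: str) -> Set[str]:
    """Collect the normalized path of every <loc> in a sitemaps.org urlset."""
    root = ET.fromstring(xml_text)
    return {normalize_path(loc.text.strip())
            for loc in root.iter(SITEMAP_NS + "loc") if loc.text}


def not_in_sitemap(cached_urls: Iterable[str], xml_text: str) -> List[str]:
    """Cached URLs whose path is absent from the sitemap, sorted for review."""
    known = sitemap_paths(xml_text)
    return sorted(u for u in cached_urls if normalize_path(u) not in known)
```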

Why this matters

Two reasons, beyond the URL count itself.

First, every cached scanner URL is a file on disk. On a server hosting several Statamic sites, that turns into a lot of meaningless files in public/static/ to back up, rsync, and eventually clean up. Multiply by the number of scanner waves over a year and the disk impact is real.
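
To put a number on that for your own server, count the files and bytes under the static cache directory for each top-level folder. The sketch below is a minimal Python version. It assumes full static caching writes to the default public/static path. The function name is hypothetical, and the function only reads the directory without modifying the cache.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple, Union


def usage_by_segment(static_dir: Union[str, Path]) -> Dict[str, Tuple[int, int]]:
    """Map each top-level folder under static_dir to (file count, total bytes).

    Files sitting directly in static_dir are grouped under "(root)".
    """
    root = Path(static_dir)
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        key = parts[0] if len(parts) > 1 else "(root)"
        totals[key][0] += 1
        totals[key][1] += path.stat().st_size
    return {key: (n, size) for key, (n, size) in totals.items()}
```

Sorting the result by total bytes and printing the top twenty entries puts the scanner folders next to your real content, which usually makes the point on its own.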

Second, even cached responses to scanner traffic cost something. A polluted cache means more entries to invalidate when content changes, more files to walk during warming, and more PHP work overall. Blocking at Nginx ends the request in microseconds without touching PHP. The scanner gets nothing useful back. The cache stays clean. The Control Panel stays fast.

As with the UTM fix, this is the kind of drift that does not show up until you go looking. The cache works, the site works, nobody complains. A year later the cache is full of /wp-admin entries and the framework cache directory is several gigabytes, all from traffic the site should never have answered.

Blocking scanner traffic at Nginx for a leaner Statamic cache
