Operations
This page covers running daycry/jobs v3 in production: keeping long-running workers alive under a
process supervisor, shutting them down cleanly, scaling out to many workers, running the periodic
reaper, the operational behaviour of the circuit breaker and rate limiter, the dead-letter queue,
and observability via the metrics collector.
It assumes you understand the queue model from Queues & Backends and the worker commands from CLI Commands.
Running long-running workers
In production you run one or more jobs:queue:work processes per queue, each kept alive by a process
supervisor so it restarts automatically on exit (crash, deploy, OOM, or a graceful stop). The worker
itself runs an unbounded loop when invoked without --once/--max.
Supervisor
A typical supervisord program. numprocs runs several identical workers against the same queue
(see Scaling out):
[program:jobs-reports]
command=php /var/www/app/spark jobs:queue:work reports --backend redis
directory=/var/www/app
user=www-data
numprocs=4
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
startsecs=3
stopsignal=TERM
stopwaitsecs=3600
stdout_logfile=/var/log/jobs/reports.out.log
stderr_logfile=/var/log/jobs/reports.err.log
Warning: Set stopwaitsecs (Supervisor) higher than your longest job runtime. The worker finishes the in-flight job before exiting on SIGTERM; if the supervisor force-kills it first (SIGKILL), the job is interrupted and will be redelivered after its visibility timeout.
systemd
A templated unit (jobs-worker@.service) so you can start one instance per queue with
systemctl start jobs-worker@reports:
[Unit]
Description=Jobs queue worker (%i)
After=network.target
[Service]
Type=simple
User=www-data
WorkingDirectory=/var/www/app
ExecStart=/usr/bin/php /var/www/app/spark jobs:queue:work %i --backend redis
Restart=always
RestartSec=3
# Allow the in-flight job to finish before SIGKILL on stop/restart.
TimeoutStopSec=3600
KillSignal=SIGTERM
[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl enable --now jobs-worker@reports
systemctl enable --now jobs-worker@emails
Note: Restart your workers on every deploy. A long-running PHP process keeps the old code (and a warm
Config\Jobs) in memory until it restarts, so new handler code or config changes are not picked up by a worker that keeps running across the deploy.
Graceful shutdown and signals
The worker installs handlers for SIGTERM and SIGINT (POSIX, requires the pcntl extension).
On receipt it:
- Prints "stop signal received, finishing current cycle...".
- Sets an internal stop flag (checked at the top of every loop iteration).
- Finishes the current cycle: an in-flight job is run to completion and settled normally.
- Prints "graceful shutdown complete." and exits with SUCCESS.
This means a deploy/restart never aborts a running job mid-flight; the worker simply stops pulling new work and exits.
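The stop-flag pattern described above can be sketched as follows. This is a minimal illustration, not the library's worker code: runWorkerLoop and its parameters are hypothetical names, and only the pcntl functions are real PHP APIs.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a worker loop; not the daycry/jobs source.
 * Runs $runCycle until a stop signal arrives or $maxCycles is reached.
 */
function runWorkerLoop(callable $runCycle, int $maxCycles): int
{
    $stop = false;

    // Signal trapping needs the pcntl extension (unavailable on Windows).
    if (function_exists('pcntl_async_signals') && function_exists('pcntl_signal')) {
        pcntl_async_signals(true);
        $onStop = static function (int $signo) use (&$stop): void {
            $stop = true; // only flip the flag; never abort the running cycle
        };
        pcntl_signal(SIGTERM, $onStop);
        pcntl_signal(SIGINT, $onStop);
    }

    $cycles = 0;
    // The flag is checked at the top of every iteration, so a cycle that has
    // already started always runs to completion.
    while (! $stop && $cycles < $maxCycles) {
        $runCycle($cycles);
        $cycles++;
    }

    return $cycles;
}
```

Because the handler only sets a flag, the supervisor's stop-grace time has to cover the remainder of the current cycle, which is why stopwaitsecs and TimeoutStopSec are sized to the longest job.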
Warning: On platforms without pcntl (notably Windows), the worker cannot trap signals. Bound such runs with --once or --max N and re-invoke from a scheduler, or stop the process externally between cycles. Always give your supervisor enough stop-grace time (stopwaitsecs / TimeoutStopSec) to exceed the longest job runtime.
Scaling out
Because the queue contract is lease-based and claims are atomic, you scale throughput simply by running more workers against the same queue:
- The database backend claims rows with FOR UPDATE SKIP LOCKED (optimistic fallback for SQLite), so concurrent workers never grab the same row.
- The redis backend moves messages atomically with RPOPLPUSH into a per-message processing entry, so a message is leased by exactly one worker.
- beanstalk and serviceBus reserve/peek-lock each message server-side.
Run dedicated worker pools per queue so a slow queue does not starve a fast one:
# 4 workers on 'emails', 2 on 'reports'
php spark jobs:queue:work emails --backend redis # x4 under the supervisor
php spark jobs:queue:work reports --backend redis # x2 under the supervisor
Warning: With multiple workers, delivery is at-least-once and a message can be processed more than once (after a crash + reaper recovery, or a lease expiry). Make handlers idempotent; use idempotencyKey() for built-in de-duplication. See Idempotency.
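Where a handler has to guard its own side effects, the idea behind idempotent processing can be sketched like this. It is an illustration only: ProcessedKeyStore and runOnce are hypothetical names, the store is an in-memory array, and none of this is the library's idempotencyKey() mechanism.

```php
<?php
declare(strict_types=1);

/** In-memory stand-in for a shared store of processed keys (hypothetical). */
final class ProcessedKeyStore
{
    /** @var array<string, true> */
    private array $seen = [];

    /** Returns true only for the first caller to claim a given key. */
    public function claim(string $key): bool
    {
        if (isset($this->seen[$key])) {
            return false;
        }
        $this->seen[$key] = true;

        return true;
    }
}

/** Runs $work at most once per key; returns whether it ran. */
function runOnce(ProcessedKeyStore $store, string $key, callable $work): bool
{
    if (! $store->claim($key)) {
        return false; // duplicate delivery: skip the side effect
    }
    $work();

    return true;
}
```

Across several worker processes the claim has to be atomic and shared, for example a unique database constraint or a Redis SET with the NX option. A real handler would also need to decide what happens to the key when the work itself fails.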
The periodic reaper
A worker that crashes between fetch() and ack() leaves its message leased and invisible until the
visibility timeout elapses. Run jobs:queue:reap <queue> periodically (every minute is typical) to
return such messages to the ready state. This is required for the database and redis backends;
beanstalk and Service Bus recover natively.
# System cron, once a minute per queue
* * * * * cd /var/www/app && php spark jobs:queue:reap reports >> /dev/null 2>&1
* * * * * cd /var/www/app && php spark jobs:queue:reap emails --backend redis >> /dev/null 2>&1
The visibility timeout used is redisProcessingVisibilityTimeout for the redis backend and
databaseVisibilityTimeout otherwise (both default 300s).
Warning: The visibility timeout must exceed runtime. If a job's real runtime can exceed the visibility timeout, the reaper (or the broker, for beanstalk TTR / Service Bus lock) will treat the still-running worker as crashed and redeliver the message, causing a duplicate execution. Always set the visibility timeout (and beanstalk TTR / serviceBusLockTimeout) greater than your longest expected job runtime, with headroom. For redis, a long-running worker can also extend its lease by calling RedisBackend::renewLease().
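The expiry rule behind this warning can be sketched as a pure function. This is a simplified model, not the reaper's implementation: findExpiredLeases and the shape of its input are assumptions made for illustration, with the 300-second default taken from the configuration described above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical model of the reaper's decision.
 *
 * @param array<string, int> $leasedAt message id => unix time the lease began
 * @return list<string> ids whose lease is older than the visibility timeout
 */
function findExpiredLeases(array $leasedAt, int $now, int $visibilityTimeout = 300): array
{
    $expired = [];
    foreach ($leasedAt as $id => $startedAt) {
        // The reaper cannot tell a crashed worker from a slow one: it only
        // sees how long the message has been leased.
        if ($now - $startedAt >= $visibilityTimeout) {
            $expired[] = $id;
        }
    }

    return $expired;
}
```

A job that has been running for 400 seconds under a 300-second timeout is indistinguishable from a crashed one in this model, so it would be returned to the ready state and executed a second time.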
Circuit breaker
The worker wraps each cycle in a per-queue CircuitBreaker (cache-backed, so state persists across
worker restarts). It protects an unhealthy backend from being hammered:
- Closed (normal): failures are counted. After Config\Jobs::$circuitBreakerThreshold consecutive backend errors the circuit opens.
- Open: cycles are skipped for Config\Jobs::$circuitBreakerCooldown seconds (the worker logs [Circuit Open] ... and idles pollInterval). After the cooldown it allows one probe (half-open).
- Half-open: a successful cycle closes the circuit; a failed probe re-opens it.
// Config\Jobs
public int $circuitBreakerThreshold = 5; // consecutive failures before opening
public int $circuitBreakerCooldown = 60; // seconds the circuit stays open
Note: The breaker reacts to thrown backend errors during a cycle (e.g. the broker is unreachable), not to ordinary job failures — a job that runs and fails is nacked/abandoned by the pipeline and counts as a successful backend cycle for the breaker.
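The three states can be sketched as a small state machine. This is an in-memory illustration under stated assumptions: SketchCircuitBreaker and its method names are hypothetical, the library's CircuitBreaker is cache-backed, and its actual API may differ.

```php
<?php
declare(strict_types=1);

/** Illustrative in-memory breaker; not the library's cache-backed class. */
final class SketchCircuitBreaker
{
    private int $failures = 0;
    private ?int $openedAt = null;

    public function __construct(
        private int $threshold = 5,  // consecutive failures before opening
        private int $cooldown = 60,  // seconds the circuit stays open
    ) {
    }

    /** True when closed, or when the cooldown has elapsed (half-open probe). */
    public function allows(int $now): bool
    {
        return $this->openedAt === null
            || ($now - $this->openedAt) >= $this->cooldown;
    }

    /** A successful cycle resets the count and closes the circuit. */
    public function recordSuccess(): void
    {
        $this->failures = 0;
        $this->openedAt = null;
    }

    /** Opens at the threshold; a failed probe restarts the cooldown. */
    public function recordFailure(int $now): void
    {
        $this->failures++;
        if ($this->openedAt !== null || $this->failures >= $this->threshold) {
            $this->openedAt = $now;
        }
    }
}
```

A worker would call allows() before each cycle and record the outcome afterwards. The sketch does not limit the half-open state to a single probe across concurrent workers; doing so needs shared, atomic state.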
Rate limiting
Cap how many jobs a queue processes per minute with Config\Jobs::$queueRateLimits (jobs/minute,
0 = unlimited). The worker checks the limit before each cycle and, when throttled, logs
[Rate Limited] ... and idles for pollInterval.
// Config\Jobs
public array $queueRateLimits = [
'emails' => 100, // at most 100 email jobs/minute
'reports' => 10,
];
The limiter (Daycry\Jobs\Libraries\RateLimiter) uses a cache-based, per-minute token bucket.
Note: Use an atomic cache driver (Redis or Memcached) in production. With those, the increment is server-side atomic and the cap is enforced precisely. The file/dummy fallback is best-effort and may overshoot by one per racing worker.
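A per-minute cap of this kind can be sketched with a counter keyed by queue and minute. This is an assumption-laden illustration rather than the library's RateLimiter: SketchRateLimiter is a hypothetical name, a plain array stands in for the cache, and the real implementation may account for tokens differently.

```php
<?php
declare(strict_types=1);

/** Illustrative per-minute limiter; a plain array stands in for the cache. */
final class SketchRateLimiter
{
    /** @var array<string, int> counter per "queue:minute" key */
    private array $counters = [];

    /** @param array<string, int> $limits jobs per minute, 0 = unlimited */
    public function __construct(private array $limits)
    {
    }

    /** Counts one attempt and reports whether it fits within the cap. */
    public function allow(string $queue, int $now): bool
    {
        $limit = $this->limits[$queue] ?? 0;
        if ($limit <= 0) {
            return true; // unlimited or unconfigured queue
        }

        // One counter per queue per minute; with Redis or Memcached this
        // increment would be a single server-side atomic operation.
        $key = $queue . ':' . intdiv($now, 60);
        $this->counters[$key] = ($this->counters[$key] ?? 0) + 1;

        return $this->counters[$key] <= $limit;
    }
}
```

With a non-atomic driver the read and the write are separate steps, so two racing workers can both observe the same count, which is the overshoot the note above describes.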
Dead-letter queue
The DLQ holds jobs that have permanently failed so they can be inspected or replayed instead of being lost. Configure a queue name through the $deadLetterQueue config setting.
Routing is provided by Daycry\Jobs\Libraries\DeadLetterQueue::store($payload, $handler, $reason, $attempts),
which enqueues the failed payload (annotated with _dlq_metadata: reason, timestamp, attempts) onto
the configured queue using the default backend, and returns false when the DLQ is unconfigured or
the enqueue fails.
use Daycry\Jobs\Libraries\DeadLetterQueue;
$stored = (new DeadLetterQueue())->store(
payload: $failedPayload,
handler: 'command',
reason: 'connection timeout',
attempts: 4,
);
if (! $stored) {
// DLQ disabled or enqueue failed — decide whether to drop or requeue; never silently lose work.
}
Warning: In the current worker pipeline, retry exhaustion calls the backend's abandon() directly. That call routes to a native dead-letter facility where the backend has one (beanstalk bury, Service Bus dead-letter after MaxDeliveryCount) and otherwise marks the message failed (database) or drops it (redis). The DeadLetterQueue helper and $deadLetterQueue config are an opt-in application-level facility you invoke yourself; they are not automatically called by the worker on abandon. For redis in particular, configure your own DLQ handling (or rely on inspection) so permanently failed messages are not lost. See also Retries & Backoff.
Observability and metrics
The worker emits counters through a pluggable Daycry\Jobs\Metrics\MetricsCollectorInterface,
resolved from Config\Jobs::$metricsCollector:
// Config\Jobs
// InMemoryMetricsCollector (default) is fine for dev; null disables all metrics.
public ?string $metricsCollector = InMemoryMetricsCollector::class;
The interface is small:
interface MetricsCollectorInterface
{
public function increment(string $counter, int $value = 1, array $labels = []): void;
public function observe(string $metric, float $value, array $labels = []): void;
public function getSnapshot(): array;
}
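A custom collector only has to satisfy these three methods. The sketch below forwards each sample to an exporter callback and keeps a local snapshot. ForwardingMetricsCollector and the $sink callback are hypothetical, the interface is redeclared locally only so the example runs standalone, and the key format and histogram storage are assumptions modelled on the snapshot shape documented on this page.

```php
<?php
declare(strict_types=1);

// Local copy of the interface shown above so the sketch runs standalone. In an
// application, implement Daycry\Jobs\Metrics\MetricsCollectorInterface instead.
interface MetricsCollectorInterface
{
    public function increment(string $counter, int $value = 1, array $labels = []): void;
    public function observe(string $metric, float $value, array $labels = []): void;
    public function getSnapshot(): array;
}

/** Hypothetical collector that forwards every sample to an exporter callback. */
final class ForwardingMetricsCollector implements MetricsCollectorInterface
{
    private array $counters = [];
    private array $histograms = [];

    /** @param Closure(string, string, float, array): void $sink assumed exporter hook */
    public function __construct(private Closure $sink)
    {
    }

    public function increment(string $counter, int $value = 1, array $labels = []): void
    {
        $key = $this->key($counter, $labels);
        $this->counters[$key] = ($this->counters[$key] ?? 0) + $value;
        ($this->sink)('counter', $counter, (float) $value, $labels);
    }

    public function observe(string $metric, float $value, array $labels = []): void
    {
        // Unbounded here for brevity; a real collector should cap this.
        $this->histograms[$this->key($metric, $labels)][] = $value;
        ($this->sink)('histogram', $metric, $value, $labels);
    }

    public function getSnapshot(): array
    {
        return ['counters' => $this->counters, 'histograms' => $this->histograms];
    }

    // Builds keys shaped like "jobs_succeeded|queue=reports".
    private function key(string $name, array $labels): string
    {
        ksort($labels);
        $parts = [];
        foreach ($labels as $label => $value) {
            $parts[] = $label . '=' . $value;
        }

        return $parts === [] ? $name : $name . '|' . implode(',', $parts);
    }
}
```

In practice the callback would hand samples to whatever shared store your monitoring stack scrapes. How the library constructs the collector class named in Config\Jobs::$metricsCollector is not covered here, so check that before relying on constructor arguments.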
Counters emitted by the worker
Every counter carries a queue label.
| Counter | Incremented when |
|---|---|
| jobs_fetched | A message was leased from the backend. |
| jobs_rejected_signature | A message failed HMAC signature verification (then abandoned). |
| jobs_skipped_idempotent | A message was skipped because its idempotency key was already processed. |
| jobs_succeeded | A job ran successfully and was acked. |
| jobs_failed | A job attempt failed (before deciding requeue vs dead-letter). |
| jobs_requeued | A failed job had retries left and was nacked with backoff. |
| jobs_failed_permanently | A failed job exhausted its retries and was abandoned. |
Reading metrics
The default InMemoryMetricsCollector aggregates counters/histograms in process memory (with a
cardinality cap and FIFO eviction so a long-running worker cannot grow unbounded). Read a snapshot:
use Daycry\Jobs\Metrics\Metrics;
$snapshot = Metrics::get()?->getSnapshot();
// ['counters' => ['jobs_succeeded|queue=reports' => 42, ...], 'histograms' => [...]]
Note: In-memory metrics live only for the lifetime of one worker process and are not scraped across processes. For production monitoring (e.g. Prometheus), implement MetricsCollectorInterface with an exporter that writes to a shared, scrapeable store, for example a Redis/StatsD-backed collector or a Prometheus pushgateway client, and point Config\Jobs::$metricsCollector at it. Set the config to null to disable metrics entirely (all increment/observe calls become no-ops).
In addition to metrics, the worker logs operational events through CodeIgniter's logger: rejected
signatures and retry exhaustion are logged at critical, and backend errors surface as CLI error
output. Aggregate these logs centrally to alert on jobs_failed_permanently and signature
rejections.
See also
- Queues & Backends: backend semantics and recovery model.
- CLI Commands: jobs:queue:work, jobs:queue:reap, jobs:queue:purge.
- Retries & Backoff: retry budget and the dead-letter relationship.
- Configuration: every operational setting referenced here.
- Scheduling: the cron runner that feeds queued work.