Operations

This page covers running daycry/jobs v3 in production: keeping long-running workers alive under a process supervisor, shutting them down cleanly, scaling out to many workers, running the periodic reaper, the operational behaviour of the circuit breaker and rate limiter, the dead-letter queue, and observability via the metrics collector.

It assumes you understand the queue model from Queues & Backends and the worker commands from CLI Commands.

Running long-running workers

In production you run one or more jobs:queue:work processes per queue, each kept alive by a process supervisor so it restarts automatically on exit (crash, deploy, OOM, or a graceful stop). The worker itself runs an unbounded loop when invoked without --once/--max.

Supervisor

A typical supervisord program section is shown below. numprocs runs several identical workers against the same queue (see Scaling out):

[program:jobs-reports]
command=php /var/www/app/spark jobs:queue:work reports --backend redis
directory=/var/www/app
user=www-data
numprocs=4
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
startsecs=3
stopsignal=TERM
stopwaitsecs=3600
stdout_logfile=/var/log/jobs/reports.out.log
stderr_logfile=/var/log/jobs/reports.err.log

Warning: Set stopwaitsecs (Supervisor) higher than your longest job runtime. The worker finishes the in-flight job before exiting on SIGTERM; if the supervisor force-kills it first (SIGKILL) the job is interrupted and will be redelivered after its visibility timeout.

systemd

A templated unit (jobs-worker@.service) so you can start one instance per queue with systemctl start jobs-worker@reports:

[Unit]
Description=Jobs queue worker (%i)
After=network.target

[Service]
Type=simple
User=www-data
WorkingDirectory=/var/www/app
ExecStart=/usr/bin/php /var/www/app/spark jobs:queue:work %i --backend redis
Restart=always
RestartSec=3
# Allow the in-flight job to finish before SIGKILL on stop/restart.
TimeoutStopSec=3600
KillSignal=SIGTERM

[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl enable --now jobs-worker@reports
systemctl enable --now jobs-worker@emails

Note: Restart your workers on every deploy. A long-running PHP process keeps the old code (and a warm Config\Jobs) in memory until it restarts, so new handler code or config changes are not picked up by a worker that keeps running across the deploy.

Graceful shutdown and signals

The worker installs handlers for SIGTERM and SIGINT (POSIX, requires the pcntl extension). On receipt it:

1. Prints "stop signal received, finishing current cycle...".
2. Sets an internal stop flag (checked at the top of every loop iteration).
3. Finishes the current cycle: an in-flight job is run to completion and settled normally.
4. Prints "graceful shutdown complete." and exits with SUCCESS.

This means a deploy/restart never aborts a running job mid-flight; the worker simply stops pulling new work and exits.

Warning: On platforms without pcntl (notably Windows), the worker cannot trap signals. Bound such runs with --once or --max N and re-invoke from a scheduler, or stop the process externally between cycles. Always give your supervisor enough stop-grace time (stopwaitsecs / TimeoutStopSec) to exceed the longest job runtime.
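The stop-flag pattern described in this section can be sketched in plain PHP. This is a minimal illustration, not the library's worker: the class StopFlagLoop and its methods are hypothetical names introduced here. Only pcntl_async_signals() and pcntl_signal() are real PHP functions, and the sketch skips them when the pcntl extension is unavailable.

```php
<?php
declare(strict_types=1);

// Hypothetical illustration of a stop-flag worker loop; not daycry/jobs code.
final class StopFlagLoop
{
    private bool $stop = false;

    public function __construct()
    {
        // Trap SIGTERM/SIGINT only where pcntl is available (not on Windows).
        if (function_exists('pcntl_async_signals') && function_exists('pcntl_signal')) {
            pcntl_async_signals(true);
            pcntl_signal(SIGTERM, fn () => $this->requestStop());
            pcntl_signal(SIGINT, fn () => $this->requestStop());
        }
    }

    /** Called by the signal handlers; can also be called directly. */
    public function requestStop(): void
    {
        $this->stop = true;
    }

    /**
     * Runs $cycle until a stop is requested or $maxCycles is reached.
     * The flag is checked only at the top of each iteration, so a cycle
     * that has started always runs to completion.
     */
    public function run(callable $cycle, ?int $maxCycles = null): int
    {
        $done = 0;
        while (! $this->stop && ($maxCycles === null || $done < $maxCycles)) {
            $cycle($this);
            $done++;
        }

        return $done;
    }
}
```

Because the flag is only read between cycles, the time from SIGTERM to exit is bounded by the longest single cycle, which is why the supervisor's stop-grace time has to exceed the longest job runtime.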

Scaling out

Because the queue contract is lease-based and claims are atomic, you scale throughput simply by running more workers against the same queue:

- The database backend claims rows with FOR UPDATE SKIP LOCKED (optimistic fallback for SQLite), so concurrent workers never grab the same row.
- The redis backend moves messages atomically with RPOPLPUSH into a per-message processing entry, so a message is leased by exactly one worker.
- beanstalk and serviceBus reserve/peek-lock each message server-side.
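To illustrate what an optimistic claim looks like, here is a minimal sketch using PDO with an in-memory SQLite database (it needs the pdo_sqlite extension). The jobs table, its columns, and the claimNext() function are assumptions made up for this example; the library's real schema and queries are not shown on this page.

```php
<?php
declare(strict_types=1);

// Hypothetical optimistic claim against an assumed table jobs(id, status, owner).
function claimNext(PDO $db, string $worker): ?int
{
    $id = $db->query("SELECT id FROM jobs WHERE status = 'ready' ORDER BY id LIMIT 1")
        ->fetchColumn();
    if ($id === false) {
        return null; // nothing is ready
    }

    // The status guard in the WHERE clause makes the claim atomic: if another
    // worker leased the row after our SELECT, this UPDATE matches zero rows.
    $stmt = $db->prepare(
        "UPDATE jobs SET status = 'leased', owner = ? WHERE id = ? AND status = 'ready'"
    );
    $stmt->execute([$worker, $id]);

    return $stmt->rowCount() === 1 ? (int) $id : null; // null: lost the race
}

// Usage: two workers never receive the same row.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, owner TEXT)');
$db->exec("INSERT INTO jobs (status) VALUES ('ready'), ('ready')");

$first  = claimNext($db, 'worker-1');
$second = claimNext($db, 'worker-2');
$third  = claimNext($db, 'worker-3');
```

A worker that loses the race simply gets null and tries again on its next cycle, which is the trade-off of the optimistic approach compared with SKIP LOCKED.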

Run dedicated worker pools per queue so a slow queue does not starve a fast one:

# 4 workers on 'emails', 2 on 'reports'
php spark jobs:queue:work emails  --backend redis   # x4 under the supervisor
php spark jobs:queue:work reports --backend redis   # x2 under the supervisor

Warning: With multiple workers, delivery is at-least-once and a message can be processed more than once (after a crash + reaper recovery, or a lease expiry). Make handlers idempotent — use idempotencyKey() for built-in de-duplication. See Idempotency.
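As an illustration of what an idempotent handler means in practice, the sketch below runs a side effect at most once per key. OnceOnlyRunner is a hypothetical class for this example only, and an in-memory array stands in for the shared store (cache or database) that real workers would need. It does not show how the library's idempotencyKey() works internally.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: handler-level de-duplication by key.
final class OnceOnlyRunner
{
    /** @var array<string, true> keys whose side effect already completed */
    private array $done = [];

    /** Runs $effect at most once per $key; returns true if it ran this time. */
    public function run(string $key, callable $effect): bool
    {
        if (isset($this->done[$key])) {
            return false; // duplicate delivery: skip the side effect
        }

        $effect();
        $this->done[$key] = true; // mark only after the effect succeeded

        return true;
    }
}

// Usage: the second delivery of the same message is a no-op.
$sent   = 0;
$runner = new OnceOnlyRunner();
$ranFirst  = $runner->run('invoice-42', function () use (&$sent) { $sent++; });
$ranSecond = $runner->run('invoice-42', function () use (&$sent) { $sent++; });
```

Across several processes the check and the mark would have to be a single atomic operation in the shared store; the in-memory version above only demonstrates the control flow.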

The periodic reaper

A worker that crashes between fetch() and ack() leaves its message leased and invisible until the visibility timeout elapses. Run jobs:queue:reap <queue> periodically (every minute is typical) to return such messages to the ready state. This is required for the database and redis backends; beanstalk and Service Bus recover natively.

# System cron, once a minute per queue
* * * * * cd /var/www/app && php spark jobs:queue:reap reports >> /dev/null 2>&1
* * * * * cd /var/www/app && php spark jobs:queue:reap emails --backend redis >> /dev/null 2>&1

The reaper uses redisProcessingVisibilityTimeout for the redis backend and databaseVisibilityTimeout otherwise (both default to 300 seconds).

Warning: The visibility timeout must exceed the job runtime. If a job's real runtime can exceed the visibility timeout, the reaper (or the broker, for beanstalk TTR / Service Bus lock) will treat the still-running worker as crashed and redeliver the message, causing a duplicate execution. Always set the visibility timeout (and beanstalk TTR / serviceBusLockTimeout) greater than your longest expected job runtime, with headroom. For redis, a long-running worker can also extend its lease by calling RedisBackend::renewLease().
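The headroom rule can be turned into a quick sanity check. The two functions below are hypothetical helpers, not part of the library, and the 1.5 safety factor is an arbitrary illustration rather than a recommendation from the project. The redelivery test is also simplified: in practice a message is redelivered only once the reaper (or broker) next notices the expired lease.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: smallest visibility timeout (seconds) that covers the
 * longest job runtime with a safety factor.
 */
function minimumVisibilityTimeout(int $longestRuntimeSeconds, float $headroomFactor = 1.5): int
{
    if ($longestRuntimeSeconds < 0 || $headroomFactor < 1.0) {
        throw new InvalidArgumentException('runtime must be >= 0 and headroom >= 1.0');
    }

    return (int) ceil($longestRuntimeSeconds * $headroomFactor);
}

/** True when a job of the given runtime would outlive its lease. */
function wouldOutliveLease(int $runtimeSeconds, int $visibilityTimeoutSeconds): bool
{
    return $runtimeSeconds > $visibilityTimeoutSeconds;
}

// Usage: a 240-second job fits the default 300-second lease, but only just.
$needed  = minimumVisibilityTimeout(240);   // with 1.5x headroom
$atRisk  = wouldOutliveLease(400, 300);     // a 400-second job under the default
$safeNow = wouldOutliveLease(240, 300);
```

With these numbers the 240-second job would call for a timeout of 360 seconds, so the 300-second default leaves less headroom than the sketch's factor asks for.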

Circuit breaker

The worker wraps each cycle in a per-queue CircuitBreaker (cache-backed, so state persists across worker restarts). It protects an unhealthy backend from being hammered:

- Closed (normal): failures are counted. After Config\Jobs::$circuitBreakerThreshold consecutive backend errors the circuit opens.
- Open: cycles are skipped for Config\Jobs::$circuitBreakerCooldown seconds (the worker logs [Circuit Open] ... and idles for pollInterval). After the cooldown it allows one probe (half-open).
- Half-open: a successful cycle closes the circuit; a failed probe re-opens it.

// Config\Jobs
public int $circuitBreakerThreshold = 5;   // consecutive failures before opening
public int $circuitBreakerCooldown  = 60;  // seconds the circuit stays open

Note: The breaker reacts to thrown backend errors during a cycle (e.g. the broker is unreachable), not to ordinary job failures — a job that runs and fails is nacked/abandoned by the pipeline and counts as a successful backend cycle for the breaker.
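The three states can be modelled in a few lines. The class below is a hypothetical in-memory sketch written for this page, not the library's CircuitBreaker: the real one is cache-backed, whereas this version keeps state in properties and takes the current time as an argument so the transitions are easy to follow.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory model of the closed / open / half-open cycle.
final class BreakerSketch
{
    private int $failures = 0;
    private ?int $openedAt = null;

    public function __construct(
        private int $threshold = 5, // consecutive failures before opening
        private int $cooldown = 60, // seconds the circuit stays open
    ) {
    }

    /** May a cycle run at time $now? False while open and still cooling down. */
    public function allows(int $now): bool
    {
        return $this->openedAt === null || $now - $this->openedAt >= $this->cooldown;
    }

    public function state(int $now): string
    {
        if ($this->openedAt === null) {
            return 'closed';
        }

        return $this->allows($now) ? 'half-open' : 'open';
    }

    /** A cycle completed without a backend error: close the circuit. */
    public function recordSuccess(): void
    {
        $this->failures = 0;
        $this->openedAt = null;
    }

    /** A backend error was thrown during the cycle. */
    public function recordFailure(int $now): void
    {
        if ($this->openedAt !== null) {
            $this->openedAt = $now; // failed probe: re-open and restart the cooldown

            return;
        }
        if (++$this->failures >= $this->threshold) {
            $this->openedAt = $now;
        }
    }
}
```

In this sketch the half-open state lets cycles through until a result is recorded, which amounts to a single probe for one worker. Limiting the probe to one worker across a pool would need shared state, which the sketch does not attempt.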

Rate limiting

Cap how many jobs a queue processes per minute with Config\Jobs::$queueRateLimits (jobs/minute, 0 = unlimited). The worker checks the limit before each cycle and, when throttled, logs [Rate Limited] ... and idles for pollInterval.

// Config\Jobs
public array $queueRateLimits = [
    'emails'  => 100, // at most 100 email jobs/minute
    'reports' => 10,
];

The limiter (Daycry\Jobs\Libraries\RateLimiter) uses a cache-based, per-minute token bucket.

Note: Use an atomic cache driver (Redis or Memcached) in production. With those, the increment is server-side atomic and the cap is enforced precisely. The file/dummy fallback is best-effort and may overshoot by one per racing worker.
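To show why an atomic increment matters, here is a minimal sketch of a per-minute cap built on a counter keyed by the current minute. It is a fixed-window counter, a simpler scheme than a true token bucket, and may differ from what Daycry\Jobs\Libraries\RateLimiter does internally. MinuteWindowLimiter is a hypothetical class, and a plain array stands in for the cache.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of a per-minute cap; not the library's RateLimiter.
final class MinuteWindowLimiter
{
    /** @var array<string, int> counter per "queue:minute" key */
    private array $counters = [];

    /** @param array<string, int> $limits jobs/minute per queue, 0 = unlimited */
    public function __construct(private array $limits)
    {
    }

    /** Counts the job and returns true if the queue is within its cap for this minute. */
    public function allow(string $queue, int $now): bool
    {
        $limit = $this->limits[$queue] ?? 0;
        if ($limit === 0) {
            return true; // unlimited or unconfigured queue
        }

        $key = $queue . ':' . intdiv($now, 60);
        // Increment first, then compare: this mirrors a server-side atomic
        // increment, where each worker sees a distinct post-increment value.
        $count = $this->counters[$key] = ($this->counters[$key] ?? 0) + 1;

        return $count <= $limit;
    }
}
```

A non-atomic driver would have to read, add, and write in separate steps, so two racing workers could both read the same count and both pass, which is the overshoot the note describes. A real cache would also expire old minute keys with a TTL; the array here keeps them forever.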

Dead-letter queue

The DLQ holds jobs that have permanently failed so they can be inspected or replayed instead of being lost. Configure a queue name:

// Config\Jobs
public ?string $deadLetterQueue = 'dead-letter'; // null disables the DLQ helper

Routing is provided by Daycry\Jobs\Libraries\DeadLetterQueue::store($payload, $handler, $reason, $attempts). It annotates the failed payload with _dlq_metadata (reason, timestamp, attempts) and enqueues it onto the configured queue using the default backend. It returns false when the DLQ is unconfigured or the enqueue fails.

use Daycry\Jobs\Libraries\DeadLetterQueue;

$stored = (new DeadLetterQueue())->store(
    payload: $failedPayload,
    handler: 'command',
    reason:  'connection timeout',
    attempts: 4,
);

if (! $stored) {
    // DLQ disabled or enqueue failed — decide whether to drop or requeue; never silently lose work.
}

Warning: In the current worker pipeline, retry exhaustion calls the backend's abandon() directly — which routes to a native dead-letter facility where the backend has one (beanstalk bury, Service Bus dead-letter after MaxDeliveryCount) and otherwise marks the message failed (database) or drops it (redis). The DeadLetterQueue helper and $deadLetterQueue config are an opt-in application-level facility you invoke yourself; they are not automatically called by the worker on abandon. For redis, in particular, configure your own DLQ handling (or rely on inspection) so permanently-failed messages are not lost. See also Retries & Backoff.
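If you build your own dead-letter handling, the annotation step is straightforward to reproduce. The function below is a hypothetical sketch: the _dlq_metadata key comes from the description above, but the key names inside it and the use of a Unix timestamp are assumptions, not the library's confirmed format.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: returns a copy of the payload with a _dlq_metadata
 * entry recording why and when the job was dead-lettered.
 */
function annotateForDeadLetter(array $payload, string $reason, int $attempts, ?int $now = null): array
{
    $payload['_dlq_metadata'] = [
        'reason'    => $reason,
        'timestamp' => $now ?? time(), // assumed to be a Unix timestamp
        'attempts'  => $attempts,
    ];

    return $payload;
}

/** Hypothetical inverse: strips the annotation before replaying a message. */
function stripDeadLetterMetadata(array $payload): array
{
    unset($payload['_dlq_metadata']);

    return $payload;
}

// Usage with a made-up payload.
$original  = ['job' => 'command', 'args' => ['report:build']];
$annotated = annotateForDeadLetter($original, 'connection timeout', 4, 1700000000);
$replayed  = stripDeadLetterMetadata($annotated);
```

Keeping the metadata under a single reserved key makes it easy to remove before replay, so the replayed message is identical to the one that originally failed.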

Observability and metrics

The worker emits counters through a pluggable Daycry\Jobs\Metrics\MetricsCollectorInterface, resolved from Config\Jobs::$metricsCollector:

// Config\Jobs
// InMemoryMetricsCollector (default) is fine for dev; null disables all metrics.
public ?string $metricsCollector = InMemoryMetricsCollector::class;

The interface is small:

interface MetricsCollectorInterface
{
    public function increment(string $counter, int $value = 1, array $labels = []): void;
    public function observe(string $metric, float $value, array $labels = []): void;
    public function getSnapshot(): array;
}

Counters emitted by the worker

Every counter carries a queue label.

| Counter | Incremented when |
| --- | --- |
| `jobs_fetched` | A message was leased from the backend. |
| `jobs_rejected_signature` | A message failed HMAC signature verification (then abandoned). |
| `jobs_skipped_idempotent` | A message was skipped because its idempotency key was already processed. |
| `jobs_succeeded` | A job ran successfully and was acked. |
| `jobs_failed` | A job attempt failed (before deciding requeue vs dead-letter). |
| `jobs_requeued` | A failed job had retries left and was nacked with backoff. |
| `jobs_failed_permanently` | A failed job exhausted its retries and was abandoned. |

Reading metrics

The default InMemoryMetricsCollector aggregates counters/histograms in process memory (with a cardinality cap and FIFO eviction so a long-running worker cannot grow unbounded). Read a snapshot:

use Daycry\Jobs\Metrics\Metrics;

$snapshot = Metrics::get()?->getSnapshot();
// ['counters' => ['jobs_succeeded|queue=reports' => 42, ...], 'histograms' => [...]]
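A snapshot in that shape can be reduced to per-counter totals with a small helper. counterTotal() is a hypothetical function written for this page; it assumes counter keys follow the "name|label=value" pattern seen in the example comment and has not been checked against other label layouts.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: sums one counter across all label sets in a snapshot
 * whose keys look like "name|label=value". Returns 0 when the counter is absent.
 */
function counterTotal(array $snapshot, string $counter): int
{
    $total = 0;
    foreach ($snapshot['counters'] ?? [] as $key => $value) {
        // The counter name is everything before the first "|".
        if (explode('|', (string) $key, 2)[0] === $counter) {
            $total += (int) $value;
        }
    }

    return $total;
}

// Usage with a hand-written snapshot in the documented shape.
$sample = [
    'counters' => [
        'jobs_succeeded|queue=reports' => 42,
        'jobs_succeeded|queue=emails'  => 8,
        'jobs_failed|queue=reports'    => 3,
    ],
    'histograms' => [],
];
$succeeded = counterTotal($sample, 'jobs_succeeded');
$failed    = counterTotal($sample, 'jobs_failed');
```

Remember that these totals cover only the process that produced the snapshot.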

Note: In-memory metrics live only for the lifetime of one worker process and are not scraped across processes. For production monitoring (e.g. Prometheus), implement MetricsCollectorInterface with an exporter that writes to a shared, scrapeable store — for example a Redis/StatsD-backed collector or a Prometheus pushgateway client — and point Config\Jobs::$metricsCollector at it. Set the config to null to disable metrics entirely (all increment/observe calls become no-ops).
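A custom collector can be quite small. The sketch below formats StatsD-style lines and hands them to a writer callable, which in production might send UDP datagrams. Everything here is illustrative: StatsdLineCollector is a hypothetical class, the interface is redeclared locally so the example runs standalone, and the "|#key:value" tag syntax is the DogStatsD extension that not every StatsD server accepts. How the library instantiates the class named in $metricsCollector is not covered on this page, so a constructor argument like the one used here may need a default in practice.

```php
<?php
declare(strict_types=1);

// Local copy of the interface so the sketch runs standalone; a real collector
// would implement Daycry\Jobs\Metrics\MetricsCollectorInterface instead.
interface MetricsCollectorInterface
{
    public function increment(string $counter, int $value = 1, array $labels = []): void;
    public function observe(string $metric, float $value, array $labels = []): void;
    public function getSnapshot(): array;
}

// Hypothetical collector that emits one StatsD-style line per call.
final class StatsdLineCollector implements MetricsCollectorInterface
{
    /** @var callable(string): void receives each formatted line */
    private $writer;

    public function __construct(callable $writer)
    {
        $this->writer = $writer;
    }

    public function increment(string $counter, int $value = 1, array $labels = []): void
    {
        ($this->writer)($counter . ':' . $value . '|c' . $this->tags($labels));
    }

    public function observe(string $metric, float $value, array $labels = []): void
    {
        ($this->writer)($metric . ':' . $value . '|h' . $this->tags($labels));
    }

    public function getSnapshot(): array
    {
        // Nothing is aggregated in-process; the external store holds the data.
        return ['counters' => [], 'histograms' => []];
    }

    private function tags(array $labels): string
    {
        if ($labels === []) {
            return '';
        }
        $pairs = [];
        foreach ($labels as $name => $value) {
            $pairs[] = $name . ':' . $value;
        }

        return '|#' . implode(',', $pairs);
    }
}

// Usage: collect lines in an array instead of sending them over the network.
$lines     = [];
$collector = new StatsdLineCollector(function (string $line) use (&$lines) {
    $lines[] = $line;
});
$collector->increment('jobs_succeeded', 1, ['queue' => 'reports']);
$collector->increment('jobs_failed');
```

Injecting the writer keeps the formatting testable without a network and lets you swap UDP for a buffered or batched transport later.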

In addition to metrics, the worker logs operational events through CodeIgniter's logger: rejected signatures and retry exhaustion are logged at critical, and backend errors surface as CLI error output. Aggregate these logs centrally so you can alert on retry exhaustion and signature rejections, the same events that the jobs_failed_permanently and jobs_rejected_signature counters record.