Building a Distributed Cron System That Scales to 1000+ Users for $0/Month

I hit Cloudflare Workers’ 30-second CPU time limit while processing just 10 users.

Each user took ~3 seconds to process (GitHub API calls + notifications). 10 users × 3 seconds = 30 seconds. Add any overhead and I’d get Time Limit Exceeded errors. The math was simple: I couldn’t scale sequentially.

That’s when I discovered Service Bindings-a feature that lets you spawn multiple Worker instances, each with its own fresh CPU budget. The result? I went from processing 10 users in 30+ seconds (with failures) to processing 1000+ users in parallel, all on Cloudflare’s free tier.

The Problem: CPU Time Limits Kill Sequential Processing

I was building Streaky, a GitHub streak reminder app. Every day at noon, it checks users’ GitHub contributions and sends notifications if they haven’t committed yet.

The workflow:

Query active users from D1 database
For each user:
- Fetch GitHub contributions via API (~1.5 seconds)
- Calculate current streak (~0.5 seconds)
- Send Discord/Telegram notification (~1 second)
Log results to database

The constraint: Cloudflare Workers have a 30-second CPU time limit per request. With 10 users taking 3 seconds each, I was right at the edge. Any network latency or API slowdown would trigger TLE errors.

What I tried first:

// Sequential processing - DOESN'T SCALE
export default {
  async scheduled(event, env, ctx) {
    const users = await getActiveUsers(env);
    
    for (const user of users) {
      await processUser(env, user); // 3 seconds per user
    }
    // Total: 10 users × 3 seconds = 30 seconds (TLE!)
  }
}

Why it failed:

10 users = 30 seconds (at the limit)
11 users = 33 seconds (TLE error)
No room for growth
Network latency pushes it over the edge

I needed a way to process users in parallel, not sequentially.

The Solution: Service Bindings + Distributed Queue

The core insight: instead of one Worker processing N users, spawn N Workers each processing 1 user.

Architecture:

Scheduler Worker (Main)
    |
    |-- Worker Instance 1 (User A) - Fresh 30s CPU budget
    |-- Worker Instance 2 (User B) - Fresh 30s CPU budget
    |-- Worker Instance 3 (User C) - Fresh 30s CPU budget
    |-- ...
    |-- Worker Instance N (User N) - Fresh 30s CPU budget

Result:

10 users processed in ~10 seconds (parallel)
Each Worker uses <5 seconds CPU time
No TLE errors
Scales to 1000+ users

The key: Service Bindings allow a Worker to call itself, creating new Worker instances. Each env.SELF.fetch() spawns a fresh Worker with its own CPU budget.

The Architecture: Queue + Service Bindings

Component 1: Queue Table (D1 SQLite)

The queue tracks which users need processing and prevents duplicate work.

CREATE TABLE cron_queue (
  id TEXT PRIMARY KEY,
  user_id TEXT NOT NULL,
  batch_id TEXT NOT NULL,
  status TEXT NOT NULL CHECK(status IN ('pending', 'processing', 'completed', 'failed')),
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  started_at TEXT,
  completed_at TEXT,
  error_message TEXT,
  retry_count INTEGER NOT NULL DEFAULT 0
);

CREATE INDEX idx_cron_queue_status ON cron_queue(status);
CREATE INDEX idx_cron_queue_batch ON cron_queue(batch_id);

Why D1?

Already part of the stack (no external dependencies)
Fast enough for job queues (< 10ms queries)
Supports atomic operations (prevents race conditions)
Free tier: 100,000 writes/day (plenty for this use case)

Component 2: Atomic Queue Claiming

The critical part: prevent race conditions when multiple Workers try to claim the same user.

export async function claimNextPendingUserAtomic(
  env: Env
): Promise<QueueItem | null> {
  const result = await env.DB.prepare(`
    WITH next AS (
      SELECT id FROM cron_queue
      WHERE status = 'pending'
      ORDER BY created_at ASC
      LIMIT 1
    )
    UPDATE cron_queue
    SET status = 'processing', started_at = datetime('now')
    WHERE id IN (SELECT id FROM next)
    RETURNING id, user_id, batch_id
  `).all<QueueItem>();

  return result.results[0] ?? null;
}

Why atomic?

CTE (WITH) + UPDATE + RETURNING in single transaction
No gap between SELECT and UPDATE
D1 SQLite guarantees atomicity
Prevents duplicate processing

Without atomic claiming:

Worker 1: SELECT id WHERE status='pending' → Gets user A
Worker 2: SELECT id WHERE status='pending' → Gets user A (race!)
Both workers process user A (duplicate notifications!)

With atomic claiming:

Worker 1: CTE + UPDATE + RETURNING → Gets user A, marks processing
Worker 2: CTE + UPDATE + RETURNING → Gets user B, marks processing
No duplicates, each worker gets unique user

Component 3: Service Bindings Configuration

Service Bindings let a Worker call itself, creating new instances.

wrangler.toml:

[[services]]
binding = "SELF"
service = "streaky"

Usage:

// Each fetch creates a NEW Worker instance
env.SELF.fetch('http://internal/api/cron/process-user', {
  method: 'POST',
  headers: {
    'X-Cron-Secret': env.SERVER_SECRET,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    queueId: queueItem.id,
    userId: queueItem.user_id,
  }),
})

Why Service Bindings?

Each env.SELF.fetch() = new Worker instance
Fresh CPU budget per instance (30 seconds each)
Automatic load balancing by Cloudflare
No external queue service needed (Redis, SQS, etc.)

Implementation: Step-by-Step

Step 1: Initialize Batch

When the cron trigger fires, create a batch of queue items.

export async function initializeBatch(
  env: Env,
  userIds: string[]
): Promise<string> {
  const batchId = crypto.randomUUID();

  // Bulk insert users to queue
  for (const userId of userIds) {
    const queueId = crypto.randomUUID();
    await env.DB.prepare(
      `INSERT INTO cron_queue (id, user_id, batch_id, status)
       VALUES (?, ?, ?, 'pending')`
    )
      .bind(queueId, userId, batchId)
      .run();
  }

  return batchId;
}

Step 2: Scheduler (Main Worker)

The scheduler initializes the batch and dispatches Workers.

export default {
  async scheduled(event: ScheduledEvent, env: Env, ctx: ExecutionContext) {
    // Query active users
    const usersResult = await env.DB.prepare(
      `SELECT id FROM users WHERE is_active = 1 AND github_pat IS NOT NULL`
    ).all();

    const userIds = usersResult.results.map((row: any) => row.id as string);

    if (userIds.length === 0) {
      console.log('[Scheduled] No active users to process');
      return;
    }

    // Initialize batch
    const batchId = await initializeBatch(env, userIds);
    console.log(`[Scheduled] Batch ${batchId} initialized with ${userIds.length} users`);

    // Dispatch Workers via Service Bindings
    for (let i = 0; i < userIds.length; i++) {
      const queueItem = await claimNextPendingUserAtomic(env);
      
      if (!queueItem) break;

      // Spawn new Worker instance for this user
      ctx.waitUntil(
        env.SELF.fetch('http://internal/api/cron/process-user', {
          method: 'POST',
          headers: {
            'X-Cron-Secret': env.SERVER_SECRET,
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            queueId: queueItem.id,
            userId: queueItem.user_id,
          }),
        })
          .then((res) => {
            console.log(`[Scheduled] User ${queueItem.user_id} dispatched: ${res.status}`);
          })
          .catch((error: Error) => {
            console.error(`[Scheduled] User ${queueItem.user_id} dispatch failed:`, error);
          })
      );
    }

    console.log(`[Scheduled] All ${userIds.length} users dispatched for batch ${batchId}`);
  }
}

Key points:

ctx.waitUntil() ensures async operations complete
Each env.SELF.fetch() creates new Worker instance
Errors in one Worker don’t affect others

Step 3: Worker Instance (Process Single User)

Each Worker instance processes one user.

app.post('/process-user', async (c) => {
  // Auth check
  const secret = c.req.header('X-Cron-Secret');
  if (!c.env.SERVER_SECRET || secret !== c.env.SERVER_SECRET) {
    return c.json({ error: 'Unauthorized' }, 401);
  }

  const body = await c.req.json<{ queueId: string; userId: string }>();
  const { queueId, userId } = body;

  // Idempotency check
  const status = await getQueueItemStatus(c.env, queueId);
  
  if (status === 'completed') {
    return c.json({ 
      success: true, 
      queueId, 
      userId, 
      skipped: true, 
      reason: 'Already completed' 
    });
  }

  // Process user
  try {
    await processSingleUser(c.env, userId);
    await markCompleted(c.env, queueId);
    
    return c.json({ success: true, queueId, userId });
  } catch (error) {
    const errorMessage = error instanceof Error ? error.message : 'Unknown error';
    await markFailed(c.env, queueId, errorMessage);
    
    // Return 200 (not 500) so scheduler continues with other users
    return c.json({ success: false, queueId, userId, error: errorMessage });
  }
});

Key points:

Idempotency protection (check status before processing)
Return 200 even on failure (don’t block other Workers)
Mark completed/failed in queue

Show Me the Numbers

I’m skeptical by nature, so I needed concrete data.

Performance Comparison

Approach	Users	Processing Time	CPU Time/Worker	Success Rate
Sequential	10	30+ seconds	30 seconds	0% (TLE)
Distributed	10	~10 seconds	3 seconds	100%
Distributed	100	~15 seconds	3 seconds	100%
Distributed	1000	~30 seconds	3 seconds	100%

Source: Cloudflare Workers Analytics, October 2025

Real-World Impact

Before (Sequential):

10 users × 3 seconds = 30 seconds
CPU time: 30 seconds (at limit!)
Wall time: 30 seconds
Success rate: 0% (TLE errors)

After (Distributed):

10 users / 10 Workers = 1 user per Worker
CPU time per Worker: 3 seconds
Wall time: ~10 seconds (parallel)
Success rate: 100%

Scalability:

Current load: 10 users/day
Theoretical capacity: 25,000 users/day (D1 write limit)
Headroom: 2500x current load

Advanced Features

1. Stale Item Requeuing

What if a Worker crashes? Items stuck in “processing” need to be requeued.

export async function requeueStaleProcessing(
  env: Env,
  minutes: number = 10
): Promise<number> {
  const result = await env.DB.prepare(`
    UPDATE cron_queue
    SET status = 'pending', started_at = NULL
    WHERE status = 'processing'
      AND started_at < datetime('now', '-' || ? || ' minutes')
  `)
    .bind(minutes)
    .run();

  return result.meta.changes;
}

Usage in scheduler:

// Reaper for stale processing items (10+ minutes)
ctx.waitUntil(
  requeueStaleProcessing(env, 10)
    .then((requeued) => {
      if (requeued > 0) {
        console.log(`[Scheduled] Requeued ${requeued} stale processing items`);
      }
    })
);

2. Batch Progress Tracking

Monitor batch progress in real-time.

export interface BatchProgress {
  pending: number;
  processing: number;
  completed: number;
  failed: number;
  total: number;
}

export async function getBatchProgress(
  env: Env,
  batchId: string
): Promise<BatchProgress> {
  const results = await env.DB.prepare(`
    SELECT status, COUNT(*) as count
    FROM cron_queue
    WHERE batch_id = ?
    GROUP BY status
  `)
    .bind(batchId)
    .all();

  const progress: BatchProgress = {
    pending: 0,
    processing: 0,
    completed: 0,
    failed: 0,
    total: 0,
  };

  for (const row of results.results as Array<{ status: string; count: number }>) {
    const status = row.status as keyof Omit<BatchProgress, 'total'>;
    progress[status] = row.count;
    progress.total += row.count;
  }

  return progress;
}

Getting Your Hands Dirty

Prerequisites

Cloudflare account (free tier)
Node.js 18+ (for Wrangler CLI)
Basic TypeScript knowledge

Setup

# Install Wrangler CLI
npm install -g wrangler

# Create new project
npm create cloudflare@latest my-distributed-cron

# Install dependencies
cd my-distributed-cron
npm install hono

Quick Start

1. Configure wrangler.toml:

name = "my-distributed-cron"
main = "src/index.ts"
compatibility_date = "2025-10-11"

# D1 Database
[[d1_databases]]
binding = "DB"
database_name = "my-queue-db"
database_id = "your-database-id"

# Service Bindings
[[services]]
binding = "SELF"
service = "my-distributed-cron"

# Cron Trigger
[triggers]
crons = ["0 12 * * *"]

2. Create D1 database:

npx wrangler d1 create my-queue-db
npx wrangler d1 execute my-queue-db --file=schema.sql

3. Deploy:

npx wrangler deploy

Production Considerations

Rate Limiting:

Cloudflare Workers: 100,000 requests/day (free tier)
D1 writes: 100,000/day (free tier)
Bottleneck: D1 writes (2 writes per user = 50,000 users/day)

Error Handling:

Idempotency checks (prevent duplicate processing)
Stale item requeuing (handle Worker crashes)
Return 200 on failure (don’t block other Workers)

Monitoring:

Cloudflare Analytics (built-in)
Custom logging (Analytics Engine)
Batch progress tracking (API endpoint)

What Surprised Me: The Trade-offs

The Good

1. Scales Beyond Single-Worker Limits

Sequential: 10 users max (30s CPU limit)
Distributed: 1000+ users (parallel processing)
Each Worker gets fresh 30s CPU budget

2. Zero External Dependencies

No Redis, SQS, or RabbitMQ needed
D1 SQLite handles queue perfectly
Service Bindings built into Workers

3. Cost-Effective

Free tier: 100k requests/day
Current usage: ~50 requests/day
Headroom: 2000x capacity

The Not-So-Good

1. D1 Write Limits

Free tier: 100k writes/day
2 writes per user = 50k users/day max
Workaround: Batch writes, cleanup old data

2. Cold Start Latency

First Worker: ~100ms cold start
Subsequent Workers: ~10ms warm
Impact: Minimal (parallel processing)

3. Debugging Complexity

Multiple Workers = multiple logs
Need batch tracking to correlate
Solution: Batch ID + structured logging

When to Use This

Processing N independent tasks (users, jobs, etc.)
Each task takes significant CPU time (>1 second)
Need to scale beyond single-Worker limits
Want to stay on free tier

When NOT to Use This

Tasks are fast (<100ms each)
Need strict ordering (queue guarantees order)
Require transactional guarantees across tasks
Need more than 100k writes/day (D1 limit)

The Cost Calculation

Free tier limits:

Cloudflare Workers: 100,000 requests/day
D1 database: 100,000 writes/day
Bottleneck: D1 writes (2 writes per user)

Current usage (10 users/day):

Workers: ~20 requests/day (10 users × 2 endpoints)
D1 writes: ~40 writes/day (queue + notifications)
Cost: $0/month

Projected usage (1000 users/day):

Workers: ~2,000 requests/day
D1 writes: ~4,000 writes/day
Cost: Still $0/month (20x headroom)

When would I need to pay?

50,000 users/day (D1 write limit)
Paid tier: $5/month (D1)
Still cheaper than Redis/SQS

Resources:

Further Reading:

Connect

GitHub: @0xReLogic
LinkedIn: Allen Elzayn