Webhooks

Webhooks look convenient until you run them in production: they're hard to secure, easy to drop, and painful to observe. Instead of asking you to build fragile receivers, we optimize for predictability with synchronous responses, idempotent writes, and explicit retrieval APIs you can rely on. If demand becomes overwhelming, we may add webhooks, but we'll still recommend building around patterns that don't fail silently when networks do.

The problems with webhooks

Unreliable delivery

Networks fail, DNS lookups time out, TLS certificates expire, and load balancers restart. Webhook delivery is best-effort: if a notification drops and the sender's retries run out, the event is lost unless you build your own recovery path.

Payment flow impact: A customer pays for an order. The payment succeeds, but the payment.succeeded webhook never arrives because your server briefly returned 503 during a deployment. Your UI shows the order stuck in "pending" indefinitely. The customer calls support, confused about why they were charged but have no confirmation. You investigate, find nothing in your logs, and have no proof the webhook was even sent.

Security complexity

Webhooks expose an always-on endpoint that accepts POST requests from the internet. Without proper signature verification, IP allowlists, replay protection, and request logging, attackers can forge payment.succeeded events to unlock goods they never paid for.

Payment flow impact: An attacker discovers your webhook endpoint URL (leaked in a GitHub repo or visible in a browser network tab). They craft a payload claiming order or_789 is paid and POST it to your endpoint. Your handler sees the payload, marks the order complete, and releases the digital goods without any signature check or verification. The real customer never paid; you have lost revenue and inventory.
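
To make the missing check concrete, here is a minimal sketch of signature verification. It assumes a hypothetical provider that signs the raw request body with HMAC-SHA256 using a shared secret and sends the hex digest alongside the request. The function names, the secret, and the signing scheme are illustrative assumptions, not part of this API, and the sketch relies on the openssl command-line tool being installed.

```shell
# Sketch only: the signing scheme (HMAC-SHA256 over the raw body, hex-encoded)
# is an assumption for illustration; real providers document their own.

# Print the hex HMAC-SHA256 of a payload.
# Usage: sign_payload <secret> <payload>
sign_payload() {
  # openssl prints "<label>= <hex>"; keep only the last field (the hex digest).
  printf '%s' "$2" | openssl dgst -sha256 -hmac "$1" | awk '{print $NF}'
}

# Return 0 if the received signature matches the one computed locally.
# Usage: verify_signature <secret> <payload> <received_signature>
# Note: a plain string comparison is not constant-time; a production handler
# should use a constant-time comparison from its language's crypto library.
verify_signature() {
  expected=$(sign_payload "$1" "$2")
  [ -n "$expected" ] && [ "$expected" = "$3" ]
}
```

A handler built this way would call verify_signature on the raw body before parsing it and reject the request on a non-zero exit status. Signature checks alone do not stop replays; that still needs a timestamp or event-ID check on top.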

Out-of-order delivery and duplicates

Webhook systems retry failed deliveries. You'll receive duplicates. Network delays mean events arrive out of sequence—refund.succeeded can arrive before payment.captured.

Payment flow impact: A customer's payment succeeds. The webhook fires. Your server is slow to respond (database lock), so the webhook provider retries immediately. You process the payment.succeeded event twice: the first triggers order fulfillment, the second triggers a duplicate fulfillment email and double-updates your analytics. Meanwhile, on another order, a network hiccup delays the payment.succeeded webhook, so order.completed arrives first. Your state machine breaks; the order shows as complete with no payment record.
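
The receiver-side work this implies can be sketched as follows: record each processed event ID so a retry becomes a no-op, and apply a state change only when it moves the order forward. This is a minimal sketch under stated assumptions: every event is assumed to carry a unique ID, the state names and their order are hypothetical, and the flat file stands in for a database table with a unique constraint.

```shell
# Sketch only: event IDs, state names, and the file-based store are
# illustrative assumptions, not part of this API.
SEEN_FILE=${SEEN_FILE:-./processed_events.txt}

# Process an event at most once. Prints "processed" or "duplicate".
# The check-then-append below is not atomic; a real store needs a unique
# constraint or a lock to stay correct under concurrent deliveries.
process_once() {
  touch "$SEEN_FILE"
  # -x matches the whole line, -F treats the ID as a fixed string, not a regex.
  if grep -qxF -e "$1" "$SEEN_FILE"; then
    echo "duplicate"
    return 0
  fi
  printf '%s\n' "$1" >> "$SEEN_FILE"
  echo "processed"
}

# Rank states so a late event can never move an order backwards.
state_rank() {
  case $1 in
    pending) echo 0 ;;
    paid) echo 1 ;;
    completed) echo 2 ;;
    *) echo -1 ;;
  esac
}

# Print the state to keep: the incoming one only if it ranks higher.
# Usage: next_state <current_state> <incoming_state>
next_state() {
  if [ "$(state_rank "$2")" -gt "$(state_rank "$1")" ]; then
    echo "$2"
  else
    echo "$1"
  fi
}
```

The rank guard only stops stale events from regressing state. It does not repair the case where a later event arrives before its prerequisite; that still requires fetching the authoritative state before acting.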

Debugging is painful

When a webhook doesn't arrive or fails, you need per-attempt traces, payload captures, retry logs, and timestamps. Without a comprehensive delivery dashboard, answering "where's my event?" becomes a multi-hour support ticket involving log archaeology across systems.

Payment flow impact: A merchant reports that 10 orders from yesterday show as "paid" in their dashboard but "pending" in their accounting system. You check your application logs—webhook handler ran successfully for all 10. You check the payment provider's dashboard—it shows "delivered" with 200 responses. Hours later, you discover a silent exception in your ledger update code that only fires when order totals exceed a threshold. The webhook arrived and was acknowledged, but the side effect failed. No alerts fired because the HTTP response was 200.

Customer-facing failures

If your UI or fulfillment logic depends on webhooks, every delivery failure becomes a customer experience failure. Orders appear stuck, payouts never update, verifications time out.

Payment flow impact: Your checkout flow redirects customers to the payment page, then polls an internal status endpoint that updates only when the payment.succeeded webhook arrives. The customer completes payment, is redirected back to your site, and sees a spinner that says "Confirming payment...". The webhook is delayed by 30 seconds due to retry backoff from a previous failure. The customer refreshes, sees the same spinner, and assumes payment failed. They retry payment with a different card, creating a duplicate charge. Support spends the next day issuing refunds and apologizing.

Better patterns you can use today

  1. Rely on synchronous responses for critical steps
    Key flows—order creation, payment initiation, OTP verification—return the authoritative state immediately. Handle the response and move on; don't wait for an async echo.

  2. Fetch state explicitly when you need it
    Every resource is queryable by ID. Instead of waiting for a push, ask for the latest state:

    • Orders: /orders/lookup
    • Payments (via order): /orders/lookup (payment is nested)
    • OTPs: /otp/lookup
    • Chimes: /chimes/lookup

  3. Use idempotency keys and retries
    All write endpoints accept idempotency keys. If you're unsure about prior success, retry with the same key to get the canonical result without duplicating work.

  4. Short polling for status-sensitive UX
    For experiences that need "live" updates (e.g., waiting for a payment to move from requires_action to paid), poll the lookup endpoint for a bounded window. Keep intervals short (e.g., 2-5 seconds) and stop after a timeout; persist the latest state server-side, not in the browser.

  5. Use customer-driven confirmation
    Redirect and client confirmations are explicit: after paying, your client can immediately fetch the order by ID and render the final state. No silent background dependencies.
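
Pattern 3 above can be sketched as a small retry helper. This is a minimal sketch, not the definitive client: the /orders/create path and the Idempotency-Key header name are assumptions made for illustration (this page does not say how the key is passed), and the function names and the RETRY_DELAY variable are hypothetical. The point it illustrates is that every attempt reuses the same key.

```shell
# Sketch only: the /orders/create path and the Idempotency-Key header name are
# assumptions for illustration; check the API reference for the real ones.

# Send one create-order request; prints the response body, fails on HTTP errors.
# Usage: send_create_order <idempotency_key> <json_body>
send_create_order() {
  curl -sS --fail --max-time 10 -X POST https://api.zebo.dev/orders/create \
    -H "Authorization: Bearer $COMMERCE_API_KEY" \
    -H "Content-Type: application/json" \
    -H "Idempotency-Key: $1" \
    -d "$2"
}

# Retry with the SAME key each time so a repeat cannot duplicate the write.
# Usage: create_order_with_retry <idempotency_key> <json_body> [max_attempts]
# Simplification: --fail treats every HTTP error alike; a fuller version would
# retry only on timeouts and 5xx responses.
create_order_with_retry() {
  idem_key=$1
  body=$2
  max_attempts=${3:-3}
  attempt=1
  while :; do
    if send_create_order "$idem_key" "$body"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying with the same key" >&2
    attempt=$((attempt + 1))
    # Pause between attempts (RETRY_DELAY is in seconds).
    sleep "${RETRY_DELAY:-2}"
  done
}
```

Generate the key once per logical operation (for example with uuidgen) and store it next to the pending order, so a retry after a crash or timeout can reuse it instead of minting a new one.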

Example: poll an order until it's paid

Use short polling with idempotent reads. Here's a minimal pattern you can adapt. Keep your intervals modest and bound the total wait time.

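#!/usr/bin/env bash
# Poll the order lookup endpoint until the payment is paid or attempts run out.
order_id="or_123"
status=""
for i in {1..10}; do
  # --fail makes curl exit non-zero on HTTP errors; --max-time bounds each request.
  resp=$(curl -sS --fail --max-time 10 -X POST https://api.zebo.dev/orders/lookup \
    -H "Authorization: Bearer $COMMERCE_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"order_id\":\"$order_id\"}") || resp=""
  # "// empty" yields an empty string instead of the literal "null" if the field is missing.
  status=$(printf '%s' "$resp" | jq -r '.order.payment.status // empty')
  echo "Attempt $i: ${status:-unknown}"
  if [ "$status" = "paid" ]; then break; fi
  sleep 3
done
# Surface a timeout explicitly instead of falling through silently.
if [ "$status" != "paid" ]; then
  echo "Timed out waiting for payment on $order_id" >&2
  exit 1
fi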

If you still want webhooks

If you have a rock-solid case for webhooks (and you're prepared to handle signing, retries, duplicates, ordering, and observability), tell us. We may add a push channel in the future if demand is unmistakable. Even then, we'll keep recommending that you build around synchronous responses, idempotent writes, and explicit fetches—patterns that won't fail silently when networks do.
