Webhooks and Async Patterns for Long-Running Analysis
Patterns for handling DataStoryBot's analysis time: polling, webhooks, and queue-based architectures for production integrations.
DataStoryBot's /analyze endpoint takes 10-120 seconds to return. That's fine for a CLI script. It's a problem for a web application where users are staring at a loading spinner, or for a backend service that's holding open an HTTP connection while 200 other requests queue behind it.
Long-running API calls need async patterns. This article covers three approaches — polling, webhooks, and queue-based architectures — with production code for each.
Why Analysis Takes Time
The timing breakdown for a typical /analyze call:
- Container setup: 1-3 seconds (if a new container is needed)
- Prompt construction: < 1 second
- Code generation: 2-5 seconds (LLM generates the Python analysis code)
- Code execution: 5-60 seconds (pandas operations, statistical tests, chart rendering)
- Result extraction: 1-2 seconds (parsing Code Interpreter output into story angles)
The execution step is the bottleneck. Complex analysis on large datasets — correlation matrices, multi-group comparisons, time-series decomposition — takes longer. Simple analyses on small datasets can complete in 10-15 seconds.
Pattern 1: Blocking Request with SSE Streaming
The simplest approach: the client sends the request, shows a loading state, and waits for the response. The code below starts with a plain blocking fetch, then upgrades to Server-Sent Events so the server can stream status updates instead of holding a silent connection.
Frontend (React)
function AnalysisLoader({ containerId, steering, onComplete }) {
const [status, setStatus] = useState("starting");
const [elapsed, setElapsed] = useState(0);
useEffect(() => {
let cancelled = false;
async function run() {
setStatus("analyzing");
const startTime = Date.now();
try {
// Start analysis (the connection stays open until results arrive)
const res = await fetch("/api/analyze", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ containerId, steering }),
});
if (!res.ok) throw new Error(`Analysis failed: ${res.status}`);
if (!cancelled) {
const data = await res.json();
setStatus("complete");
setElapsed(Math.round((Date.now() - startTime) / 1000));
onComplete(data);
}
} catch (err) {
if (!cancelled) setStatus("error");
}
}
// Update elapsed time every second
const timer = setInterval(() => {
setElapsed((e) => e + 1);
}, 1000);
run();
return () => {
cancelled = true;
clearInterval(timer);
};
}, [containerId, steering]);
return (
<div className="flex items-center gap-3 p-4 rounded bg-blue-50">
<div className="animate-spin h-5 w-5 border-2 border-blue-600 border-t-transparent rounded-full" />
<div>
<div className="font-medium">
{status === "analyzing" && "Analyzing your data..."}
{status === "complete" && "Complete"}
{status === "error" && "Analysis failed — please try again"}
</div>
<div className="text-sm text-gray-500">
{elapsed}s elapsed
{elapsed > 15 && " — complex datasets take up to 2 minutes"}
</div>
</div>
</div>
);
}
Backend (Streaming Response)
Instead of holding the connection open, use Server-Sent Events to stream status updates:
// app/api/analyze-stream/route.ts
import { NextRequest } from "next/server";
export async function POST(request: NextRequest) {
const { containerId, steering } = await request.json();
const encoder = new TextEncoder();
const stream = new ReadableStream({
async start(controller) {
// Send initial status
controller.enqueue(
encoder.encode(`data: {"status": "started"}\n\n`)
);
try {
// Run analysis (this blocks for 10-120s)
const result = await fetch("https://datastory.bot/api/analyze", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
containerId,
steeringPrompt: steering,
}),
});
if (!result.ok) {
throw new Error(`Upstream returned ${result.status}`);
}
const stories = await result.json();
controller.enqueue(
encoder.encode(
`data: {"status": "complete", "stories": ${JSON.stringify(stories)}}\n\n`
)
);
} catch (error) {
controller.enqueue(
encoder.encode(
`data: {"status": "error", "message": "Analysis failed"}\n\n`
)
);
}
controller.close();
},
});
return new Response(stream, {
headers: {
"Content-Type": "text/event-stream",
"Cache-Control": "no-cache",
Connection: "keep-alive",
},
});
}
Frontend SSE Consumer
function startAnalysis(containerId, steering, onStatus) {
return new Promise((resolve, reject) => {
fetch("/api/analyze-stream", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ containerId, steering }),
})
.then(async (res) => {
const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
while (true) {
const { done, value } = await reader.read();
if (done) break;
// SSE events can be split across chunks, so buffer until "\n\n"
buffer += decoder.decode(value, { stream: true });
const events = buffer.split("\n\n");
buffer = events.pop(); // keep any incomplete event for the next chunk
for (const event of events) {
const line = event.split("\n").find((l) => l.startsWith("data: "));
if (!line) continue;
const data = JSON.parse(line.slice(6));
onStatus(data);
if (data.status === "complete") resolve(data.stories);
if (data.status === "error") reject(new Error(data.message));
}
}
reject(new Error("Stream ended without a result"));
})
.catch(reject);
});
}
Pattern 2: Job Queue
For applications that can't hold connections open (serverless functions, API gateways with short timeouts), use a job queue.
Client → POST /api/analyze → Returns job_id immediately
Client → GET /api/jobs/{job_id} → Returns status (pending/running/complete)
Worker → Picks up job → Calls DataStoryBot → Stores result
Client → GET /api/jobs/{job_id} → Returns completed result
Job Submission
// app/api/analyze/route.ts
import { randomUUID } from "crypto";
import { NextRequest, NextResponse } from "next/server";
export async function POST(request: NextRequest) {
const { containerId, steering } = await request.json();
const jobId = randomUUID();
// Store job in database
await db.jobs.create({
id: jobId,
status: "pending",
containerId,
steering,
createdAt: new Date(),
});
// Enqueue for processing
await queue.push({
jobId,
containerId,
steering,
});
return NextResponse.json({ jobId, status: "pending" });
}
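The `db` and `queue` objects above are placeholders for whatever your stack provides (Postgres plus Redis, SQS, BullMQ, and so on). For local development, a minimal in-memory stand-in might look like this — it loses jobs on restart and has no visibility timeout or retries, so treat it as a sketch, not a production queue:

```typescript
// Minimal in-memory FIFO queue — a development stand-in for Redis/SQS/BullMQ.
type AnalysisJob = {
  jobId: string;
  containerId: string;
  steering: string;
};

class MemoryQueue<T> {
  private items: T[] = [];

  push(item: T): void {
    this.items.push(item);
  }

  // Returns the oldest item, or undefined when the queue is empty.
  pop(): T | undefined {
    return this.items.shift();
  }

  get size(): number {
    return this.items.length;
  }
}
```

A worker loop would `pop()` jobs from this queue and hand each one to `processJob`.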
Job Status
// app/api/jobs/[id]/route.ts
import { NextRequest, NextResponse } from "next/server";
export async function GET(
request: NextRequest,
{ params }: { params: { id: string } }
) {
const job = await db.jobs.findById(params.id);
if (!job) {
return NextResponse.json({ error: "Job not found" }, { status: 404 });
}
if (job.status === "complete") {
return NextResponse.json({
status: "complete",
result: job.result,
});
}
return NextResponse.json({
status: job.status,
createdAt: job.createdAt,
});
}
Worker
// worker.ts
async function processJob(job) {
await db.jobs.update(job.jobId, { status: "running" });
try {
const stories = await fetch("https://datastory.bot/api/analyze", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
containerId: job.containerId,
steeringPrompt: job.steering,
}),
}).then((r) => {
if (!r.ok) throw new Error(`Analyze failed: ${r.status}`);
return r.json();
});
// Refine top story
const report = await fetch("https://datastory.bot/api/refine", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
containerId: job.containerId,
selectedStoryTitle: stories[0].title,
}),
}).then((r) => {
if (!r.ok) throw new Error(`Refine failed: ${r.status}`);
return r.json();
});
await db.jobs.update(job.jobId, {
status: "complete",
result: { stories, report },
});
} catch (error) {
await db.jobs.update(job.jobId, {
status: "failed",
error: error.message,
});
}
}
Client Polling
async function waitForResult(jobId, pollInterval = 3000, maxWaitMs = 600000) {
const deadline = Date.now() + maxWaitMs;
while (Date.now() < deadline) {
const res = await fetch(`/api/jobs/${jobId}`);
const data = await res.json();
if (data.status === "complete") return data.result;
if (data.status === "failed") throw new Error(data.error || "Analysis failed");
await new Promise((r) => setTimeout(r, pollInterval));
}
throw new Error(`Job ${jobId} did not finish within ${maxWaitMs / 1000}s`);
}
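A fixed 3-second interval is fine for a single client, but many concurrent pollers hitting the status endpoint benefit from capped exponential backoff with jitter. A sketch — the 2-second base and 15-second cap are illustrative choices, not DataStoryBot recommendations:

```typescript
// Capped exponential backoff with full jitter: the delay grows with each
// attempt up to capMs, and the random spread keeps concurrent pollers from
// synchronizing into thundering herds.
function pollDelay(attempt: number, baseMs = 2000, capMs = 15000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * exp);
}
```

In `waitForResult`, you would replace the fixed `pollInterval` with `pollDelay(attempt++)`.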
Pattern 3: Webhook Callback
For service-to-service integrations where the client doesn't poll:
Client → POST /api/analyze { callbackUrl: "https://your-app.com/webhook" }
Worker → Calls DataStoryBot → POST result to callbackUrl
Submission with Callback
export async function POST(request: NextRequest) {
const { containerId, steering, callbackUrl } = await request.json();
const jobId = randomUUID();
await queue.push({ jobId, containerId, steering, callbackUrl });
return NextResponse.json({ jobId, status: "accepted" });
}
Worker with Callback
async function processJobWithCallback(job) {
try {
const result = await runAnalysis(job.containerId, job.steering);
// Deliver result via webhook — sign the exact serialized body,
// since that's what the receiver verifies
const body = JSON.stringify({
jobId: job.jobId,
status: "complete",
result,
});
await fetch(job.callbackUrl, {
method: "POST",
headers: {
"Content-Type": "application/json",
"X-Job-Id": job.jobId,
"X-Signature": sign(body, webhookSecret),
},
body,
});
} catch (error) {
await fetch(job.callbackUrl, {
method: "POST",
headers: {
"Content-Type": "application/json",
"X-Job-Id": job.jobId,
},
body: JSON.stringify({
jobId: job.jobId,
status: "failed",
error: error.message,
}),
});
}
}
Always sign webhook payloads so the receiver can verify they came from your system.
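One common way to do that is an HMAC-SHA256 over the serialized body. A sketch using Node's `crypto` module — the `sha256=` prefix is a widespread convention, not something DataStoryBot mandates:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sign the raw body string, not the parsed object: the receiver must verify
// the exact bytes it received before JSON-parsing them.
function signPayload(body: string, secret: string): string {
  return "sha256=" + createHmac("sha256", secret).update(body).digest("hex");
}

function verifySignature(body: string, header: string, secret: string): boolean {
  const expected = Buffer.from(signPayload(body, secret));
  const received = Buffer.from(header);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The receiver reads the `X-Signature` header, recomputes the HMAC over the raw request body with the shared secret, and rejects the delivery on mismatch.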
Choosing the Pattern
| Pattern | Best For | Complexity | Latency |
|---|---|---|---|
| SSE streaming | Web apps with real-time UI | Low | Real-time |
| Job queue + polling | Serverless, API gateways | Medium | 3-5s polling delay |
| Webhook callback | Service-to-service | Medium | Real-time |
| Synchronous (no pattern) | Scripts, CLI tools | None | Blocks caller |
For most web applications, SSE streaming is the best choice — it's simple, shows progress, and doesn't require infrastructure for job queues.
For microservice architectures where the analysis consumer is a different service, webhooks are cleaner than polling.
For batch processing (analyzing 100 CSVs overnight), the job queue pattern scales well — just push all jobs onto the queue and let workers process them concurrently.
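That bounded fan-out can be sketched as a small helper: run at most `limit` tasks at once, where each task would be one `processJob` call from the worker above. The helper is generic over any async task:

```typescript
// Bounded-concurrency runner: `limit` workers pull from a shared index so at
// most `limit` tasks are in flight at any moment. Results come back in input
// order regardless of completion order.
async function processAll<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claimed synchronously, so no two runners share an index
      results[i] = await task(items[i]);
    }
  }

  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, runner)
  );
  return results;
}
```

For the overnight-batch case, `processAll(jobs, 5, processJob)` would keep five analyses running until the queue drains.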
Timeout Handling
Regardless of pattern, handle the 300-second analysis timeout:
import asyncio
async def analyze_with_timeout(container_id, steering, timeout=300):
"""Run analysis with explicit timeout."""
try:
result = await asyncio.wait_for(
call_datastorybot_analyze(container_id, steering),
timeout=timeout
)
return result
except asyncio.TimeoutError:
return {
"status": "timeout",
"message": (
"Analysis timed out after 5 minutes. "
"Try with a smaller dataset or simpler steering prompt."
)
}
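The same guard for the TypeScript services earlier in this article can be sketched by racing the call against a timer — `withTimeout` here is a hypothetical helper, not part of any DataStoryBot SDK:

```typescript
// Race a long-running promise against a timeout. The timer is cleared in
// `finally` so a fast resolution doesn't keep the event loop alive.
async function withTimeout<T>(promise: Promise<T>, ms = 300_000): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Analysis timed out after ${ms / 1000}s`)),
      ms
    );
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Usage: `await withTimeout(fetch(analyzeUrl, opts))` rejects after five minutes even if the upstream connection hangs.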
What to Read Next
For the error handling that wraps these async patterns, see error handling and retry patterns for data analysis APIs.
For the API endpoints these patterns call, see the DataStoryBot API reference.
For building a complete user-facing application, read building a self-service reporting portal with DataStoryBot.
Ready to find your data story?
Upload a CSV and DataStoryBot will uncover the narrative in seconds.
Try DataStoryBot →