The Agentic Data Analysis Pattern: Upload, Discover, Narrate
How DataStoryBot uses autonomous code execution, iterative analysis, and structured output to produce insights that beat single-prompt approaches.
Ask ChatGPT to "analyze this CSV" and you will get a response. It will describe the columns, mention the row count, compute a few summary statistics, and probably generate a chart. The answer takes thirty seconds and looks impressive until you look closely.
The problem is not the AI — it is the architecture. A single prompt fires once, gets one chance to look at the data, and must produce its final answer without knowing what it will find. The model cannot discover that the third column has a bimodal distribution and decide to investigate further. It cannot notice that one customer segment drives 80% of variance and pivot its analysis there. It writes one block of code, runs it, and reports whatever comes back.
Agentic analysis is different. The model iterates. It executes code, observes the output, decides what to investigate next, writes more code, and continues until it has something worth saying. The analysis emerges from the data rather than from a template.
This article explains the agentic data analysis pattern precisely: what makes it work, how it differs from single-prompt approaches, and how DataStoryBot implements it.
What "Agentic" Actually Means Here
The word "agentic" gets overloaded. In the context of data analysis, it has a specific meaning: the model is in a feedback loop with the data, not just with the user.
In a conventional LLM workflow, the loop is:
User → Prompt → LLM → Response → User
In an agentic workflow, there is an additional loop the model operates autonomously:
LLM → Write code → Execute code → Observe output → Write more code → ...
The model acts on the world (executes code), observes the results, and uses those results to decide what to do next. The user is not in this inner loop. This is what makes it agentic: autonomous action toward a goal with intermediate feedback.
For data analysis, the "world" is a sandboxed Python environment containing the uploaded file. The model writes pandas and matplotlib code, runs it, reads stdout and generated files, and decides what to do next — all without waiting for human input.
Single-Prompt Approaches and Why They Fall Short
A single-prompt approach stuffs the CSV into the context window and asks the model to analyze it:
import openai  # assumes OPENAI_API_KEY is set in the environment

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"Here is a CSV:\n{csv_content}\n\nAnalyze it."}],
)
This breaks in three ways. First, a 10,000-row CSV does not fit in a context window — you must summarize it before sending, which means you are doing the analysis yourself and asking the model to narrate it. Second, the model cannot compute anything; statistics it generates are pattern-matched, not calculated, and may be fabricated. Third, it cannot discover what is interesting before deciding what to analyze — the analysis direction is locked in before any data is seen.
Chat-based Code Interpreter sessions improve on this: the model actually executes Python, so numbers are real. But the analysis direction is still entirely user-driven. The user must already know what to look for. "Check whether revenue correlates with customer tenure, broken down by acquisition channel" works only because the user already suspected that relationship. The AI is executing human-directed analysis, not discovering patterns autonomously.
The agentic pattern separates what to analyze from how to analyze it. You provide the data and a goal ("find the most interesting patterns"); the model decides which statistical approaches to try, which visualizations illuminate the data, and which findings are worth reporting. This is a genuinely different contract with the AI.
The Agentic Loop in Detail
The core pattern has three phases, but the middle phase contains the actual intelligence:
Upload CSV → Provision container
│
▼
┌─── INNER EXECUTION LOOP ────┐
│ Model writes Python │
│ ↓ │
│ Container executes it │
│ ↓ │
│ stdout + files returned │
│ ↓ │
│ Model reads output │
│ → continue? ───────────────┘ (yes)
│ → done? (produces structured output)
└─────────────────────────────┘
│
▼
Story candidates (JSON)
│
▼
Second prompt: "Develop story N"
│
Inner loop again
│
▼
Narrative + chart PNGs + filtered dataset
The inner execution loop is where the agentic behavior lives. The model is not generating one block of code and stopping — it is writing code, reading the output, and deciding what to do next. A typical analysis session might execute ten to twenty code blocks:
- df.head(), df.dtypes, df.describe() — initial reconnaissance
- Null value inspection, outlier detection
- Distribution analysis for continuous variables
- Correlation matrix
- Group-by aggregations on categorical columns
- A specific follow-up: "that spike in column X in March is interesting — investigate"
- Time series decomposition if temporal data is present
- Chart generation for the most meaningful patterns
- Final synthesis into story candidates
No single prompt produces this depth. It emerges from the model's autonomous decisions about what to investigate next.
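The first few reconnaissance steps in that list are ordinary pandas calls. A minimal sketch, using a tiny in-memory CSV as a stand-in for the uploaded file (the model generates equivalents on the fly inside the container):

```python
import io
import pandas as pd

# Tiny in-memory CSV standing in for the uploaded file
csv_content = "month,segment,revenue\n1,A,100\n2,A,120\n3,A,340\n1,B,90\n"
df = pd.read_csv(io.StringIO(csv_content))

# Initial reconnaissance: shape, types, summary statistics
print(df.head())
print(df.dtypes)
print(df.describe())

# Null inspection and a crude z-score outlier check on revenue
print(df.isna().sum())
z = (df["revenue"] - df["revenue"].mean()) / df["revenue"].std()
print(df[z.abs() > 2])
```

Each of these blocks would be one turn of the inner loop: the model reads the printed output before deciding what to run next.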
How DataStoryBot Implements This
DataStoryBot implements the agentic pattern across three API endpoints that map to the three phases: upload, discover, and narrate.
Phase 1: Upload — Provisioning the Container
The first endpoint, /api/upload, accepts a CSV, creates an ephemeral OpenAI container, and uploads the file to it. The container is an isolated Python sandbox with pandas, numpy, scipy, matplotlib, and seaborn pre-installed. The response returns a containerId — the handle referenced by all subsequent calls. The container persists for 20 minutes from last activity; everything inside it, including generated charts, expires with it.
For container creation and file upload mechanics, see Building a Code Interpreter Workflow with the Responses API.
Phase 2: Analyze — Autonomous Discovery
The second endpoint, /api/analyze, is where the agentic loop runs. It takes the containerId, constructs a discovery prompt, and calls the Responses API with the code_interpreter tool enabled against that container.
The prompt instructs the model to explore the dataset and identify the three most analytically interesting story angles — patterns, anomalies, or relationships that would be worth explaining to a business audience. It explicitly does not prescribe what to look for. The model decides.
Under the hood, the Responses API call streams back a conversation that includes multiple code execution turns. The model's reasoning is interleaved with actual Python execution against the real data. Here is what a simplified version of that API call looks like:
response = openai.responses.create(
    model="gpt-4o",
    tools=[{
        "type": "code_interpreter",
        "container": {"id": container_id},
    }],
    input=[{
        "role": "user",
        "content": DISCOVERY_PROMPT,  # instructs open-ended exploration
    }],
    tool_choice="auto",  # model decides when to execute code
)
The model will call the code_interpreter tool multiple times during a single responses.create call. Each call executes Python in the container and returns stdout. The model reads that stdout, updates its understanding of the data, and either executes more code or transitions to producing its final output.
The output of this phase is a structured JSON object: three story candidates, each with a title, one-sentence summary, and a relevance score the model assigns based on how interesting or actionable the pattern is. This structured output is what makes the result usable programmatically — not prose to parse, but a JSON schema to index into.
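Because the result is a schema rather than prose, the application can validate it before trusting it. A sketch of that check, with hypothetical field names (title, summary, relevanceScore) standing in for DataStoryBot's exact contract:

```python
import json

REQUIRED_KEYS = {"title", "summary", "relevanceScore"}  # illustrative field names


def parse_story_candidates(raw: str) -> list[dict]:
    """Parse the discovery phase's JSON output and verify each story
    candidate carries the fields the UI will index into."""
    stories = json.loads(raw)["stories"]
    for story in stories:
        missing = REQUIRED_KEYS - story.keys()
        if missing:
            raise ValueError(f"story candidate missing fields: {missing}")
    return stories
```

A parse failure here is a signal to re-prompt, not something to surface to the user.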
For the full Responses API mechanics including streaming and tool output handling, see OpenAI Code Interpreter for Data Analysis: A Complete Guide.
Phase 3: Refine — Structured Narrative and Charts
The third endpoint, /api/refine, takes a containerId, a storyIndex (which of the three story angles to develop), and an optional refinementPrompt for steering. It runs a second agentic loop against the same container, this time with a narrower mandate: develop the selected story into a full narrative and generate supporting charts.
The second loop tends to be more focused than the first. The model already explored the dataset during discovery; now it is executing targeted analysis — the specific charts, statistical tests, and comparisons that best support the chosen story angle. Chart generation is explicit: the model writes matplotlib code that saves PNG files to the container filesystem, and those files are later retrieved and returned as downloadable URLs.
The final output structure from /api/refine:
{
  "narrative": "## Revenue Concentration Risk\n\nAcross the 14-month period...",
  "charts": [
    {
      "fileId": "file-abc123",
      "url": "https://datastory.bot/api/files/file-abc123",
      "caption": "Revenue by customer segment, monthly"
    }
  ],
  "filteredDataset": "file-def456",
  "stories": [ ... ]
}
The narrative is Markdown. The charts are real PNGs computed against the actual data. The filtered dataset is a CSV the model generated as a by-product of its analysis — often useful for downstream processing.
Why Iterative Execution Beats One-Shot Analysis
The gap between single-prompt and agentic analysis is structural, not a matter of model capability — GPT-4o is the same model in both cases.
Discovery requires observation. A single prompt forces the model to decide what to analyze before it has any information. The agentic loop lets it look first, then decide.
Errors are recoverable. Code fails. Columns have unexpected types. The agentic loop catches execution errors and retries with corrected code — the model sees the traceback and adjusts. A one-shot code generation attempt that fails simply fails.
Depth follows interest. When the model finds something anomalous — a cluster of outliers, a surprising seasonal pattern, a correlation that disappears in one segment — it can pursue that finding. Single-prompt analysis cannot, because there is no second turn: it is one shot.
Charts reflect actual findings. In single-prompt approaches, visualizations are generated based on what the model expected to find. In the agentic approach, they are generated after seeing the data — they depict real patterns, not anticipated ones.
The Structured Output Requirement
The agentic loop produces a stream of reasoning, code, execution output, and prose. For this to be integrable in an application, it must end in a predictable schema.
DataStoryBot enforces structured output at the boundary of each phase. The discovery loop must return valid JSON matching the story schema. The refine loop must produce narrative, chart references, and dataset reference in a fixed structure. This requires either JSON mode in the Responses API, explicit schema instructions in the prompt, or validate-and-retry logic when output does not parse.
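The validate-and-retry option can be sketched as a small loop. The run_model and validate callables are placeholders for the application's actual LLM call and schema check; only the retry mechanics are shown here:

```python
import json

MAX_ATTEMPTS = 3


def call_with_schema_retry(run_model, validate) -> dict:
    """Re-prompt until the model's output parses and validates.

    run_model(feedback) -> raw string from the LLM (feedback is None on
    the first attempt); validate(obj) raises ValueError on schema
    violations. Both are caller-supplied placeholders.
    """
    feedback = None
    for _ in range(MAX_ATTEMPTS):
        raw = run_model(feedback)
        try:
            obj = json.loads(raw)
            validate(obj)
            return obj
        except (json.JSONDecodeError, ValueError) as err:
            feedback = f"Previous output was invalid ({err}); return valid JSON only."
    raise RuntimeError("model never produced valid structured output")
```

Feeding the parse error back into the retry prompt is what makes the loop converge: the model sees exactly which constraint it violated.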
The agentic loop's flexibility is a feature during analysis and a liability at the output boundary. Enforcing the schema at that boundary is what makes the result writable-to-database, assertable-in-tests, renderable-in-UI. For a broader argument about why this matters, see Why Your AI Data Analysis Needs an API, Not a Chat Window.
Latency Trade-offs
The agentic pattern costs time. Discovery typically takes 20-60 seconds; refinement adds 15-45 more. Total wall time: 35-90 seconds from upload to full narrative with charts.
Single-prompt approaches return in 5-15 seconds. The trade-off is explicit: slower analysis that reflects the actual data versus faster analysis that reflects the model's priors. For interactive use where a user uploads a file and waits, 60 seconds is acceptable when the quality difference is visible. For batch pipelines processing hundreds of files, latency per file matters less than analytical depth. And for truly real-time dashboards expecting sub-second generation, neither approach fits; pre-computed analysis is the right architecture there.
Summary
The agentic data analysis pattern runs an autonomous loop between the model and the data: execute code, observe output, decide what to do next. This is structurally different from single-prompt approaches, which require the model to commit to an analysis before it has seen what is interesting.
DataStoryBot implements this pattern across three API phases: upload provisions the container, analyze runs an open-ended discovery loop that produces structured story candidates, and refine runs a targeted loop that produces a narrative with real charts. Each phase uses the Responses API with the code_interpreter tool and enforces structured output at the boundary.
The result is an analysis that reflects the actual data rather than a template applied to it. That is the core value of the agentic pattern: the analysis emerges from discovery rather than from assumptions made before the data was seen.
Ready to find your data story?
Upload a CSV and DataStoryBot will uncover the narrative in seconds.
Try DataStoryBot →