From Raw CSV to Actionable Insights in Under 60 Seconds
Upload a raw CSV and get actionable insights with charts and narrative in under 60 seconds. Timed walkthrough using the DataStoryBot API.
People say "quick analysis" and mean thirty minutes. Open a notebook, load the file, fix the date column, try three different groupbys, squint at a matplotlib chart, realize the axis is wrong, fix it, write a sentence about what you found, and then do it again for the next question. That is not quick. That is the minimum viable pandas ritual.
This article is a timed walkthrough. We start with a raw e-commerce CSV — no preprocessing, no schema documentation, no idea what is inside — and end with a written narrative, three charts, and a filtered dataset. The clock is real. Each step shows how long it takes.
The Dataset
We are using a fictional but realistic e-commerce orders CSV: orders_2025.csv. It has 8,400 rows and 11 columns — order ID, date, customer segment, product category, region, units, revenue, discount, shipping cost, return flag, and channel. The kind of file that lands in your inbox with the subject line "can you take a look at this."
No cleaning has been done. Dates are strings. There are nulls in the discount column. It is a real-world mess.
The Clock Starts Now
0:00 — Upload the CSV
curl -X POST https://datastory.bot/api/upload \
-F "file=@orders_2025.csv"
Response:
{
"containerId": "ctr_9f8e7d6c5b",
"fileId": "file-abc123",
"metadata": {
"fileName": "orders_2025.csv",
"rowCount": 8400,
"columnCount": 11,
"columns": [
"order_id", "order_date", "customer_segment", "product_category",
"region", "units", "revenue", "discount", "shipping_cost",
"returned", "channel"
]
}
}
The upload endpoint parsed the CSV, spun up an ephemeral Code Interpreter container with GPT-4o, and returned metadata. The container has a 20-minute TTL — more than enough time for what comes next.
Elapsed: ~3 seconds.
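If you are scripting the curl flow rather than copy-pasting IDs, the containerId can be pulled straight out of the response with jq. A minimal sketch, assuming the response was saved to upload.json (the inline JSON below is a stand-in so the snippet runs offline):

```shell
# Stand-in for the real upload response; in practice save it with
#   curl -s -X POST https://datastory.bot/api/upload -F "file=@orders_2025.csv" -o upload.json
cat > upload.json <<'EOF'
{"containerId": "ctr_9f8e7d6c5b", "fileId": "file-abc123"}
EOF

# Extract the containerId for the analyze and refine calls
CONTAINER_ID=$(jq -r '.containerId' upload.json)
echo "$CONTAINER_ID"
```

Every later call takes the same containerId, so capturing it once up front keeps the rest of the session copy-paste free.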
0:03 — Analyze for Story Angles
curl -X POST https://datastory.bot/api/analyze \
-H "Content-Type: application/json" \
-d '{
"containerId": "ctr_9f8e7d6c5b"
}'
Response:
{
"stories": [
{
"title": "Discount-Driven Orders Have 3x the Return Rate",
"summary": "Orders with discounts above 20% show a 31% return rate compared to 9% for full-price orders, suggesting discounts attract low-intent buyers."
},
{
"title": "Mobile Channel Revenue Grew 47% but Margins Collapsed",
"summary": "Mobile orders surged in H2 2025 but carried 2.4x the shipping cost per dollar of revenue, eroding margins despite top-line growth."
},
{
"title": "Enterprise Segment Is Quietly Outperforming Every Metric",
"summary": "Enterprise customers represent 12% of orders but 38% of revenue, with a return rate under 4% and average order value 5x the consumer segment."
}
]
}
This is the step where the AI actually works. Inside the container, Code Interpreter loaded the CSV with pandas, inferred column types, handled the nulls, ran exploratory analysis, and identified three statistically grounded story angles. It wrote and executed Python code to validate each one before returning the summaries.
Elapsed: ~18 seconds. Most of that is compute time inside the container.
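To make that concrete, here is the flavor of exploratory pandas code the model might write and run inside the container. This is illustrative only: the five-row inline sample stands in for the real 8,400-row file, and the actual generated code will differ.

```python
import io
import pandas as pd

# Inline stand-in for orders_2025.csv (real run: pd.read_csv("orders_2025.csv"))
sample = io.StringIO(
    "customer_segment,revenue,returned\n"
    "enterprise,9000,False\n"
    "enterprise,11000,False\n"
    "consumer,1200,True\n"
    "consumer,800,False\n"
    "consumer,1000,False\n"
)
df = pd.read_csv(sample)

# Revenue concentration and return rate per segment: the raw material
# behind a story angle like "enterprise outperforms on every metric"
by_segment = df.groupby("customer_segment").agg(
    revenue_share=("revenue", "sum"),
    return_rate=("returned", "mean"),
)
by_segment["revenue_share"] /= df["revenue"].sum()
print(by_segment)
```

A story angle only makes it into the response after a check like this confirms the numbers hold up.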
0:21 — Pick a Story and Refine
The third angle — enterprise segment outperformance — is the kind of finding that changes resource allocation. Let's refine it.
curl -X POST https://datastory.bot/api/refine \
-H "Content-Type: application/json" \
-d '{
"containerId": "ctr_9f8e7d6c5b",
"selectedStoryTitle": "Enterprise Segment Is Quietly Outperforming Every Metric"
}'
Response:
{
"narrative": "## Enterprise Segment Is Quietly Outperforming Every Metric\n\nThe enterprise customer segment represents just 12% of total orders in the 2025 dataset, but accounts for 38% of revenue — a concentration ratio that has increased every quarter...",
"charts": [
{
"fileId": "file-chart001",
"caption": "Revenue share by customer segment, showing enterprise dominance despite low order volume"
},
{
"fileId": "file-chart002",
"caption": "Return rate by segment and discount tier"
},
{
"fileId": "file-chart003",
"caption": "Average order value trend by segment, Q1-Q4 2025"
}
],
"resultDataset": {
"fileId": "file-ds001",
"caption": "Filtered dataset: enterprise segment orders with computed metrics"
}
}
The refine step generated a multi-paragraph narrative with specific numbers, three publication-quality charts (dark theme, 150 DPI, transparent PNG), and a filtered CSV with just the enterprise rows plus computed columns like margin and lifetime value.
Elapsed: ~38 seconds total.
0:38 — Download Everything
# Charts
curl -o segment_revenue.png \
"https://datastory.bot/api/files/ctr_9f8e7d6c5b/file-chart001"
curl -o return_rates.png \
"https://datastory.bot/api/files/ctr_9f8e7d6c5b/file-chart002"
curl -o aov_trend.png \
"https://datastory.bot/api/files/ctr_9f8e7d6c5b/file-chart003"
# Filtered dataset
curl -o enterprise_orders.csv \
"https://datastory.bot/api/files/ctr_9f8e7d6c5b/file-ds001"
Elapsed: ~42 seconds. Done.
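If you are scripting the downloads, the chart fileIds can be looped out of the refine response with jq instead of typed by hand. A sketch, assuming that response was saved to refine.json (the inline JSON is a stand-in, and the curl line is commented out so the loop runs offline):

```shell
# Stand-in for the real refine response; in practice save it with -o refine.json
cat > refine.json <<'EOF'
{"charts":[{"fileId":"file-chart001"},{"fileId":"file-chart002"},{"fileId":"file-chart003"}]}
EOF

for id in $(jq -r '.charts[].fileId' refine.json); do
  # curl -o "$id.png" "https://datastory.bot/api/files/ctr_9f8e7d6c5b/$id"
  echo "queued $id"
done
```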
The Same Thing in Python
Here is the complete script, end to end:
import requests
import time
BASE_URL = "https://datastory.bot"
start = time.time()
# Upload
with open("orders_2025.csv", "rb") as f:
upload = requests.post(
f"{BASE_URL}/api/upload",
files={"file": ("orders_2025.csv", f, "text/csv")}
).json()
container_id = upload["containerId"]
print(f"[{time.time()-start:.1f}s] Uploaded: {upload['metadata']['rowCount']} rows")
# Analyze
stories = requests.post(
f"{BASE_URL}/api/analyze",
json={"containerId": container_id}
).json()
print(f"[{time.time()-start:.1f}s] Found {len(stories['stories'])} story angles:")
for s in stories["stories"]:
print(f" - {s['title']}")
# Refine the most interesting story
selected = stories["stories"][2]["title"]
result = requests.post(
f"{BASE_URL}/api/refine",
json={
"containerId": container_id,
"selectedStoryTitle": selected
}
).json()
print(f"[{time.time()-start:.1f}s] Narrative: {len(result['narrative'])} chars")
# Download charts
for chart in result["charts"]:
resp = requests.get(
f"{BASE_URL}/api/files/{container_id}/{chart['fileId']}"
)
filename = f"chart_{chart['fileId']}.png"
with open(filename, "wb") as f:
f.write(resp.content)
print(f"[{time.time()-start:.1f}s] Saved {filename}")
# Download filtered dataset
ds = result["resultDataset"]
resp = requests.get(f"{BASE_URL}/api/files/{container_id}/{ds['fileId']}")
with open("enterprise_filtered.csv", "wb") as f:
f.write(resp.content)
print(f"[{time.time()-start:.1f}s] Complete.")
Typical output:
[2.8s] Uploaded: 8400 rows
[19.4s] Found 3 story angles:
- Discount-Driven Orders Have 3x the Return Rate
- Mobile Channel Revenue Grew 47% but Margins Collapsed
- Enterprise Segment Is Quietly Outperforming Every Metric
[41.2s] Narrative: 1847 chars
[42.1s] Saved chart_file-chart001.png
[42.4s] Saved chart_file-chart002.png
[42.7s] Saved chart_file-chart003.png
[43.0s] Complete.
43 seconds from raw file to finished analysis. No pandas import. No matplotlib config. No cleaning code.
Where the Time Actually Goes
The breakdown matters because it tells you what to expect:
| Step | Time | What happens |
|---|---|---|
| Upload | ~3s | File transfer, CSV parsing, container creation |
| Analyze | ~15s | AI reads data, runs exploratory code, identifies patterns |
| Refine | ~20s | AI generates narrative, creates charts, filters dataset |
| Download | ~4s | File transfer for PNGs and CSV |
The analyze and refine steps are where the real work happens. The AI is writing and executing Python code inside the container — computing group-bys, running statistical tests, generating matplotlib figures. Fifteen to twenty seconds per step is fast for what it is actually doing.
Steering for Faster Iteration
If you already know what you are looking for, you can skip the guessing and steer the analysis:
curl -X POST https://datastory.bot/api/analyze \
-H "Content-Type: application/json" \
-d '{
"containerId": "ctr_9f8e7d6c5b",
"steeringPrompt": "Focus on return rates and their relationship to discounting"
}'
This does not force the output. It tells the AI to weight certain patterns more heavily during exploration. The three stories will still be data-driven, but they will cluster around your area of interest. This is useful when you are running the same type of analysis repeatedly — weekly reports, monthly reviews — and want consistent themes.
What Makes This Different from ChatGPT
You could paste a CSV into ChatGPT and ask for analysis. Here is why the API approach is better for anything beyond one-off exploration:
Reproducibility. The same CSV with the same parameters produces the same story structure. You can version-control the API calls. You can diff the outputs week over week.
Automation. Three curl commands or ten lines of Python. No copy-pasting into a chat window. No manually downloading images. No parsing prose for the numbers you need. Schedule it with cron and walk away.
Integration. The output is structured JSON. Charts are downloadable PNGs. The narrative is Markdown. You can pipe this into Slack, email, dashboards, or your own application without any format conversion.
Ephemeral containers. Your data lives in a sandboxed environment for 20 minutes and then it is gone. No data lingering in conversation histories. No data retention policies to worry about.
What the Output Actually Looks Like
The narrative that comes back from the refine step is not a vague summary. It reads like something a data analyst would write in a Slack message to their team: specific numbers, comparisons to baselines, and a clear recommendation at the end. Typical length is 300-500 words of Markdown with headers, bold text, and inline statistics.
The charts are 150 DPI PNGs with a dark theme — #141414 background, light text, transparent export. They are designed to drop into dark-mode dashboards, slide decks, or email templates without post-processing. If you need a different style, pass a refinementPrompt like "Use a light theme suitable for print".
The filtered dataset is a CSV containing only the rows the AI used to support its analysis. This is the audit trail: it lets you verify every claim in the narrative against the actual data.
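That audit can be a few lines of standard-library Python. A sketch, assuming the filtered CSV keeps the original returned column; the inline sample stands in for the downloaded enterprise_orders.csv, and no pandas is required:

```python
import csv
import io

# Stand-in for enterprise_orders.csv; in practice use
#   open("enterprise_orders.csv", newline="") instead of StringIO.
sample = io.StringIO(
    "order_id,revenue,returned\n"
    "1001,5200,False\n"
    "1002,4800,False\n"
    "1003,6100,False\n"
    "1004,5900,True\n"
)
rows = list(csv.DictReader(sample))

# Recompute a narrative claim (e.g. "return rate under 4%") from the raw rows
return_rate = sum(r["returned"] == "True" for r in rows) / len(rows)
print(f"Return rate in filtered dataset: {return_rate:.1%}")
```

If a number in the narrative does not reproduce from the filtered rows, you catch it before it reaches a stakeholder.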
Try It
The DataStoryBot playground lets you run this entire flow in a browser — no code required. Upload a CSV and watch the clock yourself.
For a deeper look at what the analysis step actually does under the hood, read How to Analyze a CSV File Automatically. And if you want to customize the chart output, see How to Generate Charts from CSV Data Automatically for chart type selection, styling, and embedding patterns.
Raw CSV in, actionable insights out, under a minute. That is the bar now.
Ready to find your data story?
Upload a CSV and DataStoryBot will uncover the narrative in seconds.
Try DataStoryBot →