general · 8 min read

Code Interpreter vs. Function Calling for Data Tasks

When to use Code Interpreter for exploratory analysis and chart generation vs. function calling for structured queries and known schemas. A practical comparison.

By DataStoryBot Team

OpenAI provides two tools for getting LLMs to work with data: Code Interpreter (run Python in a sandbox) and Function Calling (call your own functions with structured parameters). They solve different problems, and choosing the wrong one wastes time and money.

The short version: Code Interpreter is for exploratory analysis where you don't know what you'll find. Function Calling is for structured queries where you know exactly what you want. DataStoryBot uses Code Interpreter because data analysis is inherently exploratory — the whole point is discovering patterns you didn't know to ask about.

How Each One Works

Code Interpreter

The LLM writes Python code. The code runs in an isolated container. The results come back.

You: "Analyze this sales CSV and find the most interesting patterns"
LLM writes: pandas read → groupby → statistical tests → matplotlib charts
Container executes: Python code runs against the actual data
You get: text output + chart files + filtered datasets

The LLM has full programmatic access to the data. It can inspect column types, compute statistics, iterate on its analysis, handle edge cases, and generate visualizations — all through code.
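The kind of code the model writes in that sandbox can be sketched locally. A minimal example, with a made-up two-column dataset standing in for an uploaded CSV:

```python
import io

import pandas as pd

# Hypothetical sales data standing in for an uploaded CSV.
csv = io.StringIO(
    "region,revenue\n"
    "West,120\nWest,180\nEast,90\nEast,110\n"
)

df = pd.read_csv(csv)

# Step 1: inspect column types before deciding on an analysis.
print(df.dtypes)

# Step 2: aggregate — total revenue per region.
totals = df.groupby("region")["revenue"].sum()
print(totals)
```

In the real tool this inspect-then-aggregate loop runs inside OpenAI's container against the actual file; the point is that the model chooses the steps by reading the data, not from a predefined schema.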

Function Calling

The LLM decides which of your pre-defined functions to call, with what parameters. Your code runs the function and returns the result.

You: "What was total revenue in the West region last quarter?"
LLM decides: call get_revenue(region="West", period="Q1-2026")
Your code runs: SQL query against your database
You get: {"total_revenue": 2847391, "currency": "USD"}

The LLM doesn't touch the data directly. It's a structured interface layer — translating natural language into function calls that your system executes.
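A minimal sketch of that interface layer, assuming a hypothetical `get_revenue` function and a tool call shaped like the model's output (a name plus JSON-encoded arguments):

```python
import json

# Illustrative stand-in — in a real system this would run a SQL query.
def get_revenue(region: str, period: str) -> dict:
    fake_db = {("West", "Q1-2026"): 2847391}
    return {"total_revenue": fake_db[(region, period)], "currency": "USD"}

TOOLS = {"get_revenue": get_revenue}

def dispatch(tool_call: dict) -> dict:
    """Route a model-emitted tool call to the matching local function."""
    fn = TOOLS[tool_call["name"]]
    args = json.loads(tool_call["arguments"])
    return fn(**args)

result = dispatch({
    "name": "get_revenue",
    "arguments": '{"region": "West", "period": "Q1-2026"}',
})
print(result)  # {'total_revenue': 2847391, 'currency': 'USD'}
```

The model never sees `fake_db`; it only sees the function signature and the returned result.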

The Trade-Off Matrix

| Dimension    | Code Interpreter                | Function Calling                    |
|--------------|---------------------------------|-------------------------------------|
| Data access  | Full — reads the actual file    | Indirect — calls your functions     |
| Flexibility  | Unlimited — any Python code     | Constrained — only defined functions |
| Determinism  | Low — different code each run   | High — same function, same result   |
| Latency      | 10-120s (container + execution) | < 1s (function call)                |
| Cost         | Higher (compute + tokens)       | Lower (tokens only)                 |
| Security     | Sandboxed container             | Your security boundary              |
| Output types | Text + files + charts           | Structured JSON                     |
| Best for     | Exploration, visualization      | Structured queries, known schemas   |

When to Use Code Interpreter

Exploratory analysis. You have a CSV and you want to know what's interesting. Code Interpreter can inspect the data, try different statistical approaches, and surface patterns. You couldn't define functions for "find something interesting" because you don't know what you're looking for.

Chart generation. Code Interpreter runs matplotlib, seaborn, and other visualization libraries. Function Calling returns data; you'd still need a frontend to render charts.

Complex computations. Multi-step analysis — correlation matrices, time-series decomposition, regression — requires iterative computation. Code Interpreter handles this in a single session.

Unknown schemas. When you don't know the column names, types, or data shape in advance, Code Interpreter can inspect and adapt. Function Calling requires you to define the schema upfront.

DataStoryBot's use case. Users upload arbitrary CSVs. The schema is unknown. The analysis is exploratory. Charts are required. Code Interpreter is the only viable option.

When to Use Function Calling

Known queries on known schemas. "What was revenue last quarter?" against a well-defined database. You know the schema, you know the query pattern, you just need the LLM to extract parameters from natural language.

Real-time responses. Function calls return in milliseconds (your code's execution time). Code Interpreter takes seconds to minutes. For chatbot-style interfaces where users expect instant answers, Function Calling wins.

Deterministic results. The same function with the same parameters returns the same result every time. Code Interpreter might write slightly different code on each run, producing slightly different results.

Multi-system orchestration. When the LLM needs to coordinate across multiple APIs — check inventory, place order, send confirmation — Function Calling provides the structured interface. Code Interpreter is a single Python environment, not an API orchestrator.

Sensitive data. Function Calling keeps data in your system. The LLM sees the function signature and result, not the underlying data. Code Interpreter requires uploading the actual data to OpenAI's containers.

A Practical Example

Imagine a product analytics question: "How is feature adoption trending for enterprise customers?"

With Function Calling

tools = [{
    "type": "function",
    "function": {
        "name": "get_feature_adoption",
        "description": "Feature adoption metrics for a customer segment over a period",
        "parameters": {
            "type": "object",
            "properties": {
                "segment": {"type": "string", "enum": ["enterprise", "smb", "free"]},
                "metric": {"type": "string", "enum": ["dau", "wau", "mau", "adoption_rate"]},
                "period": {"type": "string", "description": "e.g. last_30_days"}
            },
            "required": ["segment", "metric", "period"]
        }
    }
}]

# LLM calls: get_feature_adoption(segment="enterprise", metric="adoption_rate", period="last_90_days")
# Your code: runs a SQL query, returns {"adoption_rate": [0.23, 0.28, 0.31, ...], "dates": [...]}
# LLM: "Enterprise feature adoption has grown from 23% to 31% over the last 90 days."

Fast, deterministic, secure. But you had to define the function, the allowed parameters, and the underlying query. If the user asks a question that doesn't fit your predefined functions, you're stuck.

With Code Interpreter

# Upload the feature usage CSV
# Steer: "Analyze feature adoption trends for enterprise customers"

# Code Interpreter writes:
# - reads CSV
# - filters to enterprise segment
# - computes adoption rates by week
# - runs a trend test
# - generates a line chart
# - checks if adoption correlates with other variables
# - produces a narrative

# You get: narrative + charts + statistical analysis + unexpected findings

Slower, more expensive, non-deterministic. But it answers questions you didn't anticipate, produces visualizations, and discovers patterns your predefined functions couldn't surface.

The Hybrid Pattern

Many production systems use both:

User question
    ↓
Is this a known query type? ─── Yes → Function Calling (fast, cheap)
    ↓ No
Is this exploratory? ─── Yes → Code Interpreter (flexible, visual)
    ↓ No
Fall back to → LLM response without tools

Route structured questions to Function Calling: "What was last month's revenue?" → pre-defined query function.

Route exploratory questions to Code Interpreter: "What's driving the revenue decline?" → Code Interpreter with uploaded data.

Route simple questions to the LLM itself: "What does CAC mean?" → no tools needed.
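The three routes above can be sketched as a toy classifier. The keyword heuristics here are a stand-in for whatever real routing you use (often an LLM-based classifier); the ordering matters, since exploratory questions can also mention metric words like "revenue":

```python
def route(question: str) -> str:
    """Toy router: keyword heuristics stand in for a real classifier."""
    q = question.lower()
    # Check exploratory signals first — "What's driving the revenue
    # decline?" should not fall through to the structured-query route.
    if any(k in q for k in ("why", "driving", "trend", "pattern")):
        return "code_interpreter"   # flexible, visual
    # Known query shapes -> predefined functions.
    if any(k in q for k in ("revenue", "how many", "total")):
        return "function_calling"   # fast, cheap
    # Everything else: plain LLM answer, no tools.
    return "llm_only"

print(route("What was last month's revenue?"))       # function_calling
print(route("What's driving the revenue decline?"))  # code_interpreter
print(route("What does CAC mean?"))                  # llm_only
```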

DataStoryBot is a Code Interpreter-first system because its users bring unknown data and ask exploratory questions. But if you're building a product analytics chatbot on top of a well-defined data model, Function Calling handles 80% of questions faster and cheaper.

Cost Comparison

Code Interpreter costs more per query because it includes:

  • Container creation and compute time
  • Token usage for the code generation prompt and output
  • File storage during the container's lifetime

Function Calling costs only:

  • Token usage for the function schema, parameters, and result
  • Your own compute costs for executing the function

For a typical data analysis query:

  • Code Interpreter: $0.05-0.20 (depending on model, data size, analysis complexity)
  • Function Calling: $0.01-0.03 (depending on model and response size)

At scale, this matters. If you're running 10,000 queries per day, Code Interpreter at $0.10 average is $1,000/day. Function Calling at $0.02 average is $200/day.
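That arithmetic, keeping per-query cost in integer cents to avoid float drift:

```python
def daily_cost_usd(queries_per_day: int, avg_cost_cents: int) -> float:
    """Daily spend in dollars, given an average per-query cost in cents."""
    return queries_per_day * avg_cost_cents / 100

# Figures from the text: 10,000 queries/day.
print(daily_cost_usd(10_000, 10))  # Code Interpreter at $0.10 avg -> 1000.0
print(daily_cost_usd(10_000, 2))   # Function Calling at $0.02 avg -> 200.0
```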

But the comparison isn't fair — they produce different outputs. Code Interpreter gives you charts, narratives, and statistical analysis. Function Calling gives you a structured answer. The right cost comparison is: "Does the richer output justify 5x the cost?"

What DataStoryBot Chose and Why

DataStoryBot uses Code Interpreter exclusively because:

  1. Unknown schemas. Users upload arbitrary CSVs. Column names, types, and data shapes are unpredictable. Function Calling would require defining functions for every possible CSV structure — impossible.

  2. Chart generation. Charts are a core output. Code Interpreter generates them natively. Function Calling would require a separate visualization step.

  3. Iterative analysis. Good data analysis is iterative — inspect the data, form a hypothesis, test it, refine. Code Interpreter does this in a single session. Function Calling is request-response, not iterative.

  4. Narrative generation. The narrative is based on what the Code Interpreter discovered, not what was pre-queried. The story angles emerge from the analysis, not from predefined queries.

What to Read Next

For the Code Interpreter architecture that DataStoryBot builds on, see OpenAI Code Interpreter for data analysis: a complete guide.

For the container management layer underneath, read how to use the OpenAI Containers API.

For the DataStoryBot API that wraps Code Interpreter into a higher-level interface, start with getting started with the DataStoryBot API.

Ready to find your data story?

Upload a CSV and DataStoryBot will uncover the narrative in seconds.

Try DataStoryBot →