Using Steering Prompts to Control Analysis Direction
Deep-dive into the steering prompt parameter: how to focus DataStoryBot's analysis on specific columns, time periods, or business questions — with 10 tested examples.
The steeringPrompt parameter in the /analyze endpoint is the difference between "here's some data, tell me something" and "here's some data, answer this specific question." Both work. One is useful.
When you omit the steering prompt, DataStoryBot's Code Interpreter examines your CSV and surfaces whatever patterns it finds most statistically interesting. That's great for exploration. But when you already know what question you're asking — "why did conversion drop last week?" or "how does enterprise churn compare to SMB?" — the steering prompt focuses the analysis on your question.
This article is a deep-dive into steering prompt mechanics: how they work, how to write them, and tested examples for common analysis scenarios.
Mechanics
The steering prompt is injected into the instruction set that drives Code Interpreter's code generation. Here's what happens:
1. DataStoryBot reads your CSV metadata — column names, types, row count, sample values
2. It constructs a system prompt combining the metadata with analysis instructions
3. Your steering prompt is appended as "the user wants to focus on..."
4. Code Interpreter generates Python code that addresses the combined instruction
5. The code runs and produces findings, charts, and statistics
6. Findings are structured into story angles
The steering prompt influences step 4 — what code gets written. Different steering prompts produce different pandas operations, different statistical tests, different matplotlib charts.
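In practice, passing a steering prompt is one extra field on the request body. A minimal sketch of assembling that body — the field names `containerId` and `steeringPrompt` follow the article's wording, but check the API reference for the exact schema:

```python
import json

def build_analyze_body(container_id, steering_prompt=None):
    """Assemble the JSON body for POST /analyze (field names are illustrative)."""
    body = {"containerId": container_id}
    if steering_prompt:
        # Omitting the field falls back to open-ended exploration.
        body["steeringPrompt"] = steering_prompt
    return body

body = build_analyze_body("cntr_123", "Which regions are underperforming?")
print(json.dumps(body, indent=2))
```

Leaving `steering_prompt` unset produces the exploratory behavior described above; setting it focuses the generated code on your question.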
Anatomy of a Good Steering Prompt
Three components make a steering prompt effective:
1. The Question
What do you want to know? State it directly:
- "Which regions are underperforming?"
- "Is there a seasonal pattern in this data?"
- "What caused the March revenue drop?"
- "How do enterprise and SMB customers differ?"
2. The Focus
Which columns, groups, or time periods should the analysis prioritize?
- "Focus on the `revenue` and `churn_rate` columns"
- "Compare rows where `plan = 'enterprise'` vs. `plan = 'smb'`"
- "Analyze the period from 2026-01-01 to 2026-03-31"
- "Group by the `region` column"
3. The Context
Domain knowledge that helps Code Interpreter interpret the data correctly:
- "We ran a promotion March 10-12 — exclude from anomaly detection"
- "Revenue is in USD. The `amount` column is in cents, not dollars"
- "Null values in the `discount` column mean no discount was applied, not missing data"
- "This is weekly data — each row represents one week"
Tested Examples
Focusing on Specific Columns
# Instead of analyzing everything, focus on what matters
steering = (
    "Focus the analysis on the relationship between "
    "marketing_spend and new_signups. Ignore all other columns "
    "unless they help explain this relationship."
)
Code Interpreter will compute correlation, create a scatter plot, check for lagged effects, and narrate the relationship — instead of producing a general overview of all columns.
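For intuition, the heart of that generated code is usually a correlation computation (in pandas, `df.corr()` or a `scatter` plot). A hand-rolled Pearson sketch with toy numbers standing in for the two columns:

```python
from math import sqrt

# Toy values standing in for the CSV columns named in the prompt.
marketing_spend = [1000, 1500, 1200, 2000, 2500, 1800]
new_signups = [40, 55, 48, 72, 90, 66]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

r = pearson(marketing_spend, new_signups)
print(f"r = {r:.2f}")  # close to 1.0 for this toy data
```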
Filtering to a Time Period
steering = (
    "Analyze only the data from Q1 2026 (January through March). "
    "Compare each month's performance to the prior-year same month "
    "if historical data is available."
)
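The filtering step this prompt requests is a one-liner in generated pandas code; the stdlib equivalent looks like this (toy rows, hypothetical column names):

```python
from datetime import date

rows = [
    {"date": date(2025, 12, 29), "revenue": 900},
    {"date": date(2026, 1, 15), "revenue": 1200},
    {"date": date(2026, 3, 31), "revenue": 1100},
    {"date": date(2026, 4, 2), "revenue": 1300},
]

# Keep only rows inside the window named in the steering prompt.
start, end = date(2026, 1, 1), date(2026, 3, 31)
q1_rows = [r for r in rows if start <= r["date"] <= end]
print(len(q1_rows))  # 2 of the 4 toy rows fall inside Q1 2026
```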
Targeting a Specific Segment
steering = (
    "Filter to enterprise customers only (plan = 'enterprise'). "
    "Within this segment, analyze retention, feature usage, "
    "and expansion revenue. Compare enterprise performance to "
    "the overall company averages."
)
Asking a Root-Cause Question
steering = (
    "Conversion rate dropped from 4.2% to 3.1% between February "
    "and March. Investigate what caused the decline. Break down "
    "conversion by every available dimension — traffic source, "
    "device type, landing page, geographic region — and identify "
    "which factors contributed most to the drop."
)
This is one of the most powerful uses. You state the effect ("conversion dropped") and ask Code Interpreter to decompose it into contributing factors.
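The decomposition this prompt asks for amounts to recomputing conversion per segment in each period and ranking the deltas. A toy sketch over a single dimension (traffic source, made-up counts):

```python
# (visits, conversions) per traffic source -- illustrative numbers only.
february = {"organic": (10_000, 450), "paid": (8_000, 310)}
march = {"organic": (10_500, 440), "paid": (9_000, 200)}

deltas = {}
for source in february:
    v1, c1 = february[source]
    v2, c2 = march[source]
    deltas[source] = c2 / v2 - c1 / v1  # change in conversion rate

# Rank segments by how much they dragged conversion down.
for source, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{source}: {delta:+.2%}")
```

With these toy numbers, the paid channel accounts for most of the drop, which is exactly the kind of attribution the steering prompt is asking Code Interpreter to surface.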
Requesting Specific Statistical Tests
steering = (
    "This is an A/B test. The 'variant' column has values "
    "'control' and 'treatment'. Run a chi-squared test on "
    "conversion rate and a Mann-Whitney U test on revenue "
    "(revenue is right-skewed, so parametric tests won't work). "
    "Report effect sizes, confidence intervals, and p-values."
)
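The chi-squared statistic for a 2×2 conversion table is small enough to write by hand; generated code would typically call `scipy.stats.chi2_contingency`, which also returns the p-value. A stdlib sketch with illustrative counts:

```python
# [converted, not converted] per variant -- toy counts.
table = {
    "control": (120, 4_880),
    "treatment": (150, 4_850),
}

rows = list(table.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(c) for c in zip(*rows)]
grand_total = sum(row_totals)

# chi-squared = sum over cells of (observed - expected)^2 / expected
chi2 = 0.0
for row, row_total in zip(rows, row_totals):
    for observed, col_total in zip(row, col_totals):
        expected = row_total * col_total / grand_total
        chi2 += (observed - expected) ** 2 / expected

print(f"chi-squared = {chi2:.2f}")
```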
Combining Multiple Analyses
steering = (
    "Analyze this quarterly sales data in three ways: "
    "1) Revenue trend over the quarter with growth rate. "
    "2) Regional comparison — which regions are growing fastest? "
    "3) Anomaly detection — any weeks that deviated significantly "
    "from the trend? "
    "Present findings in priority order."
)
Layering three requests in one prompt works when the requests are complementary. Code Interpreter will address all three in the generated code.
Providing Domain Context
steering = (
    "This is SaaS subscription data. Key context: "
    "- Annual plans renew in January, so January always has a revenue spike. "
    "- We raised prices 15% on March 1 for new customers only. "
    "- The 'mrr' column is Monthly Recurring Revenue in USD. "
    "- Churn is calculated as churned_customers / start_of_month_customers. "
    "Analyze MRR trends and churn, accounting for these factors."
)
Domain context prevents misinterpretation. Without the January renewal note, Code Interpreter would flag the January spike as an anomaly or a trend change.
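The effect of that January note can be sketched as an exclusion list consulted before anomalies are flagged. This is a toy leave-one-out z-score check, not DataStoryBot's actual detection logic:

```python
from statistics import mean, stdev

# Toy MRR series with the expected January renewal spike.
mrr = {"Jan": 150, "Feb": 102, "Mar": 104, "Apr": 101, "May": 103}

def flag_anomalies(series, expected_spikes=frozenset()):
    """Flag months far from the rest of the series, skipping known spikes."""
    for month, value in series.items():
        if month in expected_spikes:
            continue  # domain context says this spike is expected
        others = [v for m, v in series.items() if m != month]
        if abs(value - mean(others)) > 3 * stdev(others):
            yield month

print(list(flag_anomalies(mrr, expected_spikes={"Jan"})))  # []
print(list(flag_anomalies(mrr)))  # ['Jan'] -- flagged without the context
```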
Requesting Specific Output Format
steering = (
    "Summarize findings for an executive audience. "
    "Lead with the single most important insight. "
    "Use plain language, no statistical jargon. "
    "Quantify everything. End with a recommended action."
)
The steering prompt can shape the narrative tone and structure, not just the analysis.
Comparing to Benchmarks
steering = (
    "Compare our metrics to these industry benchmarks: "
    "SaaS median churn rate: 5-7% monthly, "
    "median NPS: 30-40, "
    "median CAC payback: 12-18 months. "
    "For each metric in our data, state whether we're "
    "above, below, or in line with the benchmark."
)
Code Interpreter can't look up external benchmarks, but it can compare against values you provide.
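Because the benchmark values travel inside the prompt, the generated comparison reduces to range checks. A sketch using the numbers from the prompt above (our-side metric values are made up):

```python
def place(value, low, high):
    """Classify a metric relative to a benchmark range."""
    if value < low:
        return "below"
    if value > high:
        return "above"
    return "in line with"

benchmarks = {  # (low, high) ranges supplied in the steering prompt
    "monthly_churn_pct": (5, 7),
    "nps": (30, 40),
    "cac_payback_months": (12, 18),
}
ours = {"monthly_churn_pct": 4.2, "nps": 35, "cac_payback_months": 21}

for metric, (low, high) in benchmarks.items():
    verdict = place(ours[metric], low, high)
    print(f"{metric}: {ours[metric]} is {verdict} the {low}-{high} benchmark")
```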
Guiding Chart Generation
steering = (
    "Generate charts with these specifications: "
    "- Use a light background (white, not dark theme) "
    "- Color palette: #2563eb, #f59e0b, #10b981 "
    "- Include data labels on bar charts "
    "- Use 14pt font for chart titles "
    "- Create at least 3 charts covering different aspects"
)
What Steering Prompts Can't Do
Override the data. If your CSV has 10 columns, the steering prompt can't analyze an 11th that doesn't exist. It directs attention within the data, not beyond it.
Guarantee specific findings. "Prove that marketing spend drives revenue" won't make Code Interpreter fabricate a correlation that isn't there. It will look for the correlation and report what it finds.
Control execution time. A complex steering prompt (multi-analysis, many comparisons) takes longer than a simple one. You can't specify "return in under 30 seconds."
Access external data. The container is sandboxed with no network access. The steering prompt can reference external benchmarks (you provide the numbers), but it can't fetch data from URLs or databases.
Iteration Pattern
The best steering prompts come from iteration:
- First run: no steering. See what DataStoryBot finds on its own.
- Second run: focused. Steer toward the most interesting finding from run 1.
- Third run: precise. Add domain context, specific comparisons, or output formatting.
Each run uses the same container (within the 20-minute TTL), so there's no re-upload cost.
# Run 1: General
stories = analyze(container_id, steering=None)
# → Finds "Revenue grew 15% but churn also increased"
# Run 2: Focus on the tension
stories = analyze(
    container_id,
    steering=(
        "Revenue is growing but churn is also increasing. "
        "Investigate whether the new revenue is from lower-quality "
        "customers who churn faster."
    ),
)
# → Finds "Customers from paid channels churn 2.3x faster"
# Run 3: Quantify the impact
stories = analyze(
    container_id,
    steering=(
        "Quantify the impact of paid-channel customer churn on net revenue. "
        "What's the break-even CAC if these customers churn at 2.3x the rate? "
        "Compare the LTV of paid vs. organic customers."
    ),
)
Three analyses, increasing focus, all in the same container. The final analysis answers a specific, actionable question.
What to Read Next
For a pattern library of steering prompts, see prompt engineering for data analysis.
For the API endpoint details, see the DataStoryBot API reference.
For a broader guide to the analysis workflow, start with how to use AI to analyze data.
Or experiment directly — the DataStoryBot playground has a steering prompt input field where you can test these patterns interactively.
Ready to find your data story?
Upload a CSV and DataStoryBot will uncover the narrative in seconds.
Try DataStoryBot →