
Prompt Engineering for Data Analysis: Steering Prompts That Work

How to write effective steering prompts that focus DataStoryBot's analysis on what matters — with 10 tested examples for common data analysis scenarios.

By DataStoryBot Team


DataStoryBot's steering prompt is the single most impactful parameter in the API. Same dataset, different steering prompt, completely different analysis. A general prompt produces general findings. A specific prompt produces targeted insights.

This isn't magic — it's directing the Code Interpreter's attention. When you say "focus on anomalies," the generated Python code uses z-scores and IQR analysis. When you say "compare regions," it groups by the region column and runs significance tests. The prompt determines which code gets written.

This article presents 10 tested steering prompt patterns for common data analysis scenarios, explains why each one works, and shows how to adapt them to your specific needs.

How Steering Prompts Work

When you call /analyze, the DataStoryBot pipeline does this:

  1. Reads the CSV metadata (columns, types, row count, sample values)
  2. Combines the metadata with your steering prompt into an instruction set
  3. Sends that instruction set to Code Interpreter
  4. Code Interpreter writes and executes Python code to analyze the data
  5. The results are structured into story angles

The steering prompt shapes step 2. Without one, the instruction set defaults to "find the most interesting patterns." With one, it becomes "find these specific patterns."
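From the client side, the flow above can be sketched as a payload that pairs an uploaded file with a steering prompt. The field names, file ID, and endpoint shape here are illustrative assumptions, not the documented API surface:

```python
# Hypothetical client-side payload: "file_id" and "steering" are
# assumed field names, not confirmed against the API reference.
def build_analyze_payload(file_id: str, steering: str) -> dict:
    """Combine an uploaded file's ID with a steering prompt."""
    return {
        "file_id": file_id,
        "steering": steering,
    }

payload = build_analyze_payload(
    "file_abc123",
    "Identify trends in the data over time.",
)
# A real call would then POST this payload to the /analyze endpoint,
# e.g. requests.post(f"{BASE_URL}/analyze", json=payload, headers=auth).
```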

Good steering prompts share three properties:

  • Specific about what to analyze. Name the columns, metrics, or dimensions you care about.
  • Clear about the analytical lens. Trends, comparisons, anomalies, distributions — tell it which one.
  • Contextual about the domain. A "spike" in error rates is bad. A "spike" in signups is good. Context helps the narrative.

The 10 Patterns

1. The Trend Hunter

When to use: Time-series data where you want to know the direction of change.

steering = (
    "Identify trends in the data over time. Focus on: "
    "the dominant directional trend, any inflection points "
    "where the direction changed, the rate of change, and "
    "how recent performance compares to the historical baseline. "
    "Use the {date_column} column for the time axis."
)

Why it works: "Inflection points" tells Code Interpreter to look for slope changes, not just the overall direction. "Rate of change" gets you derivative analysis, not just "it went up."
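The `{date_column}` placeholder is meant to be filled in before sending; with Python's `str.format`, for example (the column name below is illustrative, not from a real dataset):

```python
# Template with a placeholder for the dataset's actual date column.
template = (
    "Identify trends in the data over time. "
    "Use the {date_column} column for the time axis."
)

# "order_date" is an example column name; substitute your own.
prompt = template.format(date_column="order_date")
```

The same substitution works for every pattern in this article that uses `{…}` placeholders.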

2. The Group Comparator

When to use: Data with categorical segments you want to compare.

steering = (
    "Compare performance across groups in the '{group_column}' column. "
    "For each group, analyze {metric_1} and {metric_2}. "
    "Identify which groups outperform or underperform, quantify the gaps, "
    "and run statistical significance tests where sample sizes allow."
)

Why it works: Naming the specific group column and metrics prevents the Code Interpreter from guessing. Requesting significance tests ensures you get p-values, not just different-looking bar charts.

3. The Anomaly Detector

When to use: You suspect something unusual happened and want to find it.

steering = (
    "Focus on anomaly detection. Identify outlier values, "
    "unexpected spikes or drops, and any rows or time periods "
    "that deviate significantly from normal patterns. "
    "Use z-scores and IQR analysis. Flag data quality issues if present. "
    "Context: {your_domain_context}"
)

Why it works: Specifying the statistical methods (z-scores, IQR) ensures rigorous detection. Domain context prevents false positives — mentioning that "we ran a promotion on March 10-12" keeps the Code Interpreter from flagging that spike as anomalous.

4. The Distribution Profiler

When to use: You care about the spread, not just the average.

steering = (
    "Analyze the distribution of {column_name}. Report: "
    "the shape (normal, skewed, bimodal, long-tail), "
    "median vs. mean, key percentiles (p10, p25, p75, p90, p99), "
    "and any notable clusters or gaps in the distribution. "
    "Visualize with a histogram and box plot."
)

Why it works: Requesting specific percentiles and distribution shape forces a thorough statistical profile. Asking for median vs. mean comparison highlights skewness in the narrative.

5. The Correlation Finder

When to use: You want to know which variables move together.

steering = (
    "Find correlations between numeric columns. "
    "Compute the correlation matrix and highlight: "
    "the strongest positive correlations (r > 0.5), "
    "the strongest negative correlations (r < -0.5), "
    "and any surprising non-correlations between variables "
    "you'd expect to be related. Be explicit about "
    "correlation vs. causation."
)

Why it works: Setting the r-threshold prevents a flood of weak correlations. Requesting "surprising non-correlations" surfaces the most interesting findings — what doesn't correlate is often more actionable than what does.

6. The Segment Deep-Dive

When to use: You already know which segment to focus on.

steering = (
    "Focus the analysis on rows where {column} = '{value}'. "
    "Within this segment, analyze all available metrics. "
    "Compare this segment's performance to the overall dataset average. "
    "Identify what makes this segment different."
)

Why it works: Pre-filtering with a specific condition prevents dilution. The comparison to overall averages gives the segment context rather than analyzing it in isolation.

7. The Root Cause Investigator

When to use: You know what happened and want to know why.

steering = (
    "The {metric} dropped {amount} between {period_1} and {period_2}. "
    "Investigate what caused this change. Break down {metric} by "
    "every available dimension (region, product, channel, segment) "
    "and identify which factors contributed most to the decline. "
    "Quantify each factor's contribution."
)

Why it works: You're telling the Code Interpreter the effect and asking it to decompose it. The "quantify each factor's contribution" instruction triggers contribution analysis — breaking the total change into component parts.

8. The Data Quality Auditor

When to use: Before analyzing, to check if the data is trustworthy.

steering = (
    "Perform a data quality assessment only — do not analyze "
    "business trends. For each column, report: data type, "
    "null count and percentage, unique value count, min/max "
    "for numerics, and any suspicious patterns (mixed types, "
    "unusual values, potential encoding issues)."
)

Why it works: "Do not analyze business trends" is the key phrase. Without it, Code Interpreter mixes data quality findings with trend analysis. The explicit instruction keeps it focused on quality.

9. The Forecaster

When to use: You want projections, not just historical analysis.

steering = (
    "Based on historical patterns, project {metric} forward "
    "for the next {n} periods. Report: the projected values, "
    "confidence intervals, and the assumptions behind the "
    "projection. Flag any factors that could invalidate "
    "the forecast (seasonality changes, trend breaks, "
    "insufficient historical data)."
)

Why it works: Requesting confidence intervals prevents false precision. Asking the model to flag invalidating factors ensures the narrative is honest about uncertainty.

10. The Executive Summary

When to use: You need the high-level story for a non-technical audience.

steering = (
    "Provide a high-level executive summary of this dataset. "
    "Focus on 3-5 key findings that a business leader would "
    "care about. Use plain language — no technical statistics "
    "jargon. For each finding, state: what happened, why it "
    "matters, and what action it suggests. Quantify everything."
)

Why it works: "No technical statistics jargon" shapes the narrative tone. "What action it suggests" pushes the output beyond observation into recommendation. "Quantify everything" prevents vague hand-waving.

Composing Prompts

These patterns combine. A real-world steering prompt might layer multiple patterns:

steering = (
    "This is quarterly sales data. "
    # Trend Hunter
    "Identify the revenue trend over the quarter. "
    # Group Comparator
    "Compare performance across the four regions. "
    # Anomaly Detector
    "Flag any weeks where revenue deviated more than 2 standard "
    "deviations from the regional average. "
    # Context
    "We ran a 20% discount promotion in the West region "
    "during week 8 — exclude that from anomaly detection."
)

Layering works because Code Interpreter processes the full instruction set and writes code that addresses all requests. The analysis will include a trend section, a regional comparison, and an anomaly flag section.

Don't over-layer. Three to four instructions is the sweet spot. Beyond that, the Code Interpreter may deprioritize some requests or produce a less focused analysis. If you need five different analyses, run them as separate /analyze calls with focused steering prompts.
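One way to keep each call focused is a simple loop that builds one payload per question. As before, the payload field names and file ID are assumptions for illustration:

```python
# Hypothetical: one focused /analyze payload per question, rather than
# layering five instructions into a single steering prompt.
FOCUSED_PROMPTS = [
    "Identify the revenue trend over the quarter.",
    "Compare performance across the four regions.",
    "Focus on anomaly detection in weekly revenue.",
]

def analyze_each(file_id: str, prompts: list[str]) -> list[dict]:
    """Build one /analyze payload per steering prompt."""
    return [{"file_id": file_id, "steering": p} for p in prompts]

payloads = analyze_each("file_abc123", FOCUSED_PROMPTS)
# Each payload would then be POSTed to /analyze separately.
```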

Anti-Patterns

Too vague: "Analyze this data" — Code Interpreter picks whatever looks statistically interesting, which may not be what you need.

Too prescriptive: "Run a linear regression of column A on column B, compute the R-squared, residual plot, and ANOVA table" — you're writing the analysis yourself. Let Code Interpreter choose the methods; steer the question, not the technique.

Contradictory: "Focus on anomalies. Also identify trends and compare all regions." — trying to do everything at once dilutes focus. Pick one primary lens.

Assuming column names: "Analyze the revenue column" when the column is actually called total_sales. Check the upload response metadata for exact column names and use them in your steering prompt.
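A defensive habit is to validate the column names in your steering prompt against the upload response before calling /analyze. The metadata shape below is an assumption for the sketch; check the actual upload response for the real field names:

```python
# Hypothetical upload-response metadata; real field names may differ.
upload_metadata = {"columns": ["total_sales", "region", "order_date"]}

def assert_columns_exist(steering_columns, metadata):
    """Fail fast if the steering prompt names a column the CSV lacks."""
    known = set(metadata["columns"])
    missing = [c for c in steering_columns if c not in known]
    if missing:
        raise ValueError(f"Unknown columns in steering prompt: {missing}")

# "revenue" is not in the metadata, so this call would raise ValueError:
# assert_columns_exist(["revenue"], upload_metadata)
assert_columns_exist(["total_sales", "region"], upload_metadata)  # OK
```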

What to Read Next

For the complete API flow that steering prompts fit into, start with getting started with the DataStoryBot API.

For deep-dive examples of the anomaly detection pattern, see anomaly detection in CSV data.

For comparison-focused prompts, read comparing groups in your data.

Or experiment directly — upload data to the DataStoryBot playground and try these steering prompts to see how the analysis changes with each one.

Ready to find your data story?

Upload a CSV and DataStoryBot will uncover the narrative in seconds.

Try DataStoryBot →