Financial Data Analysis: Revenue, Expenses, and Margin Stories
P&L CSVs, expense reports, revenue data — how DataStoryBot generates the narrative your CFO actually wants to read.
Financial data tells a specific kind of story — one where the audience already knows roughly what the numbers should be, and wants to understand why they differ from expectations. A CFO reading a report isn't discovering the business for the first time. They're pattern-matching against targets, prior periods, and mental models of what drives performance.
That audience requirement changes how financial analysis should be written. The narrative needs margin context, not just revenue totals. It needs period-over-period comparisons, not just absolute values. And it needs to call out cost drivers specifically, not just say "expenses increased."
This article covers three financial analysis scenarios — P&L statement analysis, expense categorization and trend analysis, and revenue breakdown by segment — with the CSV shapes, steering prompts, and example narrative outputs for each.
Scenario 1: P&L Statement Analysis
The CSV Shape
Monthly or quarterly P&L data typically lands as one of two shapes. The wide format has one row per period with revenue and expense line items as columns:
period,fiscal_quarter,revenue_product,revenue_services,revenue_other,cogs,gross_profit,sales_marketing,rd_expense,ga_expense,ebitda,net_income
2025-01,Q1-2025,1240000,380000,45000,612000,1053000,287000,198000,143000,425000,318000
2025-02,Q1-2025,1310000,392000,41000,641000,1102000,301000,204000,147000,450000,337000
2025-03,Q1-2025,1185000,408000,52000,598000,1047000,276000,211000,151000,409000,306000
The long format — common from ERP exports — has one row per line item per period:
period,fiscal_quarter,line_item,category,amount
2025-01,Q1-2025,Product Revenue,Revenue,1240000
2025-01,Q1-2025,Services Revenue,Revenue,380000
2025-01,Q1-2025,COGS,Cost,612000
2025-01,Q1-2025,Sales & Marketing,OpEx,287000
2025-01,Q1-2025,R&D,OpEx,198000
2025-01,Q1-2025,G&A,OpEx,143000
DataStoryBot handles both shapes, but the long format requires a steering prompt note so the analysis pivots correctly before aggregating. Include a note like "this is long-format data; pivot on the line_item column before analysis."
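For reference, the pivot that note asks for is a one-liner in pandas. A minimal sketch of what the generated analysis code typically does first, using the column names from the long-format sample above (the rows here are illustrative):

```python
import pandas as pd

# Rows mirror the long-format ERP sample above
df = pd.DataFrame({
    "period": ["2025-01"] * 4,
    "fiscal_quarter": ["Q1-2025"] * 4,
    "line_item": ["Product Revenue", "Services Revenue", "COGS", "Sales & Marketing"],
    "amount": [1240000, 380000, 612000, 287000],
})

# Pivot so each line item becomes a column, one row per period
wide = df.pivot_table(
    index=["period", "fiscal_quarter"],
    columns="line_item",
    values="amount",
    aggfunc="sum",
).reset_index()
```

After the pivot, the long data has the same shape as the wide-format export, and the same margin calculations apply to both.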
Steering Prompts for P&L Analysis
A generic prompt produces generic output. For P&L analysis, the questions that matter are margin trends, the relationship between revenue growth and expense growth, and which line items are moving faster than the top line:
steering = (
    "Analyze this P&L data over the full date range. Focus on: "
    "1) Gross margin trend — calculate gross margin % each period and identify any compression or expansion. "
    "2) EBITDA margin trend — is operating leverage improving (revenue growing faster than OpEx)? "
    "3) Expense ratio analysis — what percentage of revenue is each OpEx category? Show the trend. "
    "4) Revenue mix — how is the split between product, services, and other revenue changing? "
    "5) Identify any quarter where margin deteriorated materially and quantify the primary driver. "
    "Format all currency values in USD with commas. Express margins as percentages with one decimal place."
)
The final formatting instructions — format currency with commas and express margins as percentages — matter more than they might seem. Without them, the narrative may present raw floats like 0.3412 instead of 34.1%, which creates friction for finance readers.
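The gross-margin calculation that steering item 1 asks for is straightforward to sketch in pandas, using the wide-format column names from the sample above (the two rows are taken from that sample):

```python
import pandas as pd

# Two periods from the wide-format sample above
pnl = pd.DataFrame({
    "period": ["2025-01", "2025-02"],
    "revenue_product": [1240000, 1310000],
    "revenue_services": [380000, 392000],
    "revenue_other": [45000, 41000],
    "cogs": [612000, 641000],
})

# Gross margin % per period: (revenue - COGS) / revenue
revenue = pnl[["revenue_product", "revenue_services", "revenue_other"]].sum(axis=1)
pnl["gross_margin_pct"] = (revenue - pnl["cogs"]) / revenue * 100

# One decimal place, as the steering prompt requires
print(pnl["gross_margin_pct"].round(1).tolist())  # → [63.2, 63.2]
```

This is a sketch of the calculation, not DataStoryBot's actual generated code, but the arithmetic the model writes is equivalent.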
Example Narrative Output
The kind of output a CFO finds useful:
Gross margin compressed 3.2 percentage points over the nine-month period, falling from 47.1% in January to 43.9% in September. The compression is not uniform — Q1 averaged 46.8%, Q2 held flat at 46.5%, and Q3 saw the sharpest decline to 43.4%. The Q3 decline coincides with COGS growing 12.3% while revenue grew only 6.1%, suggesting either a pricing issue or a product mix shift toward lower-margin lines.
Operating leverage is absent in the current data. Total OpEx grew from $628,000 in January to $741,000 in September — an 18% increase — while revenue grew 9% over the same period. Sales & Marketing is the primary driver, growing from $287,000 to $352,000 (23% increase). R&D held relatively flat (+6%). G&A grew 11%, slightly above the revenue growth rate.
EBITDA margin fell from 25.3% to 19.8% — a 5.5-point decline reflecting both the gross margin compression and the OpEx growth. At current trajectories, the Q4 EBITDA margin is projected to fall below 18% absent corrective action on either pricing or operating costs.
That output is roughly 180 words. It answers the questions a CFO has before asking them: where is margin going, what's driving it, and what's the trajectory.
Scenario 2: Expense Categorization and Trend Analysis
The CSV Shape
Expense data from accounting systems is typically transactional — one row per expense event, with vendor, category, amount, department, and date:
expense_id,date,fiscal_period,vendor,category,subcategory,department,amount,currency,approved_by
EXP-4821,2026-01-08,2026-01,AWS,Infrastructure,Cloud Compute,Engineering,18420.00,USD,sarah.chen
EXP-4822,2026-01-09,2026-01,Salesforce,Software,CRM,Sales,4800.00,USD,mark.torres
EXP-4823,2026-01-10,2026-01,Hilton,Travel,Lodging,Sales,342.00,USD,mark.torres
EXP-4824,2026-01-11,2026-01,Delta Airlines,Travel,Airfare,Sales,618.00,USD,mark.torres
EXP-4825,2026-01-12,2026-01,Gusto,HR Software,Payroll Processing,Finance,1200.00,USD,linda.park
This shape is richer than a P&L export because it preserves vendor-level detail and department attribution. The analytical questions are different too: not just "how much did we spend on travel?" but "is travel concentrated in one department?" and "which vendors are growing fastest?"
Steering Prompts for Expense Analysis
steering = (
    "Analyze this expense transaction data. Focus on: "
    "1) Total spend by category and subcategory — rank by total amount, not transaction count. "
    "2) Month-over-month trend for the top 5 expense categories. Flag any category growing more than 15% month-over-month. "
    "3) Department breakdown — which departments are driving which categories? "
    "4) Vendor concentration — top 10 vendors by total spend, and what % of total spend they represent. "
    "5) Identify any unusual single transactions (statistical outliers by amount within their category). "
    "All amounts are in USD. Aggregate by `fiscal_period` for trend analysis."
)
For multi-currency expense data, add explicit handling: "The currency column contains mixed currencies. Convert all amounts to USD using these rates before aggregating: GBP=1.27, EUR=1.08, CAD=0.74. Add a converted_usd column." Doing this in the steering prompt is more reliable than preprocessing the CSV manually, because the model will document its conversion logic in the narrative.
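The conversion itself reduces to a lookup and a multiply. A sketch of what the generated code would do, using the rates from the prompt (the expense rows here are illustrative, not from the sample):

```python
import pandas as pd

# Illustrative mixed-currency expense rows
expenses = pd.DataFrame({
    "amount": [1000.0, 500.0, 342.0],
    "currency": ["GBP", "EUR", "USD"],
})

# Rates from the steering prompt; USD passes through unchanged
rates_to_usd = {"GBP": 1.27, "EUR": 1.08, "CAD": 0.74, "USD": 1.0}
expenses["converted_usd"] = expenses["amount"] * expenses["currency"].map(rates_to_usd)
```

Because the conversion happens in generated code, the resulting narrative can state which rates were applied — the documentation benefit mentioned above.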
Example Narrative Output
Infrastructure spend is the fastest-growing expense category, increasing 34% quarter-over-quarter from $52,000 in Q1 to $70,000 in Q2. AWS accounts for $61,000 of the Q2 total (87% of category spend), with the growth driven by compute costs rather than storage or networking. At the current growth rate, infrastructure will surpass Software subscriptions as the largest single expense category by Q4.
Travel expense shows a department concentration risk. The Sales team accounts for 71% of all travel spend ($38,400 of $54,100 in Q2), yet represents only 22% of headcount. Per-person travel spend in Sales is $1,920/quarter versus a company average of $423/quarter. Whether that's appropriate depends on the sales model, but the concentration warrants review.
Three vendors — AWS, Salesforce, and Workday — account for 41% of total controllable spend. This concentration creates renewal leverage risk. All three contracts renew in Q1 2027, making Q3 2026 the appropriate window to initiate renegotiation.
This kind of output — specific numbers, vendor names, actionable timing — is what separates useful expense analysis from a pivot table someone could have built themselves.
Scenario 3: Revenue Breakdown by Segment
The CSV Shape
Revenue by segment data typically comes from a CRM or billing system export. The key columns are the revenue amount, the time period, and whatever segmentation dimensions the business uses — geography, product line, customer tier, acquisition channel:
period,fiscal_quarter,customer_id,customer_tier,industry,region,sales_rep,product_line,contract_type,arr,mrr,contract_start,contract_end
2026-01,Q1-2026,CUST-1042,Enterprise,Financial Services,Northeast,james.wright,Platform,Annual,120000,10000,2025-07-01,2026-06-30
2026-01,Q1-2026,CUST-2187,Mid-Market,Healthcare,Southeast,priya.kumar,Platform,Monthly,36000,3000,2025-11-01,
2026-01,Q1-2026,CUST-0891,SMB,Retail,West,direct,Starter,Monthly,9600,800,2026-01-15,
2026-01,Q1-2026,CUST-3304,Enterprise,Technology,International,anna.schmidt,Platform + Services,Annual,240000,20000,2025-04-01,2026-03-31
Note that this shape mixes ARR (annualized) and MRR (monthly) metrics in the same file. The steering prompt needs to acknowledge this: "Use MRR for monthly trend analysis and ARR for total book-of-business analysis. Do not add ARR and MRR together."
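A cheap pre-flight check — worth running yourself before upload, or asking the model to run — is verifying that the two columns actually agree. A minimal sketch, with rows mirroring the sample above:

```python
import pandas as pd

# Rows mirror the revenue sample above
rev = pd.DataFrame({
    "customer_id": ["CUST-1042", "CUST-2187", "CUST-0891"],
    "arr": [120000, 36000, 9600],
    "mrr": [10000, 3000, 800],
})

# Flag any row where annualized and monthly figures disagree
mismatch = rev[rev["arr"] != rev["mrr"] * 12]
assert mismatch.empty, "ARR and MRR columns are inconsistent"
```

If the columns disagree — mid-contract upgrades are a common cause — the steering prompt should say which one to trust.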
Steering Prompts for Revenue Segmentation
steering = (
    "Analyze this SaaS revenue data by customer segment. Focus on: "
    "1) Revenue concentration — what percentage of total ARR comes from the top 10, 20, and 50 customers? "
    "2) Tier breakdown — what is the ARR, customer count, and average contract size for Enterprise, Mid-Market, and SMB? "
    "3) Revenue by region — identify regions with disproportionately high or low average contract values. "
    "4) Product line mix — how is ARR distributed across product lines? Is any line growing its share? "
    "5) Contract type analysis — what portion of ARR is on annual vs. monthly contracts? "
    "Use ARR for book-of-business metrics. Flag any customers where contract_end is within 90 days of the data date."
)
The 90-day renewal flag is a practical addition. DataStoryBot will write Python to calculate days_to_renewal from the contract_end column and surface at-risk ARR — something a standard BI chart won't do without custom configuration.
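The underlying calculation is simple enough to sketch. Here `as_of` is a hypothetical data date standing in for "today", and the contract rows are illustrative:

```python
import pandas as pd

# Illustrative contract rows; contract_end dates follow the sample's format
contracts = pd.DataFrame({
    "customer_id": ["CUST-1042", "CUST-3304"],
    "arr": [120000, 240000],
    "contract_end": ["2026-06-30", "2026-03-31"],
})
as_of = pd.Timestamp("2026-02-01")  # hypothetical data date

# Days until renewal, then filter to the 90-day at-risk window
contracts["days_to_renewal"] = (
    pd.to_datetime(contracts["contract_end"]) - as_of
).dt.days
at_risk = contracts[contracts["days_to_renewal"] <= 90]
print(at_risk["arr"].sum())  # → 240000
```

The narrative then only has to report the filtered total — the "at-risk ARR" figure in the example output below comes from exactly this kind of filter.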
For revenue concentration specifically, the prompt "calculate revenue concentration" tends to produce HHI (Herfindahl-Hirschman Index) output, which is technically correct but not what finance teams want. The more explicit framing — "what percentage of ARR comes from the top 10/20/50 customers" — produces the 80/20 analysis that's actually useful for identifying customer concentration risk.
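The explicit framing maps onto a calculation like this (the ARR values are illustrative):

```python
import pandas as pd

# Illustrative ARR per customer, one value per account
arr = pd.Series([240000, 120000, 96000, 60000, 36000,
                 24000, 12000, 9600, 7200, 4800])

def top_n_share(arr: pd.Series, n: int) -> float:
    """Percentage of total ARR held by the n largest customers."""
    return arr.nlargest(n).sum() / arr.sum() * 100

print(round(top_n_share(arr, 3), 1))  # → 74.8
```

Running it for n = 10, 20, and 50 produces the three concentration figures the prompt asks for.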
Example Narrative Output
Enterprise customers (72 accounts) represent 68% of total ARR ($8.2M of $12.1M), despite being 18% of the customer base. The average Enterprise contract is $113,900 ARR versus $31,200 for Mid-Market and $8,400 for SMB. Enterprise concentration has grown — it was 61% of ARR twelve months ago — reflecting both new Enterprise wins and a higher churn rate in the SMB segment.
Revenue concentration is high. The top 10 customers represent 31% of total ARR. The top 50 represent 67%. The largest single customer (CUST-3304, $240,000 ARR) represents 2% of total ARR — below the typical threshold for customer concentration risk, but the top 10 collectively create meaningful dependency.
At-risk renewals in the next 90 days total $1.84M ARR across 14 accounts. Three Enterprise accounts ($892,000 combined ARR) are on annual contracts expiring before June 30. No renewal outreach is logged in the data for two of these accounts.
That last finding — renewal risk with a specific dollar figure and a process gap — is the kind of insight that gets acted on immediately. It requires joining the contract_end column with the current date, calculating ARR at risk, and cross-referencing with activity data. DataStoryBot's Code Interpreter does all of that from a single steering prompt.
Practical Considerations for Financial Data
Data Sensitivity
Financial data is sensitive. Before uploading P&L or revenue data to any API, understand the data handling guarantees. DataStoryBot uses OpenAI's Containers API for code execution — file data is scoped to an isolated container per analysis session. Review your organization's data classification policy against the API's data retention terms before processing anything above internal confidentiality.
For regulated environments, anonymization is an option that doesn't compromise analytical quality. Customer IDs, customer names, and vendor names can be replaced with codes without affecting margin calculations, trend analysis, or segment breakdowns. Only narrative output that names specific vendors or customers requires the original identifiers.
Currency Formatting
The model will format numbers however you tell it to. Without instruction, it may output raw floats. Include formatting directives in the steering prompt:
- "Format all currency values in USD with dollar signs and comma separators (e.g., $1,234,567)"
- "Express margins as percentages rounded to one decimal place"
- "Use 'K' for thousands and 'M' for millions in chart axis labels"
These instructions translate directly into f"${value:,.0f}" formatting in the generated Python, which ensures consistent output across the narrative and chart labels.
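For the K/M axis labels specifically, the generated code usually needs a small helper along these lines — a sketch of one plausible implementation, not DataStoryBot's actual code:

```python
def money_label(value: float) -> str:
    """Compact currency label for chart axes: $1.2M, $450K, $342."""
    if abs(value) >= 1_000_000:
        return f"${value / 1_000_000:.1f}M"
    if abs(value) >= 1_000:
        return f"${value / 1_000:.0f}K"
    return f"${value:,.0f}"

print(money_label(1234567))  # → $1.2M
print(money_label(450000))   # → $450K
```
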
Handling Fiscal Periods
Companies use calendar quarters, fiscal quarters offset by one to three months, 4-4-5 retail calendars, and custom period definitions. The analysis will be wrong if DataStoryBot infers a calendar-year quarter from a column that contains fiscal periods.
Solve this in the steering prompt: "Fiscal year runs February 1 to January 31. The fiscal_quarter column uses the company's fiscal calendar, not calendar quarters. Do not infer quarter boundaries from the date column — use fiscal_quarter as the grouping dimension for period-over-period analysis."
If the raw data doesn't include a fiscal period column, include a period mapping in the steering prompt:
steering = (
    "This company uses a fiscal year starting February 1. "
    "Map calendar months to fiscal quarters as follows: "
    "Feb-Apr = FQ1, May-Jul = FQ2, Aug-Oct = FQ3, Nov-Jan = FQ4. "
    "Apply this mapping to the `date` column before any period aggregation."
)
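The same mapping, done in pandas directly, is a short function — a sketch assuming a parsed `date` column:

```python
import pandas as pd

# Fiscal year starting February 1: Feb-Apr = FQ1, May-Jul = FQ2,
# Aug-Oct = FQ3, Nov-Jan = FQ4 (the mapping from the prompt above)
def fiscal_quarter(date: pd.Timestamp) -> str:
    fiscal_month = (date.month - 2) % 12  # Feb -> 0, Jan -> 11
    return f"FQ{fiscal_month // 3 + 1}"

dates = pd.to_datetime(["2026-02-10", "2026-07-01", "2026-01-15"])
print([fiscal_quarter(d) for d in dates])  # → ['FQ1', 'FQ2', 'FQ4']
```

Whether you precompute the column or let the steering prompt drive it, the point is the same: the quarter boundaries must come from the fiscal calendar, never inferred from calendar dates.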
Mixed Granularity in Multi-Entity Data
Consolidated P&L data from multi-entity organizations — subsidiaries, business units, geographic entities — often has inconsistent granularity. Entity A reports weekly, Entity B reports monthly. Or the same line item appears at different levels of the hierarchy across entities.
Flag this explicitly: "This data contains two business units with different reporting granularities: unit = 'EMEA' reports monthly, unit = 'AMER' reports weekly. Aggregate AMER to monthly before combining for consolidated analysis."
Without this instruction, any per-period comparison of EMEA and AMER is distorted: each weekly AMER row ends up set against a full monthly EMEA row, making AMER appear roughly a quarter of its true size in every period-over-period view.
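The aggregation step itself is a standard pandas groupby — a sketch with illustrative weekly AMER rows:

```python
import pandas as pd

# Illustrative weekly AMER revenue rows for one month
amer = pd.DataFrame({
    "date": pd.to_datetime(["2026-01-05", "2026-01-12",
                            "2026-01-19", "2026-01-26"]),
    "revenue": [250000, 260000, 240000, 250000],
})

# Roll the weekly rows up to calendar months so AMER is
# directly comparable with monthly EMEA rows
amer_monthly = amer.groupby(amer["date"].dt.to_period("M"))["revenue"].sum()
print(amer_monthly.iloc[0])  # → 1000000
```

Only after this roll-up do consolidated totals and period-over-period comparisons mean the same thing for both units.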
Related Reading
For the time-indexed analysis within P&L data — decomposing seasonality in quarterly revenue, detecting inflection points, or projecting trends forward — see Time Series Analysis with DataStoryBot.
For working with other transactional data shapes — sales orders, survey responses, event logs — the same steering prompt patterns apply: Analyzing Sales, Survey, and Sensor Data from CSV.
For the broader question of how to structure financial narratives for different stakeholders — board decks versus operational reviews versus investor updates — see Data Reporting: Narrative Over Numbers.
Summary
Financial data analysis requires more precision in the steering prompt than most other data types — because the audience is expert, the metrics have specific definitions (ARR is not MRR, gross margin is not net margin), and the formatting expectations are non-negotiable. The three scenarios above cover the most common use cases:
- P&L analysis: focus on margin trends and operating leverage, not absolute figures
- Expense categorization: surface cost drivers by category, department, and vendor — not just totals
- Revenue segmentation: quantify concentration risk, segment economics, and at-risk renewals
In each case, the steering prompt is doing more work than it might appear to — instructing the model on how to interpret the data, what calculations to run, and how to format the output. The narrative DataStoryBot returns is only as precise as the instructions it receives.
Ready to find your data story?
Upload a CSV and DataStoryBot will uncover the narrative in seconds.
Try DataStoryBot →