general · March 24, 2026 · 6 min read

Analyzing Sales, Survey, and Sensor Data from CSV

Three worked examples with real-world CSV shapes: e-commerce transactions, NPS survey results, and IoT sensor logs — each analyzed with DataStoryBot.

By DataStoryBot Team

Different data shapes produce different stories. A sales CSV has transactions, dates, and revenue. A survey CSV has Likert scales, open-ended responses, and demographic segments. A sensor CSV has closely spaced timestamps (one reading per minute in the example below) and numeric readings.

Each requires a different analytical approach and a different steering prompt, and each produces a different kind of narrative. This article walks through three complete examples (e-commerce sales, a customer survey, and IoT sensor data), showing the full API flow and what to expect from each.

Example 1: E-Commerce Sales Data

The CSV Shape

order_id,order_date,customer_id,product_category,product_name,quantity,unit_price,total_revenue,region,channel
ORD-001,2026-01-03,C-1234,Electronics,Wireless Headphones,1,79.99,79.99,West,organic
ORD-002,2026-01-03,C-5678,Apparel,Running Shoes,1,129.00,129.00,Northeast,paid
ORD-003,2026-01-04,C-9012,Electronics,USB-C Hub,2,34.99,69.98,Southeast,organic
...

Typical columns: order ID, date, customer, product info, revenue, geography, acquisition channel. Rows represent individual transactions.

The Steering Prompt

steering = (
    "Analyze this e-commerce order data. Focus on: "
    "1) Revenue trends over time — is the business growing? "
    "2) Product category performance — which categories drive revenue? "
    "3) Regional breakdown — any regions underperforming? "
    "4) Channel effectiveness — organic vs. paid acquisition. "
    "Highlight any anomalies or unexpected patterns."
)

What DataStoryBot Finds

The analysis typically surfaces:

Revenue trends: "Revenue grew 18% quarter-over-quarter, with growth accelerating in March. The day-of-week pattern shows 2.3x higher revenue on Tuesdays (likely a payday effect)."

Category insights: "Electronics accounts for 42% of revenue but only 28% of orders — higher average order value. Apparel has the highest order volume but lowest margins."

Regional gaps: "The Southeast region generates 15% of orders but only 9% of revenue — significantly lower average order value ($47 vs. $78 company average)."

Channel comparison: "Organic customers have 23% higher lifetime value than paid customers, but paid channels drive 60% of first-time purchases."

The narrative weaves these findings into a coherent story, with charts for each dimension.
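Findings like the category and regional shares above are easy to spot-check locally before acting on them. The following is a minimal pandas sketch, not part of DataStoryBot: the `share_by` helper and the four sample rows are illustrative assumptions, and only the column names come from the CSV shape shown earlier.

```python
import pandas as pd

# Tiny illustrative sample; in practice, load the real file with pd.read_csv.
orders = pd.DataFrame({
    "order_id": ["ORD-001", "ORD-002", "ORD-003", "ORD-004"],
    "product_category": ["Electronics", "Apparel", "Electronics", "Apparel"],
    "region": ["West", "Northeast", "Southeast", "Southeast"],
    "total_revenue": [79.99, 129.00, 69.98, 21.03],
})


def share_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return revenue share, order share, and average order value per group."""
    grouped = df.groupby(column).agg(
        revenue=("total_revenue", "sum"),
        orders=("order_id", "count"),
    )
    grouped["revenue_share"] = grouped["revenue"] / grouped["revenue"].sum()
    grouped["order_share"] = grouped["orders"] / grouped["orders"].sum()
    # A group whose order share exceeds its revenue share has a low order value.
    grouped["avg_order_value"] = grouped["revenue"] / grouped["orders"]
    return grouped


category_summary = share_by(orders, "product_category")
region_summary = share_by(orders, "region")
print(category_summary.round(3))
print(region_summary.round(3))
```

Comparing `revenue_share` against `order_share` is the same check behind statements such as "15% of orders but only 9% of revenue". If the local numbers disagree with the narrative, that is a cue to tighten the steering prompt or inspect the data.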

Complete Code

import requests

BASE_URL = "https://datastory.bot/api"

# Upload the CSV; the response carries the container ID used by later calls
with open("ecommerce_orders.csv", "rb") as f:
    upload = requests.post(f"{BASE_URL}/upload", files={"file": f})
upload.raise_for_status()  # stop early on an HTTP error
container_id = upload.json()["containerId"]

# Analyze: returns a list of candidate story angles
stories = requests.post(f"{BASE_URL}/analyze", json={
    "containerId": container_id,
    "steeringPrompt": (
        "Analyze this e-commerce order data. Focus on: "
        "revenue trends, product category performance, "
        "regional breakdown, and channel effectiveness."
    )
})
stories.raise_for_status()
angles = stories.json()

# Refine the first story angle returned
report = requests.post(f"{BASE_URL}/refine", json={
    "containerId": container_id,
    "selectedStoryTitle": angles[0]["title"]
})
report.raise_for_status()

result = report.json()
print(result["narrative"])

Example 2: Customer Survey Data (NPS)

The CSV Shape

response_id,date,customer_segment,tenure_months,nps_score,satisfaction_overall,satisfaction_support,satisfaction_product,would_recommend,open_feedback
R-001,2026-03-01,Enterprise,24,9,5,4,5,Yes,"Love the API integration"
R-002,2026-03-01,SMB,6,3,2,1,3,No,"Support response times are terrible"
R-003,2026-03-02,Enterprise,36,8,4,4,4,Yes,""
...

Survey data has a different structure: Likert scales (1-5), NPS scores (0-10), boolean responses, and optional free-text feedback. The analytical approach is distribution-focused, not trend-focused.

The Steering Prompt

steering = (
    "Analyze this NPS survey data. Focus on: "
    "1) Overall NPS score and breakdown (Promoters/Passives/Detractors). "
    "2) NPS by customer segment — which segments are happiest? "
    "3) Satisfaction dimension analysis — support, product, overall. "
    "4) Correlation between tenure and satisfaction. "
    "5) Key themes from the open_feedback column if present. "
    "NPS scoring: 9-10 = Promoter, 7-8 = Passive, 0-6 = Detractor."
)

The scoring definition in the steering prompt is important — Code Interpreter needs to know the NPS classification rules to compute the score correctly.
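The classification rules in the prompt translate directly into a few lines of pandas, which is handy for checking the reported score by hand. This is a minimal sketch: the function names and the ten sample scores are illustrative, not part of DataStoryBot.

```python
import pandas as pd


def classify_nps(score: int) -> str:
    """Apply the standard NPS buckets: 9-10 Promoter, 7-8 Passive, 0-6 Detractor."""
    if score >= 9:
        return "Promoter"
    if score >= 7:
        return "Passive"
    return "Detractor"


def nps(scores: pd.Series) -> float:
    """NPS = percentage of Promoters minus percentage of Detractors."""
    labels = scores.map(classify_nps)
    promoters = (labels == "Promoter").mean()
    detractors = (labels == "Detractor").mean()
    return float(round(100 * (promoters - detractors), 1))


# Illustrative scores: 5 Promoters, 2 Passives, 3 Detractors
sample_scores = pd.Series([9, 3, 8, 10, 6, 7, 9, 10, 2, 9])
overall_nps = nps(sample_scores)
print(overall_nps)
```

Passives count toward the total number of responses but toward neither side of the subtraction, which is why the score can move even when the Promoter share stays flat.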

What DataStoryBot Finds

NPS breakdown: "Overall NPS is +32 (47% Promoters, 38% Passives, 15% Detractors). This is above the SaaS industry median of +30."

Segment differences: "Enterprise NPS is +52, SMB is +18. The gap is driven by Detractor rate — 8% of Enterprise respondents are Detractors vs. 22% of SMB."

Dimension analysis: "Support satisfaction is the weakest dimension (mean 3.1/5 vs. 4.2 for product satisfaction). Among Detractors, 78% rate support at 1 or 2."

Tenure correlation: "NPS correlates positively with tenure (r=0.34). Customers under 6 months have an NPS of +8; customers over 24 months have +48. Early churn risk is highest."

Feedback themes: "Top negative themes: response time (mentioned in 34% of Detractor feedback), onboarding confusion (22%), pricing (18%)."

Survey analysis produces a different kind of story — it's about sentiment, segments, and satisfaction drivers rather than revenue and trends.
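The segment and tenure findings can be reproduced locally in much the same way. The sketch below assumes the column names from the CSV shape above and uses six made-up responses; it computes Promoter and Detractor rates per segment and the Pearson correlation between tenure and score.

```python
import pandas as pd

# Illustrative responses; load the real survey with pd.read_csv instead.
responses = pd.DataFrame({
    "customer_segment": ["Enterprise", "SMB", "Enterprise", "SMB", "Enterprise", "SMB"],
    "tenure_months": [24, 6, 36, 3, 30, 12],
    "nps_score": [9, 3, 8, 5, 10, 9],
})

# Flag each response using the standard NPS buckets
responses["is_promoter"] = responses["nps_score"] >= 9
responses["is_detractor"] = responses["nps_score"] <= 6

# The mean of a boolean column is the share of True values in that group
by_segment = responses.groupby("customer_segment").agg(
    promoter_rate=("is_promoter", "mean"),
    detractor_rate=("is_detractor", "mean"),
    response_count=("nps_score", "count"),
)
by_segment["nps"] = 100 * (by_segment["promoter_rate"] - by_segment["detractor_rate"])

# Pearson correlation between tenure and raw score
tenure_r = responses["tenure_months"].corr(responses["nps_score"])

print(by_segment.round(2))
print(f"tenure vs. score: r={tenure_r:.2f}")
```

Keeping `response_count` next to each rate matters with survey data: a segment with a handful of responses can show a dramatic NPS that means very little.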

Example 3: IoT Sensor Data

The CSV Shape

timestamp,device_id,temperature_c,humidity_pct,pressure_hpa,battery_v,signal_rssi
2026-03-01 00:00:00,SENSOR-01,22.3,45.2,1013.25,3.72,-67
2026-03-01 00:01:00,SENSOR-01,22.3,45.1,1013.24,3.72,-68
2026-03-01 00:02:00,SENSOR-01,22.4,45.3,1013.25,3.72,-67
...

Sensor data is high-frequency time series. The analytical approach focuses on temporal patterns, anomalies, and device-level comparisons.

The Steering Prompt

steering = (
    "Analyze this IoT sensor data. Focus on: "
    "1) Temperature and humidity patterns over time — any trends or cycles? "
    "2) Anomaly detection — any sensors reporting unusual values? "
    "3) Device comparison — are all sensors reading consistently? "
    "4) Battery health — any devices showing battery degradation? "
    "5) Signal quality — any devices with poor connectivity? "
    "Data is sampled every minute. Focus on the last 24 hours."
)

What DataStoryBot Finds

Temporal patterns: "Temperature follows a clear diurnal cycle: low of 19.2°C at 04:00, high of 26.8°C at 14:00. Humidity inversely tracks temperature (r=-0.87)."

Anomalies: "SENSOR-04 reported a temperature spike to 42.3°C at 03:17 lasting 8 minutes, then returned to normal. This is a 6.3-sigma event and likely a sensor malfunction rather than actual temperature change, as no other nearby sensors detected it."

Device consistency: "Sensors 01-03 agree within ±0.5°C. SENSOR-05 consistently reads 1.8°C higher — possible calibration drift."

Battery: "SENSOR-02 battery voltage has declined from 3.71V to 3.54V over 7 days (about 0.024V/day), faster than the fleet average of 0.01V/day. At that rate, linear extrapolation gives roughly 22 days until the replacement threshold (3.0V)."

Sensor analysis is technical and operational — focused on device health, data quality, and environmental monitoring rather than business metrics.
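Two of the checks described above, spike detection and battery runway, are simple enough to sketch by hand. The code below is an illustration under stated assumptions, not DataStoryBot's method: it flags readings by per-device z-score, checks whether any other device saw the same spike, and extrapolates battery voltage linearly. All data is synthetic, and real battery discharge curves are usually not linear.

```python
import numpy as np
import pandas as pd

# Synthetic per-minute readings for two devices; SENSOR-04 gets one spike.
timestamps = pd.date_range("2026-03-01 03:00", periods=60, freq="min")
base = 22.0 + 0.1 * (np.arange(60) % 3)  # repeats 22.0, 22.1, 22.2

frames = []
for device in ["SENSOR-03", "SENSOR-04"]:
    temps = base.copy()
    if device == "SENSOR-04":
        temps[17] = 42.3  # injected spike at 03:17
    frames.append(pd.DataFrame({
        "timestamp": timestamps,
        "device_id": device,
        "temperature_c": temps,
    }))
readings = pd.concat(frames, ignore_index=True)


def flag_spikes(df: pd.DataFrame, column: str = "temperature_c",
                threshold: float = 3.0) -> pd.DataFrame:
    """Return rows whose value is more than `threshold` std devs from the device mean."""
    grouped = df.groupby("device_id")[column]
    z = (df[column] - grouped.transform("mean")) / grouped.transform("std")
    return df.assign(z_score=z)[z.abs() > threshold]


def days_until_threshold(daily_voltage: pd.Series, threshold_v: float = 3.0) -> float:
    """Fit a straight line to daily voltages and extrapolate to the threshold."""
    days = np.arange(len(daily_voltage))
    slope, _intercept = np.polyfit(days, daily_voltage.to_numpy(), 1)
    if slope >= 0:
        return float("inf")  # voltage is not declining
    return float((threshold_v - daily_voltage.iloc[-1]) / slope)


spikes = flag_spikes(readings)
# A spike seen by only one device at that timestamp suggests a sensor fault.
devices_per_timestamp = spikes.groupby("timestamp")["device_id"].nunique()
isolated = devices_per_timestamp[devices_per_timestamp == 1]

daily_battery = pd.Series([3.70, 3.68, 3.66, 3.64, 3.62, 3.60, 3.58])
runway_days = days_until_threshold(daily_battery)

print(spikes)
print(f"estimated days to 3.0V: {runway_days:.1f}")
```

One caveat on the z-score approach: a large spike inflates the standard deviation it is measured against, so short windows can hide real outliers. A median-based measure is a common alternative when that becomes a problem.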

Pre-Processing Tips for Sensor Data

Sensor data is often high-volume. Pre-aggregate before uploading:

import pandas as pd

df = pd.read_csv("sensor_raw.csv")
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Aggregate from per-minute to per-hour
hourly = df.groupby([
    pd.Grouper(key="timestamp", freq="h"),
    "device_id"
]).agg(
    temp_mean=("temperature_c", "mean"),
    temp_min=("temperature_c", "min"),
    temp_max=("temperature_c", "max"),
    humidity_mean=("humidity_pct", "mean"),
    battery_mean=("battery_v", "mean"),
    signal_mean=("signal_rssi", "mean"),
    reading_count=("temperature_c", "count")
).reset_index()

hourly.to_csv("sensor_hourly.csv", index=False)

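Hourly aggregation preserves the broad patterns while cutting the row count by about 60x, since 60 per-minute readings collapse into one hourly row per device. Include min/max alongside the mean so that short-lived anomalies stay visible.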
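To see why the min/max columns matter, consider an hour containing a brief spike. The sketch below uses synthetic data (an 8-minute excursion to 42.3°C against a 22.0°C baseline, both illustrative) and the same grouping pattern as the aggregation code above.

```python
import numpy as np
import pandas as pd

# One hour of per-minute readings with an 8-minute spike
timestamps = pd.date_range("2026-03-01 03:00", periods=60, freq="min")
temps = np.full(60, 22.0)
temps[17:25] = 42.3

raw = pd.DataFrame({
    "timestamp": timestamps,
    "device_id": "SENSOR-04",
    "temperature_c": temps,
})

hourly = raw.groupby([
    pd.Grouper(key="timestamp", freq="h"),
    "device_id",
]).agg(
    temp_mean=("temperature_c", "mean"),
    temp_max=("temperature_c", "max"),
    reading_count=("temperature_c", "count"),
).reset_index()

# The mean dilutes the spike across the hour; the max keeps it intact.
print(hourly)
```

Here the hourly mean lands below 25°C, which looks unremarkable on its own, while `temp_max` still reports the full 42.3°C. The `reading_count` column serves a similar purpose for gaps: an hour with far fewer than 60 readings points to connectivity problems.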

Choosing the Right Steering Prompt by Data Type

Data Type	Primary Analysis	Key Steering Focus
Transaction/Sales	Trends, segments, comparisons	Revenue trends, category performance, regional breakdown
Survey/NPS	Distributions, segments, correlations	Score breakdown, segment differences, satisfaction drivers
Sensor/IoT	Time series, anomalies, device comparison	Temporal patterns, anomaly detection, calibration drift
Financial	Trends, ratios, forecasting	Period-over-period, ratio analysis, budget vs. actual
Web Analytics	Funnels, segments, trends	Conversion funnel, traffic sources, engagement metrics
HR/People	Distributions, correlations, segments	Compensation distribution, tenure analysis, department comparison

The data shape determines the analytical approach. Let the steering prompt match the data type.
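One way to keep prompts consistent across data types is to store the focus areas from the table and assemble the prompt from them. The helper below is a hypothetical convenience, not a DataStoryBot feature: `STEERING_FOCUS` and `build_steering_prompt` are illustrative names, and only three table rows are included.

```python
# Focus areas taken from the "Key Steering Focus" column of the table above
STEERING_FOCUS = {
    "sales": ["Revenue trends", "Category performance", "Regional breakdown"],
    "survey": ["Score breakdown", "Segment differences", "Satisfaction drivers"],
    "sensor": ["Temporal patterns", "Anomaly detection", "Calibration drift"],
}


def build_steering_prompt(data_type: str, description: str,
                          extra_context: str = "") -> str:
    """Assemble a numbered steering prompt for the given data type."""
    focus = STEERING_FOCUS[data_type]
    numbered = " ".join(f"{i}) {item}." for i, item in enumerate(focus, start=1))
    prompt = f"Analyze this {description}. Focus on: {numbered}"
    if extra_context:
        # Domain rules (scoring definitions, sampling rate) go at the end
        prompt += f" {extra_context}"
    return prompt


survey_prompt = build_steering_prompt(
    "survey",
    "NPS survey data",
    "NPS scoring: 9-10 = Promoter, 7-8 = Passive, 0-6 = Detractor.",
)
print(survey_prompt)
```

The resulting string can be passed as the `steeringPrompt` field in the analyze request shown in the first example.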

What to Read Next

For the foundational CSV analysis guide, see how to analyze a CSV file automatically.

For time series patterns in more detail, read time series analysis with DataStoryBot.

For anomaly detection techniques, see anomaly detection in CSV data.

For handling large sensor datasets, read CSV analysis at scale.
