
Data Analysis for Product Teams: Feature Usage, Retention, Funnel

Export from Amplitude or Mixpanel, upload to DataStoryBot, and get narrative insights about user behavior patterns.

By DataStoryBot Team

Product analytics platforms are good at dashboards. They are less good at answering the question your CEO just asked in Slack. Amplitude and Mixpanel show you what happened. Turning that into a coherent explanation of why, with supporting evidence, is where things break down.

The typical workflow: export a CSV from your analytics tool, open it in Excel or hand it to a data analyst, wait days for a response. The faster workflow: export the CSV, upload it to DataStoryBot, get a narrative with charts in under a minute.

This article covers three product analytics use cases — feature adoption analysis, retention cohort analysis, and funnel conversion analysis — with the exact CSV shapes, steering prompts, and example outputs for each.

What You're Working With

Every major product analytics tool exports to CSV. Amplitude, Mixpanel, Heap, PostHog, and Segment all let you export event data, user property data, or aggregated metric tables. The export format varies, but the structure is consistent: each row is either an event, a user, or a time-bucketed aggregate.

DataStoryBot accepts these CSVs directly. The workflow is the same regardless of which tool you exported from:

  1. Export the CSV from your analytics platform
  2. POST it to /api/upload to get a containerId
  3. POST to /api/analyze with a steering prompt focused on your product question
  4. Select the most relevant story angle and POST to /api/refine
  5. Read the narrative, embed the charts

The steering prompt is the critical step — it's what tells the analysis engine to think like a product manager rather than a generic data analyst. See using steering prompts to control analysis direction for the full mechanics.

Use Case 1: Feature Adoption Analysis

The Business Question

You shipped a new feature three months ago. What percentage of eligible users have adopted it? Are there segments that adopt faster? Is adoption accelerating or plateauing?

CSV Shape from Amplitude

Amplitude's "User Composition" export or a custom funnel export gives you user-level data:

user_id,signup_date,plan,company_size,country,first_used_feature_date,total_feature_events_30d,days_since_signup,has_adopted
u_001,2025-11-14,pro,11-50,US,2025-11-20,42,130,true
u_002,2025-11-14,starter,1-10,UK,,0,130,false
u_003,2025-11-15,enterprise,201-500,US,2025-11-16,187,129,true
u_004,2025-11-17,pro,51-200,CA,2025-12-01,23,127,true
u_005,2025-11-18,starter,1-10,DE,,0,126,false
...

Key columns: user identifier, signup date, plan tier, company size, country, the date they first used the feature (blank if never), usage frequency in the last 30 days, and a boolean adoption flag.

The first_used_feature_date column is important — it lets the analysis compute time-to-adoption, not just whether they adopted.
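Time-to-adoption is worth sanity-checking locally before you upload. A minimal sketch in pandas, using a few hypothetical rows in the export shape above:

```python
import pandas as pd

# A few rows in the adoption-export shape shown above (hypothetical values)
df = pd.DataFrame({
    "user_id": ["u_001", "u_002", "u_003"],
    "signup_date": ["2025-11-14", "2025-11-14", "2025-11-15"],
    "first_used_feature_date": ["2025-11-20", None, "2025-11-16"],
})
for col in ["signup_date", "first_used_feature_date"]:
    df[col] = pd.to_datetime(df[col])

# Days from signup to first use; non-adopters (blank date) become NaN
df["days_to_adoption"] = (df["first_used_feature_date"] - df["signup_date"]).dt.days

print(df["days_to_adoption"].tolist())  # [6.0, nan, 1.0]
```

Blank dates parse to NaT, so non-adopters drop out of the median automatically rather than skewing it toward zero.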

Steering Prompt

steering = (
    "Analyze this feature adoption dataset. Each row is a user. "
    "'has_adopted' indicates whether the user has ever used the feature. "
    "'first_used_feature_date' is when they first used it; blank means non-adopter. "
    "'total_feature_events_30d' is usage in the last 30 days. "
    "Focus on: "
    "1) Overall adoption rate — what percentage has adopted? "
    "2) Adoption by plan tier — do Pro or Enterprise users adopt faster than Starter? "
    "3) Time-to-adoption distribution — median days from signup to first use. "
    "4) Usage depth — among adopters, what does the distribution of events look like? "
    "5) Adoption by company size — do larger companies adopt at different rates? "
    "Identify any segments with notably high or low adoption."
)

Example DataStoryBot Output

The narrative from a typical feature adoption analysis:

Overall Adoption: 34% at 90 days

Of 2,847 users who signed up before the feature shipped, 968 have adopted it — a 34% adoption rate at the 90-day mark. This is consistent with industry benchmarks for B2B SaaS features (30-40% at 90 days), though the trend line suggests the curve is flattening: 28% of adoption happened in weeks 1-4, 6% in weeks 9-12.

Plan Tier Drives Adoption

Enterprise users adopt at 71%, Pro at 41%, Starter at 18%. The gap between Enterprise and Starter is larger than company size alone explains — Enterprise accounts receive onboarding calls where the feature is demonstrated directly. This suggests a guided onboarding path for Pro users could lift adoption 10-15 percentage points.

Time-to-Adoption: Adopt in Week 1 or Not at All

Median time-to-adoption among adopters is 6 days. The distribution is strongly bimodal: 61% of adopters used the feature within the first 14 days; only 12% adopted after day 30. Users who don't adopt in the first two weeks are unlikely to adopt at all, making the first-14-day window the critical activation period.

Complete Code

import requests

BASE_URL = "https://datastory.bot/api"

with open("feature_adoption.csv", "rb") as f:
    upload = requests.post(f"{BASE_URL}/upload", files={"file": f})
container_id = upload.json()["containerId"]

angles = requests.post(f"{BASE_URL}/analyze", json={
    "containerId": container_id,
    "steeringPrompt": (
        "Analyze this feature adoption dataset. 'has_adopted' indicates adoption. "
        "Focus on: overall adoption rate, adoption by plan tier, "
        "time-to-adoption distribution, usage depth among adopters, "
        "and adoption by company size."
    )
}).json()

# Pick the most relevant story angle (here, the first one returned)
result = requests.post(f"{BASE_URL}/refine", json={
    "containerId": container_id,
    "selectedStoryTitle": angles[0]["title"]
}).json()

print(result["narrative"])
# Access charts: result["charts"]

Use Case 2: Retention and Cohort Analysis

The Business Question

Are users who signed up in January still active in March? How does retention differ across signup cohorts? Where in the retention curve are users dropping off?

CSV Shape from Mixpanel

Mixpanel's retention report exports a cohort-by-week table. It's a wide format CSV where columns represent weeks since signup:

cohort_week,cohort_size,week_0,week_1,week_2,week_4,week_8,week_12
2025-W40,312,100%,68%,52%,41%,29%,22%
2025-W41,289,100%,71%,55%,44%,31%,24%
2025-W42,341,100%,65%,49%,38%,26%,19%
2025-W43,298,100%,72%,58%,47%,35%,27%
2025-W44,276,100%,69%,53%,42%,30%,23%
2025-W45,315,100%,74%,61%,50%,38%,29%
...

Each row is a signup cohort (weekly). Columns are retention at week 0 (always 100%), week 1, week 2, etc. cohort_size is the number of users who signed up in that week.
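DataStoryBot accepts the wide table as-is, but if you want to chart or join the retention curve yourself, a long format (one row per cohort-week) is usually easier to work with. A sketch with hypothetical values:

```python
import pandas as pd

# Wide cohort table in the shape shown above (hypothetical values)
wide = pd.DataFrame({
    "cohort_week": ["2025-W40", "2025-W41"],
    "cohort_size": [312, 289],
    "week_0": [1.00, 1.00],
    "week_1": [0.68, 0.71],
})

# One row per (cohort, week) is easier to chart, filter, and join
long = wide.melt(
    id_vars=["cohort_week", "cohort_size"],
    var_name="week", value_name="retention",
)
long["week"] = long["week"].str.replace("week_", "", regex=False).astype(int)
print(long.sort_values(["cohort_week", "week"]))
```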

You can also export user-level retention data for more granular analysis:

user_id,cohort_week,plan,region,week_1_active,week_2_active,week_4_active,week_8_active,week_12_active
u_001,2025-W40,pro,US,true,true,true,false,false
u_002,2025-W40,starter,EU,true,false,false,false,false
u_003,2025-W40,enterprise,US,true,true,true,true,true
...

The user-level format is more useful for segment analysis. If you have it, use it.

Steering Prompt

steering = (
    "Analyze this retention cohort data. "
    "Each row represents a signup cohort identified by 'cohort_week'. "
    "Columns week_0 through week_12 show the percentage of that cohort "
    "still active at each week milestone. "
    "Focus on: "
    "1) Average retention curve — what does typical retention look like across weeks? "
    "2) Cohort comparison — are newer cohorts retaining better or worse than older ones? "
    "3) Retention cliff — at which week does the steepest drop occur? "
    "4) Best and worst performing cohorts — what week did they start, and how large were they? "
    "5) Week-12 retention trend — is long-term retention improving over time? "
    "Compute cohort-size-weighted averages where appropriate."
)
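The cohort-size-weighted average the prompt asks for is easy to verify locally. A sketch with hypothetical cohort values, using numpy.average:

```python
import numpy as np
import pandas as pd

# Hypothetical cohort sizes and week-12 retention (as fractions)
df = pd.DataFrame({
    "cohort_size": [312, 289, 341],
    "week_12": [0.22, 0.24, 0.19],
})

# Size-weighted mean: large cohorts count proportionally more than small ones
weighted = np.average(df["week_12"], weights=df["cohort_size"])
print(f"{weighted:.1%}")
```

A simple unweighted mean over cohorts would over-represent small cohorts, which is why the steering prompt calls the weighting out explicitly.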

Example DataStoryBot Output

Retention Curve: 23% Average at Week 12

Across all cohorts, the average retention curve follows a sharp early drop followed by stabilization: 70% at week 1, 54% at week 2, 43% at week 4, 31% at week 8, and 23% at week 12. The steepest drop is between weeks 0 and 1, where 30% of users disengage. The curve flattens significantly after week 4, suggesting that users who reach the one-month mark are likely to become long-term retained users.

Cohort Improvement: +7 Points Over 10 Cohorts

Week-12 retention shows a clear upward trend: the W40 cohort retained at 22%, while the W49 cohort (most recent with 12-week data) retained at 29% — a 7 percentage point improvement. This trend is statistically significant (p < 0.01) and consistent with the product updates shipped in November that improved the onboarding flow.

The Week 1 Cliff is the Problem to Solve

30% of users never return after day 7. This is the highest-leverage retention problem in the data. Improving week-1 retention by 5 percentage points would, at steady state, increase week-12 retention by approximately 3-4 points — larger than any other single intervention available.

For a deeper treatment of cohort comparison methodology, see comparing groups in your data: A/B tests, segments, and cohorts.

Pre-Processing Wide-Format Retention Data

Mixpanel's retention export often stores values as percentage strings ("68%") rather than floats. Strip the percent sign before uploading:

import pandas as pd

df = pd.read_csv("retention_export.csv")

# Convert percentage strings to floats
pct_cols = [c for c in df.columns if c.startswith("week_")]
for col in pct_cols:
    df[col] = df[col].str.rstrip("%").astype(float) / 100

df.to_csv("retention_clean.csv", index=False)

Use Case 3: Funnel Conversion Analysis

The Business Question

What percentage of users who start your signup or upgrade flow complete it? At which step is drop-off highest? Does conversion differ by traffic source, device type, or plan tier?

CSV Shape from Amplitude

Amplitude's funnel analysis exports step-level data. You can export either the aggregated funnel table or user-level event data. The user-level format enables segment analysis:

user_id,session_id,traffic_source,device_type,country,plan_attempted,step_1_viewed_pricing,step_2_clicked_upgrade,step_3_entered_payment,step_4_completed_purchase,time_step1_to_step2_sec,time_step2_to_step3_sec,time_step3_to_step4_sec
u_001,s_a1b2,organic,desktop,US,pro,true,true,true,true,12,45,28
u_002,s_c3d4,paid_search,mobile,UK,pro,true,true,false,false,8,,
u_003,s_e5f6,organic,desktop,US,enterprise,true,false,false,false,,,
u_004,s_g7h8,email,desktop,CA,starter,true,true,true,true,22,67,41
u_005,s_i9j0,paid_social,mobile,AU,pro,true,true,true,false,15,92,
...

Each row is one funnel session. Boolean columns indicate whether the user reached each step. Time-between-steps columns (in seconds) are null when the user didn't reach the next step.
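Step-to-step conversion is worth spot-checking before uploading: among sessions that reached a step, what fraction reached the next one? A sketch over a handful of hypothetical sessions:

```python
import pandas as pd

# Session-level funnel rows in the shape shown above (hypothetical values)
df = pd.DataFrame({
    "step_1_viewed_pricing":     [True, True, True, True, True],
    "step_2_clicked_upgrade":    [True, True, False, True, True],
    "step_3_entered_payment":    [True, False, False, True, True],
    "step_4_completed_purchase": [True, False, False, True, False],
})

steps = list(df.columns)

# Conversion from each step to the next, among sessions that reached it
for prev, nxt in zip(steps, steps[1:]):
    rate = df.loc[df[prev], nxt].mean()
    print(f"{prev} -> {nxt}: {rate:.0%}")
```

Conditioning each rate on sessions that reached the previous step (`df.loc[df[prev], nxt]`) is what distinguishes step conversion from overall reach.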

Steering Prompt

steering = (
    "Analyze this funnel conversion data. Each row is a user session. "
    "Steps are: step_1_viewed_pricing → step_2_clicked_upgrade → "
    "step_3_entered_payment → step_4_completed_purchase. "
    "Focus on: "
    "1) Overall funnel conversion — what percentage completes all four steps? "
    "2) Step-by-step drop-off — which step has the highest abandonment? "
    "3) Conversion by traffic source — organic vs. paid vs. email. "
    "4) Conversion by device type — desktop vs. mobile. "
    "5) Time-in-step analysis — do users who convert spend more or less time at payment? "
    "6) Plan tier differences — does attempted plan affect conversion rate? "
    "Flag any step where mobile conversion is significantly lower than desktop."
)

Example DataStoryBot Output

Overall Conversion: 18.4% End-to-End

Of 3,241 sessions that viewed pricing, 597 completed a purchase — an end-to-end conversion rate of 18.4%. The step-by-step breakdown reveals where loss occurs: 94% proceed from pricing to clicking upgrade, 71% proceed from clicking upgrade to entering payment, and 87% of users who enter payment complete the purchase. Payment entry is the critical drop-off point: 29% of users who click upgrade abandon before entering payment details.

Mobile Conversion at Payment is 41% Lower

Desktop users convert at 22.1%; mobile users convert at 13.0%. The gap is not uniform across steps — mobile and desktop perform similarly at steps 1 and 2. The divergence happens at step 3 (payment entry), where mobile abandonment is 41% higher. This is consistent with payment form friction on smaller screens and suggests a mobile-optimized checkout flow as the highest-priority intervention.

Email Traffic Converts at 2.4x Organic Rate

Conversion by traffic source: email (38.2%), organic (16.1%), paid search (14.8%), paid social (9.3%). Email's high conversion reflects intent — these are existing users responding to an upgrade campaign. Paid social's low conversion (9.3%) combined with its higher acquisition cost suggests a negative ROI on paid social for direct conversion campaigns.

Time at Payment: Converters Spend Less Time

Counterintuitively, users who complete the purchase spend a median of 28 seconds on the payment step; users who abandon spend a median of 67 seconds before leaving. This suggests hesitation — users who are going to convert do so quickly, while users who spend longer are weighing costs and ultimately deciding not to proceed. Reducing friction (fewer form fields, trust signals) may recover some of the high-hesitation abandoners.

Understanding why conversion differs across segments is fundamentally a group comparison problem. Comparing groups in your data covers the statistical approach to validating whether these differences are significant.

Preparing Your Export for Better Analysis

Column Naming Conventions

DataStoryBot's Code Interpreter infers column semantics from names. Use descriptive, consistent names:

  • Date columns: signup_date, event_date, cohort_week — not dt, d1, col_3
  • Boolean flags: has_adopted, is_active, completed_step — not flag, y, 1
  • Metrics: events_last_30d, revenue_usd, sessions_count — include units in the name
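If your export already has cryptic names, a quick rename pass before uploading fixes them. A sketch mapping hypothetical cryptic names to descriptive ones:

```python
import pandas as pd

# Hypothetical export with cryptic column names
df = pd.DataFrame({"dt": ["2025-01-01"], "d1": [True], "col_3": [42]})

# Map cryptic names to descriptive, unit-bearing ones before uploading
df = df.rename(columns={
    "dt": "signup_date",
    "d1": "has_adopted",
    "col_3": "events_last_30d",
})
print(list(df.columns))  # ['signup_date', 'has_adopted', 'events_last_30d']
```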

Handling Amplitude and Mixpanel Export Quirks

Amplitude exports often include metadata rows — the query name and export timestamp — before the actual CSV header. Strip these:

import pandas as pd

# Read the raw export
raw = pd.read_csv("amplitude_export.csv", skiprows=2)  # Skip metadata rows (adjust count to your export)
raw.to_csv("amplitude_clean.csv", index=False)

Mixpanel cohort exports sometimes produce one file per cohort rather than a single combined file. Concatenate them:

import pandas as pd
import glob

files = glob.glob("mixpanel_cohort_*.csv")
dfs = []
for f in files:
    df = pd.read_csv(f)
    df["source_file"] = f  # Track which cohort this came from
    dfs.append(df)

combined = pd.concat(dfs, ignore_index=True)
combined.to_csv("cohorts_combined.csv", index=False)

Anonymizing Before Upload

Product analytics data often contains PII. Strip it before uploading to any external service:

import pandas as pd
import hashlib

df = pd.read_csv("raw_export.csv")

# Hash user IDs (one-way, consistent within this dataset)
df["user_id"] = df["user_id"].apply(
    lambda x: hashlib.sha256(str(x).encode()).hexdigest()[:12]
)

# Drop direct PII columns
pii_cols = ["email", "name", "phone", "ip_address"]
df = df.drop(columns=[c for c in pii_cols if c in df.columns])

df.to_csv("export_anonymized.csv", index=False)

Hashing user IDs preserves the ability to count unique users and track individuals across steps without exposing raw identifiers.

Choosing the Right Analysis for Your Question

| Question | Use Case | Key Column Types |
| --- | --- | --- |
| Who is using the feature? | Feature adoption | User-level with boolean adoption flag |
| Are users coming back? | Retention cohort | Cohort with week-N activity columns |
| Where are users dropping off? | Funnel conversion | Session-level with step boolean flags |
| Which segment retains best? | Cohort + segment | Retention columns + plan/region/size |
| Is my new onboarding working? | Cohort comparison | Compare pre-/post-launch cohorts |
| Why did conversion drop? | Funnel + time series | Funnel data with date dimension |

The analyses above are independent, but the most useful product questions combine them. A feature adoption analysis followed by a retention analysis comparing adopters vs. non-adopters tells you whether the feature actually drives retention — which is the question that matters.
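Mechanically, combining them is a join on user ID. A minimal sketch, with hypothetical user-level data, comparing week-4 activity for adopters vs. non-adopters:

```python
import pandas as pd

# Hypothetical user-level adoption flags and retention activity
adoption = pd.DataFrame({
    "user_id": ["u_001", "u_002", "u_003"],
    "has_adopted": [True, False, True],
})
retention = pd.DataFrame({
    "user_id": ["u_001", "u_002", "u_003"],
    "week_4_active": [True, False, True],
})

# Join the adoption flag onto retention, then compare the two groups
merged = retention.merge(adoption, on="user_id")
print(merged.groupby("has_adopted")["week_4_active"].mean())
```

Upload the merged CSV and steer the analysis toward the adopter vs. non-adopter retention gap — with the usual caveat that adopters may simply be more engaged users to begin with.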

What to Read Next

For the fundamentals of AI-powered data analysis, start with how to use AI to analyze your data.

For controlling analysis direction with steering prompts, see using steering prompts to control analysis direction.

For group comparison methodology — validating that segment differences are statistically meaningful — read comparing groups in your data: A/B tests, segments, and cohorts.

Ready to find your data story?

Upload a CSV and DataStoryBot will uncover the narrative in seconds.

Try DataStoryBot →