Data Analysis for Product Teams: Feature Usage, Retention, Funnel
Export from Amplitude or Mixpanel, upload to DataStoryBot, and get narrative insights about user behavior patterns.
Product analytics platforms are good at dashboards. They are less good at answering the question your CEO just asked in Slack. Amplitude and Mixpanel show you what happened. Turning that into a coherent explanation of why, with supporting evidence, is where things break down.
The typical workflow: export a CSV from your analytics tool, open it in Excel or hand it to a data analyst, wait days for a response. The faster workflow: export the CSV, upload it to DataStoryBot, get a narrative with charts in under a minute.
This article covers three product analytics use cases — feature adoption analysis, retention cohort analysis, and funnel conversion analysis — with the exact CSV shapes, steering prompts, and example outputs for each.
What You're Working With
Every major product analytics tool exports to CSV. Amplitude, Mixpanel, Heap, PostHog, and Segment all let you export event data, user property data, or aggregated metric tables. The export format varies, but the structure is consistent: each row is either an event, a user, or a time-bucketed aggregate.
DataStoryBot accepts these CSVs directly. The workflow is the same regardless of which tool you exported from:
- Export the CSV from your analytics platform
- POST it to /api/upload to get a containerId
- POST to /api/analyze with a steering prompt focused on your product question
- Select the most relevant story angle and POST to /api/refine
- Read the narrative, embed the charts
The steering prompt is the critical step — it's what tells the analysis engine to think like a product manager rather than a generic data analyst. See using steering prompts to control analysis direction for the full mechanics.
Use Case 1: Feature Adoption Analysis
The Business Question
You shipped a new feature three months ago. What percentage of eligible users have adopted it? Are there segments that adopt faster? Is adoption accelerating or plateauing?
CSV Shape from Amplitude
Amplitude's "User Composition" export or a custom funnel export gives you user-level data:
user_id,signup_date,plan,company_size,country,first_used_feature_date,total_feature_events_30d,days_since_signup,has_adopted
u_001,2025-11-14,pro,11-50,US,2025-11-20,42,130,true
u_002,2025-11-14,starter,1-10,UK,,0,130,false
u_003,2025-11-15,enterprise,201-500,US,2025-11-16,187,129,true
u_004,2025-11-17,pro,51-200,CA,2025-12-01,23,127,true
u_005,2025-11-18,starter,1-10,DE,,0,126,false
...
Key columns: user identifier, signup date, plan tier, company size, country, the date they first used the feature (blank if never), usage frequency in the last 30 days, and a boolean adoption flag.
The first_used_feature_date column is important — it lets the analysis compute time-to-adoption, not just whether they adopted.
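A quick way to sanity-check this column before uploading is to compute time-to-adoption locally. A minimal sketch with pandas, using a few hypothetical rows in the shape of the export above:

```python
import io
import pandas as pd

# A few hypothetical rows shaped like the Amplitude export above
csv = io.StringIO(
    "user_id,signup_date,first_used_feature_date,has_adopted\n"
    "u_001,2025-11-14,2025-11-20,true\n"
    "u_002,2025-11-14,,false\n"
    "u_003,2025-11-15,2025-11-16,true\n"
)
df = pd.read_csv(csv, parse_dates=["signup_date", "first_used_feature_date"])
# Normalize the boolean flag regardless of how the parser typed it
df["has_adopted"] = df["has_adopted"].astype(str).str.lower().eq("true")

# Days from signup to first use; NaN for users who never adopted
df["days_to_adoption"] = (df["first_used_feature_date"] - df["signup_date"]).dt.days

adoption_rate = df["has_adopted"].mean()       # 2 of 3 users here
median_days = df["days_to_adoption"].median()  # median over adopters only
```

Non-adopters produce a blank date and therefore NaN, so the median is naturally computed over adopters only.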
Steering Prompt
steering = (
    "Analyze this feature adoption dataset. Each row is a user. "
    "'has_adopted' indicates whether the user has ever used the feature. "
    "'first_used_feature_date' is when they first used it; blank means non-adopter. "
    "'total_feature_events_30d' is usage in the last 30 days. "
    "Focus on: "
    "1) Overall adoption rate — what percentage has adopted? "
    "2) Adoption by plan tier — do Pro or Enterprise users adopt faster than Starter? "
    "3) Time-to-adoption distribution — median days from signup to first use. "
    "4) Usage depth — among adopters, what does the distribution of events look like? "
    "5) Adoption by company size — do larger companies adopt at different rates? "
    "Identify any segments with notably high or low adoption."
)
Example DataStoryBot Output
The narrative from a typical feature adoption analysis:
Overall Adoption: 34% at 90 days
Of 2,847 users who signed up before the feature shipped, 968 have adopted it — a 34% adoption rate at the 90-day mark. This is consistent with industry benchmarks for B2B SaaS features (30-40% at 90 days), though the trend line suggests the curve is flattening: most adoption happened in weeks 1-4, and only 6% of adopters arrived in weeks 9-12.
Plan Tier Drives Adoption
Enterprise users adopt at 71%, Pro at 41%, Starter at 18%. The gap between Enterprise and Starter is larger than company size alone explains — Enterprise accounts receive onboarding calls where the feature is demonstrated directly. This suggests a guided onboarding path for Pro users could lift adoption 10-15 percentage points.
Time-to-Adoption: Adopt in Week 1 or Not at All
Median time-to-adoption among adopters is 6 days. The distribution is strongly bimodal: 61% of adopters used the feature within the first 14 days; only 12% adopted after day 30. Users who don't adopt in the first two weeks are unlikely to adopt at all, making the first-14-day window the critical activation period.
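Segment breakdowns like the plan-tier split are easy to spot-check locally before or after uploading. A sketch with pandas on hypothetical rows:

```python
import pandas as pd

# Hypothetical user rows: plan tier plus the boolean adoption flag
df = pd.DataFrame({
    "plan": ["pro", "pro", "starter", "enterprise", "starter", "enterprise"],
    "has_adopted": [True, False, False, True, True, True],
})

# Share of adopters within each plan tier, highest first
by_tier = df.groupby("plan")["has_adopted"].mean().sort_values(ascending=False)
```

With real export data, the same two lines reproduce the Enterprise/Pro/Starter ordering the narrative reports.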
Complete Code
import requests
BASE_URL = "https://datastory.bot/api"
with open("feature_adoption.csv", "rb") as f:
    upload = requests.post(f"{BASE_URL}/upload", files={"file": f})
container_id = upload.json()["containerId"]

angles = requests.post(f"{BASE_URL}/analyze", json={
    "containerId": container_id,
    "steeringPrompt": (
        "Analyze this feature adoption dataset. 'has_adopted' indicates adoption. "
        "Focus on: overall adoption rate, adoption by plan tier, "
        "time-to-adoption distribution, usage depth among adopters, "
        "and adoption by company size."
    )
}).json()

# Pick the adoption rate angle
result = requests.post(f"{BASE_URL}/refine", json={
    "containerId": container_id,
    "selectedStoryTitle": angles[0]["title"]
}).json()

print(result["narrative"])
# Access charts: result["charts"]
Use Case 2: Retention and Cohort Analysis
The Business Question
Are users who signed up in January still active in March? How does retention differ across signup cohorts? Where in the retention curve are users dropping off?
CSV Shape from Mixpanel
Mixpanel's retention report exports a cohort-by-week table. It's a wide format CSV where columns represent weeks since signup:
cohort_week,cohort_size,week_0,week_1,week_2,week_4,week_8,week_12
2025-W40,312,100%,68%,52%,41%,29%,22%
2025-W41,289,100%,71%,55%,44%,31%,24%
2025-W42,341,100%,65%,49%,38%,26%,19%
2025-W43,298,100%,72%,58%,47%,35%,27%
2025-W44,276,100%,69%,53%,42%,30%,23%
2025-W45,315,100%,74%,61%,50%,38%,29%
...
Each row is a weekly signup cohort. Columns show retention at week 0 (always 100%) and at later milestone weeks (1, 2, 4, 8, 12). cohort_size is the number of users who signed up in that week.
You can also export user-level retention data for more granular analysis:
user_id,cohort_week,plan,region,week_1_active,week_2_active,week_4_active,week_8_active,week_12_active
u_001,2025-W40,pro,US,true,true,true,false,false
u_002,2025-W40,starter,EU,true,false,false,false,false
u_003,2025-W40,enterprise,US,true,true,true,true,true
...
The user-level format is more useful for segment analysis. If you have it, use it.
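With the user-level format, segment retention is a one-line groupby. A minimal sketch, on hypothetical rows shaped like the export above:

```python
import pandas as pd

# Hypothetical rows in the user-level shape shown above
df = pd.DataFrame({
    "user_id": ["u_001", "u_002", "u_003", "u_004"],
    "plan": ["pro", "starter", "enterprise", "pro"],
    "week_12_active": [False, False, True, True],
})

# Fraction of each plan's users still active at week 12
week12_by_plan = df.groupby("plan")["week_12_active"].mean()
```

Swap "plan" for "region" (or any other property column) to slice retention along a different segment.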
Steering Prompt
steering = (
    "Analyze this retention cohort data. "
    "Each row represents a signup cohort identified by 'cohort_week'. "
    "Columns week_0 through week_12 show the percentage of that cohort "
    "still active at each week milestone. "
    "Focus on: "
    "1) Average retention curve — what does typical retention look like across weeks? "
    "2) Cohort comparison — are newer cohorts retaining better or worse than older ones? "
    "3) Retention cliff — at which week does the steepest drop occur? "
    "4) Best and worst performing cohorts — what week did they start, and how large were they? "
    "5) Week-12 retention trend — is long-term retention improving over time? "
    "Compute cohort-size-weighted averages where appropriate."
)
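The cohort-size-weighted average the prompt asks for matters because cohorts differ in size: an unweighted mean over-represents small cohorts. A sketch of the computation with numpy, using the first three cohorts from the table above:

```python
import numpy as np
import pandas as pd

# Week-12 retention for three cohorts from the table above
df = pd.DataFrame({
    "cohort_week": ["2025-W40", "2025-W41", "2025-W42"],
    "cohort_size": [312, 289, 341],
    "week_12": [0.22, 0.24, 0.19],
})

# Weight each cohort's retention by how many users it contains
weighted_w12 = np.average(df["week_12"], weights=df["cohort_size"])
unweighted_w12 = df["week_12"].mean()
```

Here the large W42 cohort retained worst, so the weighted figure lands slightly below the unweighted mean.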
Example DataStoryBot Output
Retention Curve: 23% Average at Week 12
Across all cohorts, the average retention curve follows a sharp early drop followed by stabilization: 70% at week 1, 54% at week 2, 43% at week 4, 31% at week 8, and 23% at week 12. The steepest drop is between weeks 0 and 1, where 30% of users disengage. The curve flattens significantly after week 4, suggesting that users who reach the one-month mark are likely to become long-term retained users.
Cohort Improvement: +7 Points Over 10 Cohorts
Week-12 retention shows a clear upward trend: the W40 cohort retained at 22%, while the W49 cohort (most recent with 12-week data) retained at 29% — a 7 percentage point improvement. This trend is statistically significant (p < 0.01) and consistent with the product updates shipped in November that improved the onboarding flow.
The Week-1 Cliff Is the Problem to Solve
30% of users never return after day 7. This is the highest-leverage retention problem in the data. Improving week-1 retention by 5 percentage points would, at steady state, increase week-12 retention by approximately 3-4 points — larger than any other single intervention available.
For a deeper treatment of cohort comparison methodology, see comparing groups in your data: A/B tests, segments, and cohorts.
Pre-Processing Wide-Format Retention Data
Mixpanel's export often stores retention values as percentage strings ("68%") rather than floats. Strip the percent sign before uploading:
import pandas as pd
df = pd.read_csv("retention_export.csv")
# Convert percentage strings to floats
pct_cols = [c for c in df.columns if c.startswith("week_")]
for col in pct_cols:
    df[col] = df[col].str.rstrip("%").astype(float) / 100
df.to_csv("retention_clean.csv", index=False)
Use Case 3: Funnel Conversion Analysis
The Business Question
What percentage of users who start your signup or upgrade flow complete it? At which step is drop-off highest? Does conversion differ by traffic source, device type, or plan tier?
CSV Shape from Amplitude
Amplitude's funnel analysis exports step-level data. You can export either the aggregated funnel table or user-level event data. The user-level format enables segment analysis:
user_id,session_id,traffic_source,device_type,country,plan_attempted,step_1_viewed_pricing,step_2_clicked_upgrade,step_3_entered_payment,step_4_completed_purchase,time_step1_to_step2_sec,time_step2_to_step3_sec,time_step3_to_step4_sec
u_001,s_a1b2,organic,desktop,US,pro,true,true,true,true,12,45,28
u_002,s_c3d4,paid_search,mobile,UK,pro,true,true,false,false,8,,
u_003,s_e5f6,organic,desktop,US,enterprise,true,false,false,false,,,
u_004,s_g7h8,email,desktop,CA,starter,true,true,true,true,22,67,41
u_005,s_i9j0,paid_social,mobile,AU,pro,true,true,true,false,15,92,
...
Each row is one funnel session. Boolean columns indicate whether the user reached each step. Time-between-steps columns (in seconds) are null when the user didn't reach the next step.
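Step-to-step conversion from this shape reduces to conditional means over the boolean columns. A sketch with pandas on a handful of hypothetical sessions:

```python
import pandas as pd

steps = [
    "step_1_viewed_pricing",
    "step_2_clicked_upgrade",
    "step_3_entered_payment",
    "step_4_completed_purchase",
]
# Five hypothetical sessions; True means the session reached that step
df = pd.DataFrame({
    "step_1_viewed_pricing":     [True, True, True, True, True],
    "step_2_clicked_upgrade":    [True, True, False, True, True],
    "step_3_entered_payment":    [True, False, False, True, True],
    "step_4_completed_purchase": [True, False, False, True, False],
})

# Conversion from each step to the next, conditioned on reaching the step
step_conversion = {
    f"{a} -> {b}": df.loc[df[a], b].mean()
    for a, b in zip(steps, steps[1:])
}
overall = df[steps[-1]].mean()  # end-to-end: completed / all sessions
```

Filtering on `df[a]` before taking the mean of the next column is what makes each rate conditional on reaching the previous step, matching how funnel tools report drop-off.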
Steering Prompt
steering = (
    "Analyze this funnel conversion data. Each row is a user session. "
    "Steps are: step_1_viewed_pricing → step_2_clicked_upgrade → "
    "step_3_entered_payment → step_4_completed_purchase. "
    "Focus on: "
    "1) Overall funnel conversion — what percentage completes all four steps? "
    "2) Step-by-step drop-off — which step has the highest abandonment? "
    "3) Conversion by traffic source — organic vs. paid vs. email. "
    "4) Conversion by device type — desktop vs. mobile. "
    "5) Time-in-step analysis — do users who convert spend more or less time at payment? "
    "6) Plan tier differences — does attempted plan affect conversion rate? "
    "Flag any step where mobile conversion is significantly lower than desktop."
)
Example DataStoryBot Output
Overall Conversion: 18.4% End-to-End
Of 3,241 sessions that viewed pricing, 597 completed a purchase — an end-to-end conversion rate of 18.4%. The step-by-step breakdown shows where loss occurs: 30% proceed from pricing to clicking upgrade, 71% proceed from clicking upgrade to entering payment, and 87% of users who enter payment complete the purchase. The pricing-to-click drop largely reflects low-intent browsing; among users who signal intent by clicking upgrade, payment entry is the critical drop-off point: 29% abandon before entering payment details.
Mobile Conversion Is 41% Lower, Driven by Payment Entry
Desktop users convert at 22.1%; mobile users convert at 13.0%. The gap is not uniform across steps — mobile and desktop perform similarly at steps 1 and 2. The divergence happens at step 3 (payment entry), where mobile abandonment is 41% higher. This is consistent with payment form friction on smaller screens and suggests a mobile-optimized checkout flow as the highest-priority intervention.
Email Traffic Converts at 2.4x Organic Rate
Conversion by traffic source: email (38.2%), organic (16.1%), paid search (14.8%), paid social (9.3%). Email's high conversion reflects intent — these are existing users responding to an upgrade campaign. Paid social's low conversion (9.3%) combined with its higher acquisition cost suggests a negative ROI on paid social for direct conversion campaigns.
Time at Payment: Converters Spend Less Time
Counterintuitively, users who complete the purchase spend a median of 28 seconds on the payment step; users who abandon spend a median of 67 seconds before leaving. This suggests hesitation — users who are going to convert do so quickly, while users who spend longer are weighing costs and ultimately deciding not to proceed. Reducing friction (fewer form fields, trust signals) may recover some of the high-hesitation abandoners.
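The dwell-time comparison above needs a duration that is recorded for abandoners too; the export shown earlier only has time_step3_to_step4_sec, which is null when the user never completes. A sketch assuming a hypothetical time_on_payment_sec column captured for every session:

```python
import pandas as pd

# Hypothetical sessions; time_on_payment_sec is an assumed dwell-time column
df = pd.DataFrame({
    "step_4_completed_purchase": [True, False, True, False, True],
    "time_on_payment_sec": [28, 70, 30, 64, 26],
})

# Median dwell time at the payment step, split by outcome
medians = df.groupby("step_4_completed_purchase")["time_on_payment_sec"].median()
converter_median = medians.loc[True]   # sessions that completed the purchase
abandoner_median = medians.loc[False]  # sessions that left at payment
```

Medians are preferable to means here because dwell times are heavily right-skewed.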
Understanding why conversion differs across segments is fundamentally a group comparison problem. Comparing groups in your data covers the statistical approach to validating whether these differences are significant.
Preparing Your Export for Better Analysis
Column Naming Conventions
DataStoryBot's Code Interpreter infers column semantics from names. Use descriptive, consistent names:
- Date columns: signup_date, event_date, cohort_week — not dt, d1, col_3
- Boolean flags: has_adopted, is_active, completed_step — not flag, y, 1
- Metrics: events_last_30d, revenue_usd, sessions_count — include units in the name
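If an export arrives with cryptic names, a rename pass before uploading is cheap. A sketch with pandas (the cryptic names and the mapping are illustrative):

```python
import pandas as pd

# Hypothetical export with uninformative column names
df = pd.DataFrame({"dt": ["2026-01-05"], "flag": [True], "col_3": [42]})

# Map cryptic export names to descriptive ones before upload
df = df.rename(columns={
    "dt": "signup_date",
    "flag": "has_adopted",
    "col_3": "events_last_30d",
})
```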
Handling Amplitude and Mixpanel Export Quirks
Amplitude exports often include a header row with the query name and export timestamp before the actual CSV headers. Strip these:
import pandas as pd
# Read the raw export
raw = pd.read_csv("amplitude_export.csv", skiprows=2) # Skip metadata rows
raw.to_csv("amplitude_clean.csv", index=False)
Mixpanel cohort exports sometimes produce one file per cohort rather than a single combined file. Concatenate them:
import pandas as pd
import glob
files = glob.glob("mixpanel_cohort_*.csv")
dfs = []
for f in files:
    df = pd.read_csv(f)
    df["source_file"] = f  # Track which cohort this came from
    dfs.append(df)
combined = pd.concat(dfs, ignore_index=True)
combined.to_csv("cohorts_combined.csv", index=False)
Anonymizing Before Upload
Product analytics data often contains PII. Strip it before uploading to any external service:
import pandas as pd
import hashlib
df = pd.read_csv("raw_export.csv")
# Hash user IDs (one-way, consistent within this dataset)
df["user_id"] = df["user_id"].apply(
    lambda x: hashlib.sha256(str(x).encode()).hexdigest()[:12]
)
# Drop direct PII columns
pii_cols = ["email", "name", "phone", "ip_address"]
df = df.drop(columns=[c for c in pii_cols if c in df.columns])
df.to_csv("export_anonymized.csv", index=False)
Hashing user IDs preserves the ability to count unique users and track individuals across steps without exposing raw identifiers.
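The consistency property is easy to verify: SHA-256 is deterministic, so the same raw ID always maps to the same token, while distinct IDs map to distinct tokens (collisions in a 12-hex-character prefix are possible in principle but vanishingly unlikely at product-analytics scale):

```python
import hashlib

def anonymize(user_id: str) -> str:
    # Deterministic one-way token: same input, same 12-char output
    return hashlib.sha256(user_id.encode()).hexdigest()[:12]

# The same user hashes identically across files and funnel steps
token_a = anonymize("u_001")
token_b = anonymize("u_001")
token_c = anonymize("u_002")
```

Because the mapping is stable across files, the same user can still be joined across a funnel export and a retention export after anonymization.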
Choosing the Right Analysis for Your Question
| Question | Use Case | Key Column Types |
|---|---|---|
| Who is using the feature? | Feature adoption | User-level with boolean adoption flag |
| Are users coming back? | Retention cohort | Cohort with week-N activity columns |
| Where are users dropping off? | Funnel conversion | Session-level with step boolean flags |
| Which segment retains best? | Cohort + segment | Retention columns + plan/region/size |
| Is my new onboarding working? | Cohort comparison | Compare pre/post launch cohorts |
| Why did conversion drop? | Funnel + time series | Funnel data with date dimension |
The analyses above are independent, but the most useful product questions combine them. A feature adoption analysis followed by a retention analysis comparing adopters vs. non-adopters tells you whether the feature actually drives retention — which is the question that matters.
What to Read Next
For the fundamentals of AI-powered data analysis, start with how to use AI to analyze your data.
For controlling analysis direction with steering prompts, see using steering prompts to control analysis direction.
For group comparison methodology — validating that segment differences are statistically meaningful — read comparing groups in your data: A/B tests, segments, and cohorts.
Ready to find your data story?
Upload a CSV and DataStoryBot will uncover the narrative in seconds.
Try DataStoryBot →