Automating Weekly Chart Reports from Live Data
Build a pipeline that pulls data, sends it to DataStoryBot's API, extracts charts, and assembles them into a PDF or email report.
This article builds a pipeline that runs on a schedule, pulls fresh CSV data, sends it through the DataStoryBot API, downloads the generated chart PNGs, and assembles them into either an HTML email or a PDF. The focus is specifically on the chart extraction and assembly step — the part that trips people up when they move from interactive API calls to production automation.
If you want the broader narrative-and-email pipeline, that is covered in automating weekly data reports. And if you need to understand chart downloading mechanics in isolation, download and embed AI-generated charts covers that in depth. This article focuses on the full chart-forward pipeline: getting the charts out reliably and rendering them into a finished report document.
Pipeline Architecture
The pipeline has five stages:
[Cron / Scheduler]
→ [Pull CSV from data source]
→ [DataStoryBot API: upload → analyze → refine]
→ [Extract chart file IDs → download PNGs via /files endpoint]
→ [Assemble into HTML email or PDF]
→ [Deliver]
Each stage is a function with a clear input and output. The container lifetime is 20 minutes from upload — short enough that you cannot afford to be slow between the upload and the chart download steps. Structure the pipeline so chart downloads happen immediately after the refine call returns, before you do anything else with the data.
Prerequisites
pip install requests weasyprint markdown python-dotenv
requests — HTTP calls to the DataStoryBot API
weasyprint — HTML-to-PDF conversion (alternatively use reportlab or pdfkit)
markdown — Markdown-to-HTML rendering for the narrative text
python-dotenv — environment variable management
Depending on your data source, Step 1 may also need boto3 (for S3) or psycopg2 (for Postgres); install those separately.
During the current open beta the API is unauthenticated, so no key is needed. If your account later requires one, supply it via a DATASTORYBOT_API_KEY environment variable.
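If a key does become required, a common pattern is to attach it as a Bearer token on every request. A minimal sketch — the `Authorization: Bearer` scheme is an assumption, so confirm the expected header name against the API docs:

```python
import os


def api_headers() -> dict:
    """Build request headers, attaching the API key only if one is set.

    The Authorization: Bearer scheme is an assumption — check the
    DataStoryBot API docs for the exact header name it expects.
    """
    headers = {}
    api_key = os.environ.get("DATASTORYBOT_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```

Pass `headers=api_headers()` to each `requests` call; when the key is absent the dict is empty and the requests behave exactly as they do today.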
Step 1: Pull CSV Data from Your Source
The pipeline starts by fetching fresh data. The exact implementation depends on your source — a database, an S3 bucket, a REST API, or a file share. Here are three common patterns:
import os
import csv
import io
import boto3
import requests
from datetime import date, timedelta
def fetch_from_s3(bucket: str, key: str) -> bytes:
"""Download a CSV from S3 and return raw bytes."""
s3 = boto3.client("s3")
response = s3.get_object(Bucket=bucket, Key=key)
return response["Body"].read()
def fetch_from_postgres(dsn: str, query: str) -> bytes:
"""Run a query and return results as CSV bytes."""
import psycopg2
conn = psycopg2.connect(dsn)
cur = conn.cursor()
cur.execute(query)
rows = cur.fetchall()
headers = [desc[0] for desc in cur.description]
conn.close()
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(headers)
writer.writerows(rows)
return buf.getvalue().encode("utf-8")
def fetch_from_url(url: str) -> bytes:
"""Fetch a CSV from an HTTP endpoint."""
response = requests.get(url, timeout=30)
response.raise_for_status()
return response.content
The return value is always raw CSV bytes. That keeps the rest of the pipeline source-agnostic.
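Whatever the source, a cheap sanity check before uploading avoids burning a container on bad data. A minimal sketch — the required column names you pass in are whatever your own dataset expects:

```python
import csv
import io


def validate_csv(csv_bytes: bytes, required_columns: set) -> list:
    """Parse the header row and confirm required columns are present.

    Returns the header list; raises ValueError on empty input or
    missing columns, so the pipeline fails before the upload call.
    """
    text = csv_bytes.decode("utf-8-sig")  # tolerate a BOM from Excel exports
    reader = csv.reader(io.StringIO(text))
    try:
        header = next(reader)
    except StopIteration:
        raise ValueError("CSV is empty")
    missing = required_columns - set(header)
    if missing:
        raise ValueError(f"CSV missing expected columns: {sorted(missing)}")
    return header
```

Call it right after the fetch: `validate_csv(csv_bytes, {"date", "revenue"})` (column names illustrative).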
Step 2: Upload and Analyze with DataStoryBot
This is the three-call sequence: upload the CSV, discover story angles, refine the chosen story into a narrative with charts. The key constraint is the 20-minute container TTL — once you upload, you have 20 minutes to complete the analyze and refine calls and download all chart files.
BASE_URL = "https://datastory.bot/api"
def run_analysis(csv_bytes: bytes, filename: str, steering: str | None = None) -> dict:
"""
Upload CSV bytes, run analyze and refine, return the full result.
Returns a dict with:
- container_id: str
- title: str
- narrative: str (Markdown)
- charts: list of {fileId, caption}
"""
# Upload
upload_resp = requests.post(
f"{BASE_URL}/upload",
files={"file": (filename, csv_bytes, "text/csv")},
timeout=60,
)
upload_resp.raise_for_status()
upload_data = upload_resp.json()
container_id = upload_data["containerId"]
print(f"Uploaded {filename}: container {container_id}")
print(f" {upload_data['metadata']['rowCount']} rows, "
f"{upload_data['metadata']['columnCount']} columns")
# Analyze — discover story angles
analyze_payload = {"containerId": container_id}
if steering:
analyze_payload["steeringPrompt"] = steering
analyze_resp = requests.post(
f"{BASE_URL}/analyze",
json=analyze_payload,
timeout=120,
)
analyze_resp.raise_for_status()
stories = analyze_resp.json()
print(f"Found {len(stories)} story angles — selecting: {stories[0]['title']}")
# Refine the top story
refine_resp = requests.post(
f"{BASE_URL}/refine",
json={
"containerId": container_id,
"selectedStoryTitle": stories[0]["title"],
},
timeout=180,
)
refine_resp.raise_for_status()
refine_data = refine_resp.json()
return {
"container_id": container_id,
"title": stories[0]["title"],
"narrative": refine_data["narrative"],
"charts": refine_data.get("charts", []),
}
The timeout values matter for unattended runs. The analyze call can take 30–90 seconds; refine can take up to 3 minutes for complex datasets. Do not omit the timeout argument — requests has no timeout by default, so a hung connection would stall the pipeline until the container expires.
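You can also push retries for transient failures down into the HTTP layer. One approach (retry counts and status codes here are illustrative) is urllib3's Retry mounted on a requests Session:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session() -> requests.Session:
    """Session that retries transient GET failures with exponential backoff.

    Retries are limited to GET so a retried POST cannot accidentally
    create a duplicate container or re-run an analysis.
    """
    retry = Retry(
        total=3,
        backoff_factor=1.0,               # 0s, 1s, 2s between attempts
        status_forcelist=[502, 503, 504],  # retry on gateway errors
        allowed_methods=["GET"],
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session
```

Use `session.get(...)` for the chart downloads in Step 3; leave the POST calls (upload, analyze, refine) to application-level error handling, since retrying them blindly can duplicate work.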
Step 3: Extract Chart URLs and Download PNGs
The refine response contains charts, an array of objects with fileId and caption. The file proxy URL pattern is:
GET https://datastory.bot/api/files/{containerId}/{fileId}
Download all charts immediately after the refine call returns — before any other processing. The container TTL is 20 minutes from upload, and you have already spent some of that time on the analyze and refine calls.
import time
def download_charts(container_id: str, charts: list, output_dir: str = "/tmp") -> list:
"""
Download all chart PNGs from the file proxy.
Returns a list of dicts with:
- path: absolute path to the downloaded PNG
- caption: chart caption from the API response
- file_id: original fileId
- size_bytes: file size
"""
os.makedirs(output_dir, exist_ok=True)
results = []
for i, chart in enumerate(charts):
file_id = chart["fileId"]
url = f"{BASE_URL}/files/{container_id}/{file_id}"
# Retry up to 3 times with backoff
for attempt in range(3):
try:
resp = requests.get(url, timeout=30)
resp.raise_for_status()
break
except requests.RequestException as e:
if attempt == 2:
raise RuntimeError(
f"Failed to download chart {file_id} after 3 attempts: {e}"
)
time.sleep(2 ** attempt)
# Derive a clean filename from the caption
slug = chart["caption"][:60].lower()
slug = "".join(c if c.isalnum() or c == " " else "" for c in slug)
slug = slug.strip().replace(" ", "_")
filename = os.path.join(output_dir, f"chart_{i+1:02d}_{slug}.png")
with open(filename, "wb") as f:
f.write(resp.content)
results.append({
"path": filename,
"caption": chart["caption"],
"file_id": file_id,
"size_bytes": len(resp.content),
})
print(f" Downloaded chart {i+1}: {os.path.basename(filename)} "
f"({len(resp.content) // 1024} KB)")
return results
def run_analysis_and_download(csv_bytes, filename, steering=None, output_dir="/tmp"):
"""Full pipeline: analyze CSV, download charts, return everything needed for assembly."""
result = run_analysis(csv_bytes, filename, steering)
print(f"\nDownloading {len(result['charts'])} charts...")
downloaded = download_charts(result["container_id"], result["charts"], output_dir)
return {
"title": result["title"],
"narrative": result["narrative"],
"charts": downloaded,
"container_id": result["container_id"],
}
Note that download_charts returns the local file path, caption, and size — everything the assembly step needs. The container ID is no longer needed after this point.
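One extra guard worth wiring into download_charts: verify each response body is actually a PNG before writing it, so a truncated download or an unexpected error body fails loudly here instead of surfacing later as a broken image in the report. Every valid PNG starts with the same 8-byte signature:

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def assert_png(data: bytes, file_id: str) -> None:
    """Fail fast if the downloaded bytes are not a PNG.

    All PNG files begin with the fixed 8-byte signature above, so this
    catches truncated downloads and non-image error bodies.
    """
    if not data.startswith(PNG_MAGIC):
        preview = data[:40]
        raise RuntimeError(
            f"Chart {file_id} is not a PNG (response starts with {preview!r})"
        )
```

Call `assert_png(resp.content, file_id)` right after the retry loop, before opening the output file.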
Step 4: Assemble as HTML Email
For email delivery, charts embed as CID (Content-ID) attachments. This is the most compatible approach across Gmail, Outlook, and Apple Mail.
import base64
import markdown as md
from datetime import date
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.image import MIMEImage
import smtplib
def build_html_email_body(title: str, narrative: str, charts: list) -> str:
"""Convert narrative and charts into an HTML email body."""
html_narrative = md.markdown(
narrative,
extensions=["tables", "fenced_code", "nl2br"],
)
charts_section = ""
for chart in charts:
cid = chart["file_id"]
charts_section += f"""
<div style="background-color:#141414;padding:16px;border-radius:8px;margin:20px 0;">
<img src="cid:{cid}"
alt="{chart['caption']}"
width="600"
style="max-width:100%;height:auto;display:block;" />
<p style="color:#999999;font-size:13px;margin:8px 0 0 0;line-height:1.4;">
{chart['caption']}
</p>
</div>
"""
report_date = date.today().strftime("%B %d, %Y")
return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width,initial-scale=1">
</head>
<body style="font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;
color:#222222;max-width:680px;margin:0 auto;padding:24px 16px;">
<p style="color:#888888;font-size:13px;margin:0 0 8px 0;">{report_date}</p>
<h1 style="font-size:22px;font-weight:700;color:#111111;margin:0 0 24px 0;">
{title}
</h1>
<div style="font-size:15px;line-height:1.7;color:#333333;">
{html_narrative}
</div>
<h2 style="font-size:17px;font-weight:600;color:#111111;
margin:36px 0 16px 0;border-top:1px solid #eeeeee;padding-top:24px;">
Charts
</h2>
{charts_section}
<hr style="border:none;border-top:1px solid #eeeeee;margin:36px 0 16px 0;" />
<p style="font-size:12px;color:#aaaaaa;margin:0;">
Generated by <a href="https://datastory.bot" style="color:#aaaaaa;">DataStoryBot</a>
</p>
</body>
</html>"""
def send_email_report(
report: dict,
from_addr: str,
to_addrs: list,
smtp_host: str,
smtp_port: int = 587,
smtp_user: str | None = None,
smtp_password: str | None = None,
) -> None:
"""Send the chart report as an HTML email with inline chart attachments."""
html_body = build_html_email_body(
report["title"], report["narrative"], report["charts"]
)
msg = MIMEMultipart("related")
msg["From"] = from_addr
msg["To"] = ", ".join(to_addrs)
msg["Subject"] = f"Weekly Report: {report['title']}"
# HTML body
msg.attach(MIMEText(html_body, "html", "utf-8"))
# Inline chart attachments
for chart in report["charts"]:
with open(chart["path"], "rb") as f:
img = MIMEImage(f.read(), "png")
img.add_header("Content-ID", f"<{chart['file_id']}>")
img.add_header(
"Content-Disposition", "inline",
filename=os.path.basename(chart["path"])
)
msg.attach(img)
with smtplib.SMTP(smtp_host, smtp_port) as server:
server.ehlo()
server.starttls()
if smtp_user and smtp_password:
server.login(smtp_user, smtp_password)
server.sendmail(from_addr, to_addrs, msg.as_string())
print(f"Email sent to {', '.join(to_addrs)}")
Step 5: Assemble as PDF
For PDF output, WeasyPrint converts HTML to PDF. The same HTML template used for email works here — the only difference is that images are referenced by file path instead of cid: URLs, and the HTML is rendered to a file rather than sent over SMTP.
from weasyprint import HTML, CSS
def build_pdf_html(title: str, narrative: str, charts: list) -> str:
"""Build HTML for PDF rendering. Images reference local paths, not CIDs."""
html_narrative = md.markdown(
narrative,
extensions=["tables", "fenced_code"],
)
charts_section = ""
for chart in charts:
# WeasyPrint reads local files via file:// or absolute path
charts_section += f"""
<div class="chart-block">
<img src="file://{chart['path']}" alt="{chart['caption']}" />
<p class="caption">{chart['caption']}</p>
</div>
"""
report_date = date.today().strftime("%B %d, %Y")
return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
@page {{
size: A4;
margin: 2cm 2.5cm;
}}
body {{
font-family: -apple-system, 'Helvetica Neue', Arial, sans-serif;
font-size: 11pt;
line-height: 1.6;
color: #222222;
}}
h1 {{ font-size: 18pt; font-weight: 700; margin: 0 0 6pt 0; }}
h2 {{ font-size: 14pt; font-weight: 600; margin: 18pt 0 8pt 0; }}
h3 {{ font-size: 12pt; font-weight: 600; margin: 12pt 0 6pt 0; }}
p {{ margin: 0 0 8pt 0; }}
table {{ border-collapse: collapse; width: 100%; margin: 12pt 0; font-size: 9pt; }}
th, td {{ border: 1px solid #cccccc; padding: 4pt 8pt; text-align: left; }}
th {{ background-color: #f5f5f5; font-weight: 600; }}
.date {{ color: #888888; font-size: 9pt; margin: 0 0 12pt 0; }}
.chart-block {{
background-color: #141414;
border-radius: 6pt;
padding: 12pt;
margin: 16pt 0;
page-break-inside: avoid;
}}
.chart-block img {{
max-width: 100%;
height: auto;
display: block;
}}
.caption {{
color: #999999;
font-size: 8.5pt;
margin: 6pt 0 0 0;
}}
.footer {{
margin-top: 24pt;
padding-top: 12pt;
border-top: 1pt solid #eeeeee;
font-size: 8pt;
color: #aaaaaa;
}}
</style>
</head>
<body>
<p class="date">{report_date}</p>
<h1>{title}</h1>
{html_narrative}
<h2>Charts</h2>
{charts_section}
<div class="footer">Generated by DataStoryBot — datastory.bot</div>
</body>
</html>"""
def export_pdf(report: dict, output_path: str) -> str:
"""Render the report to a PDF file. Returns the output path."""
html_string = build_pdf_html(
report["title"], report["narrative"], report["charts"]
)
HTML(string=html_string).write_pdf(output_path)
size_kb = os.path.getsize(output_path) // 1024
print(f"PDF written: {output_path} ({size_kb} KB)")
return output_path
A typical report with three charts produces a PDF between 800 KB and 2 MB depending on chart complexity. WeasyPrint handles page-break-inside: avoid on chart blocks, so charts do not split across pages.
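If PDF size matters — an email attachment limit, for instance — the chart PNGs can be downscaled before rendering. A sketch using Pillow, which is an extra dependency not in the prerequisites above; the 1200px cap is a judgment call:

```python
from PIL import Image


def shrink_chart(path: str, max_width: int = 1200) -> None:
    """Downscale a chart PNG in place if it is wider than max_width.

    thumbnail() preserves aspect ratio and never upscales, so charts
    already below the cap are left untouched.
    """
    with Image.open(path) as img:
        if img.width > max_width:
            img.thumbnail((max_width, max_width))
            img.save(path, optimize=True)
```

Run it over each downloaded chart path before calling export_pdf; at A4 width with 2.5cm margins, 1200px is comfortably above print resolution.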
Step 6: The Complete Pipeline Script
Putting all the pieces together into a single script:
#!/usr/bin/env python3
"""
weekly_chart_report.py — automated weekly chart report pipeline.
Usage:
python weekly_chart_report.py --mode email
python weekly_chart_report.py --mode pdf --output /tmp/report.pdf
"""
import argparse
import os
import sys
import tempfile
from datetime import date
from dotenv import load_dotenv
load_dotenv()
# Configuration from environment
CSV_SOURCE = os.environ["REPORT_CSV_SOURCE"] # s3://bucket/key or https:// URL
STEERING = os.environ.get("REPORT_STEERING", "")
SMTP_HOST = os.environ.get("SMTP_HOST", "smtp.gmail.com")
SMTP_PORT = int(os.environ.get("SMTP_PORT", "587"))
SMTP_USER = os.environ.get("SMTP_USER")
SMTP_PASS = os.environ.get("SMTP_PASSWORD")
FROM_EMAIL = os.environ.get("REPORT_FROM")
TO_EMAILS = os.environ.get("REPORT_TO", "").split(",")
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--mode", choices=["email", "pdf"], default="pdf")
parser.add_argument("--output", default=f"/tmp/report_{date.today()}.pdf")
args = parser.parse_args()
# 1. Fetch CSV
print("Fetching CSV data...")
if CSV_SOURCE.startswith("s3://"):
parts = CSV_SOURCE[5:].split("/", 1)
csv_bytes = fetch_from_s3(parts[0], parts[1])
filename = parts[1].split("/")[-1]
else:
csv_bytes = fetch_from_url(CSV_SOURCE)
filename = CSV_SOURCE.split("/")[-1] or "data.csv"
print(f"Fetched {len(csv_bytes) // 1024} KB from {CSV_SOURCE}")
# 2. Analyze and download charts into a temp directory
with tempfile.TemporaryDirectory(prefix="dsbcharts_") as tmpdir:
report = run_analysis_and_download(
csv_bytes,
filename,
steering=STEERING or None,
output_dir=tmpdir,
)
print(f"\nAnalysis complete: '{report['title']}'")
print(f"{len(report['charts'])} charts downloaded")
# 3. Assemble and deliver
if args.mode == "email":
send_email_report(
report,
from_addr=FROM_EMAIL,
to_addrs=[e.strip() for e in TO_EMAILS if e.strip()],
smtp_host=SMTP_HOST,
smtp_port=SMTP_PORT,
smtp_user=SMTP_USER,
smtp_password=SMTP_PASS,
)
else:
export_pdf(report, args.output)
print(f"\nReport saved to: {args.output}")
if __name__ == "__main__":
main()
The temporary directory ensures chart PNGs are cleaned up after delivery. For PDF mode, the PDF is written to --output before the temp directory is deleted.
Step 7: Schedule with Cron or a Cloud Scheduler
For a Linux/macOS server, add a crontab entry:
# Edit crontab
crontab -e
# Every Monday at 7:00 AM UTC — PDF mode, log to file
0 7 * * 1 /usr/bin/python3 /opt/reports/weekly_chart_report.py \
--mode pdf \
--output /opt/reports/output/report_$(date +\%Y-\%m-\%d).pdf \
>> /var/log/weekly_chart_report.log 2>&1
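One cron-specific gotcha: if a run hangs on a slow data source, next Monday's run can start on top of it. Wrapping the entry in flock (from util-linux) skips a run while the previous one still holds the lock:

```shell
# Same schedule, but -n makes flock exit immediately if the lock is held
0 7 * * 1 /usr/bin/flock -n /tmp/weekly_chart_report.lock \
    /usr/bin/python3 /opt/reports/weekly_chart_report.py \
    --mode pdf \
    --output /opt/reports/output/report_$(date +\%Y-\%m-\%d).pdf \
    >> /var/log/weekly_chart_report.log 2>&1
```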
For cloud environments:
GitHub Actions — add a scheduled workflow:
# .github/workflows/weekly-report.yml
name: Weekly Chart Report
on:
schedule:
- cron: '0 7 * * 1'
workflow_dispatch: # allow manual runs
jobs:
generate-report:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.12'
- name: Install dependencies
run: pip install requests weasyprint markdown python-dotenv
- name: Generate and send report
env:
REPORT_CSV_SOURCE: ${{ secrets.REPORT_CSV_SOURCE }}
REPORT_STEERING: ${{ secrets.REPORT_STEERING }}
SMTP_HOST: ${{ secrets.SMTP_HOST }}
SMTP_USER: ${{ secrets.SMTP_USER }}
SMTP_PASSWORD: ${{ secrets.SMTP_PASSWORD }}
REPORT_FROM: ${{ secrets.REPORT_FROM }}
REPORT_TO: ${{ secrets.REPORT_TO }}
run: python weekly_chart_report.py --mode email
AWS — use EventBridge + Lambda or ECS. For Lambda, the WeasyPrint dependency requires a layer or a container image because of its native library dependencies (libpango, libcairo). ECS is simpler: build a Docker image with the dependencies installed and run it on a schedule.
Google Cloud Scheduler — trigger a Cloud Run job. Cloud Run containers have enough memory and CPU for WeasyPrint without special configuration.
Error Handling
Three failure modes matter for unattended operation:
Container TTL expiry — If more than 20 minutes pass between the upload and a file download, you will get a 404. This should not happen if the pipeline runs sequentially, but it can happen if an intermediate step hangs. Detect it:
resp = requests.get(url, timeout=30)
if resp.status_code == 404:
raise RuntimeError(
f"Container {container_id} expired before chart download completed. "
"Re-run the full pipeline."
)
resp.raise_for_status()
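Rather than only reacting to the 404, you can also track the deadline explicitly: record a monotonic timestamp at upload and check the remaining budget before each download. A sketch — the 20-minute TTL comes from the docs above; the 60-second safety margin is a judgment call:

```python
import time

CONTAINER_TTL_SECONDS = 20 * 60


class ContainerDeadline:
    """Tracks time remaining before the upload container expires."""

    def __init__(self, ttl_seconds: int = CONTAINER_TTL_SECONDS):
        self.started = time.monotonic()  # monotonic: immune to clock changes
        self.ttl = ttl_seconds

    def remaining(self) -> float:
        return self.ttl - (time.monotonic() - self.started)

    def check(self, margin: float = 60.0) -> None:
        """Raise before starting work that cannot finish within the margin."""
        if self.remaining() < margin:
            raise RuntimeError(
                f"Only {self.remaining():.0f}s left on container TTL; "
                "aborting rather than risking a mid-download expiry."
            )
```

Create the deadline right after the upload call and call `deadline.check()` at the top of each loop iteration in download_charts.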
Empty chart list — occasionally the refine call returns no charts (the model determined the story did not need them). Handle it gracefully rather than crashing the assembly step:
if not result["charts"]:
print("Warning: no charts generated. Report will be narrative-only.")
SMTP or PDF write failures — these happen after the analysis is complete, so the work is not lost. Catch delivery failures separately and surface them loudly:
try:
send_email_report(report, ...)
except Exception as e:
# Log the error with context
print(f"Delivery failed: {e}", file=sys.stderr)
# Save the PDF as a fallback
export_pdf(report, f"/tmp/report_unsent_{date.today()}.pdf")
raise
Persisting Charts for Audit Trails
For regulated environments or anywhere you need to reproduce a past report, save the chart PNGs and narrative to durable storage before the temp directory is deleted:
import boto3
import json
def persist_report_to_s3(report: dict, bucket: str, prefix: str) -> dict:
"""Upload charts and narrative to S3 for archival."""
s3 = boto3.client("s3")
today = date.today().isoformat()
stored_charts = []
for chart in report["charts"]:
key = f"{prefix}/{today}/{os.path.basename(chart['path'])}"
s3.upload_file(
chart["path"], bucket, key,
ExtraArgs={"ContentType": "image/png"}
)
stored_charts.append({
"s3_key": key,
"caption": chart["caption"],
"file_id": chart["file_id"],
})
print(f"Archived: s3://{bucket}/{key}")
# Also store the narrative as JSON
meta_key = f"{prefix}/{today}/report_metadata.json"
s3.put_object(
Bucket=bucket,
Key=meta_key,
Body=json.dumps({
"title": report["title"],
"narrative": report["narrative"],
"charts": stored_charts,
"generated_at": today,
}).encode("utf-8"),
ContentType="application/json",
)
return stored_charts
Call persist_report_to_s3 inside the with tempfile.TemporaryDirectory(...) block, before the context manager exits and deletes the files.
What to Read Next
For the mechanics of chart file downloading in detail — including base64 embedding, CID attachments, and optimization — see download and embed AI-generated charts.
For automatic chart generation from CSV without the full report pipeline, see how to generate charts from CSV data automatically.
For the broader automated reporting pipeline including narrative delivery and multi-audience segmentation, see automating weekly data reports.