Burmddit Pipeline Automation Setup

Status: READY (Waiting for Anthropic API Key)

Date: 2026-02-20
Setup by: Modo

What's Done

1. Database Connected

  • Host: 172.26.13.68:5432
  • Database: burmddit
  • Status: Connected successfully
  • Current Articles: 87 published (from Feb 19)
  • Tables: 10 (complete schema)

2. Dependencies Installed

✅ psycopg2-binary - PostgreSQL driver
✅ python-dotenv - Environment variables
✅ loguru - Logging
✅ beautifulsoup4 - Web scraping
✅ requests - HTTP requests
✅ feedparser - RSS feeds
✅ newspaper3k - Article extraction
✅ anthropic - Claude API client

3. Configuration Files Created

  • /backend/.env - Environment variables (DATABASE_URL configured)
  • /run-daily-pipeline.sh - Automation script (executable)
  • /.credentials - Secure credentials storage

4. Website Status

  • burmddit.com is LIVE
  • Articles displaying correctly
  • Categories working (fixed yesterday)
  • Tags working
  • Frontend pulling from database successfully

What's Needed

Anthropic API Key

Required for: Article translation (English → Burmese)

How to get:

  1. Go to https://console.anthropic.com/
  2. Sign up for a free account
  3. Copy the API key from the dashboard
  4. Paste the key into the /backend/.env file:
    ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxxxxx
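
Before running the pipeline, it can help to confirm that the key really made it into the file. The sketch below parses `.env`-style text by hand and reports whether `ANTHROPIC_API_KEY` holds something other than the sample value shown above. The helper name `api_key_status` and the placeholder rule are illustrative assumptions, not part of the Burmddit codebase.

```python
def api_key_status(env_text: str) -> str:
    """Return 'missing', 'placeholder', or 'set' for ANTHROPIC_API_KEY.

    Hypothetical helper: it only understands simple KEY=VALUE lines.
    """
    prefix = "sk-ant-"
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() != "ANTHROPIC_API_KEY":
            continue
        value = value.strip().strip('"').strip("'")
        if not value:
            return "missing"
        # Assumption: a value made only of 'x' after the prefix is the
        # sample from this guide that was never replaced.
        if value.startswith(prefix) and set(value[len(prefix):]) <= {"x"}:
            return "placeholder"
        return "set"
    return "missing"
```

To check the real file, pass in its contents, for example `api_key_status(open("backend/.env").read())` from the project root. The function never prints the key itself, so its result is safe to log.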
    

Cost:

  • Free: $5 credit (enough for ~150 articles)
  • Paid: $15/month for 900 articles (30/day)

Automation Setup (Once API Key Added)

Cron Job Configuration

Add to crontab (crontab -e):

# Burmddit Daily Content Pipeline
# Runs at 9:00 AM Singapore time (UTC+8) = 1:00 AM UTC
0 1 * * * /home/ubuntu/.openclaw/workspace/burmddit/run-daily-pipeline.sh

This will:

  1. Scrape 200-300 articles from 8 AI news sources
  2. Cluster similar articles together
  3. Compile 30 comprehensive articles, each drawing on 3-5 sources
  4. Translate to casual Burmese using Claude
  5. Extract 5 images + 3 videos per article
  6. Publish automatically to burmddit.com
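
The crontab entry above only fires at 9:00 AM Singapore time if the server's cron runs on UTC, which is an assumption this guide does not confirm. The short sketch below shows one way to double-check the conversion; the function name `cron_fields_utc` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Singapore observes no daylight saving time, so a fixed UTC+8 offset is enough.
SGT = timezone(timedelta(hours=8))


def cron_fields_utc(local_hour: int, local_minute: int = 0) -> str:
    """Convert a daily Singapore wall-clock time to 'minute hour * * *' in UTC."""
    # The calendar date is arbitrary; only the time of day matters here.
    local = datetime(2026, 2, 20, local_hour, local_minute, tzinfo=SGT)
    utc = local.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * *"


print(cron_fields_utc(9))  # 0 1 * * *
```

If the server turns out to use a different system timezone, the hour field in the crontab line would need to be adjusted accordingly.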

Manual Test Run

Before automation, test the pipeline:

cd /home/ubuntu/.openclaw/workspace/burmddit/backend
python3 run_pipeline.py

Expected output:

✅ Scraped 250 articles from 8 sources
✅ Clustered into 35 topics
✅ Compiled 30 articles (3-5 sources each)
✅ Translated 30 articles to Burmese
✅ Published 30 articles

Time: ~90 minutes

Pipeline Configuration

Current settings in backend/config.py:

PIPELINE = {
    'articles_per_day': 30,
    'min_article_length': 600,
    'max_article_length': 1000,
    'sources_per_article': 3,
    'clustering_threshold': 0.6,
    'research_time_minutes': 90,
}
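
This guide does not describe how `clustering_threshold` and `sources_per_article` are applied inside the pipeline. Purely as an illustration, the sketch below groups headlines greedily by the Jaccard similarity of their word sets and keeps only groups with enough sources. The similarity measure, the greedy strategy, and the function names are all assumptions; the real pipeline may work quite differently.

```python
CLUSTERING_THRESHOLD = 0.6  # mirrors PIPELINE['clustering_threshold']
SOURCES_PER_ARTICLE = 3     # mirrors PIPELINE['sources_per_article']


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words the two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_titles(titles: list[str],
                   threshold: float = CLUSTERING_THRESHOLD) -> list[list[str]]:
    """Greedy clustering: each title joins the first cluster whose seed
    title is similar enough, otherwise it starts a new cluster."""
    clusters: list[tuple[set[str], list[str]]] = []
    for title in titles:
        words = set(title.lower().split())
        for seed_words, members in clusters:
            if jaccard(words, seed_words) >= threshold:
                members.append(title)
                break
        else:
            clusters.append((words, [title]))
    return [members for _, members in clusters]


def usable_clusters(clusters: list[list[str]],
                    minimum: int = SOURCES_PER_ARTICLE) -> list[list[str]]:
    """Keep only clusters with enough sources to compile an article."""
    return [c for c in clusters if len(c) >= minimum]
```

Under this reading, raising the threshold produces more, smaller clusters, so fewer of them would reach the three-source minimum.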

8 News Sources:

  1. Medium (8 AI tags)
  2. TechCrunch AI
  3. VentureBeat AI
  4. MIT Technology Review
  5. The Verge AI
  6. Wired AI
  7. Ars Technica
  8. Hacker News (AI/ChatGPT)

Logs & Monitoring

Logs location: /home/ubuntu/.openclaw/workspace/burmddit/logs/

  • Format: pipeline-YYYY-MM-DD.log
  • Retention: 30 days
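
How the 30-day retention is enforced is not stated here. A minimal sketch, assuming cleanup is driven by the date embedded in each filename, could pick out expired logs like this (the function name `expired_logs` is hypothetical):

```python
from datetime import date, datetime, timedelta

RETENTION_DAYS = 30  # matches the retention period listed above


def expired_logs(filenames, today, retention_days=RETENTION_DAYS):
    """Return names of pipeline-YYYY-MM-DD.log files older than the cutoff."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in filenames:
        if not (name.startswith("pipeline-") and name.endswith(".log")):
            continue  # not a pipeline log
        stamp = name[len("pipeline-"):-len(".log")]
        try:
            log_date = datetime.strptime(stamp, "%Y-%m-%d").date()
        except ValueError:
            continue  # filename does not carry a valid date
        if log_date < cutoff:
            expired.append(name)
    return sorted(expired)
```

The function only selects candidates; deleting them (for example with `pathlib.Path.unlink`) is left to the caller so the list can be reviewed first.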

Check logs:

tail -f /home/ubuntu/.openclaw/workspace/burmddit/logs/pipeline-$(date +%Y-%m-%d).log

Check database:

cd /home/ubuntu/.openclaw/workspace/burmddit/backend
python3 -c "
import psycopg2
from dotenv import load_dotenv
import os

load_dotenv()
conn = psycopg2.connect(os.getenv('DATABASE_URL'))
cur = conn.cursor()

cur.execute('SELECT COUNT(*) FROM articles WHERE status = %s', ('published',))
print(f'Published articles: {cur.fetchone()[0]}')

cur.execute('SELECT MAX(published_at) FROM articles')
print(f'Latest article: {cur.fetchone()[0]}')

cur.close()
conn.close()
"

Troubleshooting

Issue: Translation fails

Solution: Check Anthropic API key in .env file

Issue: Scraping fails

Solution: Check the internet connection; the source websites may also be down

Issue: Database connection fails

Solution: Verify DATABASE_URL in .env file

Issue: No new articles

Solution: Check the logs for errors, or increase articles_per_day in backend/config.py
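
For the log checks suggested above, a small filter can save scrolling through a long run. The sketch below assumes the pipeline writes lines in loguru's default layout, with the level name as the second `|`-separated field; if the project customizes its log format, the parsing would need to change. The function name `error_lines` is illustrative.

```python
ALERT_LEVELS = {"ERROR", "CRITICAL"}


def error_lines(log_text: str) -> list[str]:
    """Return log lines whose level field is ERROR or CRITICAL."""
    hits = []
    for line in log_text.splitlines():
        fields = line.split("|")
        # Expected shape: "<timestamp> | <LEVEL> | <location> - <message>"
        if len(fields) >= 3 and fields[1].strip() in ALERT_LEVELS:
            hits.append(line)
    return hits
```

To use it on a real run, read the day's file from the logs directory and pass its text in, for example `error_lines(Path(log_path).read_text())` with `pathlib.Path`.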

Next Steps (Once API Key Added)

  1. Add API key to .env
  2. Test manual run: python3 run_pipeline.py
  3. Verify articles published
  4. Set up cron job
  5. Monitor first automated run
  6. Weekly check: article quality, view counts

Revenue Target

Goal: $5,000/month by Month 12

Strategy:

  • Month 3: Google AdSense application (need 50+ articles/month)
  • Month 6: Affiliate partnerships
  • Month 9: Sponsored content
  • Month 12: Premium features

Current Progress:

  • 87 articles published
  • Categories + tags working
  • SEO-optimized
  • Automation pending (API key)

Contact

Questions? Ping Modo on Telegram or modo@xyz-pulse.com


Status: Waiting for Anthropic API key to complete setup

ETA to Full Automation: 10 minutes after API key provided