# Burmddit Pipeline Automation Setup

## Status: ⏳ READY (Waiting for Anthropic API Key)

Date: 2026-02-20
Set up by: Modo

## What's Done ✅

### 1. Database Connected

- **Host:** 172.26.13.68:5432
- **Database:** burmddit
- **Status:** ✅ Connected successfully
- **Current Articles:** 87 published (from Feb 19)
- **Tables:** 10 (complete schema)

### 2. Dependencies Installed

- ✅ psycopg2-binary - PostgreSQL driver
- ✅ python-dotenv - Environment variables
- ✅ loguru - Logging
- ✅ beautifulsoup4 - Web scraping
- ✅ requests - HTTP requests
- ✅ feedparser - RSS feeds
- ✅ newspaper3k - Article extraction
- ✅ anthropic - Claude API client

### 3. Configuration Files Created

- ✅ `/backend/.env` - Environment variables (DATABASE_URL configured)
- ✅ `/run-daily-pipeline.sh` - Automation script (executable)
- ✅ `/.credentials` - Secure credentials storage

### 4. Website Status

- ✅ burmddit.com is LIVE
- ✅ Articles displaying correctly
- ✅ Categories working (fixed yesterday)
- ✅ Tags working
- ✅ Frontend pulling from the database successfully

## What's Needed ❌

### Anthropic API Key

**Required for:** Article translation (English → Burmese)

**How to get one:**

1. Go to https://console.anthropic.com/
2. Sign up for a free account
3. Get an API key from the dashboard
4. Paste the key into the `/backend/.env` file:

```bash
ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxxxxx
```

**Cost:**

- Free: $5 credit (enough for ~150 articles)
- Paid: $15/month for 900 articles (30/day)

## Automation Setup (Once API Key Added)

### Cron Job Configuration

Add to the crontab (`crontab -e`):

```bash
# Burmddit Daily Content Pipeline
# Runs at 9:00 AM Singapore time (UTC+8) = 1:00 AM UTC
0 1 * * * /home/ubuntu/.openclaw/workspace/burmddit/run-daily-pipeline.sh
```

This will:

1. **Scrape** 200-300 articles from 8 AI news sources
2. **Cluster** similar articles together
3. **Compile** 3-5 sources into 30 comprehensive articles
4. **Translate** to casual Burmese using Claude
5. **Extract** 5 images + 3 videos per article
6. **Publish** automatically to burmddit.com

### Manual Test Run

Before enabling automation, test the pipeline:

```bash
cd /home/ubuntu/.openclaw/workspace/burmddit/backend
python3 run_pipeline.py
```

Expected output:

```
✅ Scraped 250 articles from 8 sources
✅ Clustered into 35 topics
✅ Compiled 30 articles (3-5 sources each)
✅ Translated 30 articles to Burmese
✅ Published 30 articles
```

Time: ~90 minutes

## Pipeline Configuration

Current settings in `backend/config.py`:

```python
PIPELINE = {
    'articles_per_day': 30,
    'min_article_length': 600,
    'max_article_length': 1000,
    'sources_per_article': 3,
    'clustering_threshold': 0.6,
    'research_time_minutes': 90,
}
```

### 8 News Sources

1. Medium (8 AI tags)
2. TechCrunch AI
3. VentureBeat AI
4. MIT Technology Review
5. The Verge AI
6. Wired AI
7. Ars Technica
8. Hacker News (AI/ChatGPT)

## Logs & Monitoring

**Logs location:** `/home/ubuntu/.openclaw/workspace/burmddit/logs/`

- Format: `pipeline-YYYY-MM-DD.log`
- Retention: 30 days

**Check logs:**

```bash
tail -f /home/ubuntu/.openclaw/workspace/burmddit/logs/pipeline-$(date +%Y-%m-%d).log
```

**Check the database:**

```bash
cd /home/ubuntu/.openclaw/workspace/burmddit/backend
python3 -c "
import os

import psycopg2
from dotenv import load_dotenv

load_dotenv()
conn = psycopg2.connect(os.getenv('DATABASE_URL'))
cur = conn.cursor()
cur.execute('SELECT COUNT(*) FROM articles WHERE status = %s', ('published',))
print(f'Published articles: {cur.fetchone()[0]}')
cur.execute('SELECT MAX(published_at) FROM articles')
print(f'Latest article: {cur.fetchone()[0]}')
cur.close()
conn.close()
"
```

## Troubleshooting

### Issue: Translation fails

**Solution:** Check the Anthropic API key in the `.env` file.

### Issue: Scraping fails

**Solution:** Check the internet connection; the source websites may be down.

### Issue: Database connection fails

**Solution:** Verify `DATABASE_URL` in the `.env` file.

### Issue: No new articles

**Solution:** Check the logs for errors; increase `articles_per_day` in the config if needed.

## Next Steps (Once API Key Added)

1. Add the API key to `.env`
2. Test a manual run: `python3 run_pipeline.py`
3. Verify articles are published
4. Set up the cron job
5. Monitor the first automated run
6. Weekly check: article quality, view counts

## Revenue Target

**Goal:** $5,000/month by Month 12

**Strategy:**

- Month 3: Google AdSense application (need 50+ articles/month ✅)
- Month 6: Affiliate partnerships
- Month 9: Sponsored content
- Month 12: Premium features

**Current Progress:**

- ✅ 87 articles published
- ✅ Categories + tags working
- ✅ SEO-optimized
- ⏳ Automation pending (API key)

## Contact

**Questions?** Ping Modo on Telegram or modo@xyz-pulse.com

---

**Status:** ⏳ Waiting for Anthropic API key to complete setup
**ETA to Full Automation:** 10 minutes after API key provided
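
## Appendix: Pipeline Sketch (Illustrative)

The six stages the cron job runs (scrape → cluster → compile → translate → extract → publish) can be pictured as a simple orchestrator. The function below is a minimal sketch, not the actual contents of `run_pipeline.py`; every stage function name it takes is a hypothetical placeholder.

```python
# Hypothetical orchestration of the six daily pipeline stages.
# The stage callables are stand-ins; the real run_pipeline.py is not shown here.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def run_daily_pipeline(scrape, cluster, compile_articles, translate,
                       extract_media, publish, articles_per_day=30):
    """Run the stages in order and return the number of published articles."""
    raw = scrape()                       # 1. 200-300 articles from 8 sources
    log.info("Scraped %d articles", len(raw))
    topics = cluster(raw)                # 2. group similar articles
    drafts = compile_articles(topics)[:articles_per_day]  # 3. merge 3-5 sources each
    published = 0
    for draft in drafts:
        body_mm = translate(draft)       # 4. English -> casual Burmese
        media = extract_media(draft)     # 5. images + videos per article
        publish(body_mm, media)          # 6. write to the database -> burmddit.com
        published += 1
    log.info("Published %d articles", published)
    return published
```

Passing the stages in as callables keeps each one independently testable, which matches the "manual test run before automation" advice above.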
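
The `clustering_threshold: 0.6` setting in `backend/config.py` means two articles are grouped only when their similarity score is at least 0.6. As an illustration only (the pipeline's actual similarity metric is not documented here), a greedy clustering over Jaccard overlap of title words behaves like this:

```python
# Illustrative threshold-based clustering; Jaccard overlap of title words
# stands in for whatever similarity metric the real pipeline uses.

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two word sets, in [0.0, 1.0]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster_titles(titles, threshold=0.6):
    """Greedily assign each title to the first cluster whose seed is similar enough."""
    clusters = []  # each cluster: {'seed': set of words, 'titles': [...]}
    for title in titles:
        words = set(title.lower().split())
        for c in clusters:
            if jaccard(words, c['seed']) >= threshold:
                c['titles'].append(title)
                break
        else:  # no cluster matched: start a new one seeded by this title
            clusters.append({'seed': words, 'titles': [title]})
    return clusters
```

Raising the threshold toward 1.0 produces more, smaller topic clusters; lowering it merges loosely related stories, which is the knob to reach for if the "Clustered into 35 topics" count drifts too high or too low.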
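
Once the API key is in `.env`, the Translate stage can call Claude through the installed `anthropic` client. The sketch below shows the general shape of such a call; the prompt wording, function names, and model string are assumptions for illustration, not the pipeline's actual code.

```python
# Sketch of the Translate stage using the `anthropic` Python client.
# The prompt text and model name below are illustrative assumptions.
import os

def build_translation_prompt(article_text: str) -> str:
    """Compose the instruction sent to Claude (hypothetical wording)."""
    return (
        "Translate the following English tech article into casual, "
        "conversational Burmese. Keep product names in English.\n\n"
        + article_text
    )

def translate_article(article_text: str) -> str:
    """Send one article to Claude and return the Burmese text."""
    import anthropic  # reads ANTHROPIC_API_KEY if api_key is not passed
    client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
    message = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model; choose per cost/quality
        max_tokens=4096,
        messages=[
            {"role": "user", "content": build_translation_prompt(article_text)}
        ],
    )
    return message.content[0].text
```

Keeping the prompt in a separate pure function makes it easy to review and test without spending API credits, which matters while the account is still on the $5 free tier.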