✅ Trigger redeploy: Category pages + Quality control

2026-02-20 02:41:34 +00:00
parent 785910b81d
commit f9c1c1ea10
5 changed files with 756 additions and 0 deletions
--- a/PIPELINE-AUTOMATION-SETUP.md
+++ b/PIPELINE-AUTOMATION-SETUP.md
@@ -0,0 +1,204 @@
+# Burmddit Pipeline Automation Setup
+
+## Status: ⏳ READY (Waiting for Anthropic API Key)
+
+Date: 2026-02-20
+Setup by: Modo
+
+## What's Done ✅
+
+### 1. Database Connected
+- **Host:** 172.26.13.68:5432
+- **Database:** burmddit
+- **Status:** ✅ Connected successfully
+- **Current Articles:** 87 published (from Feb 19)
+- **Tables:** 10 (complete schema)
+
+### 2. Dependencies Installed
+```text
+✅ psycopg2-binary - PostgreSQL driver
+✅ python-dotenv - Environment variables
+✅ loguru - Logging
+✅ beautifulsoup4 - Web scraping
+✅ requests - HTTP requests
+✅ feedparser - RSS feeds
+✅ newspaper3k - Article extraction
+✅ anthropic - Claude API client
+```
+
+### 3. Configuration Files Created
+- ✅ `/backend/.env` - Environment variables (DATABASE_URL configured)
+- ✅ `/run-daily-pipeline.sh` - Automation script (executable)
+- ✅ `/.credentials` - Secure credentials storage
+
+### 4. Website Status
+- ✅ burmddit.com is LIVE
+- ✅ Articles displaying correctly
+- ✅ Categories working (fixed 2026-02-19)
+- ✅ Tags working
+- ✅ Frontend pulling from database successfully
+
+## What's Needed ❌
+
+### Anthropic API Key
+**Required for:** Article translation (English → Burmese)
+
+**How to get:**
+1. Go to https://console.anthropic.com/
+2. Sign up for a free account
+3. Copy the API key from the dashboard
+4. Paste the key into the `/backend/.env` file:
+   ```bash
+   ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxxxxx
+   ```
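+
+Before running the pipeline, it can help to confirm that the key was actually filled in. The following is a minimal sketch, not part of the pipeline: `api_key_problem` is a hypothetical helper, and the `sk-ant-` prefix check is only an assumption taken from the placeholder shown above.
+
+```python
+import os
+
+
+def api_key_problem(env):
+    """Return a description of what looks wrong with the key, or None if it looks usable."""
+    key = env.get("ANTHROPIC_API_KEY", "").strip()
+    if not key:
+        return "ANTHROPIC_API_KEY is missing or empty"
+    if "xxxx" in key:
+        return "ANTHROPIC_API_KEY still holds the placeholder value"
+    if not key.startswith("sk-ant-"):
+        # Assumption: real keys share the prefix used in the placeholder
+        return "ANTHROPIC_API_KEY does not start with sk-ant-"
+    return None
+
+
+if __name__ == "__main__":
+    # Assumes the variables from backend/.env are already loaded into the environment
+    print(api_key_problem(os.environ) or "ANTHROPIC_API_KEY looks set")
+```
+
+This only checks the shape of the value; it does not contact the API, so a revoked or mistyped key would still pass.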
+
+**Cost:**
+- Free: $5 credit (enough for ~150 articles)
+- Paid: $15/month for 900 articles (30/day)
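+
+As a quick sanity check on these estimates, the per-article cost implied by each line can be worked out directly. This sketch uses only the figures listed above; the variable names are illustrative.
+
+```python
+# Figures copied from the Cost list above
+free_credit_usd, free_articles = 5, 150
+paid_usd_per_month, paid_articles = 15, 900
+
+free_cost = free_credit_usd / free_articles      # USD per article implied by the free credit
+paid_cost = paid_usd_per_month / paid_articles   # USD per article implied by the paid estimate
+
+print(f"Free credit: ${free_cost:.3f} per article")
+print(f"Paid estimate: ${paid_cost:.3f} per article")
+print(f"Days covered by the free credit at 30/day: {free_articles / 30:.0f}")
+```
+
+The two lines imply different per-article costs (about $0.033 versus $0.017), so both should be read as rough estimates. At 30 articles per day the free credit would cover about 5 days.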
+
+## Automation Setup (Once API Key Added)
+
+### Cron Job Configuration
+
+Add to crontab (`crontab -e`):
+
+```bash
+# Burmddit Daily Content Pipeline
+# Runs at 9:00 AM Singapore time (UTC+8) = 1:00 AM UTC (assumes the server clock is set to UTC)
+0 1 * * * /home/ubuntu/.openclaw/workspace/burmddit/run-daily-pipeline.sh
+```
+
+This will:
+1. **Scrape** 200-300 articles from 8 AI news sources
+2. **Cluster** similar articles together
+3. **Compile** 30 comprehensive articles, each drawing on 3-5 sources
+4. **Translate** to casual Burmese using Claude
+5. **Extract** 5 images + 3 videos per article
+6. **Publish** automatically to burmddit.com
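+
+The schedule in the cron entry above depends on converting Singapore time to UTC. The sketch below shows one way to double-check that conversion if the run time changes; `sgt_to_utc_cron` is a hypothetical helper, and it assumes the server's cron evaluates times in UTC.
+
+```python
+from datetime import datetime, timedelta, timezone
+
+SGT = timezone(timedelta(hours=8), "SGT")  # Singapore uses a fixed UTC+8 offset
+
+
+def sgt_to_utc_cron(hour, minute=0):
+    """Return 'minute hour * * *' cron fields in UTC for a daily Singapore-time run."""
+    # Any date works here because the offset never changes (no daylight saving)
+    local = datetime(2026, 2, 20, hour, minute, tzinfo=SGT)
+    utc = local.astimezone(timezone.utc)
+    return f"{utc.minute} {utc.hour} * * *"
+
+
+print(sgt_to_utc_cron(9))  # prints "0 1 * * *"
+```
+
+If the server runs in a different time zone, the hour field in the crontab needs to be adjusted accordingly.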
+
+### Manual Test Run
+
+Before automation, test the pipeline:
+
+```bash
+cd /home/ubuntu/.openclaw/workspace/burmddit/backend
+python3 run_pipeline.py
+```
+
+Expected output:
+```
+✅ Scraped 250 articles from 8 sources
+✅ Clustered into 35 topics
+✅ Compiled 30 articles (3-5 sources each)
+✅ Translated 30 articles to Burmese
+✅ Published 30 articles
+```
+
+Time: ~90 minutes
+
+## Pipeline Configuration
+
+Current settings in `backend/config.py`:
+
+```python
+PIPELINE = {
+    'articles_per_day': 30,
+    'min_article_length': 600,
+    'max_article_length': 1000,
+    'sources_per_article': 3,
+    'clustering_threshold': 0.6,
+    'research_time_minutes': 90,
+}
+```
+
+### 8 News Sources:
+1. Medium (8 AI tags)
+2. TechCrunch AI
+3. VentureBeat AI
+4. MIT Technology Review
+5. The Verge AI
+6. Wired AI
+7. Ars Technica
+8. Hacker News (AI/ChatGPT)
+
+## Logs & Monitoring
+
+**Logs location:** `/home/ubuntu/.openclaw/workspace/burmddit/logs/`
+- Format: `pipeline-YYYY-MM-DD.log`
+- Retention: 30 days
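+
+This document does not say how the 30-day retention is enforced. As one possible approach, the sketch below finds log files older than the retention window based on the date in the filename; `expired_logs` is a hypothetical helper, not an existing part of the pipeline.
+
+```python
+from datetime import date, timedelta
+from pathlib import Path
+
+
+def expired_logs(log_dir, today, keep_days=30):
+    """Return pipeline-YYYY-MM-DD.log files dated more than keep_days before today."""
+    cutoff = today - timedelta(days=keep_days)
+    expired = []
+    for path in sorted(Path(log_dir).glob("pipeline-*.log")):
+        try:
+            stamp = date.fromisoformat(path.stem[len("pipeline-"):])
+        except ValueError:
+            continue  # skip files that do not follow the naming format
+        if stamp < cutoff:
+            expired.append(path)
+    return expired
+```
+
+A cleanup job could call `expired_logs(log_dir, date.today())` and delete each returned path with `path.unlink()`; printing the list first is a safer way to try it out.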
+
+**Check logs:**
+```bash
+tail -f /home/ubuntu/.openclaw/workspace/burmddit/logs/pipeline-$(date +%Y-%m-%d).log
+```
+
+**Check database:**
+```bash
+cd /home/ubuntu/.openclaw/workspace/burmddit/backend
+python3 -c "
+import psycopg2
+from dotenv import load_dotenv
+import os
+
+load_dotenv()
+conn = psycopg2.connect(os.getenv('DATABASE_URL'))
+cur = conn.cursor()
+
+cur.execute('SELECT COUNT(*) FROM articles WHERE status = %s', ('published',))
+print(f'Published articles: {cur.fetchone()[0]}')
+
+cur.execute('SELECT MAX(published_at) FROM articles')
+print(f'Latest article: {cur.fetchone()[0]}')
+
+cur.close()
+conn.close()
+"
+```
+
+## Troubleshooting
+
+### Issue: Translation fails
+**Solution:** Check Anthropic API key in `.env` file
+
+### Issue: Scraping fails
+**Solution:** Check the internet connection; a source website may be down
+
+### Issue: Database connection fails
+**Solution:** Verify DATABASE_URL in `.env` file
+
+### Issue: No new articles
+**Solution:** Check the logs for errors; if none appear, try increasing `articles_per_day` in `backend/config.py`
+
+## Next Steps (Once API Key Added)
+
+1. Add the API key to `.env`
+2. Test a manual run: `python3 run_pipeline.py`
+3. Verify that articles were published
+4. Set up the cron job
+5. Monitor the first automated run
+6. Check weekly: article quality, view counts
+
+## Revenue Target
+
+**Goal:** $5,000/month by Month 12
+
+**Strategy:**
+- Month 3: Google AdSense application (need 50+ articles/month ✅)
+- Month 6: Affiliate partnerships
+- Month 9: Sponsored content
+- Month 12: Premium features
+
+**Current Progress:**
+- ✅ 87 articles published
+- ✅ Categories + tags working
+- ✅ SEO-optimized
+- ⏳ Automation pending (API key)
+
+## Contact
+
+**Questions?** Ping Modo on Telegram or modo@xyz-pulse.com
+
+---
+
+**Status:** ⏳ Waiting for Anthropic API key to complete setup
+**ETA to Full Automation:** 10 minutes after API key provided