# Burmddit Pipeline Automation Setup

## Status: ⏳ READY (Waiting for Anthropic API Key)

Date: 2026-02-20
Setup by: Modo

## What's Done ✅

### 1. Database Connected
- **Host:** 172.26.13.68:5432
- **Database:** burmddit
- **Status:** ✅ Connected successfully
- **Current Articles:** 87 published (from Feb 19)
- **Tables:** 10 (complete schema)

### 2. Dependencies Installed

- ✅ psycopg2-binary - PostgreSQL driver
- ✅ python-dotenv - Environment variables
- ✅ loguru - Logging
- ✅ beautifulsoup4 - Web scraping
- ✅ requests - HTTP requests
- ✅ feedparser - RSS feeds
- ✅ newspaper3k - Article extraction
- ✅ anthropic - Claude API client
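
The list above can be captured in a `requirements.txt` so the environment is reproducible. A sketch with unpinned versions (pin exact versions if the deployment needs to be repeatable):

```
psycopg2-binary
python-dotenv
loguru
beautifulsoup4
requests
feedparser
newspaper3k
anthropic
```

Install with `pip install -r requirements.txt`.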
### 3. Configuration Files Created
- ✅ `/backend/.env` - Environment variables (DATABASE_URL configured)
- ✅ `/run-daily-pipeline.sh` - Automation script (executable)
- ✅ `/.credentials` - Secure credentials storage

### 4. Website Status
- ✅ burmddit.com is LIVE
- ✅ Articles displaying correctly
- ✅ Categories working (fixed yesterday)
- ✅ Tags working
- ✅ Frontend pulling from database successfully

## What's Needed ❌

### Anthropic API Key
**Required for:** Article translation (English → Burmese)

**How to get:**
1. Go to https://console.anthropic.com/
2. Sign up for a free account
3. Get an API key from the dashboard
4. Paste the key into the `/backend/.env` file:

```bash
ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxxxxx
```
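
Before running the pipeline, a quick shape check on the key can catch copy/paste mistakes early. A minimal sketch, assuming only that Anthropic keys begin with `sk-ant-`; the helper name is hypothetical and this does not validate the key against the API:

```python
import os
import re

def looks_like_anthropic_key(key):
    """Loose sanity check: Anthropic keys start with 'sk-ant-'.

    This only checks the shape of the string, not whether the key
    is actually valid with the API.
    """
    return bool(re.fullmatch(r"sk-ant-[A-Za-z0-9_-]{10,}", key or ""))

# Read the key from the environment and warn if it looks wrong
key = os.getenv("ANTHROPIC_API_KEY", "")
if not looks_like_anthropic_key(key):
    print("ANTHROPIC_API_KEY is missing or malformed - check /backend/.env")
```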
**Cost:**
- Free: $5 credit (enough for ~150 articles)
- Paid: $15/month for 900 articles (30/day)
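
The per-article cost implied by those figures (derived only from the numbers above, not from Anthropic's actual token pricing):

```python
# Implied per-article translation cost, from the figures above
free_credit = 5.00     # USD of free credit
free_articles = 150    # articles the free credit covers
paid_monthly = 15.00   # USD per month on the paid plan
paid_articles = 900    # articles per month (30/day)

print(f"Free tier: ${free_credit / free_articles:.3f} per article")  # ~$0.033
print(f"Paid plan: ${paid_monthly / paid_articles:.3f} per article")  # ~$0.017
```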
## Automation Setup (Once API Key Added)

### Cron Job Configuration

Add to crontab (`crontab -e`):

```bash
# Burmddit Daily Content Pipeline
# Runs at 9:00 AM Singapore time (UTC+8) = 1:00 AM UTC
0 1 * * * /home/ubuntu/.openclaw/workspace/burmddit/run-daily-pipeline.sh
```
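
Note that cron evaluates the schedule in the server's local timezone (commonly UTC), so the `0 1 * * *` entry only lands at 9:00 AM Singapore time if the server clock is UTC. A quick way to double-check the conversion:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# The cron entry fires at 01:00 UTC; confirm that is 09:00 in Singapore
utc_run = datetime(2026, 2, 20, 1, 0, tzinfo=timezone.utc)
sgt_run = utc_run.astimezone(ZoneInfo("Asia/Singapore"))
print(sgt_run.strftime("%H:%M"))  # 09:00
```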
This will:
1. **Scrape** 200-300 articles from 8 AI news sources
2. **Cluster** similar articles together
3. **Compile** 3-5 sources into 30 comprehensive articles
4. **Translate** to casual Burmese using Claude
5. **Extract** 5 images + 3 videos per article
6. **Publish** automatically to burmddit.com

### Manual Test Run

Before automation, test the pipeline:

```bash
cd /home/ubuntu/.openclaw/workspace/burmddit/backend
python3 run_pipeline.py
```

Expected output:

```
✅ Scraped 250 articles from 8 sources
✅ Clustered into 35 topics
✅ Compiled 30 articles (3-5 sources each)
✅ Translated 30 articles to Burmese
✅ Published 30 articles
```

Time: ~90 minutes
## Pipeline Configuration

Current settings in `backend/config.py`:

```python
PIPELINE = {
    'articles_per_day': 30,
    'min_article_length': 600,
    'max_article_length': 1000,
    'sources_per_article': 3,
    'clustering_threshold': 0.6,
    'research_time_minutes': 90,
}
```
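
To illustrate what `clustering_threshold: 0.6` means in practice, here is a minimal sketch of similarity-based clustering using Jaccard overlap of title words. This is an illustration only, with hypothetical function names; the pipeline's actual clustering method may differ:

```python
def jaccard(a, b):
    """Word-overlap similarity between two titles (0.0 - 1.0)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def cluster_titles(titles, threshold=0.6):
    """Greedy clustering: a title joins the first cluster whose
    representative (first member) is at least `threshold` similar,
    otherwise it starts a new cluster."""
    clusters = []
    for title in titles:
        for cluster in clusters:
            if jaccard(title, cluster[0]) >= threshold:
                cluster.append(title)
                break
        else:
            clusters.append([title])
    return clusters
```

With a 0.6 threshold, "OpenAI releases GPT-5" and "OpenAI releases GPT-5 today" (overlap 3/4 = 0.75) land in the same cluster, while an unrelated headline starts a new one.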
### 8 News Sources:
1. Medium (8 AI tags)
2. TechCrunch AI
3. VentureBeat AI
4. MIT Technology Review
5. The Verge AI
6. Wired AI
7. Ars Technica
8. Hacker News (AI/ChatGPT)

## Logs & Monitoring

**Logs location:** `/home/ubuntu/.openclaw/workspace/burmddit/logs/`
- Format: `pipeline-YYYY-MM-DD.log`
- Retention: 30 days

**Check logs:**
```bash
tail -f /home/ubuntu/.openclaw/workspace/burmddit/logs/pipeline-$(date +%Y-%m-%d).log
```

**Check database:**
```bash
cd /home/ubuntu/.openclaw/workspace/burmddit/backend
python3 -c "
import psycopg2
from dotenv import load_dotenv
import os

load_dotenv()
conn = psycopg2.connect(os.getenv('DATABASE_URL'))
cur = conn.cursor()

cur.execute('SELECT COUNT(*) FROM articles WHERE status = %s', ('published',))
print(f'Published articles: {cur.fetchone()[0]}')

cur.execute('SELECT MAX(published_at) FROM articles')
print(f'Latest article: {cur.fetchone()[0]}')

cur.close()
conn.close()
"
```
|
|
|
|
## Troubleshooting

### Issue: Translation fails
**Solution:** Check the Anthropic API key in the `.env` file.

### Issue: Scraping fails
**Solution:** Check the internet connection; the source websites may be down.

### Issue: Database connection fails
**Solution:** Verify `DATABASE_URL` in the `.env` file.

### Issue: No new articles
**Solution:** Check the logs for errors; increase `articles_per_day` in the config.
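
For the transient failures above (source sites down, flaky connections), a simple retry-with-backoff wrapper around the scraping calls can help. A standard-library sketch; the pipeline's own error handling may already cover this, and `scrape_source` below is a hypothetical name:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(); on exception, retry with exponential backoff
    (base_delay, 2*base_delay, ...). Re-raises the last exception
    if every attempt fails."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Usage (hypothetical): with_retries(lambda: scrape_source(url))
```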
## Next Steps (Once API Key Added)

1. Add API key to `.env`
2. Test manual run: `python3 run_pipeline.py`
3. Verify articles published
4. Set up cron job
5. Monitor first automated run
6. Weekly check: article quality, view counts

## Revenue Target

**Goal:** $5,000/month by Month 12

**Strategy:**
- Month 3: Google AdSense application (need 50+ articles/month ✅)
- Month 6: Affiliate partnerships
- Month 9: Sponsored content
- Month 12: Premium features

**Current Progress:**
- ✅ 87 articles published
- ✅ Categories + tags working
- ✅ SEO-optimized
- ⏳ Automation pending (API key)

## Contact

**Questions?** Ping Modo on Telegram or modo@xyz-pulse.com

---

**Status:** ⏳ Waiting for Anthropic API key to complete setup
**ETA to Full Automation:** 10 minutes after API key provided