forked from minzeyaphyo/burmddit
✅ Trigger redeploy: Category pages + Quality control
204
PIPELINE-AUTOMATION-SETUP.md
Normal file
@@ -0,0 +1,204 @@
# Burmddit Pipeline Automation Setup

## Status: ⏳ READY (Waiting for Anthropic API Key)

- **Date:** 2026-02-20
- **Setup by:** Modo

## What's Done ✅

### 1. Database Connected
- **Host:** 172.26.13.68:5432
- **Database:** burmddit
- **Status:** ✅ Connected successfully
- **Current Articles:** 87 published (from Feb 19)
- **Tables:** 10 (complete schema)

### 2. Dependencies Installed
```text
✅ psycopg2-binary - PostgreSQL driver
✅ python-dotenv - Environment variables
✅ loguru - Logging
✅ beautifulsoup4 - Web scraping
✅ requests - HTTP requests
✅ feedparser - RSS feeds
✅ newspaper3k - Article extraction
✅ anthropic - Claude API client
```

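To confirm that the packages listed above are importable before a run, a minimal sketch follows. The module names on the right-hand side are the names these distributions are normally imported under; the helper `missing_modules` is hypothetical and not part of the Burmddit code.

```python
import importlib.util

# Distribution name (as installed with pip) -> module name used in `import`.
REQUIRED = {
    "psycopg2-binary": "psycopg2",
    "python-dotenv": "dotenv",
    "loguru": "loguru",
    "beautifulsoup4": "bs4",
    "requests": "requests",
    "feedparser": "feedparser",
    "newspaper3k": "newspaper",
    "anthropic": "anthropic",
}


def missing_modules(required):
    """Return the distributions whose module cannot be found by this interpreter."""
    return [
        dist for dist, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("All dependencies found" if not missing else f"Missing: {', '.join(missing)}")
```

Run it with the same `python3` that the pipeline uses, since packages installed for a different interpreter or virtual environment will show up as missing.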
### 3. Configuration Files Created
- ✅ `/backend/.env` - Environment variables (DATABASE_URL configured)
- ✅ `/run-daily-pipeline.sh` - Automation script (executable)
- ✅ `/.credentials` - Secure credentials storage

### 4. Website Status
- ✅ burmddit.com is LIVE
- ✅ Articles displaying correctly
- ✅ Categories working (fixed 2026-02-19)
- ✅ Tags working
- ✅ Frontend pulling from database successfully

## What's Needed ❌

### Anthropic API Key
**Required for:** Article translation (English → Burmese)

**How to get:**
1. Go to https://console.anthropic.com/
2. Sign up for a free account
3. Get an API key from the dashboard
4. Paste the key into the `/backend/.env` file:
```bash
ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxxxxx
```

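After pasting the key, a quick local check can catch the most common mistakes (key missing, placeholder left in place). This is a sketch only: the helper `api_key_problem` is hypothetical, the `sk-ant-` prefix is taken from the example above, and nothing here contacts the Anthropic API, so a key that passes may still be rejected.

```python
import os


def api_key_problem(env):
    """Return a description of what looks wrong with the key, or None if it looks usable."""
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        return "ANTHROPIC_API_KEY is not set"
    # Assumption: keys start with "sk-ant-", as in the example above.
    if not key.startswith("sk-ant-"):
        return "ANTHROPIC_API_KEY does not start with 'sk-ant-'"
    # The all-"x" value from the example is a placeholder, not a real key.
    if set(key[len("sk-ant-"):]) <= {"x"}:
        return "ANTHROPIC_API_KEY is still the placeholder value"
    return None


if __name__ == "__main__":
    # os.environ only contains the key if .env has been loaded or exported first.
    print(api_key_problem(os.environ) or "ANTHROPIC_API_KEY looks plausible")
```

The function takes any mapping, so it can be called with `os.environ` after `load_dotenv()` or with a plain dictionary.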
**Cost:**
- Free: $5 credit (enough for ~150 articles)
- Paid: $15/month for 900 articles (30/day)

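The two cost figures above imply different per-article prices, so both are best read as rough estimates. The arithmetic, using only the numbers stated in this document:

```python
def cost_per_article(total_usd, articles):
    """Average cost in USD for one translated article."""
    return total_usd / articles


free_tier = cost_per_article(5, 150)    # $5 credit, ~150 articles
paid_tier = cost_per_article(15, 900)   # $15/month, 900 articles

print(f"Free-credit estimate: ${free_tier:.4f} per article")
print(f"Paid estimate:        ${paid_tier:.4f} per article")
print(f"Ratio: {free_tier / paid_tier:.1f}x")
```

The free-credit estimate works out to about twice the paid estimate per article, so actual spend should be checked against the API usage dashboard after the first few runs.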
## Automation Setup (Once API Key Added)

### Cron Job Configuration

Add to crontab (`crontab -e`):

```bash
# Burmddit Daily Content Pipeline
# Runs at 9:00 AM Singapore time (UTC+8) = 1:00 AM UTC
# (cron uses the server's local time zone; this schedule assumes the server runs on UTC)
0 1 * * * /home/ubuntu/.openclaw/workspace/burmddit/run-daily-pipeline.sh
```

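The conversion from Singapore time to the UTC cron fields can be double-checked with a short sketch. It uses a fixed UTC+8 offset, which is safe here because Singapore does not observe daylight saving time; the date is the setup date from this document.

```python
from datetime import datetime, timedelta, timezone

# Singapore is UTC+8 all year (no daylight saving time).
SINGAPORE = timezone(timedelta(hours=8))

# 9:00 AM Singapore time on the setup date.
local_run = datetime(2026, 2, 20, 9, 0, tzinfo=SINGAPORE)
utc_run = local_run.astimezone(timezone.utc)

# Cron fields are "minute hour day-of-month month day-of-week".
cron_line = f"{utc_run.minute} {utc_run.hour} * * *"
print(cron_line)  # 0 1 * * *
```

If the server's clock is not on UTC, the hour field has to be adjusted to the server's local time instead.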
This will:
1. **Scrape** 200-300 articles from 8 AI news sources
2. **Cluster** similar articles together
3. **Compile** 30 comprehensive articles, each from 3-5 sources
4. **Translate** each article into casual Burmese using Claude
5. **Extract** 5 images + 3 videos per article
6. **Publish** automatically to burmddit.com

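As a sanity check on the volumes in the steps above, the sketch below works out how many scraped articles the compile step consumes. It assumes each scraped article feeds at most one compiled article, which this document does not state.

```python
ARTICLES_PER_DAY = 30
SOURCES_PER_ARTICLE = (3, 5)   # min, max sources per compiled article
SCRAPED_PER_DAY = (200, 300)   # min, max articles scraped per run

min_needed = ARTICLES_PER_DAY * SOURCES_PER_ARTICLE[0]
max_needed = ARTICLES_PER_DAY * SOURCES_PER_ARTICLE[1]
headroom = SCRAPED_PER_DAY[0] - max_needed

print(f"Source articles needed per day: {min_needed}-{max_needed}")
print(f"Worst-case headroom: {headroom} scraped articles")
```

Even a low-yield scrape of 200 articles covers the 150 needed at 5 sources each, but only if enough of them cluster into topics with at least 3 members.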
### Manual Test Run

Before enabling the cron job, test the pipeline manually:

```bash
cd /home/ubuntu/.openclaw/workspace/burmddit/backend
python3 run_pipeline.py
```

Expected output:
```
✅ Scraped 250 articles from 8 sources
✅ Clustered into 35 topics
✅ Compiled 30 articles (3-5 sources each)
✅ Translated 30 articles to Burmese
✅ Published 30 articles
```

Expected run time: ~90 minutes

## Pipeline Configuration

Current settings in `backend/config.py`:

```python
PIPELINE = {
    'articles_per_day': 30,
    'min_article_length': 600,
    'max_article_length': 1000,
    'sources_per_article': 3,
    'clustering_threshold': 0.6,
    'research_time_minutes': 90,
}
```

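To illustrate what a `clustering_threshold` of 0.6 means in practice, here is a minimal sketch of threshold-based clustering. The similarity measure (Jaccard overlap of title words) and the greedy grouping are assumptions chosen for illustration; this document does not say how the pipeline actually computes similarity. The function names and sample titles are made up.

```python
def jaccard(a, b):
    """Jaccard similarity of two token sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_titles(titles, threshold=0.6):
    """Greedy single-pass clustering: a title joins the first cluster whose
    first member is at least `threshold` similar, else it starts a new cluster."""
    clusters = []  # each cluster is a list of (title, token_set) pairs
    for title in titles:
        tokens = set(title.lower().split())
        for cluster in clusters:
            if jaccard(tokens, cluster[0][1]) >= threshold:
                cluster.append((title, tokens))
                break
        else:
            clusters.append([(title, tokens)])
    return [[title for title, _ in cluster] for cluster in clusters]


# Made-up sample titles, for illustration only.
titles = [
    "OpenAI releases new model",
    "OpenAI releases new model today",
    "Robots learn to cook",
]
print(cluster_titles(titles))
```

The first two titles share 4 of 5 distinct words (similarity 0.8), so they land in one cluster at a threshold of 0.6; raising the threshold above 0.8 would split them.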
### 8 News Sources:
1. Medium (8 AI tags)
2. TechCrunch AI
3. VentureBeat AI
4. MIT Technology Review
5. The Verge AI
6. Wired AI
7. Ars Technica
8. Hacker News (AI/ChatGPT)

## Logs & Monitoring

**Logs location:** `/home/ubuntu/.openclaw/workspace/burmddit/logs/`
- Format: `pipeline-YYYY-MM-DD.log`
- Retention: 30 days

**Check logs:**
```bash
tail -f /home/ubuntu/.openclaw/workspace/burmddit/logs/pipeline-$(date +%Y-%m-%d).log
```

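The naming format and the 30-day retention window can be expressed in a few lines. This document does not say how retention is enforced, so the sketch below is only one possible approach; `log_path` and `expired_logs` are hypothetical helpers.

```python
from datetime import date, timedelta
from pathlib import Path

RETENTION_DAYS = 30


def log_path(log_dir, day):
    """Path of the log file for `day`, in the pipeline-YYYY-MM-DD.log format."""
    return Path(log_dir) / f"pipeline-{day.isoformat()}.log"


def expired_logs(log_dir, today, retention_days=RETENTION_DAYS):
    """Return log files whose filename date is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for path in sorted(Path(log_dir).glob("pipeline-*.log")):
        try:
            day = date.fromisoformat(path.stem[len("pipeline-"):])
        except ValueError:
            continue  # not a dated pipeline log; leave it alone
        if day < cutoff:
            expired.append(path)
    return expired
```

A cleanup job could then call `path.unlink()` on each entry returned by `expired_logs(log_dir, date.today())`.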
**Check database:**
```bash
cd /home/ubuntu/.openclaw/workspace/burmddit/backend
python3 -c "
import psycopg2
from dotenv import load_dotenv
import os

load_dotenv()
conn = psycopg2.connect(os.getenv('DATABASE_URL'))
cur = conn.cursor()

cur.execute('SELECT COUNT(*) FROM articles WHERE status = %s', ('published',))
print(f'Published articles: {cur.fetchone()[0]}')

cur.execute('SELECT MAX(published_at) FROM articles')
print(f'Latest article: {cur.fetchone()[0]}')

cur.close()
conn.close()
"
```

## Troubleshooting

### Issue: Translation fails
**Solution:** Check that `ANTHROPIC_API_KEY` is set correctly in the `.env` file.

### Issue: Scraping fails
**Solution:** Check the internet connection; the source websites may also be down.

### Issue: Database connection fails
**Solution:** Verify `DATABASE_URL` in the `.env` file.

### Issue: No new articles
**Solution:** Check the logs for errors, or increase `articles_per_day` in the config.

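For the database connection issue, the shape of `DATABASE_URL` can be checked without opening a connection. This is a sketch under assumptions: the helper `database_url_problems` is hypothetical, and the user name and password in the usage note are placeholders, not real credentials.

```python
from urllib.parse import urlparse


def database_url_problems(url):
    """Return a list of problems found in a PostgreSQL connection URL."""
    if not url:
        return ["DATABASE_URL is empty or not set"]
    parts = urlparse(url)
    problems = []
    if parts.scheme not in ("postgresql", "postgres"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        problems.append("missing host")
    if not parts.path.lstrip("/"):
        problems.append("missing database name")
    return problems
```

For example, `database_url_problems("postgresql://user:secret@172.26.13.68:5432/burmddit")` returns an empty list, while a URL without a database name returns `["missing database name"]`. A well-formed URL can still fail to connect because of wrong credentials or network restrictions.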
## Next Steps (Once API Key Added)

1. Add the API key to `.env`
2. Test a manual run: `python3 run_pipeline.py`
3. Verify that articles were published
4. Set up the cron job
5. Monitor the first automated run
6. Weekly check: article quality, view counts

## Revenue Target

**Goal:** $5,000/month by Month 12

**Strategy:**
- Month 3: Google AdSense application (need 50+ articles/month ✅)
- Month 6: Affiliate partnerships
- Month 9: Sponsored content
- Month 12: Premium features

**Current Progress:**
- ✅ 87 articles published
- ✅ Categories + tags working
- ✅ SEO-optimized
- ⏳ Automation pending (API key)

## Contact

**Questions?** Ping Modo on Telegram or email modo@xyz-pulse.com

---

**Status:** ⏳ Waiting for Anthropic API key to complete setup
**ETA to Full Automation:** 10 minutes after API key provided