# 🚀 Burmddit: Next Steps (START HERE)

**Created:** 2026-02-26
**Priority:** 🔥 CRITICAL
**Status:** Action Required
## 🎯 The Problem

**burmddit.com is broken:**

- ❌ 0 articles scraped in the last 5 days
- ❌ Stuck at 87 articles (last update: Feb 21)
- ❌ All 8 news sources failing
- ❌ Pipeline runs daily but produces nothing

**Root cause:** newspaper3k library failures + scraping errors
## ✅ What I've Done (Last 30 Minutes)

### 1. Research & Analysis
- ✅ Identified all scraper errors from logs
- ✅ Researched 100+ AI news RSS feeds
- ✅ Found 22 high-quality new sources to add

### 2. Planning Documents Created
- ✅ `SCRAPER-IMPROVEMENT-PLAN.md` - Detailed technical plan
- ✅ `BURMDDIT-TASKS.md` - Day-by-day task tracker
- ✅ `NEXT-STEPS.md` - This file (action plan)

### 3. Monitoring Scripts Created
- ✅ `scripts/check-pipeline-health.sh` - Quick health check
- ✅ `scripts/source-stats.py` - Source performance stats
- ✅ Updated `HEARTBEAT.md` - Auto-monitoring every 2 hours
## 🔥 What Needs to Happen Next (Priority Order)

### TODAY (Next 4 hours)

**1. Install dependencies (5 min)**

```bash
cd /home/ubuntu/.openclaw/workspace/burmddit/backend
pip3 install trafilatura readability-lxml fake-useragent lxml_html_clean
```

**2. Create improved scraper (2 hours)**
- File: `backend/scraper_v2.py`
- Features:
  - Multi-method extraction (newspaper → trafilatura → beautifulsoup)
  - User agent rotation
  - Better error handling
  - Retry logic with exponential backoff
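As a rough illustration of what this extraction chain could look like, here is a minimal sketch of the fallback-plus-backoff logic. All function and parameter names are assumptions (this is not the actual `scraper_v2.py` API), and the third beautifulsoup layer is omitted for brevity:

```python
import random
import time
from typing import Callable, List, Optional

def try_newspaper(url: str) -> Optional[str]:
    """Layer 1: newspaper3k (lazy import so a missing dep fails soft)."""
    try:
        from newspaper import Article
        article = Article(url)
        article.download()
        article.parse()
        return article.text or None
    except Exception:
        return None

def try_trafilatura(url: str) -> Optional[str]:
    """Layer 2: trafilatura, often more robust on modern layouts."""
    try:
        import trafilatura
        html = trafilatura.fetch_url(url)
        return trafilatura.extract(html) if html else None
    except Exception:
        return None

def extract_article(url: str,
                    extractors: Optional[List[Callable]] = None,
                    retries: int = 3,
                    min_length: int = 200) -> Optional[str]:
    """Try each extractor in order; retry the whole chain with exponential backoff."""
    extractors = extractors or [try_newspaper, try_trafilatura]
    for attempt in range(retries):
        for extract in extractors:
            text = extract(url)
            if text and len(text) >= min_length:  # reject boilerplate-only results
                return text
        if attempt < retries - 1:
            time.sleep(2 ** attempt + random.random())  # backoff: ~1s, ~2s, ...
    return None
```

Because `extract_article` takes the extractor list as a parameter, each layer can be unit-tested (or disabled) independently.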
**3. Test individual sources (1 hour)**
- Create a `test_source.py` script
- Test each of the 8 existing sources
- Identify which ones work
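A stdlib-only sketch of what `test_source.py` could check — the `SOURCES` sample and function names here are illustrative (the real source list lives in `backend/config.py`, and a production version would likely use feedparser instead of raw XML parsing):

```python
# Fetch each RSS/Atom feed and count its entries; 0 entries = broken source.
import urllib.request
import xml.etree.ElementTree as ET

SOURCES = {
    "MarkTechPost": "https://www.marktechpost.com/feed/",
    "KDnuggets": "https://www.kdnuggets.com/feed",
}

def count_entries(xml_text: str) -> int:
    """Count RSS <item> or Atom <entry> elements in a feed document."""
    root = ET.fromstring(xml_text)
    rss_items = root.findall(".//item")
    atom_entries = root.findall(".//{http://www.w3.org/2005/Atom}entry")
    return len(rss_items) + len(atom_entries)

def check_source(name: str, url: str, timeout: int = 15):
    """Return (name, entry_count, error_or_None) for one source."""
    try:
        req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
        return (name, count_entries(body), None)
    except Exception as exc:
        return (name, 0, f"{type(exc).__name__}: {exc}")

if __name__ == "__main__":
    for name, url in SOURCES.items():
        _, count, err = check_source(name, url)
        print(f"{name:<20} {count:>3} entries  {'OK' if count else err}")
```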
**4. Update config (10 min)**
- Disable broken sources
- Keep only working ones
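One low-risk way to disable sources without deleting them is an `enabled` flag per entry. This is a hypothetical config shape — the real `backend/config.py` layout may differ:

```python
# Broken sources stay listed but disabled, so re-enabling them after
# the scraper fix is a one-character change.
SOURCES = [
    {"name": "MarkTechPost", "url": "https://www.marktechpost.com/feed/", "enabled": True},
    {"name": "KDnuggets", "url": "https://www.kdnuggets.com/feed", "enabled": False},
]

def active_sources(sources=SOURCES):
    """Only the sources the pipeline should actually scrape."""
    return [s for s in sources if s["enabled"]]
```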
**5. Test run (90 min)**

```bash
cd /home/ubuntu/.openclaw/workspace/burmddit/backend
python3 run_pipeline.py
```

- Target: at least 10 articles scraped
- If successful → deploy for tomorrow's cron
### TOMORROW (Day 2)

**Morning:**
- Check overnight cron results
- Fix any new errors

**Afternoon:**
- Add 5 high-priority new sources:
  - OpenAI Blog
  - Anthropic Blog
  - Hugging Face Blog
  - Google AI Blog
  - MarkTechPost
- Test evening run (target: 25+ articles)

### DAY 3
- Add remaining 17 new sources (30 total)
- Full test with all sources
- Verify monitoring works

### DAYS 4-7 (If time permits)
- Parallel scraping (reduce runtime from 90 min to 40 min)
- Source health scoring
- Image extraction improvements
- Translation quality enhancements
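For the parallel-scraping item, a thread pool is the usual fit since the work is I/O-bound. A sketch — `scrape_source` is a stand-in for whatever per-source function the scraper ends up exposing:

```python
# Scrape sources concurrently so one slow or broken feed no longer
# blocks the rest of the pipeline run.
from concurrent.futures import ThreadPoolExecutor, as_completed

def scrape_all(sources, scrape_source, max_workers=8):
    """Run scrape_source over all sources in parallel; errors are captured per source."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(scrape_source, src): src for src in sources}
        for fut in as_completed(futures):
            src = futures[fut]
            try:
                results[src] = fut.result()
            except Exception as exc:
                results[src] = f"error: {exc}"  # a failing source never kills the run
    return results
```

Capturing exceptions per future is what turns "all 8 sources failing" into a per-source report instead of a crashed pipeline.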
## 📋 Key Files to Review

### Planning Docs
- `SCRAPER-IMPROVEMENT-PLAN.md` - Full technical plan
  - Current issues explained
  - 22 new RSS sources listed
  - Implementation details
  - Success metrics
- `BURMDDIT-TASKS.md` - Task tracker
  - Day-by-day breakdown
  - Checkboxes for tracking progress
  - Daily checklist
  - Success criteria

### Code Files (To Be Created)
- `backend/scraper_v2.py` - New scraper (URGENT)
- `backend/test_source.py` - Source tester
- `scripts/check-pipeline-health.sh` - Health monitor ✅ (done)
- `scripts/source-stats.py` - Stats reporter ✅ (done)

### Config Files
- `backend/config.py` - Source configuration
- `backend/.env` - Environment variables (API keys)
## 🎯 Success Criteria

### Immediate (Today)
- ✅ At least 10 articles scraped in test run
- ✅ At least 3 sources working
- ✅ Pipeline completes without crashing

### Day 3
- ✅ 30+ sources configured
- ✅ 40+ articles scraped per run
- ✅ <5% error rate

### Week 1
- ✅ 30-40 articles published daily
- ✅ 25/30 sources active
- ✅ 95%+ pipeline success rate
- ✅ Automatic monitoring working
## 🚨 Critical Path

**BLOCKER:** The scraper must be fixed TODAY for tomorrow's 1 AM UTC cron run.

**Timeline:**
- Now → +2h: Build `scraper_v2.py`
- +2h → +3h: Test sources
- +3h → +4.5h: Full pipeline test
- +4.5h: Deploy if successful

If delayed, the website stays broken for another day = lost traffic.
## 📊 New Sources to Add (Top 10)

These are the highest-quality sources to prioritize:

1. OpenAI Blog - https://openai.com/blog/rss/
2. Anthropic Blog - https://www.anthropic.com/rss
3. Hugging Face - https://huggingface.co/blog/feed.xml
4. Google AI - http://googleaiblog.blogspot.com/atom.xml
5. MarkTechPost - https://www.marktechpost.com/feed/
6. The Rundown AI - https://rss.beehiiv.com/feeds/2R3C6Bt5wj.xml
7. Last Week in AI - https://lastweekin.ai/feed
8. Analytics India Magazine - https://analyticsindiamag.com/feed/
9. AI News - https://www.artificialintelligence-news.com/feed/rss/
10. KDnuggets - https://www.kdnuggets.com/feed

(Full list of 22 sources in `SCRAPER-IMPROVEMENT-PLAN.md`)
## 🤖 Automatic Monitoring

I've set up automatic health checks:

- **Heartbeat monitoring (every 2 hours)**
  - Runs `scripts/check-pipeline-health.sh`
  - Alerts if: zero articles, high errors, or stale pipeline
- **Daily checklist (9 AM Singapore time)**
  - Check overnight cron results
  - Review errors
  - Update task tracker
  - Report status

You'll be notified automatically if:
- The pipeline fails
- Article count drops below 10
- Error rate exceeds 50%
- No run in 36+ hours
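The alert conditions above boil down to a few threshold checks. Sketched in Python for clarity (thresholds are the ones from this document; the real checks live in `scripts/check-pipeline-health.sh`):

```python
# Threshold checks behind the notifications: low article count,
# high error rate, and a stale pipeline.
from datetime import datetime, timedelta

def health_alerts(article_count, error_rate_pct, last_run, now=None):
    """Return a list of alert strings; an empty list means healthy."""
    now = now or datetime.utcnow()
    alerts = []
    if article_count < 10:
        alerts.append(f"low article count: {article_count}")
    if error_rate_pct > 50:
        alerts.append(f"high error rate: {error_rate_pct}%")
    if now - last_run > timedelta(hours=36):
        alerts.append("stale pipeline: no run in 36+ hours")
    return alerts
```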
## 💬 Questions to Decide

1. **Should I start building `scraper_v2.py` now?**
   - Or do you want to review the plan first?
2. **Do you want to add all 22 sources at once, or gradually?**
   - Recommendation: Start with the top 10, then expand
3. **Should I deploy the fix automatically or ask first?**
   - Recommendation: Test first, then ask before deploying
4. **Priority: speed or perfection?**
   - Option A: Quick fix (2-4 hours, basic functionality)
   - Option B: Proper rebuild (1-2 days, all optimizations)
## 📞 Contact

**Owner:** Zeya Phyo
**Developer:** Bob
**Deadline:** ASAP (ideally today)
**Current time:** 2026-02-26 08:30 UTC (4:30 PM Singapore)
## 🚀 Ready to Start?

**Recommended action:** Let me start building `scraper_v2.py` now.

Command to kick off:

> Yes, start fixing the scraper now

Or if you want to review the plan first:

> Show me the technical details of scraper_v2.py first

All planning documents are ready. Just need your go-ahead to execute. 🎯