# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Burmddit is an automated AI news aggregator that scrapes English AI content, compiles related articles, translates them into Burmese with the Claude API, and publishes the results daily. It has two independent sub-systems:
- Backend (`/backend`): Python pipeline — scrape → compile → translate → publish
- Frontend (`/frontend`): Next.js 14 App Router site that reads from PostgreSQL
Both connect to the same PostgreSQL database hosted on Railway.
## Commands

### Frontend

```bash
cd frontend
npm install
npm run dev     # Start dev server (localhost:3000)
npm run build   # Production build
npm run lint    # ESLint
```
### Backend

```bash
cd backend
pip install -r requirements.txt

# Run the full pipeline (scrape + compile + translate + publish)
python run_pipeline.py

# Run individual stages
python scraper.py
python compiler.py
python translator.py

# Database management
python init_db.py           # Initialize schema
python init_db.py stats     # Show article/view counts
python init_db.py --reset   # Drop and recreate (destructive)
```
## Required Environment Variables

Frontend (`.env.local`):

```bash
DATABASE_URL=postgresql://...
NEXT_PUBLIC_SITE_URL=https://burmddit.vercel.app
```

Backend (`.env`):

```bash
DATABASE_URL=postgresql://...
ANTHROPIC_API_KEY=sk-ant-...
ADMIN_PASSWORD=...
```
## Architecture

### Data Flow

```
[Scraper] → raw_articles table
    ↓
[Compiler] → clusters related raw articles, generates compiled English articles
    ↓
[Translator] → calls Claude API (claude-3-5-sonnet-20241022) to produce Burmese content
    ↓
[Publisher] → inserts into articles table with status='published'
    ↓
[Frontend] → queries published_articles view via @vercel/postgres
```
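The compile stage's clustering step can be sketched minimally. Note that the similarity metric below (word overlap between titles) and the 0.3 threshold are illustrative assumptions, not the actual `compiler.py` logic (the real threshold lives in `backend/config.py`):

```python
# Hypothetical sketch of clustering related raw articles by title
# similarity. The Jaccard metric and 0.3 cutoff are illustrative
# guesses, not the real compiler.py implementation.

def title_words(title: str) -> set[str]:
    """Lowercased content words (longer than 3 chars) of a title."""
    return {w.lower() for w in title.split() if len(w) > 3}

def jaccard(a: set[str], b: set[str]) -> float:
    """Set-overlap similarity in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_articles(titles: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Greedily assign each title to the first cluster it resembles."""
    clusters: list[list[str]] = []
    for title in titles:
        words = title_words(title)
        for cluster in clusters:
            if jaccard(words, title_words(cluster[0])) >= threshold:
                cluster.append(title)
                break
        else:
            clusters.append([title])
    return clusters
```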
### Database Schema Key Points

- `raw_articles` — scraped source content, flagged `processed=TRUE` once compiled
- `articles` — final bilingual articles with both English and Burmese fields (`title`/`title_burmese`, `content`/`content_burmese`, etc.)
- `published_articles` — PostgreSQL view joining `articles` + `categories`, used by frontend queries
- `pipeline_logs` — tracks each stage's execution for monitoring
### Frontend Architecture

Next.js 14 App Router with server components querying the database directly via `@vercel/postgres` (`sql` template tag). No API routes — all DB access happens in server components/actions.

Key pages: homepage (`app/page.tsx`), article detail (`app/[slug]/`), category listing (`app/category/`).

Burmese font: Noto Sans Myanmar, loaded from Google Fonts. Apply the `font-burmese` Tailwind class to Burmese text.
## Backend Pipeline (`run_pipeline.py`)

Orchestrates the four stages sequentially. Each stage is a standalone module with a `run_*()` function. The pipeline exits early with a warning if a stage produces zero results. Logs go to both stderr and `burmddit_pipeline.log` (7-day rotation).
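The orchestration described above might look like the following sketch. The stage callables here are stand-ins for the real `run_*()` entry points, and the assumption that each returns a result count is illustrative:

```python
import logging
from logging.handlers import TimedRotatingFileHandler

# Sketch of the sequential orchestration described above. Each (name, fn)
# pair stands in for a real run_*() entry point (scraper, compiler,
# translator, publisher); returning an item count is an assumption.

def setup_logging(path: str = "burmddit_pipeline.log") -> None:
    """Log to stderr and a daily-rotated file kept for 7 days."""
    logging.basicConfig(
        level=logging.INFO,
        handlers=[
            logging.StreamHandler(),  # stderr
            TimedRotatingFileHandler(path, when="D", backupCount=7),
        ],
    )

def run_pipeline(stages) -> bool:
    """Run each (name, fn) stage in order; stop early on zero results."""
    for name, fn in stages:
        count = fn()
        logging.info("%s: %d results", name, count)
        if count == 0:
            logging.warning("stage %r produced zero results; exiting early", name)
            return False
    return True
```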
## Configuration (`backend/config.py`)

All tunable parameters live here:

- `SOURCES` — which RSS/scrape sources are enabled and their limits
- `PIPELINE` — articles per day, length limits, clustering threshold
- `TRANSLATION` — Claude model, temperature, technical terms to preserve in English
- `PUBLISHING` — default status (`'published'` or `'draft'`), image/video extraction settings
- `CATEGORY_KEYWORDS` — keyword lists for auto-detecting one of 4 categories
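A hedged sketch of the shape `config.py` might take. The top-level names come from the list above, but every nested key and value shown is an illustrative assumption:

```python
# Illustrative shape only -- the real backend/config.py nested keys and
# all values below are assumptions, not the actual configuration.
SOURCES = {
    "example_feed": {"enabled": True, "url": "https://example.com/rss", "limit": 20},
}
PIPELINE = {"articles_per_day": 5, "max_length": 4000, "clustering_threshold": 0.3}
TRANSLATION = {
    "model": "claude-3-5-sonnet-20241022",
    "temperature": 0.3,
    "preserve_terms": ["API", "LLM", "GPU"],  # keep in English
}
PUBLISHING = {"default_status": "published", "extract_images": True}
CATEGORY_KEYWORDS = {
    "research": ["paper", "study"],
    "products": ["launch", "release"],
    "policy": ["regulation", "law"],
    "business": ["funding", "startup"],
}
```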
## Automation

The daily pipeline is triggered via GitHub Actions (`.github/workflows/daily-publish.yml`) at 6 AM UTC, using the `DATABASE_URL` and `ANTHROPIC_API_KEY` repository secrets. It can also be triggered manually via `workflow_dispatch`.
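A sketch of what the workflow file might contain, based on the description above; the step names, action versions, and Python version are assumptions:

```yaml
# Hypothetical sketch of .github/workflows/daily-publish.yml.
# Step details, action versions, and Python version are assumptions.
name: Daily Publish
on:
  schedule:
    - cron: "0 6 * * *"   # 6 AM UTC
  workflow_dispatch:       # allow manual runs
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
        working-directory: backend
      - run: python run_pipeline.py
        working-directory: backend
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```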
## Deployment

- Frontend: Vercel (root directory: `frontend`, auto-detects Next.js)
- Backend + DB: Railway (root directory: `backend`, start command: `python run_pipeline.py`)
- Database init: run `python init_db.py init` once from the Railway console after the first deploy