# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Burmddit is an automated AI news aggregator that scrapes English AI content, compiles related articles, translates them to Burmese using the Claude API, and publishes them daily. It has two independent sub-systems:

- **Backend** (`/backend`): Python pipeline — scrape → compile → translate → publish
- **Frontend** (`/frontend`): Next.js 14 App Router site that reads from PostgreSQL

Both connect to the same PostgreSQL database hosted on Railway.

## Commands

### Frontend

```bash
cd frontend
npm install
npm run dev      # Start dev server (localhost:3000)
npm run build    # Production build
npm run lint     # ESLint
```

### Backend

```bash
cd backend
pip install -r requirements.txt

# Run full pipeline (scrape + compile + translate + publish)
python run_pipeline.py

# Run individual stages
python scraper.py
python compiler.py
python translator.py

# Database management
python init_db.py           # Initialize schema
python init_db.py stats     # Show article/view counts
python init_db.py --reset   # Drop and recreate (destructive)
```

### Required Environment Variables

**Frontend** (`.env.local`):

```
DATABASE_URL=postgresql://...
NEXT_PUBLIC_SITE_URL=https://burmddit.vercel.app
```

**Backend** (`.env`):

```
DATABASE_URL=postgresql://...
ANTHROPIC_API_KEY=sk-ant-...
ADMIN_PASSWORD=...
```

## Architecture

### Data Flow

```
[Scraper]    → raw_articles table
      ↓
[Compiler]   → clusters related raw articles, generates compiled English articles
      ↓
[Translator] → calls Claude API (claude-3-5-sonnet-20241022) to produce Burmese content
      ↓
[Publisher]  → inserts into articles table with status='published'
      ↓
[Frontend]   → queries published_articles view via @vercel/postgres
```

### Database Schema Key Points

- `raw_articles` — scraped source content, flagged `processed=TRUE` once compiled
- `articles` — final bilingual articles with both English and Burmese fields (`title`/`title_burmese`, `content`/`content_burmese`, etc.)
- `published_articles` — PostgreSQL view joining `articles` + `categories`, used by frontend queries
- `pipeline_logs` — tracks each stage execution for monitoring

### Frontend Architecture

Next.js 14 App Router with server components querying the database directly via `@vercel/postgres` (`sql` template tag). No API routes — all DB access happens in server components/actions.

Key pages: homepage (`app/page.tsx`), article detail (`app/[slug]/`), category listing (`app/category/`).

Burmese font: Noto Sans Myanmar, loaded from Google Fonts. Apply the `font-burmese` Tailwind class for Burmese text.

### Backend Pipeline (`run_pipeline.py`)

Orchestrates the four stages sequentially. Each stage is a standalone module with a `run_*()` function. The pipeline exits early with a warning if a stage produces zero results. Logs go to both stderr and `burmddit_pipeline.log` (7-day rotation).
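The sequential orchestration with early exit described above could look like the minimal sketch below. The stage function names and their return types are assumptions for illustration — the real `run_pipeline.py` and its stage modules may differ:

```python
import logging
import sys

logger = logging.getLogger("burmddit")

# Hypothetical stage stubs standing in for scraper.py, compiler.py,
# translator.py, and the publisher, each exposing a run_*() entry point
# that returns the items it produced.
def run_scraper():    return ["raw article"]
def run_compiler():   return []            # empty result triggers early exit
def run_translator(): return ["burmese article"]
def run_publisher():  return ["published article"]

STAGES = [
    ("scrape", run_scraper),
    ("compile", run_compiler),
    ("translate", run_translator),
    ("publish", run_publisher),
]

def run_pipeline():
    """Run stages in order; stop with a warning if any stage yields nothing."""
    for name, stage in STAGES:
        results = stage()
        if not results:
            logger.warning("Stage %r produced zero results; exiting early", name)
            return False
        logger.info("Stage %r produced %d result(s)", name, len(results))
    return True

if __name__ == "__main__":
    logging.basicConfig(stream=sys.stderr, level=logging.INFO)
    sys.exit(0 if run_pipeline() else 1)
```

With this shape, each stage stays independently runnable as a script while the orchestrator only cares about whether the stage produced output.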
### Configuration (`backend/config.py`)

All tunable parameters live here:

- `SOURCES` — which RSS/scrape sources are enabled and their limits
- `PIPELINE` — articles per day, length limits, clustering threshold
- `TRANSLATION` — Claude model, temperature, technical terms to preserve in English
- `PUBLISHING` — default status (`'published'` or `'draft'`), image/video extraction settings
- `CATEGORY_KEYWORDS` — keyword lists for auto-detecting one of 4 categories

### Automation

The daily pipeline is triggered via GitHub Actions (`.github/workflows/daily-publish.yml`) at 6 AM UTC, using the `DATABASE_URL` and `ANTHROPIC_API_KEY` repository secrets. It can also be triggered manually via `workflow_dispatch`.

## Deployment

- **Frontend**: Vercel (root directory: `frontend`, auto-detects Next.js)
- **Backend + DB**: Railway (root directory: `backend`, start command: `python run_pipeline.py`)
- **Database init**: Run `python init_db.py init` once from the Railway console after the first deploy
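The keyword-driven category detection that `CATEGORY_KEYWORDS` enables (see the Configuration section) could be sketched as below. The category names and keyword lists here are placeholders, not the real values from `backend/config.py`:

```python
# Hypothetical keyword lists; the real ones live in backend/config.py.
CATEGORY_KEYWORDS = {
    "research": ["paper", "benchmark", "arxiv"],
    "products": ["launch", "release", "feature"],
    "business": ["funding", "acquisition", "revenue"],
    "policy":   ["regulation", "law", "governance"],
}

def detect_category(text: str, default: str = "products") -> str:
    """Pick the category whose keywords appear most often in the text."""
    lowered = text.lower()
    scores = {
        category: sum(lowered.count(word) for word in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

A simple count-based scorer like this keeps categorization deterministic and free, reserving the Claude API budget for translation.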