Min Zeya Phyo f0146c311c Add CLAUDE.md with project guidance for Claude Code
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 10:51:05 +06:30

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Burmddit is an automated AI news aggregator that scrapes English-language AI content, compiles related articles, translates them into Burmese via the Claude API, and publishes the results daily. It has two independent sub-systems:

  • Backend (/backend): Python pipeline — scrape → compile → translate → publish
  • Frontend (/frontend): Next.js 14 App Router site that reads from PostgreSQL

Both connect to the same PostgreSQL database hosted on Railway.

Commands

Frontend

cd frontend
npm install
npm run dev      # Start dev server (localhost:3000)
npm run build    # Production build
npm run lint     # ESLint

Backend

cd backend
pip install -r requirements.txt

# Run full pipeline (scrape + compile + translate + publish)
python run_pipeline.py

# Run individual stages
python scraper.py
python compiler.py
python translator.py

# Database management
python init_db.py          # Initialize schema
python init_db.py stats    # Show article/view counts
python init_db.py --reset  # Drop and recreate (destructive)

Required Environment Variables

Frontend (.env.local):

DATABASE_URL=postgresql://...
NEXT_PUBLIC_SITE_URL=https://burmddit.vercel.app

Backend (.env):

DATABASE_URL=postgresql://...
ANTHROPIC_API_KEY=sk-ant-...
ADMIN_PASSWORD=...
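
The backend presumably fails fast when these are missing. A minimal sketch of such a startup check (the `check_env` helper is hypothetical, not part of the repo; the variable names come from the list above):

```python
import os

# Variables the backend pipeline expects (per the .env list above).
REQUIRED_VARS = ["DATABASE_URL", "ANTHROPIC_API_KEY", "ADMIN_PASSWORD"]

def check_env(env=None):
    """Return the required variables, raising early if any are unset."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}
```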

Architecture

Data Flow

[Scraper] → raw_articles table
    ↓
[Compiler] → clusters related raw articles, generates compiled English articles
    ↓
[Translator] → calls Claude API (claude-3-5-sonnet-20241022) to produce Burmese content
    ↓
[Publisher] → inserts into articles table with status='published'
    ↓
[Frontend] → queries published_articles view via @vercel/postgres
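
The translator stage might be shaped roughly like the sketch below. This is an illustrative guess, not the repo's code: `build_translation_prompt` and `translate` are hypothetical names, though the model ID and the "keep technical terms in English" behavior come from this document's TRANSLATION config notes.

```python
MODEL = "claude-3-5-sonnet-20241022"  # model named in the data flow above

def build_translation_prompt(english_text, preserve_terms):
    """Ask for a Burmese translation that keeps listed technical terms in English."""
    terms = ", ".join(preserve_terms)
    return (
        "Translate the following article into Burmese. "
        f"Keep these technical terms in English: {terms}.\n\n"
        f"{english_text}"
    )

def translate(client, english_text, preserve_terms):
    """Call the Claude Messages API; client is an anthropic.Anthropic instance."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=4096,
        messages=[
            {
                "role": "user",
                "content": build_translation_prompt(english_text, preserve_terms),
            }
        ],
    )
    return response.content[0].text
```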

Database Schema Key Points

  • raw_articles — scraped source content, flagged processed=TRUE once compiled
  • articles — final bilingual articles with both English and Burmese fields (title/title_burmese, content/content_burmese, etc.)
  • published_articles — PostgreSQL view joining articles + categories, used by frontend queries
  • pipeline_logs — tracks each stage execution for monitoring
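
To illustrate how the published_articles view fits together, here is a toy reconstruction using SQLite in place of PostgreSQL. Column names beyond those mentioned above (title, title_burmese, status, category name) are guesses at the real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    title TEXT,
    title_burmese TEXT,
    status TEXT,
    category_id INTEGER REFERENCES categories(id)
);
-- The view the frontend reads: published articles joined to their category.
CREATE VIEW published_articles AS
SELECT a.id, a.title, a.title_burmese, c.name AS category
FROM articles a
JOIN categories c ON c.id = a.category_id
WHERE a.status = 'published';
""")

conn.execute("INSERT INTO categories VALUES (1, 'Research')")
conn.execute("INSERT INTO articles VALUES (1, 'New model', 'မော်ဒယ်အသစ်', 'published', 1)")
conn.execute("INSERT INTO articles VALUES (2, 'Draft piece', NULL, 'draft', 1)")

# Only the published row is visible through the view.
rows = conn.execute("SELECT title, category FROM published_articles").fetchall()
```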

Frontend Architecture

Next.js 14 App Router with server components querying the database directly via @vercel/postgres (sql template tag). No API routes — all DB access happens in server components/actions.

Key pages: homepage (app/page.tsx), article detail (app/[slug]/), category listing (app/category/).

Burmese font: Noto Sans Myanmar loaded from Google Fonts. Apply font-burmese Tailwind class for Burmese text.

Backend Pipeline (run_pipeline.py)

Orchestrates four stages sequentially. Each stage is a standalone module with a run_*() function. Pipeline exits early with a warning if a stage produces zero results. Logs go to both stderr and burmddit_pipeline.log (7-day rotation).
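
The orchestration described above can be sketched as follows. The `run_*()` functions and the early exit are stated in this file; the assumption that each stage returns a count of items produced is mine:

```python
import logging

def run_pipeline(stages):
    """Run (name, run_fn) stage pairs in order, stopping early on zero results.

    `stages` is a list like [("scrape", run_scraper), ("compile", run_compiler), ...]
    where each run_*() is assumed to return how many items it produced.
    """
    results = {}
    for name, run_stage in stages:
        count = run_stage()
        results[name] = count
        if count == 0:
            logging.warning("Stage %s produced zero results; exiting early.", name)
            break
    return results
```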

Configuration (backend/config.py)

All tunable parameters live here:

  • SOURCES — which RSS/scrape sources are enabled and their limits
  • PIPELINE — articles per day, length limits, clustering threshold
  • TRANSLATION — Claude model, temperature, technical terms to preserve in English
  • PUBLISHING — default status ('published' or 'draft'), image/video extraction settings
  • CATEGORY_KEYWORDS — keyword lists for auto-detecting one of four categories
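
The shape of backend/config.py is presumably something like the fragment below. The section names match the list above; all keys and values are placeholders, not the repo's actual settings:

```python
SOURCES = {
    "example_rss": {"enabled": True, "max_articles": 20},  # placeholder source
}

PIPELINE = {
    "articles_per_day": 5,            # placeholder values throughout
    "max_article_length": 8000,
    "clustering_threshold": 0.6,
}

TRANSLATION = {
    "model": "claude-3-5-sonnet-20241022",  # model named earlier in this file
    "temperature": 0.3,
    "preserve_terms": ["LLM", "API", "GPU"],
}

PUBLISHING = {
    "default_status": "published",    # or "draft"
    "extract_images": True,
    "extract_videos": True,
}

CATEGORY_KEYWORDS = {
    "research": ["paper", "benchmark"],  # one keyword list per category
}
```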

Automation

Daily pipeline is triggered via GitHub Actions (.github/workflows/daily-publish.yml) at 6 AM UTC, using DATABASE_URL and ANTHROPIC_API_KEY repository secrets. Can also be triggered manually via workflow_dispatch.

Deployment

  • Frontend: Vercel (root directory: frontend, auto-detects Next.js)
  • Backend + DB: Railway (root directory: backend, start command: python run_pipeline.py)
  • Database init: Run python init_db.py init once from the Railway console after the first deploy