# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Simple Knowledge Base is a full-stack RAG (Retrieval-Augmented Generation) Q&A system built with React 19 + NestJS. It's a monorepo with Japanese/Chinese documentation but English code.

**Key Features:**
- Multi-model support (OpenAI-compatible APIs + Google Gemini native SDK)
- Dual processing modes: Fast (Tika text-only) and High-precision (Vision pipeline)
- User isolation with JWT authentication and per-user knowledge bases
- Hybrid search (vector + keyword) with Elasticsearch
- Multi-language interface (Japanese, Chinese, English)
- Streaming responses via Server-Sent Events (SSE)

## Development Setup

### Prerequisites
- Node.js 18+
- Yarn
- Docker & Docker Compose

### Initial Setup
```bash
# Install dependencies
yarn install

# Start infrastructure services
docker-compose up -d elasticsearch tika libreoffice

# Configure environment
cp server/.env.sample server/.env
# Edit server/.env with API keys and configuration
```

### Development Commands
```bash
# Start both frontend and backend in development mode
yarn dev

# Frontend only (port 13001)
cd web && yarn dev

# Backend only (port 3001)
cd server && yarn start:dev

# Run tests
cd server && yarn test
cd server && yarn test:e2e

# Lint and format
cd server && yarn lint
cd server && yarn format
```

### Docker Services
- **Elasticsearch**: 9200 (vector storage)
- **Apache Tika**: 9998 (document text extraction)
- **LibreOffice Server**: 8100 (document conversion)
- **Backend API**: 3001
- **Frontend**: 13001 (dev), 80/443 (production via nginx)

## Architecture

### Project Structure
```
simple-kb/
├── web/                        # React frontend (Vite)
│   ├── components/             # UI components (ChatInterface, ConfigPanel, etc.)
│   ├── contexts/               # React Context providers
│   ├── services/               # API client services
│   └── utils/                  # Utility functions
├── server/                     # NestJS backend
│   ├── src/
│   │   ├── ai/                 # AI services (embedding, etc.)
│   │   ├── api/                # API module
│   │   ├── auth/               # JWT authentication
│   │   ├── chat/               # Chat/RAG module
│   │   ├── elasticsearch/      # Elasticsearch integration
│   │   ├── import-task/        # Import task management
│   │   ├── knowledge-base/     # Knowledge base management
│   │   ├── libreoffice/        # LibreOffice integration
│   │   ├── model-config/       # Model configuration management
│   │   ├── vision/             # Vision model integration
│   │   └── vision-pipeline/    # Vision pipeline orchestration
│   ├── data/                   # SQLite database storage
│   ├── uploads/                # Uploaded files storage
│   └── temp/                   # Temporary files
├── docs/                       # Comprehensive documentation (Japanese/Chinese)
├── nginx/                      # Nginx configuration
├── libreoffice-server/         # LibreOffice conversion service (Python/FastAPI)
└── docker-compose.yml          # Docker orchestration
```

### Key Architectural Concepts

**Dual Processing Modes:**
1. **Fast Mode**: Apache Tika for text-only extraction (quick, no API cost)
2. **High-Precision Mode**: Vision Pipeline (LibreOffice → PDF → Images → Vision Model) for mixed image/text documents (slower, incurs API costs)

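
The two modes above can be pictured as a small lookup from mode to pipeline steps. This is only an illustrative sketch: the type names, the `planFor` function, and the step labels are hypothetical and are not taken from the repository; only the two modes, their ordering of stages, and the cost distinction come from the description above.

```typescript
// Hypothetical names: only the two modes and their stages come from the text above.
type ProcessingMode = "fast" | "high-precision";

interface ModePlan {
  steps: string[];        // ordered pipeline stages for the mode
  usesVisionApi: boolean; // true when the mode calls a vision model (API cost)
}

// Return the ordered stages a document would pass through in the given mode.
function planFor(mode: ProcessingMode): ModePlan {
  switch (mode) {
    case "fast":
      // Text-only extraction through Apache Tika.
      return { steps: ["tika-extract-text"], usesVisionApi: false };
    case "high-precision":
      // LibreOffice -> PDF -> Images -> Vision Model.
      return {
        steps: [
          "libreoffice-convert-to-pdf",
          "render-pdf-pages-to-images",
          "vision-model-extract",
        ],
        usesVisionApi: true,
      };
  }
}
```

A caller could use `planFor(mode).usesVisionApi` to warn the user about API costs before starting an import.
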
**Multi-Model Support:**
- OpenAI-compatible APIs (OpenAI, DeepSeek, Claude, etc.)
- Google Gemini native SDK
- Configurable LLM, Embedding, and Rerank models

**RAG System:**
- Hybrid search (vector + keyword) with Elasticsearch
- Streaming responses via Server-Sent Events (SSE)
- Source citation and similarity scoring
- Chunk configuration (size, overlap)

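
To make the chunk size and overlap settings concrete, here is a minimal sketch of fixed-size chunking with overlap. It assumes character-based chunks; the project's actual splitter may count tokens or respect sentence boundaries, and the `chunkText` name is hypothetical.

```typescript
// Hypothetical helper: splits text into character-based chunks of `size`,
// where each chunk repeats the last `overlap` characters of the previous one.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= size) {
    throw new RangeError("overlap must be an integer in [0, size)");
  }

  const chunks: string[] = [];
  const step = size - overlap; // how far the window advances each iteration
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once the current window has reached the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}

// chunkText("abcdefghij", 4, 1) returns ["abcd", "defg", "ghij"]
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of storing and embedding more text.
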
## Code Standards

### Language Requirements
- **Code comments must be in English**
- **Log messages must be in English**
- **Error messages must support internationalization** to enable multi-language frontend interface
- **API response messages must support internationalization** to enable multi-language frontend interface
- Interface supports Japanese, Chinese, and English

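
One way to satisfy the internationalization requirement is to have the backend work with stable error codes and resolve them to localized text. The sketch below is an assumption about how that could look, not the project's actual i18n mechanism: the error codes, message texts, `translateError` function, and the fallback to English are all hypothetical.

```typescript
// Hypothetical sketch: error codes mapped to messages in the three supported languages.
type Locale = "ja" | "zh" | "en";
type ErrorCode = "FILE_TOO_LARGE" | "KB_NOT_FOUND";

const messages: Record<ErrorCode, Record<Locale, string>> = {
  FILE_TOO_LARGE: {
    ja: "ファイルサイズが上限を超えています",
    zh: "文件大小超出限制",
    en: "File size exceeds the limit",
  },
  KB_NOT_FOUND: {
    ja: "ナレッジベースが見つかりません",
    zh: "未找到知识库",
    en: "Knowledge base not found",
  },
};

// Type guard for the supported locales.
function isLocale(value: string): value is Locale {
  return value === "ja" || value === "zh" || value === "en";
}

// Resolve an error code to a localized message.
// Assumption: unsupported locales fall back to English.
function translateError(code: ErrorCode, locale: string): string {
  const resolved: Locale = isLocale(locale) ? locale : "en";
  return messages[code][resolved];
}
```

Log statements would still use plain English text, while anything returned to the frontend goes through a lookup like `translateError`.
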
### Testing
- Backend uses Jest for unit and e2e tests
- Frontend currently has no test framework configured
- Run tests: `cd server && yarn test` (unit) or `cd server && yarn test:e2e` (e2e)

### Code Quality
- ESLint and Prettier configured for backend
- Format code: `cd server && yarn format`
- Lint code: `cd server && yarn lint`

## Common Development Tasks

### Adding a New API Endpoint
1. Create controller in appropriate module under `server/src/`
2. Add service methods with English comments
3. Update DTOs and validation
4. Add tests in `*.spec.ts` files

### Adding a New Frontend Component
1. Create component in `web/components/`
2. Add TypeScript interfaces in `web/types.ts`
3. Use Tailwind CSS for styling
4. Connect to backend services in `web/services/`

### Debugging
- Existing backend logs may be in Chinese; new log messages must be in English (see Language Requirements)
- Check Elasticsearch: `curl http://localhost:9200/_cat/indices`
- Check Tika: `curl http://localhost:9998/tika`
- Check LibreOffice: `curl http://localhost:8100/health`

## Environment Configuration

Key environment variables (`server/.env`):
- `OPENAI_API_KEY`: OpenAI-compatible API key
- `GEMINI_API_KEY`: Google Gemini API key
- `ELASTICSEARCH_HOST`: Elasticsearch URL (default: http://localhost:9200)
- `TIKA_HOST`: Apache Tika URL (default: http://localhost:9998)
- `LIBREOFFICE_URL`: LibreOffice server URL (default: http://localhost:8100)
- `JWT_SECRET`: JWT signing secret

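
The variables and defaults listed above could be read into a typed object roughly as follows. This is a sketch under assumptions: the `ServerConfig` shape and `loadConfig` function are hypothetical, and failing fast when `JWT_SECRET` is missing is a design choice made for this example rather than documented behavior of the backend.

```typescript
// Hypothetical config shape; variable names and defaults come from the list above.
interface ServerConfig {
  openaiApiKey: string | undefined;
  geminiApiKey: string | undefined;
  elasticsearchHost: string;
  tikaHost: string;
  libreofficeUrl: string;
  jwtSecret: string;
}

type Env = Record<string, string | undefined>;

// Build the config from an environment map, applying the documented defaults.
function loadConfig(env: Env): ServerConfig {
  const jwtSecret = env["JWT_SECRET"];
  if (!jwtSecret) {
    // Assumption: a missing signing secret is treated as a fatal misconfiguration.
    throw new Error("JWT_SECRET is required");
  }
  return {
    openaiApiKey: env["OPENAI_API_KEY"],
    geminiApiKey: env["GEMINI_API_KEY"],
    elasticsearchHost: env["ELASTICSEARCH_HOST"] || "http://localhost:9200",
    tikaHost: env["TIKA_HOST"] || "http://localhost:9998",
    libreofficeUrl: env["LIBREOFFICE_URL"] || "http://localhost:8100",
    jwtSecret,
  };
}
```

In a Node.js process this would typically be called once at startup as `loadConfig(process.env)`; taking the environment as a parameter keeps the function easy to unit test.
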
## Deployment

### Development
```bash
docker-compose up -d elasticsearch tika libreoffice
yarn dev
```

### Production
```bash
docker-compose up -d  # Builds and starts all services
```

### Ports in Production
- Frontend: 80/443 (via nginx)
- Backend API: 3001 (proxied through nginx)
- Elasticsearch: 9200
- Tika: 9998
- LibreOffice: 8100

## Troubleshooting

### Common Issues
1. **Elasticsearch not starting**: Check memory limits in docker-compose.yml
2. **File upload failures**: Ensure `uploads/` and `temp/` directories exist with proper permissions
3. **Vision pipeline errors**: Verify LibreOffice server is running and accessible
4. **API key errors**: Check environment variables in `server/.env`

### Database Management
- SQLite database: `server/data/metadata.db`
- Elasticsearch indices: Managed automatically by the application
- To reset: Delete `server/data/metadata.db` and the Elasticsearch data volume

## Documentation

- **README.md**: Project overview in Japanese
- **docs/**: Comprehensive documentation (mostly Japanese/Chinese)
- **DESIGN.md**: System architecture and design
- **API.md**: API reference
- **DEVELOPMENT_STANDARDS.md**: Mandates English comments/logs and internationalized messages

When modifying code, always add English comments and logs as required by the development standards. Error and UI messages must be properly internationalized. The project has extensive existing documentation in Japanese/Chinese; refer to the `docs/` directory for detailed technical information.