Building Production AI Systems

LLM Applications, RAG Pipelines & Document Intelligence — Hong Kong · Remote-Friendly

Most businesses I work with are not short of data. They are short of a system that turns that data into something actionable without requiring someone to manually read through it every day. That is the gap I build for.

LLM Applications · RAG Pipelines · Document Intelligence · Hong Kong · Remote Engagements · Production Systems

Background

I am an independent AI engineer based in Hong Kong and mainland China, available for remote engagements globally. My background spans 7+ years in data and analytics roles, 2+ years building and deploying production AI systems, and a Master's in Business Analytics and AI from Vlerick Business School (Belgium).

The systems I build are end-to-end: data ingestion, LLM processing, retrieval architecture, delivery layer, and infrastructure. Not prototypes. Production systems with real users and real operational requirements.

What I Build

RAG Pipelines and Document Q&A

Organisations that handle large volumes of documents — policies, reports, contracts, filings — typically face the same bottleneck: the information exists but is not queryable. Finding a specific detail means opening files manually.

A production RAG pipeline changes this. Documents are ingested, chunked, embedded, and stored in a vector database. At query time, the question is embedded the same way, the most similar chunks are retrieved, and the LLM composes an answer from them. The result is a system where a natural language question returns a precise answer drawn from the full document archive, regardless of the volume or age of the files.
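
The ingest, chunk, embed, and retrieve steps can be illustrated with a minimal, standard-library sketch. This is not the production implementation: the hashed bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for the vector database, and all names (`Chunk`, `chunk_text`, `build_index`, `retrieve`) and sizes are assumptions chosen for illustration.

```python
import hashlib
import math
from dataclasses import dataclass

DIM = 256  # size of the toy embedding vector (assumption)


@dataclass
class Chunk:
    doc_id: str
    text: str
    vector: list[float]


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text: str) -> list[float]:
    """Toy embedding: hashed bag-of-words, L2-normalised.

    A real pipeline would call an embedding model here instead.
    """
    vec = [0.0] * DIM
    for raw in text.lower().split():
        token = raw.strip(".,;:!?()\"'")
        if not token:
            continue
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(docs: dict[str, str]) -> list[Chunk]:
    """Ingest documents: chunk each one and store every chunk with its vector."""
    return [
        Chunk(doc_id, piece, embed(piece))
        for doc_id, text in docs.items()
        for piece in chunk_text(text)
    ]


def retrieve(index: list[Chunk], question: str, k: int = 3) -> list[Chunk]:
    """Return the k chunks most similar to the question (cosine similarity)."""
    q = embed(question)

    def score(chunk: Chunk) -> float:
        # Vectors are normalised, so the dot product equals cosine similarity.
        return sum(a * b for a, b in zip(q, chunk.vector))

    return sorted(index, key=score, reverse=True)[:k]
```

In a full pipeline, the retrieved chunks would be placed into the LLM prompt as context alongside the user's question.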

Deployed in production for the Hong Kong insurance market: PDF ingestion across multiple providers, semantic retrieval via pgvector, and a conversational interface for cross-document comparison queries.
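
As a rough illustration of semantic retrieval with pgvector, the sketch below builds a parameterised nearest-neighbour query using pgvector's cosine distance operator (`<=>`). The table and column names (`document_chunks`, `doc_id`, `chunk_text`, `embedding`) are hypothetical, and the `%s` placeholders assume a psycopg-style driver; the actual schema of the deployed system is not described here.

```python
def to_vector_literal(vector: list[float]) -> str:
    """Format a Python list as a pgvector text literal, e.g. '[0.100000,0.200000]'."""
    return "[" + ",".join(f"{v:.6f}" for v in vector) + "]"


def build_similarity_query(query_vector: list[float], k: int = 5) -> tuple[str, tuple]:
    """Return (sql, params) for a top-k cosine-distance search.

    Table and column names are hypothetical placeholders.
    """
    sql = (
        "SELECT doc_id, chunk_text, embedding <=> %s::vector AS distance "
        "FROM document_chunks "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    literal = to_vector_literal(query_vector)
    return sql, (literal, literal, k)
```

The returned pair can be passed to a database cursor, for example `cursor.execute(sql, params)`, so that the query vector is never interpolated into the SQL string directly.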

Scheduled Data Pipelines and Automated Summaries

The same underlying architecture — scheduled ingestion, LLM processing, structured output — applies to monitoring and summarising information sources on a recurring basis. Configurable inputs, relevance filtering, and delivery via Telegram or email. The pipeline runs on a schedule; the output arrives without manual effort.
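
The filtering and delivery steps might look like the sketch below. It assumes each item already carries an LLM-written summary, uses a simple keyword-overlap score as a stand-in for whatever relevance filter a given deployment needs, and prepares (but does not send) a request to the Telegram Bot API `sendMessage` method. The `Item` type, the function names, and the threshold are illustrative assumptions.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    summary: str  # assumed to be produced by an upstream LLM step
    url: str


def relevance(item: Item, keywords: list[str]) -> float:
    """Fraction of configured keywords that appear in the title or summary."""
    if not keywords:
        return 0.0
    haystack = f"{item.title} {item.summary}".lower()
    hits = sum(1 for kw in keywords if kw.lower() in haystack)
    return hits / len(keywords)


def build_digest(items: list[Item], keywords: list[str],
                 min_score: float = 0.5, limit: int = 5) -> str:
    """Keep items scoring at least min_score, best first, and format them as text."""
    scored = [(relevance(item, keywords), item) for item in items]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    lines = [f"- {item.title}: {item.summary} ({item.url})" for _, item in kept[:limit]]
    return "\n".join(lines)


def telegram_request(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build a POST request for the Telegram Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A scheduled job would call `build_digest` on the latest items and, if the digest is non-empty, send the prepared request with `urllib.request.urlopen`.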

This is applicable wherever a team currently has someone manually reading sources and summarising them.

Memory-Enabled AI Systems

I also build AI applications where context persistence matters — systems that remember user history, learn from interactions, and retrieve relevant prior context at inference time.

HKSoka (hksoka.com/en) is a Claude-powered chat platform I designed and built end-to-end. Its core differentiator is a multi-layer RAG memory architecture:

Seed Memory: Long-term user context injected at conversation start

Learned Memory: Facts extracted automatically from conversations, embedded via AWS Lambda, retrieved via pgvector

Embedding Pipeline: AWS Lambda asynchronous embedding architecture decoupled from serverless request handling to avoid timeout constraints

Auto-Critical Memory: Background promotion of high-importance facts to always-on injection

Technical Specs: Bilingual content, 150K context window token management, Vercel serverless + Neon PostgreSQL + AWS Lambda
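
To make the interaction between these layers concrete, here is a simplified sketch of how always-on memory and retrieved memory could be combined under a token budget. It is an illustration under stated assumptions, not HKSoka's code: the `Fact` type, the importance scores, the promotion threshold, and the four-characters-per-token estimate are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    text: str
    importance: float  # 0..1, assumed to be assigned when the fact is extracted
    critical: bool = False


def estimate_tokens(text: str) -> int:
    """Very rough token estimate (about four characters per token)."""
    return max(1, len(text) // 4)


def promote_critical(facts: list[Fact], threshold: float = 0.8) -> None:
    """Background step: mark high-importance facts for always-on injection."""
    for fact in facts:
        if fact.importance >= threshold:
            fact.critical = True


def assemble_context(seed: list[str], facts: list[Fact],
                     retrieved: list[Fact], budget_tokens: int) -> list[str]:
    """Build the memory context in priority order: seed, critical, retrieved.

    Stops at the first entry that would exceed the budget, so lower-priority
    memory is dropped before higher-priority memory.
    """
    always_on = list(seed) + [f.text for f in facts if f.critical]
    candidates = always_on + [f.text for f in retrieved if not f.critical]
    context: list[str] = []
    used = 0
    for text in candidates:
        cost = estimate_tokens(text)
        if used + cost > budget_tokens:
            break
        context.append(text)
        used += cost
    return context
```

The ordering is the design choice worth noting: seed and critical memory are placed first so that a tight budget trims the retrieved facts rather than the always-on context.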

This architecture is directly applicable to any business context where continuity across sessions matters — client relationship management, ongoing advisory workflows, or knowledge base assistants.

Infrastructure and Deployment

Production systems require more than good model calls. The infrastructure I build includes:

Compute: AWS Lambda asynchronous embedding pipelines

Orchestration: EventBridge scheduled jobs

Database: Neon PostgreSQL + pgvector

Application Layer: Vercel serverless deployment and API hosting

Notifications: Telegram alerting and scheduled notifications

Controls: Audit logging and moderation layers where required

Models: Claude, production LLM APIs

ML Pipelines: Signal generation, live AWS deployment
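
As one example of how the compute and orchestration pieces fit together, the sketch below shows the general shape of a Lambda handler triggered by an EventBridge schedule. EventBridge scheduled events carry `"source": "aws.events"`; everything else here (`load_pending_texts`, `embed_batch`, the batch size, the return payload) is a hypothetical placeholder rather than the deployed code.

```python
BATCH_SIZE = 16  # hypothetical batch size


def load_pending_texts() -> list[str]:
    """Placeholder: a real job would read unembedded rows from the database."""
    return ["fact one", "fact two", "fact three"]


def embed_batch(texts: list[str]) -> list[list[float]]:
    """Placeholder: a real job would call an embedding API here."""
    return [[float(len(t))] for t in texts]


def run_embedding_job(pending: list[str]) -> int:
    """Embed pending texts in batches and return how many were processed."""
    done = 0
    for start in range(0, len(pending), BATCH_SIZE):
        batch = pending[start:start + BATCH_SIZE]
        vectors = embed_batch(batch)
        # A real job would write each vector back to the database here.
        done += len(vectors)
    return done


def handler(event: dict, context: object) -> dict:
    """Lambda entry point; only acts on EventBridge scheduled events."""
    if event.get("source") != "aws.events":
        return {"status": "ignored", "embedded": 0}
    pending = load_pending_texts()
    return {"status": "ok", "embedded": run_embedding_job(pending)}
```

Running the embedding work in a separately scheduled function keeps it off the request path, so a slow embedding call cannot push a user-facing serverless request past its timeout.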

I have also built and maintained a production ML trading pipeline with automated signal generation, multi-seed validation, and live deployment on AWS — including full infrastructure ownership from data ingestion to model deployment.
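
One common reading of multi-seed validation is to repeat training and evaluation under several random seeds and accept a model only if its score is both high enough on average and stable across seeds. The sketch below illustrates that idea only; `evaluate_with_seed` is a synthetic stand-in that returns made-up scores, and the thresholds are arbitrary assumptions rather than values from the pipeline described above.

```python
import random
import statistics
from typing import Callable


def evaluate_with_seed(seed: int) -> float:
    """Synthetic stand-in for 'train and evaluate with this seed'."""
    rng = random.Random(seed)
    return 0.55 + rng.uniform(-0.02, 0.02)


def multi_seed_validate(seeds: list[int], min_mean: float, max_std: float,
                        evaluate: Callable[[int], float] = evaluate_with_seed) -> dict:
    """Run the evaluation once per seed and check mean score and spread."""
    if len(seeds) < 2:
        raise ValueError("need at least two seeds to measure spread")
    scores = [evaluate(seed) for seed in seeds]
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)
    return {
        "scores": scores,
        "mean": mean,
        "std": std,
        "passed": mean >= min_mean and std <= max_std,
    }
```

A result that passes on one seed but fails this check is a sign that the apparent performance depends on initialisation or sampling luck rather than on the model itself.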

Engagement Model

I work on a project basis. Deliverables are scoped clearly before commencement. I deliver working systems, not slide decks.

I am available for scoping calls with businesses looking to automate document workflows, build internal knowledge bases, or deploy LLM-powered tools into existing operations.

Contact: smartai.hk+ai.consulting@proton.me
LinkedIn: linkedin.com/in/levi-innovation

Levi is an independent AI engineer based in Hong Kong who builds production-grade LLM applications, RAG pipelines, and document intelligence systems for SMEs pursuing AI digitalization, working remotely with clients internationally.
