Levi · LinkedIn · 2026-06-25

Why AI Projects Fail in Hong Kong: Three Procurement Traps Only an Engineer Can See

It is not the model. It is not the budget. The most common reasons AI projects fail in Hong Kong come from an engineer's perspective — structural problems that procurement decision-makers cannot see at proposal stage, but that can be avoided in advance. This article is not meant to discourage AI deployment; it is meant to help you ask the right questions before signing.

AI Project Management Procurement Hong Kong Enterprise Engineering Perspective

Defining failure

The "failure" referred to in this article is not a system crash or complete unusability. Most AI project failures are more subtle: the system goes live but no one actually uses it; or problems appear after three months of operation and the cost of fixing them turns out to be very high; or when the business tries to scale, the system architecture simply cannot support it.

What these failures have in common is that they are completely invisible at the proposal approval stage, but identifiable as structural problems when an engineer conducts a design review.

Trap 1: The system answers correctly but does not handle "I don't know"

During the demo, the vendor shows the system answering questions accurately. Procurement approves. After go-live, the system starts exhibiting "confidently wrong" behaviour — for questions outside the knowledge base scope, the system generates answers that sound plausible but are factually incorrect.

This is not a model problem. It is a system design problem. A production-grade RAG system needs to explicitly handle "confidence boundaries": when the relevance of retrieval results falls below a certain threshold, the system should respond with "no sufficiently relevant information was found in the available data," rather than forcing a generated answer.

In HKSoka systems and client deliveries, confidence boundaries are a component of the design, not a patch added after the fact. This design has a cost in token terms (an additional relevance scoring step is required), but it is necessary for reliability.

Ask before signing: "When the system cannot find relevant information in the knowledge base, how does it respond? Please show me a question your system cannot answer."

Trap 2: Cost estimates are based on the demo environment, not production

The operating cost of an AI system comes primarily from LLM API calls. Demo environments are typically tested with small data volumes, short conversations, and low frequency. Production involves real users, complete documents, and high-frequency calls — the cost difference can exceed ten times.

A more specific problem lies in context window management. A system with poor context management sends the full conversation history to the LLM on every turn — the longer the conversation, the higher the cost and the slower the response.

Measured data from the HKSoka system: through token-based context window management (replacing message-count truncation), a conversation summary pipeline, and prompt caching, per-turn costs dropped 54% for short conversations and 80–90% for long conversations. These are not theoretical optimisations — they are actual measurement figures from a production system.

This type of optimisation must be completed at the architecture design stage, not remediated after go-live — because modifying context management logic post-launch requires changing how the entire conversation history is processed.

Ask before signing: "Please provide a cost estimation model: based on how many users, average conversation turns, and document volume. If user count triples, how much does cost increase? What cost control measures does the system use?"

Trap 3: Single LLM vendor dependency with no fallback design

The APIs of OpenAI, Anthropic, and Google all have recorded service outages. If your AI system is completely dependent on a single vendor, a service outage is equivalent to a system outage.

This risk is entirely absent from many procurement documents, because vendors habitually only demo systems running normally. Yet production environments need to account for failure scenarios.

In a system delivered to a Hong Kong shipping company, a multi-LLM fallback chain was part of the foundational design: when the primary vendor experiences an outage, the system automatically switches to a backup vendor, transparently to the user. Which vendor serves as primary is determined by empirical scoring on three dimensions — translation consistency, hallucination rate, and output formatting — not by brand.

Another hidden dependency: if a system is deeply reliant on proprietary functions of a specific vendor (such as a vendor-specific API format), the migration cost will be extremely high if that vendor adjusts pricing or policy in future.

Ask before signing: "If the LLM vendor you use has a service outage today, how does the system respond? Is there a fallback mechanism? Does the system deeply depend on non-standard functions of any specific vendor?"

A supplementary observation: framework dependency versus native API

Many AI systems are built on abstraction frameworks such as LangChain or LlamaIndex. These frameworks speed up development, but introduce a hidden problem: the framework itself has versions, bugs, and feature deprecations. The system indirectly depends on a third-party codebase it cannot control.

Systems built directly on native LLM APIs have shorter call chains, lower latency, more direct debugging, and immunity to framework version changes. The trade-off is that the developer must implement the orchestration logic themselves — but this logic is controllable and auditable.

This is not an absolute claim that native API is always superior to a framework. The point is: after asking "what framework did you build with?", also ask "if this framework is deprecated, what impact would that have on your system?"

A complete pre-procurement checklist

Ask the vendor to show a question the system cannot answer; observe how confidence boundaries are handled
Ask for a production-environment cost estimation model, not a demo-environment quote
Clarify the context window management strategy
Confirm the LLM vendor fallback mechanism
Confirm whether the system is deeply dependent on non-standard functions of a single vendor
Understand the framework used in construction and its version dependency risk
Confirm the system has audit logs — problems should be traceable to specific conversations or components

One final question

The issues described in this article share a common characteristic: they are questions an engineer considers during a design review, but they do not typically appear in a sales proposal.

The corresponding procurement strategy is to conduct a technical review before receiving formal proposals — evaluating vendor design decisions from an engineering perspective rather than using sales materials as the basis for assessment. The cost of this review is far lower than the cost of discovering problems after go-live.

Looking to conduct an independent technical review before signing? An independent AI project technical assessment is available — reviewing vendor architecture design from an engineering perspective, not as a compliance audit commissioned by the vendor.

Levi is an independent AI engineer based in Hong Kong, building production-grade LLM applications, RAG pipelines, and document intelligence systems for SMEs pursuing AI digitalization internationally, working remotely.

Get in touch → More enterprise case studies →