Technology — HonChen Semiconductor

● Compiler & SDK

Model Deployment Pipeline

From standard ONNX input to NPU-executable bytecode, our toolchain handles graph optimization, quantization, operator fusion, memory planning and code generation — so your team doesn't have to manually tune the model.

01 · FRONTEND

ONNX Model

Standard format input. Bring any model that exports to ONNX.

02 · FRONTEND

Graph Parser

Parses the computational graph and validates operator support.
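As a rough illustration of the operator-support check, the sketch below scans a toy graph and reports node types outside a supported set. The `SUPPORTED_OPS` set, the dictionary-based graph and the `find_unsupported` helper are assumptions made for this example, not the HonChen toolchain's actual data structures or operator list.

```python
# Hypothetical supported-operator set; the real list would come from the
# vendor's ONNX operator support matrix.
SUPPORTED_OPS = {"Conv", "BatchNormalization", "Relu", "MatMul", "Add", "Softmax"}


def find_unsupported(nodes, supported=SUPPORTED_OPS):
    """Return (node_name, op_type) pairs whose op_type is not in `supported`."""
    return [(n["name"], n["op_type"]) for n in nodes if n["op_type"] not in supported]


# Toy graph: each node is a dict, standing in for a parsed ONNX node.
graph = [
    {"name": "conv1", "op_type": "Conv"},
    {"name": "bn1", "op_type": "BatchNormalization"},
    {"name": "act1", "op_type": "Relu"},
    {"name": "nms", "op_type": "NonMaxSuppression"},
]

unsupported = find_unsupported(graph)
for name, op_type in unsupported:
    print(f"unsupported operator {op_type} at node {name}")
```

Reporting every unsupported node in one pass, rather than stopping at the first, lets a model author see the full porting gap up front.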

03 · OPTIMIZER

Graph Optimization

Constant folding, dead-code elimination, operator pruning.
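The two best-known passes here can be sketched on a toy graph. The node format (`out`, `op`, `args`, `value`) and both function names are assumptions for illustration only; a production optimizer works on a much richer IR.

```python
import operator

# Ops this toy folder knows how to evaluate at compile time.
FOLDABLE = {"add": operator.add, "mul": operator.mul}


def fold_constants(nodes):
    """Replace ops whose inputs are all constants with a single const node.

    `nodes` is assumed to be in topological order.
    """
    consts, folded = {}, []
    for n in nodes:
        if n["op"] == "const":
            consts[n["out"]] = n["value"]
            folded.append(n)
        elif n["op"] in FOLDABLE and all(a in consts for a in n["args"]):
            value = FOLDABLE[n["op"]](*(consts[a] for a in n["args"]))
            consts[n["out"]] = value
            folded.append({"out": n["out"], "op": "const", "value": value})
        else:
            folded.append(n)
    return folded


def eliminate_dead_code(nodes, outputs):
    """Keep only the nodes that the graph outputs depend on."""
    live, kept = set(outputs), []
    for n in reversed(nodes):
        if n["out"] in live:
            live.update(n.get("args", []))
            kept.append(n)
    return kept[::-1]


graph = [
    {"out": "x", "op": "input"},
    {"out": "a", "op": "const", "value": 2.0},
    {"out": "b", "op": "const", "value": 3.0},
    {"out": "c", "op": "mul", "args": ["a", "b"]},       # foldable: 2 * 3
    {"out": "y", "op": "add", "args": ["x", "c"]},
    {"out": "unused", "op": "add", "args": ["x", "a"]},  # never reaches the output
]

optimized = eliminate_dead_code(fold_constants(graph), outputs=["y"])
```

Running folding before dead-code elimination matters in this sketch: once `c` becomes a constant, the original `a` and `b` nodes no longer feed anything live and are dropped.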

04 · OPTIMIZER

Quantization

INT8 / FP16 mixed-precision, per-channel calibration.
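To show why per-channel scales help, here is a minimal sketch of symmetric INT8 weight quantization with one scale per output channel. It is an assumption-laden example, not the toolchain's calibration algorithm: the function names are hypothetical, and real calibration also has to choose activation ranges from sample data.

```python
def quantize_per_channel(weights):
    """Symmetric INT8 quantization with one scale per output channel.

    `weights` is a list of rows, one row per output channel.
    """
    q_rows, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        # Round to the nearest integer step and clamp to the INT8 range.
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows, scales):
    """Map INT8 values back to floats using each channel's scale."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


weights = [
    [0.02, -0.01, 0.005],  # small-magnitude channel
    [1.5, -2.0, 0.75],     # large-magnitude channel
]
q, scales = quantize_per_channel(weights)
restored = dequantize(q, scales)
```

With a single per-tensor scale of 2.0 / 127, every value in the first channel would round to 0 or ±1; giving each channel its own scale keeps the full INT8 range available to small-magnitude channels.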

05 · OPTIMIZER

Operator Fusion

Conv + BN + ReLU fused. MatMul + Add + Activation fused.
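The Conv + BN part of the fusion rests on a standard identity: at inference time, BatchNorm is a per-channel affine transform, so it can be folded into the convolution's weights and bias. The sketch below uses a 1x1 convolution on a single pixel (effectively a small matrix-vector product) to keep the arithmetic visible; the helper names are illustrative and nothing here reflects HonChen's internal implementation.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BatchNorm parameters into conv weights and bias."""
    new_w, new_b = [], []
    for w_row, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        s = g / math.sqrt(v + eps)          # per-channel BN scale
        new_w.append([w * s for w in w_row])
        new_b.append((b - m) * s + bt)
    return new_w, new_b


def conv1x1(x, weights, bias):
    """1x1 convolution on one pixel: one dot product per output channel."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


def relu(y):
    return [max(0.0, yi) for yi in y]


x = [1.0, -2.0, 0.5]
W = [[0.2, -0.1, 0.4], [0.5, 0.3, -0.2]]
b = [0.1, -0.3]
gamma, beta = [1.5, 0.8], [0.0, 0.2]
mean, var = [0.3, -0.1], [0.25, 1.0]

# Three separate ops versus one fused op (conv with folded parameters + ReLU).
reference = relu(batchnorm(conv1x1(x, W, b), gamma, beta, mean, var))
fused = relu(conv1x1(x, *fold_batchnorm(W, b, gamma, beta, mean, var)))
```

The fused path produces the same result with one pass over the data instead of three, which is where the saving in memory traffic comes from.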

06 · BACKEND

Memory Planning

SRAM allocation, DMA scheduling, double-buffer management.
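One way to picture the SRAM allocation step is lifetime-based buffer reuse: tensors that are never live at the same time can share the same addresses. The following is a minimal greedy first-fit sketch under assumed inputs (sizes and first/last use steps); it does not model DMA scheduling or double buffering and is not the vendor's planner.

```python
def plan_sram(tensors, sram_bytes):
    """Assign SRAM offsets so tensors that are live together never overlap.

    `tensors` maps name -> (size_bytes, first_use_step, last_use_step).
    Returns name -> byte offset.
    """
    placed = {}  # name -> (offset, size, first, last)
    # Placing large tensors first tends to reduce fragmentation.
    for name, (size, first, last) in sorted(tensors.items(), key=lambda kv: -kv[1][0]):
        # Address ranges held by tensors whose lifetime overlaps this one.
        busy = sorted(
            (off, off + sz)
            for off, sz, f, l in placed.values()
            if not (last < f or l < first)
        )
        offset = 0
        for start, end in busy:
            if offset + size <= start:
                break  # found a gap that is large enough
            offset = max(offset, end)
        if offset + size > sram_bytes:
            raise MemoryError(f"{name} does not fit in {sram_bytes} bytes of SRAM")
        placed[name] = (offset, size, first, last)
    return {name: p[0] for name, p in placed.items()}


tensors = {
    "weights": (8192, 0, 3),  # live for the whole run
    "act0": (4096, 0, 1),
    "act1": (4096, 1, 2),
    "act2": (2048, 2, 3),     # can reuse act0's space: act0 is dead by step 2
}
plan = plan_sram(tensors, sram_bytes=16384)
```

In this example the four tensors total 18,432 bytes but fit in a 16,384-byte scratchpad because `act2` reuses the region freed by `act0`.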

07 · BACKEND

Code Generation

NPU instruction set, multi-model runtime descriptor.

08 · DEPLOY

NPU Bytecode

Ready-to-execute binary, runs on HonChen NPU IP.

Multi-Model Runtime

The same NPU runs ASR → LLM → TTS as a pipeline with runtime model switching, with no recompilation or chip reset.
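From the application side, runtime model switching might look something like the sketch below. Everything in it is hypothetical: the `MultiModelRuntime` class, its `load` and `run` methods, and the placeholder models stand in for whatever interface the real SDK exposes.

```python
class MultiModelRuntime:
    """Toy stand-in for an NPU runtime that holds several precompiled models."""

    def __init__(self):
        self._models = {}
        self.active = None
        self.switch_log = []

    def load(self, name, run_fn):
        # A real runtime would register a compiled binary and its descriptor;
        # here a plain Python callable plays that role.
        self._models[name] = run_fn

    def run(self, name, data):
        if name not in self._models:
            raise KeyError(f"model {name!r} is not loaded")
        if self.active != name:
            # Switching selects another resident model; nothing is recompiled.
            self.switch_log.append(name)
            self.active = name
        return self._models[name](data)


def voice_turn(runtime, audio):
    """One assistant turn: speech in, speech out."""
    text = runtime.run("asr", audio)
    reply = runtime.run("llm", text)
    return runtime.run("tts", reply)


runtime = MultiModelRuntime()
runtime.load("asr", lambda audio: f"text({audio})")
runtime.load("llm", lambda text: f"reply({text})")
runtime.load("tts", lambda reply: f"audio({reply})")

result = voice_turn(runtime, "mic")
```

The point of the sketch is the call pattern: all three models are loaded once, and each pipeline stage only selects which one is active.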

Mixed-Precision Strategy

The toolchain auto-selects INT8 or FP16 per layer to balance accuracy and memory footprint. Quantization-aware fine-tuning is supported.
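A per-layer precision choice can be framed as a budgeted selection problem. The heuristic below is only one plausible approach, not the toolchain's documented algorithm: it assumes each layer comes with a measured INT8 error score (hypothetical numbers here) and promotes the layers with the most error per extra byte to FP16 until the memory budget is used up.

```python
def choose_precision(layers, budget_bytes):
    """Pick INT8 or FP16 per layer under a weight-memory budget.

    `layers` is a list of (name, num_params, int8_error) tuples, where
    `int8_error` is an assumed sensitivity score from a calibration run.
    """
    plan = {name: "INT8" for name, _, _ in layers}
    used = sum(params for _, params, _ in layers)  # 1 byte per INT8 parameter
    # Promote layers with the highest error per additional byte first.
    for name, params, err in sorted(layers, key=lambda l: l[2] / l[1], reverse=True):
        extra = params  # FP16 costs one more byte per parameter than INT8
        if err > 0 and used + extra <= budget_bytes:
            plan[name] = "FP16"
            used += extra
    return plan, used


layers = [
    ("embed", 1000, 0.8),
    ("attn", 4000, 0.4),
    ("mlp", 8000, 0.2),
    ("head", 1000, 0.0),
]
plan, used = choose_precision(layers, budget_bytes=16000)
```

Here only `embed` is promoted: it has the highest sensitivity per byte, and promoting `attn` as well would exceed the 16,000-byte budget.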

Zero Manual Tuning

Customers supply an ONNX model and get an NPU-ready binary. Memory layout, kernel fusion and instruction scheduling are fully automated.

● Local LLM Roadmap

Why 2–4 TOPS Matters for Future On-Device LLMs

Mixture-of-Experts models change the edge AI equation: a large model may contain tens of billions of total parameters, while activating only a much smaller subset per token. With enough memory, runtime and tool interfaces, this class of local model can move beyond chat toward agentic AI for daily-life devices.

Example MoE model: Qwen3-30B-A3B

30B-class total parameters, about 3B active parameters per token.

NPU scaling target: 2 TOPS / 4 TOPS

A practical roadmap for local assistants, appliances and personal robots.

System direction: Local Agentic AI

Plan, use tools and act on local context without sending every request to the cloud.

01

MoE lowers active compute

Only selected experts are active for each token, so the compute path is much smaller than the full parameter count suggests. Memory still matters, but the NPU does not need to process every parameter for every token.
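A back-of-envelope calculation makes the gap between total and active parameters concrete. The sketch assumes roughly two operations (one multiply, one add) per active weight per token and INT8 weight storage. It gives a compute-only ceiling and deliberately ignores memory bandwidth, attention over the context, and routing overhead, any of which can dominate in practice.

```python
def moe_estimate(total_params, active_params, tops, bytes_per_param=1):
    """Rough compute and storage figures for a Mixture-of-Experts model."""
    ops_per_token = 2 * active_params  # one multiply + one add per active weight
    return {
        # All experts must still be stored somewhere, active or not.
        "weight_storage_gb": total_params * bytes_per_param / 1e9,
        "gops_per_token": ops_per_token / 1e9,
        "dense_gops_per_token": 2 * total_params / 1e9,
        # Compute-only ceiling at peak throughput; not an expected token rate.
        "peak_tokens_per_s": tops * 1e12 / ops_per_token,
    }


# 30B total parameters, about 3B active per token, on a 2 TOPS NPU.
est = moe_estimate(total_params=30e9, active_params=3e9, tops=2)
for key, value in est.items():
    print(f"{key}: {value:.1f}")
```

Under these assumptions the active path needs about 6 GOPs per token instead of about 60 for a dense model of the same size, while weight storage stays at about 30 GB in INT8. That is the sense in which memory still matters.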

02

From chatbot to local agent

A 30B-class MoE model can support richer reasoning than small command models. With product software and tool APIs, an appliance, mirror or robot can plan steps, remember context and execute local actions.

03

Right-sized NPU beats overkill silicon

Many lifestyle AI products do not need a datacenter-class accelerator. They need a compact 1–4 TOPS-class IP block that fits cost, power and integration limits.

● SoC Integration

How HonChen NPU IP Drops into Your SoC

Industry-standard bus protocols. Clean clock and reset boundaries. Designed to integrate, not disrupt — your existing SoC pipeline keeps working.

Process-Independent · 55 nm → 2 nm

Multi-Precision INT8 + FP16

Multi-Model Runtime Switching

Custom AI Instruction Set

Low-Power Architecture

HonChen NPU IP

▲ AXI Master Interface ▲ to SoC fabric / DRAM

CNN Compute Engine

Conv2D · Depthwise · Pointwise · Pooling · Upsample · BatchNorm · Element-wise · Concat

Transformer Engine

MatMul · Multi-Head Attention · Softmax · LayerNorm · GELU · KV Cache · Embedding

Local SRAM Scratchpad

Weights / activations cache · double-buffer · streaming buffer · DMA-managed

Quantize / Activation

INT8 / FP16 · Per-channel · ReLU · Leaky ReLU · GELU · Softmax · Sigmoid · Tanh

5-Stage RISC-V Control Core

Custom AI instructions · Runtime scheduler · DMA orchestration · Multi-model switching · Clock gating

▼ APB Slave · Synchronous FIFO ▼ control + streaming data

AXI Master → SoC fabric / DRAM

APB Slave ← CPU control registers

Sync FIFO ↔ streaming data

Local SRAM · DDR optional

Clock + external power gating

HonChen NPU IP uses standardized interfaces that integrate directly into your existing SoC fabric. Industry-standard AXI Master for memory access, APB Slave for CPU control, and Synchronous FIFO for streaming data — no custom interconnect required.
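On the CPU side, control over an APB slave usually comes down to reading and writing memory-mapped registers. The sketch below simulates that flow in Python. The register offsets, bit definitions and the `FakeNpuRegs` class are invented for illustration; the actual map is defined in the register reference.

```python
# Hypothetical register offsets and bits, for illustration only.
REG_CTRL, REG_STATUS, REG_MODEL_ADDR = 0x00, 0x04, 0x08
CTRL_START = 0x1
STATUS_DONE = 0x1


class FakeNpuRegs:
    """In-memory stand-in for the NPU's APB register block."""

    def __init__(self):
        self.regs = {REG_CTRL: 0, REG_STATUS: 0, REG_MODEL_ADDR: 0}

    def write(self, offset, value):
        self.regs[offset] = value & 0xFFFFFFFF
        if offset == REG_CTRL and value & CTRL_START:
            # Pretend the job finishes immediately; real hardware would fetch
            # bytecode and data over the AXI master port before setting DONE.
            self.regs[REG_STATUS] |= STATUS_DONE

    def read(self, offset):
        return self.regs[offset]


def run_model(regs, bytecode_addr, max_polls=1000):
    """Point the NPU at a compiled model in DRAM, start it, and poll for done."""
    regs.write(REG_STATUS, 0)
    regs.write(REG_MODEL_ADDR, bytecode_addr)
    regs.write(REG_CTRL, CTRL_START)
    for _ in range(max_polls):
        if regs.read(REG_STATUS) & STATUS_DONE:
            return True
    return False


regs = FakeNpuRegs()
finished = run_model(regs, bytecode_addr=0x8000_0000)
```

The split mirrors the interface roles described above: the CPU touches only a few control registers over APB, while bulk weights and activations would move through the AXI master port.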

● Verification & Delivery

Engineering You Can Actually Verify

Real IP development discipline applied to the NPU, with deliverables built for real integration and production.

FPGA PROTOTYPING

Pre-silicon validation on Xilinx-class FPGA boards for full workload regression

TESTED MODELS

Whisper small / tiny · Qwen 500M / 1.7B LLM · VITS · YOLOv3-tiny · YOLOv4-tiny · Face Detection · Face Angle — continuously expanding

ONNX COMPATIBLE

Standard model format input — bring your trained model from PyTorch, TensorFlow, or any ONNX exporter

BUILT-IN SELF-TEST

BIST mode for integration debugging and post-silicon production testing — verify the IP before tape-out and after

INTEGRATION SUPPORT

On-site / remote engineering hours during customer SoC integration phase

SDK + REFERENCE

Toolchain + model porting examples + integration reference platform

DOCUMENTATION

Integration guide · register reference · performance characterization · ONNX operator support matrix

MAINTENANCE

Annual subscription for bug fixes, minor enhancements, technical consultation

● HonChen Cloud Service

Model Optimization, Remote & Effortless

A planned cloud-assisted service for customers: upload an ONNX model and receive an NPU-optimized binary ready to deploy on HonChen IP, with no local toolchain installation needed.

HonChen Cloud Optimizer

★ Coming soon

How it works. Customers connect to HonChen's secured server, upload an ONNX model, and receive an optimized binary built for their target NPU configuration. No SDK installation, no local compute, no GPU farm: just the result.

Designed for ASIC service providers and OEM partners who need to iterate on multiple model variants quickly without maintaining the toolchain themselves.

From RTL to Inference

Want to evaluate the toolchain?
