Technology

From RTL to Inference

The engineering inside HonChen NPU IP — hand-crafted hardware, automated software toolchain, and a cloud-assisted optimization service designed to make integration painless.

Compiler & SDK

Model Deployment Pipeline

From standard ONNX input to NPU-executable bytecode, our toolchain handles graph optimization, quantization, operator fusion, memory planning and code generation — so your team doesn't have to manually tune the model.

01 · FRONTEND

ONNX Model

Standard format input. Bring any model that exports to ONNX.

02 · FRONTEND

Graph Parser

Parse computational graph and validate operator support.

03 · OPTIMIZER

Graph Optimization

Constant folding, dead-code elimination, operator pruning.

04 · OPTIMIZER

Quantization

INT8 / FP16 mixed-precision, per-channel calibration.

05 · OPTIMIZER

Operator Fusion

Conv + BN + ReLU fused. MatMul + Add + Activation fused.

06 · BACKEND

Memory Planning

SRAM allocation, DMA scheduling, double-buffer management.

07 · BACKEND

Code Generation

NPU instruction set, multi-model runtime descriptor.

08 · DEPLOY

NPU Bytecode

Ready-to-execute binary, runs on HonChen NPU IP.
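
The graph-optimization stage above names constant folding and dead-code elimination. As a rough illustration of what those two passes do, here is a minimal sketch on a toy graph. The dictionary-based graph format, the node names and both functions are assumptions made for this example; they are not the HonChen compiler's internal representation.

```python
import operator

# Toy IR (hypothetical): name -> (op, payload).
#   ("input", None)          graph input
#   ("const", value)         compile-time constant
#   ("add" | "mul", (a, b))  binary op on two named nodes
OPS = {"add": operator.add, "mul": operator.mul}


def constant_fold(graph):
    """Replace ops whose inputs are all constants with one const node.

    Assumes the dict is in topological order (producers before consumers).
    """
    folded = {}
    for name, (op, payload) in graph.items():
        if op in OPS and all(folded[a][0] == "const" for a in payload):
            value = OPS[op](*(folded[a][1] for a in payload))
            folded[name] = ("const", value)
        else:
            folded[name] = (op, payload)
    return folded


def eliminate_dead_code(graph, outputs):
    """Keep only nodes reachable backwards from the requested outputs."""
    live, stack = set(), list(outputs)
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        op, payload = graph[name]
        if op in OPS:
            stack.extend(payload)
    return {name: node for name, node in graph.items() if name in live}


graph = {
    "x": ("input", None),
    "two": ("const", 2),
    "three": ("const", 3),
    "six": ("mul", ("two", "three")),  # foldable: both inputs are constants
    "y": ("add", ("x", "six")),
    "unused": ("mul", ("x", "two")),   # never reaches the output
}

optimized = eliminate_dead_code(constant_fold(graph), outputs=["y"])
print(optimized)
```

After both passes only the input, the folded constant and the final add remain, which is the kind of shrinkage that later stages such as memory planning benefit from.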

Multi-Model Runtime

The same NPU runs ASR → LLM → TTS as a pipeline, switching models at runtime with no recompilation or chip reset.

Mixed-Precision Strategy

The toolchain selects INT8 or FP16 per layer to balance accuracy and memory footprint. Quantization-aware fine-tuning is supported.
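
To make per-channel quantization and per-layer precision selection more concrete, the sketch below quantizes each output channel with its own symmetric INT8 scale, then keeps a layer in INT8 only if the round-trip error stays under a budget. The error metric (mean relative weight error) and the 5% threshold are assumptions chosen for illustration. The selection criterion the HonChen toolchain actually uses is not described here, and production flows typically measure error on calibration data rather than on weights alone.

```python
def quantize_per_channel(weights):
    """Symmetric INT8 quantization with one scale per output channel.

    weights: list of channels, each a list of floats.
    """
    quantized, scales = [], []
    for channel in weights:
        max_abs = max(abs(w) for w in channel)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        quantized.append(
            [max(-127, min(127, round(w / scale))) for w in channel]
        )
        scales.append(scale)
    return quantized, scales


def dequantize(quantized, scales):
    """Map INT8 codes back to floats using each channel's scale."""
    return [[q * s for q in ch] for ch, s in zip(quantized, scales)]


def mean_relative_error(weights):
    """Average |w - w_hat| / |w| over all non-zero weights (illustrative)."""
    restored = dequantize(*quantize_per_channel(weights))
    errors = [
        abs(w - r) / abs(w)
        for ch, rch in zip(weights, restored)
        for w, r in zip(ch, rch)
        if w != 0
    ]
    return sum(errors) / len(errors) if errors else 0.0


def choose_precision(weights, budget=0.05):
    """Hypothetical rule: INT8 if the round-trip error fits the budget."""
    return "INT8" if mean_relative_error(weights) <= budget else "FP16"


# Two channels with very different ranges: per-channel scales handle both.
smooth_layer = [[0.5, -0.4, 0.3, 0.45], [0.02, -0.01, 0.015, 0.018]]
# One outlier stretches the scale, so the small weights collapse to zero.
outlier_layer = [[0.01, 0.02, -0.015, 10.0]]

print(choose_precision(smooth_layer))
print(choose_precision(outlier_layer))
```

The second layer shows why a per-layer choice is useful: a single outlier can make INT8 too coarse for one layer while the rest of the model quantizes cleanly.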

Zero Manual Tuning

Customers supply an ONNX model and receive an NPU-ready binary. Memory layout, kernel fusion and instruction scheduling are fully automated.
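
Kernel fusion is easiest to see with the Conv + BN + ReLU pattern named in the pipeline. Batch normalization at inference time is an affine transform per output channel, so it can be folded into the convolution's weights and bias ahead of time. The sketch below checks this for a 1×1 convolution at a single pixel; the function names and sample values are illustrative assumptions, not the toolchain's code.

```python
import math


def conv1x1(x, weights, bias):
    """Pointwise convolution at one pixel: one dot product per output channel."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch norm, applied per channel."""
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]


def relu(y):
    return [max(0.0, v) for v in y]


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Return conv weights and bias with the batch norm folded in."""
    folded_w, folded_b = [], []
    for row, b, g, bt, m, s in zip(weights, bias, gamma, beta, mean, var):
        k = g / math.sqrt(s + eps)          # per-channel scale factor
        folded_w.append([w * k for w in row])
        folded_b.append((b - m) * k + bt)
    return folded_w, folded_b


x = [1.0, -2.0, 0.5]
weights = [[0.2, -0.1, 0.4], [0.3, 0.8, -0.5]]
bias = [0.1, -0.2]
gamma, beta = [1.5, 0.7], [0.05, -0.3]
mean, var = [0.2, -1.0], [0.9, 1.6]

# Three separate operators.
reference = relu(batchnorm(conv1x1(x, weights, bias), gamma, beta, mean, var))

# One fused operator: conv with folded parameters, ReLU applied in the same pass.
fw, fb = fold_batchnorm(weights, bias, gamma, beta, mean, var)
fused = relu(conv1x1(x, fw, fb))

print(reference, fused)
```

The two results agree up to floating-point rounding, while the fused form avoids writing the intermediate convolution output back to memory.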

Why 2–4 TOPS Matters
for Future On-Device LLMs

Mixture-of-Experts models change the edge AI equation: a large model may contain tens of billions of total parameters, while activating only a much smaller subset per token. With enough memory, runtime and tool interfaces, this class of local model can move beyond chat toward agentic AI for daily-life devices.

Example MoE model: Qwen3-30B-A3B

30B-class total parameters, about 3B active parameters per token.

NPU scaling target: 2 TOPS / 4 TOPS

A practical roadmap for local assistants, appliances and personal robots.

System direction: Local Agentic AI

Plan, use tools and act on local context without sending every request to the cloud.

01

MoE lowers active compute

Only selected experts are active for each token, so the compute path is much smaller than the full parameter count suggests. Memory still matters, but the NPU does not need to process every parameter every token.

02

From chatbot to local agent

A 30B-class MoE model can support richer reasoning than small command models. With product software and tool APIs, an appliance, mirror or robot can plan steps, remember context and execute local actions.

03

Right-sized NPU beats overkill silicon

Many lifestyle AI products do not need a datacenter-class accelerator. They need a compact 1–4 TOPS-class IP block that fits cost, power and integration limits.

SoC Integration

How HonChen NPU IP
Drops into Your SoC

Industry-standard bus protocols. Clean clock and reset boundaries. Designed to integrate, not disrupt — your existing SoC pipeline keeps working.

Process-Independent · 55 nm → 2 nm
Multi-Precision INT8 + FP16
Multi-Model Runtime Switching
Custom AI Instruction Set
Low-Power Architecture
HonChen NPU IP
AXI Master Interface ▲ to SoC fabric / DRAM
CNN Compute Engine

Conv2D · Depthwise · Pointwise · Pooling · Upsample · BatchNorm · Element-wise · Concat

Transformer Engine

MatMul · Multi-Head Attention · Softmax · LayerNorm · GELU · KV Cache · Embedding

Local SRAM Scratchpad

Weights / activations cache · double-buffer · streaming buffer · DMA-managed

Quantize / Activation

INT8 / FP16 · Per-channel · ReLU · Leaky ReLU · GELU · Softmax · Sigmoid · Tanh

5-Stage RISC-V Control Core

Custom AI instructions · Runtime scheduler · DMA orchestration · Multi-model switching · Clock gating

APB Slave · Synchronous FIFO ▼ control + streaming data
AXI Master → SoC fabric / DRAM
APB Slave ← CPU control registers
Sync FIFO ↔ streaming data
Local SRAM · DDR optional
Clock + external power gating

HonChen NPU IP integrates directly into your existing SoC fabric through standard interfaces: an AXI Master for memory access, an APB Slave for CPU control, and a synchronous FIFO for streaming data. No custom interconnect is required.
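
From the host CPU's point of view, this split means control goes through memory-mapped registers on the APB side while sample data is pushed through the FIFO. The sketch below models that flow in software. Every register name, offset and bit layout here is invented for illustration, and the `MockNpu` class is a stand-in simulator; the real register map is defined in the integration guide and register reference.

```python
from collections import deque

# Hypothetical register offsets, not the real HonChen register map.
REG_CTRL = 0x00        # bit 0: start
REG_STATUS = 0x04      # bit 0: done
REG_MODEL_ADDR = 0x08  # DRAM address of the compiled model binary


class MockNpu:
    """Software stand-in for the IP block's APB and FIFO interfaces."""

    def __init__(self, fifo_depth=4):
        self.regs = {REG_CTRL: 0, REG_STATUS: 0, REG_MODEL_ADDR: 0}
        self.fifo = deque()
        self.fifo_depth = fifo_depth
        self.consumed = []

    def apb_write(self, offset, value):
        self.regs[offset] = value
        if offset == REG_CTRL and value & 1:
            # Model "running": drain the FIFO, then raise the done flag.
            self.consumed.extend(self.fifo)
            self.fifo.clear()
            self.regs[REG_STATUS] |= 1

    def apb_read(self, offset):
        return self.regs[offset]

    def fifo_push(self, word):
        """Return False when the FIFO is full (back-pressure)."""
        if len(self.fifo) >= self.fifo_depth:
            return False
        self.fifo.append(word)
        return True


def run_inference(npu, model_addr, samples):
    """Control via APB registers, data via the streaming FIFO."""
    npu.apb_write(REG_MODEL_ADDR, model_addr)  # where the binary lives
    for word in samples:
        if not npu.fifo_push(word):
            raise RuntimeError("FIFO full; caller should wait and retry")
    npu.apb_write(REG_CTRL, 1)                 # start
    while not npu.apb_read(REG_STATUS) & 1:    # poll until done
        pass
    return npu.consumed


npu = MockNpu()
result = run_inference(npu, 0x8000_0000, [1, 2, 3])
print(result)
```

In a real system the model binary and large tensors would be fetched by the NPU itself over the AXI Master port, so the CPU only needs to write a handful of registers per inference.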

Verification & Delivery

Engineering You Can
Actually Verify

Real IP development discipline applied to an NPU, with deliverables built for real integration and production.

FPGA PROTOTYPING

Pre-silicon validation on Xilinx-class FPGA boards for full workload regression

TESTED MODELS

Whisper small / tiny · Qwen 500M / 1.7B LLM · VITS · YOLOv3-tiny · YOLOv4-tiny · Face Detection · Face Angle — continuously expanding

ONNX COMPATIBLE

Standard model format input — bring your trained model from PyTorch, TensorFlow, or any ONNX exporter

BUILT-IN SELF-TEST

BIST mode for integration debugging and post-silicon production testing. Verify the IP both before and after tape-out.

INTEGRATION SUPPORT

On-site / remote engineering hours during customer SoC integration phase

SDK + REFERENCE

Toolchain + model porting examples + integration reference platform

DOCUMENTATION

Integration guide · register reference · performance characterization · ONNX operator support matrix

MAINTENANCE

Annual subscription for bug fixes, minor enhancements, technical consultation

HonChen Cloud Service

Model Optimization,
Remote & Effortless

A planned cloud-assisted service for customers: upload an ONNX model and receive an NPU-optimized binary ready to deploy on HonChen IP, with no local toolchain installation needed.

HonChen Cloud Optimizer

★ Coming soon

How it works. Customers connect to HonChen's secured server, upload an ONNX model, and receive an optimized binary built for their target NPU configuration. No SDK installation, no local compute, no GPU farm: just the result.

Designed for ASIC service providers and OEM partners who need to iterate on multiple model variants quickly without maintaining the toolchain themselves.

Want to evaluate the toolchain?

Get a technical briefing on architecture, compiler, and integration — under NDA.