BATENCORE Research investigates compute architectures in which neural inference executes directly in programmable logic — without processor instructions, without an operating system, and without the von Neumann bottleneck on the critical compute path.
Patent pending · PCT/IB2026/053450

Hand-written RTL finite-state machines that drive AXI4 transactions directly — every handshake, address, and beat explicit and auditable at the register level.
Q16.16 signed multiply-accumulate on DSP Slices. No IEEE 754 floating-point hardware required. Precision validated against software reference on the same dataset.
Multi-layer feed-forward networks with ReLU activation executing entirely in programmable logic, reading weights from DDR3 via PS7 High-Performance ports.
Every result verified by direct JTAG register readback (Xilinx XSCT), not simulation. Ground truth is the silicon, not the waveform.
FLVH is a progressive hardware architecture for neural network inference on a Zynq-7020 FPGA (Arty Z7-20 development board, XC7Z020CLG400-1). The architecture chains five AXI4 Master engines — DDR3 write, Q16.16 ALU, single-neuron MAC, multi-neuron layer propagation (SPREAD), and multi-layer propagation with ReLU (DEEP) — into a complete inference pipeline validated on silicon across six development phases (N through S).
A 2-2-1 network (2 inputs, 2 hidden neurons with ReLU, 1 linear output) trained offline on XOR classifies all four input cases correctly on silicon. Hardware compute time: approximately 1.3 µs at 125 MHz. ReLU activation required 5 additional lines of Verilog and zero new GPIO registers. Scores land exactly at 0.0 or 1.0 in Q16.16.
Each phase validated by direct XSCT register readback before the next begins. No simulation-only results.
The patent covers the FLVH architecture, including the AXI4 Master engine design pattern, the Q16.16 fixed-point multiply-accumulate pipeline, and the multi-layer propagation protocol in which the ARM processor is removed from the neural compute path. Licensing inquiries: hounaine.hamiani@batencore.com
Hardware architect and inventor of the FLVH architecture. Specialises in FPGA-based neural inference, AXI4 interconnect design, and fixed-point arithmetic in programmable logic. Designed, implemented, and validated the complete FLVH pipeline from first principles in hand-written Verilog RTL on silicon, from individual engine validation to multi-layer XOR inference without CPU intervention on the compute path.
Current implementation requires one ARM trigger per network layer. An extended FSM can chain layers automatically using the previous layer's output buffer address as the next layer's input — eliminating ARM involvement entirely from multi-layer inference. (Phase T)
The DEEP engine processes neurons sequentially. Instantiating M engines in parallel behind a multi-master SmartConnect configuration would yield up to M× speedup at the cost of additional DSP and LUT resources, with 218 DSP slices still available on the XC7Z020. (Phase U)
Inference energy per MAC (pJ/MAC) has not yet been measured. The Zynq-7020 XADC provides on-chip power sensors accessible from bare-metal C, enabling direct comparison with published GPU and dedicated AI accelerator figures.