FPL 2020 conference: Using DSP Slices as Content-Addressable Update Queues

Non-intrusive online monitoring of embedded processors can only be realized with high-end FPGA solutions.
To get an impression of the underlying complexity, see Tom’s presentation at the 30th International Conference on Field-Programmable Logic and Applications (FPL 2020).

Abstract: Content-Addressable Memory (CAM) is a powerful abstraction for building memory caches, routing tables and hazard detection logic. Without a native CAM structure available on FPGA devices, their functionality must be emulated using the structural primitives at hand. Such an emulation causes significant overhead in the consumption of the underlying resources, typically general-purpose fabric and on-chip block RAM (BRAM). This often motivates mitigating trade-offs, such as the reduction of the associativity of memory caches. This paper describes a technique to implement the hazard resolution in a memory update queue that hides the off-chip memory readout latency of read-modify-write cycles while guaranteeing the delivery of the full memory bandwidth. The innovative use of DSP slices allows them to assume and combine the functions of (a) the tag and data storage, (b) the tag matching, and (c) the data update in this key-value storage scenario. The proposed approach provides designers with extra flexibility by adding this resource type as another option to implement CAM.
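To make the hazard resolution described in the abstract more concrete, here is a minimal behavioral sketch in Python. It is not the paper's DSP-slice implementation: the class name `UpdateQueue`, the fixed `latency` parameter, the additive update, and the dictionary standing in for off-chip memory are all assumptions made for illustration. The sketch only shows the idea that an update hitting an address with a read already in flight is merged into the pending entry by tag match instead of triggering a second, conflicting read-modify-write.

```python
from collections import deque


class UpdateQueue:
    """Illustrative software model of a content-addressable update queue.

    Hypothetical names and timing; not the hardware design from the paper.
    """

    def __init__(self, memory, latency):
        self.memory = memory      # dict address -> value, stands in for off-chip memory
        self.latency = latency    # assumed fixed read latency in cycles
        self.inflight = deque()   # entries [tag, pending_delta, cycles_left], oldest first
        self.reads_issued = 0     # number of memory reads actually started

    def step(self, update=None):
        """Advance one cycle; `update` is an optional (address, delta) pair."""
        # Age all in-flight entries.
        for entry in self.inflight:
            entry[2] -= 1
        # Retire the oldest entry once its read data is assumed to have arrived.
        if self.inflight and self.inflight[0][2] <= 0:
            tag, delta, _ = self.inflight.popleft()
            self.memory[tag] = self.memory.get(tag, 0) + delta  # modify and write back

        if update is None:
            return
        address, delta = update
        # Tag match against every in-flight entry (the CAM lookup).
        for entry in self.inflight:
            if entry[0] == address:
                entry[1] += delta  # hazard: merge into the pending update
                return
        # No match: allocate a new entry and issue the memory read.
        self.inflight.append([address, delta, self.latency])
        self.reads_issued += 1

    def drain(self):
        """Run idle cycles until every pending update has been written back."""
        while self.inflight:
            self.step()


memory = {}
queue = UpdateQueue(memory, latency=4)
for upd in [(0x10, 1), (0x10, 2), (0x20, 5), (0x10, 3)]:
    queue.step(upd)  # one update accepted per cycle, no stalls
queue.drain()
print(memory, queue.reads_issued)
```

In this model three updates to address 0x10 arrive while its read is still outstanding, yet only one read is issued for that address and the queue never stalls the incoming stream. In the approach the abstract describes, the tag storage, the tag comparison, and the data update are combined inside the DSP slices rather than performed by a software loop.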