The fundamental idea behind the new methodology for capturing trace data is to equip the microcontroller with a facility that allows synchronization with an external emulator. Thus, the only data communicated from the SoC to the emulator is the data required to reconstruct all internal data and the program flow. All other system responses are defined by the program code. Fig. 1-1 shows the emulation principle.

Conventional methods transfer all read data and write data as well as all jumps to the emulator, whereas our proposed methodology transfers only the read data that originates from the peripheral components of the controller, together with the interrupts, because only these data and events can change the program flow unpredictably.

Fig. 1-1. hidICE principle. All related clocks, all data read from the periphery and all interrupts are transferred to the emulation. One or more hash values of instructions, addresses and data can be transferred to the emulation.

The hidICE principle is applicable not only to microcontrollers with a single CPU and no DMA, but also to DMA-equipped and multi-core microcontrollers. Identical hash values in both systems confirm the validity of the captured trace data. As the emulation does not influence the system itself, it can be used concurrently with traditional on-chip debug support.

Therefore, the emulator must meet these requirements:

  • The emulator must replicate the microcontroller’s bus master cores (one or more CPUs and DMA controller), but none of the peripherals such as ADC, UART or CAN.
  • Its RAM must be the same size as or larger than that of the SoC and have the same or faster access times.
  • The emulator must have ROM of the same size or larger, with the same or faster access times and the same content as that of the SoC.
  • Via the synchronization interface the following signals have to be transmitted from the SoC to the emulator: 
    • CPU clock 
    • Results of the CPU / DMA read operations in the peripheral area
    • Interrupt and DMA requests

Given these properties, the emulation will precisely match the behaviour of, and the instructions carried out by, the microcontroller being emulated. Both SoC and emulator start with the same internal state and have the same ROM content. Thus, the information read from peripheral components is sufficient to exactly reconstruct the RAM content in the emulation, since writes to the local RAM are reflected in the emulation.

Reading from non-initialized RAM addresses is not allowed, because identical system behaviour is not guaranteed if the RAM contents differ. The system integrity control discussed below will detect and signal such a difference.
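The effect of reading non-initialized RAM on a hash comparison can be illustrated with a minimal sketch. The text does not specify which hash function the integrity control uses, so CRC-32 from Python's `zlib` module serves here purely as a stand-in. The names `bus_hash` and `read_sequence` and the RAM contents are hypothetical.

```python
import zlib

def bus_hash(accesses):
    """Fold a sequence of (address, data) bus accesses into a running CRC-32.

    CRC-32 is only a stand-in; the actual hash function is an assumption.
    """
    value = 0
    for address, data in accesses:
        chunk = address.to_bytes(4, "little") + data.to_bytes(4, "little")
        value = zlib.crc32(chunk, value)
    return value

def read_sequence(ram, addresses):
    """Return the (address, data) pairs seen when reading the given cells."""
    return [(a, ram[a]) for a in addresses]

# Cells 0 and 1 were written by the program; cell 2 was never written and
# holds different power-up content in the two systems (hypothetical values).
soc_ram = [0x11, 0x22, 0xA5]
emu_ram = [0x11, 0x22, 0x00]

# Reading only initialized cells: both systems produce the same hash.
hashes_match = (bus_hash(read_sequence(soc_ram, [0, 1]))
                == bus_hash(read_sequence(emu_ram, [0, 1])))

# Reading the non-initialized cell as well: the hashes differ.
hashes_differ = (bus_hash(read_sequence(soc_ram, [0, 1, 2]))
                 != bus_hash(read_sequence(emu_ram, [0, 1, 2])))

print(hashes_match, hashes_differ)
```

In this sketch the comparison is made once at the end. A real implementation would presumably compare hash values at regular intervals so that a divergence is reported close to where it occurs.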

All branch decisions will be the same in the SoC and the emulator. The only remaining changes of program flow that cannot be predicted in the emulator are interrupts and DMA requests, which are communicated to the emulation to replicate the exact behaviour of the SoC.
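The replay principle described so far can be sketched with a toy model. The instruction set, the `Core` class, the register and RAM sizes, and the use of an instruction counter in place of the transferred CPU clock are all assumptions made for illustration; they are not part of the hidICE design. The SoC side logs only the peripheral read values and the cycles at which interrupts arrive, and the emulator side re-executes the same program from the same initial state using nothing but that log.

```python
from collections import deque

ISR_ENTRY = 7  # hypothetical interrupt vector in the toy program
PROGRAM = [
    ("LOADP", 0, 0x40),  # 0: r0 <- peripheral register 0x40
    ("STORE", 0, 0),     # 1: ram[0] <- r0
    ("ADD", 1, 0),       # 2: r1 <- r1 + r0
    ("DEC", 2),          # 3: r2 <- r2 - 1
    ("JNZ", 2, 0),       # 4: branch to 0 while r2 != 0
    ("STORE", 1, 1),     # 5: ram[1] <- r1
    ("HALT",),           # 6
    ("LOADP", 3, 0x44),  # 7: ISR: r3 <- peripheral register 0x44
    ("STORE", 3, 2),     # 8: ram[2] <- r3
    ("RETI",),           # 9: return from interrupt
]

class Core:
    """Toy bus master; SoC and emulator both start in this same state."""

    def __init__(self, read_peripheral, irq_pending):
        self.read_peripheral = read_peripheral  # callable(addr) -> int
        self.irq_pending = irq_pending          # callable(cycle) -> bool
        self.regs = [0, 0, 3, 0]
        self.ram = [0, 0, 0]
        self.pc = 0
        self.cycle = 0
        self.return_pc = None
        self.trace = []                         # executed program counters

    def step(self):
        if self.return_pc is None and self.irq_pending(self.cycle):
            self.return_pc, self.pc = self.pc, ISR_ENTRY
        op, *a = PROGRAM[self.pc]
        self.trace.append(self.pc)
        self.pc += 1
        if op == "LOADP":
            self.regs[a[0]] = self.read_peripheral(a[1])
        elif op == "STORE":
            self.ram[a[1]] = self.regs[a[0]]
        elif op == "ADD":
            self.regs[a[0]] += self.regs[a[1]]
        elif op == "DEC":
            self.regs[a[0]] -= 1
        elif op == "JNZ" and self.regs[a[0]] != 0:
            self.pc = a[1]
        elif op == "RETI":
            self.pc, self.return_pc = self.return_pc, None
        self.cycle += 1
        return op != "HALT"

    def run(self):
        while self.step():
            pass
        return self

def run_soc(peripheral_values, irq_cycles):
    """Run with 'real' peripherals and log every peripheral read value."""
    source, log = iter(peripheral_values), []
    def read_peripheral(addr):
        value = next(source)
        log.append(value)
        return value
    return Core(read_peripheral, lambda c: c in irq_cycles).run(), log

def run_emulator(sync_reads, irq_cycles):
    """Run without peripherals; reads are served from the sync log."""
    replay = deque(sync_reads)
    return Core(lambda addr: replay.popleft(), lambda c: c in irq_cycles).run()

IRQ_CYCLES = {4}
soc, sync_reads = run_soc([5, 9, 7, 2], IRQ_CYCLES)
emu = run_emulator(sync_reads, IRQ_CYCLES)
print(emu.trace == soc.trace, emu.ram == soc.ram)
```

In this run the emulator reproduces the full program counter trace of 20 executed instructions and the final RAM content from only four peripheral read values and one interrupt event. Nothing about writes or branch outcomes is transferred.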

Depending on complexity and speed, the emulation can be run in an FPGA for slower CPUs or in an ASIC for faster CPUs. The ASIC / FPGA emulation must include the CPU core(s), the DMA controller and the internal memory.

In contrast to traditional evaluation chips, only one implementation of the emulation core is required for each CPU series. A new evaluation chip for each new device with new or different periphery is no longer required, since the implementation of peripheral units is not necessary for the emulation. The costs of new evaluation chips, their physical limitations and the associated time-to-market delays are no longer encountered. The proposed principle is particularly suited to microcontroller families which have identical cores and varying periphery, as full trace support can be provided for each new derivative immediately and at no additional cost.

For slower CPUs (up to ~50 MHz), emulating the CPU in an FPGA lowers tool cost dramatically because only the existing FPGA needs to be reconfigured and one emulator can support different CPU families. The emulator logic that analyzes or pre-processes the available trace data can also be implemented in the same FPGA. This provides a very compact and cost-efficient emulation system.

In the case of an ASIC implementation, the trace data can be made available on a configurable interface. Because the available trace data is very wide, it seems reasonable for this interface to output only a subset of the trace data, selected according to the current demand. For instance, for a branch / decision code coverage analysis of a CPU, the program counter and the data read by the CPU can be made available on the output. For another problem, e.g. stack analysis, the stack pointer and program counter may be selected for output. Currently, we are discussing a convenient interface with some major emulator vendors.
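The idea of selecting a subset of the wide trace data per analysis task can be sketched as follows. The interface itself is still under discussion according to the text, so the record layout, the field names and the profile names below are purely hypothetical. Only the two signal selections (program counter plus read data, stack pointer plus program counter) are taken from the examples above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TraceRecord:
    """One cycle of full-width trace data (hypothetical field set)."""
    program_counter: int
    stack_pointer: int
    address: int
    read_data: int
    write_data: int

# Which signals each analysis task routes to the output (assumed names).
PROFILES = {
    "branch_coverage": ("program_counter", "read_data"),
    "stack_analysis": ("stack_pointer", "program_counter"),
}

def select_signals(record, profile):
    """Return only the signals that the chosen profile makes visible."""
    return tuple(getattr(record, name) for name in PROFILES[profile])

record = TraceRecord(program_counter=0x100, stack_pointer=0x2000,
                     address=0x40, read_data=0x5A, write_data=0)
coverage_view = select_signals(record, "branch_coverage")
stack_view = select_signals(record, "stack_analysis")
print(coverage_view, stack_view)
```

Each profile outputs two of the five signals, which shows how a narrower physical interface could serve different analyses by reconfiguration alone.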

Alternatively, the principle can also be used with software emulation. A fast buffer captures the synchronization data and a software emulation of the CPU core computes the executed instructions. However, this approach does not work in real time for most applications, and the available trace interval is limited by the size of the buffer.
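The buffer limitation can be made concrete with a rough estimate. It assumes that the synchronization events arrive at a constant average rate and have a fixed size. The function name and all numbers below are hypothetical and are not measurements from the text.

```python
def trace_interval_seconds(buffer_bytes, events_per_second, bytes_per_event):
    """Estimate how long a capture buffer lasts at a constant event rate."""
    return buffer_bytes / (events_per_second * bytes_per_event)

# Hypothetical example: 64 MiB buffer, 2 million peripheral reads or
# interrupt events per second, 4 bytes stored per event.
interval = trace_interval_seconds(64 * 2**20, 2_000_000, 4)
print(f"{interval:.2f} s")
```

Because only peripheral reads and interrupt events are stored, rather than every bus access, the same buffer covers a longer interval than it would with a conventional full trace.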