Embedded Trace – The Hidden Gem Inside Your Processor

You’ve probably heard of CoreSight™ Embedded Trace Macrocell (ETM), Intel® PT, or MIPI® Nexus. These are built-in hardware features that quietly record what your processor is up to. Unlike software logs or added profiling code, embedded trace (sometimes called hardware trace or processor trace) is generated in silicon. You don’t slow down your app, and you don’t need to sprinkle printf statements everywhere.

Still, many teams leave it unused—an unfortunate oversight, considering how much it could simplify debugging and embedded development. Let’s see what embedded trace is, why it’s often ignored, and all the cool stuff you can do with it.


So, What Exactly Is Embedded Trace?

Think of your CPU whispering its every move:

  • Program Counter: the address of each instruction as it runs.
  • Branches Taken: every jump, call, or return your code makes.
  • Memory Reads/Writes: data loads and stores.
  • Cycle Counts: how many clock cycles tick between instructions.

Tiny on-chip circuits capture all this. Gigahertz-speed cores churn out massive amounts of data, so the embedded trace IP compresses and packages it. The resulting stream, at ~500 Mbps or more, is then output over a dedicated trace port (not to be confused with JTAG/DAP) or held in on-chip buffers.
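
To give a feel for why compression matters, here is a toy sketch of one common idea: instead of emitting a full address for every conditional branch, the encoder emits a single taken/not-taken bit, and the decoder reconstructs the path from the program binary. The names (tnt_packer, tnt_push, and so on) are hypothetical and the byte layout is invented for illustration; this is not the actual ETM or Intel PT packet format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy encoder: stores one bit per branch outcome,
 * least significant bit first. Not a real trace packet format. */
struct tnt_packer {
    uint8_t *buf;    /* output buffer supplied by the caller */
    size_t   cap;    /* capacity of buf in bytes */
    size_t   nbits;  /* number of outcome bits written so far */
};

/* Attach a zeroed output buffer to the packer. */
void tnt_init(struct tnt_packer *p, uint8_t *buf, size_t cap)
{
    assert(p != NULL && buf != NULL);
    memset(buf, 0, cap);
    p->buf = buf;
    p->cap = cap;
    p->nbits = 0;
}

/* Append one outcome (nonzero = taken).
 * Returns 0 on success, -1 if the buffer is full. */
int tnt_push(struct tnt_packer *p, int taken)
{
    if (p->nbits >= p->cap * 8)
        return -1;
    if (taken)
        p->buf[p->nbits / 8] |= (uint8_t)(1u << (p->nbits % 8));
    p->nbits++;
    return 0;
}

/* Number of bytes needed to hold the bits written so far. */
size_t tnt_bytes_used(const struct tnt_packer *p)
{
    return (p->nbits + 7) / 8;
}
```

In this sketch nine branch outcomes occupy two bytes, whereas nine raw 64-bit addresses would need 72 bytes.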


Where You’ll Find Trace Support

  • Intel® NUC & Laptops: Intel® PT is baked into modern Core and Atom CPUs.
  • Raspberry Pi 4: BCM2711 includes CoreSight™ ETM/PTM.
  • …and most Cortex®‑M/R/A, TriCore™, PowerPC®-based boards you see in industry.

Want to know the individual trace capabilities of your processor? Our experts at Accemic Technologies are here to help. Feel free to contact us.


Why You’ll Love Using It

  1. Profiling: Find the functions or tasks eating up cycles.
  2. Code Coverage: Show auditors you hit every branch without adding any extra code.
  3. Replay Debugging: Record once, then replay sporadic bugs afterwards.
  4. Real-Time Guarantees: Get cycle-accurate logs to estimate or prove your functions’ worst-case execution times.
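
As a rough illustration of the coverage use case, the sketch below tallies which outcomes of each conditional branch showed up in a decoded trace. It assumes the branches have already been mapped to small integer IDs, which is a hypothetical simplification; real tools derive this mapping from decoded addresses and debug information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEEN_TAKEN     0x1u
#define SEEN_NOT_TAKEN 0x2u

/* seen[id] accumulates which outcomes of branch `id` appeared in the trace.
 * The caller provides a zero-initialised array with one entry per branch. */
void record_branch(uint8_t *seen, size_t n_branches, size_t id, int taken)
{
    assert(id < n_branches);
    seen[id] |= (uint8_t)(taken ? SEEN_TAKEN : SEEN_NOT_TAKEN);
}

/* Percentage of branch outcomes (two per branch) observed so far. */
double branch_coverage_percent(const uint8_t *seen, size_t n_branches)
{
    size_t hit = 0;

    if (n_branches == 0)
        return 0.0;
    for (size_t i = 0; i < n_branches; i++) {
        hit += (seen[i] & SEEN_TAKEN) != 0;
        hit += (seen[i] & SEEN_NOT_TAKEN) != 0;
    }
    return 100.0 * (double)hit / (2.0 * (double)n_branches);
}
```

Because the outcomes come from the trace, the program under test needs no instrumentation; the bookkeeping happens entirely on the analysis side.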

Why Do People Skip It?

  • Mystery Feature: A lot of developers don’t even realize their board has trace built in.
  • Feels Overkill: Setting up trace drivers or fiddling with low-level registers seems daunting. That said, user-friendly tooling exists.
  • Data Overload: The trace churns out tons of information, and on-chip decoders or FIFOs can’t always keep up—so you often only grab a few seconds before buffers fill up.
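
The buffer limit is simple arithmetic. The sketch below estimates how long a buffer lasts at a constant trace rate; the function name is hypothetical, and a constant rate is a simplifying assumption, since real trace bandwidth varies with the workload.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a trace buffer of `buffer_bytes` is full, assuming trace
 * data arrives at a constant `rate_bits_per_s` (a simplification). */
double buffer_fill_seconds(uint64_t buffer_bytes, double rate_bits_per_s)
{
    assert(rate_bits_per_s > 0.0);
    return (double)buffer_bytes * 8.0 / rate_bits_per_s;
}
```

With the ~500 Mbps figure mentioned earlier, a hypothetical 256 MiB buffer would last about 4.3 seconds, while a 64 KiB buffer would fill in roughly one millisecond.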

For short runs, on-chip trace is easily accessible. To get hands-on with trace quickly, you can use the tracing features of your own x86 laptop and the built-in Linux support.
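
Before recording, it can help to confirm that the kernel exposes Intel PT at all. A minimal sketch, assuming sysfs is mounted at /sys and that the perf event source for Intel PT appears under /sys/bus/event_source/devices/intel_pt; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Assumed sysfs location of the Intel PT event source on Linux. */
#define INTEL_PT_SYSFS "/sys/bus/event_source/devices/intel_pt"

/* Returns 1 if `path` exists, 0 otherwise. */
int path_exists(const char *path)
{
    assert(path != NULL);
    return access(path, F_OK) == 0;
}

/* Returns 1 if the kernel appears to expose an Intel PT event source. */
int intel_pt_available(void)
{
    return path_exists(INTEL_PT_SYSFS);
}
```

If intel_pt_available() returns 0, the CPU may lack Intel PT, or the kernel may have been built without support for it.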

1. Create a file called “call_trace.c”

#include <stdio.h>

static void func3(void) {
    // deepest leaf
    for (volatile int i = 0; i < 1000000; i++);
}

static void func2(void) {
    func3();
}

static void func1(void) {
    func2();
    func3();
}

int main(void) {
    // calls func1, then func2, then func1 again
    func1();
    func2();
    func1();
    return 0;
}

2. Compile the example program.

gcc -g -O0 -o call_trace call_trace.c

3. Run & record trace.

sudo perf record -e intel_pt//u ./call_trace

4. View the trace. For illustration, this prints a simple call-return trace.

sudo perf script --call-ret-trace --dsos=call_trace -i perf.data | grep -Ev 'psb offs:|cbr'
call_trace 12776 [005]  5007.781813585:   tr end  async       (/home/albert/perf-intelpt/call_trace)	_start                          
call_trace 12776 [005]  5007.781814835:   call                (/home/albert/perf-intelpt/call_trace)	    __libc_start_main           
call_trace 12776 [005]  5007.781816918:   call                (/home/albert/perf-intelpt/call_trace)	            _init               
call_trace 12776 [005]  5007.781817127:   return              (/home/albert/perf-intelpt/call_trace)	            _init               
call_trace 12776 [005]  5007.781817127:   call                (/home/albert/perf-intelpt/call_trace)	            frame_dummy         
call_trace 12776 [005]  5007.781817127:   return              (/home/albert/perf-intelpt/call_trace)	            register_tm_clones  
call_trace 12776 [005]  5007.781817127:   return              (/home/albert/perf-intelpt/call_trace)	        __libc_csu_init         
call_trace 12776 [005]  5007.781817335:   call                (/home/albert/perf-intelpt/call_trace)	            func1               
call_trace 12776 [005]  5007.781817335:   call                (/home/albert/perf-intelpt/call_trace)	                func2           
call_trace 12776 [005]  5007.781817335:   call                (/home/albert/perf-intelpt/call_trace)	                    func3       
call_trace 12776 [005]  5007.782483168:   return              (/home/albert/perf-intelpt/call_trace)	                    func3       
call_trace 12776 [005]  5007.782483168:   return              (/home/albert/perf-intelpt/call_trace)	                func2           
call_trace 12776 [005]  5007.782483168:   call                (/home/albert/perf-intelpt/call_trace)	                func3           
call_trace 12776 [005]  5007.782989627:   return              (/home/albert/perf-intelpt/call_trace)	                func3           
call_trace 12776 [005]  5007.782989627:   return              (/home/albert/perf-intelpt/call_trace)	            func1               
call_trace 12776 [005]  5007.782989627:   call                (/home/albert/perf-intelpt/call_trace)	            func2               
call_trace 12776 [005]  5007.782989627:   call                (/home/albert/perf-intelpt/call_trace)	                func3           
call_trace 12776 [005]  5007.783481293:   return              (/home/albert/perf-intelpt/call_trace)	                func3           
call_trace 12776 [005]  5007.783481293:   return              (/home/albert/perf-intelpt/call_trace)	            func2               
call_trace 12776 [005]  5007.783481293:   call                (/home/albert/perf-intelpt/call_trace)	            func1               
call_trace 12776 [005]  5007.783481293:   call                (/home/albert/perf-intelpt/call_trace)	                func2           
call_trace 12776 [005]  5007.783481293:   call                (/home/albert/perf-intelpt/call_trace)	                    func3       
call_trace 12776 [005]  5007.783926085:   return              (/home/albert/perf-intelpt/call_trace)	                    func3       
call_trace 12776 [005]  5007.783926085:   return              (/home/albert/perf-intelpt/call_trace)	                func2           
call_trace 12776 [005]  5007.783926085:   call                (/home/albert/perf-intelpt/call_trace)	                func3           
call_trace 12776 [005]  5007.784296085:   tr end  async       (/home/albert/perf-intelpt/call_trace)	                func3           
call_trace 12776 [005]  5007.784381293:   return              (/home/albert/perf-intelpt/call_trace)	                func3           
call_trace 12776 [005]  5007.784381293:   return              (/home/albert/perf-intelpt/call_trace)	            func1               
call_trace 12776 [005]  5007.784381293:   return              (/home/albert/perf-intelpt/call_trace)	        main

That’s it—perf now speaks silicon-native trace.
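
A call-return listing like this is already enough for basic profiling. The sketch below pairs each call with its matching return and sums the elapsed time per function. The struct and function names are hypothetical, the events are transcribed by hand from the first func1 invocation in the listing (timestamps converted to nanoseconds), and returns that do not match the innermost open call are simply ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum ev_kind { EV_CALL, EV_RETURN };

struct trace_event {
    enum ev_kind kind;
    const char  *func;   /* symbol name as printed by perf script */
    uint64_t     ts_ns;  /* timestamp in nanoseconds */
};

#define MAX_DEPTH 64

/* Sum of inclusive time (callees included) spent in `func`.
 * Recursive calls would be counted once per nesting level. */
uint64_t inclusive_ns(const struct trace_event *ev, size_t n, const char *func)
{
    const char *name[MAX_DEPTH];
    uint64_t    start[MAX_DEPTH];
    size_t      depth = 0;
    uint64_t    total = 0;

    for (size_t i = 0; i < n; i++) {
        if (ev[i].kind == EV_CALL) {
            assert(depth < MAX_DEPTH);
            name[depth]  = ev[i].func;
            start[depth] = ev[i].ts_ns;
            depth++;
        } else if (depth > 0 && strcmp(name[depth - 1], ev[i].func) == 0) {
            depth--;
            if (strcmp(ev[i].func, func) == 0)
                total += ev[i].ts_ns - start[depth];
        }
        /* Returns that do not match the innermost open call are skipped. */
    }
    return total;
}

/* Events of the first func1 invocation, copied from the listing. */
const struct trace_event first_func1_call[] = {
    { EV_CALL,   "func1", 5007781817335ULL },
    { EV_CALL,   "func2", 5007781817335ULL },
    { EV_CALL,   "func3", 5007781817335ULL },
    { EV_RETURN, "func3", 5007782483168ULL },
    { EV_RETURN, "func2", 5007782483168ULL },
    { EV_CALL,   "func3", 5007782483168ULL },
    { EV_RETURN, "func3", 5007782989627ULL },
    { EV_RETURN, "func1", 5007782989627ULL },
};
```

For these eight events, inclusive_ns reports 1,172,292 ns for func3 and 665,833 ns for func2. Several events in the listing share a timestamp, so the resolution of such figures is limited by how often the trace carries timing information.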


No-Compromise: High-End External Analysis

For rugged industrial gear and safety-certified systems—think ISO 26262 automotive, aerospace, or critical embedded applications (DO-178C, IEC 61508)—on-chip trace buffers just can’t keep pace. You get limited capture windows, and live tracing can even interfere with CPU timing. The solution? Stream trace off the chip for full-power analysis:

Unlimited Observation: Send raw trace off-board to record minutes (or hours) of execution.

Live Decoding with CEDARtools: CEDARtools grabs that off-chip feed and decodes it on the fly—even with high-end Cortex‑A72 silicon—so you see execution flow and hotspots in real time.

Infinite Coverage & WCET: Point CEDARtools.Coverage at your trace target for instant coverage results. Switch to CEDARtools.SmartTrace to nail down worst-case execution times you can bank on.

Speed Up Testing: Hook coverage results into Teamscale to slash system-test runs from days to minutes. Only tests touching changed code paths run, so you get fast feedback after every commit.


Ready to Unlock Embedded Trace?

If you want to explore how our CEDARtools-powered solutions can supercharge your system, our Accemic Technologies experts are here to help.