Embedded Trace – The Hidden Gem Inside Your Processor
You’ve probably heard of CoreSight™ Embedded Trace Macrocell (ETM), Intel® PT, or MIPI® Nexus. They’re built-in hardware features that quietly record what your processor is up to. Unlike software logs or added profiling code, embedded trace (sometimes called hardware trace or processor trace) is generated in silicon. You don’t slow down your app, and you don’t need to sprinkle printf statements everywhere.
Still, many teams leave it unused—an unfortunate oversight, considering how much it could simplify debugging and embedded development. Let’s see what embedded trace is, why it’s often ignored, and all the cool stuff you can do with it.
So, What Exactly Is Embedded Trace?
Think of your CPU whispering its every move:
- Program Counter: the address of each instruction as it runs.
- Branches Taken: every jump, call, or return your code makes.
- Memory Reads/Writes: data loads and stores.
- Cycle Counts: how many clock cycles tick between instructions.
Tiny on-chip circuits capture all this. Gigahertz-speed cores churn out massive amounts of data, so the embedded trace IP compresses and packages it, then outputs it over a dedicated trace port (not to be confused with JTAG/DAP) or holds it in on-chip buffers, at data rates of ~500 Mbps or more.
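To get a feel for why compression pays off: a decoder that has the program binary can reconstruct most of the instruction flow on its own, so for a conditional branch the hardware only needs to report one bit, taken or not taken. Here is a minimal sketch of that idea in C. It is not the ETM or Intel PT packet format; the function name pack_branch_outcomes and the eight-outcomes-per-byte layout are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only, not a real trace packet format):
 * pack branch outcomes (nonzero = taken, 0 = not taken) into bytes,
 * eight outcomes per byte, oldest outcome in the lowest bit.
 * Returns the number of bytes written to out. */
size_t pack_branch_outcomes(const uint8_t *taken, size_t count,
                            uint8_t *out, size_t out_cap)
{
    size_t needed = (count + 7) / 8;   /* round up to whole bytes */
    assert(out_cap >= needed);

    for (size_t i = 0; i < needed; i++) {
        out[i] = 0;
    }
    for (size_t i = 0; i < count; i++) {
        if (taken[i]) {
            out[i / 8] |= (uint8_t)(1u << (i % 8));
        }
    }
    return needed;
}
```

Ten branch outcomes fit in two bytes this way, where ten full 64-bit addresses would need 80 bytes.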
Where You’ll Find Trace Support
- Intel® NUC & Laptops: Intel® PT is baked into modern Core and Atom CPUs.
- Raspberry Pi 4: BCM2711 includes CoreSight™ ETM/PTM.
- …and most Cortex®‑M/R/A, TriCore™, PowerPC®-based boards you see in industry.
Want to know the individual trace capabilities of your processor? Our experts at Accemic Technologies are here to help. Feel free to contact us.
Why You’ll Love Using It
- Profiling: Find the functions or tasks eating up cycles.
- Code Coverage: Show auditors you hit every branch without adding any extra code.
- Replay Debugging: Record once, then replay sporadic bugs afterwards.
- Real-Time Guarantees: Get cycle-accurate logs to estimate or prove your functions’ worst-case execution times.
Why Do People Skip It?
- Mystery Feature: A lot of developers don’t even realize their board has trace built in.
- Feels Overkill: Setting up trace drivers or fiddling with low-level registers seems daunting, even though user-friendly tooling exists.
- Data Overload: The trace churns out tons of information, and on-chip decoders or FIFOs can’t always keep up—so you often only grab a few seconds before buffers fill up.
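A quick back-of-the-envelope calculation shows how short the capture window can get. The sketch below simply divides a buffer size by the trace data rate. The helper name capture_window_us and the 256 MiB buffer in the usage note are hypothetical; only the ~500 Mbps figure comes from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how long until a trace buffer of buffer_bytes is
 * full at a sustained rate of rate_mbps (megabits per second)?
 * Dividing bits by Mbit/s yields microseconds directly. */
uint64_t capture_window_us(uint64_t buffer_bytes, uint64_t rate_mbps)
{
    assert(rate_mbps > 0);
    return (buffer_bytes * 8u) / rate_mbps;
}
```

With an assumed 256 MiB buffer and 500 Mbps, capture_window_us(256ULL * 1024 * 1024, 500) comes out at roughly 4.3 seconds, while a 64 KiB buffer lasts only about a millisecond.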
Dive In: On-Chip Profiling with perf and intel_pt
For short runs, on-chip processing is easily accessible. To get hands-on with trace quickly, you can use the tracing features of your own x86 laptop and the built-in Linux support.
1. Create a file called “call_trace.c”
#include <stdio.h>

static void func3(void) {
    // deepest leaf
    for (volatile int i = 0; i < 1000000; i++);
}

static void func2(void) {
    func3();
}

static void func1(void) {
    func2();
    func3();
}

int main(void) {
    // calls func1, then func2, then func1 again
    func1();
    func2();
    func1();
    return 0;
}
2. Compile the example program.
gcc -g -O0 -o call_trace call_trace.c
3. Run & record trace.
sudo perf record -e intel_pt//u ./call_trace
4. View the trace. For illustration, this prints a simple call-return trace.
sudo perf script --call-ret-trace --dsos=call_trace -i perf.data | grep -Ev 'psb offs:|cbr'
call_trace 12776 [005] 5007.781813585: tr end async (/home/albert/perf-intelpt/call_trace) _start
call_trace 12776 [005] 5007.781814835: call (/home/albert/perf-intelpt/call_trace) __libc_start_main
call_trace 12776 [005] 5007.781816918: call (/home/albert/perf-intelpt/call_trace) _init
call_trace 12776 [005] 5007.781817127: return (/home/albert/perf-intelpt/call_trace) _init
call_trace 12776 [005] 5007.781817127: call (/home/albert/perf-intelpt/call_trace) frame_dummy
call_trace 12776 [005] 5007.781817127: return (/home/albert/perf-intelpt/call_trace) register_tm_clones
call_trace 12776 [005] 5007.781817127: return (/home/albert/perf-intelpt/call_trace) __libc_csu_init
call_trace 12776 [005] 5007.781817335: call (/home/albert/perf-intelpt/call_trace) func1
call_trace 12776 [005] 5007.781817335: call (/home/albert/perf-intelpt/call_trace) func2
call_trace 12776 [005] 5007.781817335: call (/home/albert/perf-intelpt/call_trace) func3
call_trace 12776 [005] 5007.782483168: return (/home/albert/perf-intelpt/call_trace) func3
call_trace 12776 [005] 5007.782483168: return (/home/albert/perf-intelpt/call_trace) func2
call_trace 12776 [005] 5007.782483168: call (/home/albert/perf-intelpt/call_trace) func3
call_trace 12776 [005] 5007.782989627: return (/home/albert/perf-intelpt/call_trace) func3
call_trace 12776 [005] 5007.782989627: return (/home/albert/perf-intelpt/call_trace) func1
call_trace 12776 [005] 5007.782989627: call (/home/albert/perf-intelpt/call_trace) func2
call_trace 12776 [005] 5007.782989627: call (/home/albert/perf-intelpt/call_trace) func3
call_trace 12776 [005] 5007.783481293: return (/home/albert/perf-intelpt/call_trace) func3
call_trace 12776 [005] 5007.783481293: return (/home/albert/perf-intelpt/call_trace) func2
call_trace 12776 [005] 5007.783481293: call (/home/albert/perf-intelpt/call_trace) func1
call_trace 12776 [005] 5007.783481293: call (/home/albert/perf-intelpt/call_trace) func2
call_trace 12776 [005] 5007.783481293: call (/home/albert/perf-intelpt/call_trace) func3
call_trace 12776 [005] 5007.783926085: return (/home/albert/perf-intelpt/call_trace) func3
call_trace 12776 [005] 5007.783926085: return (/home/albert/perf-intelpt/call_trace) func2
call_trace 12776 [005] 5007.783926085: call (/home/albert/perf-intelpt/call_trace) func3
call_trace 12776 [005] 5007.784296085: tr end async (/home/albert/perf-intelpt/call_trace) func3
call_trace 12776 [005] 5007.784381293: return (/home/albert/perf-intelpt/call_trace) func3
call_trace 12776 [005] 5007.784381293: return (/home/albert/perf-intelpt/call_trace) func1
call_trace 12776 [005] 5007.784381293: return (/home/albert/perf-intelpt/call_trace) main
That’s it—perf now speaks silicon-native trace.
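Once you have call and return events with timestamps, profiling is mostly bookkeeping. As a rough sketch (not how perf does it internally), the C code below pairs each return with the most recent open call and sums the time spent in one function. The struct and function names are made up for this example, the events are copied by hand from the first func1 invocation in the output above, and the code assumes that calls and returns are properly nested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum trace_kind { TRACE_CALL, TRACE_RETURN };

/* Hypothetical event record: timestamp in nanoseconds, kind, symbol name. */
struct trace_event {
    uint64_t ts_ns;
    enum trace_kind kind;
    const char *sym;
};

#define MAX_DEPTH 64

/* Events of the first func1() invocation, taken from the perf output. */
const struct trace_event sample_events[] = {
    {5007781817335ULL, TRACE_CALL,   "func1"},
    {5007781817335ULL, TRACE_CALL,   "func2"},
    {5007781817335ULL, TRACE_CALL,   "func3"},
    {5007782483168ULL, TRACE_RETURN, "func3"},
    {5007782483168ULL, TRACE_RETURN, "func2"},
    {5007782483168ULL, TRACE_CALL,   "func3"},
    {5007782989627ULL, TRACE_RETURN, "func3"},
    {5007782989627ULL, TRACE_RETURN, "func1"},
};

/* Sum the inclusive time of all returns from `sym`, pairing each return
 * with the most recent unmatched call (assumes well-nested events).
 * Returns with no open call are ignored. */
uint64_t inclusive_time_ns(const struct trace_event *ev, size_t n,
                           const char *sym)
{
    uint64_t call_ts[MAX_DEPTH];   /* timestamps of currently open calls */
    size_t depth = 0;
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (ev[i].kind == TRACE_CALL) {
            assert(depth < MAX_DEPTH);
            call_ts[depth++] = ev[i].ts_ns;
        } else if (depth > 0) {
            uint64_t start = call_ts[--depth];
            if (strcmp(ev[i].sym, sym) == 0) {
                total += ev[i].ts_ns - start;
            }
        }
    }
    return total;
}
```

For the sample events, inclusive_time_ns(sample_events, 8, "func1") yields 1172292 ns, so the first run of func1 took about 1.17 ms.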
No-Compromise: High-End External Analysis
For rugged industrial gear and safety-certified systems, such as automotive (ISO 26262), aerospace (DO-178C), or other critical embedded applications (IEC 61508), on-chip trace buffers just can’t keep pace. You get limited capture windows, and live tracing can even interfere with CPU timing. The solution? Stream trace off the chip for full-power analysis:
- Unlimited Observation: Send raw trace off-board to record minutes (or hours) of execution.
- Live Decoding with CEDARtools: CEDARtools grabs that off-chip feed and decodes it on the fly, even with high-end Cortex‑A72 silicon, so you see execution flow and hotspots in real time.
- Infinite Coverage & WCET: Point CEDARtools.Coverage at your trace target for instant coverage results. Switch to CEDARtools.SmartTrace to nail down worst-case execution times you can bank on.
- Speed Up Testing: Hook coverage results into Teamscale to slash system-test runs from days to minutes. Only tests touching changed code paths run, so you get fast feedback after every commit.
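The idea of running only the tests that touch changed code can be sketched in a few lines. This is a generic illustration, not the actual Teamscale or CEDARtools implementation: assume each test's coverage is stored as a bitmap with one bit per code block, and a second bitmap marks the blocks changed by a commit. A test needs to rerun only if the two bitmaps overlap. The function name and the bitmap layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the test covered at least one changed
 * code block, 0 otherwise. Both bitmaps hold `words` 64-bit words, with
 * one bit per code block. */
int test_is_affected(const uint64_t *covered, const uint64_t *changed,
                     size_t words)
{
    assert(covered != NULL && changed != NULL);
    for (size_t i = 0; i < words; i++) {
        if (covered[i] & changed[i]) {
            return 1;   /* overlap found: this test must rerun */
        }
    }
    return 0;
}
```

A test runner would call this once per test after each commit and skip every test for which it returns 0.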
Ready to Unlock Embedded Trace?
If you want to explore how our CEDARtools-powered solutions can supercharge your system, our Accemic Technologies experts are here to help.
Intel and Intel® PT are trademarks of Intel Corporation in the U.S. and/or other countries. CoreSight™ and Cortex®-M/R/A are trademarks of Arm Limited. MIPI® and MIPI Alliance® are registered trademarks of the MIPI Alliance, Inc. TriCore™ is a trademark of Infineon Technologies AG. PowerPC® is a registered trademark of International Business Machines Corporation. CEDARtools® is a trademark of Accemic Technologies.

© 2025 Accemic GmbH. All rights reserved.