Together with our engineering colleagues from Virginia Commonwealth University and the National Institute of Standards and Technology (NIST), we have expanded our white paper into a full article and are excited to have it published in the January 2021 issue of IEEE Computer magazine.

Understanding fault types can lead to novel approaches to debugging and runtime verification. Dealing with complex faults, particularly in the challenging area of embedded systems, calls for more powerful tools, which are now becoming available to engineers.

Embedded trace is an integral part of nearly all modern processors. This white paper summarizes the essential facts about this powerful but still far too rarely used functional unit that application engineers, test engineers, and project managers should know in order to test, optimize, and debug a system efficiently.

This paper briefly discusses the problems associated with software-based instrumentation and how non-intrusive electronic probing was defeated by advancing computer architecture and system integration. We then identify embedded trace as the solution to this observability conundrum, laying out the techniques that enable efficient and economically reasonable implementations of this technology. We describe the bandwidth and volume challenges faced by would-be observers and backend applications, highlighting the benefits of modern online analysis capabilities. Finally, we provide a short overview of common physical trace interfaces and simple guidelines that ensure that your next system design is capable of leveraging this technology.

This whitepaper briefly examines common types and the nature of software anomalies. It explains how mistakes lead to observable anomalies and how these are differentiated into Bohrbugs and Mandelbugs according to their reproducibility. The principle of “scientific debugging” is explained. It is shown that the comprehensive observability of a system is a key capability for efficient debugging. Subsequently, the advantages and limitations of various existing and novel monitoring solutions such as printf()-debugging, start/stop-debugging, omniscient debugging, runtime verification and the novel CEDARtools® approach are presented and discussed.

While multi-core processors offer more processing power than single-core architectures, they are more likely to produce hard-to-detect, concurrency-related errors. This article presents a new technology that enables the measurement of the timing behavior of programs on multi-core architectures, the measurement of the coverage achieved by tests running on the target system, and the analysis of complex errors.

Original German version of this article @ Hanser Automotive 03/2020

Structural testing is an important acceptance criterion for safety-critical embedded and cyber-physical systems (as in aerospace, transport, and critical infrastructure). Covering both all specified requirements and, conversely, all code to be deployed makes testing very difficult and costly. We elaborate on a solution that validates the conclusiveness of high-level functional tests on fully integrated safety-critical applications in a non-intrusive fashion. Our approach performs an online analysis of hardware processor trace data in real time to establish coverage proofs on the fly during test runs. It offers deep insights into the completeness of both the tests and their underlying requirements. Establishing the validity of tests on a high functional level greatly reduces the effort required on lower, less integrated levels to achieve and justify conclusive coverage statements. The savings achieved by our approach in the development process will be demonstrated and quantified.
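To illustrate the idea of establishing coverage on the fly, the following is a minimal sketch in Python and not the implementation described in the article. It assumes a hypothetical, already decoded trace in which each event is a pair of a conditional branch address and its outcome, and it assumes the set of branch addresses in the binary is known in advance. The names `CoverageMonitor`, `observe`, and `uncovered` are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CoverageMonitor:
    """Accumulates branch coverage from a stream of decoded trace events.

    Hypothetical model: every event is (branch_address, taken).
    """
    branches: frozenset                       # all conditional branch addresses (assumed known)
    seen: dict = field(default_factory=dict)  # address -> set of observed outcomes

    def observe(self, address: int, taken: bool) -> None:
        """Record one trace event; events for unknown addresses are ignored."""
        if address in self.branches:
            self.seen.setdefault(address, set()).add(taken)

    def branch_coverage(self) -> float:
        """Fraction of (branch, outcome) pairs observed so far."""
        total = 2 * len(self.branches)
        covered = sum(len(outcomes) for outcomes in self.seen.values())
        return covered / total if total else 1.0

    def uncovered(self) -> list:
        """List the (address, outcome) pairs that no test has exercised yet."""
        missing = []
        for address in sorted(self.branches):
            for outcome in (True, False):
                if outcome not in self.seen.get(address, set()):
                    missing.append((address, outcome))
        return missing


monitor = CoverageMonitor(frozenset({0x100, 0x200}))
for event in [(0x100, True), (0x100, False), (0x200, True)]:
    monitor.observe(*event)
```

Because the monitor keeps only a small set per branch, its memory use does not grow with the length of the test run, which is the property that makes an online analysis attractive compared with storing the trace first.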

This paper briefly examines common types and the nature of failures in complex, embedded multi-core environments. A novel tool, CEDARtools®, is presented that overcomes these issues by processing hardware-generated trace data in real time, providing complex triggers and variable monitoring scopes, and facilitating post-mortem analysis. The approach permits preventive monitoring even before the program fails and helps engineers master the growing complexity of embedded development.

Modern multi-core Systems-on-Chip (SoCs) provide very high computational power. On the downside, they are hard to debug, and it is often very difficult to understand what is going on in these chips because of the limited observability inside the SoC. Chip manufacturers try to compensate for this difficulty by providing highly compressed trace data from the individual cores. In the past, the common way to deal with this data was to store it for later offline analysis, which severely limits the time span that can be observed. In this contribution, we present an FPGA-based solution that is able to process the trace data in real time, enabling continuous observation of the state of a core. Moreover, we discuss applications enabled by this technology.

Proof of functional safety requires the collection of structural coverage information to confirm that the structural coverage is appropriate for the required safety level.
A new observation methodology based on processor traces provides the key means to gain non-intrusive insight into the execution of production code for multi-core SoCs. Trace data analysis must be automated to cope with the enormous bandwidth of trace data streams.
This new technique supports certification of test coverage and enables automated detection of emerging timing constraints. Most importantly, it allows structural tests to be executed on the basis of the production code without the need for software instrumentation.
The paper provides an overview of the applicable functional safety standards, explains the advantages of executing structural tests even at higher test levels, and gives practical hints for hardware and software architecture considerations for providing best observability for executing structural tests.

In this tutorial, we present a comprehensive approach to non-intrusive monitoring of multi-core processors. Modern multi-core processors come with trace ports that provide a highly compressed trace of the instructions executed by the processor. We describe how these compressed traces can be used to reconstruct the actual control flow trace executed by the program running on the processor and to carry out analyses on the control flow trace in real time using FPGAs. We further give an introduction to the temporal stream-based specification language TeSSLa and show how it can be used to specify typical constraints of a cyber-physical system from the railway domain. Finally, we describe how lightweight, hardware-supported instrumentation can be used to enrich the control flow trace with data values from the application.
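The reconstruction step can be pictured with a toy example in Python. This is a simplified sketch, not the FPGA implementation from the tutorial: it assumes the compressed trace has been reduced to one taken/not-taken bit per conditional branch and that a static control flow graph of the program is available. The graph layout, block names, and the function `reconstruct` are all hypothetical.

```python
def reconstruct(cfg, entry, branch_bits, max_blocks=10_000):
    """Replay branch outcome bits over a static CFG to recover the executed path.

    cfg maps a block name to a tuple of successors:
      ()                -> exit block
      (next,)           -> unconditional successor, consumes no trace bit
      (taken, fallthru) -> conditional branch, consumes one trace bit
    """
    bits = iter(branch_bits)
    path = [entry]
    block = entry
    while len(path) < max_blocks:      # guard against loops without branches
        successors = cfg[block]
        if len(successors) == 0:       # program reached an exit block
            break
        if len(successors) == 1:       # direction is known statically
            block = successors[0]
        else:
            try:
                taken = next(bits)     # direction comes from the trace
            except StopIteration:
                break                  # trace ended before the program did
            block = successors[0] if taken else successors[1]
        path.append(block)
    return path


# Toy program: a loop whose body runs while the branch in "loop" is taken.
cfg = {
    "entry": ("loop",),
    "loop": ("body", "exit"),
    "body": ("loop",),
    "exit": (),
}
path = reconstruct(cfg, "entry", [True, True, False])
```

Three trace bits are enough here to recover a path of seven blocks, which hints at why recording only the information that cannot be derived from the program itself compresses so well.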

This paper presents an approach for rapidly adjustable embedded trace online monitoring of multi-core systems, called RETOM. Today, most commercial multi-core SoCs provide accurate runtime information through an embedded trace unit without affecting program execution. Available debugging solutions can use it to reconstruct the run offline, but usually for up to a few seconds only. RETOM employs a novel online reconstruction technique that makes the program run available outside the SoC and allows for evaluating a specification formulated in the stream-based specification language TeSSLa in real time. The necessary computing performance is provided by an FPGA-based event processing system. In contrast to other hardware-based runtime verification techniques, changing the specification requires no circuit synthesis and thus takes seconds rather than minutes or hours. Therefore, iterated testing and property adjustment during development and debugging become feasible while preserving the option of arbitrarily extending observation time, which may be necessary to detect rarely occurring errors. Experiments show the feasibility of the approach.
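As a rough illustration of evaluating a property over an event stream while the program runs, here is a small Python sketch. It is not TeSSLa syntax and not the RETOM implementation. It assumes a hypothetical stream of `(timestamp_us, name)` events and checks one invented example property: every "request" is answered by a "response" within a deadline.

```python
def monitor_response_time(events, deadline_us):
    """Yield (request_time, response_time) for every response that is too late.

    events is any iterable of (timestamp_us, name) pairs in time order.
    The monitor keeps only one timestamp of state, so it can run for an
    arbitrarily long observation time. Limitation of this sketch: a request
    that is never answered is not reported.
    """
    pending = None                          # timestamp of the open request, if any
    for timestamp, name in events:
        if name == "request":
            pending = timestamp
        elif name == "response" and pending is not None:
            if timestamp - pending > deadline_us:
                yield (pending, timestamp)  # property violated
            pending = None


stream = [(0, "request"), (40, "response"), (100, "request"), (260, "response")]
violations = list(monitor_response_time(stream, deadline_us=100))
```

Changing the property here means changing a parameter or a few lines of the checker rather than rebuilding the whole pipeline, which loosely mirrors the quick specification turnaround the abstract describes.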

Further Reading

Ein Praxishandbuch für Entwickler, Tester und technische Projektleiter

A Practical Guide for Aviation Software and DO-178C Compliance

The Economics of Software Quality

by Capers Jones and Olivier Bonsignour

This book will help you

  • Move beyond functional quality to quantify non-functional and structural quality
  • Prove that improved software quality translates into strongly positive ROI and greatly reduced TCO
  • Drive better results from current investments in Quality Assurance and Testing
  • Use quality improvement techniques to stay on schedule and on budget


All topics about the efficient Software Development Process
