Publications

Rapidly Adjustable Non-intrusive Online Monitoring for Multi-core Systems

This paper presents an approach for rapidly adjustable embedded trace online monitoring of multi-core systems, called RETOM. Today, most commercial multi-core SoCs provide accurate runtime information through an embedded trace unit without affecting program execution. Available debugging solutions can use it to reconstruct the run offline, but usually only for up to a few seconds. RETOM employs a novel online reconstruction technique that makes the program run available outside the SoC and allows a specification formulated in the stream-based specification language TeSSLa to be evaluated in real time. The necessary computing performance is provided by an FPGA-based event processing system. In contrast to other hardware-based runtime verification techniques, changing the specification requires no circuit synthesis and thus takes seconds rather than minutes or hours. Therefore, iterated testing and property adjustment during development and debugging become feasible while preserving the option of arbitrarily extending observation time, which may be necessary to detect rarely occurring errors. Experiments show the feasibility of the approach.
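
To give a rough idea of what evaluating a property over an event stream means, here is a minimal Python sketch. It is not TeSSLa and not the RETOM implementation; the event format, the event names and the function `check_latency` are assumptions made purely for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical event format: (timestamp in microseconds, event name).
Event = Tuple[int, str]


def check_latency(events: Iterable[Event], start: str, end: str,
                  limit: int) -> Iterator[Tuple[int, int]]:
    """Yield (start_ts, end_ts) for each start/end pair that takes longer than limit."""
    pending: Optional[int] = None  # timestamp of the most recent unmatched start event
    for ts, name in events:
        if name == start:
            pending = ts
        elif name == end and pending is not None:
            if ts - pending > limit:
                yield (pending, ts)
            pending = None


# Example stream: the second interrupt handler run takes 160 us.
events = [(0, "irq_enter"), (40, "irq_exit"), (100, "irq_enter"), (260, "irq_exit")]
violations = list(check_latency(events, "irq_enter", "irq_exit", limit=100))
```

The checker handles one event at a time and keeps only a single timestamp as state, so the length of the observed run does not affect its memory use.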

Hardware-Based Runtime Verification with Embedded Tracing Units and Stream Processing

In this tutorial, we present a comprehensive approach to non-intrusive monitoring of multi-core processors. Modern multi-core processors come with trace ports that provide a highly compressed trace of the instructions executed by the processor. We describe how these compressed traces can be used to reconstruct the actual control flow of the program running on the processor and to analyze that control flow trace in real time using FPGAs. We further give an introduction to the temporal stream-based specification language TeSSLa and show how it can be used to specify typical constraints of a cyber-physical system from the railway domain. Finally, we describe how lightweight, hardware-supported instrumentation can be used to enrich the control flow trace with data values from the application.

Online analysis of debug trace data for embedded systems

Modern multi-core Systems-on-Chip (SoCs) provide very high computational power. On the downside, they are hard to debug, and the limited observability inside the SoC often makes it very difficult to understand what is going on in these chips. Chip manufacturers try to compensate for this difficulty by providing highly compressed trace data from the individual cores. In the past, the common way to deal with this data was to store it for later offline analysis, which severely limits the time span that can be observed. In this contribution, we present an FPGA-based solution that is able to process the trace data in real time, enabling continuous observation of the state of a core. Moreover, we discuss applications enabled by this technology.

The CEDARtools Platform – Massive External Memory with High Bandwidth and Low Latency Under Fine-Granular Random Access Patterns

This demo showcases the ZUSPRL302 platform, which was developed as the hardware vehicle enabling the work of the EU-funded H2020 project COEMS (https://www.coems.eu/). The platform features a large amount of fast external reduced-latency RLDRAM, which mitigates the critical memory access times of pointer-chasing algorithms. We demonstrate that this platform enables the online reconstruction of the control flow of an application running on a standard off-the-shelf processor monitored via its execution trace interface. This reconstruction has to traverse the preprocessed control flow graph of the application on the basis of minimal, highly compressed execution trace data, and it does so at the pace of the processor for the purpose of coverage monitoring and integrity checking. The ZUSPRL302 platform is a valuable contribution to the FPGA community, as it enables the FPGA acceleration of fine-grained pointer-chasing algorithms, which have traditionally been considered a misfit for this domain.
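
The reconstruction step can be pictured with a small Python sketch. It assumes a simplified trace that contains only one taken/not-taken bit per conditional branch, and a control flow graph stored as a dictionary; the `Block` type, the function `reconstruct` and the example graph are hypothetical and do not reflect the platform's actual data structures or trace format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Block:
    """One basic block of a hypothetical, preprocessed control flow graph."""
    taken: Optional[str] = None      # successor when the closing branch is taken
    not_taken: Optional[str] = None  # fall-through successor; None if unconditional


def reconstruct(cfg: Dict[str, Block], entry: str,
                branch_bits: Iterable[bool]) -> List[str]:
    """Follow the graph from entry, consuming one trace bit per conditional branch."""
    bits = iter(branch_bits)
    path = [entry]
    current = entry
    while True:
        block = cfg[current]
        if block.taken is None:          # exit block: no successor
            break
        if block.not_taken is None:      # unconditional jump: no trace bit needed
            current = block.taken
        else:
            bit = next(bits, None)
            if bit is None:              # trace exhausted
                break
            current = block.taken if bit else block.not_taken
        path.append(current)
    # Note: a cycle made only of unconditional blocks would never terminate here.
    return path


# Example graph: a loop whose body runs while the branch in "check" is taken.
cfg = {
    "entry": Block(taken="check"),
    "check": Block(taken="body", not_taken="exit"),
    "body": Block(taken="check"),
    "exit": Block(),
    "unused": Block(),
}
path = reconstruct(cfg, "entry", [True, True, False])
uncovered = set(cfg) - set(path)  # blocks never reached in this run
```

Each step is a dependent lookup (the next block is known only after the current one has been read), which is the pointer-chasing access pattern the abstract refers to. Coverage falls out as a by-product: any block missing from the reconstructed path was not executed.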

Conclusive On-the-fly Validation of High-Level Functional Tests

Structural testing is an important acceptance criterion for safety-critical embedded and cyber-physical systems (as in aerospace, transport, and critical infrastructure). Covering all specified requirements and, conversely, all code to be deployed makes testing very difficult and costly. We elaborate on a solution that validates the conclusiveness of high-level functional tests on fully integrated safety-critical applications in a non-intrusive fashion. Our approach performs an online analysis of hardware processor trace data in real time to establish coverage proofs on the fly during test runs. It offers deep insights into the completeness of both the tests and their underlying requirements. Establishing the validity of tests on a high functional level greatly reduces the effort required on lower, less integrated levels to achieve and justify conclusive coverage statements. The savings achieved by our approach in the development process will be demonstrated and quantified.

Debugging Complex Failures of Real-Time Multi-Core Systems

This paper briefly examines the common types and the nature of failures in complex, embedded multi-core environments. It presents CEDARtools®, a novel tool that addresses these failures by processing hardware-generated trace data in real time, providing complex triggers and variable monitoring scopes, and facilitating post-mortem analysis. The approach permits preventive monitoring even before the program fails and helps to master the growing complexity of embedded development.

Publications in German

Test und Fehlersuche in komplexen Autonomen Systemen (Testing and Debugging in Complex Autonomous Systems)

This paper presents a novel approach to testing and debugging in complex autonomous systems. The solution is based on the analysis of trace data that the target system provides through a dedicated processor unit, which is often already present. The approach is positioned against the current state of the art, and its particular advantages in terms of non-intrusiveness and unlimited monitoring time are highlighted.

After a brief look at how the system works, two application examples follow, code coverage measurement and dynamic analysis, which illustrate the practical benefit of the system and give an impression of how the new method is used. The capabilities of the system range from simple timing measurements, through measurements of cause-effect chains (WCRT estimation), to complex functional tests. An essential component of the dynamic analysis is the verification language TeSSLa, which is introduced to the reader by means of short, concise examples. Finally, the requirements that the observed system must meet in order to use the presented tool are given.

Was nach den Modul-Tests kommt - Dynamische und strukturelle Tests auf höheren Ebenen (What Comes After Module Tests: Dynamic and Structural Tests at Higher Levels)

The number of post-release defects, which rises continuously with increasing complexity, calls for new test methods at higher test levels. These include measuring structural test coverage and automatically running runtime analyses during integration and system tests. In addition, it must be possible to analyze the cause of complex failure patterns efficiently in the field as well. The resulting requirements for comprehensive observability can be met by live analysis of trace data. To make use of this technical capability, high-bandwidth access to the trace data emitted by the processor must be available, and the software architecture must be adapted.