Comment on Observability Trends 2022: The Linux kernel feature eBPF as a telemetry driver
11 January 2022 | A guest comment by Michele Mancioppi
Two factors are driving innovation in the open source observability landscape: distributed tracing and eBPF, the “extended Berkeley Packet Filter” that is part of the Linux kernel. What else can we expect from them?
An overview of the possibilities of the extended Berkeley Packet Filter.
While distributed tracing is “growing up” with OpenTelemetry, eBPF is at the center of innovation as the next frontier of observability and monitoring on Linux. The “extended Berkeley Packet Filter” is a fascinating technology that is integrated directly into the Linux kernel.
Created as a programmable way of handling network traffic, eBPF has become a fairly generic and powerful mechanism for secure, dynamic access to the Linux kernel. Among other things, it makes it possible to extract telemetry for observability and security purposes. The feature is so powerful and so popular that Microsoft is working on an eBPF port to Windows.
In terms of observability, eBPF has proven to be remarkably versatile in 2021. For example, there are tools that use eBPF to perform distributed tracing, such as the Pixie project, which was released as an open source project by New Relic in mid-2021. But the most promising aspect of eBPF for 2022, in my opinion, is the fact that with eBPF it is possible to implement continuous profiling of applications in production.
Examples of this are the already mentioned Pixie project and Parca. As Google’s Cloud Profiler and various commercial observability tools prove, an always-on, low-overhead profiler in production is incredibly powerful for fixing latency and memory problems that are notoriously difficult and time-consuming to reproduce in test labs.
Although profiling with eBPF is already used in production environments, it is not yet established enough to work with most applications. How profiling works differs surprisingly between runtime environments and programming languages.
While almost all mainstream languages behave similarly in terms of CPU usage and memory allocation (although there are differences even in such fundamentals of data processing), the way they handle concurrency differs widely across the board.
An example: Java uses threads within the Java Virtual Machine, which are mapped to operating system threads. (In fairness, there are other options, such as NIO and libraries like RxJava, Reactor and Vert.x; in addition, Project Loom will ship at some point.) In Node.js, on the other hand, concurrency is mainly handled by the event loop, and the “real” OS threads are hidden from the developer.
Differences such as these are of great importance for a profiler, since the data that is displayed to the person fixing a problem must be “translated” into the concurrency model used by the programming language. This requires explicit support in the profiler for certain runtimes.
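The difference between the two concurrency models can be made visible with a small sketch. It is written in Python rather than Java or Node.js, purely as an illustration: Python's `threading` module stands in for the thread-per-task model and `asyncio` stands in for an event loop. The function names are hypothetical and not taken from any profiler. The sketch only counts how many distinct threads the workers run on, which is roughly the view a profiler gets from the operating system.

```python
import asyncio
import threading


def threads_seen_with_threading(n: int) -> set[int]:
    """Run n workers as threads; return the distinct thread identifiers observed."""
    seen: set[int] = set()
    lock = threading.Lock()
    # The barrier keeps all workers alive at the same time,
    # so a thread identifier cannot be reused by a later worker.
    barrier = threading.Barrier(n)

    def worker() -> None:
        with lock:
            seen.add(threading.get_ident())
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


def threads_seen_with_event_loop(n: int) -> set[int]:
    """Run n workers as tasks on one event loop; return the thread identifiers observed."""
    seen: set[int] = set()

    async def worker() -> None:
        await asyncio.sleep(0)  # hand control back to the loop, as an I/O wait would
        seen.add(threading.get_ident())

    async def main() -> None:
        await asyncio.gather(*(worker() for _ in range(n)))

    asyncio.run(main())
    return seen


if __name__ == "__main__":
    print(len(threads_seen_with_threading(3)))   # 3: one thread per worker
    print(len(threads_seen_with_event_loop(3)))  # 1: all tasks share a single thread
```

In the threaded case, a per-thread stack sample maps directly to one logical unit of work. In the event loop case, every sample lands on the same thread, so a profiler needs runtime-specific knowledge to tell the person debugging which task was actually running.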
In my opinion, 2022 will be the year in which we get fully functional, open source, eBPF-based production profiling tools that understand the specifics of the many runtime environments prevalent in today’s cloud-native applications, such as Java, Node.js, Python and .NET, as well as compiled languages such as Go and Rust. And this will be an absolute game-changer for DevOps engineers, SREs and operators, helping them make software better and solve problems faster!
* Michele Mancioppi, PhD in Computer Science, is Canonical’s Product Manager for Observability.