Or: How I use GitHub Copilot to go from idea to feature
A few weeks back, I introduced you to jstall (Quickly Inspect your Java Application with JStall), a tool that analyses what your JVM is currently doing. This week is the first time I’m bringing you in on the development and letting you peek behind the curtains to see how I go from idea to implemented feature. The feature I’ll implement is the jvm-support analysis, which checks that the JVM running your application is not outdated.
This is the first time I recorded my development process, so I hope you still liked it.
See you on another day for something command-line parser- or redaction-related.
This blog post is part of my work in the SapMachine team at SAP, making profiling easier for everyone.
Over the last 30 years, Java has added many features, including generics, lambdas, pattern matching, and records. You surely know that lambdas were introduced in Java 8 and records in Java 16, but can you tell the other Java versions apart?
I felt I couldn’t, so I created a tiny little Java game: The Java Version Quiz. In this quiz, you get a Java snippet and have to decide between five different Java versions. Pick the smallest Java version where the snippet is valid code (without using preview features).
The screenshot shows the alpha version of the game, which includes features introduced in Java 1.0alpha2 and 1.0alpha3, including bug fixes. A game version for only the nerdiest of Java connoisseurs. Source is a dump of the alpha2 and alpha3 packages on GitHub.
The game focuses on Java language differences and major runtime differences, which are easy to check without semantic analysis. It’s a by-product of another fun little project.
I hope you learn some new features of Java and discover that it is evolving over the years, while still keeping the syntax similar enough that it’s hard to spot the differences between versions. And if you’re unsure what a specific feature in the shown code snippet is, the quiz gives you a handy description.
If you find any issues or have new code examples, feel free to contribute to the quiz on GitHub.
See you in another week for another blog post on something JFR-related.
Ever had a tricky bug caused by a race condition or rare concurrency condition that was really hard to reproduce? It’s great when you have a fix that should work in theory, but without a reproducer, only time will tell whether your fix really worked. In this blog post, we’ll revisit my old blog post Hello eBPF: Concurrency Testing using Custom Linux Schedulers (19), and try to use the concurrency-fuzz-scheduler to reproduce a bug I fixed a while ago in the OpenJDK.
The scheduler aims to be as chaotic as possible; hence, Jake Hillion’s Rust version is called scx_chaos. But we’ll focus on the Java version, the concurrency-fuzz-scheduler, because it’s not only implemented in Java on top of my hello-ebpf library, but it’s also optimized for fuzzing Java applications, inserting random sleeps at the scheduler level with a focus on non-VM threads.
TL;DR: The concurrency scheduler is a nice tool to provoke rare parallelism conditions and create reproducers.
The bug in question is JDK-8366486, reported by David Holmes in August 2025: A test case that checks that we can run multiple recordings with the CPU-time sampler in direct succession does work. The only problem: The test should not work, but it still worked most of the time. If you’re only interested in the actual bug, skip ahead to the end of the blog post for an explanation.
You’ll find the fixed version here and the broken version here (because the old JDK with the actual bug had compilation issues on my current system, I had to reintroduce the bug in a separate branch).
Let’s start with running the test case with the standard Linux scheduler on a large machine, so that everything can run nicely in parallel:
While JFR is a great tool for profiling your application and gaining insights, the file format is, on purpose, not well documented or specified. One of the best sources of information is Gunnar Morling’s blog post on the topic, and of course, the OpenJDK source code.
But of course, there are ready-made APIs for reading JFR files and OpenJDK-adjacent libraries to write them. In this overview blog post, I’ll showcase the built-in Java JFR API, Jaroslav Bachorik’s jafar API, and the JMC JFR writer API, as well as my own basic-jfr-processor library based on the latter.
Last week, I demonstrated that OpenJDK is faster than GraalVM Java, at least for obtaining the Java version. This even prompted the mighty Thomas Wuerthinger (creator of GraalVM) to react. But the measured ~20ms for the OpenJDK is still too slow for applications like execjar, where it could significantly increase the runtime of short-running CLI tools. In this week’s brief blog post, I’ll show you the fastest way to access the Java version.
The main performance issue is that calling java -version creates a process with a fairly large (around 38MB) maximum resident set size and uses a proper command-line parser. But do we actually need to call the java binary to get the version?
TL;DR: I created the java-version tool, which can obtain the Java version in under a millisecond.
Basic Idea
No, we can just realize that most Java installations have a release file that contains the relevant information in a machine-readable format. You can find this file in the main folder of the installation (./release when java is in ./bin).
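To make the idea concrete, here is a minimal Java sketch of reading the version from the release file. It is not the implementation of the java-version tool; the class and method names are made up for illustration, and it assumes the file contains a line of the form JAVA_VERSION="..." as typical OpenJDK builds write it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of the java-version tool
public class ReleaseFileVersion {

    /** Extracts the JAVA_VERSION value from the lines of a release file. */
    static Optional<String> parseJavaVersion(List<String> lines) {
        String key = "JAVA_VERSION=";
        for (String line : lines) {
            String trimmed = line.trim();
            // The trailing '=' keeps keys like JAVA_VERSION_DATE from matching
            if (trimmed.startsWith(key)) {
                String value = trimmed.substring(key.length()).trim();
                // Values are usually quoted, e.g. JAVA_VERSION="21.0.2"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    /** Reads <javaHome>/release; returns empty if the file does not exist. */
    static Optional<String> readJavaVersion(Path javaHome) throws IOException {
        Path release = javaHome.resolve("release");
        if (!Files.isRegularFile(release)) {
            return Optional.empty();
        }
        return parseJavaVersion(Files.readAllLines(release));
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the installation that runs this program
        Path javaHome = Path.of(args.length > 0 ? args[0] : System.getProperty("java.home"));
        System.out.println(readJavaVersion(javaHome).orElse("unknown (no release file)"));
    }
}
```

Of course, running this sketch on the JVM defeats the purpose performance-wise; it only illustrates that the lookup is a plain file read with no java process involved.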
Well, we all know that the most crucial feature of the JVM runtime is the -version output. So how does the OpenJDK (in the form of SapMachine) compare with GraalVM? It’s significantly faster. Using hyperfine, we can see that GraalVM 25 CE takes almost twice as long to emit the version number as a regular SapMachine 25 on my MacBook Pro M5:
The slowness of java -version was actually one of the performance issues of the tool I showcased in How to Build an Executable from a JAR using ExecJAR, as it originally used java -version a lot to check the Java version constraint.
Is this relevant? Not really. But neither are most microbenchmarks, or benchmarks in general, when taken out of context. You should not generalize from small benchmarks, and modern systems are complex.
Join me next week for a blog post on something different and learn how to check the version of a Java installation even faster in under one millisecond:
P.S.: I just ran some more benchmarks: OpenJDK 25 is 18% faster than OpenJDK 17 and 21 and a whopping 84% faster than OpenJDK 11. Upgrade now!
P.P.S.: As many people (Thomas Wuerthinger, Fabio Niephaus, Volker Simonis, and multiple of my SapMachine colleagues) pointed out, the differences between OpenJDK and GraalVM are due to GraalVM initializing the JVM Compiler Interface (JVMCI). The difference between the two becomes negligible when running OpenJDK with JVMCI enabled (initializing the JIT at the beginning):
In my last blog post, I covered a new tool called jstall, which enables you to quickly check on a Java application. Because it was tiresome to always call the tool via java -jar jstall, I looked for a way to create executables directly from JARs, inspired by async-profiler’s build system. And I, of course, went down a rabbit hole. In this blog post, I’ll show you how to use execjar to easily create your own executable JARs that you can execute directly on the command line while still being valid JARs.
TL;DR: execjar is a CLI and Maven plugin that enables you to create executables from JARs by just adding a few lines to your Maven file:
Welcome to the last blog post of the year. Last week, I discussed the limitations of custom JFR events. This week, I’ll also be covering a profiling-related topic and showcasing a tiny tool called JStall.
I hope I’m not the only one who sometimes wonders “What is my Java application doing right now?” when there is no output. Yes, you could perform a simple thread dump via jstack, but it is hard to understand which threads are actually consuming CPU and making any sort of progress. This is where my tiny tool called JStall comes in:
JStall is a small command-line tool for one-shot inspection of running JVMs using thread dumps and short, on-demand profiling. The tool essentially takes multiple thread dumps of your application and uses the per-thread cpu-time information to find the most CPU-time-consuming Java threads.
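The underlying idea can be sketched in a few lines of Java: take two snapshots of per-thread CPU time and rank the threads by how much their CPU time grew in between. This is not JStall’s code. JStall works on thread dumps of another JVM, while this sketch assumes we sample the current JVM via ThreadMXBean, and all class and method names are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-process sketch of the "diff two CPU-time snapshots" idea
public class BusiestThreads {

    /** Thread ID -> CPU time in nanoseconds for the live threads of this JVM. */
    static Map<Long, Long> snapshot(ThreadMXBean bean) {
        Map<Long, Long> cpuTimes = new HashMap<>();
        for (long id : bean.getAllThreadIds()) {
            // -1 if CPU-time measurement is disabled or the thread is no longer alive
            long nanos = bean.getThreadCpuTime(id);
            if (nanos >= 0) {
                cpuTimes.put(id, nanos);
            }
        }
        return cpuTimes;
    }

    /** IDs of the n threads whose CPU time grew the most between two snapshots. */
    static List<Long> topByCpuDelta(Map<Long, Long> before, Map<Long, Long> after, int n) {
        Map<Long, Long> deltas = new HashMap<>();
        // Threads that only appear in the second snapshot count with their full CPU time
        after.forEach((id, nanos) -> deltas.put(id, nanos - before.getOrDefault(id, 0L)));
        List<Long> ids = new ArrayList<>(deltas.keySet());
        ids.sort((a, b) -> Long.compare(deltas.get(b), deltas.get(a)));
        return ids.subList(0, Math.min(n, ids.size()));
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        Map<Long, Long> before = snapshot(bean);
        Thread.sleep(1000); // interval between the two snapshots
        Map<Long, Long> after = snapshot(bean);
        for (long id : topByCpuDelta(before, after, 3)) {
            System.out.println(id + " " + bean.getThreadInfo(id));
        }
    }
}
```

Diffing two snapshots instead of looking at absolute CPU time matters: a thread that burned CPU an hour ago but is idle now should not show up as busy.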
First, download the JStall executable from the GitHub releases page. Let us then start by finding the currently running JVMs:
> ./jstall
Usage: jstall <command> <pid|files> [options]
Available commands:
status - Show overall status (deadlocks + most active threads)
deadlock - Check for deadlocks
most-work - Show threads doing the most work
flame - Generate flame graph
threads - List all threads
Available JVMs:
7153 ./jstall
1223 <unknown>
8136 ./renaissance-gpl-0.16.0.jar
6138 org.jetbrains.idea.maven.server.RemoteMavenServer36
5597 DeadlockDemo
49294 com.intellij.idea.Main
This provides us with a list of options for the main status command, as well as a list of JVM processes and their corresponding main classes. Let’s start checking for deadlocking:
JDK Flight Recorder (JFR) provides support for custom events as a profiler. Around two years ago, I wrote a blog post on this very topic: Custom JFR Events: A Short Introduction. These custom events are beneficial because they enable us to record additional project-specific information alongside the standard JFR events, all in the same file. We can then view and process this information with the JFR tools. You can freely specify these events in Java.
There is only one tiny problem nobody talks about: Array support (and, more generally, the support of complex types).
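To illustrate the limitation, here is a minimal sketch of a custom event that wants to carry an int array. Custom event fields are limited to primitives, String, Thread, and Class, so the sketch falls back to encoding the array as a comma-separated string. This workaround is my own assumption for illustration, not necessarily what the blog post recommends, and the event name and helper methods are hypothetical.

```java
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import java.util.Arrays;
import java.util.stream.Collectors;

public class ArrayEventSketch {

    @Name("example.Measurements") // hypothetical event name
    @Label("Measurements")
    static class MeasurementsEvent extends Event {
        // An int[] field is not a supported field type,
        // so the array is stored as an encoded string instead
        @Label("Values")
        String values;
    }

    /** Encodes the array as a comma-separated string, e.g. "1,2,3". */
    static String encode(int[] values) {
        return Arrays.stream(values)
                .mapToObj(Integer::toString)
                .collect(Collectors.joining(","));
    }

    /** Reverses encode(); an empty string yields an empty array. */
    static int[] decode(String encoded) {
        if (encoded.isEmpty()) {
            return new int[0];
        }
        return Arrays.stream(encoded.split(",")).mapToInt(Integer::parseInt).toArray();
    }

    public static void main(String[] args) {
        MeasurementsEvent event = new MeasurementsEvent();
        event.values = encode(new int[] {1, 2, 3});
        event.commit(); // only recorded if a JFR recording is running
    }
}
```

The obvious drawback is that the JFR tools only see an opaque string, so every consumer of the recording has to know the encoding.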
This week is a bit different as I’m working on a fun year-end blog and doing my regular work, so I’ll share with you how to create an advent calendar:
Normal Advent calendars are boring: so let’s make our own! We’ll combine all our favorite technologies: Gridfinity (for the grid system), 3D printing (for the grid), vacuum molding (for the chocolate), laser cutting (for the frame), and automated paper cutting to create advent calendars that are both beautiful and functional.
Along the way, we’ll cover practical food safety considerations and show how these techniques come together to produce something tasty, nerdy, and gift-worthy:
You can find the positive for the chocolate mold on MakerWorld:
Over a year ago, I wrote a blog post called Who instruments the instrumenters? together with Mikaël Francoeur on how we debugged the Java instrumentation code. In the meantime, I gave a more detailed talk on this topic at VoxxedDays Amsterdam. The meta-agent that I developed for this worked well for Java agents/instrumenters, but what about native agents? Marco Sussitz found my agent and asked exactly this question. Native agents are agents that utilize the JVMTI API to, for example, modify class bytecode; however, they are not written in Java. With this blog post, I’m proud to announce that the meta-agent now supports instrumenting native agents.
TL;DR: Meta-agent allows you to see how an agent, native or Java, transforms bytecode.
There are many examples of native agents, like DynaTrace’s monitoring agent or async-profiler’s method tracer. I’m using the latter in my example here, as it’s open-source and readily available. The method tracer instruments the Java bytecode to trace the execution time of specific methods. You can find more about it in the async-profiler forum.
As a sample program, we use Loop.java:
public class Loop {
    public static void main(String[] args)
            throws InterruptedException {
        while (true) Thread.sleep(1000);
    }
}
Let’s trace the Thread.sleep method and use the meta-agent to see what async-profiler does with the bytecode:
This opens a server at localhost:7071 and we check how async-profiler modified the Thread class:
So we can now instrument native agents like any other Java agent. And the best part: As all Java agents are built on top of the libinstrument native agent, we can also see what any Java agent is doing. For example, we can see that the Java instrumentation agent instruments itself:
So I finally built an instrumenter that can essentially instrument my instrumentation agent, which in turn instruments other instrumentation agents. Another benefit is that the instrumenter can find every modification of any Java agent.
In my last blog post, I showed you how to work with JFR files using DuckDB, which started a blog series that I surely will continue. Just not this week. Instead, I want to showcase a tiny app to run AI models using the MediaPipe API directly on your phone. I created the app for another purpose (perhaps described in a future blog post) earlier this year, but never wrote anything about it. So here we are.
TL;DR: I built an Android app that offers AI models via a server
The app is open-source and available on GitHub; it’s experimental, but maybe it can help you build your own apps. You can download it from the releases page of the repo and install it.
The LLM API endpoint, writing a poem on a backyard scene
In my previous post, I showed you how tricky it is to compare objects from the JFR Java API. You probably wondered why I wrote about this topic. Here is the reason: In this blog post, I’ll cover how to load JFR files into a DuckDB database to allow querying profiling data with simple SQL queries, all JFR views included.
This blog post will start a small series on making JFR quack.
TL;DR
You can now use a query tool (via GitHub) to transform JFR files into similarly sized DuckDB files:
CREATE VIEW "hot-methods" AS
SELECT
(c.javaName || '.' || m.name || m.descriptor) AS "Method",
COUNT(*) AS "Samples",
format_percentage(COUNT(*) / (SELECT COUNT(*) FROM ExecutionSample)) AS "Percent"
FROM ExecutionSample es
JOIN Method m ON es.stackTrace$topMethod = m._id
JOIN Class c ON m.type = c._id
GROUP BY es.stackTrace$topApplicationMethod, c.javaName, m.name, m.descriptor
ORDER BY COUNT(*) DESC
LIMIT 25
In the last blog post, I showed you how to silence JFR’s startup messages. This week’s blog post is also related to JFR, and no, it’s not about the JFR Events website, which got a simple search bar. It’s a short blog post on comparing objects from JFR recordings in Java and why this is slightly trickier than you might have expected.
Example
Getting a JFR recording is simple; just use the RecordingStream API. We do this in the following to record an execution trace of a tight loop using JFR and store it in a list:
List<RecordedEvent> events = new ArrayList<>();
// Know when to stop the loop
AtomicBoolean running = new AtomicBoolean(true);
// We obtain one hundred execution samples
// that have all the same stack trace
final long currentThreadId = Thread.currentThread().threadId();
try (RecordingStream rs = new RecordingStream()) {
    rs.enable("jdk.ExecutionSample").with("period", "1ms");
    rs.onEvent("jdk.ExecutionSample", event -> {
        if (event.getThread("sampledThread")
                 .getJavaThreadId() != currentThreadId) {
            return; // don't record other threads
        }
        events.add(event);
        if (events.size() >= 100) {
            // we can signal to stop
            running.set(false);
        }
    });
    rs.startAsync();
    int i = 0;
    while (running.get()) { // some busy loop to produce sample
        for (int j = 0; j < 100000; j++) {
            i += j;
        }
    }
    rs.stop();
}
[0.172s][info][jfr,startup] Started recording 1. No limit specified, using maxsize=250MB as default.
[0.172s][info][jfr,startup]
[0.172s][info][jfr,startup] Use jcmd 29448 JFR.dump name=1 to copy recording data to file.
when starting the Flight Recorder with -XX:StartFlightRecorder? Even though the default logging level is warning, not info?
This is what this week’s blog post is all about. After I showed you last week how to waste CPU like a Professional, this week I’ll show you how to silence JFR. Back to the problem:
As a short backstory, my profiler needed a test to check that the queue size of the sampler really increased dynamically (see Java 25’s new CPU-Time Profiler: Queue Sizing (3)), so I needed a way to let a thread spend a pre-defined number of seconds running natively on the CPU. You can find the test case in its hopefully final form here, but be aware that writing such cases is more complicated than it looks.
So here we are: in need of a way to properly waste CPU time, preferably in user-land, for a fixed amount of time. The problem: There are only a few scant resources online, so I decided to create my own. I’ll show you seven different ways to implement a simple
void my_wait(int seconds);
method that works on both macOS and Linux, and you’ll learn far more about this topic than you ever wanted to. All the code is MIT licensed; you can find it on GitHub in my waste-cpu-experiments, alongside some profiling results.
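For a rough feeling of what the simplest variant looks like, here is a busy-wait sketch. It is written in Java rather than the native code the experiments use, it is not one of the seven implementations, and the names are made up. Note that it waits for wall-clock time: if the thread gets descheduled, it burns less CPU time than requested, which is one of the reasons the problem is harder than it looks.

```java
// Hypothetical Java analogue of my_wait, not taken from waste-cpu-experiments
public class BusyWait {

    /** Spins until at least the given number of nanoseconds has passed; returns the elapsed time. */
    static long busyWaitNanos(long nanos) {
        long start = System.nanoTime();
        long counter = 0;
        // Compare differences, never absolute values, of System.nanoTime()
        while (System.nanoTime() - start < nanos) {
            counter++; // trivial work to keep the loop body non-empty
        }
        return System.nanoTime() - start;
    }

    /** Java analogue of void my_wait(int seconds). */
    static void myWait(int seconds) {
        busyWaitNanos(seconds * 1_000_000_000L);
    }

    public static void main(String[] args) {
        myWait(1);
        System.out.println("done");
    }
}
```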
Welcome back to my blog, this time for a blog post on profiling your Java applications in Cloud Foundry and the tool I helped to develop to make it easier.
Cloud Foundry “is an open source, multi-cloud application platform as a service (PaaS) governed by the Cloud Foundry Foundation, a 501(c)(6) organization” (Wikipedia). It allows you to run your workloads easily in the cloud, including your Java applications. You just need to define a manifest.yml, for example:
But how would you profile this application? This and more is the topic of this blog post.
I will not discuss why you might want to use Cloud Foundry or how you can deploy your own applications. I assume you came this far in the blog post because you already have basic Cloud Foundry knowledge and want to learn how to profile your applications easily.
The Java Plugin
Cloud Foundry has a cf CLI with a proper plugin system and lots of plugins. A team at SAP, which included Tim Gerlach, started to develop the Java plugin many years ago. It’s a plugin offering utilities to gain insights into JVMs running in your Cloud Foundry app.
The changes I described in this blog post led to segfaults in tests, so I backtracked on them for now. Maybe I made a mistake implementing the changes, or my reasoning in the blog post is incorrect. I don’t know yet.
Should the queue implementation use Atomics and acquire-release semantics?
This is what we cover in this short blog post. First, to the rather fun topic:
Is it a Queue?
I always called the primary data structure a queue, but recently, I wondered whether this term is correct. But what is a queue?
Definition: A collection of items in which only the earliest added item may be accessed. Basic operations are add (to the tail) or enqueue and delete (from the head) or dequeue. Delete returns the item removed. Also known as “first-in, first-out” or FIFO.
Welcome back to my series on the new CPU-time profiler in Java 25. In the previous blog post, I covered the implementation of the new profiler. In this week’s blog post, I’ll dive deep into the central request queue, focusing on deciding its proper size.
The JfrCPUTimeTraceQueue allows the signal handler to record sample requests that the out-of-thread sampler and the safepoint handler process. So it’s the central data structure of the profiler:
This queue is thread-local and pre-allocated, as it’s used in the signal handler, so the correct sizing is critical:
If the size is too small, you’ll lose many samples because the signal handler can’t record sample requests.
If you size it too large, you waste lots of memory. A sampling request is 48 bytes, so a queue with 500 elements (currently the default) requires 24kB. This adds up fast if you have more than a few threads.
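The arithmetic behind “this adds up fast” can be spelled out in a few lines. The 48 bytes per request and the default of 500 elements come from the text above; the thread counts are hypothetical values picked for illustration.

```java
public class QueueMemory {

    static final long BYTES_PER_REQUEST = 48;   // size of a sampling request (from the text)
    static final long DEFAULT_QUEUE_SIZE = 500; // current default number of elements (from the text)

    /** Memory for one pre-allocated, thread-local queue. */
    static long bytesPerThread(long queueSize) {
        return queueSize * BYTES_PER_REQUEST;
    }

    /** Memory for all queues, one per thread. */
    static long totalBytes(long queueSize, long threads) {
        return bytesPerThread(queueSize) * threads;
    }

    public static void main(String[] args) {
        // Hypothetical thread counts
        for (long threads : new long[] {1, 100, 10_000}) {
            System.out.println(threads + " threads: "
                    + totalBytes(DEFAULT_QUEUE_SIZE, threads) + " bytes");
        }
    }
}
```

With the default size, a single thread needs 500 × 48 = 24,000 bytes, so 1,000 threads already account for 24 MB of pre-allocated queue memory.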
So, in this blog post, we’re mainly concerned about setting the correct default size and discussing a potential solution to the whole problem.