This is one of the more compute-heavy workflows that the
app server performs, so we should be able to measure how fast
it is against past revisions.
Add a step to the general CI job that runs each benchmark
as a test, without trying to measure many iterations.
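
A minimal sketch of the idea, in Python purely for illustration (the
project's actual language and benchmark harness are not stated here).
The benchmark names, the BENCHMARKS registry, and both helper functions
are hypothetical. The point is that the CI step only checks that each
benchmark still runs, by executing it for a single iteration and
ignoring the timing:

```python
import time
from typing import Callable, Dict, List


# Hypothetical benchmark bodies; stand-ins for the real workloads.
def bench_sort() -> None:
    sorted(range(1000, 0, -1))


def bench_join() -> None:
    "".join(str(i) for i in range(1000))


# Hypothetical registry mapping benchmark names to callables.
BENCHMARKS: Dict[str, Callable[[], None]] = {
    "sort": bench_sort,
    "join": bench_join,
}


def run_benchmark(fn: Callable[[], None], iterations: int) -> float:
    """Return the mean wall-clock seconds per iteration."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) / iterations


def smoke_test_benchmarks(benchmarks: Dict[str, Callable[[], None]]) -> List[str]:
    """Run every benchmark for one iteration; return names that raised."""
    failures = []
    for name, fn in benchmarks.items():
        try:
            # One iteration only: the result is discarded, not measured.
            run_benchmark(fn, iterations=1)
        except Exception:
            failures.append(name)
    return failures
```

A CI step built this way would fail the job when the returned list is
non-empty, while real measurements would call run_benchmark with a
larger iteration count outside of the general job.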