
Fully automated luxury pipeline for updating dependencies in Scala projects with Missinglink

Don’t suffer from NoClassDefFoundError, NoSuchMethodError and similar issues in production: use Missinglink to identify them in CI!

In a previous blog post, we talked about Bulldozer and PolicyBot and how they automate merging pull requests for you. In this post, we will talk about another piece of the automation puzzle: Missinglink and its plugin for sbt, sbt-missinglink. This tool checks that all required classes and methods are present on the classpath.

Together with Scala Steward and Bulldozer/PolicyBot, it allows us to achieve a fully automated luxury pipeline for keeping our projects up to date: a pipeline that automates away as much work as possible while being safe to run without human supervision. Since we use Scala, we will focus on sbt-missinglink, but Missinglink is usable by anybody who runs on the JVM, because it works at the level of Java bytecode.

The hazard of updating dependencies on the dynamic JVM

Scala has a powerful type system, which is a great tool for ensuring that errors are caught during development or CI and not at runtime after deployment to production, where they can cause serious issues. However, in contrast to the Scala and Java languages, the Java Virtual Machine is very dynamic: it links classes together only at runtime, which can have unwanted consequences that are especially surprising to developers used to thinking “if it type-checks, it will run correctly”. You can learn more about the dynamic nature of the JVM in the Welcome to JAR Hell series.

When Scala Steward opens a new PR updating a version of a dependency, we cannot be entirely sure it won’t break anything. Of course, we compile the project in CI, so we can be sure that every class or method our project uses directly is present with the expected signature (if not, compilation fails). But what if the update changes some other class or method of the library, one that is used only transitively, by another library? Such issues are called “binary incompatibility” in Scala circles, and they can lead to NoClassDefFoundError or NoSuchMethodError at runtime. Tests in CI can still uncover them, but that depends on your test coverage, and 100% coverage is difficult to achieve.
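To make the idea concrete, here is a minimal, hypothetical sketch (the Greeter class is made up for illustration). Adding a parameter with a default value is source compatible, because callers writing greet("Ann") still compile, but it is not binary compatible: the bytecode method greet(String) disappears and is replaced by greet(String, String). A library compiled against the old version would throw NoSuchMethodError when it tried to call the old signature. The sketch uses plain Java reflection to show which signatures the new version actually contains.

```scala
// Hypothetical "version 2" of a library class: a default parameter was added.
// Source code calling `new Greeter().greet("Ann")` still compiles against it,
// but the bytecode no longer contains a method greet(String).
class Greeter {
  def greet(name: String, punctuation: String = "!"): String =
    s"Hello, $name$punctuation"
}

object BinaryIncompatibilityDemo {
  /** True if `cls` has a public method `name` with exactly these parameter types. */
  def hasMethod(cls: Class[_], name: String, params: Class[_]*): Boolean =
    try { cls.getMethod(name, params: _*); true }
    catch { case _: NoSuchMethodException => false }

  def main(args: Array[String]): Unit = {
    val cls = classOf[Greeter]
    // The signature a library compiled against "version 1" would try to call:
    println(hasMethod(cls, "greet", classOf[String]))                  // false
    // The signature that actually exists in "version 2":
    println(hasMethod(cls, "greet", classOf[String], classOf[String])) // true
    // Recompiled source still works thanks to the default argument:
    println(new Greeter().greet("Ann"))                                // Hello, Ann!
  }
}
```

Recompiling the calling library would fix it, which is exactly why such problems surface only in transitive dependencies that were compiled against a different version.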

Taken together, this means that automatically merging every PR from Scala Steward (or any other update tool, like Renovate or Dependabot) is too risky for our taste, let alone deploying directly to production after the merge. We need a tool that verifies in CI that a particular dependency update doesn’t introduce binary incompatibility.

Missinglink to the rescue

Missinglink is a tool developed by Spotify, designed to solve precisely the issues outlined above. It analyses the bytecode of all classes in all libraries (JARs) that our project uses. Because it sees all the code that can be called at runtime, it can verify that all the called methods and used classes are indeed available. By virtue of working at the level of Java bytecode, it is agnostic of the language you work with: we use it for Scala, but it should work well for Java or Kotlin too.
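The core idea can be illustrated with a toy model. This is only a sketch, not Missinglink’s implementation or API: real Missinglink reads JAR bytecode, whereas here each artifact is reduced to the set of methods it defines and the set of methods its code calls. All class, artifact and method names are hypothetical. The checker reports every call that nothing on the classpath can satisfy and distinguishes a missing class from a missing method.

```scala
// Toy model only: the "bytecode" is reduced to plain sets of method references.
final case class MethodRef(owner: String, name: String, descriptor: String)

final case class Artifact(
    name: String,
    defines: Set[MethodRef], // methods implemented in this JAR
    calls: Set[MethodRef]    // methods invoked from this JAR's code
)

final case class Conflict(artifact: String, call: MethodRef, problem: String)

object ToyLinkChecker {
  /** Reports every call that no artifact on the classpath can satisfy. */
  def check(classpath: Seq[Artifact]): Seq[Conflict] = {
    val definedMethods = classpath.flatMap(_.defines).toSet
    val definedClasses = definedMethods.map(_.owner)
    for {
      artifact <- classpath
      call <- artifact.calls.toSeq.sortBy(c => (c.owner, c.name, c.descriptor))
      if !definedMethods.contains(call)
    } yield {
      val problem =
        if (definedClasses.contains(call.owner)) s"Method not found: ${call.name}${call.descriptor}"
        else s"Class not found: ${call.owner}"
      Conflict(artifact.name, call, problem)
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical setup: app-lib was compiled against an older util-lib that
    // still had the Helper class and a format(String) method.
    val utilV2 = Artifact(
      "util-lib-2.0.jar",
      defines = Set(MethodRef("com.example.util.Util", "format", "(Ljava/lang/String;I)Ljava/lang/String;")),
      calls = Set.empty
    )
    val appLib = Artifact(
      "app-lib-1.0.jar",
      defines = Set.empty,
      calls = Set(
        MethodRef("com.example.util.Helper", "<init>", "()V"),
        MethodRef("com.example.util.Util", "format", "(Ljava/lang/String;)Ljava/lang/String;")
      )
    )
    check(Seq(utilV2, appLib)).foreach(c => println(s"${c.artifact}: ${c.problem}"))
  }
}
```

Note that the check needs the whole classpath at once: a call can only be judged missing after looking at every JAR that might define it, which is also why the real tool can be memory-hungry.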

Since we compile our Scala programs with sbt, we use the sbt-missinglink plugin, which wraps the Missinglink tool. You can add it to your project/plugins.sbt like this:

addSbtPlugin("ch.epfl.scala" % "sbt-missinglink" % "0.3.2")

Besides the sbt plugin, there is one for Maven, but sadly none for Gradle. Yet! Will you be the one to take up the challenge?

To run the tool in CI or locally on your computer, use the sbt task missinglinkCheck. If it finds conflicts, you will see log output like this (Pureconfig and Circe are common offenders with regard to binary incompatibility):

242 conflicts found!
Category: Class being called not found
In artifact: rabbitmq-client-pureconfig_2.12-8.6.0.jar
In class: com.avast.clients.rabbitmq.pureconfig.PureconfigImplicits$anon$lazy$macro$636$1
In method: inst$macro$629$lzycompute():90
Call to: pureconfig.Derivation$Successful.<init>(java.lang.Object)
Problem: Class not found: pureconfig.Derivation$Successful
--------
In method: inst$macro$632$lzycompute():90
Call to: pureconfig.Derivation$Successful.<init>(java.lang.Object)
Problem: Class not found: pureconfig.Derivation$Successful
...

In this particular case, the output means that rabbitmq-client-pureconfig depends on a version of Pureconfig that is binary incompatible with the version of Pureconfig actually present in the project. The fix was to adjust the Pureconfig version used in the rabbitmq-client project. Without Missinglink, our application would surely have crashed at runtime.

In general, the output of sbt-missinglink can be quite long, which is especially overwhelming when setting it up for the first time. In our experience, though, it is enough to focus on the lines starting with “In artifact:” (such as In artifact: some-library-1.2.3.jar), which tell you which libraries are problematic, and work from there. The sbt-missinglink developers are aware of this problem, so hopefully it will improve in the future.
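If you save the log of a run, a small helper can reduce it to just the list of problematic JARs. This is a hypothetical sketch of ours, not part of sbt-missinglink: it assumes each relevant line contains the text “In artifact:” followed by the JAR name, and it tolerates a prefix in front of it (for example a log-level tag added by sbt).

```scala
import scala.io.Source

object ConflictSummary {
  // Matches lines such as "In artifact: some-library-1.2.3.jar",
  // allowing any prefix before "In artifact:".
  private val ArtifactLine = """.*In artifact:\s*(\S+)\s*""".r

  /** Distinct, sorted names of the JARs mentioned in a missinglinkCheck log. */
  def problematicArtifacts(log: Iterator[String]): List[String] =
    log.collect { case ArtifactLine(jar) => jar }.toList.distinct.sorted

  // Reads the saved log from standard input and prints one JAR per line.
  def main(args: Array[String]): Unit =
    problematicArtifacts(Source.stdin.getLines()).foreach(println)
}
```

For the log shown above, this would print only rabbitmq-client-pureconfig_2.12-8.6.0.jar, which is the starting point for the investigation.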

Another downside of Missinglink is that it can report false positives: methods or classes that are not present but will never be called in our application. Typically, these come from Java libraries, often logging-related ones; we can speculate that this is because such libraries use reflection heavily. Thankfully, you can filter such false positives out via missinglinkExcludedDependencies. Our rule of thumb: if you encounter a new conflict in a Java library, you can safely exclude it; if it is in a Scala dependency, chances are it is a true positive and you should investigate further. For simplicity, at Avast, we reuse the same long list of offending JARs in every project:

  missinglinkExcludedDependencies ++= List(
    moduleFilter(organization = "ch.qos.logback", name = "logback-classic"),
    moduleFilter(organization = "ch.qos.logback", name = "logback-core"),
    moduleFilter(organization = "com.squareup.okhttp3", name = "okhttp"),
    moduleFilter(organization = "com.sun.activation", name = "jakarta.activation"),
    moduleFilter(organization = "com.sun.mail", name = "jakarta.mail"),
    moduleFilter(organization = "com.zaxxer", name = "HikariCP"),
    moduleFilter(organization = "commons-logging", name = "commons-logging"),
    moduleFilter(organization = "io.micrometer", name = "micrometer-registry-statsd"),
    moduleFilter(organization = "io.netty", name = "netty-codec"),
    moduleFilter(organization = "io.netty", name = "netty-common"),
    moduleFilter(organization = "io.netty", name = "netty-handler"),
    moduleFilter(organization = "io.sentry", name = "sentry"),
    moduleFilter(organization = "jakarta.activation", name = "jakarta.activation-api"),
    moduleFilter(organization = "org.apache.logging.log4j", name = "log4j-api"),
    moduleFilter(organization = "org.apache.logging.log4j", name = "log4j-core"),
    moduleFilter(organization = "org.apache.logging.log4j", name = "log4j-slf4j-impl"),
    moduleFilter(organization = "org.asynchttpclient", name = "async-http-client"),
    moduleFilter(organization = "org.javassist", name = "javassist"),
    moduleFilter(organization = "org.jboss.logging", name = "jboss-logging"),
    moduleFilter(organization = "org.jboss.resteasy", name = "resteasy-jaxrs"),
    moduleFilter(organization = "org.reflections", name = "reflections"),
    moduleFilter(organization = "org.slf4j", name = "slf4j-api")
  )

Missinglink can consume a lot of memory for its analysis, but you can simply increase the heap size by passing -J-Xmx to sbt, for example sbt -J-Xmx4G missinglinkCheck.

We’ve made sure that (sbt-)missinglink is production-ready for modern enterprise apps: it can now handle so-called Multi-Release JARs and can run concurrently on all sbt modules at once, reducing the wall-clock time spent on the analysis (which could otherwise take a long time). You can still limit the degree of concurrency with concurrentRestrictions, for example to 4 modules at a time:

Global / concurrentRestrictions += Tags.limit(missinglinkConflictsTag, 4)

This limit helps when you can’t give sbt enough memory to run the analysis on all modules at once.

The quest for binary compatibility continues

Preventing issues in CI, before they happen in production, is great. Missinglink is a tool for preventing linking errors on the very dynamic Java Virtual Machine. It works great on its own, but in our opinion it’s a must if you want automated merging of dependency update PRs. We now have a pipeline in which Scala Steward opens a PR updating a dependency, CI runs compilation, tests and Missinglink, and if all of that passes, Bulldozer can safely merge the PR with no need for a human to check anything. It is even more reliable than manual review, because a human could only guess whether a dependency update might cause binary incompatibility.
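One possible way to give such a CI job a single sbt entry point is a command alias in build.sbt. This is a sketch under our own assumptions: the alias name ci is hypothetical, addCommandAlias is a standard sbt method, and missinglinkCheck is the task provided by sbt-missinglink.

```scala
// build.sbt: hypothetical "ci" alias; the CI job would then run `sbt ci`.
// The commands run in order, and sbt stops at the first one that fails.
addCommandAlias("ci", ";compile ;test ;missinglinkCheck")
```

Keeping the whole gate behind one alias means the CI configuration does not need to change if you later add more checks to it.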

Preventing linking errors in CI helps us improve our automation game. Our team is considering automatically deploying our applications from master on each push or merge, and we can do that safely because we know our app won’t crash due to linking errors, whether they are caused by a dependency update or anything else.

There is also the other side of the binary compatibility coin. Missinglink ensures that the libraries you use won’t crash your app. But if you are a library developer, you need to make sure you stay binary compatible so that you don’t break your users’ apps. There are tools for that, such as MiMa and sbt-version-policy, which recently released version 1.0. Our team is now looking at these tools, but that will be the topic of a future post.