How we fought the ANR rate in Android Vitals

We fought the ANR rate in our app for almost a year. We fixed all the bugs in our own code and libraries and reported several issues to third-party library vendors. Most of them were long operations executed on the main thread. Still, we stayed above the bad behavior threshold (0.47%), at slightly over 0.50%. None of the remaining ANRs pointed to our code; every one of them was somewhere inside the Android code itself, so it looked like there was no hope for us. We suspected that something in our code was indirectly causing these ANRs inside Android. Then we noticed a lot of ANR reports with a stack trace containing this:

- locked <0x0ecdb2ec> (a java.lang.Object)
  at sun.misc.Unsafe.park (
  at java.util.concurrent.locks.LockSupport.park (
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt (
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly (
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly (
  at java.util.concurrent.CountDownLatch.await (
  at$EditorImpl$ (
  at (

It happened mostly during app start. We investigated it a lot. We were using apply() everywhere, so we did not understand why shared preferences were being written on the main thread. In the end, we found out that when Android starts or stops an Activity or Service, it forces all changes still in the queue to be stored before the main thread can continue. So apply() saves data purely in the background only if the write completes before such a lifecycle event arrives; otherwise the main thread waits for it. It’s hard to say why exactly it works this way. I really think Android should have a dedicated maintenance thread for such things, and the system should wait for that thread as well when it needs to kill the app. We tried splitting the shared preferences into multiple files to avoid as many write operations during app start as possible, but it had very little influence on the ANR rate.
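
To make the blocking easier to picture, here is a minimal plain-Java model of the mechanism, written only from the behavior described above and the CountDownLatch.await frame in the stack trace. It is not Android source code: PendingWritesSketch and all its method names are hypothetical, and the real platform code is more involved.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Simplified model of deferred writes with a lifecycle barrier; not Android source code. */
final class PendingWritesSketch {
    // Stand-in for the background thread that performs the disk writes.
    private final ExecutorService diskThread = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "disk-writer");
        t.setDaemon(true);
        return t;
    });

    // One "finisher" per apply(); each one blocks until its write has completed.
    private final Queue<Runnable> finishers = new ConcurrentLinkedQueue<>();

    /** Models apply(): returns immediately, the write itself runs in the background. */
    void apply(Runnable diskWrite) {
        CountDownLatch written = new CountDownLatch(1);
        finishers.add(() -> awaitQuietly(written));
        diskThread.execute(() -> {
            try {
                diskWrite.run();
            } finally {
                written.countDown();
            }
        });
    }

    /** Models the lifecycle hook: the calling ("main") thread waits for every pending write. */
    void waitToFinish() {
        Runnable finisher;
        while ((finisher = finishers.poll()) != null) {
            finisher.run(); // parks in CountDownLatch.await() if the write is still running
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In this model, calling apply() with a slow write and then waitToFinish() from the same thread stalls that thread until the write is done, which is the situation the ANR reports show on the main thread.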

So we decided to run an experiment and replace the default implementation of SharedPreferences with our own. The result was amazing: the ANR rate dropped from roughly 0.5% to 0.23%. That was more than all our previous fixes combined.

So, how did we do it? We started with a very simple solution. We wrapped the default SharedPreferences implementation so that apply() calls the default commit() on a dedicated background thread. This thread is shared between all instances, so it doesn’t overload system resources. We only needed to maintain our own state cache, because all getters must return the updated values immediately after apply() is called, even before commit() has run on the background thread. The full implementation code is here:
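
Separately from the linked code, the core idea can be modelled in a few lines of plain Java. This is only a sketch under assumptions, not our production class: BlockingStore is a hypothetical stand-in for the platform's blocking commit(), AsyncPrefsSketch and awaitWrites() are made-up names, and only String values are handled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for the platform's blocking commit(); not an Android API. */
interface BlockingStore {
    void commit(Map<String, String> changes); // a null value means "remove this key"
}

/** Simplified model: reads hit an in-memory cache, writes go to one shared thread. */
final class AsyncPrefsSketch {
    // One writer thread shared by every instance.
    private static final ExecutorService WRITER = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "prefs-writer");
        t.setDaemon(true);
        return t;
    });

    private final BlockingStore store;
    private final Map<String, String> cache = new HashMap<>(); // guarded by "this"

    AsyncPrefsSketch(BlockingStore store) {
        this.store = store;
    }

    synchronized String getString(String key, String defValue) {
        String value = cache.get(key);
        return value != null ? value : defValue;
    }

    Editor edit() {
        return new Editor();
    }

    final class Editor {
        private final Map<String, String> changes = new HashMap<>();

        Editor putString(String key, String value) {
            changes.put(key, value);
            return this;
        }

        Editor remove(String key) {
            changes.put(key, null);
            return this;
        }

        /** Updates the cache right away, then runs the blocking commit on the writer thread. */
        void apply() {
            Map<String, String> snapshot = new HashMap<>(changes);
            synchronized (AsyncPrefsSketch.this) {
                snapshot.forEach((k, v) -> {
                    if (v == null) cache.remove(k); else cache.put(k, v);
                });
                // Submitting inside the lock keeps the disk order equal to the cache order.
                WRITER.execute(() -> store.commit(snapshot));
            }
        }
    }

    /** Test helper: blocks until all writes queued so far have been committed. */
    static void awaitWrites() {
        try {
            WRITER.submit(() -> { }).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The caller of apply() never touches the store, and a getString() issued right after apply() already sees the new value because it reads from the cache rather than from disk.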

We used the Robolectric test suite to verify that our implementation behaves the same way as the original. There are several implementation nuances that have to be matched. The original implementation also violates its documentation in one place. The description of the remove() method says: “All removals are done first, regardless of whether you called remove before or after put methods on this editor.” But that is not true. If you call edit().putString("x", "value").remove("x").commit(), x is removed after the commit. We kept the original behavior, so our implementation has the same bug.
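
The observed ordering is what you get when puts and removals are recorded in a single map keyed by preference name, so that the last call per key wins. The following plain-Java model illustrates that; it assumes this single-map design, and EditorOrderSketch with its methods is a hypothetical name, not the platform class.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of an editor that records puts and removals in one map. */
final class EditorOrderSketch {
    private static final Object REMOVED = new Object(); // sentinel marking a removal
    private final Map<String, Object> pending = new HashMap<>();

    EditorOrderSketch putString(String key, String value) {
        pending.put(key, value);
        return this;
    }

    EditorOrderSketch remove(String key) {
        pending.put(key, REMOVED); // overwrites any earlier put for the same key
        return this;
    }

    /** Applies the pending changes to the target map; the last call per key wins. */
    void commitTo(Map<String, Object> target) {
        for (Map.Entry<String, Object> change : pending.entrySet()) {
            if (change.getValue() == REMOVED) {
                target.remove(change.getKey());
            } else {
                target.put(change.getKey(), change.getValue());
            }
        }
        pending.clear();
    }
}
```

With this model, putString("x", "new").remove("x") leaves x absent after the commit, while remove("x").putString("x", "new") leaves x set to "new". If removals really were applied first, both call orders would end with x set to "new".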

Replacing SharedPreferences with the custom implementation is actually quite easy: you just need to override getSharedPreferences(String name, int mode) in the Application class and, optionally, in the common parent of all your activities. Then all third-party libraries will use the custom implementation as well.

Our very simple implementation also has several disadvantages. The app may be killed before the background job finishes all the writes. Calling apply() several times always results in several commit() calls; duplicate writes are not merged. And all instances of OnSharedPreferenceChangeListener are called not directly after apply(), but only after commit() has run. Even so, we have been using it in production for more than half a year without any issue.
