Kohei Nozaki's blog 

Synchronize access to shared mutable data


Posted on Thursday Dec 17, 2020 at 06:00PM in Technology


In Java, an update to mutable data made by one thread may not be visible to other threads unless appropriate synchronization is applied. Such unsynchronized mutable data can cause nasty bugs that are difficult to spot, reproduce and fix.

Why do we need synchronization?

Let’s say we have a class called Counter which holds mutable data:

class Counter {
    int count = 0;
    int getCount() { return count; }
    int increment() { return ++count; }
}

There is an int field named count, a simple getter, and a method which increments the counter. Because the increment method can change the value of the count field, this class holds mutable data.

Now let’s see what happens when an instance of the Counter class is shared across 2 threads without any synchronization. Here is some client code for the Counter class:

import java.util.concurrent.TimeUnit;

class ThisNeverFinishesOnMyLaptop {
    public static void main(String[] args) throws InterruptedException {
        Counter counter = new Counter();
        Thread backgroundThread = new Thread(() -> {
            while (true) { // wait until the counter gets incremented
                if (counter.getCount() != 0) break;
            }
            System.out.println("Finished");
        });
        backgroundThread.start();
        TimeUnit.SECONDS.sleep(1);
        counter.increment();
    }
}

First we create an instance of the Counter class, then we launch a background thread which loops until the counter gets incremented. Then the main thread sleeps for 1 second and calls the increment method. This increment() call is supposed to make the background thread leave the while loop, so the program should finish after about 1 second with the "Finished" message printed to the console.

But in some environments, this code never finishes, which means the while loop has become an infinite loop. The problem is that, due to the lack of synchronization, the background thread fails to see the new value which the main thread set. In Java, without synchronization, there is no guarantee that updates made by one thread will be visible to other threads. Another thread might see the update at some point, or might never see it at all, as in this example on my laptop. The behavior can vary with the environment in which the code runs (for example the JVM implementation, the JIT compiler's optimizations, or the hardware), so it is unpredictable.

How do we do synchronization?

So how can we make it work properly? One way is the following:

import java.util.concurrent.TimeUnit;

class NowItFinishesOnMyLaptop {
    public static void main(String[] args) throws InterruptedException {
        Counter counter = new Counter();
        Object lock = new Object();
        Thread backgroundThread = new Thread(() -> {
            while (true) { // wait until the counter gets incremented
                synchronized (lock) {
                    if (counter.getCount() != 0) break;
                }
            }
            System.out.println("Finished");
        });
        backgroundThread.start();
        TimeUnit.SECONDS.sleep(1);
        synchronized (lock) {
            counter.increment();
        }
    }
}

Now we create an object named lock, and whenever we read or write data in the counter object, we do it inside a synchronized block on the lock object. This guarantees that any update made inside a synchronized block will be seen by any code that subsequently enters a synchronized block on the same lock object.
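
As a possible variation on the example above, the lock can live inside the counter itself so that callers cannot forget to use it. The following is only a sketch: the names SynchronizedCounter, FinishesWithSynchronizedMethods and runOnce are hypothetical and not part of the original code, and the shorter sleep and the join timeout are arbitrary choices made to keep the demo quick.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical variant of Counter: both methods lock on "this",
// so every read and every write goes through the same monitor.
class SynchronizedCounter {
    private int count = 0;
    synchronized int getCount() { return count; }
    synchronized int increment() { return ++count; }
}

class FinishesWithSynchronizedMethods {
    // Returns true if the background thread saw the increment and left its loop.
    static boolean runOnce() {
        SynchronizedCounter counter = new SynchronizedCounter();
        Thread backgroundThread = new Thread(() -> {
            while (counter.getCount() == 0) {
                // busy-wait until the counter gets incremented
            }
        });
        backgroundThread.setDaemon(true); // never keep the JVM alive because of the demo
        backgroundThread.start();
        try {
            TimeUnit.MILLISECONDS.sleep(100);
            counter.increment();
            backgroundThread.join(5000); // wait up to 5 seconds for the loop to exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !backgroundThread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(runOnce() ? "Finished" : "Still waiting");
    }
}
```

A synchronized instance method uses the object itself as the lock, so this behaves like the explicit lock object above, with the difference that the locking is now an implementation detail of the counter.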

There is also another way to fix this. We can add the volatile keyword to the count field:

class VolatileCounter {
    volatile int count = 0;
    int getCount() { return count; }
    int increment() { return ++count; }
}

We can then replace the Counter class in the ThisNeverFinishesOnMyLaptop class with the VolatileCounter class. The volatile keyword guarantees that a write to the field made by one thread will be visible to other threads that read the field afterwards.

import java.util.concurrent.TimeUnit;

class ThisAlsoFinishesOnMyLaptop {
    public static void main(String[] args) throws InterruptedException {
        VolatileCounter counter = new VolatileCounter();
        Thread backgroundThread = new Thread(() -> {
            while (true) { // wait until the counter gets incremented
                if (counter.getCount() != 0) break;
            }
            System.out.println("Finished");
        });
        backgroundThread.start();
        TimeUnit.SECONDS.sleep(1);
        counter.increment();
    }
}

But there are cases where the volatile keyword is not enough. Consider the following code:

import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class IncrementByMultipleThreads {
    public static void main(String[] args) {
        VolatileCounter counter = new VolatileCounter();
        Set<Integer> ints = Collections.synchronizedSet(new HashSet<>());
        Runnable incrementer = () -> {
            while (true) {
                int increment = counter.increment();
                boolean added = ints.add(increment);
                if (!added) System.out.println("duplicate number detected");
            }
        };
        Thread t1 = new Thread(incrementer), t2 = new Thread(incrementer), t3 = new Thread(incrementer);
        t1.start(); t2.start(); t3.start();
    }
}

First we create a volatile counter object and a Set of integers. We use a synchronized wrapper for the set so that it works properly in a concurrent use case like this. Then we create a Runnable object which runs an infinite loop. In the loop, it increments the counter object and puts the return value of the increment method into the set, so the set keeps all of the numbers the counter has returned. When it fails to add a number to the set, which means the same number was already in the set, it prints the "duplicate number detected" message. Finally we launch 3 threads that execute the same Runnable object.

When I run it on my laptop, it immediately prints a lot of "duplicate number detected" messages, which means the counter object has returned duplicate numbers. The reason is that the increment done by ++count is not atomic. It first reads the number stored in the count field, then adds 1 to it, then stores the result back into the count field, and those three steps are not executed atomically. Since we have 3 threads executing the increment method concurrently, some of those threads can read, calculate and return the same value.
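
To make the three steps concrete, here is a small single-threaded walkthrough of one unlucky interleaving. It is only an illustration: the two "threads" are simulated with local variables, and the names LostUpdateWalkthrough, simulate, readByA and readByB are hypothetical. Real threads can interleave in many other ways.

```java
// Deterministic walkthrough of how two ++count calls can return the same value.
// "Thread A" and "thread B" are simulated by local variables in one thread.
class LostUpdateWalkthrough {
    // Returns { value returned to A, value returned to B, final count }.
    static int[] simulate() {
        int count = 0;                 // stands in for the shared count field

        int readByA = count;           // A, step 1: read count (sees 0)
        int readByB = count;           // B, step 1: read count (also sees 0)

        int returnedToA = readByA + 1; // A, step 2: add 1
        count = returnedToA;           // A, step 3: store 1

        int returnedToB = readByB + 1; // B, step 2: add 1 to its stale read
        count = returnedToB;           // B, step 3: store 1 again, A's update is lost

        return new int[] { returnedToA, returnedToB, count };
    }

    public static void main(String[] args) {
        int[] result = simulate();
        // prints "A got 1, B got 1, count is 1"
        System.out.println("A got " + result[0] + ", B got " + result[1] + ", count is " + result[2]);
    }
}
```

Two increments happened, but both callers received 1 and the field ends up at 1 instead of 2. Making the field volatile does not change this, because each individual read and write is already visible; the problem is that the read and the write are separate steps.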

If we want the counter to return unique values for all of the threads, volatile is not sufficient. We can use a synchronized block to make it happen instead:

import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class IncrementByMultipleThreadsWithLock {
    public static void main(String[] args) {
        Counter counter = new Counter();
        Object lock = new Object();
        Set<Integer> ints = Collections.synchronizedSet(new HashSet<>());
        Runnable incrementer = () -> {
            while (true) {
                int increment;
                synchronized (lock) {
                    increment = counter.increment();
                }
                boolean added = ints.add(increment);
                if (!added) System.out.println("duplicate number detected");
            }
        };
        Thread t1 = new Thread(incrementer), t2 = new Thread(incrementer), t3 = new Thread(incrementer);
        t1.start(); t2.start(); t3.start();
    }
}

We have introduced the lock object again and put the increment() call inside the synchronized block. Since we have the synchronized block, we don’t need the counter to be volatile anymore, so we can simply use the plain Counter class instead.

Now we get unique numbers from the counter, and the "duplicate number detected" message no longer appears, at least until the int counter overflows and wraps around to values it has already returned. In addition to guaranteeing memory visibility for mutable data, the synchronized block also guarantees that only one thread at a time can execute code inside a block guarded by the same lock. This prevents the race condition in which the counter returns duplicate numbers.
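
For a simple counter like this one, the JDK also provides java.util.concurrent.atomic.AtomicInteger, whose incrementAndGet method performs the read, add and store as a single atomic operation. The sketch below is an alternative to the synchronized block, not the approach used in the examples above. The names AtomicCounter, IncrementByMultipleThreadsWithAtomic and countDuplicates are hypothetical, and the loop is bounded (unlike the infinite loop above) so that the result can be checked.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical counter backed by AtomicInteger instead of a lock.
class AtomicCounter {
    private final AtomicInteger count = new AtomicInteger();
    int getCount() { return count.get(); }
    int increment() { return count.incrementAndGet(); } // atomic read-add-store
}

class IncrementByMultipleThreadsWithAtomic {
    // Starts `threads` threads that each increment `perThread` times and
    // returns how many duplicate values the counter handed out.
    static int countDuplicates(int threads, int perThread) {
        AtomicCounter counter = new AtomicCounter();
        Set<Integer> ints = Collections.synchronizedSet(new HashSet<>());
        AtomicInteger duplicates = new AtomicInteger();
        Runnable incrementer = () -> {
            for (int i = 0; i < perThread; i++) {
                if (!ints.add(counter.increment())) duplicates.incrementAndGet();
            }
        };
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(incrementer);
            workers[i].start();
        }
        try {
            for (Thread worker : workers) worker.join(); // wait for all increments
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return duplicates.get();
    }

    public static void main(String[] args) {
        System.out.println("duplicates: " + countDuplicates(3, 100_000));
    }
}
```

This only covers the single-field case. When several fields or several operations have to change together, a synchronized block (or another lock) is still the more general tool.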

Which classes in the JDK require synchronization?

So we have seen that mutable data needs appropriate synchronization when it is shared across multiple threads. Many classes in the Java standard library contain mutable data and therefore require synchronization if they are shared across multiple threads. These include some of the collection classes (HashMap, ArrayList etc.) and the Format class and its subclasses (e.g. SimpleDateFormat). To find out whether a class requires synchronization, the first thing you can do is consult its documentation.

If we fail to apply the necessary synchronization, the consequences can be horrible. I once saw code which used a HashMap shared across multiple threads without synchronization. The HashMap served as a kind of cache, and multiple threads updated it, possibly simultaneously. Under heavy load, this eventually corrupted the underlying data structure of the HashMap and triggered an infinite loop. Because this kind of problem rarely shows up, it passed all of the QA processes we had and unfortunately ended up appearing in the production environment. I will never forget this incident because I was the one who had to fix the issue in the middle of the night. There is a great article on the web which explains how an unsynchronized HashMap triggers an infinite loop: https://mailinator.blogspot.com/2009/06/beautiful-race-condition.html

I also sometimes see a SimpleDateFormat object stored in a static field and shared across multiple threads. This is dangerous too. It might seem to work fine most of the time, but under heavy load it occasionally produces a wrong result or throws a mysterious exception.

How do we do synchronization for an unsynchronized class in the JDK?

Whenever we need to use a class which requires synchronization in a concurrent use case, the first thing we can do is look for a thread-safe equivalent.

As for collection classes, there are quite a few thread-safe classes which cover almost all of the interfaces in the collections framework (e.g. ConcurrentHashMap, CopyOnWriteArrayList). If one of those suits your use case, simply using it is the safest and easiest way to apply the synchronization. But one thing we need to keep in mind is that just using thread-safe classes doesn’t always make your code work properly in concurrent use cases. For example, sometimes we need to make a sequence of operations atomic, like we did in one of the earlier examples. In such cases, we still need some kind of mutual exclusion mechanism, like the synchronized blocks we used.
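
As an illustration of that last point, here is a sketch of a per-key hit counter built on ConcurrentHashMap. The class HitCounter and its methods are hypothetical names. The map itself is thread-safe, yet a get followed by a put is still two separate operations, so the racy version can lose updates; ConcurrentHashMap.merge performs the whole update atomically.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-key counter illustrating compound operations on a thread-safe map.
class HitCounter {
    private final ConcurrentHashMap<String, Integer> hits = new ConcurrentHashMap<>();

    // Racy even though the map is thread-safe: another thread can update the
    // entry between get() and put(), and that update is then overwritten.
    void recordRacy(String key) {
        Integer current = hits.get(key);
        hits.put(key, current == null ? 1 : current + 1);
    }

    // Atomic: merge() does the read-modify-write as a single operation.
    void record(String key) {
        hits.merge(key, 1, Integer::sum);
    }

    int get(String key) {
        return hits.getOrDefault(key, 0);
    }

    // Starts `threads` threads that each record `perThread` hits for one key.
    static int demo(int threads, int perThread) {
        HitCounter counter = new HitCounter();
        Runnable worker = () -> {
            for (int i = 0; i < perThread; i++) counter.record("home");
        };
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(worker);
            pool[i].start();
        }
        try {
            for (Thread t : pool) t.join(); // wait for all workers to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter.get("home");
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 10_000)); // prints 40000
    }
}
```

If demo called recordRacy instead of record, the total could come out lower than 40000 on some runs, for the same reason ++count returned duplicates earlier.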

As for SimpleDateFormat, if you are on Java 8 or later, consider the DateTimeFormatter class, which is immutable and thread-safe.

Another important thing to remember is that just adding volatile doesn’t make unsynchronized objects thread-safe. For example, when you have a HashMap field shared across multiple threads and you add volatile to the declaration of the field, it only affects reads and writes of the reference stored in the field. It doesn’t guarantee that the HashMap itself will work properly in concurrent use cases.
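
A minimal sketch of what that replacement might look like is shown below. The class name Dates, the constant name ISO_DAY and the pattern are illustrative assumptions, not taken from any particular code base.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: one formatter instance shared by all threads.
class Dates {
    // DateTimeFormatter is immutable, so keeping it in a static field is safe,
    // unlike a shared SimpleDateFormat.
    static final DateTimeFormatter ISO_DAY = DateTimeFormatter.ofPattern("uuuu-MM-dd");

    static String format(LocalDate date) {
        return ISO_DAY.format(date);
    }

    static LocalDate parse(String text) {
        return LocalDate.parse(text, ISO_DAY);
    }

    public static void main(String[] args) {
        System.out.println(format(LocalDate.of(2020, 12, 17))); // prints 2020-12-17
    }
}
```

Because the formatter holds no mutable state, any number of threads can call format and parse at the same time without a lock.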

Conclusion

We have discussed why we need to synchronize access to shared mutable data, how we can do that, and what happens when we fail to do so in concurrent use cases. The consequences can be horrible. A nasty thing about missing synchronization is that everything looks fine most of the time; the problem shows up only occasionally and is sometimes hard to even reproduce. So I recommend paying extra attention whenever you deal with mutable data in concurrent use cases.

Another good way to reduce this kind of risk is to minimize mutability in the code. Immutable data is always safe to share across threads, which means we don’t even have to think about synchronization if no mutable data is shared. As a starting point, whenever you write a class, I recommend making as many of its fields final as possible. It will bring your class one step closer to being immutable.

There is a lot more to learn about writing safe concurrent applications in Java. If you are interested, I recommend checking out this book: https://jcip.net
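
As a small sketch of that advice, here is one way an immutable class might look. The class ImmutablePoint and its withX method are hypothetical examples: all fields are final, there are no setters, and a "change" produces a new object instead of modifying the existing one.

```java
// Hypothetical immutable value class: its state can never change after construction.
final class ImmutablePoint {
    private final int x;
    private final int y;

    ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int getX() { return x; }
    int getY() { return y; }

    // Instead of mutating this object, return a new one with the changed value.
    ImmutablePoint withX(int newX) {
        return new ImmutablePoint(newX, y);
    }
}

class ImmutablePointDemo {
    public static void main(String[] args) {
        ImmutablePoint origin = new ImmutablePoint(0, 0);
        ImmutablePoint moved = origin.withX(5);
        System.out.println(origin.getX() + " " + moved.getX()); // prints 0 5
    }
}
```

An instance like this can be handed to any number of threads without synchronization, because no thread can ever observe it changing.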