Job scheduling at scale

Eli Segal
Dec 14, 2019

So, you need to send a discount promotion code to your users on their birthday. Sounds easy enough: you use Spring’s own @Scheduled annotation to go over the users database and send this email. Spring’s task scheduling is well established (and Spring also integrates with Quartz if you need more), so you know it will work well.

This method works well when we have one instance of a single service. When we have multiple services with multiple instances of each, things get more complicated.


Do I need a scheduler?

Probably yes. Most software systems have a lot of recurring activities that need to happen: sending the weekly email update to customers, checking for expired passwords, scaling services for specific events, DB backups, system heartbeats and so on.

In-process scheduling

Let’s say we have a promotion service that needs to send a weekly promotion email to all users. The simplest solution would be to use some kind of cron inside the service that invokes the functionality to send that email to all users who haven’t received it yet.
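To make this concrete, here is a minimal sketch of such an in-process cron. It uses the JDK’s ScheduledExecutorService rather than Spring so that it runs without any framework. The class name, the runJob method and the millisecond period are assumptions made for illustration (a real job would run weekly), and emails are only recorded in a list instead of being sent.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class InProcessPromotionJob {

    // Runs the (hypothetical) promotion job `runs` times at a fixed rate and
    // returns the users that were emailed. Emails are recorded, not really sent.
    static List<String> runJob(List<String> users, int runs, long periodMillis) {
        List<String> emailed = new CopyOnWriteArrayList<>();
        CountDownLatch remainingRuns = new CountDownLatch(runs);
        ScheduledExecutorService cron = Executors.newSingleThreadScheduledExecutor();

        cron.scheduleAtFixedRate(() -> {
            for (String user : users) {
                // Only email users that didn't get the promotion already.
                if (!emailed.contains(user)) {
                    emailed.add(user);
                }
            }
            remainingRuns.countDown();
        }, 0, periodMillis, TimeUnit.MILLISECONDS);

        try {
            // Wait until the job has fired the requested number of times.
            remainingRuns.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the job", e);
        } finally {
            cron.shutdownNow();
        }
        return emailed;
    }

    public static void main(String[] args) {
        List<String> emailed = runJob(List.of("ann@example.com", "bob@example.com"), 3, 50);
        System.out.println("Emailed: " + emailed);
    }
}
```

With a single instance this behaves nicely: the job fires three times, but each user is emailed only once because the instance remembers who already got the promotion.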


This method is by far the easiest to implement, but it has a few drawbacks at scale:

  • When there are multiple instances of the same service, all of them will do the same thing at the same fixed interval
  • Since multiple instances do the exact same thing, we’ll have to deal with edge cases like race conditions
  • We’ll need to implement it per service, so we might end up with multiple scheduler implementations across our system
  • There will be no easy way to monitor and operate all these schedulers from a single place
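The first point is easy to reproduce. The sketch below is a simplified simulation, not a real deployment: the Instance class, the onSchedule method and the shared outbox list are hypothetical names, and each instance is assumed to keep its "already sent" bookkeeping in its own memory. If the instances shared that bookkeeping in a database instead, the check-then-send step would become the race condition from the second point.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateSendDemo {

    // One service instance with its own in-process scheduler and its own memory.
    static class Instance {
        private final Set<String> alreadySent = new HashSet<>();

        // Called by this instance's own timer; it knows nothing about the other instances.
        void onSchedule(List<String> users, List<String> outbox) {
            for (String user : users) {
                if (alreadySent.add(user)) {
                    outbox.add(user); // stands in for actually sending the email
                }
            }
        }
    }

    // Fires the same scheduled job once on every instance and returns all emails sent.
    static List<String> simulate(int instanceCount, List<String> users) {
        List<String> outbox = new ArrayList<>();
        for (int i = 0; i < instanceCount; i++) {
            new Instance().onSchedule(users, outbox);
        }
        return outbox;
    }

    public static void main(String[] args) {
        List<String> outbox = simulate(3, List.of("ann@example.com", "bob@example.com"));
        // prints "6 emails sent for 2 users"
        System.out.println(outbox.size() + " emails sent for 2 users");
    }
}
```

Three instances mean three promotion emails per user, which is exactly what we don’t want.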

Using an external scheduler

If we take the previous example and use an external scheduler, we can avoid the problems we had with an in-process scheduler. In this scenario we can use a queue or a load balancer, for example, to make sure each invocation is handled exactly once, by only one of the instances. We also now have a way to monitor all of our scheduled calls, as they are all invoked from the same place.

Another big advantage of this method is that it makes it much easier to use a 3rd party scheduler, which will save us development time.
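Here is a rough sketch of the queue variant, under a few assumptions: an in-memory LinkedBlockingQueue stands in for a real message queue, a plain loop stands in for the external scheduler, and the class name, job names and the STOP marker are made up for the example. The point is only that each trigger becomes one message, and one message is taken by exactly one instance.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class ExternalSchedulerDemo {
    static final String STOP = "STOP"; // hypothetical marker telling an instance to shut down

    // Starts `instanceCount` service instances, lets the "external scheduler"
    // publish `triggers` job messages, and returns a log of who handled what.
    static List<String> run(int instanceCount, int triggers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> handled = new CopyOnWriteArrayList<>();
        ExecutorService instances = Executors.newFixedThreadPool(instanceCount);

        for (int i = 0; i < instanceCount; i++) {
            final int id = i;
            instances.execute(() -> {
                try {
                    while (true) {
                        String job = queue.take(); // a message goes to one consumer only
                        if (job.equals(STOP)) {
                            return;
                        }
                        handled.add(job + " handled by instance-" + id);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            // The external scheduler: one message per scheduled trigger.
            for (int t = 1; t <= triggers; t++) {
                queue.put("weekly-promotion#" + t);
            }
            // One STOP per instance so every worker exits after the jobs are drained.
            for (int i = 0; i < instanceCount; i++) {
                queue.put(STOP);
            }
            instances.shutdown();
            instances.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running the demo", e);
        }
        return handled;
    }

    public static void main(String[] args) {
        run(3, 3).forEach(System.out::println);
    }
}
```

Which instance picks up which job varies from run to run, but every job shows up in the log exactly once, no matter how many instances are listening.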

Summary

Job scheduling at scale isn’t hard, but we do need to apply it carefully and account for the pitfalls above: duplicate runs across instances, race conditions, and schedulers scattered across services with no single place to monitor them.

Now that we’ve identified why we need a scheduler and which kind we need, in the next part I’ll go over popular 3rd party schedulers that make managing scheduled jobs much easier.

Eli Segal

Longtime software developer and advocate for better software craftsmanship