When, how, and why to use Code Engine's two compute models: Jobs and Applications.
Applications are primarily designed to run microservices or web applications that respond to HTTP requests. Each Application is therefore given an endpoint (route) that a client can call. Requests are routed to the Application's instances, and the connection is kept open until the response is sent back to the client.
Application instances scale up and down based on the number of HTTP requests they receive. The scaling criterion is user-defined and is expressed as the number of concurrent requests a single Application instance may handle (e.g., the user can specify that an instance should process at most 10 concurrent requests). If the number of concurrent requests in the system exceeds that limit, Code Engine scales up the number of instances to meet the user's criterion. Applications are best suited to handle a high volume of HTTP request/response workloads with low latency, and they provide a synchronous request-response model with higher concurrency.
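That scaling rule can be sketched in a few lines of Python. This is a deliberately simplified model for illustration, not the actual autoscaler, which works on averaged metrics over time windows:

```python
import math

def desired_instances(in_flight_requests: int, target_concurrency: int) -> int:
    """Simplified sketch of request-based autoscaling: run just enough
    instances so that no instance exceeds its target concurrency."""
    if in_flight_requests == 0:
        return 0  # idle Applications can scale to zero
    return math.ceil(in_flight_requests / target_concurrency)

# 35 concurrent requests with a limit of 10 per instance -> 4 instances
print(desired_instances(35, 10))  # → 4
```

With a target concurrency of 10, the instance count follows the in-flight request count in steps of 10.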
Jobs are primarily designed to run tasks and processes on a given set of input data until completion, so Jobs run asynchronously. The user can specify a list of index values, and Code Engine starts a Job task for each index in the list (e.g., the user can specify the values 1-10, which results in 10 tasks, each assigned one value between 1 and 10). Based on its assigned index, a task can decide what data to process (e.g., index 1 maps to object 1 in a COS bucket). If a Job task fails, it is retried a configurable number of times before it finally fails. Because Jobs run asynchronously, they can run for a long time.
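As a sketch, a task's entry point might map its index to an object key like this. Code Engine exposes the assigned index to each task in the `JOB_INDEX` environment variable; the `object-<n>` naming scheme here is a made-up convention for illustration:

```python
import os

def object_key_for_task(prefix: str = "object-") -> str:
    """Derive the input object this task should process from its index.

    JOB_INDEX is set by Code Engine on each task of a job run;
    the "object-<n>" key format is a hypothetical convention."""
    index = int(os.environ.get("JOB_INDEX", "0"))
    return f"{prefix}{index}"

# A task started with JOB_INDEX=1 would fetch "object-1" from the bucket.
```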
When to use Applications and when to use Jobs
Now that we understand the basic functionality of Jobs and Applications, let's take a look at when to use one versus the other (or both in combination). To pick the right technology, it's critical to understand the nature of the workload and its characteristics.
- Does your workload require low latency, or is it interactive? Use Applications: If your workload requires a client or user to wait synchronously for the response, and the response must be available within a few milliseconds, use Applications. Applications provide an externally reachable endpoint and respond synchronously to each request. Examples of such workloads are websites, chatbots, and mobile applications.
- Is your computation lightweight, requiring little CPU, memory, and I/O? Use Applications: If your workload is lightweight, you can benefit from concurrent invocations within the same Application instance. A typical example is an API server that provides CRUD operations backed by a NoSQL database. The amount of data per request is small, and few memory or CPU cycles are needed to process it. With higher concurrency, the Application can process the data of a first request while a second request is waiting for I/O. Since CPU and memory requirements are low, many requests can be executed concurrently. Jobs are also possible here, but since Jobs run with single concurrency, the overhead and associated costs are higher.
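The I/O-overlap benefit is easy to demonstrate. The following sketch uses plain Python asyncio to stand in for a concurrent Application instance: ten simulated requests whose database waits overlap finish in roughly the time of one wait:

```python
import asyncio
import time

async def handle_request(i: int) -> int:
    await asyncio.sleep(0.1)  # simulated NoSQL database I/O
    return i

async def serve(n: int) -> list[int]:
    # All n requests run concurrently in one "instance"; their I/O
    # waits overlap instead of queuing behind one another.
    return await asyncio.gather(*(handle_request(i) for i in range(n)))

start = time.perf_counter()
results = asyncio.run(serve(10))
elapsed = time.perf_counter() - start
print(f"{len(results)} requests in {elapsed:.2f}s")  # close to 0.1s, not 1s
```

With single concurrency, the same ten requests would need ten instances (or ten sequential waits) to achieve the same latency.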
- Is your computation bound to CPU, memory, or I/O? Use Jobs and Applications: If you need to process a given amount of data, where each chunk of the data is large and requires a lot of CPU and memory, Jobs are typically the better choice. However, if the workload requires a request-response pattern, it's also possible to use Applications. In both cases, the computation task would run with single concurrency: each Application instance or Job task would only process one request or chunk of data at a time in order to fully utilize the resources configured for the instance. Parallelism is achieved by the number of instances/tasks, where the overhead of spawning an additional task is negligible given the high resource demands. A typical example is the processing of image data in a COS bucket or the serving of machine learning models.
- Does your computation require a lot of time? Use Jobs: If the computation runs for longer durations, Jobs are the better choice because of their asynchronous nature. The maximum duration of Applications will always be limited because maintaining open connections at scale is expensive. Typical workloads are the training of machine learning models or hyper-parameter optimization.
- Can you specify the parallelism of your computation upfront? Use Jobs: If you know how much computation you need to perform, you can run a Job with the exact number of tasks until completion. Typical examples are hyper-parameter tuning or training of a neural network.
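When the parallelism is known upfront, each task can deterministically compute its own slice of the input from its index and the total task count. A minimal sketch, where the helper name and the even-split scheme are illustrative rather than anything Code Engine prescribes:

```python
def chunk_for_task(num_items: int, num_tasks: int, task_index: int) -> range:
    """Split num_items of work evenly across num_tasks job tasks and
    return the half-open range of item indices this task should handle."""
    base, extra = divmod(num_items, num_tasks)
    start = task_index * base + min(task_index, extra)
    size = base + (1 if task_index < extra else 0)
    return range(start, start + size)

# 10 items across 3 tasks: task 0 gets items 0-3, task 1 gets 4-6, task 2 gets 7-9
for t in range(3):
    print(t, list(chunk_for_task(10, 3, t)))
```

Because the split is a pure function of the index, tasks need no coordination at runtime; the Job definition alone fixes the parallelism.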
- Does your workload react to some event? Use Applications: If your workload needs to react to an event (such as a Git commit pushed to your repository, an object uploaded to a COS bucket, or a document modified in your database), use Applications, because they provide an endpoint that can be configured as the event sink in the event source.
- Do you need to process a large amount of data in a short period of time in response to events/requests? Use Applications: If your workload requires a fast response to unpredictable requests or events, Applications are typically the better fit because they scale dynamically.
IBM Cloud Code Engine has two compute models, Jobs and Applications, and each has many use cases, some of which I have laid out in this post. It is even possible to combine Applications and Jobs: an Application can start Jobs to outsource specific computations, and, conversely, a Job can query an Application. A typical example combining both worlds is the training and serving of machine learning models: Jobs are typically used to train the models, and Applications are used to serve them.
The following is a summary of when to best use Applications and Jobs:
Applications:
- Input/output data on external storage
- Input/output data on request/response
- Dynamic scaling based on requests
- Request/response for interactive workloads with low latency
- Response to events
- High concurrency with low resource requirements (best suited)
- Single concurrency with high resource requirements (but with latency overhead)
- High throughput request/response workloads

Jobs:
- Input/output data on external storage
- Static scaling based on input data
- Duration of computation longer than 10 minutes
- High concurrency with low resource requirements (but with cost overhead)
- Single concurrency with high resource requirements (best suited)
Learn more and try it out
Ready to give it a try? Head on over to the “Getting started” section of our documentation.
Be sure to try IBM Cloud Code Engine out today (it’s completely free while in beta).