One typical task of a PaaS platform is to execute an application with a minimal amount of resources without violating the application's Service Level Agreements. Typically, applications are scaled horizontally: the same application is executed multiple times in parallel, and the application load is distributed across all application instances. The platform then has to determine the optimal number of running application instances so that the running costs of the application are minimized. There are multiple approaches to achieve this, and they are discussed in this post.
Fig. 1 Load balancing on the Joyent Smart platform
One interesting approach was chosen by the designers of the Joyent Smart platform [1]. All HTTP requests received by the platform are handled by a load balancer, which in turn distributes them over all available servers (see Fig. 1). When doing so, it takes the utilization of each worker server into account. The worker server puts each received request into a queue (see Fig. 2). Multiple slave processes run within each worker server, each executing an infinite loop with the following steps:
- Fetch a request from the request queue and determine the target application of the request.
- Create a new JavaScript environment and load that application.
- Execute the fetched request with the created JavaScript environment and application.
- Go back to the first step.
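The loop above can be sketched in a few lines of Python. This is only an illustration of the pull model, not Joyent's actual code: Smart runs JavaScript, threads stand in for the slave processes here, and `load_application`, `worker_loop` and `run_server` are hypothetical names made up for this example.

```python
import queue
import threading


def load_application(app_name):
    """Hypothetical stand-in for creating a fresh environment and loading the app."""
    state = {"handled": 0}  # fresh state, never shared between requests

    def handle(path):
        state["handled"] += 1
        return f"{app_name} served {path} (request #{state['handled']} of this instance)"

    return handle


def worker_loop(request_queue, responses):
    while True:
        request = request_queue.get()         # fetch a request from the queue
        if request is None:                   # sentinel: stop this worker
            break
        app_name, path = request              # determine the target application
        handler = load_application(app_name)  # new environment for every request
        responses.append(handler(path))       # execute the request, then loop again


def run_server(requests, num_workers=2):
    request_queue = queue.Queue()
    responses = []
    workers = [
        threading.Thread(target=worker_loop, args=(request_queue, responses))
        for _ in range(num_workers)
    ]
    for worker in workers:
        worker.start()
    for request in requests:
        request_queue.put(request)
    for _ in workers:
        request_queue.put(None)               # one stop sentinel per worker
    for worker in workers:
        worker.join()
    return responses
```

Calling `run_server([("blog", "/a"), ("blog", "/b")])` returns one response per request. The order may vary between runs, but every response reports "request #1" because no application instance survives its request.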
Fig. 2 Request queue in each Joyent Smart server
The most important design decision of Smart is that a new JavaScript interpreter and application instance are created for each incoming request. The lifetime of each application instance is limited to the handling of a single request. The applications therefore use a pull model to fetch the next request, instead of the typical push model in which the browser client pushes requests to the application.
The request handling model of Joyent Smart allows a very efficient load balancing mechanism, as the length of the request queue directly reflects the load of the worker server. Furthermore, horizontal scaling of applications is handled implicitly: if the number of requests for an application increases, more slave processes are used to process them and therefore start additional instances of the application.
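To make the load balancing idea concrete, here is a minimal sketch of a balancer that uses the queue length as its only load signal. The `WorkerServer` class and the `dispatch` function are assumptions for illustration; the post does not describe how the Smart load balancer is actually implemented.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkerServer:
    """Hypothetical model of a worker server with its request queue."""
    name: str
    queue: deque = field(default_factory=deque)

    def load(self):
        # Assumption: the queue length is the only load metric we look at.
        return len(self.queue)


def dispatch(servers, request):
    """Append the request to the server with the shortest queue and return it."""
    target = min(servers, key=lambda server: server.load())
    target.queue.append(request)
    return target
```

If two servers have the same queue length, `min` returns the first one in the list, so ties are resolved by server order in this sketch.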
One major disadvantage of the Smart architecture is that a new application instance has to be started for each request. This becomes a problem especially if an application contains time-intensive initialization code, because that cost is paid again on every request.
Another approach is to run a number of application instances in the PaaS infrastructure and continuously adapt that number to the application load. Platforms like Cloudcontrol [2] and Heroku [3] follow this model. Both are able to adjust the number of application instances to the application load automatically; this is done either by the platform itself or by the application. Again, the incoming HTTP requests are received and distributed by a load balancing layer. It delegates the requests to worker servers, each of which runs multiple applications in separate processes (not virtual machines). Typically, the applications are executed within application servers that are built to handle multiple requests in parallel by using multiple threads.
This approach has multiple downsides. First, one application server may use a huge number of threads, consume all available system resources, and thereby slow down other applications on the same machine. Second, it is extremely difficult to determine the load of a single application server. This is important information when deciding whether another instance of the application should be started, that is, whether the application should be scaled horizontally.
To address these problems, Heroku and Cloudcontrol reduce the number of threads of the application servers. Typically, an application server uses only one or two request-processing threads. If all request-processing threads are occupied and a new request is received, the request is added to a queue. In this scenario, determining the application load becomes fairly easy, as it directly depends on the size of the queue.
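A scaling decision based on the queue size could look like the following sketch. It is not how Heroku or Cloudcontrol actually scale; the function name, the target of two waiting requests per instance, and the instance limits are all assumptions chosen for illustration.

```python
import math


def desired_instances(queued_requests, target_per_instance=2,
                      min_instances=1, max_instances=10):
    """Return an instance count that keeps roughly `target_per_instance`
    waiting requests per instance, clamped to the allowed range.

    All parameters and defaults are hypothetical.
    """
    if queued_requests < 0 or target_per_instance <= 0:
        raise ValueError("queue size must be >= 0 and target must be > 0")
    needed = math.ceil(queued_requests / target_per_instance)
    return max(min_instances, min(max_instances, needed))
```

With these defaults, 7 queued requests lead to 4 instances, an empty queue keeps the minimum of 1 instance, and a very long queue is capped at 10 instances.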
Fig. 3 Each application server has its own request queue
Finally, there are multiple ways to implement the request processing queue. There can be one global queue per application (see Fig. 4) or multiple queues, one for each application server of the application (see Fig. 3). As described by the developers of the Ruby Unicorn server [4], which builds on the Mongrel server, the first approach is better suited in most situations because it is largely insensitive to highly varying request processing times.
Fig. 4 One single request queue for all application servers of one application
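The effect of varying request processing times on the two queue layouts can be illustrated with a small simulation. It is a simplified sketch under several assumptions: all requests arrive at time 0, each application server handles one request at a time, and the per-server variant assigns requests round-robin because the balancer does not know the processing times in advance. The function names are made up for this example.

```python
import heapq


def per_server_queues(durations, num_servers):
    """Requests are assigned round-robin up front; returns each finish time."""
    free_at = [0.0] * num_servers
    finish_times = []
    for index, duration in enumerate(durations):
        server = index % num_servers   # fixed assignment, blind to duration
        free_at[server] += duration    # waits behind everything queued before it
        finish_times.append(free_at[server])
    return finish_times


def global_queue(durations, num_servers):
    """Each server pulls the next request as soon as it is idle."""
    free_at = [0.0] * num_servers
    heapq.heapify(free_at)
    finish_times = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest idle server takes the request
        heapq.heappush(free_at, start + duration)
        finish_times.append(start + duration)
    return finish_times
```

For the durations `[10, 1, 1, 1, 1, 1]` on two servers, the per-server layout leaves two short requests waiting behind the slow one, so the last request finishes at time 12. With the global queue, the second server works through all the short requests while the first is busy, and everything is done at time 10.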
[1] Joyent Smart, http://github.com/joyent/smart-platform
[2] Cloudcontrol, http://cloudcontrol.de/
[3] Heroku, http://heroku.com/
[4] Ruby Unicorn Server, http://github.com/blog/517-unicorn