Mon 28 Aug 2006
Looking For Optimal Solution: Benchmark Results Summary and Findings
Posted by Scoundrel under Development
After my previous post about high-performance RoR deployment methods, I received many messages, emails and IM conversations pointing out errors in the earlier benchmarks, along with suggestions to extend the set of tested software and to change the testing methodology. That is why I decided to run broad and thorough performance tests of every deployment scheme I could find. This article describes the testing methodology and, of course, the benchmark results. If you want the details of a specific software setup, look in the “Ruby On Rails” category for the other articles in the “Looking For Optimal Solution” series (I’ll post all of them in the next few days).
To be honest, when I started all this testing, my main aim was to find the best solution for our project. Later, thanks to Dmytro Shteflyuk, I decided to test with optimized software settings to find the maximum theoretical speed I can expect from RoR on my servers. So you can read the results in two ways: as a simple performance comparison of different solutions, or as the highest performance marks each solution can provide (of course, you can get more with more complicated schemes like “nginx/lighttpd for static content + rails for dynamic”, but this benchmark covers only dynamic content without any caching).
First of all, I want to describe the hardware/software platform on which all benchmarks were run. Our development server has the following configuration:
- CPU: 4 x Xeon CPUs
- Memory: 4 GB of RAM
- OS: Debian GNU/Linux Testing with recent 2.6 kernel.
- Ruby: ruby 1.8.4 (2005-12-24) [i486-linux] from Debian Testing repository.
- Rails: Rails 1.1.6 installed by gem.
All tests were performed on a simple RoR application with a single-action controller:
class TestController < ApplicationController
  def hw
    @hello = "Hello, world!"
    @time = Time.now
  end
end
and a simple view:
<h1>Test#hw</h1>
<p>Hello: <%= @hello %></p>
<p>Time: <%= @time %></p>
The Rails framework was started with default settings in production mode. Tests were performed with the ab (Apache Benchmark) utility using the following parameters:
$ ab -c 100 -n 10000 http://SERVERIP:PORT/test/hw
where PORT is the port number chosen for each test.
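If you repeat the runs, the QPS figure can be pulled out of ab's report with a few lines of Ruby. This is only a minimal sketch: it assumes the usual `Requests per second:` line in ab's output, and the method name and sample text are illustrative, not part of the original test scripts.

```ruby
# Extract the mean requests-per-second value from the text that ab prints.
# Returns a Float, or nil when the line is missing (for example, a failed run).
def requests_per_second(ab_output)
  match = ab_output.match(/^Requests per second:\s+([\d.]+)/)
  match ? match[1].to_f : nil
end

# Illustrative input only, not a measured result from this article.
sample = 'Requests per second:    123.45 [#/sec] (mean)'
puts requests_per_second(sample)
```

Feeding it the captured output of each ab run gives one number per configuration, which is what the comparison further down is built on.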
I decided not to use any DB-related code because I wanted top performance marks for the Ruby On Rails engine itself, not for MySQL/Postgres/Oracle/etc. Simple code, simple tests, simple and understandable results.
Notice: Before each test, all files in tmp/sessions were deleted, because after many test requests I got very poor results for some software due to file-system lag. So if you decide to repeat my tests on your own hardware/software, clean the sessions dir after every test run.
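The cleanup step can be scripted. A minimal sketch, assuming it is run from the Rails application root so that `tmp/sessions` resolves correctly; the method name `clear_sessions` is hypothetical and not taken from the original setup.

```ruby
require 'fileutils'

# Remove every regular file in the sessions directory and return how many
# files were deleted. The default path assumes the script runs from the
# Rails application root.
def clear_sessions(dir = 'tmp/sessions')
  files = Dir.glob(File.join(dir, '*')).select { |path| File.file?(path) }
  FileUtils.rm_f(files)
  files.size
end
```

Calling `clear_sessions` between runs keeps the session directory from growing across tests, so each run starts from the same file-system state.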
While the tests were running, I monitored the server with top/iostat/vmstat to understand the results better. That is also why all fastcgi/proxy/lsapi tests were done with 4 backend processes: the server has 4 CPUs, and 4 processes performed better than 2, 5, 8 or 10.
So, let me briefly describe the tested configurations before showing the benchmark results (links to detailed descriptions of each test will be added later).
- WEBrick/1.3.1 - I tested it only to get a baseline to compare all other results against.
- mongrel (single process) - This test shows what performance gain we can get from a single mongrel server without any balancing software.
- lighttpd (4 mongrels) - Test of lighttpd load-balancing between several tcp backend servers.
- lighttpd (4 fastcgi processes) - Test of lighttpd load-balancing between several fastcgi backend servers.
- nginx (4 mongrels) - Test of nginx load-balancing between several tcp backend servers.
- nginx (4 fastcgi processes) - Test of nginx load-balancing between several fastcgi backend servers.
- pen (4 mongrels) - Test of pen load-balancing between several tcp backend servers.
- pound (4 mongrels) - Test of pound load-balancing between several tcp backend servers.
- haproxy (4 mongrels) - Test of haproxy load-balancing between several tcp backend servers.
- apache 2.0 (4 fastcgi processes) - Test of apache 2.0 load-balancing between several fastcgi backend servers.
- LiteSpeed (4 lsapi instances) - I tested this rather exotic software because someone asked for it in the comments on the previous benchmark. The LiteSpeed web server has its own SAPI module for Ruby that, AFAIU, works like FastCGI but with some improvements. Unfortunately, the optimized version of this server is commercial, so not everyone can use it for their projects (there is a free version, but a server optimized for performance costs a lot of money).
Now that you know about all of my tests, I can show you the results. First of all, if you want to see the results as a table, here is a screenshot from my Excel spreadsheet:
![]()
If you prefer a graphical presentation, take a look at the following diagram, which shows QPS (queries per second) for all tests:
![]()
As for the results themselves, all of them were predictable: TCP really is slower than the unix sockets used in the fastcgi tests, so every test with TCP-based backend communication shows lower QPS than the unix-socket-based ones. But keep in mind that TCP-based frontend-backend communication gives you a great scalability mechanism: you can move backend instances between servers without any problems. So, if you need the absolute best performance on one server, I would recommend nginx with fastcgi backend processes. But if you need better scalability, you can use nginx with mongrel servers as backend processes.
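To get a feel for the transport difference on your own machine, here is a rough sketch that times a tiny echo exchange over loopback TCP and over a unix socket. It is not the benchmark used above: it measures only socket round trips inside one Ruby process, with no web server or Rails involved, and the method names and the round count are assumptions made for illustration.

```ruby
require 'socket'
require 'benchmark'
require 'tmpdir'

ROUNDS = 2_000 # hypothetical number of round trips

# Runs an echo exchange against an already-listening server and returns the
# elapsed wall-clock time in seconds. The block must return a connected client.
def time_echo(server, rounds)
  echo = Thread.new do
    conn = server.accept
    while (line = conn.gets)
      conn.write(line)
    end
    conn.close
  end
  client = yield
  elapsed = Benchmark.realtime do
    rounds.times do
      client.write("ping\n")
      client.gets
    end
  end
  client.close
  echo.join
  elapsed
end

# Round trips over loopback TCP.
def tcp_echo_time(rounds = ROUNDS)
  server = TCPServer.new('127.0.0.1', 0) # port 0: let the OS pick a free port
  port = server.addr[1]
  time_echo(server, rounds) { TCPSocket.new('127.0.0.1', port) }
ensure
  server.close if server
end

# Round trips over a unix domain socket in a temporary directory.
def unix_echo_time(rounds = ROUNDS)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'echo.sock')
    server = UNIXServer.new(path)
    begin
      time_echo(server, rounds) { UNIXSocket.new(path) }
    ensure
      server.close
    end
  end
end

puts format('tcp:  %.4f s', tcp_echo_time)
puts format('unix: %.4f s', unix_echo_time)
```

The absolute numbers depend heavily on kernel and hardware, so treat the output as a sanity check of the TCP versus unix socket claim rather than as a prediction of QPS.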
2006-08-28 at 8.34 am
[…] Because of not fully correct testing methodology, benchmark results are not fully correct. So, I decided to redo all tests. New benchmark results you can get in “Looking For Optimal Solution” series Summary post. […]
2006-08-29 at 5.20 am
That’s incredible, Apache performs almost as well as the best solution. We have lots of newer solutions like nginx and mongrel and still Apache is better?
2006-08-29 at 5.45 am
2Bart: Yes, Apache performs well, but it requires considerably more memory than all the other fastcgi solutions. That is why you should use it only if you need some additional Apache features… but I don’t know what these features may be…
2006-08-29 at 6.40 am
Do you have figures on the memory usage compared to the other solutions?
2006-08-29 at 7.07 am
There are typos in the Russian version: послещенный, тетсирования, каждам.
> But if you need scalability, you can use nginx with mongrel servers as backend processes.
I'd like to point out here that, as it seems to me, fastcgi over tcp will still be faster than mongrel.
nginx rocks.
2006-08-29 at 7.24 am
2Bart: I have no concrete results, but as you may know, Apache forks additional processes for every new client… so compared to FSM-based nginx and lighttpd this approach is not as memory efficient. This is a known flaw of Apache, and that is why many sites use nginx/lighttpd+php even if it gives them no performance gain.
2006-08-29 at 9.19 am
As far as I can see, apache runs neck and neck with the rest, except perhaps that it ate more RAM… I was honestly hoping for a bigger gap.
2006-08-29 at 9.21 am
Hi,
thanks for these tests and your time ;-).
Is it possible to get the config files, as a tar or as individual files?
Do you intend to run some tests with two machines, the first as the webserver and the second as the *appserver*?!
2006-08-29 at 12.06 pm
2al: All config files will be posted in additional posts within the next few days. Each post will describe one specific configuration with its configs and test results.
2006-08-29 at 12.18 pm
Thanks :-).
2006-08-30 at 2.52 pm
[…] This article is part of “Looking For Optimal Solution” series, devoted to testing various Ruby On Rails deployment schemes and doing some simple benchmarks on these schemes. General idea of testing is to find subset of most optimal RoR deployment schemes for different situations. […]
2006-08-30 at 9.20 pm
I know benchmarks are never the same across hardware and setups, but I performed similar benchmarks on similar hardware and LiteSpeed came out with a 2×1 advantage over the Apache2 fcgi setup. My Apache spec’d out similarly at ~275 and Litespeed was over 600qps. LiteSpeed does take some configuration for a performance setup so that may very well be the difference. Nice job though.
2006-08-30 at 11.44 pm
2Bob: I have some questions:
1) What HW did you use to get 600 qps from rails?
2) What rails code did you test?
3) Which performance settings in LSWS do you mean?
2006-08-31 at 5.28 am
As the developer of LSWS and Ruby LSAPI, I am pretty surprised that nginx+FCGI can beat LSWS+LSAPI in your test.
So, I spent some time on benchmarking LSWS and nginx tonight. I will post the details on our blog tomorrow.
My result is: nginx + 4 FCGI does 270-330 rps, LSWS Enterprise + LSAPI does 370 rps consistently.
Which version of LSWS was tested? Standard or Enterprise?
Maybe LSWS has not been properly tuned in your test. A few check points: the latest 2.2 release is used, .htaccess is off, epoll instead of poll is used, “Run On Start Up” is “Yes” and “Priority” set to “5” in Rails Default Settings.
2006-08-31 at 5.55 am
2George Wang: I’ve done everything as you describe, except the htaccess and priority settings…
2006-08-31 at 5.59 am
2George Wang: Can you contact me (I would prefer some IM) to talk about my test parameters? Maybe I can redo some tests to get different results from lsws. Or maybe you used some suboptimal nginx settings… Looking forward to your answer (my TZ is GMT-5, and I will be online from 9-10 AM).
2006-08-31 at 7.41 pm
[…] After reading Scoundrel’s blog “Looking for Optimal Solution: Benchmark Results Summary and Findings”, I am pretty surprised that LSWS + LSAPI falls behind. The best performer in his tests is nginx + FCGI. “All right, let me spend some time benchmarking them.” I said to myself. […]
2006-09-04 at 6.23 pm
Why wasn't Apache 2.2 considered?
2006-09-04 at 6.28 pm
I don't even know what to say. When I wanted to test its mod_proxy_balancer, it simply failed to compile with gcc 4.1… So I decided not to test something I would never put into production because it is immature (and that is obvious).
2006-09-04 at 6.33 pm
Got it. Well, I haven't moved to apache2.2 yet for the simple reason that it isn't in the Debian repository, but at home on Mac OS (not production, of course) it built from ports in no time and was configured just as quickly.
The reason I would like to see apache2.2 in these tests is to compare it under the same conditions. I'm seriously thinking about using exactly this scheme in production.
With fastcgi there are huge problems with permissions. It is a real pain when something in your directory belongs to the web server, and if it doesn't belong to it, you just see Application Error.
So for shared hosting, mongrel currently looks like a more convenient option to me than fastcgi.
2006-09-06 at 11.25 am
Could you please show your nginx config?
Specifically, I'm interested in whether you managed to configure it so that caching in Rails works correctly.
2006-09-06 at 7.57 pm
Any chance you could run a benchmark of 4 mongrels behind apache with mod_proxy_balancer?
2006-11-13 at 4.22 am
Apache doesn’t fork a new instance for every connection.
Rather it “preforks” some instances, and all those wait for connections. When there is an incoming connection, a random instance processes it. Since selecting this process is an O(1) operation, it actually scales better than single-process servers which have to allocate data structures for processing the connection when receiving it, this depends on the data structures used but it’s usually O(log n) [In the best case?!]. (Because the preforked process’ stack is already allocated)
All the instances share the same memory segment for code and data, only when modifying data is it copied and then written to. So every instance only uses minimal memory, and only while processing connections. This is the same with single-process servers, processing connections takes at least as much memory for them too (or more: counting the poll arrays they can very well take more).
Since “zero” copy file sending is more efficient than copying from kernel to user-space and back again, efficient servers will want to use sendfile or equivalent. But sendfile is synchronous, so the kernel has no chance to reorder disk accesses when a single process is calling sendfile on scattered places all over the disk(s). When multiple processes call sendfile, however, the kernel can efficiently reorder disk access, because the threads block and yield. Therefore sending the file will have more latency, but higher throughput, and therefore overall higher performance. The disk has an upper limit to throughput, for example with a modern raid array this can be 300MB/s for sequential access, and 20MB/s when seeking randomly. Depending on the size of the site served (sites needing to pay attention to httpd performance can be huge), the disk and in-memory cache can saturate, and single-process servers will hit the 20MB/s throughput limitation because of seeking, while multi-process servers will only hit the sequential throughput limit which is much higher (say 300MB/s).
Therefore, “preforking” servers such as Apache, xs-httpd or AOLserver are theoretically, in every respect I can think of, higher performance than single-process servers such as lighttpd, nginx, thttpd or Zeus.
Thanks for reading this rather long comment. I know that just about everybody else out there disagrees with it, but the truth is that they don’t understand the processes involved. So Apache in the same league as lighttpd and nginx is not surprising to me, rather an indication that this is one of the few benchmarks which are at least almost done correctly.
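The preforking model described in this comment can be illustrated in Ruby. This is a toy sketch of the general idea only, not Apache's implementation and not a measurement of anything claimed above; the method name `prefork` and the reply text are made up, and it relies on `fork`, so it needs a Unix-like system.

```ruby
require 'socket'

# Fork `count` workers that all block in accept() on the same listening socket.
# The kernel hands each incoming connection to one of the waiting workers.
# Returns the worker PIDs so the caller can stop them later.
def prefork(server, count)
  Array.new(count) do
    fork do
      loop do
        conn = server.accept
        conn.puts("served by worker #{Process.pid}")
        conn.close
      end
    end
  end
end

server = TCPServer.new('127.0.0.1', 0) # port 0: let the OS pick a free port
pids = prefork(server, 4)

client = TCPSocket.new('127.0.0.1', server.addr[1])
puts client.gets # one line naming whichever worker accepted the connection
client.close

# Stop the workers and release the port.
pids.each { |pid| Process.kill('KILL', pid); Process.wait(pid) }
server.close
```

The workers exist before any client connects, which is the point the comment makes about Apache: no process is created at connection time.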
2006-11-13 at 4.28 am
2h: Thanks for such detailed comment.
2006-11-29 at 4.59 pm
[…] Why go to that much trouble when LiteSpeed “Just Works”™? For a basic “Hello World” application, performance compares very well to the cobbled together systems, and installing it takes less than five minutes. To summarize the screencast, here are the ten essential steps for implementing a name-based virtual host running on an Ubuntu server (this will work on any Unix OS): […]
2007-01-16 at 6.33 am
[…] The problem was articulated on the mongrel_users mailing list in an email entitled Why Rails + mongrel_cluster + load balancing doesn’t work for us and the beginning of a solution. The problem is that many load balancers choose poor algorithms to balance requests which can become a problem if the requests vary in the amount of time they take to complete. A round-robin algorithm, for example, could place a slow request on one process, and then place another one on the next trip around despite that process still being busy while others are available. The solution is that only 1 request at a time can be sent to each backend process for ruby on rails. If all processes are busy then the request should be queued for the next available backend process. Now reviewing the apache 2.2 docs about the ProxyPass Directive it seems like apache has some options to address this. It would be worth testing the various options before going to another solution such as writing another balancer for apache method as suggested here and here or adding another load balancer such as haproxy, nginx, pen, pound, lighttpd, etc. I also think that this test case would be an excellent way to benchmark various ruby on rails setups absent of a working app as compared to the case used in this comparison: Looking For Optimal Solution: Benchmark Results Summary and Findings. If nothing else this highlights the need to be very aware of the configuration of the web setup to run ruby on rails applications. […]
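The balancing problem described in this excerpt can be sketched in a few lines of Ruby. This is only an illustration of the two selection policies, not the code of any balancer tested above; the class names and the port numbers are hypothetical.

```ruby
# Picks backends in strict rotation and ignores whether they are busy.
class RoundRobin
  def initialize(backends)
    @backends = backends
    @next = 0
  end

  def pick(_busy)
    backend = @backends[@next]
    @next = (@next + 1) % @backends.size
    backend
  end
end

# Picks the first backend that is not busy. Returns nil when every backend
# is busy, so the caller can queue the request instead of piling it up.
class FirstIdle
  def initialize(backends)
    @backends = backends
  end

  def pick(busy)
    @backends.find { |backend| !busy.include?(backend) }
  end
end

backends = [8000, 8001, 8002, 8003] # hypothetical mongrel ports
busy = [8000]                       # 8000 is still serving a slow request

puts RoundRobin.new(backends).pick(busy) # 8000, although it is busy
puts FirstIdle.new(backends).pick(busy)  # 8001, the first idle backend
```

With a single-threaded Rails process behind each port, the first policy can stack a new request behind a slow one, while the second sends at most one request at a time to each backend.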
2007-02-04 at 6.12 pm
[…] Let me note once again that the FastCGI server is defined outside the virtual host and is given an absolute path, while the rewrite rules use a relative path to the fcgi. FastCGI can "talk" to the front-end server either over a unix socket or over tcp. The first is slightly faster, the second is more flexible (in the general case, FastCGI can run on another machine); see Kovyrin for details. The configs hardly differ, but for consistency I give the settings for the unix socket. […]
2007-03-09 at 11.48 pm
[…] For a performance comparison, Lighttpd provided benchmarks of their own (the page is a little messy, you’ve been warned) which claim superiority. Not everyone agrees. And some more benchmarks for good measure (taken from a blog on Ruby on Rails performance). Even more benchmarks from a competing server technology, Litespeed. […]
2007-08-19 at 1.26 pm
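[…] You can find many articles about clusters online, but most of them cover only one aspect of the functionality or get tangled up in implementation details. This article looks at the problems that arise between clusters and Rails, and their solutions, from the point of view of planning a commercial service as a whole. A section at the end explains how to read Apache Benchmark figures. And if what you want to understand is performance and the differences in the numbers, there is already an excellent article to refer to: http://blog.kovyrin.net/2006/08/28/ruby-performance-results/ […]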
2007-10-25 at 11.56 pm
Well, it’s ~14 months later, and the software has been developed some more.
How does everything stack up now?
2007-11-05 at 11.48 pm
It would be interesting to compare these to perlbal - it’s also epoll-based and non-blocking.
2008-01-03 at 9.10 pm
Have you tried running mongrel on a unix socket? I noticed in the nginx manuals that a proxy can also be pointed at a socket; if mongrel can listen on a socket, we would get a decent performance gain, wouldn't we? I'm looking for material and will contact you if something works out.
2008-03-05 at 9.48 pm
Benchmarking HTTP Performance…
Deployment of Rails application is a subject that tends to raise some hot discussions, leading to many misunderstandings. That’s why I decided to try different deployment strategies and check for myself how they perform.
To make any reasonable co…