Today I’ve managed to finish initial version of our bounces-handler package we use for mailing-related stuff in Scribd.

Bounces-handler package is a simple set of scripts to automatically process email bounces and ISP‘s feedback loops emails, maintain your mailing blacklists and a Rails plugin to use those blacklists in your RoR applications.

This piece of software has been developed as a part of more global work on mailing quality improvement in Scribd.com, but it was one of the most critical steps after setting up reverse DNS records, DKIM and SPF.

The package itself consists of two parts:

  • Perl scripts to process incoming email:
    • bounces processor — could be assigned to process all your bounce emails
    • feedback loops messages processor — more specific for Scribd, but still - could be modified for your needs (will be released soon).
  • Rails plugin to work with mailing blacklists

For more information, please check our README file. If you have any questions, comments or suggestions, please leave them here as a comments and I’ll try to reply as soon as possible.

How often do you think about the reasons why your favorite RDBMS sucks? :-) Last few months I was doing this quite often and yes, my favorite RDBMS is MySQL. The reason why I was thinking so because one of my recent tasks at Scribd was fixing scalability problems in documents browsing.

The problem with browsing was pretty simple to describe and as hard to fix - we have large data set which consists of a few tables with many fields with really bad selectivity (flag fields like is_deleted, is_private, etc; file_type, language_id , category_id and others). As the result of this situation it becomes really hard (if possible at all) to display documents lists like “most popular 1-10 pages PDF documents in Italian language from the category “Business” (of course, non-deleted, non-private, etc). If you’ll try to create appropriate indexes for each possible filters combination, you’ll end up having tens or hundreds of indexes and every INSERT query in your tables will take ages.

(more…)

Hello my dear readers.

Today I have a question for all of you. What platforms (32bit or 64 bit) do you use for your servers with more than 4Gb RAM? I’m asking because recently we‘ve hit few really weird bugs in Linux kernels 2.6.18 to 2.6.22 and all those bugs were PAE-related. Now I’d really love to move all machines to 64-bit, but I’m in doubt because we don’t know too much about Rails stack (ruby, mongrel, haproxy) on 64-bit platforms (all our DB boxes are 64-bit of course).

So, please drop me a line if you have any experience (negative or positive) with Rails platform on 64-bit machines. I’d really appreciate your help.

We were using memcache in our application for a long time and it helped a lot to reduce DB servers load on some huge queries. But there was a problem (sometimes called a “dog-pile effect”) - when some cached value was expired and we had a huge traffic, sometimes too many threads in our application were trying to calculate new value to cache it.

For example, if you have some simple but really bad query like

SELECT COUNT(*) FROM some_table WHERE some_flag = X

which could be really slow on a huge tables, and your cache expires, then ALL your clients calling a page with this counter will end up waiting for this counter to be updated. Sometimes there could be tens or even hundreds of such a queries running on your DB killing your server and breaking an entire application (number of application instances is constant, but more and more instances are locked waiting for a counter).

(more…)

How often do we think about our http sessions implementation? I mean, do you know, how your currently used sessions-related code will behave when sessions number in your database will grow up to millions (or, even, hundreds of millions) of records? This is one of the things we do not think about. But if you’ll think about it, you’ll notice, that 99% of your session-related operations are read-only and 99% of your sessions writes are not needed. Almost all your sessions table records have the same information: session_id and serialized empty session in the data field.

Looking at this sessions-related situation we have created really simple (and, at the same time, really useful for large Rails projects) plugin, which replaces ActiveRecord-based session store and makes sessions much more effective. Below you can find some information about implementation details and decisions we’ve made in this plugin, but if you just want to try it, then check out our project site.

(more…)

Today I was developing one small merb application for one of our projects and needed to see ActiveRecord logging on console like I do in Rails. After a short research I’ve found out that merb_active_record plugin passes its MERB_LOGGER to AR by default so I decided to try to change merb log level and here they are - my pretty colored AR logs!

So, if you want to see ActiveRecord logs in your application in development mode, then you need to add one line to your conf/environments/development.rb file:

puts "Loaded DEVELOPMENT Environment..."
MERB_LOGGER.level = Merb::Logger::DEBUG

That’s it for now. Long live merb! :-)

This article is part of “Looking For Optimal Solution” series, devoted to testing various Ruby On Rails deployment schemes and doing some simple benchmarks on these schemes. General idea of testing is to find subset of most optimal RoR deployment schemes for different situations.

This small article is about Rails+Mongrel setup and its performance. List of other tested deployment schemes, description of testing methodology and, of course, all benchmark results you can find on “Ruby On Rails Benchmark Summary and Findings” page.

(more…)

Because of not fully correct testing methodology, benchmark results are not fully correct. So, I decided to redo all tests. New benchmark results you can get in “Looking For Optimal Solution” series Summary post.

This week we have started one new project with Ruby on Rails as primary framework. My first task was to prepare runtime environment for it on one of our development servers. When I have tried to research how people doing it, I noted that there is no information about how to deploy rails application with nginx as frontend and what is performance of such solution. Before blindly make any decisions about future platform I’ve decided to make some performance tests of the some most popular rails web servers/frontends. Results of these tests you can find here with configuration samples for all software that I have used.

(more…)