Month ago I was on a vacation and as usual even though our hotel provided us with an internet connection on a pretty decent speeds, I wasn’t able to work there because they’ve banned all tcp ports but some major ones (like 80, 21, etc) and I needed to be able to use ssh, mysql, IMs and other non-web software.

After a short research I’ve found a pretty simple to set up and easy to use approach to such a connection problems I’d like to describe here.

(more…)

Many people know me as a nginx web server evangelist. But as (IMHO) any professional I think that it is really rewarding to know as much as possible about all the tools available on the market so every time you need to make a decision on some technical issue, you’d consider all pros and cons based on my own knowledge.

This is why when I received an email from Packt company asking if I’d like to read and review their book on Lighttpd I decided to give it a shot (I usually do not review any books because I do not always have enough time to read a book thoroughly to be able to write a review). So, here are my impressions from this book.

(more…)

Few months ago I’ve switched one of our internal projects from doing synchronous database saves of analytics data to an asynchronous processing using starling + a pool of workers. This was the day when I really understood the power of specialized queue servers. I was using database (mostly, MySQL) for this kind of tasks for years and sometimes (especially under a highly concurrent load) it worked not so fast… Few times I worked with some queue servers, but those were either some small tasks or I didn’t have a time to really get the idea, that specialized queue servers were created just to do these tasks quickly and efficiently.

All this time (few months now) I was using starling noticed really bad thing in how it works: if workers die (really die, or lock on something for a long time, or just start lagging) and queue start growing, the thing could kill your server and you won’t be able to do something about it - it just eats all your memory and this is it. Since then I’ve started looking for a better solution for our queuing, the technology was too cool to give up. I’ve tried 5 or 6 different popular solutions and all of them sucked… They ALL had the same problem - if your queue grows, this is your problem and not queue broker’s :-/ The last solution I’ve tested was ActiveMQ and either I wasn’t able to push it to its limits or it is really so cool, but looks like it does not have this memory problem. So, we’ve started using it recently.

In this small post I’d like to describe a few things that took me pretty long to figure out in ruby Stomp client: how to make queues persistent (really!) and how to process elements one by one with clients’ acknowledgments.
(more…)

Sorry for a short outage today - we were moving to a new server we had some problems because of software incompatibilities on the new box. Now all sites on this box should behave as usual :-)

Since the day one when I joined Scribd, I was thinking about the fact that 90+% of our traffic is going to the document view pages, which is a single action in our documents controller. I was wondering how could we improve this action responsiveness and make our users happier.

Few times I was creating a git branches and hacking this action trying to implement some sort of page-level caching to make things faster. But all the time results weren’t as good as I’d like them to be. So, branches were sitting there and waiting for a better idea.
(more…)

Today I’ve managed to finish initial version of our bounces-handler package we use for mailing-related stuff in Scribd.

Bounces-handler package is a simple set of scripts to automatically process email bounces and ISP‘s feedback loops emails, maintain your mailing blacklists and a Rails plugin to use those blacklists in your RoR applications.

This piece of software has been developed as a part of more global work on mailing quality improvement in Scribd.com, but it was one of the most critical steps after setting up reverse DNS records, DKIM and SPF.

The package itself consists of two parts:

  • Perl scripts to process incoming email:
    • bounces processor — could be assigned to process all your bounce emails
    • feedback loops messages processor — more specific for Scribd, but still - could be modified for your needs (will be released soon).
  • Rails plugin to work with mailing blacklists

For more information, please check our README file. If you have any questions, comments or suggestions, please leave them here as a comments and I’ll try to reply as soon as possible.

Today I was doing some work on one of our database servers (each of them has 4 SAS disks in RAID10 on an Adaptec controller) and it required huge multi-thread I/O-bound read load. Basically it was a set of parallel full-scan reads from a 300Gb compressed innodb table (yes, we use innodb plugin). Looking at the iostat I saw pretty expected results: 90-100% disk utilization and lots of read operations per second. Then I decided to play around with linux I/O schedulers and try to increase disk subsystem throughput. Here are the results:

(more…)

Question: Do you think you have what it takes to take a service from a few hundred thousand users to tens of millions of users in 1 year flat? If you do read on and perhaps become the next beloved scalability rockstar of our age.

We are looking for a data charmer. A mysql magician. A code hack. A funny man. A mad man. A passionate man. Or perhaps a woman who does all these things and more.

Here’s what you gotta do:

  • Pro-active and reactive performance analysis, monitoring and general database plumbing of all leaky issues.
  • Work with others on the team to help maintain/improve and support the infrastructure for a high traffic, high growth site
  • Optimize and tune the database day to day
  • Algorithmic bent. Develop algos to quicken search times, response times, find shortest paths between various connections on site.
  • Have solid low level networking/protocol/computer security skills
  • Log everything. Usage stats, search stats, user behaviour stats. Draw conclusions. Constantly refine and tinker.
  • Help with periodic large storage migrations
  • Work intimately with operations, development, and strategy team to ensure smooth deployments of new iterations, high availability of database services.
  • Understand capacity planning. Always thinking 10 steps ahead. (Whether it means looking at distributed systems services, cloud computing options, evaluating HA models used in other industries etc)
  • Have a pulse on the state of the web, social media, social networking, different scalability architectures, benefits/negatives of each.
  • Interest in high concurrency, distributed systems architectures.
  • General low level hacking/scripting/optimizations in perl/python.
  • Evaluate changing conditions in the archi
  • Think creatively. No dogmatists.

Ideal skillset:

  • BS in Comp Sci or equivalent
  • 5+ years experiene with Linux/Unix systems
  • 3+ years with MySQL in production environment
  • Knowledge and experience with partitioned architectures and a database sharding techniques
  • Capacity planning/high growth planning/emergency planning experience
  • Passion, bordering on paranoia, for hunting bottlenecks, and optimizing IO operations
  • Experience with MySQL replication
  • Deep experience with MySQL internals
  • Experience with performance analysis tools, storage engines, backup methodologies for MySQL
  • Great perl/shell scripting experience
  • Team player, self motivated, able to handle high stress situations while maintaining a calm disposition
  • Great communication skills, attention for detail, and an interest in the business side of the equation of systems/scale planning
  • Eat/sleep/breathe the web, startups, and the landscape of the social web
  • Insomniac

We’re ready to offer an aggressive salary with tremendous upside by way of stock options, commensurate with your experience, your drive and your results.

Apply directly to:

net ‘dot’ startup ‘at’ googles mail service dot com

by sending us a CV/resume, and optionally, a link to your blog or Linkedin profile.

Please help save Ivan, son of Andrii Nikitin (MySQL Support Engineer), who needs a bone marrow transplant. Andrii’s message is below:

“My family got bad news - doctors said allogenic bone marrow transplantation is the only chance for my son Ivan.

“8 months of heavy and expensive immune suppression brought some positive results so we hoped that recovering is just question of time.

“Ivan is very brave boy - not every human meets so much suffering during whole life, like Ivan already met in his 2,5 years. But long road is still in front of us to get full recover - we are ready to come it through.

“Ukrainian clinics have no technical possibility to do such complex operation, so we need 150-250K EUR for Israel or European or US clinic. The final decision will be made considering amount we able to find. Perhaps my family is able to get ~60% of that by selling the flat where parents leave and some other goods, but we still require external help.”

– Andrii Nikitin, MySQL Engineer

For donation: Donation can be made through PayPal (via MySQL/Sun website)

Because of some weird bug my RSS feeds were broken ever since I’ve upgraded to WP 2.5. Today they were fixed and I hope they’ll have some new posts there soon :-) Stay tuned.

Next Page »