Databases


Today I was doing some work on one of our database servers (each of them has 4 SAS disks in RAID10 on an Adaptec controller) that required a huge multi-threaded, I/O-bound read load. Basically, it was a set of parallel full-scan reads from a 300 GB compressed InnoDB table (yes, we use the InnoDB plugin). Looking at iostat, I saw pretty much the expected results: 90-100% disk utilization and lots of read operations per second. Then I decided to play around with the Linux I/O schedulers and try to increase disk subsystem throughput. Here are the results:


Question: Do you think you have what it takes to take a service from a few hundred thousand users to tens of millions of users in 1 year flat? If you do, read on and perhaps become the next beloved scalability rockstar of our age.

We are looking for a data charmer. A MySQL magician. A code hack. A funny man. A mad man. A passionate man. Or perhaps a woman who does all these things and more.

Here’s what you gotta do:

  • Proactive and reactive performance analysis, monitoring and general database plumbing of all leaky issues.
  • Work with others on the team to help maintain/improve and support the infrastructure for a high traffic, high growth site
  • Optimize and tune the database day to day
  • Algorithmic bent. Develop algos to quicken search times, response times, find shortest paths between various connections on site.
  • Have solid low level networking/protocol/computer security skills
  • Log everything. Usage stats, search stats, user behaviour stats. Draw conclusions. Constantly refine and tinker.
  • Help with periodic large storage migrations
  • Work intimately with operations, development, and strategy team to ensure smooth deployments of new iterations, high availability of database services.
  • Understand capacity planning. Always thinking 10 steps ahead. (Whether it means looking at distributed systems services, cloud computing options, evaluating HA models used in other industries etc)
  • Have a pulse on the state of the web, social media, social networking, different scalability architectures, benefits/negatives of each.
  • Interest in high concurrency, distributed systems architectures.
  • General low level hacking/scripting/optimizations in Perl/Python.
  • Evaluate changing conditions in the archi
  • Think creatively. No dogmatists.

Ideal skillset:

  • BS in Comp Sci or equivalent
  • 5+ years experience with Linux/Unix systems
  • 3+ years with MySQL in production environment
  • Knowledge and experience with partitioned architectures and database sharding techniques
  • Capacity planning/high growth planning/emergency planning experience
  • Passion, bordering on paranoia, for hunting bottlenecks, and optimizing IO operations
  • Experience with MySQL replication
  • Deep experience with MySQL internals
  • Experience with performance analysis tools, storage engines, backup methodologies for MySQL
  • Great Perl/shell scripting experience
  • Team player, self motivated, able to handle high stress situations while maintaining a calm disposition
  • Great communication skills, attention to detail, and an interest in the business side of the equation of systems/scale planning
  • Eat/sleep/breathe the web, startups, and the landscape of the social web
  • Insomniac

We’re ready to offer an aggressive salary with tremendous upside by way of stock options, commensurate with your experience, your drive and your results.

Apply directly to:

net ‘dot’ startup ‘at’ googles mail service dot com

by sending us a CV/resume, and optionally, a link to your blog or LinkedIn profile.

Please help save Ivan, son of Andrii Nikitin (MySQL Support Engineer), who needs a bone marrow transplant. Andrii’s message is below:

“My family got bad news - doctors said allogenic bone marrow transplantation is the only chance for my son Ivan.

“8 months of heavy and expensive immune suppression brought some positive results so we hoped that recovering is just question of time.

“Ivan is very brave boy - not every human meets so much suffering during whole life, like Ivan already met in his 2,5 years. But long road is still in front of us to get full recover - we are ready to come it through.

“Ukrainian clinics have no technical possibility to do such complex operation, so we need 150-250K EUR for Israel or European or US clinic. The final decision will be made considering amount we able to find. Perhaps my family is able to get ~60% of that by selling the flat where parents leave and some other goods, but we still require external help.”

– Andrii Nikitin, MySQL Engineer

For donations: donations can be made through PayPal (via the MySQL/Sun website)

How often do you think about the reasons why your favorite RDBMS sucks? :-) Over the last few months I have been doing this quite often and yes, my favorite RDBMS is MySQL. The reason is that one of my recent tasks at Scribd was fixing scalability problems in document browsing.

The problem with browsing was pretty simple to describe but hard to fix - we have a large data set consisting of a few tables with many fields that have really bad selectivity (flag fields like is_deleted and is_private, plus file_type, language_id, category_id and others). As a result, it becomes really hard (if possible at all) to display document lists like “the most popular 1-10 page PDF documents in Italian from the ‘Business’ category” (non-deleted, non-private, etc., of course). If you try to create appropriate indexes for each possible combination of filters, you’ll end up with tens or hundreds of indexes, and every INSERT query on your tables will take ages.
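To get a feel for how quickly the number of indexes grows, here is a rough count in Python. It assumes that a listing page may filter on any non-empty subset of the columns and that each subset would get its own index; it ignores column order and the sort column, which only make things worse. The column list is just the handful of fields named above, not the real schema.

```python
from itertools import combinations

# Filter columns named in the post; illustrative only, not the full schema.
FILTER_COLUMNS = ["is_deleted", "is_private", "file_type", "language_id", "category_id"]


def filter_combinations(columns):
    """Return every non-empty subset of columns a listing page could filter on."""
    return [
        subset
        for size in range(1, len(columns) + 1)
        for subset in combinations(columns, size)
    ]


combos = filter_combinations(FILTER_COLUMNS)
print(len(combos))  # 2**5 - 1 = 31 candidate indexes for just five columns
```

Every additional filter column roughly doubles that number, which is why one index per combination stops being practical very quickly.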


Since I wasn’t able to get to this year’s MySQL UC (a change of employer caused problems with obtaining a US visa and I didn’t get it in time), I’m really interested in all the presentations people are posting after their sessions. I decided to collect them all in one place and share them with others - maybe someone will find it interesting to read what people have to say about the many aspects of MySQL usage.

So, I’ve created a folder in my Scribd.com account which you can use (and track with an RSS reader) to find out which interesting presentations have been published. You can follow either my account or the mysqluc08 folder there. One more option for tracking mysqluc presentations/documents is our tagging (I tag all my docs with the mysqluc08 tag).

Even though I didn’t go to the MySQL conf this year (really sad about this), this week is gonna be the most active in the community, so I decided to do some community stuff too :-) Today I’ve released version 0.3 of our InnoDB recovery toolkit. It is now much faster, more stable and more accurate. At this point it is possible to recover almost any table from a corrupted/deleted tablespace with much less effort than before. Here is a short list of changes (since 0.1 announced here):

  • More MySQL data types added: DECIMAL (both old and new), DATE, TIME
  • CHAR data type handling improved in table definitions generator
  • Index filtering added to page_parser
  • 64-bit stat() support added to all tools
  • Linux has no isnumber() function so we define our own implementation (pretty simple)
  • Lots of fixes in the create_defs.pl script - now it generates definitions which can recover your data in 80% of cases w/o any changes.
  • Min/max record size calculation fixed in constraints-based parser.
  • Nullable fixed-size columns support is fixed.
  • Debug logging is much cleaner now.

As always, if you need any help with your recovery, we would love to help.

We have been using memcache in our application for a long time and it has helped a lot to reduce the load on our DB servers from some huge queries. But there was a problem (sometimes called the “dog-pile effect”) - when a cached value expired under heavy traffic, sometimes too many threads in our application tried to calculate the new value to cache it.

For example, if you have some simple but really bad query like

SELECT COUNT(*) FROM some_table WHERE some_flag = X

which could be really slow on huge tables, and your cache expires, then ALL your clients calling a page with this counter will end up waiting for the counter to be updated. Sometimes there could be tens or even hundreds of such queries running on your DB, killing your server and breaking the entire application (the number of application instances is constant, but more and more instances are blocked waiting for the counter).
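One common way to tame this effect, which is not necessarily the approach taken in this post, is to let only one caller recompute an expired value while everybody else keeps getting the stale one. Below is a minimal Python sketch of that idea. `DogpileSafeCache` is a hypothetical name, and a plain in-process dict stands in for memcache, so this illustrates the locking logic only, not a drop-in memcache client.

```python
import threading
import time


class DogpileSafeCache:
    """In-memory stand-in for memcache; one caller recomputes an expired key."""

    def __init__(self):
        self._data = {}                 # key -> (value, expires_at)
        self._locks = {}                # key -> lock guarding recomputation
        self._guard = threading.Lock()  # protects the _locks dict itself

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def fetch(self, key, ttl, compute, now=time.monotonic):
        entry = self._data.get(key)
        if entry is not None and entry[1] > now():
            return entry[0]                      # fresh hit, no work needed
        lock = self._lock_for(key)
        if lock.acquire(blocking=False):
            try:
                # Re-check: another thread may have refreshed the key already.
                entry = self._data.get(key)
                if entry is not None and entry[1] > now():
                    return entry[0]
                value = compute()                # the slow query runs once
                self._data[key] = (value, now() + ttl)
                return value
            finally:
                lock.release()
        if entry is not None:
            return entry[0]                      # refresh in progress: serve stale
        with lock:                               # nothing cached yet: wait for it
            pass
        entry = self._data.get(key)
        return entry[0] if entry is not None else compute()
```

A caller would wrap the slow counter query in a function and call something like `cache.fetch("flag_count", 300, run_count_query)`, where `run_count_query` is whatever executes the SELECT above. Across several application servers the lock would have to live in a shared place rather than in process memory.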


How often do we think about our HTTP session implementation? I mean, do you know how your current session-related code will behave when the number of sessions in your database grows to millions (or even hundreds of millions) of records? This is one of the things we do not think about. But if you do think about it, you’ll notice that 99% of your session-related operations are read-only and 99% of your session writes are not needed. Almost all records in your sessions table hold the same information: a session_id and a serialized empty session in the data field.

Looking at this situation, we created a really simple (and, at the same time, really useful for large Rails projects) plugin which replaces the ActiveRecord-based session store and makes sessions much more efficient. Below you can find some information about the implementation details and the decisions we made in this plugin, but if you just want to try it, check out our project site.
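To illustrate the observation about unneeded writes (this is not the plugin's code, which is a Rails plugin), here is a small Python sketch of a store that never creates a row for an empty session and skips the write when nothing has changed. `LazySessionStore` and its dict backend are hypothetical; treating "empty or unchanged" as the rule for skipping a write is an assumption made for the example.

```python
import json


class LazySessionStore:
    """Hypothetical store that only touches the backend when it has to."""

    def __init__(self, backend=None):
        # session_id -> serialized data; a dict stands in for the sessions table.
        self.backend = {} if backend is None else backend
        self.writes = 0

    def load(self, session_id):
        raw = self.backend.get(session_id)
        return json.loads(raw) if raw is not None else {}

    def save(self, session_id, data):
        new_raw = json.dumps(data, sort_keys=True) if data else None
        old_raw = self.backend.get(session_id)
        if new_raw == old_raw:
            return False                    # empty and absent, or unchanged
        if new_raw is None:
            del self.backend[session_id]    # session became empty: drop the row
        else:
            self.backend[session_id] = new_raw
        self.writes += 1
        return True
```

With a store like this, an anonymous visitor who never puts anything into the session causes no writes at all, and a logged-in user causes one write at login instead of one per request.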


For the last few days one of our customers (one of the largest Ruby on Rails sites on the Net) was struggling to solve a really strange problem - from time to time they were getting an error from ActiveRecord on their site:

(ActiveRecord::StatementInvalid) "Mysql::Error: Lock wait timeout exceeded; try restarting transaction: UPDATE some_table.....

They have innodb_lock_wait_timeout set to 20 seconds. After a few hours of looking for strange transactions, we decided to create a script to dump the output of the SHOW INNODB STATUS and SHOW FULL PROCESSLIST commands to a file every 10 seconds to catch one of those moments when this error occurs.
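A script of that kind could look roughly like the Python sketch below. It is not the script we actually used. It assumes the `mysql` command-line client is on the PATH and can log in without prompting (for example via `~/.my.cnf`); the function names and the log format are made up for the example.

```python
import subprocess
import time
from datetime import datetime

# Statements named in the post. Newer MySQL versions spell the first one
# SHOW ENGINE INNODB STATUS. The trailing \G asks the client for vertical output.
STATEMENTS = ["SHOW INNODB STATUS\\G", "SHOW FULL PROCESSLIST\\G"]


def run_mysql(statement):
    """Run one statement through the mysql command-line client, return its output."""
    result = subprocess.run(
        ["mysql", "-e", statement], capture_output=True, text=True, check=True
    )
    return result.stdout


def dump_snapshots(path, rounds, interval=10, run=run_mysql, sleep=time.sleep):
    """Append `rounds` timestamped snapshots of every statement to `path`."""
    with open(path, "a") as log:
        for i in range(rounds):
            log.write(f"===== {datetime.now().isoformat()} =====\n")
            for statement in STATEMENTS:
                log.write(f"-- {statement}\n{run(statement)}\n")
            log.flush()  # keep the file current in case the script is killed
            if i < rounds - 1:
                sleep(interval)
```

Calling `dump_snapshots("/tmp/innodb_snapshots.log", rounds=360)` would cover about an hour at the 10-second interval. Once the error shows up in the application log, its timestamp points at the snapshot to look at.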

Today we got the next error and started digging in our logs…


I returned from my 1-week vacation today and want to say - I’ve never been as productive as I was there ;-) Blue ocean, hot sun and white sand really helped me finish my work on the first release of one really awesome project.

Today I’m proud to announce the first public release of the Data Recovery Toolkit for InnoDB - a set of tools for checking InnoDB tablespaces and recovering data from damaged tablespaces or from dropped/truncated InnoDB tables.

