Admin-tips


Today I was doing some work on one of our database servers (each of them has 4 SAS disks in RAID10 on an Adaptec controller) and it required huge multi-thread I/O-bound read load. Basically it was a set of parallel full-scan reads from a 300Gb compressed innodb table (yes, we use innodb plugin). Looking at the iostat I saw pretty expected results: 90-100% disk utilization and lots of read operations per second. Then I decided to play around with linux I/O schedulers and try to increase disk subsystem throughput. Here are the results:

(more…)

How often do you think about the reasons why your favorite RDBMS sucks? :-) Last few months I was doing this quite often and yes, my favorite RDBMS is MySQL. The reason why I was thinking so because one of my recent tasks at Scribd was fixing scalability problems in documents browsing.

The problem with browsing was pretty simple to describe and as hard to fix - we have large data set which consists of a few tables with many fields with really bad selectivity (flag fields like is_deleted, is_private, etc; file_type, language_id , category_id and others). As the result of this situation it becomes really hard (if possible at all) to display documents lists like “most popular 1-10 pages PDF documents in Italian language from the category “Business” (of course, non-deleted, non-private, etc). If you’ll try to create appropriate indexes for each possible filters combination, you’ll end up having tens or hundreds of indexes and every INSERT query in your tables will take ages.

(more…)

Hello my dear readers.

Today I have a question for all of you. What platforms (32bit or 64 bit) do you use for your servers with more than 4Gb RAM? I’m asking because recently we‘ve hit few really weird bugs in Linux kernels 2.6.18 to 2.6.22 and all those bugs were PAE-related. Now I’d really love to move all machines to 64-bit, but I’m in doubt because we don’t know too much about Rails stack (ruby, mongrel, haproxy) on 64-bit platforms (all our DB boxes are 64-bit of course).

So, please drop me a line if you have any experience (negative or positive) with Rails platform on 64-bit machines. I’d really appreciate your help.

If you’ve ever worked in companies with 5-10+ servers and it was your responsibility to install new boxes, change some configuration files and install new software on many boxes you definitely know how painful this work is. Every time you need to change something on 3-5-100 boxes, you go there and make those changes. Most experienced of us used some weird scripts to perform some task on many boxes or used some stuff like dsh. Even with those tricks I’d never wish this work to anyone.

While I was working in Galt, I’ve asked our junior admin to check out puppet and try to use it on our servers. After a week of screaming he’s managed to install and configure it and said “Wow! This is what we definitely need!”. Since then he was using it and was happy :-). But, unfortunately (for me) I’ve never tried it and when I’ve changed my job and started doing the same old (but not good) things like dsh and for+ssh I’ve started thinking: “Stop! Try puppet - maybe it is what you need!”. And what can i say? Yes! This is what every admin needs! Today we’ve deployed our next web server (we do it pretty often) and it was done with puppet - pretty painlessly and quickly.

So, what is puppet? Let me quote their site:

Puppet lets you centrally manage every important aspect of your system using a cross-platform specification language that manages all the separate elements normally aggregated in different files, like users, cron jobs, and hosts, along with obviously discrete elements like packages, services, and files.

Puppet’s simple declarative specification language provides powerful classing abilities for drawing out the similarities between hosts while allowing them to be as specific as necessary, and it handles dependency and prerequisite relationships between objects clearly and explicitly. Puppet is written entirely in Ruby.

Basically, puppet is a tool which lets you describe whatever you’d like about your server(s) and deploy your configuration (configs, packages, files, directories, cron jobs, user accounts, ssh keys, etc) to any number of servers you need. You can describe some simple parts of your system (like “apache”, “nginx”, “rails”, “useful_gems”, “admin-ssh-keys”, etc) and then describe your servers using those terms (like machine X = apache+rails+useful_gems+admin_ssh_keys). When it is done, you’ll be able to deploy all this stuff with one command!

So, if you’re struggling with deploying some changes on many servers or, even, have only a few of them, I’d recommend to try puppet - it could save you a lots of time and nerves.

P. S. When you’ll try to use it - do not give up for a few days - it is hard to change your way of thinking and start writing correct and convenient scripts using puppet’s language.

This will be one of those posts I’d like to publish primarily to be able to coma back later and check it out instead of reading docs again :-)

So, we have a server with two (or more) network interfaces are we need to be able to use more than one interface in our VDS machines. How do we set it up?

(more…)

Today I was working on one small consulting task and our client asked for an upgrade from MySQL 5.0 to 5.1. It was pretty easy and task was successfully finished and reported to the customer… But few hours after my report I’ve got an email from customer with something like “WTF? Where is my 5.1?!”. I was shocked when I saw happily running 5.0 on their server w/o anything related to my 5.1 installation…

After some short investigation I’ve found out, that it was cpanel (dumb software for dumb system administrators) - it noticed, that installed mysql version (5.1) is not the same as it thought it should be (5.0), so without any warnings or notices it removed all 5.1 rpms and installed “brand new” 5.0.

Here I’d like to say GREAT THANKS to mysql team for such a great software which did not screwed up user’s data in such situation. But what idiots in cpanel development team decided, that is it appropriate and acceptable to perform such operations?! As an administrator and as a software developer I do not understand them - I just can’t understand such approach….

So, enough complaining - here is a piece of useful information for my readers: If you’re so unlucky to have cpanel installed on your server and you’d like to upgrade your mysql manually, then you can perform following operations:

# touch /etc/mysqlupdisable # chattr +i /etc/mysqlupdisable # service cpanel restart

After these small changes your cpanel will forget about mysql upgrades and you’ll be able to do what you want and not what some dumb developers decided you should do.

Recently I had one customer for consulting and aside from mysql optimization, etc they asked me for cacti installation/setup to monitor their pretty generic LAMP application. I’ve started setting up all this stuff and I’ve never thought it could be so painful… lots of different templates for the same tasks, all of them are incompatible with recent cacti releases, etc, etc… So, this post is generally a list of used templates with a fixes I’ve made to make them work on recent cacti release.

(more…)

This weekend I was doing some development for one of our projects and we needed to make screenshots of a web pages (see my next posts about this task). I’ve managed to develop small piece of code which uses GtkMozEmbed component (Mozilla Gecko-based renderer for web pages) to create screenshots of any page, but there was some problem… The problem was a following: GTK+ library can’t work w/o fully-functional X server running on your machine. Obviously I didn’t want to run such software (no monitor/keyboard/mouse, dumb graphics adapter on the server, etc, etc) so I’ve tried to find some solution… And in this tiny article I’ll describe the method I’ve managed to find.

(more…)

This week aside from tons of different tasks I was working on one of MMM users complaint regarding some issues with MMM on Solaris 10. I knew that this OS has not so user (admin) friendly environment (especially for people with strong GNU-related background), but had no other options and decided to install Solaris 10 in VMWare Fusion on my desktop.

Installation was a bit strange comparing to Debian/RHEL/Ubuntu and FreeBSD where I have a strong experience, but I’ve managed to install it successfully. The major problem after my first boot was a lack of knowledge about how things could be done in Solaris… Below I’ll describe what generic Linux admin could do with Solaris to make it easier to use and more friendly for GNU-addicted mind ;-)

Notice: If you’re Solaris administrator, please, don’t read further because I’m pretty sure that all these suggestions would look dumb for you (I knew some old solaris admins before and they hated GNU environments).

(more…)

Few days ago I worked on some customer’s server and there was a problem - their nfs server went down and we were forced to change some settings on their FC4 clients to prevent shares from dieing because of kernel bug. But when we’ve changed settings in /etc/fstab there was one more step before task was completed - we need to remount this share (I mean unmount/mount). But how to perform this operation if there are some processes in D (non-interruptible sleep) waiting for dead share and prevent it from unmounting? They wait because of hard option on the share and lack of intr option and any unmount request would produce a following results:

streaming01:~# umount /storages/2 umount: /storages/2: device is busy umount: /storages/2: device is busy

So, here is a list of steps you need to do to be able to remount your share.

First of all, you need to send KILL(9) signal to all you processes waiting for share. I’ve used ps axu and filtered all processes in D state. When all processes ‘killed’ (they can’t be killed actually), you’ll need to send the same signal to rpciod processes in your system. After this all your sleeping processes will die and you’ll be able to unmount your share.

That’s it - simple and really useful tip for people using NFS in their systems.

Next Page »