How to track high load on VPS or your server


One common issue with a dedicated server (or a Virtual Private Server, for that matter) is that it can be quite difficult to track down the cause of a high server load.  Most people just write it off as inevitable, but something should be done about it.  If you have a busy site, you'll want to tweak your server to handle high loads as well as possible.  But how can you find out what is causing the high load?  It's simple (kind of)...  These steps are for a *nix based server.  The commands are written for Linux; other Unix-like systems such as FreeBSD also ship top, but their version may use different flags.

Find out what's causing the high load

        In *nix, there's a really handy command called top.  What top does is display information about the currently running processes.  With some of its options, and a little output redirection, we can get a glimpse into what's causing our high load.  Here's the command...

 

top -b -i -n 20 >> ./top_procs

        What that does is tell top to run in "batch" mode (plain text output, no waiting for user input), hide idle processes (-i), refresh 20 times (-n 20), and append the output to the file top_procs in the current directory.  Run that command while you are experiencing a high server load.  Then you can view the contents of that file.  To view it, either open it in your favorite editor (vim?), or simply use "less ./top_procs".  That will give you a bunch of output like this:

 

top - 11:06:36 up 69 days,  2:53,  0 users,  load average: 0.02, 0.05, 0.07
Tasks: 137 total,   1 running, 136 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.3% us,  0.5% sy,  0.0% ni, 97.1% id,  0.2% wa,  0.0% hi,  0.0% si
Mem:  12278340k total, 12230332k used,    48008k free,   363352k buffers
Swap: 16386292k total,   157092k used, 16229200k free,  2699912k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8066 root      15   0  1888 1032  776 R  0.1  0.0   0:00.02 top

Tasks: 137 total,   1 running, 136 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.8% us,  1.5% sy,  0.0% ni, 94.6% id,  1.1% wa,  0.0% hi,  0.0% si
Mem:  12278340k total, 12230740k used,    47600k free,   361956k buffers
Swap: 16386292k total,   157092k used, 16229200k free,  2696368k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8066 root      15   0  1880  944  704 R  2.0  0.0   0:00.01 top

top - 11:06:46 up 69 days,  2:53,  0 users,  load average: 0.09, 0.06, 0.07
Tasks: 137 total,   3 running, 134 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.2% us,  0.3% sy,  0.0% ni, 97.2% id,  0.3% wa,  0.0% hi,  0.0% si
Mem:  12278340k total, 12173908k used,   104432k free,   363416k buffers
Swap: 16386292k total,   157092k used, 16229200k free,  2696988k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8066 root      16   0  1888 1032  776 R  0.1  0.0   0:00.03 top
 6103 mailman   16   0 10536 7204 2744 R  0.0  0.1   0:33.08 python2.4
 6108 mailman   16   0 10172 6904 2648 R  0.0  0.1   0:37.92 python2.4

        What does all of that mean? It's really not as bad as it seems.  If you break it down, it's really just 3 repetitions of almost the same information.  Here's what it means, line by line.

 

   1. Line 1 - General server information - Current time, uptime (since the last restart of the server), number of users logged on, and the load average for the last 1, 5, and 15 minutes

   2. Line 2 - Tasks - Total number of processes, then how many of them are running, sleeping, stopped, and zombie

   3. Line 3 - CPU usage info (User, System, Nice, Idle, I/O Wait, Hardware Interrupts, Software Interrupts).  Just worry about idle, user, system, and I/O wait.

   4. Line 4 - Memory usage (total, used, free, buffers)

   5. Line 5 - Swap usage (used should be almost 0 if not 0), followed by the amount of memory used as cache

   6. Line 6 - Table header for the process list (Process ID, User, Priority, Nice, Virtual Memory, Resident Size, Shared Size, State, CPU %, Memory %, CPU Time used, Command)

   7. The processes themselves...
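        If you only want to see how the load changed over the capture, you can pull the load averages out of each "top -" header line.  The sketch below is one way to do it with sed; the function name load_lines is made up for this example, and it assumes the header format shown in the sample output above.

```shell
# Print the 1-, 5- and 15-minute load averages from every "top -" header
# line in a capture file. The default file name matches the one used above.
load_lines() {
    sed -n 's/^top - .*load average: *//p' "${1:-./top_procs}"
}
```

        Running "load_lines ./top_procs" prints one line per header found in the file.  As a rough rule of thumb, a 1-minute figure that stays above the number of CPU cores means processes are waiting their turn.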

        Now, what to look for is a process that shows a high CPU % in multiple repetitions and also has a high CPU time (the TIME+ column).  Be aware that you'll more than likely have a few of them.  Write down the highest ones (most likely MySQL, Apache, etc).  Now that you know what you need to tweak, let's look at how.
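        Reading 20 repetitions by eye gets tedious, so here is a small sketch that adds up the numbers for you.  It is not part of top; the function name summarize_top is made up for this example, and it assumes the column layout from the sample output (PID in column 1, %CPU in column 9, COMMAND in column 12).  Other versions of top may order the columns differently.

```shell
# Summarize a batch-mode top capture. For each command name, add up the
# %CPU column over every process row and count how many rows it appeared in.
# Output columns: total %CPU, number of rows, command (highest total first).
summarize_top() {
    awk '
        # Process rows start with a numeric PID and have at least 12 fields;
        # header lines (top -, Tasks:, Cpu(s):, Mem:, Swap:, PID) do not.
        $1 ~ /^[0-9]+$/ && NF >= 12 {
            rows[$12]++
            cpu[$12] += $9
        }
        END {
            for (cmd in rows)
                printf "%8.1f %4d %s\n", cpu[cmd], rows[cmd], cmd
        }
    ' "${1:-./top_procs}" | sort -rn
}
```

        Run it as "summarize_top ./top_procs".  A command with both a large total and a large row count was busy in most repetitions, which is exactly the kind of process worth writing down.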

Tweaking MySQL

        If one of the top processes is MySQL (it usually shows up as mysqld), you may need to tweak MySQL for the load.  There are a whole bunch of articles out there on tweaking MySQL, so I'm not going to go into too much detail here.  What you will want to do is raise key_buffer_size, query_cache_size, thread_cache_size, and table_cache (called table_open_cache in newer MySQL versions) to larger values.  Note that the query cache was removed in MySQL 8.0, so query_cache_size only applies to older versions.  Be careful not to go too big: these settings can easily eat up all available RAM.
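        Before changing anything, it helps to see what is currently set.  The sketch below simply greps a MySQL option file for the settings named above.  The function name show_mysql_caches is made up for this example, and the default path /etc/my.cnf is an assumption; your distribution may keep the file in /etc/mysql/ or split it across several files.

```shell
# Print the cache-related settings found in a MySQL option file.
# Option names may be written with underscores or dashes, so match both.
show_mysql_caches() {
    grep -E '^[[:space:]]*(key[_-]buffer[_-]size|query[_-]cache[_-]size|thread[_-]cache[_-]size|table[_-]cache|table[_-]open[_-]cache)[[:space:]]*=' "${1:-/etc/my.cnf}"
}
```

        A setting that prints nothing is not set in that file, so MySQL is using its built-in default for it.  To see the values the running server is actually using, you can also run SHOW VARIABLES LIKE 'key_buffer_size'; from the mysql client.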

Tweaking Apache

        Apache may appear in the list as apache or httpd.  Now, I'm not going to get into tweaking Apache for two reasons.  First, I don't use Apache, so I'm not familiar with tweaking it, and second, there is a whole host of articles on the internet devoted to tweaking Apache.

What if it's something else?

        Now this is where things get interesting.  Are you noticing something else using your CPU time?  There are a few common culprits that like to cause high load.  The two biggest ones are SpamAssassin and Sendmail.  If you need to have SpamAssassin running, set up your mail delivery to discard every message it marks as spam (send it to /dev/null, a blackhole) instead of storing it.  If you don't need it, disable it... It's a great program, but it uses a lot of CPU time to do what it does.  Also disable all "Catch All" e-mail accounts, since they accept mail for every nonexistent address and add time to the spool.

        So you've tweaked the server.  It's running faster and more efficiently.  For now.  As time goes on, you may need to tweak some more (as your load changes, or resources change, etc).  That's what administering a server is all about.  Your job is never done.  However, you really should install some kind of server monitoring tool such as SICM or MRTG, and let it watch your server load.  That way you can identify patterns in load, and determine if the problem is with too many users, or something else.  I also suggest moving away from Apache to Lighttpd, as it tends to use less memory and less CPU time, and is often significantly faster.
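        If you are not ready to install a full monitoring tool, a few lines of shell run from cron will at least give you a load history to look at.  This is only a sketch and no replacement for a real monitoring tool.  The function names log_load and load_above are made up for this example, and it reads /proc/loadavg, which exists on Linux only.

```shell
# Append one timestamped line with the 1-, 5- and 15-minute load averages.
# $1 = log file, $2 = load source (defaults to /proc/loadavg, Linux only)
log_load() {
    read -r one five fifteen rest < "${2:-/proc/loadavg}"
    printf '%s %s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$one" "$five" "$fifteen" >> "$1"
}

# Succeed (exit status 0) when the 1-minute load average is above a threshold.
# $1 = threshold, $2 = load source (defaults to /proc/loadavg)
load_above() {
    read -r one rest < "${2:-/proc/loadavg}"
    # awk does the comparison because plain sh cannot compare decimals.
    awk -v l="$one" -v t="$1" 'BEGIN { exit !(l > t) }'
}
```

        For example, "log_load ./load_history" run every few minutes builds up a history you can scan for patterns, and "load_above 4 && top -b -i -n 20 >> ./top_procs" takes the capture from the start of this article only when the load is actually high.  The threshold of 4 is just a placeholder; pick one that suits the number of CPU cores in your server.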

