Monday, April 28, 2014

Custom Splunk monitor parameters in Puppet

I'm using the example42 Splunk Puppet module from Puppet Forge, which is great for installing Splunk but doesn't document on its Forge page how to create a custom monitor:

# https://github.com/example42/puppet-splunk/blob/master/manifests/input/monitor.pp
splunk::input::monitor { "messages":
  path => "/var/log/messages",
  index => "main",
  sourcetype => "messages",
  ignoreOlderThan => "2d",
  blacklist => "\.(txt|gz|bz2)",
}

I had to hunt through the GitHub repo for this module to figure out the above code.  You declare this in a separate class from the main install class.  Hopefully this quick post will help someone in the future!
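For what it's worth, here is a minimal sketch of that layout.  The profile::splunk_monitors class and the node name are made up for illustration, and the example42 install class may need parameters depending on your setup:

# Hypothetical wrapper class for your monitor definitions -- keep these
# separate from the class that installs Splunk.
class profile::splunk_monitors {
  splunk::input::monitor { "messages":
    path       => "/var/log/messages",
    index      => "main",
    sourcetype => "messages",
  }
}

# Then, on the node that runs Splunk:
node 'loghost.example.com' {
  include splunk                     # however you declare the example42 install class
  include profile::splunk_monitors
}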

Puppet Exported Resources

As part of my job I'm installing Bacula via Puppet. The server component of Bacula requires a definition for each client that it pulls data from. One way to manage this is to create a subdirectory in the Bacula configuration directory and have Bacula parse every *.conf file within it:

# include everything in /etc/bacula/clients
# this is where backup clients (fd or file daemons in bacula terminology) put their configuration
@|"find /etc/bacula/clients -name '*.conf' -type f -exec echo @{} \;"

To manage this in Puppet you could maintain a separate file resource and template for each client computer, which might work for smaller deployments. If you have more than a couple of clients, though, you really need something more automated and elegant.

Enter Puppet's "Exported Resources".  These nifty bits of code are a way of saying "for each computer that has Puppet class X installed, do something".  You need PuppetDB for this to work, as Puppet queries the stored data about each computer when the "collector" class is run.

First you have to define a resource to export.  This doesn't get instantiated when you run the "client" module, so it won't create anything on the client.  Here's a quick example:

  # this resource can then be exported using the @@ directive
  # and collected by the bacula_server class to actually create the
  # required files on the server side
  define bacula_client_config (
    $client_hostname,
    $dirs_to_backup,
    $backup_schedule,
  ) {
    file { "/etc/bacula/clients/${client_hostname}.conf":
      ensure  => present,
      owner   => "bacula",
      group   => "bacula",
      mode    => "0755",
      content => template("example/bacula-client.conf.erb"),
    }
  }

The parameters in the definition are simply data to be filled in by the template.
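The template itself isn't shown above, but a bare-bones, hypothetical example/bacula-client.conf.erb might look something like this.  The exact Bacula directives, passwords, pools and so on will depend on your director configuration:

  # Hypothetical bacula-client.conf.erb -- a minimal sketch only.
  # @client_hostname, @dirs_to_backup and @backup_schedule come from the
  # parameters of the bacula_client_config define.
  Client {
    Name     = "<%= @client_hostname %>-fd"
    Address  = <%= @client_hostname %>
    Password = "changeme"    # in real life, pass this in as another parameter
  }

  FileSet {
    Name = "<%= @client_hostname %>-fileset"
    Include {
      Options { signature = MD5 }
  <% @dirs_to_backup.each do |dir| -%>
      File = <%= dir %>
  <% end -%>
    }
  }

  Job {
    Name     = "backup-<%= @client_hostname %>"
    Type     = Backup
    Client   = "<%= @client_hostname %>-fd"
    FileSet  = "<%= @client_hostname %>-fileset"
    Schedule = "<%= @backup_schedule %>"
  }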

Later in the client class we export this newly defined resource like so:

  # export the above definition into the puppetdb for later collection by the bacula_server class
  @@bacula_client_config { $::fqdn:
    client_hostname => $::fqdn,
    dirs_to_backup  => $backup_directory_list,
    backup_schedule => $backup_schedule_name,
  }

Again, this is pretty straightforward stuff once you know what's happening.  The "@@" means "export this defined type into PuppetDB for later reuse."

Now when we run that class on our Puppet client computers, PuppetDB stores the exported resource.  Obviously this means that the class has to have been applied to every computer you want included in the server's configuration before you run the server configuration class in Puppet.
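To make that concrete, here's a sketch of what the node definitions might look like.  The node names and the example::software::bacula_server class are hypothetical; example::software::bacula_client is the class that performs the export above:

  # hypothetical site.pp excerpt
  node /^web\d+\.example\.com$/ {
    include example::software::bacula_client   # exports its bacula_client_config
  }

  node 'backup01.example.com' {
    include example::software::bacula_server   # contains the collector shown below
  }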

Now on the server Puppet module, we create a collector:

  # collect the exported resources from any servers (i.e. nodes in puppet-speak)
  # that have example::software::bacula_client installed
  Example::Software::Bacula_client::Bacula_client_config <<||>> {
    notify  => Service['bacula-dir'],
  }

This uses the "spaceship operator", <<||>>, which is a kind of "for each exported resource of this type."  In other words, for every Bacula client, instantiate the bacula_client_config defined resource and then notify (restart) the Bacula director service.

Now when we run the bacula server Puppet module, we get one file for each client saved in /etc/bacula/clients.

Puppet makes this sort of computer management fairly simple.  Puppet itself uses exported resources in its Nagios classes, but Puppet Labs' Nagios management code is fairly impenetrable, as it relies on multiple external custom Ruby libraries to create the files.  I hope the example above was informative!

Tuesday, March 16, 2010

Tidbits Part One

While studying for an interview, I found a little bit of shell syntax that I didn't know, namely brace expansion in bash:
anthony@calcifer:~/Transfer$ mkdir -pv testcode/{one,two,three}
mkdir: created directory `testcode'
mkdir: created directory `testcode/one'
mkdir: created directory `testcode/two'
mkdir: created directory `testcode/three'
That's a little heavy for most day-to-day shell activities, but it could be useful in a script. For example, say you want to operate on a large number of text files, automatically create 'in', 'out' and 'log' directories for each one, then copy each text file into its script/filename/in directory:
for i in *.txt; do mkdir -p "script/$i"/{in,out,log}; cp "$i" "script/$i/in"; done
Brace expansion is explained in a little more detail at Eric Bergen's blog.

Monday, March 15, 2010

which process is using the most IO?

Discovering which process uses the most IO on a Linux server is something everyone will run into sooner or later.  If you have a somewhat modern variant of Linux, running a kernel version of at least 2.6.20, then you are in luck, as the kernel stores lots of useful IO information.

The easiest way to monitor the hungriest IO process is via the wonderful monitoring tool 'dstat' by the maintainer of the Dag RPM archive.  You can grab dstat from http://dag.wieers.com/home-made/dstat/

Once you have dstat installed, run 'dstat --top-io' or 'dstat --bw --top-io' if you have a white background terminal.  Dstat will then start displaying, one per line, the process with the most IO.  Here's an example on my home server, showing the small load created by Winamp scanning my Samba share for new mp3s:
root@calcifer:~# dstat --bw --top-io
----most-expensive----
     i/o process      
init [3]    623k 7154B
smbd        223k  200k
smbd        235k  205k
smbd        238k  210k
smbd        232k  204k

The great thing about dstat is that its plugins, like the one displaying IO above, are all written in Python.  As Python is fairly easy to read if you're familiar with any programming language, we can figure out just where in the system the IO stats are stored.

The plugins for dstat are stored in /usr/share/dstat on my system.  Viewing the source for 'dstat_top_io.py'  gives us the following information:
http://svn.rpmforge.net/svn/trunk/tools/dstat/plugins/dstat_top_io.py
  • Line 16 checks for the existence of /proc/self/io
  • Line 22 loops over a list of current processes returned from the function 'proc_pidlist'
  • Line 31 grabs the process name from /proc/PID/stat
  • Lines 34 through 44 grab the IO stats from /proc/PID/io
  • Lines 49 through 61 get the highest IO process and display it
From that we can figure out that /proc has a wealth of information for us to browse.  If we couldn't install dstat (maybe you're trying to diagnose a system that is locked down and cannot be changed), we could still write some scripts to grab the IO stats for the currently running processes.
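For example, here's a rough bash sketch (not the dstat plugin itself, just an illustration of the same idea) that dumps the read and write byte counters for every process we're allowed to look at:

#!/bin/bash
# Rough sketch only: print the read/write byte counters from /proc/PID/io
# for every process we can read.  Needs a 2.6.20+ kernel, and usually root
# to see processes owned by other users.
for piddir in /proc/[0-9]*; do
    [ -r "$piddir/io" ] || continue
    pid=${piddir#/proc/}
    # field 2 of /proc/PID/stat is the process name, wrapped in parentheses
    name=$(awk '{print $2}' "$piddir/stat" | tr -d '()')
    reads=$(awk '/^read_bytes/ {print $2}' "$piddir/io")
    writes=$(awk '/^write_bytes/ {print $2}' "$piddir/io")
    printf '%-8s %-16s read=%-14s write=%s\n' "$pid" "$name" "$reads" "$writes"
done

Sort that output, or run it under watch, and you're most of the way to what the dstat plugin does.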

An Example

We have a Linux server that does multiple tasks;  it stores home directories and also serves as a database server, DNS host, backup server, and terminal server.  You've received a complaint that the server seems slow today, so you start to look at the problem.  The load on the server is definitely high, but you're not sure which process is really overloading the system.

We'd first need to confirm that we are looking for the right thing;  tools like 'iostat' and 'sar' can show us per-device IO stats (which will be covered in a different post!).  Once you have per-device IO stats, you can try to figure out which processes are using that device ('lsof' would help here).  Maybe your guess is that Samba is thrashing a user directory volume shared to lots of Windows users.
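As a very rough illustration (the mount point here is made up), those first two steps might look like:

iostat -x 5                 # extended per-device stats every 5 seconds; look for a busy device
lsof +D /srv/samba/homes    # then see who has files open under the suspect mount point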

So from that, we think we should be monitoring the smbd processes on the system.  Using 'ps -C smbd -o pid --no-heading' we get our list of smbd processes.  We could take that and look at each one manually, but it's much more fun (for a given value of fun) to automate this a little more.  We can pipe the output of ps into xargs and then get the read bytes from /proc/PID/io.  With a little xargs trickery, we get a somewhat readable list of processes and the amount of data they are reading:
root@calcifer:~# ps -C smbd -o pid --no-heading | xargs -I {} bash -c 'echo -n "{} -> "; cat /proc/{}/io | grep read_bytes' 
2426 -> read_bytes: 91269668864
2465 -> read_bytes: 0
30935 -> read_bytes: 378146816
31215 -> read_bytes: 132063232
31367 -> read_bytes: 94208

As you can see, the information presented is still in a raw state and needs interpretation.  One way to do that would be to run the command above within a 'watch' process and see which processes are reading (or writing) the most bytes.  You could duplicate the functionality of the dstat top-io plugin in Perl or a bash script (although I'd prefer the easy route and use dstat!).
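For instance, wrapping the same one-liner in watch refreshes the counters every couple of seconds:

watch -n 2 "ps -C smbd -o pid --no-heading | \
    xargs -I {} bash -c 'echo -n \"{} -> \"; cat /proc/{}/io | grep read_bytes'"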

From the output above, repeated over a few minutes, you think that process 30935 is the culprit.  Its read_bytes value keeps climbing far faster than the others.  From this we can run 'lsof -p 30935' to get a list of the files it has open.

From here we have to use our imagination;  perhaps a user is trying to store their mp3 collection in their home directory, or perhaps a department share is being used for very large Photoshop files.

I hope this post has been informative.  It's my first, so please comment if you feel I need to improve in some areas!