Simple Fusion IO Monitor

July 10, 2013 - 12:00 pm

I had a powerful need to monitor a single Fusion IO card: a simple check of media status, capacity reserves, blocks good and pages good, with an e-mail whenever any of them crosses its threshold.
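Each of the four checks is a straight comparison against a threshold. As a minimal sketch (not the script itself), here they are pulled into a hypothetical `check_section` helper that takes one section as a plain hash, so the logic can be tried without a Fusion IO card. The sample values are made up.

```ruby
# Hypothetical helper: run the four fio-status checks against one section.
# Key names mirror the ones the script reads from `fio-status -fk`.
def check_section(name, section, thresholds)
  alerts = []
  status = section["media_status"]
  alerts << "Device #{name}: media_status has failed: #{status} (vs Healthy)" unless status == "Healthy"
  thresholds.each do |key, limit|
    value = section[key]
    # A missing key becomes 0.0 via nil.to_f, so it also raises an alert.
    if value.to_f < limit
      alerts << "Device #{name}: #{key} is #{value} (vs #{limit})"
    end
  end
  alerts
end

thresholds = {
  "capacity_reserves_percent" => 75.0,
  "blocks_good_percent"       => 75.0,
  "pages_good_percent"        => 75.0,
}

# Made-up sample values, not real card output.
sample = {
  "media_status"              => "Healthy",
  "capacity_reserves_percent" => "62.5",
  "blocks_good_percent"       => "99.1",
  "pages_good_percent"        => "99.9",
}

puts check_section("ioDimm0", sample, thresholds)
# prints: Device ioDimm0: capacity_reserves_percent is 62.5 (vs 75.0)
```

Treating a missing key as 0.0 is a fail-safe choice: if a firmware update renames a field, you get an e-mail instead of silence.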

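Adjust $confEmailTo and $confEmailFrom, and optionally the confCapReserveAlarm, confBlocksGoodAlarm and confPagesGoodAlarm thresholds. Alerts go out through the mail gem's default delivery method, SMTP to localhost on port 25, so the box needs a local MTA listening there.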
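If hand-editing those variables on every host gets old, one option is to pull them from the environment instead. This is a minimal sketch, not something the script does: `load_config` and the `FIO_*` variable names are hypothetical, and the defaults are placeholders.

```ruby
# Hypothetical helper: read alert settings from environment variables,
# falling back to placeholder defaults. The FIO_* names are made up.
def load_config(env = ENV)
  {
    email_to:          env.fetch("FIO_EMAIL_TO", "alerts@example.org"),
    email_from:        env.fetch("FIO_EMAIL_FROM", "root@server.example.org"),
    # Float() raises ArgumentError on junk such as "seventy"; String#to_f
    # would return 0.0 instead.
    cap_reserve_alarm: Float(env.fetch("FIO_CAP_RESERVE_ALARM", "75.0")),
    blocks_good_alarm: Float(env.fetch("FIO_BLOCKS_GOOD_ALARM", "75.0")),
    pages_good_alarm:  Float(env.fetch("FIO_PAGES_GOOD_ALARM", "75.0")),
  }
end

# Usage: pass a plain hash to try it out, or call load_config with no
# arguments to read the real environment.
config = load_config("FIO_EMAIL_TO" => "ops@example.org", "FIO_PAGES_GOOD_ALARM" => "90")
p config[:email_to]          # prints: "ops@example.org"
p config[:pages_good_alarm]  # prints: 90.0
```

The point of `Float()` over `to_f` is that a mistyped threshold stops the run; a silent 0.0 threshold would never trip, which quietly disables that check.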

Run on cron, run manually, run by intern, it's all the same to me.

#!/usr/bin/ruby
###############################################################################
# Copyright (c) 2013, Workhabit, Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
#    * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
#    * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
#    * Neither the name of Workhabit, Inc., nor the names of its contributors
# may be used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
###############################################################################

# Authored by Gary Gogick (gary@workhabit.com)
#
# This is a simple script to monitor certain fio-status parameters and alert
# via e-mail if set thresholds are breached.  Specifically, it checks the
# media status, capacity reserves, blocks good and pages good parameters
# of all ioDimm* sections of fio-status -fk.
#
# Configuration parameters are fairly straightforward; the conf*Alarm
# variables should be floats (e.g., 25.0 for 25%). An alert fires when the
# matching fio-status value drops below its threshold.
#
# The following gems are required:
#    inifile
#    mail

require "rubygems"
require "inifile"
require "mail"

### Configuration
$confEmailTo = "alerts@example.org"
$confEmailFrom = "root@server.example.org"
confCapReserveAlarm = 75.0
confBlocksGoodAlarm = 75.0
confPagesGoodAlarm = 75.0

### Helper functions
def sendmail(subj, msg)
    Mail.deliver do
        from $confEmailFrom
        to $confEmailTo
        subject subj
        body msg
    end
end

### Initial setup and fio probe
# Set message variables
host = `hostname`.strip
message = ""

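# Generate fio status report; bail out loudly if fio-status fails, since an
# empty report would otherwise mean no checks and no alert.
unless system('fio-status -fk > /tmp/fio-status')
    sendmail("FusionIO alert on #{host}", "fio-status -fk failed to run (exit status: #{$?.exitstatus.inspect})")
    exit 1
end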

# Load report into IniFile
status = IniFile.load("/tmp/fio-status")

### Alert handling
# Loop through ioDimm sections
status.each_section do |i|
    # This is the [ioDimm *] section; additional sections or checks within this section should be easy to add.
    # /ioDimm/ matches any section name containing "ioDimm".
    if i =~ /ioDimm/
        section = status[i]
        if section['media_status'] != 'Healthy'
            message += "Device #{i}: media_status has failed: #{section['media_status']} (vs Healthy)\n"
        end
        if section['capacity_reserves_percent'].to_f < confCapReserveAlarm
            message += "Device #{i}: capacity_reserves_percent is #{section['capacity_reserves_percent']} (vs #{confCapReserveAlarm})\n"
        end
        if section['blocks_good_percent'].to_f < confBlocksGoodAlarm
            message += "Device #{i}: blocks_good_percent is #{section['blocks_good_percent']} (vs #{confBlocksGoodAlarm})\n"
        end
        if section['pages_good_percent'].to_f < confPagesGoodAlarm
            message += "Device #{i}: pages_good_percent is #{section['pages_good_percent']} (vs #{confPagesGoodAlarm})\n"
        end
    end
end

# Send e-mail alert
unless message.empty?
    sendmail("FusionIO alert on #{host}", message)
end
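If you'd rather skip the temp file in /tmp and the inifile gem altogether, here is a minimal stdlib-only sketch that captures `fio-status -fk` output directly with `Open3.capture2` and raises when the command exits non-zero. The tiny parser assumes the output is INI-style `[section]` headers plus `key=value` lines, which is what the script's use of IniFile implies. `parse_ini` and `read_fio_status` are hypothetical helpers, and the sample text is made up, not real fio-status output.

```ruby
require "open3"

# Parse INI-style text into { "section" => { "key" => "value" } }.
# Assumes "[section]" headers and "key=value" lines; anything else is skipped.
def parse_ini(text)
  sections = {}
  current = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#", ";")
    if (m = line.match(/\A\[(.+)\]\z/))
      current = sections[m[1]] = {}
    elsif current && line.include?("=")
      key, value = line.split("=", 2)
      current[key.strip] = value.strip
    end
  end
  sections
end

# Run the command without a shell and fail loudly on a non-zero exit.
# A missing binary raises Errno::ENOENT from Open3 itself.
def read_fio_status(cmd = ["fio-status", "-fk"])
  out, status = Open3.capture2(*cmd)
  raise "#{cmd.join(' ')} exited with #{status.exitstatus}" unless status.success?
  parse_ini(out)
end

# Usage with made-up sample text:
sample = "[ioDimm fct0]\nmedia_status=Healthy\npages_good_percent=99.5\n"
sections = parse_ini(sample)
p sections["ioDimm fct0"]["pages_good_percent"]  # prints: "99.5"
```

Values stay strings here, so the `.to_f` comparisons in the alert loop work on them unchanged.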