#!/bin/bash
set -e
: << =cut

=head1 DESCRIPTION

service_events - Tracks the number of significant event occurrences per service

This plugin is a riff on the loggrep family (C<loggrep> and my own C<loggrepx_>).
However, rather than focusing on single log files, it focuses on providing
insight into all "significant events" happening for a given service, which
may be found across several log files.

The idea is that any given service may produce events in various areas of
operation. For example, while a typical web app might log runtime errors
to its app.log file, a filesystem change may prevent the whole app from
even being bootstrapped, and this crucial error may be logged in an apache
log or in syslog.

This plugin attempts to give visibility into all such "important events"
that may affect the proper functioning of a given service. It attempts to
answer the question, "Is my service running normally?".

Unfortunately, it won't help you track down exactly where the events are
coming from if you happen to be watching a number of different logs, but
it will at least let you know that something is wrong and that action
should be taken. To try to help with this, the plugin uses the extinfo
field to list which logs currently have important events in them.

The plugin can be included multiple times to create graphs for various
differing kinds of services. For example, you may have both webservices
and system cleanup services, and you want to keep an eye on them in
different ways.

You can accomplish this by linking the plugin twice with different names
and providing different configuration for each instance. In general, you
should think of a single instance of this plugin as representing a single
class of services.

=head1 CONFIGURATION

Configuration for this plugin is admittedly complicated. What we're doing
here is defining groups of logfiles that we're searching for various
kinds of events. It is assumed that the I<way> we search for events in the
logfiles is related to the type of logfile; thus, we associate match
criteria with logfile groups. Then, we define services that we want to
track, then mappings of logfile paths to those services.

(Note that most instances will probably work best when run as root, since
log files are usually, or at least should be, protected by strict
permissions.)

Available config options include the following:

    Plugin-specific:

    env.<type>_logfiles      - (reqd) Shell glob pattern defining logfiles of
                               type <type>
    env.<type>_regex         - (reqd) egrep pattern for finding events in logs
                               of type <type>
    env.services             - (optl) Space-separated list of service names
    env.services_autoconf    - (optl) Shell glob pattern that expands to paths
                               whose final member is the name of a service
    env.<service>_logbinding - (optl) egrep pattern for binding <service> to
                               a given set of logfiles (based on path)
    env.<service>_warning    - (optl) service-specific warning level override
    env.<service>_critical   - (optl) service-specific critical level override

    Munin-standard:

    env.title                - Graph title
    env.vlabel               - Custom label for the vertical axis
    env.warning              - Default warning level
    env.critical             - Default critical level

For plugin-specific options, the following rules apply:

* C<< <type> >> is any arbitrary string. It just has to match between
C<< <type>_logfiles >> and C<< <type>_regex >>. Common values are "apache",
"nginx", "apt", "syslog", etc.

* C<< <service> >> is a string derived by passing the service name through a
filter that strips any leading non-alphabetic characters and then replaces each
run of non-alphanumeric characters with a single underscore (C<_>). For example,
the service C<www.example.com> is referenced as C<www_example_com>.

* Logfiles are bound to services by matching C<< <service>_logbinding >> on the
full logfile path. For example, specifying C<my_site_logbinding=my-site> would
bind both F</var/log/my-site/errors.log> and F</srv/www/my-site/logs/app.log>
to the defined C<my-site> service.

=head2 SERVICE AUTOCONF

Because services are often dynamic and you don't want to have to manually update
config every time you deploy a new service, you have the option of defining a
glob pattern that resolves to a collection of paths whose endpoints are service
names. Because of the way services are deployed in real life, it's fairly common
that paths will exist on your system that can accommodate this. Most often it
will be something like F</srv/*/*>, which would match, for example, all children
of F</srv/www/> and F</srv/local/>.

If you choose not to use the autoconf feature, you MUST specify services as a
space-separated list of service names in the C<services> variable.

=head2 EXAMPLE CONFIGS

This example uses services autoconf:

    [service_events]
    user root
    env.services_autoconf /srv/*/*
    env.cfxsvc_logfiles /srv/*/*/logs/app.log
    env.cfxsvc_regex error|alert|crit|emerg
    env.phpfpm_logfiles /srv/*/*/logs/php-fpm*.log
    env.phpfpm_regex Fatal error
    env.apache_logfiles /srv/*/*/logs/errors.log
    env.apache_regex error|alert|crit|emerg
    env.warning 1
    env.critical 5
    env.my_special_service_warning 100
    env.my_special_service_critical 300

This example DOES NOT use services autoconf:

    [service_events]
    user root
    env.services auth.example.com admin.example.com www.example.com
    env.auth_example_com_logbinding my-custom-binding[0-9]+
    env.cfxsvc_logfiles /srv/*/*/logs/app.log
    env.cfxsvc_regex error|alert|crit|emerg
    env.phpfpm_logfiles /srv/*/*/logs/php-fpm*.log
    env.phpfpm_regex Fatal error
    env.apache_logfiles /srv/*/*/logs/errors.log
    env.apache_regex error|alert|crit|emerg
    env.warning 1
    env.critical 5
    env.auth_example_com_warning 100
    env.auth_example_com_critical 300
    env.www_example_com_warning 50
    env.www_example_com_critical 100

This graph will ONLY ever show values for the three listed services, even
if other services are installed whose logfiles match the logfiles search.

Also notice that in this example, we've only listed a log binding for the
auth service. The plugin will use the service name by default for any
services that don't specify a log binding, so in this case, auth has a
custom log binding, while all other services have log bindings equal to
their names.

=head1 AUTHOR

Kael Shipman <kael.shipman@gmail.com>

=head1 LICENSE

MIT LICENSE

Copyright 2018 Kael Shipman <kael.shipman@gmail.com>

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

=head1 MAGIC MARKERS

 #%# family=manual

=cut
services_autoconf=${services_autoconf:-}
# Get list of all currently set env variables
vars=$(printenv | cut -f 1 -d "=")
# Certain variables MUST be set; check that they are (using bitmask)
setvars=0
reqvars=(_logfiles _regex)
while read -u 3 -r v; do
    n=0
    while [ "$n" -lt "${#reqvars[@]}" ]; do
        if echo "$v" | grep -Eq "${reqvars[$n]}$"; then
            setvars=$((setvars | (2 ** n) ))
        fi
        n=$((n+1))
    done
done 3< <(echo "$vars")
# Sum all required variables
n=0
allvars=0
while [ "$n" -lt "${#reqvars[@]}" ]; do
    allvars=$(( allvars + 2 ** n ))
    n=$((n+1))
done
# And scream if something's not set
if ! [ "$setvars" -eq "$allvars" ]; then
    >&2 echo "E: Missing some required variables:"
    >&2 echo
    n=0
    i=1
    while [ "$n" -lt "${#reqvars[@]}" ]; do
        if [ $(( setvars & i )) -eq 0 ]; then
            >&2 echo " *${reqvars[$n]}"
        fi
        i=$((i<<1))
        n=$((n+1))
    done
    >&2 echo
    >&2 echo "Please read the docs."
    exit 1
fi
# Check for more difficult variables
if [ -z "$services" ] && [ -z "$services_autoconf" ]; then
    >&2 echo "E: You must pass either \$services or \$services_autoconf"
    exit 1
fi
if [ -z "$services_autoconf" ] && ! echo "$vars" | grep -q "_logbinding"; then
    >&2 echo "E: You must pass either \$*_logbinding (for each service) or \$services_autoconf"
    exit 1
fi
# Now go find all log files
LOGFILES=
declare -a LOGFILEMAP
while read -u 3 -r v; do
    if echo "$v" | grep -Eq "_logfiles$"; then
        # Get the name associated with these logfiles
        logfiletype="${v%_logfiles}"
        # This serves to expand globs while preserving spaces (and also appends the necessary newline).
        # IFS is emptied in the subshell so the unquoted ${!v} is glob-expanded but not word-split.
        while IFS= read -u 4 -r -d$'\n' line; do
            LOGFILEMAP+=("$logfiletype")
            LOGFILES="${LOGFILES}$line"$'\n'
        done 4< <(IFS= ; for f in ${!v}; do echo "$f"; done)
    fi
done 3< <(echo "$vars")
# Set some defaults and other values
title="${title:-Important Events per Service}"
vlabel="${vlabel:-events}"
# If services_autoconf is passed, it is assumed to be a shell glob, the leaves of which are the services
# This also autobinds the service, if not already bound
if [ -n "$services_autoconf" ]; then
    declare -a services
    IFS=
    for s in $services_autoconf; do
        s="$(basename "$s")"
        services+=("$s")
    done
    unset IFS
else
    services=($services)
fi
# Import munin functions
. "$MUNIN_LIBDIR/plugins/plugin.sh"
# Now get to the real function definitions
function config() {
    echo "graph_title ${title}"
    echo "graph_args --base 1000 -l 0"
    echo "graph_vlabel ${vlabel}"
    echo "graph_category other"
    echo "graph_info Lists number of matching lines found in various logfiles associated with each service. Extinfo displays currently affected logs."
    local var_prefix
    while read -u 3 -r svc; do
        var_prefix="$(echo "$svc" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"
        echo "$var_prefix.label $svc"
        print_warning "$var_prefix"
        print_critical "$var_prefix"
        echo "$var_prefix.info Number of event occurrences for $svc"
    done 3< <(IFS=$'\n'; echo "${services[*]}")
}
function fetch() {
    local curstate n svcnm varnm service svc svc_counter_var logbinding logfile lognm logmatch prvlines curlines matches extinfo_var
    local nextstate=()
    # Load state
    touch "$MUNIN_STATEFILE"
    curstate="$(cat "$MUNIN_STATEFILE")"
    # Set service counters to 0 and default any unset logbinding to the service name
    while read -u 3 -r svc; do
        svcnm="$(echo "$svc" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"
        typeset "${svcnm}_total=0"
        varnm="${svcnm}_logbinding"
        # Check the configured variable itself; the statefile only holds line counts,
        # so looking there would always override a custom binding
        if [ -z "${!varnm:-}" ]; then
            typeset "$varnm=$svc"
        fi
    done 3< <(IFS=$'\n'; echo "${services[*]}")
    n=0
    while read -u 3 -r logfile; do
        # Handling trailing newline
        if [ -z "$logfile" ]; then
            continue
        fi
        # Make sure the logfile exists
        if [ ! -e "$logfile" ]; then
            >&2 echo "Logfile '$logfile' doesn't exist. Skipping."
            n=$((n+1))
            continue
        fi
        # Find which service this logfile is associated with
        service=
        while read -u 4 -r svc; do
            logbinding="$(echo "$svc" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')_logbinding"
            if echo "$logfile" | grep -Eq "${!logbinding}"; then
                service="$svc"
                break
            fi
        done 4< <(IFS=$'\n'; echo "${services[*]}")
        # Skip this log if it's not associated with any service
        if [ -z "$service" ]; then
            >&2 echo "W: No service associated with log $logfile. Skipping."
            # Advance the index so LOGFILEMAP stays in step with the logfile list
            n=$((n+1))
            continue
        fi
        # Get shell-compatible names for service and logfile
        svcnm="$(echo "$service" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"
        lognm="$(echo "$logfile" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"
        # Get previous line count to determine whether or not the file may have been rotated (defaulting to 0)
        prvlines="$(echo "$curstate" | grep "^${lognm}_lines=" | cut -f 2 -d "=")"
        prvlines="${prvlines:-0}"
        # Get the current number of lines in the file (defaulting to 0 on error)
        curlines="$(wc -l < "$logfile" || echo 0)"
        # If the current line count is less than the previous line count, we've probably rotated.
        # Reset to 0.
        if [ "$curlines" -lt "$prvlines" ]; then
            prvlines=0
        else
            prvlines=$((prvlines + 1))
        fi
        # Get incidents starting at the line after the last line we've seen
        logmatch="${LOGFILEMAP[$n]}_regex"
        matches="$(tail -n +"$prvlines" "$logfile" | grep -Ec "${!logmatch}" || true)"
        # If there were matches, aggregate them and add this log to the extinfo for the service
        if [ "$matches" -gt 0 ]; then
            # Aggregate and add to the correct service counter
            svc_counter_var="${svcnm}_total"
            matches=$((matches + ${!svc_counter_var}))
            typeset "$svc_counter_var=$matches"
            # Add this log to extinfo for service
            extinfo_var="${svcnm}_extinfo"
            typeset "$extinfo_var=${!extinfo_var}$logfile, "
        fi
        # Push onto next state
        nextstate+=("${lognm}_lines=$curlines")
        n=$((n+1))
    done 3< <(echo "$LOGFILES")
    # Write state to munin statefile
    (IFS=$'\n'; echo "${nextstate[*]}" > "$MUNIN_STATEFILE")
    # Now echo values
    while read -u 3 -r svc; do
        svcnm="$(echo "$svc" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"
        svc_counter_var="${svcnm}_total"
        extinfo_var="${svcnm}_extinfo"
        echo "${svcnm}.value ${!svc_counter_var}"
        echo "${svcnm}.extinfo ${!extinfo_var}"
    done 3< <(IFS=$'\n'; echo "${services[*]}")
    return 0
}
case "$1" in
    config) config ;;
    *) fetch ;;
esac