#!/bin/bash

set -e

: << =cut

=head1 DESCRIPTION

service_events - Tracks the number of significant event occurrences per service

This plugin is a riff on the loggrep family (C<loggrep> and my own C<loggrepx_>).
However, rather than focusing on single log files, it focuses on providing
insight into all "significant events" happening for a given service, which
may be found across several log files.

The idea is that any given service may produce events in various areas of
operation. For example, while a typical web app might log runtime errors
to its app.log file, a filesystem change may prevent the whole app from
even being bootstrapped, and this crucial error may be logged in an apache
log or in syslog.

This plugin attempts to give visibility into all such "important events"
that may affect the proper functioning of a given service. It attempts to
answer the question, "Is my service running normally?".

Unfortunately, it won't help you track down exactly where the events are
coming from if you happen to be watching a number of different logs, but
it will at least let you know that something is wrong and that action
should be taken. To help with this, the plugin uses the extinfo
field to list which logs currently have important events in them.

The plugin can be included multiple times to create graphs for
different kinds of services. For example, you may have both webservices
and system cleanup services, and you want to keep an eye on them in
different ways.

You can accomplish this by linking the plugin twice with different names
and providing different configuration for each instance. In general, you
should think of a single instance of this plugin as representing a single
class of services.


=head1 CONFIGURATION

Configuration for this plugin is admittedly complicated. What we're doing
here is defining groups of logfiles that we're searching for various
kinds of events. It is assumed that the I<way> we search for events in the
logfiles is related to the type of logfile; thus, we associate match
criteria with logfile groups. Finally, we define the services that we want
to track and map logfile paths to those services.

(Note that most instances will probably work best when run as root, since
log files are usually (or at least should be) controlled with strict
permissions.)

Available config options include the following:

Plugin-specific:

    env.<type>_logfiles       - (reqd) Shell glob pattern defining logfiles of
                                type <type>
    env.<type>_regex          - (reqd) egrep pattern for finding events in logs
                                of type <type>
    env.services              - (optl) Space-separated list of service names
    env.services_autoconf     - (optl) Shell glob pattern that expands to paths
                                whose final member is the name of a service
    env.<service>_logbinding  - (optl) egrep pattern for binding <service> to
                                a given set of logfiles (based on path)
    env.<service>_warning     - (optl) service-specific warning level override
    env.<service>_critical    - (optl) service-specific critical level override

Munin-standard:

    env.title     - Graph title
    env.vlabel    - Custom label for the vertical axis
    env.warning   - Default warning level
    env.critical  - Default critical level

For plugin-specific options, the following rules apply:

* C<< <type> >> is any arbitrary string. It just has to match between
  C<< <type>_logfiles >> and C<< <type>_regex >>. Common values are "apache",
  "nginx", "apt", "syslog", etc.
* C<< <service> >> is a string derived by passing the service name through a
  filter that strips any leading non-alphabetic characters and replaces each
  run of non-alphanumeric characters with a single underscore (C<_>).
* logfiles are bound to services by matching C<< <service>_logbinding >> on the
  full logfile path. For example, specifying C<my_site_logbinding=my-site> would
  bind both F</var/log/my-site/errors.log> and F</srv/www/my-site/logs/app.log>
  to the defined C<my-site> service.


=head2 SERVICE AUTOCONF

Because services are often dynamic and you don't want to have to manually update
config every time you deploy a new service, you have the option of defining a
glob pattern that resolves to a collection of paths whose endpoints are service
names. Because of the way services are deployed in real life, it's fairly common
that paths will exist on your system that can accommodate this. Most often it
will be something like /srv/*/*, which would match all children in /srv/www/ and
/srv/local/.

If you choose not to use the autoconf feature, you MUST specify services as a
space-separated list of service names in the C<services> variable.


=head2 EXAMPLE CONFIGS

This example uses services autoconf:

    [service_events]
    user root
    env.services_autoconf /srv/*/*
    env.cfxsvc_logfiles /srv/*/*/logs/app.log
    env.cfxsvc_regex error|alert|crit|emerg
    env.phpfpm_logfiles /srv/*/*/logs/php-fpm*.log
    env.phpfpm_regex Fatal error
    env.apache_logfiles /srv/*/*/logs/errors.log
    env.apache_regex error|alert|crit|emerg
    env.warning 1
    env.critical 5
    env.my_special_service_warning 100
    env.my_special_service_critical 300

This example DOES NOT use services autoconf:

    [service_events]
    user root
    env.services auth.example.com admin.example.com www.example.com
    env.auth_example_com_logbinding my-custom-binding[0-9]+
    env.cfxsvc_logfiles /srv/*/*/logs/app.log
    env.cfxsvc_regex error|alert|crit|emerg
    env.phpfpm_logfiles /srv/*/*/logs/php-fpm*.log
    env.phpfpm_regex Fatal error
    env.apache_logfiles /srv/*/*/logs/errors.log
    env.apache_regex error|alert|crit|emerg
    env.warning 1
    env.critical 5
    env.auth_example_com_warning 100
    env.auth_example_com_critical 300
    env.www_example_com_warning 50
    env.www_example_com_critical 100

This graph will ONLY ever show values for the three listed services, even
if other services are installed whose logfiles match the logfiles search.

Also notice that in this example, we've only listed a log binding for the
auth service. The plugin will use the service name by default for any
services that don't specify a log binding, so in this case, auth has a
custom log binding, while all other services have log bindings equal to
their names.


=head1 AUTHOR

Kael Shipman <kael.shipman@gmail.com>


=head1 LICENSE

MIT LICENSE

Copyright 2018 Kael Shipman <kael.shipman@gmail.com>

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.


=head1 MAGIC MARKERS

 #%# family=manual

=cut

services_autoconf=${services_autoconf:-}

# Get list of all currently set env variables
vars=$(printenv | cut -f 1 -d "=")

# Certain variables MUST be set; check that they are (using bitmask)
setvars=0
reqvars=(_logfiles _regex)
while read -u 3 -r v; do
    n=0
    while [ "$n" -lt "${#reqvars[@]}" ]; do
        if echo "$v" | grep -Eq "${reqvars[$n]}$"; then
            setvars=$((setvars | (2 ** n) ))
        fi
        n=$((n+1))
    done
done 3< <(echo "$vars")


# Sum all required variables
n=0
allvars=0
while [ "$n" -lt "${#reqvars[@]}" ]; do
    allvars=$(( allvars + 2 ** n ))
    n=$((n+1))
done

# And scream if something's not set
if ! [ "$setvars" -eq "$allvars" ]; then
    >&2 echo "E: Missing some required variables:"
    >&2 echo
    n=0
    i=1
    while [ "$n" -lt "${#reqvars[@]}" ]; do
        if [ $(( setvars & i )) -eq 0 ]; then
            >&2 echo " *${reqvars[$n]}"
        fi
        i=$((i<<1))
        n=$((n+1))
    done
    >&2 echo
    >&2 echo "Please read the docs."
    exit 1
fi

# Check for more difficult variables
if [ -z "$services" ] && [ -z "$services_autoconf" ]; then
    >&2 echo "E: You must pass either \$services or \$services_autoconf"
    exit 1
fi
if [ -z "$services_autoconf" ] && ! echo "$vars" | grep -q "_logbinding"; then
    >&2 echo "E: You must pass either \$*_logbinding (for each service) or \$services_autoconf"
    exit 1
fi


# Now go find all log files
LOGFILES=
declare -a LOGFILEMAP
while read -u 3 -r v; do
    if echo "$v" | grep -Eq "_logfiles$"; then
        # Get the name associated with these logfiles
        logfiletype="${v%_logfiles}"
        # This serves to expand globs while preserving spaces (and also appends the necessary newline)
        while IFS= read -u 4 -r -d$'\n' line; do
            LOGFILEMAP+=($logfiletype)
            LOGFILES="${LOGFILES}$line"$'\n'
        done 4< <(IFS= ; for f in ${!v}; do echo "$f"; done)
    fi
done 3< <(echo "$vars")


# Set some defaults and other values
title="${title:-Important Events per Service}"
vlabel="${vlabel:-events}"

# If services_autoconf is passed, it is assumed to be a shell glob, the leaves of which are the services
# This also autobinds the service, if not already bound
if [ -n "$services_autoconf" ]; then
    declare -a services
    IFS=
    for s in $services_autoconf; do
        s="$(basename "$s")"
        services+=("$s")
    done
    unset IFS
else
    services=($services)
fi
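
# Illustrative sketch only (not called by the plugin): how a services_autoconf
# glob turns into a list of service names. The helper name `services_from_glob`
# is hypothetical. Note that if nothing matches, bash leaves the pattern
# unexpanded (unless nullglob is set), so its basename would be returned as-is.
function services_from_glob() {
    local pattern="$1" path
    # With IFS empty the pattern is not word-split, so paths containing
    # spaces survive, while pathname expansion still applies
    local IFS=
    for path in $pattern; do
        # The final path member is taken as the service name
        basename "$path"
    done
}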


# Import munin functions
. "$MUNIN_LIBDIR/plugins/plugin.sh"


# Now get to the real function definitions

function config() {
    echo "graph_title ${title}"
    echo "graph_args --base 1000 -l 0"
    echo "graph_vlabel ${vlabel}"
    echo "graph_category other"
    echo "graph_info Lists number of matching lines found in various logfiles associated with each service. Extinfo displays currently affected logs."

    local var_prefix
    while read -u 3 -r svc; do
        var_prefix="$(echo "$svc" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"
        echo "$var_prefix.label $svc"
        print_warning "$var_prefix"
        print_critical "$var_prefix"
        echo "$var_prefix.info Number of event occurrences for $svc"
    done 3< <(IFS=$'\n'; echo "${services[*]}")
}
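
# Illustrative sketch only (not called by the plugin): the name filter that
# config() and fetch() apply inline, written as a single helper. The function
# name `sanitize_name` is hypothetical; the sed expressions are the ones used
# throughout this script and assume a sed that supports -r (e.g. GNU sed).
function sanitize_name() {
    # Strip leading non-letters, then turn every run of characters outside
    # [a-zA-Z0-9] into one underscore
    echo "$1" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g'
}
# For example, `sanitize_name "auth.example.com"` prints auth_example_com,
# which is why that service is configured via env.auth_example_com_logbinding.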


function fetch() {
    local curstate n svcnm varnm service svc svc_counter_var logbinding logfile lognm logmatch prvlines curlines matches extinfo_var
    local nextstate=()

    # Load state
    touch "$MUNIN_STATEFILE"
    curstate="$(cat "$MUNIN_STATEFILE")"

    # Set service counters to 0 and set any logbindings that aren't yet set
    while read -u 3 -r svc; do
        svcnm="$(echo "$svc" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"
        typeset "${svcnm}_total=0"

        # Bindings come from the plugin environment (env.<service>_logbinding),
        # not from the statefile, so check the variable itself and fall back
        # to the service name only when it is empty
        varnm="${svcnm}_logbinding"
        if [ -z "${!varnm}" ]; then
            typeset "$varnm=$svc"
        fi
    done 3< <(IFS=$'\n'; echo "${services[*]}")

    n=0
    while read -u 3 -r logfile; do
        # Handling trailing newline
        if [ -z "$logfile" ]; then
            continue
        fi

        # Make sure the logfile exists
        if [ ! -e "$logfile" ]; then
            >&2 echo "Logfile '$logfile' doesn't exist. Skipping."
            n=$((n+1))
            continue
        fi

        # Find which service this logfile is associated with
        service=
        while read -u 4 -r svc; do
            logbinding="$(echo "$svc" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')_logbinding"
            if echo "$logfile" | grep -Eq "${!logbinding}"; then
                service="$svc"
                break
            fi
        done 4< <(IFS=$'\n'; echo "${services[*]}")

        # Skip this log if it's not associated with any service
        # (still advance n so LOGFILEMAP stays aligned with LOGFILES)
        if [ -z "$service" ]; then
            >&2 echo "W: No service associated with log $logfile. Skipping...."
            n=$((n+1))
            continue
        fi

        # Get shell-compatible names for service and logfile
        svcnm="$(echo "$service" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"
        lognm="$(echo "$logfile" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"

        # Get previous line count to determine whether or not the file may have been rotated (defaulting to 0)
        prvlines="$(echo "$curstate" | grep "^${lognm}_lines=" | cut -f 2 -d "=")"
        prvlines="${prvlines:-0}"

        # Get the current number of lines in the file (defaulting to 0 on error)
        curlines="$(wc -l < "$logfile" || echo 0)"

        # If the current line count is less than the previous line count, we've probably rotated.
        # Reset to 0.
        if [ "$curlines" -lt "$prvlines" ]; then
            prvlines=0
        else
            prvlines=$((prvlines + 1))
        fi

        # Get incidents starting at the line after the last line we've seen
        logmatch="${LOGFILEMAP[$n]}_regex"
        matches="$(tail -n +"$prvlines" "$logfile" | grep -Ec "${!logmatch}" || true)"

        # If there were matches, aggregate them and add this log to the extinfo for the service
        if [ "$matches" -gt 0 ]; then
            # Aggregate and add to the correct service counter
            svc_counter_var="${svcnm}_total"
            matches=$((matches + ${!svc_counter_var}))
            typeset "$svc_counter_var=$matches"

            # Add this log to extinfo for service
            extinfo_var="${svcnm}_extinfo"
            typeset "$extinfo_var=${!extinfo_var}$logfile, "
        fi

        # Push onto next state
        nextstate+=("${lognm}_lines=$curlines")

        n=$((n+1))
    done 3< <(echo "$LOGFILES")

    # Write state to munin statefile
    (IFS=$'\n'; echo "${nextstate[*]}" > "$MUNIN_STATEFILE")

    # Now echo values
    while read -u 3 -r svc; do
        svcnm="$(echo "$svc" | sed -r 's/^[^a-zA-Z]+//g' | sed -r 's/[^a-zA-Z0-9]+/_/g')"
        svc_counter_var="${svcnm}_total"
        extinfo_var="${svcnm}_extinfo"
        echo "${svcnm}.value ${!svc_counter_var}"
        echo "${svcnm}.extinfo ${!extinfo_var}"
    done 3< <(IFS=$'\n'; echo "${services[*]}")

    return 0
}
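
# Illustrative sketch only (not called by the plugin): the rotation check and
# incremental scan that fetch() performs for each logfile, isolated so it can
# be tried by hand. The name `count_new_matches` and its argument order are
# hypothetical.
#   $1 = logfile path
#   $2 = line count recorded on the previous run (defaults to 0)
#   $3 = egrep pattern
function count_new_matches() {
    local file="$1" prev="${2:-0}" regex="$3" cur start
    cur="$(wc -l < "$file")"
    if [ "$cur" -lt "$prev" ]; then
        # Fewer lines than last time: assume rotation and rescan from the top
        start=1
    else
        # Otherwise resume at the line after the last one already counted
        start=$((prev + 1))
    fi
    # grep -c exits non-zero when it counts zero matches, so guard it
    # against `set -e`
    tail -n +"$start" "$file" | grep -Ec "$regex" || true
}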


case "$1" in
    config) config ;;
    *) fetch ;;
esac