PacNOG 5 Papeete, French Polynesia 17 June 2009 Hervey Allen


PacNOG 5 Papeete, French Polynesia 17 June 2009 Hervey Allen nsrc@PacNOG5 Papeete, French Polynesia

Introduction
Nagios: a measurement tool that actively monitors the availability of devices and services.
- Popular: one of the most widely used open source network monitoring packages.
- Fast: uses CGIs written in C for faster response and scalability.
- Scalable: can support thousands of devices and services.
- Modular
- Cool-looking web interface

“Cool-Looking Web Interface”

Features: 1
Modular
- How availability is checked is largely delegated to plug-ins.
- The product's architecture is simple enough that writing new plug-ins is fairly easy in the language of your choice.
- There are many, many, many plug-ins available.
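To illustrate how small a plug-in can be, here is a minimal sketch in Python. It relies on the plug-in convention of printing one status line and exiting with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The names `check_threshold` and `main`, the "EXAMPLE" label and the thresholds are made up for this example and do not come from any shipped plug-in.

```python
"""Sketch of a Nagios-style plug-in (illustrative only)."""

# Exit codes that plug-ins use to report state back to Nagios.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_threshold(value, warn, crit):
    """Return (exit_code, status_line) for a measurement where higher is worse."""
    if value is None:
        code = UNKNOWN          # no measurement could be taken
    elif value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    return code, f"EXAMPLE {LABELS[code]} - value={value}"


def main(value=None, warn=4.0, crit=8.0):
    """Print the status line and return the exit code (hypothetical thresholds)."""
    code, line = check_threshold(value, warn, crit)
    print(line)
    return code
```

A real plug-in would take its measurement itself, read the thresholds from command-line options, and finish with `sys.exit(main(...))` so that Nagios sees the exit code.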

Features: Plug-Ins or Modular
The Nagios package in Ubuntu comes with a number of pre-installed plug-ins:
    apt.cfg breeze.cfg dhcp.cfg disk-smb.cfg disk.cfg dns.cfg dummy.cfg
    flexlm.cfg fping.cfg ftp.cfg games.cfg hppjd.cfg http.cfg ifstatus.cfg
    ldap.cfg load.cfg mail.cfg mrtg.cfg mysql.cfg netware.cfg news.cfg
    nt.cfg ntp.cfg pgsql.cfg ping.cfg procs.cfg radius.cfg real.cfg
    rpc-nfs.cfg snmp.cfg ssh.cfg tcp_udp.cfg telnet.cfg users.cfg vsz.cfg
There are many more available, e.g. at http://sourceforge.net/projects/nagiosplugins

Features: 2
Fast and Scalable
- Compiled, binary CGIs and common plug-ins for faster performance.
- Parallel checking and forking of checks to support large numbers of devices. This has been considerably improved in version 3 of Nagios.
- Efficiency improvements are a controversial topic in the Nagios community. There is now a fork, Icinga, that is trying to rewrite Nagios in a different manner.

Features: 3
- Uses “intelligent” checking capabilities: attempts to distribute the server load of running Nagios (for larger sites) and the load placed on the devices being checked.
- Configuration is done in simple, plain text files that can contain much detail and are based on templates.
- Nagios reads its configuration from an entire directory. You decide how to define the individual files.

Features: 4
Topology aware, to determine dependencies:
- Differentiates between what is down and what is merely unreachable. This way it avoids running unnecessary checks.
- This is done using parent-child relationships between devices.
Notifications: how they are sent is based on combinations of:
- Contacts and lists of contacts
- Devices and groups of devices
- Services and groups of services
- Defined hours by persons or groups
- The state of a service

Features: 5
Host state: when configuring a host you have the following notification options:
    d: DOWN: the host is down (not available)
    u: UNREACHABLE: the host is not visible
    r: RECOVERY: (OK) the host is coming back up
    f: FLAPPING: the host starts or stops flapping, i.e. its state keeps changing and is undetermined
    n: NONE: don't send any notifications
Services use a similar set: w (WARNING), u (UNKNOWN), c (CRITICAL), r, f and n.
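The way a comma-separated option string such as `d,r` selects which events produce a notification can be sketched in Python as follows. This is a simplified illustration, not Nagios's own logic; the function name `should_notify` and the lookup table are assumptions made for this example.

```python
# Letter codes from the slide, mapped to the event they stand for.
HOST_OPTION_CODES = {
    "d": "DOWN",
    "u": "UNREACHABLE",
    "r": "RECOVERY",
    "f": "FLAPPING",
}


def should_notify(event, notification_options):
    """Return True if `event` is selected by an option string such as "d,r"."""
    codes = [c.strip() for c in notification_options.split(",") if c.strip()]
    if "n" in codes:
        # "n" means: never send notifications.
        return False
    wanted = {HOST_OPTION_CODES[c] for c in codes if c in HOST_OPTION_CODES}
    return event in wanted
```

With the options `d,r`, a DOWN or RECOVERY event is notified while UNREACHABLE and FLAPPING events are silently ignored.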


How Checks Work
- A node/host/device has one or more service checks (PING, HTTP, MYSQL, SSH, etc.).
- Periodically Nagios checks each service for each node and determines whether its state has changed.
- State changes are: CRITICAL, WARNING, UNKNOWN
- For each state change you can assign:
    Notification options (as mentioned before)
    Event handlers (scripts, actions to take)

How Checks Work
Parameters, set in /etc/nagios3/nagios.cfg:
- Normal checking interval
- Re-check interval
- Maximum number of checks
- Period for each check
Service checks only happen when a node responds (ping check or “is alive yes”).
Remember, a node can be:
- DOWN
- UNREACHABLE
(What's the difference?)

How Checks Work: 2
- In this manner it can take some time before a host changes its state to “down”, as Nagios first does a service check and then a node check.
- By default Nagios does a node check 3 times before it will change the node's state to down.
- You can, of course, change all this.
/etc/nagios3/nagios.cfg
- Lots of configuration settings and combinations
- Default settings have been tested for large installations
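The "check several times before declaring the node down" behaviour can be sketched as a simple counter of consecutive failures. This Python sketch is a simplification under stated assumptions: the function name `first_confirmed_down` is hypothetical, and the default of 3 attempts is taken from the slide above.

```python
def first_confirmed_down(check_results, max_check_attempts=3):
    """Return the 1-based check number at which the node is declared DOWN.

    `check_results` is a sequence of booleans (True = the node answered).
    The node is only declared DOWN after `max_check_attempts` failures
    in a row; a single successful check resets the counter.
    Returns None if that never happens.
    """
    failures = 0
    for number, ok in enumerate(check_results, start=1):
        if ok:
            failures = 0
        else:
            failures += 1
            if failures >= max_check_attempts:
                return number
    return None
```

A short outage that recovers on the second or third check therefore never reaches the DOWN state, which is why there is a delay between a real failure and the first alarm.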

The Concept of “Parents”
Nodes can have parents.
- For example, the parent of a PC connected to the switch mgmt-sw1 would be mgmt-sw1.
- This allows us to specify the network dependencies that exist between machines, switches, routers, etc.
- This avoids having Nagios send alarms for child nodes when their parent does not respond.
Note: a node can have multiple parents.
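The difference between DOWN and UNREACHABLE follows from the parent relationships: a node that does not answer is DOWN if at least one of its parents is still up, and UNREACHABLE if none of them is. A minimal Python sketch of that rule is shown below; the function name `host_state` and the host `pc1` are made up for this example, while `mgmt-sw1` and `router1` are borrowed from the slides.

```python
def host_state(host, parents, responding):
    """Classify a host as "UP", "DOWN" or "UNREACHABLE".

    parents:    dict mapping a host name to a list of its parents
                (an empty list means it sits next to the Nagios server).
    responding: set of host names that currently answer their host check.
    """
    if host in responding:
        return "UP"
    host_parents = parents.get(host, [])
    if not host_parents:
        # Nothing sits between the Nagios server and this host.
        return "DOWN"
    if any(parent in responding for parent in host_parents):
        # The path to the host still works, so the host itself has failed.
        return "DOWN"
    # Every parent is silent: the host cannot be judged from here.
    return "UNREACHABLE"


# Example topology: pc1 -> mgmt-sw1 -> router1 -> Nagios server
PARENTS = {"pc1": ["mgmt-sw1"], "mgmt-sw1": ["router1"], "router1": []}
```

If only `router1` responds, `mgmt-sw1` is DOWN and `pc1` is UNREACHABLE, so only the switch needs an alarm.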

The Idea of Network Viewpoint
- Where you locate your Nagios server will determine your point of view of the network.
- Nagios allows for parallel Nagios boxes that run at other locations on a network.
- Often it makes sense to place your Nagios server nearer the border of your network rather than in the core, or to have someone else run checks for you from an external location as well.

Network Viewpoint

Nagios Configuration Files

Configuration Files
Located in /etc/nagios3/ (in Ubuntu). Important files include:
    cgi.cfg        Controls the web interface and security options.
    commands.cfg   The commands that Nagios uses for notifications (e.g. sending email).
    nagios.cfg     Main configuration file.
    conf.d/*       All other configuration goes here!

Configuration Files
Under conf.d/* (sample only):
    contacts_nagios3.cfg           users and groups
    generic-host_nagios2.cfg       default host template
    generic-service_nagios2.cfg    default service template
    hostgroups_nagios2.cfg         groups of nodes
    services_nagios2.cfg           what services to check
    timeperiods_nagios2.cfg        when to check and who to notify

Configuration Files
Under conf.d, some other possible configuration files:
    host-gateway.cfg       Default route definition
    extinfo.cfg            Additional node information
    servicegroups.cfg      Groups of nodes and services
    localhost.cfg          Define the Nagios server itself
    pcs.cfg/servers.cfg    Sample definition of PCs (hosts)
    switches.cfg           Definitions of switches (hosts)
    routers.cfg            Definitions of routers (hosts)

Main Configuration Details
Global settings
File: /etc/nagios3/nagios.cfg
- Says where the other configuration files are.
- General Nagios behavior: for large installations you should tune the installation via this file.
- See: Tuning Nagios for Maximum Performance, http://nagios.sourceforge.net/docs/2_0/tuning.html

CGI Configuration
/etc/nagios3/cgi.cfg
- You can change the CGI directory if you wish.
- Authentication and authorization for Nagios use: activate authentication via Apache's .htpasswd mechanism, or using RADIUS or LDAP.
- Users can be assigned rights via the following variables:
    authorized_for_system_information
    authorized_for_configuration_information
    authorized_for_system_commands
    authorized_for_all_services
    authorized_for_all_hosts
    authorized_for_all_service_commands
    authorized_for_all_host_commands

Time Periods
conf.d/timeperiods_nagios2.cfg: defines the base periods that control checks, notifications, etc.
- Defaults: 24 x 7
- Could adjust as needed, such as work week only.
- Could define a new time period for “outside of regular hours”, etc.

    # '24x7'
    define timeperiod{
        timeperiod_name 24x7
        alias           24 Hours A Day, 7 Days A Week
        sunday          00:00-24:00
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        saturday        00:00-24:00
    }
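To make the meaning of a time period concrete, the Python sketch below tests whether a given moment falls inside a period written in the same `HH:MM-HH:MM` style as the definition above. It is only an illustration of the idea, not how Nagios parses its files; the function `in_timeperiod` and the `WORKHOURS` period (Monday to Friday, 09:00 to 17:00) are assumptions made for this example.

```python
from datetime import datetime

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def _minutes(hhmm):
    """Convert "HH:MM" (including "24:00") to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def in_timeperiod(when, period):
    """Return True if datetime `when` lies inside `period`.

    `period` maps a weekday name to a comma-separated list of
    "HH:MM-HH:MM" ranges; a missing weekday means "never".
    """
    day = WEEKDAYS[when.weekday()]          # datetime: Monday is 0
    now = when.hour * 60 + when.minute
    for time_range in period.get(day, "").split(","):
        time_range = time_range.strip()
        if not time_range:
            continue
        start, end = time_range.split("-")
        if _minutes(start) <= now < _minutes(end):
            return True
    return False


ALWAYS = {day: "00:00-24:00" for day in WEEKDAYS}          # the 24x7 period
WORKHOURS = {day: "09:00-17:00" for day in WEEKDAYS[:5]}   # hypothetical example
```

A check at 03:00 on a Wednesday is inside `ALWAYS` but outside `WORKHOURS`, which is the kind of distinction an “outside of regular hours” period would rely on.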

Configuring Service/Host Checks
Define how you are going to test a service.

    # 'check-host-alive' command definition
    define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 2000.0,60% -c 5000.0,100% -p 1 -t 5
    }

Located in /etc/nagios-plugins/config, then adjust in /etc/nagios3/conf.d/services_nagios2.cfg

Notification Commands
Allows you to utilize any command you wish. For example, you could use this to generate tickets in RT.

    # 'notify-by-email' command definition
    define command{
        command_name    notify-by-email
        command_line    /usr/bin/printf "%b" "Service: $SERVICEDESC$\nHost: $HOSTNAME$\nIn: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\nInfo: $SERVICEOUTPUT$\nDate: $SHORTDATETIME$" | /bin/mail -s '$NOTIFICATIONTYPE$: $HOSTNAME$/$SERVICEDESC$ is $SERVICESTATE$' $CONTACTEMAIL$
    }

Sample notification email:

    From: [email protected]
    To: grupo-redes@localdomain
    Subject: Host DOWN alert for switch1!
    Date: Thu, 29 Jun 2006 15:13:30 -0700

    Host: switch1
    In: Core Switches
    State: DOWN
    Address: 111.222.333.444
    Date/Time: 06-29-2006 15:13:30
    Info: CRITICAL - Plugin timed out after 6 seconds

Nodes and Services Configuration
Based on templates:
- This saves lots of time by avoiding repetition.
- Similar to object-oriented programming.
Create default templates with default parameters for a:
- generic node
- generic service
- generic contact

Generic Node Configuration

    define host{
        name                            generic-host
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        check_command                   check-host-alive
        max_check_attempts              5
        notification_interval           60
        notification_period             24x7
        notification_options            d,r
        contact_groups                  nobody
        register                        0
    }

Individual Node Configuration

    define host{
        use               generic-host
        host_name         switch1
        alias             Core switches
        address           192.168.1.2
        parents           router1
        contact_groups    switch_group
    }
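The template mechanism behaves much like inheritance: a host that says `use generic-host` starts with the template's values and overrides whatever it defines itself. The Python sketch below illustrates that merge with plain dictionaries. It is a simplified model, not Nagios's parser; the function name `resolve` is hypothetical, and dropping `name` and `register` from the inherited values is a simplifying assumption about which template-only settings are not passed on.

```python
# Template-only settings that are not copied into the objects using them
# (simplifying assumption for this sketch).
NOT_INHERITED = {"name", "register"}


def resolve(key, objects):
    """Flatten the object `key` by following its `use` chain.

    Values defined on the object itself win over inherited ones.
    """
    obj = objects[key]
    merged = {}
    template = obj.get("use")
    if template is not None:
        inherited = resolve(template, objects)
        merged.update({k: v for k, v in inherited.items()
                       if k not in NOT_INHERITED})
    merged.update({k: v for k, v in obj.items() if k != "use"})
    return merged


# Abbreviated versions of the two definitions from the slides.
OBJECTS = {
    "generic-host": {
        "name": "generic-host",
        "check_command": "check-host-alive",
        "max_check_attempts": "5",
        "notification_options": "d,r",
        "register": "0",
    },
    "switch1": {
        "use": "generic-host",
        "host_name": "switch1",
        "address": "192.168.1.2",
        "parents": "router1",
    },
}
```

Calling `resolve("switch1", OBJECTS)` yields a host that has its own name, address and parent plus the template's check command, attempts and notification options.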

Generic Service Configuration

    define service{
        name                            generic-service
        active_checks_enabled           1
        passive_checks_enabled          1
        parallelize_check               1
        obsess_over_service             1
        check_freshness                 0
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              5
        normal_check_interval           5
        retry_check_interval            1
        notification_interval           60
        notification_period             24x7
        notification_options            c,r
        register                        0
    }

Individual Service Configuration

    define service{
        host_name                switch1
        use                      generic-service
        service_description      PING
        check_command            check-host-alive
        max_check_attempts       5
        normal_check_interval    5
        notification_options     c,r,f
        contact_groups           switch-group
    }

Beeper/SMS Messages
- It's important to integrate Nagios with something available outside of work: problems occur after hours (unfair, but true).
- A critical item to remember: an SMS or message system should be independent of your network.
- You can use a modem and a telephone line.
- Packages like sendpage, qpage and gnokii can help.

Some References
- http://www.nagios.org/
- http://sourceforge.net/projects/nagiosplugins
- http://www.nagiosexchange.org/
- http://www.debianhelp.co.uk/nagios.htm
- http://www.nagios.com/: Commercial Nagios support
- Nagios, by O'Reilly Media, Inc.
- Nagios. System and Network Monitoring, by Wolfgang Barth.
