
Final Health Checking Script

This is going to be a reasonably short and quick entry.  Last week I demonstrated using BGP Anycast on a server in place of a load balancer.  The follow-up post described the health-checking script that I wrote in Python to check whether the server was healthy.  That script would inject the BGP route if the server was healthy, and withdraw the route if it was unhealthy.

However, I felt the script could use a bit more intelligence, so I kept working at it.  In the previous script, a static variable called service was first set to “down”, which represented the fact that the BGP route was not being announced.  Then in the main loop:

  • If the Apache server was healthy
    • If the service variable was “down”, meaning the BGP route was not being announced
      • Inject the route
      • Set the service variable to “up”
    • Otherwise do nothing
  • If the Apache server was unhealthy
    • If the service variable was “up”, meaning the BGP route was being announced
      • Withdraw the route
      • Set the service variable to “down”
    • Otherwise do nothing
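
The decision logic in that list can be sketched as a small pure function.  This is not the earlier script itself, just an illustration: the function name next_action and the action strings "inject" and "withdraw" are made up for the example.

```python
def next_action(healthy, service):
    """Return (action, new_service) for one pass of the old loop.

    healthy: result of the Apache health check (bool)
    service: "up" if we believe the route is announced, else "down"
    action:  "inject", "withdraw", or None (do nothing)
    """
    if healthy:
        if service == "down":
            return "inject", "up"      # start announcing, remember it
        return None, service           # already announcing
    if service == "up":
        return "withdraw", "down"      # stop announcing, remember it
    return None, service               # already withdrawn
```

Note that service only records what the script believes it last did.  If the route changes behind the script's back, the variable goes stale.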

The new version of the script looks like this:


# Loops forever, at an interval defined below, checking the health of the local
# Apache server.  If the server is up, the BGP route for the server's /32 is
# injected (if it isn't already announced).  If down, the route is withdrawn.
# Best to start this with nohup.
#  nohup &
import urllib3
import socket
import subprocess
import time

# Some variables we'll be using.
# Change as needed.
server = ""  # server's IP
httpport = "80"  # server's port (80 or 443)
index = "/index.html" # file we'll grab during the health check
hc_interval = 5 # health check interval, in seconds
ASN = "65300"  # server's BGP ASN

# These variables probably don't need changing.
url = "http://" + server + index # URL we'll be grabbing to health check
route_add = "/usr/local/bin/vtysh -c 'enable' -c 'config term' -c 'router bgp " + ASN + "' -c 'network " + server + "/32' -c 'exit' -c 'exit'"
route_del = "/usr/local/bin/vtysh -c 'enable' -c 'config term' -c 'router bgp " + ASN + "' -c 'no network " + server + "/32' -c 'exit' -c 'exit'"
route_check = "/usr/local/bin/vtysh -c enable -c 'show ip bgp " + server + "/32' -c exit | grep available"

# isOpen(IP_addr, Port)
# Checks to see if it can open a TCP connection to IP:Port.
# Returns True if it can, False otherwise
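def isOpen(ip, port):
	s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
	try:
		s.connect((ip, int(port)))
		return True
	except socket.error:
		return False
	finally:
		s.close()  # always release the socket, whether or not the connect worked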

# Main loop forever, until killed.
while(not time.sleep(hc_interval)):	
	# Set the stdout/stderr variables; we'll need the stdout one for the loop
	# to make sure the route is or isn't being sent
	result = subprocess.Popen(route_check, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
	stdout,stderr = result.communicate()
	if stdout.decode('utf-8'):   # grep printed a match, so the route is being announced
		routing = True
	else:
		routing = False
# Use our isOpen() function along with a URL request to see if:
# A) the server is accepting connections on its HTTP port (L4)
#   AND
# B) we can pull the HTML file successfully. (L7)
# A success on both will mean the server is healthy.	
	if isOpen(server, httpport) and urllib3.PoolManager().request('GET', url).status == 200:
		if not routing:  # we're not announcing the route
			subprocess.call(route_add, shell=True)  # inject the route
	else:
		if routing:  # we are announcing the route
			subprocess.call(route_del, shell=True)  # withdraw the route

I got rid of the static service variable completely.  The main loop now checks whether the server is actually announcing the prefix before it does any injection or withdrawal.  At the beginning of the loop, you can see I’m calling the subprocess.Popen function to run the external vtysh application, with its output piped through grep, to ask: is the route being sent in BGP?  If grep prints a match, the variable called routing is set to True; otherwise it is set to False.
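
The same check can also be done without a shell pipeline, by matching the output in Python instead of grep.  A minimal sketch, assuming Python 3.5 or later for subprocess.run.  The function names is_announcing and output_shows_route are mine, and treating a missing or hung vtysh as "not announcing" is an assumption you may not want in production.

```python
import subprocess

VTYSH = "/usr/local/bin/vtysh"  # same path the script uses


def output_shows_route(output):
    """True if the vtysh output contains the word the script greps for."""
    return "available" in output


def is_announcing(prefix, vtysh=VTYSH):
    """Ask vtysh whether prefix (e.g. "192.0.2.1/32") is in the BGP table."""
    cmd = [vtysh, "-c", "enable", "-c", "show ip bgp " + prefix, "-c", "exit"]
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,  # decode output to str
            timeout=10,               # assumed limit; don't hang the loop
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # assumption: no answer means "not announcing"
    return output_shows_route(result.stdout)
```

Passing the command as a list avoids shell=True, so the prefix is never interpreted by a shell.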

The loop then does pretty much the same thing as the previous loop, except it doesn’t manually set a variable to record the state after changing the routing.  Instead, the actual routing state is checked on every pass through the loop.
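
To illustrate why checking on every pass matters, here is a rough sketch of the decision as a function of observed state.  The name reconcile and the simulated values are hypothetical, not part of the script.

```python
def reconcile(healthy, announcing):
    """Decide what one pass of the loop should do.

    healthy:    result of the Apache health check
    announcing: what the router reports right now, not a remembered flag
    Returns "inject", "withdraw", or None.
    """
    if healthy and not announcing:
        return "inject"
    if not healthy and announcing:
        return "withdraw"
    return None


# Simulated passes: Apache stays healthy, but the route disappears
# out-of-band on the second pass (say, the routing daemon restarted).
observed = [True, False, True]  # what the router reports on each pass
actions = [reconcile(True, state) for state in observed]
# actions == [None, "inject", None]
```

Because the state is observed rather than remembered, the loop repairs the missing route on its next pass without anyone having to restart the script.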

Further Changes?

I don’t think I’m going to continue developing this health-checking script any further.  It was only meant as an example of what could be done.  However, were I serious about this, I might add a way to parse arguments, such as the server’s ASN, the prefix, and the health-checking interval.  I might also add a few extra error checks.
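
For what it's worth, argument parsing along those lines might look something like this with the standard argparse module.  The option names --asn, --prefix and --interval are my own choices for illustration, the defaults mirror the values in the script, and 192.0.2.1/32 is just a documentation address.

```python
import argparse


def build_parser():
    """Build a parser for the settings the script currently hard-codes."""
    parser = argparse.ArgumentParser(
        description="Health-check Apache and inject or withdraw a BGP route."
    )
    parser.add_argument("--asn", default="65300",
                        help="server's BGP ASN (default: 65300)")
    parser.add_argument("--prefix", required=True,
                        help="prefix to announce, e.g. 192.0.2.1/32")
    parser.add_argument("--interval", type=float, default=5,
                        help="health check interval in seconds (default: 5)")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--prefix", "192.0.2.1/32", "--interval", "10"])
```

In the real script you would call parse_args() with no arguments so it reads the command line, then use args.asn, args.prefix and args.interval in place of the hard-coded variables.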

But, again, not at this point in time.

