How to Run a SrvLB-L4-4 Test

This document describes how to run a SrvLB-L4-4 test. TMF will use similar procedures at the switch-off.

Table of Contents

1. Simulated Internet, an Overview
2. Network Addressing Scheme
3. Install FreeBSD 4.2/4.3
4. Download and Compile Software
    4.1 Polygraph
    4.2 Gnuplot
5. Setup The Workload
    5.1 Edit Workload File
    5.2 Select Client-side Address Mask
    5.3 Configure the VIP
    5.4 Configure Routes
    5.5 Configure DummyNet
    5.6 Configure Forsome
6. Prepare the DUT for Testing
7. Test The Network
    7.1 Ping Connectivity
    7.2 Polyprobe
8. Run a No-Balancing Test
9. Test the DUT
    9.1 Start the Test
    9.2 Wait, Look, Listen
10. Analyze Test Results
    10.1 Collect Log Files
    10.2 Label Log Files
    10.3 Generate A Report

1. Simulated Internet, an Overview

SrvLB-L4-4 tests include a network setup that emulates a large, routed network with segments that have packet loss and delays to simulate WAN conditions.

[Figure: bench setup]

Since robot and server agents use different subnets, some device must route packets between them. In most cases, the servers have static routes configured for the client subnets. However, some load balancers require replies to be explicitly routed through themselves so that packets can be meddled with on the reply path. In these cases, the DUT must have routes configured, and the server routes point to the DUT rather than the clients.

2. Network Addressing Scheme

Several IP addresses used in the workload files depend on ``bench ID'' B. Select any ID from 11 to 99. At the switch-off, each entry is assigned a unique ID from that range.
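
As a rough illustration, the sketch below derives bench-specific addresses from a bench ID. The function name is hypothetical, and the host ranges (61-63, 191-193) and the VIP (.254) are only the example values used elsewhere in this document, not a required layout.

```sh
#!/bin/sh
# Sketch only: derive example addresses from a bench ID (11..99).
# bench_addrs is a hypothetical helper; the host ranges and VIP are the
# example values used in this document, not a mandated layout.

bench_addrs() {
	bench_id=$1
	# reject empty or non-numeric input
	case $bench_id in
		''|*[!0-9]*) echo "bench ID must be a number" >&2; return 1 ;;
	esac
	# reject IDs outside the documented 11..99 range
	if [ "$bench_id" -lt 11 ] || [ "$bench_id" -gt 99 ]; then
		echo "bench ID must be between 11 and 99" >&2
		return 1
	fi
	echo "client_subnet=10.$bench_id.0.0/16"
	echo "client_hosts=172.16.$bench_id.61-63"
	echo "server_hosts=172.16.$bench_id.191-193"
	echo "vip=172.16.$bench_id.254"
}

bench_addrs 11
```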

3. Install FreeBSD 4.2/4.3

We recommend running Polygraph on FreeBSD. Another UNIX operating system can probably be used, but FreeBSD is recommended because its DummyNet facility can simulate WAN conditions for the client subnets.

Recommended minimum hardware:

To install FreeBSD, please see our Setting Up FreeBSD page. You can find a kernel configuration suitable to the srvlb-l4-4 tests elsewhere.

4. Download and Compile Software

4.1 Polygraph

Get the most recent Polygraph 2.6 version from www.web-polygraph.org. Unpack and install it. For example,


	royal> cd /tmp
	royal> wget http://www.web-polygraph.org/downloads/srcs/polygraph-2.6.0-src.tgz
	royal> tar -xzvf polygraph-2.6.0-src.tgz
	royal> cd polygraph-2.6.0
	royal> ./configure --prefix=/usr/local/polygraph-2.6.0
	royal> make && sudo make install

4.2 Gnuplot

Polygraph's automatic report generation programs require gnuplot complete with PNG support. Gnuplot sources can be obtained from ftp.gnuplot.vt.edu. For PNG support, the libpng and zlib libraries are also required. Fortunately, you can get these from the same place as Gnuplot.


	royal> ftp ftp.gnuplot.vt.edu
	ftp> cd pub/gnuplot
	ftp> mget gnuplot-3.7.1.tar.gz libpng-1.0.8.tar.gz zlib-1.1.3.tar.gz
	ftp> exit

	royal> tar -xzf zlib-1.1.3.tar.gz
	royal> tar -xzf libpng-1.0.8.tar.gz
	royal> tar -xzf gnuplot-3.7.1.tar.gz
	royal> cd zlib-1.1.3
	royal> ./configure
	royal> make && sudo make install

	royal> cd ../libpng-1.0.8
	royal> ln -s scripts/makefile.std Makefile
	royal> make && sudo make install

	royal> cd ../gnuplot-3.7.1
	royal> ./configure --with-png
	royal> make && sudo make install

5. Setup The Workload

The exact same configuration file should be used for Polygraph clients and servers. srvlb-l4-4.pg and nolb-l4-4.pg are the only files that you need to change. Do not change any of the #included files, or your test will not comply with this workload.

5.1 Edit Workload File

Copy the srvlb-l4-4.pg and nolb-l4-4.pg files from the Polygraph source distribution workloads directory into a new working directory.

Edit srvlb-l4-4.pg and nolb-l4-4.pg and define TheBench.client_side.hosts and TheBench.server_side.hosts to contain the ``real'' (primary) IP addresses of the client and server machines, respectively. For example, to test with three pairs of machines:

5.2 Select Client-side Address Mask

Tell Polygraph which address mask to use on the client machine's loopback interfaces. This controls the IP addresses of the simulated subnets on each client machine.

Note: lo0 is the name of the interface on which Polygraph will create alias addresses for Robots to bind to. For SrvLB-L4-4 tests, this should be the loopback interface, which may have a different name under operating systems other than FreeBSD (Linux calls it lo).

5.3 Configure the VIP

The virtual IP (and port) of the load balancer must be set in the srvlb-l4-4.pg file. TheBench.proxy_side.hosts is used to store this value so that all configurable parts of the workload live in TheBench object.

5.4 Configure Routes

As explained in Section 1, the simulated network for these tests is routed. Depending on whether the DUT needs reply traffic to pass through it, there are two options for routing.

First, the configuration where the servers route replies directly to the client. As an example, assume we have 3 machines running as Polygraph clients and TheBench.client_side.addr_mask is set to 'lo0::10.B.0.0'. Each server machine would have the following routes.

Second, the configuration where the DUT routes to the clients. As an example, assume we have 3 machines running as Polygraph clients and TheBench.client_side.addr_mask is set to 'lo0::10.B.0.0'. Each server machine has a route for 10.B.0.0/16 with gateway 172.16.B.32 (the DUT). The DUT has 3 static routes configured.
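
For the second option, the server-side route can be sketched as below. This is an assumption-laden example: dut_route_cmd is a hypothetical helper, the command uses FreeBSD route(8) syntax, and it is printed rather than executed so that it can be reviewed first. The DUT's own static routes are vendor-specific and are not shown.

```sh
#!/bin/sh
# Sketch only: print the FreeBSD command that sends replies for the
# client alias subnet through the DUT (172.16.B.32 in the text above).
# dut_route_cmd is a hypothetical helper, not part of Polygraph.

dut_route_cmd() {
	bench_id=$1
	echo "route add -net 10.$bench_id.0.0/16 172.16.$bench_id.32"
}

dut_route_cmd 11
```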

5.5 Configure DummyNet

The SrvLB-L4-4 test simulates WAN conditions between the clients and the DUT with the DummyNet feature of FreeBSD. DummyNet is FreeBSD-specific.

On all Polygraph servers, there are no delays. Use the following command to clear all DummyNet rules.


	# ipfw -f flush

On all Polygraph clients, there is a 100ms packet delay and a 0.1% packet loss probability in each direction for traffic to/from the alias subnet.


	# ipfw -f flush
	# ipfw pipe 1 config delay 100ms plr 0.0010
	# ipfw pipe 2 config delay 100ms plr 0.0010
	# ipfw add pipe 1 ip from any to 10.B.0.0/16 in
	# ipfw add pipe 2 ip from 10.B.0.0/16 to any out

As always, replace 'B' with whatever bench ID you selected above.
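
To avoid editing 'B' by hand on every client, the substitution could be scripted. The following is a minimal sketch: dummynet_client_cmds is a hypothetical helper, and the ipfw arguments are copied from the client commands above. It prints the commands instead of running them.

```sh
#!/bin/sh
# Sketch only: print the client-side DummyNet commands for a bench ID.
# dummynet_client_cmds is a hypothetical helper; the ipfw arguments are
# copied from the client commands in this section.

dummynet_client_cmds() {
	bench_id=$1
	subnet="10.$bench_id.0.0/16"
	echo "ipfw -f flush"
	echo "ipfw pipe 1 config delay 100ms plr 0.0010"
	echo "ipfw pipe 2 config delay 100ms plr 0.0010"
	echo "ipfw add pipe 1 ip from any to $subnet in"
	echo "ipfw add pipe 2 ip from $subnet to any out"
}

dummynet_client_cmds 11
```

After reviewing the output, it could be piped to a root shell on each FreeBSD client to apply the rules.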

5.6 Configure Forsome

The Polygraph distribution comes with a tool for running commands via RSH/SSH on a large number of computers. This can be very helpful for starting polyprobe or Polygraph tests on a large number of machines.

The forsome script can be found in the Polygraph distribution in the tools/BB/ directory. To configure forsome for the test network, edit the following section of the file so that 'cl1' becomes the IP addresses of the client machines, and 'sv1' the IP addresses of the server machines.


# these are primary host addresses, not aliases!
my %Hosts = (
        cl1 => [&range2cls('172.16.B.61-63') ],
        sv1 => [&range2svs('172.16.B.191-193') ],
);

If you are using several benches, you can also replace cl1 with clB (or just add clB line) and do the same for sv1. For simplicity, we will assume you did not do that.

You can use forsome to execute a command on all client machines, all server machines, or all machines:


	# clients
	% forsome cl1 [command]

	# servers
	% forsome sv1 [command]

	# all machines
	% forsome gr1 [command]

Forsome also accepts an explicit IP range as the first parameter, so you can run any command on any range of hosts regardless of the settings above.

Finally, placing a dash before the command name makes forsome execute the command locally, with the current host address stored in the environment variable $host. This mode is useful for copying files to and from the monitoring machine.


	# push all workload files to remote hosts
	% forsome gr1 - scp -pr workloads \$host:

	# collect some client log files into the logs/ directory
	% forsome 172.16.B.61-63 - scp \$host:clt.log logs/clt.\$host.log

6. Prepare the DUT for Testing

Ensure that the DUT can ping all the server machines. Before each test, the DUT should be power cycled or otherwise rebooted.

As for the actual SLB configuration, the DUT should be configured to have a single pool of servers containing all the Polygraph servers. Any selection metric is acceptable.

Any health check method supported by Polygraph or FreeBSD is acceptable. Polygraph will generate non-empty responses for any URL beginning with /health.

Keep in mind that a snapshot of DUT configuration file(s) will be taken at the time of the switch-off tests.

7. Test The Network

Testing the performance of the network is important to eliminate low-level network-related problems such as bad cables or incorrect duplex/simplex settings.

7.1 Ping Connectivity

Ensure that the clients and servers can ping each other.


	% ping 172.16.B.61
	% ping 172.16.B.191
	...

Ensure that the clients can ping the VIP.


	% ping 172.16.B.254
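
With more than a few machines, the ping checks can be looped. The sketch below assumes the example address ranges used in this document; expand_range and check_hosts are hypothetical helper names, and the actual ping run is left commented out so the script is harmless off the bench.

```sh
#!/bin/sh
# Sketch only: ping every bench address once and report the result.

# expand_range PREFIX FIRST LAST: print one address per line
expand_range() {
	prefix=$1
	i=$2
	last=$3
	while [ "$i" -le "$last" ]; do
		echo "$prefix.$i"
		i=$((i + 1))
	done
}

# check_hosts: read addresses from stdin, send one ping to each
check_hosts() {
	while read -r host; do
		if ping -c 1 "$host" >/dev/null 2>&1; then
			echo "ok   $host"
		else
			echo "FAIL $host"
		fi
	done
}

# Example for bench ID 11 (uncomment on the bench):
# { expand_range 172.16.11 61 63; expand_range 172.16.11 191 193; \
#   echo 172.16.11.254; } | check_hosts

# list the client addresses that would be checked
expand_range 172.16.11 61 63
```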

7.2 Polyprobe

Polyprobe is a simple network performance testing tool included in the Polygraph distribution. Polyprobe tests network throughput between any number of computers, in a 'full-mesh' fashion. That is, throughput between each client and every server is tested.

To run a Polyprobe test on a network with 3 clients and 3 servers, the following command should be executed on each client.


	# polyprobe --duration 4min --clients 172.16.B.61-63 \
		--servers 172.16.B.191-193:2323

And on each server:


	# polyprobe --duration 5min --clients 172.16.B.61-63 \
		--servers 172.16.B.191-193:2323

Polyprobe should report total bandwidth approximately equal to the number of interfaces * interface bandwidth * 80%. If any individual client-server connection shows significantly less bandwidth, inspect network setup, including cables and duplex settings on the interfaces.
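
The rule of thumb above is simple arithmetic. A minimal sketch, using a hypothetical helper expected_mbps and integer math:

```sh
#!/bin/sh
# Sketch only: expected total Polyprobe bandwidth in Mbit/s, computed as
# number of interfaces * interface bandwidth * 80%.
# expected_mbps is a hypothetical helper, not part of Polygraph.

# expected_mbps INTERFACES MBPS_PER_INTERFACE
expected_mbps() {
	echo $(( $1 * $2 * 80 / 100 ))
}

expected_mbps 1 100   # prints 80
```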

Here is a sample Polyprobe output for a device capable of handling no more than 100Mbit/sec of total traffic.


	#link         client_address        server_address  inMbps outMbps  conn   err
	    1            172.16.B.61     172.16.B.191:2323    9.65    9.86     1     0
	    2            172.16.B.61     172.16.B.192:2323    9.55    9.68     1     0
	    3            172.16.B.61     172.16.B.193:2323    9.60    9.79     1     0
	    4            172.16.B.62     172.16.B.191:2323    9.21    8.99     1     0
	    5            172.16.B.62     172.16.B.192:2323    9.09    8.87     1     0
	    6            172.16.B.62     172.16.B.193:2323    9.17    8.94     1     0
	    7            172.16.B.63     172.16.B.191:2323    9.54    9.35     1     0
	    8            172.16.B.63     172.16.B.192:2323    8.88    9.19     1     0
	    9            172.16.B.63     172.16.B.193:2323    8.94    8.84     1     0

	    0                    any                   any   81.41   81.31     9     0

At the time of this writing, Polyprobe is still an experimental tool with some minor inconveniences, such as a requirement that servers run for a longer time than clients.

8. Run a No-Balancing Test

The Polygraph clients and servers should be able to sustain the targeted peak request rate without a load balancing device involved in the test.

Copy all workload files, including your modified copies of srvlb-l4-4.pg and nolb-l4-4.pg, to all client and server machines. Then, start the test. On each Polygraph client run:


	# polyclt --config nolb-l4-4.pg --verb_lvl 10 --ports 3000:30000

On each Polygraph server run:


	# polysrv --config nolb-l4-4.pg --verb_lvl 10

Other Polygraph command-line options may be needed / desired. For example, the location of the workload include files needs to be specified (--cfg_dirs). Also, to enable logging of a test, use --log [filename].

Response time should not increase by more than 400ms during the no-balancing test. If the no-balancing test does not complete successfully, re-examine the network setup and workload configuration.

The no-balancing workload differs from the SrvLB-L4-4 workload in that instead of establishing connections to the VIP, robot agents establish connections directly to the server agents. Also, the robot agents try to minimize connection establishment rates to mimic the connection establishment pattern during a SrvLB-L4-4 test.

Note that rebooting the Polygraph PCs and DUT between tests may help ensure consistent results and reduce the number of ``surprises'' caused by unforeseen inter-test dependencies.

9. Test the DUT

You may want to read all instructions before proceeding with the tests. Standard tests may take a couple of hours so it may be a good idea to practice on very short (say, 5 minute) tests first. Look for Phase definitions in your workload file.

9.1 Start the Test

Start polysrv processes.


	% polysrv --config srvlb-l4-4.pg --verb_lvl 10 --log srv.log

Start polyclt processes.


	% polyclt --config srvlb-l4-4.pg --verb_lvl 10 --log clt.log

It may be prudent to save Polygraph stdout/stderr output to a file for later reference. Use the --console [filename] option for that.

9.2 Wait, Look, Listen

You can monitor a running Polygraph test with the polymon program. In order to use polymon, the --notify command-line argument must be passed to both polyclt and polysrv.


	% polyclt --config srvlb-l4-4.pg --verb_lvl 10 --log clt.log \
		--notify 172.16.B.100:18256

Watch for reasonable request rates and response times. A dramatic, sustained increase in response time on the client side could indicate a failing test. Console output may show errors worth investigating.

10. Analyze Test Results

10.1 Collect Log Files

Copy all the log files from the completed test to a single location.
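
One possible way to gather the logs is sketched below. It prints scp commands rather than running them; collect_cmds is a hypothetical helper, the host ranges are the example values from this document, and clt.log and srv.log are the names passed to --log in Section 9.1.

```sh
#!/bin/sh
# Sketch only: print (not run) scp commands that gather per-host logs
# into a local logs/ directory. collect_cmds is a hypothetical helper.

# collect_cmds NAME PREFIX FIRST LAST
collect_cmds() {
	name=$1      # log base name, e.g. clt or srv
	prefix=$2    # first three octets, e.g. 172.16.11
	i=$3         # first host number
	last=$4      # last host number
	while [ "$i" -le "$last" ]; do
		echo "scp $prefix.$i:$name.log logs/$name.$prefix.$i.log"
		i=$((i + 1))
	done
}

# Example for bench ID 11 with three clients and three servers.
collect_cmds clt 172.16.11 61 63
collect_cmds srv 172.16.11 191 193
```

The resulting file names (clt.172.16.11.61.log and so on) still match the clt*.log pattern used when labeling the logs.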

10.2 Label Log Files

Use label_results to 'label' (and, essentially, aggregate) the logs. The examples use 'theTest' as a label, but you should be more creative.


	% label_results theTest clt*.log

10.3 Generate A Report

Use Polygraph's report generator to make a report of the labeled results; reports are HTML files with PNG images for graphs.


	% make_report theTest

To view the report, simply use any graphics-capable browser and point it to the URL at the end of make_report output.


	% mozilla file:/tmp/polyrep/theTest/index.html

Moving the report to an HTTP server is optional if you have a browser on the machine where the reports are generated. To move the report, copy the corresponding directory from the /tmp/polyrep/ archive.