1. Install FreeBSD
2. Download and Compile Necessary Software
2.1 Install Polygraph
2.2 Install netperf
2.3 Install gnuplot
3. Understanding WebAxe-4 IP Addressing
3.1 WebAxe-4 Routing
4. Test Your Network
4.1 Manually add some alias addresses
4.2 Make sure clients and servers can ping each other
4.3 Netperf tests
5. Set up the WebAxe-4 workload
5.1 Edit workload files
5.2 Run the routing configuration scripts
5.3 Configure Dummynet
6. Run a no-surrogate test
7. Prepare your surrogate for testing
7.1 Ping the surrogate
7.2 Run msl_test
8. Test your Surrogate
8.1 Copy all ".pg" files to every client and server
8.2 Start polysrv processes
8.3 Start polyclt processes
9. Analyze the results
9.1 Copy all log files to a single location
9.2 Label the logfiles
9.3 Generate a report
9.4 View the results
For the technical description of WebAxe-4 please see the WebAxe-4 documentation.
We recommend running Polygraph on FreeBSD. We use FreeBSD for all our official tests, including the Cache-Off. You can use another Unix operating system if you really need to.
Recommended minimum hardware:
To install FreeBSD, please see our Setting Up FreeBSD page.
Get Polygraph version 2.7.4 from http://www.web-polygraph.org/downloads/. Unpack and install it:
% cd /tmp
% wget http://www.web-polygraph.org/downloads/srcs/polygraph-2.7.4-src.tgz
% tar xzvf polygraph-2.7.4-src.tgz
% cd polygraph-2.7.4
% ./configure --prefix=/usr/local/polygraph-2.7.4
% make all
% sudo make install
Add /usr/local/polygraph-2.7.4/bin to your PATH, or perhaps make symbolic links in the /usr/local/bin directory.
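As a minimal sketch of the PATH option for a Bourne-style shell, the lines below prepend the Polygraph bin directory only if it is not already present. The POLY_HOME variable name is our own choice for illustration; its value assumes the --prefix used above.

```shell
#!/bin/sh
# Assumed install prefix; must match the --prefix given to ./configure.
POLY_HOME=/usr/local/polygraph-2.7.4

# Prepend the Polygraph bin directory unless PATH already contains it.
case ":$PATH:" in
    *":$POLY_HOME/bin:"*) ;;
    *) PATH="$POLY_HOME/bin:$PATH" ;;
esac
export PATH
```

You would typically put these lines in your shell startup file so that polyclt and polysrv are found in every new session.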
Later versions of Polygraph may be used. If you are trying to reproduce other results, be sure to use the same version that others have used. Not all Polygraph versions are backward-compatible.
Get and install netperf from www.netperf.org or from our FTP site.
NOTE: netperf-2.1pl3 does not compile out-of-the-box on FreeBSD. Before running 'make' you need to edit makefile and add __FREEBSD__ to line 86:
CFLAGS = -O -D$(LOG_FILE) -DUSE_LOOPER -D__FREEBSD__
To use the automatic report generation programs, you'll need to install gnuplot with PNG support.
You can get gnuplot from ftp.gnuplot.vt.edu. You'll also need libpng-1.0.11.tar.gz and zlib-1.1.3.tar.gz, which you can find in the same FTP directory.
Installing gnuplot looks like this:
% ftp ftp.gnuplot.vt.edu
ftp> cd pub/gnuplot
ftp> get gnuplot-3.7.1.tar.gz
ftp> get libpng-1.0.11.tar.gz
ftp> get zlib-1.1.3.tar.gz
ftp> bye
% tar xzf zlib-1.1.3.tar.gz
% cd zlib-1.1.3
% ./configure && make
% sudo make install
% cd ..
% tar xzf libpng-1.0.11.tar.gz
% cd libpng-1.0.11
% ln -s scripts/makefile.std Makefile
% make
% sudo make install
% cd ..
% tar xzf gnuplot-3.7.1.tar.gz
% cd gnuplot-3.7.1
% ./configure --with-png
% make
% sudo make install
For WebAxe-4 we bind the client alias address to loopback interfaces and use the real network interfaces as routers. This keeps ARP tables smaller because each machine needs just a single MAC address for each other machine. However, it also complicates the whole setup because we need to configure a full mesh of routes on each PC.
The standard WebAxe-4 addressing scheme is shown in the figure below:
In the above figure, X represents a "bench-id." It is the only part of the IP addresses that you should change. Cache-off participants are assigned bench-ids on the first day of testing.
Each Polygraph client uses a group of addresses that fit into a "/22" subnet bound to the loopback (lo0) interface. The fxp0 interfaces (on the 172.16.X.0/24 subnet) act as routers. Thus, each server and surrogate needs a routing table so that they can talk to the Polygraph robot agents, which are bound to the 10.X.0.0 addresses. We'll talk more about routing in a while.
This addressing scheme allows for up to 31 client/server pairs. If each pair generates 500 TPS, the total maximum throughput for a WebAxe-4 test is 15,500 TPS.
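To size a bench, you can turn the arithmetic around and ask how many client/server pairs a given peak rate needs. The sketch below is only an illustration: the pairs_needed function is hypothetical (it is not part of Polygraph), and the 500 TPS per-pair default is taken from the example figure above, not from a measured limit.

```shell
#!/bin/sh
# pairs_needed PEAK_RATE [PER_PAIR_RATE]
# Prints the number of client/server pairs needed to sustain PEAK_RATE
# requests per second, rounding up. PER_PAIR_RATE defaults to 500,
# which is an assumption based on the example in the text.
pairs_needed() {
    rate=$1
    per_pair=${2:-500}
    echo $(( (rate + per_pair - 1) / per_pair ))
}

pairs_needed 1000
```

With a 1000/sec peak rate the function prints 2. The addressing scheme itself caps the answer at 31 pairs.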
NOTE: the figure shows multiple surrogates, but multiple surrogates are allowed only with an interception configuration. In that case, the Ethernet switch must have L4/7 features and be configured to intercept HTTP traffic and divert it to the surrogate array.
We expect that some surrogates may not support complicated routing tables (as are required in this scheme). In this case, the Ethernet switch may be configured as a router, and the surrogate may use the switch's IP address as a default route. The switch must then be given the routes so that it knows how to reach the different 10.X.0.0 subnets.
We also expect that some cache-off entries may not support complicated routing in the surrogate, AND do not want to use a routing Ethernet switch. In this case the rules allow a router to be used without affecting the reported price. This configuration is shown in the following figure:
The following shell script creates the routes necessary for a WebAxe-4 test. You'll need to assign X with your bench-id before running the script. Of course, the script must run as root to modify the routing tables.
You must run the script on each Polygraph server, the monitoring PC if you have one, and the surrogate. If your surrogate does not support complicated routes, and you're using the router option, then the router must be configured with similar routes.
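#!/bin/sh
# Set X to your bench-id.
X=13
p=1
# One route per client/server pair, for pairs 1 through 31.
while test $p -lt 32; do
    # Third octet of the pair's /22 subnet: 0, 4, 8, ...
    j=`expr \( $p - 1 \) \* 4`
    # Last octet of the client's real address: 61, 62, 63, ...
    c=`expr $p + 60`
    route add -net 10.$X.$j.0/22 172.16.$X.$c
    p=`expr $p + 1`
done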
If you'll be running a lot of tests, then you probably want to make sure that script runs automatically each time a system is booted.
If your bench has fewer than 31 client-server pairs, the above script will create some routes that will not be used during the test. That is not a problem.
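Because the routing script changes system state, it can help to review the generated commands first. The following sketch prints the same 31 route commands without executing them. The print_routes function name is our own, and the bench-id of 13 is just the example value used above.

```shell
#!/bin/sh
# print_routes BENCH_ID
# Prints (does not execute) one "route add" command per client/server
# pair, using the same arithmetic as the routing script.
print_routes() {
    X=$1
    p=1
    while [ "$p" -le 31 ]; do
        j=$(( (p - 1) * 4 ))   # third octet of the pair's /22 subnet
        c=$(( p + 60 ))        # last octet of the client's real address
        echo "route add -net 10.$X.$j.0/22 172.16.$X.$c"
        p=$(( p + 1 ))
    done
}

print_routes 13
```

Once the output looks right, you could pipe it to sh as root to apply the routes.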
As of Polygraph version 2.6, the polyclt and polysrv processes automatically create IP alias addresses. However, in order to test your network setup, you'll need to manually add some aliases. You can just use the ".1" address at the beginning of each /22 subnet group. In these examples, the prompt shows the hostname where you should run each command:
clt01# ifconfig lo0 alias 10.X.0.1 netmask 255.255.252.0
clt02# ifconfig lo0 alias 10.X.4.1 netmask 255.255.252.0
clt03# ifconfig lo0 alias 10.X.8.1 netmask 255.255.252.0
...
You can use ping to test routing and connectivity. Be sure to use the -S option to set the source IP address to one of the loopback alias addresses. For example, to ping the first server from the first client:
% ping -S 10.X.0.1 172.16.X.191
You should take the time to ping more than just one server:
% ping -S 10.X.0.1 172.16.X.192
% ping -S 10.X.0.1 172.16.X.193
...
And ping from other clients as well:
% ping -S 10.X.4.1 172.16.X.191
% ping -S 10.X.4.1 172.16.X.192
% ping -S 10.X.4.1 172.16.X.193
...
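Typing every ping by hand gets tedious on a large bench. As a rough sketch, the helper below prints the ping commands for a range of servers from one client alias address. The ping_cmds function is hypothetical, the bench-id and address range are example values, and the -c 3 count is our own choice so that each ping terminates.

```shell
#!/bin/sh
# ping_cmds BENCH_ID SRC_ADDR FIRST_SRV LAST_SRV
# Prints one ping command per server, where FIRST_SRV and LAST_SRV are
# the last octets of the server addresses on the 172.16.X.0/24 subnet.
ping_cmds() {
    X=$1
    src=$2
    s=$3
    last=$4
    while [ "$s" -le "$last" ]; do
        echo "ping -c 3 -S $src 172.16.$X.$s"
        s=$(( s + 1 ))
    done
}

ping_cmds 13 10.13.0.1 191 193
```

Run the printed commands on the client that owns the source address, for example by piping the output to sh on that machine.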
Start the netserver daemon on every polygraph machine:
Then run the netperf client on each polygraph machine. For example:
srv01# netperf -l 30 -H 10.X.0.1 -t TCP_STREAM
clt01# netperf -l 30 -H 172.16.X.191 -t TCP_STREAM
You should make sure that a client-server pair runs netperf in both directions at the same time. This guarantees that your network is operating well in full-duplex mode. If everything is good, netperf reports a throughput of about 80 MBit/s.
For a unidirectional netperf test, you should get about 92-95 MBit/s.
For longer tests, increase the -l <length> value.
When editing and understanding WebAxe-4 workload files, note that all WebAxe-4 input parameters are set as totals perceived by the surrogate(s) under test. If a device under test comprises several units, treat it as a single "big" surrogate for Polygraph configuration purposes.
You should use the exact same configuration files for all polyclt and polysrv processes. No manual adjustments for the number of polyclt processes are needed; all adjustments are done automatically in the webaxe-4-guts.pg file, which is included from the webaxe-4.pg file.
When in doubt, or puzzled by contradictory or insufficient documentation, do not try to guess the right setting; double-check with us instead.
Copy webaxe-4.pg from the /usr/local/polygraph-2.7.4/workloads/ directory into a new working directory.
Do NOT edit any of the files in the workloads/include directory.
All of our examples here use X to represent the bench-id variable. You'll need to choose a value for X in your own testing. At the Cache-off, bench-id values will be between 100 and 199.
Edit your copy of webaxe-4.pg and define the following variables:
TheBench.client_side.addr_space = [ 'lo0::10.X.0-123.1-250/22' ];
TheBench.client_side.hosts = [ '172.16.X.61-62' ];
TheBench.server_side.hosts = [ '172.16.X.191-192:80' ];
TheBench.proxy_side.hosts = [ '172.16.X.32:80' ];
TheBench.peak_req_rate = 1000/sec;
Note that this peak rate value is also used to determine which IP addresses to use for robot and server agents.
rate FillRate = 75% * TheBench.peak_req_rate;
size ProxyCacheSize = 50GB + 4GB;
or just
size ProxyCacheSize = 54GB;
size WSS = 1GB;
Given the above settings, a complete WebAxe-4 configuration looks like this:
#include "benches.pg"
Bench TheBench = benchWebAxe4;
TheBench.client_side.addr_space = [ 'lo0::10.X.0-123.1-250/22' ];
TheBench.client_side.hosts = [ '172.16.X.61-62' ];
TheBench.server_side.hosts = [ '172.16.X.191-192:80' ];
TheBench.proxy_side.hosts = [ '172.16.X.32:80' ];
TheBench.peak_req_rate = 1000/sec;
rate FillRate = 75% * TheBench.peak_req_rate;
size ProxyCacheSize = 50GB + 4GB;
size WSS = 1GB;
#include "webaxe-4-guts.pg"
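The X in the addresses above is a placeholder, so a real configuration file needs your bench-id in its place. One possible way to do the substitution, assuming you keep a template copy with the literal X, is a small sed filter. The fill_bench_id function and the template file name in the usage note are hypothetical.

```shell
#!/bin/sh
# fill_bench_id BENCH_ID
# Reads a configuration template on stdin and replaces the ".X."
# placeholder inside IP addresses with the given bench-id.
fill_bench_id() {
    sed "s/\.X\./.$1./g"
}

echo "TheBench.proxy_side.hosts = [ '172.16.X.32:80' ];" | fill_bench_id 113
```

For example, fill_bench_id 113 < webaxe-4.pg.in > webaxe-4.pg would produce a file for bench 113. Check the result by eye before using it, since the filter rewrites every ".X." it finds.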
If you haven't already executed the routing script (given previously), do so now. You may need to run the same or similar script on your surrogate, or configure similar routes on your switch/router.
On all polygraph clients, run
# ipfw -f flush
# ipfw pipe 1 config delay 40ms plr 0.0005
# ipfw pipe 2 config delay 40ms plr 0.0005
# ipfw add pipe 1 ip from any to 10.X.0.0/16 in
# ipfw add pipe 2 ip from 10.X.0.0/16 to any out
On all polygraph servers, run:
# ipfw -f flush
Check your work! Ping a client from a server and you should see round trip times of about 80 msec.
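If you want to script this check, something like the sketch below may help. It extracts the average round-trip time from the summary line that FreeBSD ping prints and tests whether it is near 80 msec. Both function names are our own, the 70 to 90 msec tolerance is an assumption and not part of the WebAxe-4 rules, and the sample summary line is illustrative, not a real measurement.

```shell
#!/bin/sh
# avg_rtt: reads ping output on stdin and prints the average RTT (ms)
# taken from the "round-trip min/avg/max/stddev = ..." summary line.
avg_rtt() {
    sed -n 's|^round-trip[^=]*= *[^/]*/\([^/]*\)/.*|\1|p'
}

# rtt_ok AVG_MS: succeeds if AVG_MS lies between 70 and 90 ms.
# The tolerance is an assumption chosen for illustration.
rtt_ok() {
    awk -v rtt="$1" 'BEGIN { exit !(rtt >= 70 && rtt <= 90) }'
}

# Illustrative input only; not output from a real bench.
sample="round-trip min/avg/max/stddev = 80.112/80.340/80.601/0.201 ms"
avg=$(echo "$sample" | avg_rtt)
rtt_ok "$avg" && echo "average RTT $avg ms looks right"
```

In practice you would pipe the output of a ping from a server to a client alias address into avg_rtt instead of using the sample line.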
Before starting these tests you should reboot all clients and servers to give them a "clean" configuration.
The polygraph clients and servers should be able to sustain your peak request rate without a surrogate involved. The surrogate must not be connected to the network during this test.
On each polygraph client you would run:
% polyclt --config webaxe-4.pg --verb_lvl 10 --ports 3000:30000
Similarly on the servers:
% polysrv --config webaxe-4.pg --verb_lvl 10
You may want or need additional polyclt/polysrv options. For example, the location of the "workloads/include" directory (or its copy) needs to be specified using the --cfg_dirs option; logging may be enabled using the --log option; etc.
We recommend running the no-surrogate test for 30-60 minutes at peak load. To create your custom no-surrogate workload, follow these steps:
% cd polygraph-2.7.4/workloads/
% cat webaxe-4.pg include/webaxe-4-guts.pg > /tmp/my.pg
To send robot requests directly to Polygraph servers, set R.origins (i.e., the origins field of your robot configuration R) to S.addresses.
The platDur variable can be adjusted to make the test shorter. The phFRamp and phFill phases can be commented out so you don't have to wait very long to get to the peak rate.
The difference in response time among phases should be marginal in a no-surrogate test. Response times should be about 0.53 seconds. If the reply rate and response time look good for at least 30 minutes of peak load, you can stop the no-surrogate test. If response time looks bad, re-examine your network setup or workload config.
Make sure your surrogate has an address on the subnet and that all clients and servers can ping it. Don't forget about the -S option:
% ping -S 10.X.0.1 172.16.X.32
From a polygraph client or server machine, run the msl_test program against your surrogate. This program uses some low-level IP packets to determine the MSL setting for your TCP stack.
Sample usage is:
clt01# ./msl_test -i fxp0 -s 10.X.0.1 -d 172.16.X.32 -p 80
The final argument (port number) must be the port on which your surrogate accepts requests. It cannot be an arbitrary port.
During this test, you will not be able to send any other traffic from the source machine to the surrogate.
When finished, the program reports the TIME_WAIT value that it found. This value is twice the MSL value. Cache-off rules require the TIME_WAIT value to be 60 seconds. If the msl_test program reports a number smaller than 60 seconds, you may be in violation of the rules. Violators will be disqualified.
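The relationship between the reported TIME_WAIT value and the MSL is easy to get backwards, so here is a small sketch of the check. The check_time_wait function is hypothetical and assumes you pass it the TIME_WAIT value as a whole number of seconds, read by hand from the msl_test output.

```shell
#!/bin/sh
# check_time_wait SECONDS
# Prints the implied MSL (half of TIME_WAIT) and returns non-zero if
# the TIME_WAIT value is below the 60 seconds the rules require.
check_time_wait() {
    tw=$1
    msl=$(( tw / 2 ))
    if [ "$tw" -ge 60 ]; then
        echo "TIME_WAIT=${tw}s (MSL=${msl}s): ok"
    else
        echo "TIME_WAIT=${tw}s (MSL=${msl}s): below the required 60s"
        return 1
    fi
}

check_time_wait 60
```

A reported TIME_WAIT of 60 seconds corresponds to an MSL of 30 seconds and passes the check; a value of 30 seconds does not.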
For more information, read msl_test.html.
% polysrv --config webaxe-4.pg --verb_lvl 10 --log srv.log
% polyclt --config webaxe-4.pg --verb_lvl 10 --log clt.log
NOTE: you may want to use additional or different command line parameters. For example you may want to save the polygraph stdout/stderr to a file for later reference.
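One simple way to keep the console output, sketched here under the assumption of a Bourne-style shell, is a wrapper that redirects both stdout and stderr of any command to a file. The run_logged name and the clt.out file name in the usage note are our own.

```shell
#!/bin/sh
# run_logged OUTFILE COMMAND [ARGS...]
# Runs COMMAND with its stdout and stderr both saved to OUTFILE and
# returns the command's exit status.
run_logged() {
    out=$1
    shift
    "$@" > "$out" 2>&1
}
```

For example, run_logged clt.out polyclt --config webaxe-4.pg --verb_lvl 10 --log clt.log keeps the console messages in clt.out alongside the binary log in clt.log.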
We usually monitor experiments using the 'polymon' program. In order to use 'polymon' you must use the --notify option to polyclt and polysrv. You must also run the udp2tcpd daemon on the host that is receiving the notification messages.
Polygraph includes a set of scripts, called ReportGen, that you can use to display the results.
Use label_results to label logfiles with a single name.
% cd /usr/local/polygraph-2.7.4/ReportGen
% ./label_results mytest1 /where/ever/clt.*.log
Use make_report to make graphs and an HTML page describing the results:
% ./make_report mytest1
Use netscape or another browser to view the report. You may need to copy the files to an HTTP server.