The Fourth Cache-Off

The Official Report

December 18, 2001

The Measurement Factory, Inc.
Alex Rousskov, Matthew Weaver, Duane Wessels
team@measurement-factory.com

We held a week-long benchmarking ``Cache-off'' for caching proxies in the middle of November, 2001. Using the Web Polygraph benchmark, we tested 14 caching proxies from 10 different organizations. In this report, we summarize performance data collected during these tests and analyze the results.

Table of Contents

1. Introduction
    1.1 Timeline
    1.2 Cache-off Participants
    1.3 Terminology
    1.4 How (not) to Read this Report
    1.5 Where to find more information
2. Executive Summaries
    2.1 Unavailable Data
3. Performance Details
    3.1 PolyMix-4 Details
    3.2 WebAxe-4 Details
    3.3 Downtime test
4. Product Configurations
5. Comments
    5.1 Polyteam Comments
    5.2 Vendor Comments
6. The Rules
7. Web Polygraph
    7.1 The Cache-off Workload: PolyMix-4
8. Benchmarking Environment
    8.1 Location
    8.2 Schedule
    8.3 Polygraph Machines
    8.4 Time Synchronization
    8.5 Network Configurations
    8.6 Numbers
9. Test Sequence
    9.1 PolyMix-4
    9.2 WebAxe-4
    9.3 Downtime Test
    9.4 MSL Test
10. Cache-Off Controversies
    10.1 Delays at US customs
    10.2 Yet another pricing problem
    10.3 One entry was not ready
    10.4 Two Hit Ratios
    10.5 Phase synchronization bug

1. Introduction

The Cache-off addresses a need in the web caching community for high quality, independent verification of product performance. This event represents a snapshot in time of the caching industry. The results presented here are all taken during a one-week period. In most cases, a product's performance will change over time as the vendor makes improvements, fixes bugs, and adds new features.

We strive for fairness in our testing. Decisions regarding the rules and testing environment are made well in advance with input from Cache-off participants. Any company or organization willing to test the performance of their product(s) is given the opportunity to participate in this event. We describe the actual rules and testing environment later, in the Rules and succeeding sections.

A number of well-known caching companies have chosen to not participate in this Cache-off, for one reason or another. We can assure you that all of these companies were fully aware that these tests were taking place. We encourage you to ask their sales representatives why they do not participate in public benchmarks, and to be very suspicious of results produced by the company itself.

For the first time, we are testing caches in both ``forward'' (client-side) and ``reverse'' (server-side) configurations. The PolyMix-4 workload simulates traffic that a forward cache receives. Similarly, the WebAxe-4 workload simulates the traffic for a reverse proxy, also known as a surrogate or server accelerator.

1.1 Timeline

Preparations for the fourth Cache-off began with an organizational meeting in April, 2001. Representatives from 6 companies attended this meeting with the intention to participate in the Cache-off. During the meeting, we prioritized the features to be added for PolyMix-4. The meeting attendees also told us that it was important for them to publish benchmarking results for their products configured as surrogates. Thus, we added WebAxe-4 to the sequence of tests to run at the fourth Cache-off.

The first version of Web Polygraph designed for this Cache-off was released in July 2001. The version used at the Cache-off is dated September 2001. Since then, the code and workloads remained frozen until the start of the Cache-off on November 12th. Participants had plenty of time to practice and prepare for the competition.

1.2 Cache-off Participants

The following companies and organizations brought products to the fourth Cache-off:

Even though they registered, Arlotto Comnet, Inc decided to not participate.

1.3 Terminology

Throughout this report we use a few terms that have specific meaning for the Cache-off. A vendor is an organization that has a caching product. To simplify the terminology, all commercial, non-profit, virtual, etc. organizations are labeled as ``vendors.'' Some vendors are actually two or more companies working together, under an O.E.M. agreement for example. A vendor is allowed to bring more than one product to the Cache-off. Each product or entry that a vendor brings counts as one participant. We have one bench (``harness'') for every participant. PolyMix-4 and WebAxe-4 are the names of the workloads that we use for these tests. As with previous tests, these are standardized workloads that we develop in cooperation with vendors. We'll talk more about PolyMix-4 and WebAxe-4 later in this report.

1.4 How (not) to Read this Report

We strongly caution against drawing hasty conclusions from these benchmarking results. Our report contains a lot of performance numbers and configuration information; take advantage of it. The products that we test differ greatly, and it is tempting to draw conclusions about participants based on a single performance graph, or one column in a table. We believe such conclusions will virtually always be wrong. Here are a few recommendations to prevent misinterpretation of the results.

  1. Always read the Polyteam and Vendor Comments sections.
  2. Compare several performance factors: throughput, response time, hit ratio, etc. Weigh each factor based on your preferences.
  3. Do not overlook pricing information and price/performance analysis.

Our benchmark addresses only the performance aspects of caching proxies. Any given product has numerous features that are not addressed here. For example, we think that manageability, reliability, and support are very important attributes that should be considered in your buying decisions.

1.5 Where to find more information

The Measurement Factory maintains the Official Results Site, where this report and the detailed Polygraph log files from the Cache-off are stored. All information at the Official Results Site is freely available.

There are no other official sources of Cache-off results.

Documentation, sources, independent test results, discussion mailing lists, and other information related to the Polygraph benchmark are available at the Web Polygraph site.

We discuss only major performance measurements in this report. If you'd like more details about a particular entry, consult these resources:

The links above are also useful if you are afraid of being influenced by our interpretation of the results. We still recommend reading the report afterwards (as a ``second opinion'') because not all test rules and performance matters will be clear from the raw data.

2. Executive Summaries

The ``Executive Summary'' tables below summarize the performance results. We provide an in-depth analysis of the measurements in the next section, titled ``Performance Details.'' We present the summary information in three separate tables:

The first table contains the PolyMix-4 and downtime test results from 12 products.

Baseline PolyMix-4 Results

                    Total  Peak Throughput  Response Time (sec)  Savings (%)    $1,000 can buy      Minimum   Cache
                    Price                                                                          Downtime     Age
Product             (US$)  req/sec  Mb/sec    Hit    All   Miss   Xact  Bwidth  hit/sec  req/sec  (minutes)  (hour)
Aratech C3100       14900     1500      83  0.101   1.40   2.70  51.18   33.60       52      101       3.07   17.45
BBGCache             7595      700      38  0.053   1.32   2.64  51.92   34.52       48       92       2.50   15.61
Chamomile            2500      300      16  0.307   1.68   2.88  46.79   29.21       56      120        n/a   10.48
iCache 400           3250      800      44  0.081   1.30   2.66  52.79   38.01      130      246       1.01   10.11
iCache 2500         26950     2700     144  0.021   1.26   2.67  52.93   38.73       53      100       3.24   15.40
iMimic               9895     1100      60  0.140   1.49   3.00  52.58   38.34       59      111        n/a   31.44
Kotetu               2000      200      11  0.199   1.53   2.75  47.86   29.11       48      100       4.35   26.73
Pyramid              4950     1276      69  0.041   1.30   2.66  51.88   37.31      134      258       1.85   15.04
Stratacache E-55     3895      500      27  0.024   1.26   2.66  53.20   39.13       68      128        n/a    6.44
Stratacache F-120    8695     1200      65  0.028   1.31   2.66  51.29   37.03       71      138       2.15    8.39
Swell 1000           2549      130       7  0.025   1.64   2.69  39.41   31.69       20       51       1.96   35.86
TNCLABS              9000     1250      67  0.017   1.29   2.65  51.78   37.54       72      139       1.49    8.12

The second table contains PolyMix-4 and downtime test results for two Stratacache Dart products. These are listed separately and do not appear in the bar charts because they have limited user licenses and cannot be sold in small quantities. We feel that this complicates the performance/price analysis (see detailed discussion in the third Cache-off report). Stratacache calls their limited-license products ``microcaches'' because they were originally designed for small networks with 10 or fewer network-attached devices (e.g., a residential or library setting). The name also fits well with the products' tiny size and set-top box appearance. Additional information on the microcaches is available in the Stratacache vendor comment section. We invited Stratacache to bring a couple of these boxes for testing because they may indicate a new niche in the caching market.

PolyMix-4 Results for Microcaches

                    Total  Peak Throughput  Response Time (sec)  Savings (%)    $1,000 can buy      Minimum   Cache
                    Price                                                                          Downtime     Age
Product             (US$)  req/sec  Mb/sec    Hit    All   Miss   Xact  Bwidth  hit/sec  req/sec  (minutes)  (hour)
Stratacache D-10      699      120       7  0.047   1.34   2.76  52.45   35.06       90      172       1.42   24.27
Stratacache D-25      999      325      17  0.017   1.29   2.65  51.58   38.04      168      326        n/a    9.31

The final summary table contains the WebAxe-4 results. As you can see, only two products elected to run this test. This is disappointing given the feedback we received from vendors during the April planning meeting.

WebAxe-4 Results

                    Total  Peak Throughput  Response Time (sec)  Savings (%)    $1,000 can buy
                    Price
Product             (US$)  req/sec  Mb/sec    Hit    All   Miss   Xact  Bwidth  hit/sec  req/sec
Chamomile            2500     1000      46  0.244   0.38   0.69  69.30   52.00      277      400
Swell 1000           2549      260      12  0.242   0.35   0.64  70.74   56.28       72      102

Column headings in the above tables are links to bar charts that compare the corresponding measurement. Row headings are labels with short names for the tested products. These labels also have links to pages with configuration and performance details for each product.

The ``Total Price'' represents the list price of the product. This includes both hardware and software. Furthermore, if the vendor uses advanced networking equipment, that too is included in the Total Price. However, Ethernet switches providing basic layer two connectivity are not included in the list price. This is a departure from previous tests where networking equipment always contributed to the price. Our rationale is that most organizations already have networking equipment in place and would not consider its cost when purchasing caching proxies. If, however, a product requires layer four features to achieve higher performance, the cost of such equipment should be included. None of the products mentioned in this report use advanced networking features. They all use basic layer two Ethernet switches.

NOTE: It is likely that some vendors will lower their prices after seeing their competitors' results in this report. Be sure to read the ``Vendor Comments'' for pricing changes and other important information.

The ``Peak Throughput'' column depicts the highest tested request rate for each product. PolyMix-4 and WebAxe-4 have a number of different phases, each with a different or varying throughput. In the summary tables, we report the response rate during the 4-hour top2 phase, when the load is at its peak and the cache is more likely to be in a steady state.

The ``Response Time'' group has three columns. We report the mean response time for cache hits and misses separately to emphasize performance differences on the two most important request paths. The ``All'' column depicts mean response time for all request classes.

The ``Savings'' column shows the percentage of transactions (Xact heading) and bytes (Bwidth heading) that the product served as cache hits.

The ``$1000 can buy'' columns show performance/price ratios. We use two performance measurements: hit rate, or number of cache hits per second (the ``hit/sec'' column), and request rate (the ``req/sec'' column). Both measurements are normalized by Total Price (in thousands of dollars). In other words, the data shows ``how many hits or requests per second do I get for a thousand dollars?'' Some participants feel that hit performance/price is a more important measurement than overall throughput. For example, a product with poor hit ratio may still score well on overall throughput. On the other hand, the hit throughput measurement can be misleading because the hit ratio you achieve on a production system may be significantly different from the one measured in these tests.
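As a rough illustration of this normalization, the Python sketch below recomputes both ratios for one row of the baseline table. The function names are ours, and deriving the hit rate as request rate times the ``Xact'' savings is an assumption; the published columns come from measured rates, so a recomputed value can differ from the table by a unit of rounding.

```python
def per_thousand_dollars(rate_per_sec, total_price_usd):
    """Normalize a rate by the Total Price expressed in thousands of US dollars."""
    return rate_per_sec / (total_price_usd / 1000.0)


def hits_per_sec(request_rate, xact_savings_percent):
    """Approximate the hit rate as request rate times the document hit ratio.

    Assumption for illustration: the report itself uses measured hit rates.
    """
    return request_rate * xact_savings_percent / 100.0


# Aratech C3100 row: $14,900, 1500 req/sec, 51.18% of transactions served as hits.
price, req_rate, xact_savings = 14900, 1500, 51.18

req_per_k = per_thousand_dollars(req_rate, price)
hit_per_k = per_thousand_dollars(hits_per_sec(req_rate, xact_savings), price)

print(round(req_per_k), round(hit_per_k))  # prints: 101 52
```

Both rounded values match the ``req/sec'' and ``hit/sec'' entries for that row.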

The ``Minimum Downtime'' column in the PolyMix-4 tables contains the results of the downtime test. Here we report how long it takes the product to serve a cache miss after suffering a power outage.

The ``Cache Age'' column estimates the cache capacity in terms of hours of PolyMix-4 peak fill (i.e., cachable miss) traffic. For example, a reading of 10 means that, at the peak request rate, the cache becomes full after 10 hours, and must begin replacing objects. We believe that, in a production environment, caches should be large enough to hold 2-3 days worth of traffic. Unfortunately, in this cache-off environment, products can get away with a cache age of 5-6 hours -- just enough to store the working set window.

Regarding errors, all published performance tests finished with less than 0.1% of failed transactions. Note that the rules disqualify a run with more than 1.0% of errors.

2.1 Unavailable Data

You will notice that some of the table entries are filled with ``n/a'' to indicate unavailable data. The Chamomile, Stratacache E-55, and iMimic entries were unable to complete the downtime test. As tested, their systems require manual intervention to boot up.

3. Performance Details

This section gives a detailed analysis of major performance measurements.

3.1 PolyMix-4 Details

The PolyMix-4 workload has several phases. For the baseline presentation, we have selected the top2 phase. Top2 is the second 4-hour phase with peak request rate. The first peak phase, top1, often yields unstable results. The second top phase is usually more stable.

The bar charts below are based on data averaged across the top2 phase. Averages are meaningful in situations where performance does not change with time, or when changes are smooth and predictable. We encourage the reader to check individual entry reports for the exceptional behavior where averages may be less meaningful.

As with any benchmark, Polygraph introduces its own overheads and measurement errors. We believe that the margin of error for most results discussed here is within 10%. In most cases, however, the reader should pay attention to patterns and relative differences in product performance rather than absolute figures.

Depending on the version of the report you have selected, the entries appear in either alphabetical or numerical order for each metric.

Normalized Throughput

Presenting throughput results in a way that pleases and makes sense to everybody is a daunting task. Due to tremendous differences in request rates, a simple graph with raw request rates from the ``Executive Summary'' table is not very informative. Moreover, comparing throughput of a large, $65K system to a small, $3K PC is usually not interesting. Product prices do vary a lot.

To begin your analysis, you might first pick out products that are in your price range:

Product Prices

You may also want to pick out products that meet your demands for HTTP traffic:

Raw Throughput

We prefer to normalize the throughput results by some universal measure of product complexity and ability. Several measures have been proposed, including product price, rack space, and disk spindles. Price normalization is imperfect because the true price is difficult to determine, especially for free software products. Rack space normalization presents problems for entries that were not tested in a rack-mountable configuration. Normalizing by the number of disks neglects the differences in RAM sizes and disk throughput or capacity. We select price as a normalizer.

We also need to choose which throughput metric to normalize: overall throughput, or cache hit throughput? Overall throughput is useful for capacity planning, if you know how many HTTP requests per second your users generate. However, it does not account for the fact that we are measuring caches here. A non-caching proxy could score well on normalized overall throughput, yet lose in all other categories. The normalized hit throughput, on the other hand, uses the rate at which the tested product can deliver cache hits.

As it turns out, whether we select hit throughput or overall throughput doesn't significantly change the ranking of tested products. Most entries' positions are the same on both charts.

To emphasize the importance of caching traffic, we select the normalized hit rate graph for the baseline presentation:

Normalized Hit Rate

The normalized graph not only provides a fair comparison but answers an important question: ``How many hits per second can one thousand dollars buy?''

There appears to be no strong correlation between performance/price ratio and absolute throughput (or price): Products showing good return on a dollar can be found on both ends of the throughput scale.


Note that it doesn't make sense to normalize all of the performance metrics by product price. For example, hit ratios and response times should approach some ``perfect'' level regardless of the product's cost. Furthermore, the hit ratio and response time measurements are limited by the workload characteristics. We shouldn't look too closely at the absolute values for these metrics. Rather, we should care more about how close a tested product comes to the ``ideal'' value.

Hit Ratio

Hit ratio is a standard measurement of a cache's performance. PolyMix-4 offers a hit ratio of about 55% -- a cache cannot achieve a higher hit ratio in these tests. However, due to various overload conditions, insufficient disk space, deficiencies of object replacement policy, and other reasons, the actual or measured cache hit ratio may be smaller than the offered level.

Document Hit Ratio

The ``Document Hit Ratio'' chart shows how well each cache maintains its hit ratio under the highest load. Most of the products (9 of the 12 in the baseline table) achieve a hit ratio higher than 50%.

The two primary reasons for a less-than-ideal hit ratio are excessive load and insufficient disk space. For most caching products, disk I/O is the bottleneck. Bypassing the disks for some requests allows the proxy to absorb more load. If the cache size is smaller than the working set window, the proxy won't be able to store all responses that Polygraph expects to result in cache hits. We can actually estimate the age at which a particular product begins purging cached objects, which we do in the next section.

Cache ``Age''

To estimate the maximum age of cached objects, we divide cache capacity (as specified by the vendor) by the fill rate during the top2 phase. The latter is the rate of cachable misses as measured by the Polygraph client. Raw fill stream measurements can be found in the individual entry reports. We believe that our formula yields a ``good enough,'' albeit not precise, approximation of real world measurements.
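The estimate is a single division, sketched below in Python. The function names are ours, and the 72 GB and 8 GB/hour inputs are hypothetical. The second call runs the formula backwards for the Aratech C3100 entry (180 GB cache, 17.45 hours reported age) to show the fill rate those two figures imply; the measured fill streams are in the individual entry reports, not in this sketch.

```python
def cache_age_hours(capacity_gb, fill_rate_gb_per_hour):
    """Hours of peak fill (cachable miss) traffic that fit in the cache."""
    if fill_rate_gb_per_hour <= 0:
        raise ValueError("fill rate must be positive")
    return capacity_gb / fill_rate_gb_per_hour


def implied_fill_rate(capacity_gb, age_hours):
    """Fill rate (GB/hour) implied by a vendor capacity and a reported cache age."""
    if age_hours <= 0:
        raise ValueError("cache age must be positive")
    return capacity_gb / age_hours


# Hypothetical entry: a 72 GB cache filled at 8 GB/hour during top2.
age = cache_age_hours(72, 8.0)

# Aratech C3100 figures from this report: 180 GB cache, 17.45 hours of cache age.
fill = implied_fill_rate(180, 17.45)

print(age, round(fill, 1))  # prints: 9.0 10.3
```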

Cache Age

Many cache administrators believe that a production cache should store about 2-3 days of traffic. Due to the differences between the ``accelerated'' benchmarking environment and real-world conditions, the 2-3 days rule of thumb probably corresponds to some 10 hours of cache age.

The cache capacity requirement depends on your environment. When configuring a caching system based on our performance reports, make sure you get enough disk storage to keep sufficiently ``old'' traffic. You may need to increase the price and re-compute performance/price ratios if a product you are considering does not have enough storage. You should also check that the product is actually available with the additional disk space. These adjustments may significantly affect the choice of a price-aware buyer.

Response Time

To simulate real-world conditions, PolyMix-4 introduces an artificial delay on the server side. The server delays are normally distributed with a 2.5 sec mean and 1 sec deviation. These delays play a crucial role in creating a reasonable number of concurrent ``sessions'' in the cache.

To simulate WAN server side connections, we introduce packet delays (80 msec round trip) and packet loss (0.05%). These delays increase miss response times and, more importantly, reward caches for using persistent connections (TCP connection setup phase includes sending several packets that also incur the delay).

The delays, along with the hit ratio, affect transaction response time. The ideal mean response time for this test is impossible to calculate precisely because the model is too complex. We estimate the ideal mean response time at about 1.3 seconds. Mean response time in a no-proxy environment is about 2.8 seconds.

Absolute response time figures are important in understanding the benchmark environment, but are of little value when comparing the Cache-off results with a given real-world setup. Indeed, every particular cache deployment will have different hit ratios and server-side delays. Thus, while providing the mean response time measurements as a reference, we select the ``Response Time Improvement'' graph for the baseline presentation:

Mean Response Time Improvement

The above graph shows the relative reduction of mean response time achieved by the cache compared to a no-proxy (direct) test, or: ``How much faster will an average reply be if a cache is deployed?'' It shows the (direct - proxied)/direct ratio for mean response times. We hope that the ratios reported here will be close to the real-world performance of the tested products.
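The ratio can be written out as follows. This is a minimal Python sketch with a function name of our choosing and hypothetical response times, not the code used to produce the chart.

```python
def response_time_improvement(direct_sec, proxied_sec):
    """Relative reduction in mean response time: (direct - proxied) / direct."""
    if direct_sec <= 0:
        raise ValueError("direct response time must be positive")
    return (direct_sec - proxied_sec) / direct_sec


# Hypothetical numbers: 2.0 sec without a proxy, 1.5 sec through the cache.
improvement = response_time_improvement(2.0, 1.5)
print(improvement)  # prints: 0.25

# A proxy that slows replies down yields a negative improvement.
slowdown = response_time_improvement(2.0, 2.5)
print(slowdown)  # prints: -0.25
```

A value of 0.25 reads as ``an average reply is 25% faster with the cache deployed.''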

Hit ratios affect, but do not define, response times. In an ideal scenario, it takes a negligible amount of time to deliver a cache hit to the client. Fast cache hits decrease average response times. In the same unrealistic scenario, it takes only about 2.6 seconds to deliver a cache miss. In practice, both hits and misses may incur significant overheads.
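One way to see the interplay is a two-class approximation that weights hit and miss response times by the hit ratio. The Python sketch below is our simplification, with a function name of our choosing; the ``All'' column in the summary tables covers every request class, so this estimate is expected to land near the reported value rather than on it.

```python
def mean_response_time(hit_ratio, hit_sec, miss_sec):
    """Two-class estimate: hits and misses weighted by the document hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit ratio must be between 0 and 1")
    return hit_ratio * hit_sec + (1.0 - hit_ratio) * miss_sec


# Aratech C3100 row: 51.18% hit ratio, 0.101 sec hits, 2.70 sec misses.
estimate = mean_response_time(0.5118, 0.101, 2.70)
print(round(estimate, 2))  # prints: 1.37 (the table reports 1.40 for "All")
```

The same function shows why a faster hit path alone cannot compensate for a low hit ratio: with the miss time fixed, the miss term dominates the sum.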

The hit and miss response time charts show that hits are primarily responsible for the differences in overall response time measurements.

3.2 WebAxe-4 Details

Unfortunately, there is not much to say about the WebAxe-4 results since only two entries took the test. We won't insult you with bar charts showing just the two products. Please refer to the WebAxe-4 summary table in the previous section.

As you can see, both products have about the same price, but the Chamomile entry supports a significantly higher throughput than the Swell 1000.

The Swell 1000 product performs only slightly better than Chamomile in terms of response time (hits and misses), hit ratio, and bandwidth savings.

Although the Chamomile entry has six SCSI disks, they were not used for the WebAxe-4 test. Instead, the responses were cached only in memory. Of course, this is possible because the WebAxe-4 workload has a working set size of only 1GB.

3.3 Downtime test

The downtime test is designed to estimate the time it takes a product to recover from an unexpected condition such as power outage or software failure. Polygraph measures the time until the first miss, which approximates the minimum downtime of the cache.

Minimum Downtime

Polygraph can also measure the time until the first hit. However, from a user's point of view, the time until the first miss is somewhat more important. As soon as the caching system is able to deliver misses, the user is able to access the Web again. Delivering hits is important to reduce outgoing bandwidth usage and from a quality-of-service point of view. All tested caches were able to deliver hits immediately or shortly after forwarding the first miss.

Chamomile, Stratacache E-55, and iMimic entries were not able to complete their downtime tests. These products required manual intervention to reboot (i.e., pushing the power button) because their BIOSes do not allow for an automatic boot after the power has been turned back on. Such intervention is prohibited by the Cache-off rules.

The precision of this test is around five seconds.

4. Product Configurations

Here are the configuration details for all tested products.

Label              Full product name                           Price  Available    CPU   RAM  Cache disks  Rack Space  Cache  Software
                                                               (US$)    (mm/yy)  (MHz)  (MB)     (n · GB)        (RU)   (GB)
Aratech C3100      Aratech C3100                               14900      01/02   1000  1024       5 · 36           3    180  Jaguar2000
BBGCache           BBG B1100                                    7595      01/02   1000  1024       2 · 36           3     72  Jaguar2000
Chamomile          Chamomile                                    2500      11/01    450  1024       6 · 09           3     24  Chamomile
iCache 400         CinTel iCache 400                            3250      11/01    850   512       2 · 30           2     57  iMimic DataReactorCore on FreeBSD 4.4
iCache 2500        CinTel iCache 2500                          26950      11/01   1500  2048       8 · 36           4    276  iMimic DataReactorCore on FreeBSD 4.4
iMimic             iMimic DataReactor 2100                      9895      11/01   1462  1024       4 · 60         n/m    233  iMimic DataReactorCore on Red Hat Linux 7.2
Kotetu             NAIST Kotetu v1.5                            2000      11/01    550   512       6 · 08         n/m     50  Linux 2.4.8 + Kotetu v1.5
Pyramid            i-Cache C                                    4950      11/01    700  1024       3 · 46           1    132  iMimic DataReactorCore on FreeBSD 4.4
Stratacache D-10   Stratacache Dart D-10                         699      11/01    300   128       1 · 20         n/m     20  iMimic DataReactorCore on FreeBSD 4.4
Stratacache D-25   Stratacache Dart D-25                         999      11/01    850   256       1 · 20         n/m     20  iMimic DataReactorCore on FreeBSD 4.4
Stratacache E-55   Stratacache Express E-55                     3895      11/01    933   512       1 · 36         n/m     20  iMimic DataReactorCore on FreeBSD 4.4
Stratacache F-120  Stratacache Flyer F-120                      8695      11/01   1000  1024       2 · 36           1     72  iMimic DataReactorCore on FreeBSD 4.4
Swell 1000         Swell Technology Tsunami 1000 + Squid-2.5    2549      12/01   1000  2048       2 · 20           1     40  Linux-2.4.13 + Squid-2.5
TNCLABS            TNCLABS Cachework CE1200                     9000      11/01    933  1024       2 · 36           1     69  iMimic DataReactorCore on FreeBSD 4.4

The ``Cache disks'' and ``Cache'' columns refer to configurations for the PolyMix-4 test. The two products that took the WebAxe-4 test were configured to use less disk space.

The iCache 2500 product was configured to use a GBit network card. All other products were using 100Mbit NICs.

Details about each product configuration, including networking gear specs and cache tuning parameters, are available on individual product pages, linked from row headings in the table above.

5. Comments

5.1 Polyteam Comments

5.2 Vendor Comments

It is a Polyteam tradition to give cache-off participants a chance to comment on the results after they have seen the review draft. The comments below are verbatim vendor submissions. Polyteam has not verified any of the claims, promises, or speculations that these comments may contain.

AraNetwork Technologies
http://www.aranetwork.com

ARA Networks participated in the 4th Cacheoff with JAGUAR2000, which was introduced last event. Since then, JAGUAR2000 has constantly developed for better performance and versatile functions. In this event, JAGUAR2000 showed the best peak throughput among 100 M peers. This proven performance is the result of ARA's constant endeavors, and thanks to it, JAGUAR2000 is highly regarded with its distinguished performance and functions meeting various needs of market.

ARA Networks tries to prove technical quantum leap with JAGUAR3000 cache engine, which will be tested out of race. JAGUAR3000, developed with MCT (Minimal Context-switching Thread, ARA's own architecture) that provides the optimized thread programming, is a high-performance cache engine. It utilizes the efficiency of SMP to the fullest extent. JAGUAR3000 also shows good scalability, and can be ported easily to various operating systems (Solaris, Linux, FreeBSD, Windows).

With the introduction of JAGUAR3000, JAGUAR2000 will be provided at much lowered price than estimated. ARA Networks will deliver the Cache-off entry performing 1500req/sec at around $9000. By interworking with already developed Streaming Media Cache or Dynamic Cache, it will be used for faster delivery of diverse contents. In addition, we will add further advanced features to JAGUAR2000 and develop derivatives equipped with JAGUAR2000 engine.

We'd more than appreciate your constant interest in ARA Networks, which advances every year through annual Cache-Off.

Broadband Gateway, Inc.
http://www.wdb.co.kr
http://www.cachenet.com (English supported Web site)

Broadband Gateway is a Japan-based CDN service provider. We are currently delivering CDN services to a wide range of customer in Japan and planning to market ITM solutions including BBGCache. BBGCache newly added to our product portfolio is a high-end Web caching solution for ISPs and enterprises. Designed on an optimal hardware architecture, BBGCache is about to hit the Japanese market with a variety of functions and enhanced scalability. In addition to the diversified caching functions basically provided, it has been already adopted in CDN services to provide special functions required in the CDN system.

BBGCache will be widely deployed as a differentiated solution to deliver value-added features such as virus scanning and contents filtering. Since it has employed Ploymix 4 and there was a miscommunication in class definition, its estimated price in the summary should be much lowered. BBGCache will be priced at around $3,000 when it is unveiled in Japan in January 2002. The solution will be applauded for its excellence in price, performance and functionality.

JST and NAIST
http://iplab.aist-nara.ac.jp/~eiji-ka/chamomile
Eiji Kawai <eiji-ka@is.aist-nara.ac.jp>

Chamomile is a WWW proxy system developed as a research product from scratch. It employs a multi-threaded architecture and highly optimized utilization of main memory to cache WWW objects. It works on UNIX variants, such as FreeBSD, Solaris, and Linux. Among them, we are developing it on Linux (kernel version 2.4) because of its low kernel overhead and efficient manipulation of threads. All the cache-off tests are done on Red Hat 7.1 with linux-2.4.13 kernel.

The original research goal of Chamomile was to achieve high performance especially as a reverse proxy and it was developed just for that purpose at the early stage. One of the key technologies there is its object replacement algorithm for memory cache. Chamomile achieved high hit ratio at the webaxe-4 test using just only main memory to cache objects. Of course, currently, Chamomile also works fine as a forward proxy. It leaves, however, much room for performance improvement, because it simply stores each web object into a single file on a file system. This weak point will be resolved by our future work.

As a comment of the cache-off test, we have to account for the host configuration at first. Because our server host did not reach Boulder in time, we could not try any tests with it. Fortunately, the polyteam let us use their server host (slightly old but very fine!!) and carry out polymix-4 and webaxe-4 tests for once respectively. In fact, our original host had two Athlon-MP 1.2GHz CPUs, 1GB of memory, and four 10krpm Ultra3-SCSI drives. The major difference between our original host configuration and that of the borrowed one is its CPU power.

Finally, we would like to greatly thank the polyteam for their kind efforts to complete our tests. Without their help, we could not publish any results. For additional information about chamomile, please contact Eiji Kawai.

CinTel Co., Ltd.
http://www.cintel.co.kr/
bjkim@cintel.co.kr

Cintel Co., Ltd. is pleased to be able to demonstrate the excellence and performance of its iCache Web caching products at this important event.

Cintel iCache was among the top performer in its Peak Throughput, Hit Ratio, Minimum Downtime Improvement, Response Time Improvement and almost the other part. And, the competitive price for this caching server combined with outstanding overall performance metrics provide an excellent resource to a greater spectrum of individuals and businesses for optimal caching solution with an affordable blend of power and manageability. This new generation cache server with superior performance will be a standard for cache solutions targeted at small to medium size businesses, enterprise networks.

For more information on the entire line of Cintel Cache Appliance Solutions, visit http://www.cintel.co.kr. If you have any questions about iCache, a product we are proud of, please e-mail us.

CinTel thanks Polyteam for their dedicated efforts in developing the Polygraph benchmarking software and coordinating a successful Cache-off event.

iMimic, Inc.
http://www.imimic.com
North America
    David Devine
    VP - North American Business Development
    972.572.1060
    david@imimic.com
International
    Filip Vandenbussche
    VP - International Business Development
    713.586.5541
    filip@imimic.com

We at iMimic are delighted by the metrics posted by our OEM partners:

Building on the success of past cache-offs, iMimic has again proven that our DataReactor Core software provides the best performance available at all points in the hardware spectrum. One OEM achieved 120 requests per second from an appliance little bigger than a cellphone; another captured the overall performance crown at 2700 requests per second, nearly twice the speed of the nearest competitor. In the bandwidth savings category, our OEMs won the top 7 places, and the top 6 in Hit Throughput per $1,000.

With our own entry, iMimic set a new record for performance on Linux, beating the previous record (also held by us) by 50% and increasing the gap between our systems and the closest Linux competitor to almost an order of magnitude difference in throughput. Our entry used a standard Red Hat Linux 7.2 installation and only one out of two processors available in the unit. This configuration is an ideal system for integrating value-added edge services, via the DataReactor platform or running on the operating system itself, such as SSL offloading, filtering, compression, or HTML/WML conversion. This single-box solution offers extremely high performance for proxy caching while still leaving resources available for other services.

iMimic would like to thank The Measurement Factory team for a very well-planned event.

NAIST
http://infonet.aist-nara.ac.jp/products/kotetu/

The KOTETU team of NAIST (Nara Institute of Science and Technology) would like to thank TMF for the opportunity to evaluate our system.

KOTETU is one of the products of our research. Our goal is to solve problems in WWW caching and to discover the secrets of caching systems by implementing one. KOTETU is an open source caching system for work group or department caching service. It is designed to run on generic UNIX systems without a special OS, filesystem, or device. The Cache-Off is a good place to evaluate the system.

Because of a shipping problem, the arrival of our equipment was delayed. On Wednesday afternoon we gave up waiting, because there would not have been enough time to test even if the equipment had arrived on Thursday or Friday, and built another entry machine using one of TMF's older machines. By then, little time remained (2 days and several hours). We installed the OS and KOTETU and tuned them before benchmarking. We did not have enough time to fine-tune the system for this machine. Nevertheless, we consider the results in this Cache-Off good under the circumstances.

The machine is somewhat old; it appears in the report of the 3rd Cache-Off. Since it has been in use for a few years, its price in today's market is not clear. We discussed the price of the machine with TMF and estimated it at $2,000. The software price is zero because KOTETU is open source software.

PYRAMID COMPUTER Systeme GmbH
http://www.pyramid.de
Claudia Steinberg
Product Manager Pyramid Solutions
+0049.761.4514.827
+0049.761.4514.700

Money and rack space - are any two things more important to a cache buyer? With breakthrough price/performance and performance/rackspace metrics, the Pyramid iCache C - the only European cache to be tested - establishes itself as second to none in these areas. We were pleased to see that our iCache C achieved the highest throughput per $1,000, highest hit throughput per $1,000, and highest throughput per rack space unit.

A full rack of our iCache C product is capable of handling 53,592 Polymix-4 requests per second, unachievable by any other participant. Such high performance demonstrates the iCache's readiness for even the most demanding network environments.

Like the rest of our iCache line, the iCache C is carefully engineered for excellent performance, high reliability, and a low total cost of ownership. To this end, the iCache C provided fast recovery from power interruptions, response time improvement within 3% of the overall winner, and a document hit ratio within 2% of the overall winner.

Pyramid has experience in a wide variety of cache deployments. Our one-disk model is well-suited for kiosks, stores, gas stations, and other CDN points of presence. For more information on the iCache C or any of our other models, please contact us or visit our website.

We'd like to thank The Measurement Factory for a well-organized and fair event.

Stratacache
http://www.stratacache.com
info@stratacache.com
(800) 244-8915 (US)
+1 (937) 224-0485 (international)

Stratacache focuses on developing high performance caching, streaming media and content distribution appliance products for a broad range of industries. As you can see in the Cache-Off report, we provide the key technology components that can be used in a cost effective, distributed site caching or CDN infrastructure. Stratacache is also pioneering the use of Microcaching (the Stratacache Dart series) for small office or remote branch office sites (libraries, bank branches, kiosk clusters, Internet cafes, etc).

Unlike some of our competitors, our small enterprise products (the Stratacache Express and Stratacache Flyer units) are based on higher performance SCSI Ultra-3 disks. The use of SCSI does not necessarily show a performance difference in the Polygraph HTTP benchmark, but we have found that when serving streaming media content from Real Networks, Microsoft, Quicktime or MPEG 1, 2 or 4, SCSI is of substantial benefit. Details on our large enterprise and carrier products, including the Stratacache Meteor, Metroliner and Superliner, are available on our web site, and additional Polygraph performance reports on these products are also available.

Stratacache would also like to take this opportunity to thank The Measurement Factory Team for their continued development of the Web Polygraph benchmark. The caching industry owes a debt of gratitude to this team for their work in building a benchmark that helps this industry continue to evolve.

We would also like to point out that if you are considering purchasing caching appliance products from companies such as Cisco, Network Appliance, Infolibria, Cacheflow, Inktomi, Dell, IBM, Compaq, 3COM, F5, or HP, NONE of these companies wanted you to be able to judge the performance and capabilities of their products in this open testing environment. Please understand that the Polygraph benchmark is an open testing platform that allows you as a customer to test a caching product yourself and make sure that any vendor's marketing claims meet the real performance capabilities you desire. If you want to know how product A compares with Product B but don't have the time to work with the Polygraph benchmark, The Measurement Factory also holds private tests for customers where you can choose the products that you wish to compare and get a non-biased report from an independent third party.

Swell Technology
http://www.swelltech.com
info@swelltech.com
+1 (512) 506-9394

We would first like to thank the Measurement Factory for their hard work and dedication in providing an unbiased benchmarking event for web caching vendors. This year's event was the smoothest run yet, with few controversies and even fewer Polygraph bugs to be worked out at the last minute.

This year, we decided to continue our tradition of testing the latest Squid version available. So our test software was a daily snapshot of Squid from a few days before the event, a version destined to be version 2.5. Due to a few issues with the load-shedding features of this Squid version, our server unfortunately exhibited a rather low hit ratio. Older variants of Squid do not exhibit this problem, as shown by previous Swell cache-off entries. We will, of course, fix this problem before shipping systems based on this new Squid version.

Otherwise, we are pleased with the results. Squid continues to improve in performance, while adding new features and greater stability under extreme conditions. The proven compatibility and large feature set of Linux and Squid, combined with our easy to use web based management tools and great support, make the Tsunami server line an excellent caching value. Additional new features in the Tsunami system, including NNTP news caching, transparent FTP caching, our unique transparent bridging features, and cluster-capable management tools, make it a featureful and cost effective alternative to expensive proprietary solutions.

Finally, we would like to extend our gratitude to the Squid development team who make our presence at these events possible by building the software around which we have based our business.

TNCLABS, Inc.
http://www.tnclabs.com
See Eng Huat
Managing Director, Asia Pacific Office
TNC Labs
65.461.6099

TNC Labs prides itself on providing excellent performance at good values across all of our product lines. Our cache-off entry, the CE-1200, is no different: blistering performance (1st place out of all entrants in hit response time), for a reasonable price (3rd place in price/performance, out of all 12 entries). We have introduced a range of caches appropriate to needs of all sizes, from the 1-disk CE500 to the 10-disk CE5500.

Established in 1993, TNC Labs is a leading provider of a wide range of Internet access and data communication products and solutions specifically designed for remote offices, mobile professionals, small and medium enterprises, and multi-national companies. Through its team of experienced research and development engineers and software specialists, TNC Labs develops and manufactures a complete range of LAN/WAN products including the integrated Internet server, PCMCIA cards, LAN cards, modems and hubs. The newly added caching technology achieves the best combination of price-performance, response time and hit ratios.

"Our award-winning caches allow organizations to optimize their bandwidth usage and solve network bottlenecks and inefficiencies so that their existing infrastructure can be used to generate greater revenue" said See Eng Huat, Managing Director of TNC Labs. "Thanks to the efficiency of iMimic's DataReactor software, we can offer excellent performance at a fraction of the prices charged by some competitors. Our American headquarters location keeps us in touch with the cutting edge of the global caching marketplace."

For more information, please contact us or visit our website.

TNC Labs extends its thanks to The Measurement Factory team for a fair and well-run event. Only vendor-independent benchmarks enable customers to make the best decisions.

6. The Rules

The majority of the rules were defined and agreed upon in conjunction with participants during the May 2001 organizational meeting in Denver, Colorado. Most of the rules are the same as in previous testing events. The core set of rules is available in the documentation of the fourth Cache-off. Here, we highlight some of the more interesting and controversial provisions.

Product Availability

Tested products must be available for sale to the public at the prices given in this report within two months after the start of testing. This rule is difficult, if not impossible, to enforce. Nonetheless, if you discover that one of the products described here is not being offered, please let us know.

TCP Maximum Segment Lifetime

Tested products must use a TCP MSL value of at least 30 seconds. A lower setting can improve performance by recycling recently-used TCP ports at a higher rate. We verify MSL settings before running performance tests.

Minimum Performance Requirements

At least 28% of responses must be cache hits during the top2 phase of PolyMix-4. Some products may be able to trade lower hit ratio for higher throughput. PolyMix-4 and WebAxe-4 results with document hit ratio less than half of the ideal ratio are disqualified. Similarly, measured mean response time must not exceed no-proxy response time (about 2.8 seconds) by more than 100%.
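As a rough illustration of how these thresholds combine, the following Python sketch checks a single result against the limits stated above. The function and parameter names are hypothetical, the ideal hit ratio must be supplied by the caller, and nothing here is taken from Polygraph or the Cache-off tooling.

```python
def qualifies(document_hit_ratio, ideal_hit_ratio, mean_response_time,
              no_proxy_response_time=2.8, min_hit_ratio=0.28):
    """Return True if a result meets the minimum performance requirements.

    Ratios are fractions (0.58 means 58%); times are in seconds.
    All names are illustrative, not part of any official tool.
    """
    # PolyMix-4 floor: at least 28% of top2 responses must be hits.
    if document_hit_ratio < min_hit_ratio:
        return False
    # Document hit ratio must be at least half of the ideal ratio.
    if document_hit_ratio < ideal_hit_ratio / 2:
        return False
    # Mean response time may exceed the no-proxy time by at most 100%.
    if mean_response_time > 2 * no_proxy_response_time:
        return False
    return True
```

With the default no-proxy time of 2.8 seconds, a mean response time above 5.6 seconds fails the last check.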

Entry Limits

There is no limit on the number of products that a single vendor can bring to be tested.

Publication of Results after the Cache-off

Companies that participate in the cache-off can publish new results two months after this report is released. This rule is in place to prevent a company from testing a ``token'' product at the cache-off and then publishing results for different products right after finding out how their competition performs.

Non-participating companies must wait three months. The rationale for this rule is similar. Some people feel that a company has an advantage if they can skip the cache-off, learn about their competition, and then publish a result from the same test shortly after.

We are in the process of re-evaluating these rules in order to get more products tested and more results published.

Referencing Cache-off Results

Any work that is derived from, or uses any of the cache-off results, or this report, must include the following reference to our official site:

A. Rousskov, M. Weaver, and D. Wessels, The Fourth Cache-off. Raw data and independent analysis at <http://www.measurement-factory.com/results/>.

7. Web Polygraph

Web Polygraph is a high-performance proxy benchmark. Polygraph is capable of generating a whole spectrum of Web proxy workloads that either approximate real-world traffic patterns, or are designed to stress a particular proxy component. Developed with the Cache-off needs in mind, Polygraph is able to generate complex, high request rate workloads with negligible overhead. Web Polygraph has been successfully used to debug, tune, and benchmark most caching products.

The Polygraph distribution includes two programs: polyclt and polysrv. Poly-client (-server) emits a stream of HTTP requests (responses) with given properties. The requested resources are called objects. URLs generated by Poly-client are built around object identifiers or oids. In short, oids determine many properties of the corresponding response, including response content length and cachability. These properties are usually preserved for a given object. For example, the response for an object with a given oid will have the same content length and cachability status regardless of the number of earlier requests for that object.
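One simple way to get properties that depend only on the oid is to seed a private random generator with the oid. The Python sketch below illustrates that idea only; it is not how Polygraph is implemented, and the function name, the default probability, and the default mean size are placeholder assumptions.

```python
import random

def object_properties(oid, cachable_probability=0.8, mean_size=9000):
    """Derive response properties from an object identifier alone."""
    # Seeding a private generator with the oid makes every draw repeatable,
    # so repeated requests for the same oid see the same properties.
    rng = random.Random(oid)
    content_length = max(1, int(rng.expovariate(1.0 / mean_size)))
    cachable = rng.random() < cachable_probability
    return {"content_length": content_length, "cachable": cachable}
```

Calling `object_properties(42)` any number of times returns the same dictionary, which mirrors the "same oid, same reply" behavior described above.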

While it runs, Polygraph collects and stores many statistics, including: response rate, response time and size histograms, achieved hit ratio, and number of transaction errors. Some measurements are aggregated at five second intervals, while others are aggregated over the duration of the whole phase.

For the Cache-off tests, we used version 2.7.4 of Web Polygraph. Web Polygraph is available to anyone at no charge in source code format.

7.1 The Cache-off Workload: PolyMix-4

The PolyMix environment has been modeling the following Web traffic characteristics since PolyMix-3:

These features were added for PolyMix-4:

With PolyMix-4, the clients emit requests with hostnames in the URLs. Previously, they always used IP addresses. This change is important because it makes the workload more realistic. We don't expect, however, the use of DNS names to have a significant effect on performance because DNS responses are cachable. If nothing else, it proves that the tested product properly supports DNS lookups.

We also added support for HEAD and POST requests to PolyMix-4. Again, this is primarily to ensure that the tested product supports these request methods. A product that does not support POST, for example, could easily pass the PolyMix-3 test, but would be worthless to most anyone trying to deploy that product in a production environment.

Another problem with PolyMix-3 is that the offered byte hit ratio is always larger than the offered document hit ratio. This is just the opposite of what we observe in real proxy traces. Production caches typically report byte hit ratio (BHR) numbers that are at least 10% lower than document hit ratio (DHR) numbers. For PolyMix-4, we've added a discriminator that causes BHR to be less than DHR. This change may have important performance implications for the workload and caches because it affects the size distribution of hits.
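To make the distinction concrete, the following Python sketch computes both ratios from a list of transactions. The sample data is made up purely for illustration and does not come from any Cache-off run.

```python
def hit_ratios(transactions):
    """Compute (document hit ratio, byte hit ratio) from (size_bytes, is_hit) pairs."""
    total_docs = len(transactions)
    total_bytes = sum(size for size, _ in transactions)
    hit_docs = sum(1 for _, is_hit in transactions if is_hit)
    hit_bytes = sum(size for size, is_hit in transactions if is_hit)
    return hit_docs / total_docs, hit_bytes / total_bytes

# Made-up sample: hits are mostly small objects, misses include a large one.
sample = [(2000, True), (3000, True), (4000, True), (50000, False), (6000, False)]
dhr, bhr = hit_ratios(sample)
```

In this sample three of five documents are hits (DHR of 60%), but they account for only 9,000 of 65,000 bytes, so the byte hit ratio is much lower than the document hit ratio.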

Finally, we've added aborted transactions to further increase the realism of PolyMix. Tested products must be able to deal with unexpected termination of both requests and responses. These aborted transactions do not contribute to the error counts.

The following figure shows the various phases and offered load levels for a PolyMix-4 test. Not counting the fill phase, the test takes about 12 hours. Filling the cache usually takes an additional 3-12 hours, depending on the product.

One of the rules of PolyMix-4 is that the request rate during the fill phase must not be greater than the peak rate (as used in top1 and top2). Otherwise, participants are free to choose virtually any fill rate they like. Usually, the selected fill rate is at least 50% of the peak request rate. We do not present the fill rate parameters in this report, but they can be derived from the logs.

Most measurements discussed in this report are taken from the top2 phase when the proxy is more likely to be in a steady state.

Reply Sizes

Object reply size distributions are different for different content types (see the table below). Reply sizes, as observed from the client side, range from 300 bytes to 5 MB with an overall mean of about 6.8 KB and a median of 4 KB. The server-side size distribution has a mean of 9 KB and a median of about 5 KB. The client- versus server-side difference is due to the fact that smaller objects are more popular. The reply size depends only on the oid. Thus, the same object always has the same reply size, regardless of the number of requests for that object.

Consult individual product reports for the actual size distributions measured at the cache-off. Also, prior to the Cache-off we ran some tests and logged the individual reply sizes. You can see the corresponding histogram.

Cachable and Uncachable Replies

Polygraph servers mark some of their responses as uncachable. The particular probability varies with content types (see the table below). Overall, the workload results in about 80% of all responses being cachable. The real world cachability varies from location to location. We have chosen 80% as a typical value that is close to many common environments.

A cachable response includes the following HTTP header field:

	Cache-Control: public

An uncachable response includes the following HTTP header fields:

	Cache-Control: private,no-cache
	Pragma: no-cache

Object cachability depends only on the oid. The same oid is always cachable, or always uncachable.
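The overall figure of about 80% can be checked against the approximate per-type parameters listed in the content-type table later in this section. The Python sketch below is only that arithmetic check; the dictionary layout and function name are illustrative.

```python
# (share of the content mix, cachable fraction) per content type,
# copied from the approximate values in the content-type table.
CONTENT_TYPES = {
    "Image":    (0.650, 0.80),
    "HTML":     (0.150, 0.90),
    "Download": (0.005, 0.95),
    "Other":    (0.195, 0.72),
}

def overall_cachability(content_types):
    """Weight each type's cachable fraction by its share of the mix."""
    return sum(share * cachable for share, cachable in content_types.values())

overall = overall_cachability(CONTENT_TYPES)
```

The weighted sum is 0.52 + 0.135 + 0.00475 + 0.1404 = 0.80015, which agrees with the stated overall cachability of about 80%.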

Life-cycle model

Web Polygraph is capable of simulating realistic (complex) object expiration and modification conditions using Expires: and Last-Modified: HTTP headers. Each object is assigned a ``birthday'' time. An object goes through modification cycles of a given length. Modification and expiration times are randomly selected within each cycle. The corresponding parameters for the model are drawn from the user-specified distributions.

The Life-cycle model configuration in PolyMix-4 does not utilize all the available features. We restrict the settings to reduce the possibility that a cache serves a stale response. While stale objects are common in real traffic, caching vendors strongly believe that allowing them into the benchmark sends the wrong message to buyers.

Consequently, all Polygraph responses in PolyMix-4 carry modification and expiration information, and that information is correct. The real-world settings would be significantly different, but it is difficult to accurately estimate the influence of these settings on cache performance.

Content Types

PolyMix-4 defines a mixture of content types. Each content type has the following properties:

The approximate parameters for the first four properties are given in the table below. For exact definitions, see the workload files.

Type       Percentage   Reply Size          Cachability   Expiration
Image      65.0%        exp(4.5KB)          80%           logn(30day, 7day)
HTML       15.0%        exp(8.5KB)          90%           logn(7day, 1day)
Download    0.5%        logn(300KB,300KB)   95%           logn(0.5year, 30day)
Other      19.5%        logn(25KB,10KB)     72%           unif(1day, 1year)

Latency and Packet Loss

PolyMix-4 uses the same latency and packet loss parameters that we used for PolyMix-3. The Polygraph client and server machines are configured to use FreeBSD's DummyNet feature.

We configure Polygraph servers with 40 millisecond delays (per packet, incoming and outgoing), and with a 0.05% probability of dropping a packet. Server think times are normally distributed with a 2.5 second mean and a 1 second standard deviation. Note that the server think time does not depend on the oid. Instead, it is randomly chosen for every request.
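The think-time behavior can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Polygraph code: the function name is hypothetical, and clamping negative draws to zero is our assumption, since the report does not say how such draws are handled.

```python
import random

def think_time(rng, mean=2.5, stddev=1.0):
    """Draw one server think time in seconds, independent of the oid."""
    # Clamping at zero is an assumption; the report does not say how
    # negative draws from the normal distribution are handled.
    return max(0.0, rng.gauss(mean, stddev))

rng = random.Random(2001)
samples = [think_time(rng) for _ in range(10000)]
average = sum(samples) / len(samples)
```

Because each call draws a fresh value, two requests for the same object generally see different think times, as the text above describes.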

We do not use packet delays or packet loss on Polygraph clients.

If-Modified-Since Requests

Conditional HTTP requests with an If-Modified-Since (IMS) header represent a significant portion of Web traffic (10%-20%). Generation of If-Modified-Since requests is significantly improved compared to earlier workloads. PolyMix-4 robots no longer use fake dates for the IMS header. Instead, the current last modification time (LMT) is used if a short ``304 Not Modified'' response is sought, and the previous LMT is used if a complete ``200 OK'' response is required. The portion of ``200 OK'' responses to IMS requests is controlled by a workload parameter and is set to 66%. Overall, 15% of PolyMix-4 robot requests are IMS requests.
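The date selection rule and the resulting request shares can be sketched as follows. This is a minimal Python illustration with hypothetical names; timestamps are treated as plain numbers, and the code is not taken from Polygraph.

```python
IMS_SHARE = 0.15        # portion of robot requests that are IMS requests
IMS_200_SHARE = 0.66    # portion of IMS requests meant to yield "200 OK"

def ims_date(current_lmt, previous_lmt, want_full_response):
    """Pick the If-Modified-Since date for the desired response type."""
    # An older date makes the object look modified since then, so the
    # server answers with a complete "200 OK"; the current LMT yields 304.
    return previous_lmt if want_full_response else current_lmt

# Resulting shares of all robot requests.
ims_200_of_all = IMS_SHARE * IMS_200_SHARE          # about 9.9%
ims_304_of_all = IMS_SHARE * (1 - IMS_200_SHARE)    # about 5.1%
```

In other words, roughly one request in ten is an IMS request that ends in a full response, and roughly one in twenty is an IMS request that ends in a ``304 Not Modified''.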

A cache may receive If-Modified-Since requests for the objects that are not in the cache and even for the objects that have never been seen by a cache. Caches should be able to deal with this perhaps somewhat unusual, but nevertheless real situation.

Cache Hits and Misses

The PolyMix-4 workload has a 58% offered hit ratio. In the workload definition, this is actually specified through the recurrence ratio (i.e., the probability of revisiting a Web object). The recurrence ratio must account for uncachable responses and special requests. In PolyMix-4, a recurrence ratio of 72% yields an offered hit ratio of 58%. Note that to simplify analysis, only ``basic'' requests are counted when hit ratio is computed; special requests (If-Modified-Since and Reload) are ignored because in many cases there is no reliable way to detect whether the response was served as a cache hit.
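A back-of-the-envelope check shows how the two numbers relate. The Python sketch below multiplies the recurrence ratio by the approximate 80% cachability of the workload; it ignores special requests, so it is only an approximation, and the exact computation in Polygraph may differ.

```python
recurrence_ratio = 0.72   # probability of revisiting an object
cachable_share = 0.80     # approximate share of cachable responses

# A revisit can only be a hit if the object was cachable in the first place.
approx_offered_hit_ratio = recurrence_ratio * cachable_share
print(f"{approx_offered_hit_ratio:.1%}")  # prints 57.6%
```

The result, 57.6%, is consistent with the offered hit ratio of about 58%.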

Polygraph enforces the desired hit ratio by requesting objects that have been requested before, and should have been cached. There is no guarantee, however, that these objects are in the cache. Thus, our parameter (58%) is an upper limit. The hit ratio achieved by a proxy may be lower if it does not store some cachable objects, or purges previously cached objects before the latter are revisited. Various HTTP race conditions also make it difficult, if not impractical, to achieve ideal hit ratios.

Object Popularity

PolyMix-4 introduces a ``hot subset'' simulation into the popularity model. At any given time, a 1% subset of the URL working set is dedicated to receive 10% of all requests. As the working set slides with time, the hot subset may jump to a new location so that all hot objects stay within the working set. This model is designed to simulate realistic Internet conditions, including ``flash crowds.'' We have not yet fully analyzed the effect of this hot subset model.
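The 1%/10% split can be illustrated with a small Python sketch. Several simplifications are assumed here and are not from the report: objects are chosen uniformly within each group, the hot subset occupies the lowest indices, and neither the working set nor the hot subset moves over time. The function name is hypothetical.

```python
import random

def pick_object(rng, working_set_size, hot_fraction=0.01, hot_request_share=0.10):
    """Pick an object index; indices below hot_size form the hot subset."""
    hot_size = max(1, int(working_set_size * hot_fraction))
    if rng.random() < hot_request_share:
        # This request goes to the hot subset.
        return rng.randrange(hot_size)
    # This request goes to the rest of the working set.
    return rng.randrange(hot_size, working_set_size)
```

Under these assumptions, a hot object receives about 0.10/0.01 = 10 times the average per-object request share, while the remaining objects receive about 0.90/0.99 of it.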

Simulated Robots and Servers

A single Polygraph client machine supports many simulated robots. A robot can emulate various types of Web clients, from a human surfer to a busy peer cache. All robots in PolyMix-4 are configured identically, except that each has its own IP address. We limit the number of robots (and hence IP aliases) to 1000 per client machine.

A PolyMix-4 robot requests objects using a Poisson-like stream, except for embedded objects (images on HTML pages) that are requested simulating cache-less browser behavior. A limit on the number of simultaneously open connections is also supported, and may affect the request stream.
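For readers unfamiliar with Poisson request streams, the Python sketch below generates request times with exponentially distributed gaps. It models a plain Poisson process only; embedded objects and the connection limit mentioned above are ignored, and the rate argument is an arbitrary assumption because this report does not state a per-robot rate.

```python
import random

def request_times(rng, rate_per_sec, duration_sec):
    """Generate request timestamps with exponential inter-arrival gaps."""
    times = []
    now = rng.expovariate(rate_per_sec)
    while now < duration_sec:
        times.append(now)
        now += rng.expovariate(rate_per_sec)
    return times
```

Over a long interval, the number of generated requests is close to `rate_per_sec * duration_sec`, while individual gaps vary widely.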

PolyMix-4 servers are configured identically, except that each has its own IP address.

Persistent Connections

Polygraph supports persistent connections on both client and server sides. PolyMix-4 robots close an ``active'' persistent connection right after receiving the N-th reply, where N is drawn from a Zipf(64) distribution. The robots will close an ``idle'' persistent connection if the per-robot connection limit has been reached and connections to other servers must be opened. The latter mimics browser behavior.

PolyMix-4 servers use a Zipf(16) distribution to close active connections. The servers also time out idle persistent connections after 15 seconds of inactivity, just like many real servers would do.

Other details

A detailed treatment of many PolyMix-4 features is available on the Polygraph Web site, along with the copies of workload configuration files.

8. Benchmarking Environment

8.1 Location

We rented a former Pearl Street storefront in Boulder, Colorado for the Cache-off event. The building was quite cozy and the upper floor became quite warm with all the machines running.

8.2 Schedule

Testing took place from Monday, November 12 through Friday, November 16, with some tests finishing on Saturday. As described in the rules, participants are guaranteed at least 55 hours of testing time. Vendors had access to the cache-off facility from 9 AM until 8 PM each day. Tests were often scheduled to run overnight.

8.3 Polygraph Machines

For this Cache-off, we used our new PolyI tool to manage the tests. PolyI has a Web interface that lets us easily create, start, monitor, and stop tests and generate reports. The tool also handles distribution of FreeBSD kernels to clients and servers, as well as time synchronization and results archival tasks. While PolyI reduces test management work to a few mouse clicks, it is not required to run Polygraph tests or to reproduce the Cache-off results. We describe the internals of our benchmarking environment below.

We rented 100 PCs for use as Polygraph clients and servers. These machines are a variety of HP Vectras and no-name clones. Each had at least a 650 MHz Pentium III CPU, 256 MB of RAM, and an Intel Etherexpress PRO/100+ Ethernet card.

We use FreeBSD-4.3 as the base operating system for the Polygraph clients and servers. We make a number of changes to kernel parameters in order to support PolyMix-4. We provide participants with a custom-built FreeBSD distribution to simplify the installation process for them and reduce the chance of configuration mistakes. This software is available to the public from our web page.

The number of Polygraph machines varies with the product under test. Peak request rates vary a lot among caching products. Thus, each participant informed us how many Polygraph client-server pairs they needed to drive their cache at its maximum capacity.

During the cache-off, we never use more than 500 requests per second per machine for official tests.
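The per-machine cap implies a lower bound on the number of client-server pairs for a given peak rate. The Python sketch below computes that bound; the function name is hypothetical, and participants chose the actual number of pairs themselves, so this is only the minimum the cap allows.

```python
import math

MAX_RATE_PER_MACHINE = 500  # requests per second, the cap for official tests

def pairs_needed(peak_request_rate):
    """Smallest number of client-server pairs that keeps each machine under the cap."""
    return math.ceil(peak_request_rate / MAX_RATE_PER_MACHINE)
```

For example, `pairs_needed(2700)` returns 6, while any peak rate of 500 requests per second or less needs a single pair.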

Each bench has a monitoring PC connected to the harness network. This PC is used to start Polygraph runs, display run-time statistics, collect logs after the completion of a run, and generate Polygraph reports.

8.4 Time Synchronization

We run the ntpd time server daemon on all Polygraph machines and the monitoring PCs. The monitoring PCs are synchronized periodically with a designated reference clock. We run ntpd on all machines rather than just synchronizing clients and servers before each test as was done during the first cache-off. While running ntpd could introduce small CPU overhead, we are concerned that without periodic synchronization, local clocks may drift apart during these long (15+ hours) tests.

8.5 Network Configurations

Each test bench consists of Polygraph machines, the monitoring PC, the participant's proxy cache(s), and a network to tie them together. The networking equipment falls under the participant's domain. That is, each participant is responsible for providing the networking equipment needed to connect the Polygraph machines to the caches.

A new feature for PolyMix-4 is that the Polygraph agent IP addresses are bound to the system's loopback interfaces. This nifty trick avoids the problems with huge ARP tables that we had for PolyMix-3 tests. It also requires a number of static routes on clients, servers, and proxies so that each knows how to reach the agent addresses.

The following figure shows a typical bench configuration, with a flat network:

Each Polygraph machine requires a fast Ethernet port, so the participant must have enough ports to connect all of the Polygraph machines within the participant's cluster. The monitoring PC must have IP connectivity to all clients and servers at all times.

We run bidirectional netperf tests between each client-server pair to measure the raw TCP throughput. We also selectively execute Polygraph ``no-proxy'' runs to ensure that clusters can generate enough throughput to sufficiently drive the cache under test.

Before running any PolyMix-4 tests, we always test network throughput with Netperf. All netperf tests showed satisfactory levels of raw network performance.

All ``no-proxy'' tests were successful, delivering desired throughput and negligible response time overheads.

8.6 Numbers

PCs rented: 100 + 3% spares
Vendors: 10
Humans: 13
Products tested: 14
Floor space: 5000 sq ft
15-amp power circuits: 18
6-outlet power strips: 50
Ethernet cables: approx 200
Games of ``Bust-A-Move'' played: at least 100

9. Test Sequence

This section describes the official testing sequence. The complete sequence was executed at least once against all cache-off entries.

9.1 PolyMix-4

PolyMix-4 is the main performance test which generates the vast majority of the reported numbers. This test is discussed in the ``Cache-off Workload'' Section.

Note that the cache is filled as a part of the PolyMix-4 workload. Depending on the product, this can take anywhere from four to 20 hours.

9.2 WebAxe-4

WebAxe-4 is another performance test that we added for this Cache-off. It is designed to simulate traffic for a caching proxy in a surrogate role (i.e., as an HTTP server accelerator). In general, WebAxe-4 is an easier test because the working set size is much smaller (only 1GB) and the recurrence ratio is very high (90%).

9.3 Downtime Test

The ``Downtime Test'' is performed only after a successful PolyMix-4 run. We use one client-server pair. During the first 10 minutes of the test, Polygraph creates a 3-10 req/sec load through the proxy. The power to all participant devices, including networking equipment, is then manually turned off. After about 5 seconds of ``downtime,'' the power is turned back on, and the measurement phase begins. We measure the times until the first miss and the first hit. Polygraph continues to emit the same light load during the entire test. The precision of this test is around 5 seconds.
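The two reported measurements can be derived from a transaction log as in the Python sketch below. The log format, the outcome labels, and the choice to measure from the moment power is restored are assumptions made for illustration; the actual Polygraph logs and analysis tools are not shown in this report.

```python
def recovery_times(power_on_time, log):
    """Return (time_to_first_miss, time_to_first_hit) in seconds.

    `log` is a list of (timestamp, outcome) pairs with outcome in
    {"hit", "miss", "error"}; this format is hypothetical.
    """
    first = {}
    for timestamp, outcome in sorted(log):
        if timestamp >= power_on_time and outcome in ("hit", "miss"):
            # Record only the earliest occurrence of each outcome.
            first.setdefault(outcome, timestamp - power_on_time)
    return first.get("miss"), first.get("hit")
```

A value of `None` in either position means the cache never served that kind of response after power was restored.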

It is important to note that the cache(s) and networking gear are plugged into power strips. We turn off the power strips and not the equipment boxes to simulate realistic conditions of an unexpected software, hardware, or power failure. Vendors are not allowed to assist the reboot process. UPS devices of any kind are not allowed during this test.

We realize that the downtime test and execution rules are simple, if not primitive. However, even this test provides very useful data to cache administrators. Depending on the installation environment and reported cache performance, one can decide whether to invest in UPS systems and/or redundant configurations. We will work on improving the workload for this test.

9.4 MSL Test

As we described earlier, all entries must have a Maximum Segment Lifetime (MSL) of 30 seconds, producing a TIME-WAIT state of 60 seconds (twice the MSL). Any product that fails this test is disqualified.

To determine the MSL on each product, we probe its TCP stack and monitor connection requests. If the system accepts a new connection with the same sequence number in under 60 seconds, it fails the test. The msl_test program is included in the Polygraph source distribution.
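The pass/fail arithmetic can be sketched in a few lines of Python. This is not the msl_test program: the function names are hypothetical, and the sketch assumes that a single observed delay, the time after which the product accepted the repeated connection, is already available.

```python
REQUIRED_MSL = 30.0  # seconds, as required by the Cache-off rules


def time_wait_seconds(msl: float) -> float:
    """TIME-WAIT lasts twice the Maximum Segment Lifetime."""
    return 2.0 * msl


def passes_msl_test(accepted_after: float, msl: float = REQUIRED_MSL) -> bool:
    """Return True if the repeated connection was accepted no sooner
    than the required TIME-WAIT period (hypothetical helper)."""
    return accepted_after >= time_wait_seconds(msl)
```

Under these assumptions, a product that accepts the repeated connection after 45 seconds fails, while one that accepts it after 60 seconds passes.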

All cache-off entries reported a TIME-WAIT state of 59-60 seconds.

10. Cache-Off Controversies

Controversies are a regular feature of cache-off benchmarking, and this one is no exception. While we always do our best to learn from the past, inevitably new and unexpected situations arise. This time, several problems are worth mentioning here.

10.1 Delays at US customs

The equipment belonging to two of the vendors did not arrive at the Cache-off location in time for testing. Apparently, their shipments suffered significant delays with U.S. customs. On Wednesday, JST and NAIST gave up on receiving their hardware and borrowed two old PC systems belonging to The Measurement Factory.

The pricing for the affected entries was done by The Measurement Factory using on-line resources for used and new computer parts. We tried to come up with a reasonable hardware price, but our estimate could not be precise. Please keep that in mind when looking at the price/performance analysis for the JST and NAIST entries.

10.2 Yet another pricing problem

The price of the BBGCache entry in the first draft of this report was based on the estimate made at the time of the cache-off. That draft was released exclusively to cache-off participants, except for the ARATech team that represented BBGCache. We waited for the final (and overdue) pricing confirmation from ARATech before providing their team with the draft. That was done to avoid an unfair situation where one participant could make price adjustments knowing the competition's prices. We hoped that, since ARATech was not given access to the draft, other participants would not complain if the ARATech team needed to change the price.

Our hopes did not materialize. ARATech did adjust the price for the BBGCache entry, and that adjustment changed the order of products on price/performance graphs. Those changes in order caused complaints from other vendors. The complaints were based primarily on the fact that The Measurement Factory could not guarantee that the draft was not leaked to the ARATech team by a friendly cache-off participant.

After a series of long and heated discussions, affected participants agreed to accept a change in price, but only to the level where the change does not alter the original order of products on the price/performance graphs.

In retrospect, we should not have released the draft until all participants had confirmed their prices, even though waiting for the last confirmations would have delayed the work on the report.

10.3 One entry was not ready

ARATech brought a third entry to the Cache-off, but did not start any official tests on it. ARATech intends to bring the product back to our labs January 7, 2002 for a retest. Major configuration parameters and price for the product will not be changed till then. The results of the retest will be posted on our Web site.

10.4 Two Hit Ratios

For this Cache-off, we improved Polygraph's report generation tool. One improvement was to measure hit ratios by comparing the number of transactions and bytes on client and server sides. Such a comparison yields more accurate ratios because it does not require client-side guesses about whether the transaction resulted in a hit. Such guesses are imprecise and can be done only for basic HTTP transactions. The old report generator (ReportGen) uses this imprecise technique to report hit ratios. The new tool reports hit ratios measured using both methods.
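The side-comparison idea can be sketched in Python as follows. The formula is an assumption based on the description above, not a copy of the report generator's code: whatever the clients received but the servers did not send is attributed to the cache. The function name and the sample counts are hypothetical.

```python
def side_comparison_hit_ratio(client_side: float, server_side: float) -> float:
    """Estimate a hit ratio from client-side and server-side totals.

    Pass transaction counts for a document hit ratio, or byte counts
    for a byte hit ratio. Assumes that traffic seen by clients but not
    by servers was served from the cache.
    """
    if client_side <= 0:
        raise ValueError("client-side total must be positive")
    return (client_side - server_side) / client_side


# Hypothetical totals, for illustration only.
document_hit_ratio = side_comparison_hit_ratio(1000, 450)   # transactions
byte_hit_ratio = side_comparison_hit_ratio(10_000_000, 6_000_000)  # bytes
```

With these made-up totals, the document hit ratio is 55% and the byte hit ratio is 40%. No per-transaction hit or miss classification is needed, which is the advantage the comparison method has over client-side guesses.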

We have chosen the new, more precise, measurement algorithm for the baseline presentation in this report. This choice confused some of the participants since they had become accustomed to the numbers reported by the old tool. We felt that our choice was justified because the new measurements are more precise and because the old measurements are still available in the new reports.

10.5 Phase synchronization bug

During the Cache-off we observed and fixed a bug relating to phase synchronization. The phase synchronization algorithm could malfunction if polysrv receives responses in the opposite order than polyclt transmits them. When this happens, the polygraph agents might become confused and never synchronize their phases. To the best of our knowledge, the bug affected one test on one entry. The test was repeated.