The Measurement Factory, Inc.
Alex Rousskov, Matthew Weaver, Duane Wessels
team@measurement-factory.com
We held a week-long benchmarking ``Cache-off'' for caching proxies in the middle of November, 2001. Using the Web Polygraph benchmark, we tested 14 caching proxies from 10 different organizations. In this report, we summarize performance data collected during these tests and analyze the results.
1. Introduction
1.1 Timeline
1.2 Cache-off Participants
1.3 Terminology
1.4 How (not) to Read this Report
1.5 Where to find more information
2. Executive Summaries
2.1 Unavailable Data
3. Performance Details
3.1 PolyMix-4 Details
3.2 WebAxe-4 Details
3.3 Downtime test
4. Product Configurations
5. Comments
5.1 Polyteam Comments
5.2 Vendor Comments
6. The Rules
7. Web Polygraph
7.1 The Cache-off Workload: PolyMix-4
8. Benchmarking Environment
8.1 Location
8.2 Schedule
8.3 Polygraph Machines
8.4 Time Synchronization
8.5 Network Configurations
8.6 Numbers
9. Test Sequence
9.1 PolyMix-4
9.2 WebAxe-4
9.3 Downtime Test
9.4 MSL Test
10. Cache-Off Controversies
10.1 Delays at US customs
10.2 Yet another pricing problem
10.3 One entry was not ready
10.4 Two Hit Ratios
10.5 Phase synchronization bug
The Cache-off addresses a need in the web caching community for high quality, independent verification of product performance. This event represents a snapshot in time of the caching industry. The results presented here were all taken during a one-week period. In most cases, a product's performance will change over time as the vendor makes improvements, fixes bugs, and adds new features.
We strive for fairness in our testing. Decisions regarding the rules and testing environment are made well in advance with input from Cache-off participants. Any company or organization willing to test the performance of their product(s) is given the opportunity to participate in this event. We describe the actual rules and testing environment later, in the Rules and succeeding sections.
A number of well-known caching companies have chosen to not participate in this Cache-off, for one reason or another. We can assure you that all of these companies were fully aware that these tests were taking place. We encourage you to ask their sales representatives why they do not participate in public benchmarks, and to be very suspicious of results produced by the company itself.
For the first time, we are testing caches in both ``forward'' (client-side) and ``reverse'' (server-side) configurations. The PolyMix-4 workload simulates traffic that a forward cache receives. Similarly, the WebAxe-4 workload simulates the traffic for a reverse proxy, also known as a surrogate or server accelerator.
Preparations for the fourth Cache-off began with an organizational meeting in May, 2001. Representatives from 6 companies attended this meeting with the intention to participate in the Cache-off. During the meeting, we prioritized the features to be added for PolyMix-4. The meeting attendees also told us that it was important for them to publish benchmarking results for their products configured as surrogates. Thus, we added WebAxe-4 to the sequence of tests to run at the fourth Cache-off.
The first version of Web Polygraph designed for this Cache-off was released in July 2001. The version used at the Cache-off is dated September 2001. Since then, the code and workloads remained frozen until the start of the Cache-off on November 12th. Participants had plenty of time to practice and prepare for the competition.
The following companies and organizations brought products to the fourth Cache-off:
Even though they registered, Arlotto Comnet, Inc decided to not participate.
Throughout this report we use a few terms that have specific meaning for the Cache-off. A vendor is an organization that has a caching product. To simplify the terminology, all commercial, non-profit, virtual, etc. organizations are labeled as ``vendors.'' Some vendors are actually two or more companies working together, under an O.E.M. agreement for example. A vendor is allowed to bring more than one product to the Cache-off. Each product or entry that a vendor brings counts as one participant. We have one bench (``harness'') for every participant. PolyMix-4 and WebAxe-4 are the names of the workloads that we use for these tests. As with previous tests, these are standardized workloads that we develop in cooperation with vendors. We'll talk more about PolyMix-4 and WebAxe-4 later in this report.
We strongly caution against drawing hasty conclusions from these benchmarking results. Our report contains a lot of performance numbers and configuration information; take advantage of it. The products that we test differ greatly, and it is tempting to draw conclusions about participants based on a single performance graph, or one column in a table. We believe such conclusions will virtually always be wrong. Here are a few recommendations to prevent misinterpretation of the results.
Our benchmark addresses only the performance aspects of caching proxies. Any given product has numerous features that are not addressed here. For example, we think that manageability, reliability, and support are very important attributes that should be considered in your buying decisions.
The Measurement Factory maintains the Official Results Site, where this report and the detailed Polygraph log files from the Cache-off are stored. All information at the Official Results Site is freely available.
There are no other official sources of Cache-off results.
Documentation, sources, independent test results, discussion mailing lists, and other information related to Polygraph benchmark are available at the Web Polygraph site.
We discuss only major performance measurements in this report. If you'd like more details about a particular entry, consult these resources:
The links above are also useful if you are afraid of being influenced by our interpretation of the results. We still recommend reading the report afterwards (as a ``second opinion'') because not all test rules and performance matters will be clear from the raw data.
The ``Executive Summary'' tables below summarize the performance results. We provide an in-depth analysis of the measurements in the next section, titled ``Performance Details.'' We present the summary information in three separate tables:
The first table contains the PolyMix-4 and downtime test results from 12 products.
The second table contains PolyMix-4 and downtime test results for two Stratacache Dart products. These are listed separately and do not appear in the bar charts because they have limited user licenses and cannot be sold in small quantities. We feel that this complicates the performance/price analysis (see detailed discussion in the third Cache-off report). Stratacache calls their limited-license products ``microcaches'' because they were originally designed for small networks with 10 or fewer network-attached devices (e.g., a residential or library setting). The name also fits well with the products' tiny size and set-top box appearance. Additional information on the microcaches is available in the Stratacache vendor comment section. We invited Stratacache to bring a couple of these boxes for testing because they may indicate a new niche in the caching market.
The final summary table contains the WebAxe-4 results. As you can see, only two products elected to run this test. This is disappointing given the feedback we received from vendors during the April planning meeting.
Column headings in the above tables are links to bar charts that compare the corresponding measurement. Row headings are labels with short names for the tested products. These labels also have links to pages with configuration and performance details for each product.
The ``Total Price'' represents the list price of the product. This includes both hardware and software. Furthermore, if the vendor uses advanced networking equipment, that too is included in the Total Price. However, Ethernet switches providing basic layer two connectivity are not included in the list price. This is a departure from previous tests where networking equipment always contributed to the price. Our rationale is that most organizations already have networking equipment in place and would not consider its cost when purchasing caching proxies. If, however, a product requires layer four features to achieve higher performance, the cost of such equipment should be included. None of the products mentioned in this report use advanced networking features. They all use basic layer two Ethernet switches.
NOTE: It is likely that some vendors will lower their prices after seeing their competitors' results in this report. Be sure to read the ``Vendor Comments'' for pricing changes and other important information.
The ``Peak Throughput'' column depicts the highest tested request rate for each product. PolyMix-4 and WebAxe-4 have a number of different phases, each with a different or varying throughput. In the summary tables, we report the response rate during the 4-hour top2 phase, when the load is at its peak and the cache is more likely to be in a steady state.
The ``Response Time'' group has three columns. We report the mean response time for cache hits and misses separately to emphasize performance differences on the two most important request paths. The ``All'' column depicts mean response time for all request classes.
The ``Savings'' column shows the percentage of transactions (Xact heading) and bytes (Bwidth heading) that the product served as cache hits.
The ``$1000 can buy'' columns show performance/price ratios. We use two performance measurements: hit rate, or number of cache hits per second (the ``hit/sec'' column), and request rate (the ``req/sec'' column). Both measurements are normalized by Total Price (in thousands of dollars). In other words, the data shows ``how many hits or requests per second do I get for a thousand dollars?'' Some participants feel that hit performance/price is a more important measurement than overall throughput. For example, a product with a poor hit ratio may still score well on overall throughput. On the other hand, the hit throughput measurement can be misleading because the hit ratio you achieve on a production system may be significantly different from the one measured in these tests.
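The normalization behind the ``$1000 can buy'' columns is a simple division, sketched below. The function name and all input figures are hypothetical and do not describe any tested entry. The sketch also assumes that the hit rate can be approximated as the request rate multiplied by the document hit ratio.

```python
def per_thousand_dollars(rate_per_sec: float, total_price_usd: float) -> float:
    """Normalize a per-second rate by the list price in thousands of dollars."""
    if total_price_usd <= 0:
        # A zero price (e.g., free software on donated hardware) cannot be normalized.
        raise ValueError("total price must be positive")
    return rate_per_sec / (total_price_usd / 1000.0)


# Hypothetical entry: none of these figures come from the Cache-off results.
request_rate = 1000.0    # req/sec during the peak phase
hit_ratio = 0.50         # fraction of requests served as cache hits
total_price = 10_000.0   # list price in US dollars

# Assumption: hits per second is approximated as request rate times hit ratio.
hit_rate = request_rate * hit_ratio

req_per_k = per_thousand_dollars(request_rate, total_price)
hit_per_k = per_thousand_dollars(hit_rate, total_price)
print(f"req/sec per $1000: {req_per_k:.1f}")
print(f"hit/sec per $1000: {hit_per_k:.1f}")
```

The explicit check for a non-positive price reflects the difficulty of normalizing entries whose true price is hard to determine.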
The ``Minimum Downtime'' columns in the PolyMix-4 tables contain the results of the downtime test. Here we report how long it takes the product to serve a cache miss after suffering a power outage.
The ``Cache Age'' column estimates the cache capacity in terms of hours of PolyMix-4 peak fill (i.e., cachable miss) traffic. For example, a reading of 10 means that, at the peak request rate, the cache becomes full after 10 hours, and must begin replacing objects. We believe that, in a production environment, caches should be large enough to hold 2-3 days' worth of traffic. Unfortunately, in this cache-off environment, products can get away with a cache age of 5-6 hours -- just enough to store the working set window.
Regarding errors, all published performance tests finished with fewer than 0.1% failed transactions. Note that the rules disqualify a run with more than 1.0% errors.
You will notice that some of the table entries are filled with ``n/a'' to indicate unavailable data. The Chamomile, Stratacache E-55, and iMimic entries were unable to complete the downtime test. As tested, their systems require manual intervention to boot up.
This section gives a detailed analysis of major performance measurements.
The PolyMix-4 workload has several phases. For the baseline presentation, we have selected the top2 phase. Top2 is the second 4-hour phase with peak request rate. The first peak phase, top1, often yields unstable results. The second top phase is usually more stable.
The bar charts below are based on data averaged across the top2 phase. Averages are meaningful in situations where performance does not change with time, or when changes are smooth and predictable. We encourage the reader to check individual entry reports for the exceptional behavior where averages may be less meaningful.
As with any benchmark, Polygraph introduces its own overheads and measurement errors. We believe that the margin of error for most results discussed here is within 10%. In most cases, however, the reader should pay attention to patterns and relative differences in product performance rather than absolute figures.
Depending on the version of the report you have selected, the entries appear in either alphabetical or numerical order for each metric.
Presenting throughput results in a way that pleases and makes sense to everybody is a daunting task. Due to tremendous differences in request rates, a simple graph with raw request rates from the ``Executive Summary'' table is not very informative. Moreover, comparing throughput of a large, $65K system to a small, $3K PC is usually not interesting. Product prices do vary a lot.
To begin your analysis, you might first pick out products that are in your price range:

You may also want to pick out products that meet your demands for HTTP traffic:

We prefer to normalize the throughput results by some universal measure of product complexity and ability. Several measures have been proposed, including product price, rack space, and disk spindles. Price normalization is imperfect because the true price is difficult to determine, especially for free software products. Rack space normalization presents problems for entries that were not tested in a rack-mountable configuration. Normalizing by the number of disks neglects the differences in RAM sizes and disk throughput or capacity. We select price as a normalizer.
We also need to choose which throughput metric to normalize: overall throughput, or cache hit throughput? Overall throughput is useful for capacity planning, if you know how many HTTP requests per second your users generate. However, it does not account for the fact that we are measuring caches here. A non-caching proxy could score well on normalized overall throughput, yet lose in all other categories. The normalized hit throughput, on the other hand, uses the rate at which the tested product can deliver cache hits.
As it turns out, whether we select hit throughput or overall throughput doesn't significantly change the ranking of tested products. Most entries' positions are the same on both charts.
To emphasize the importance of caching traffic, we select the normalized hit rate graph for the baseline presentation:

The normalized graph not only provides a fair comparison but answers an important question: ``How many hits per second can one thousand dollars buy?''
There appears to be no strong correlation between performance/price ratio and absolute throughput (or price): Products showing good return on a dollar can be found on both ends of the throughput scale.
Note that it doesn't make sense to normalize all of the performance metrics by product price. For example, hit ratios and response times should approach some ``perfect'' level regardless of the product's cost. Furthermore, the hit ratio and response time measurements are limited by the workload characteristics. We shouldn't look too closely at the absolute values for these metrics. Rather, we should care more about how close a tested product comes to the ``ideal'' value.
Hit ratio is a standard measurement of a cache's performance. PolyMix-4 offers a hit ratio of about 55% -- a cache cannot achieve a higher hit ratio in these tests. However, due to various overload conditions, insufficient disk space, deficiencies of object replacement policy, and other reasons, the actual or measured cache hit ratio may be smaller than the offered level.

The ``Document Hit Ratio'' chart shows how well a cache maintains its hit ratio under the highest load. Almost all of the products achieve a hit ratio higher than 50%.
The two primary reasons for a less-than-ideal hit ratio are excessive load and insufficient disk space. For most caching products, disk I/O is the bottleneck. Bypassing the disks for some requests allows the proxy to absorb more load. If the cache size is smaller than the working set window, the proxy won't be able to store all responses that Polygraph expects to result in cache hits. We can actually estimate the age at which a particular product begins purging cached objects, which we do in the next section.
To estimate the maximum age of cached objects, we divide cache capacity (as specified by the vendor) by the fill rate during the top2 phase. The latter is the rate of cachable misses as measured by the Polygraph client. Raw fill stream measurements can be found in the individual entry reports. We believe that our formula yields a ``good enough,'' albeit not precise, approximation of real world measurements.
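The cache age estimate described above can be sketched as follows. The function name, the units (gigabytes and megabytes per second, with 1 GB taken as 1024 MB), and the sample figures are assumptions made for illustration. They are not taken from any entry report.

```python
def cache_age_hours(capacity_gb: float, fill_rate_mb_per_sec: float) -> float:
    """Estimate how many hours of peak fill traffic a cache can hold.

    capacity_gb: cache capacity as specified by the vendor.
    fill_rate_mb_per_sec: rate of cachable misses during the top2 phase.
    """
    if fill_rate_mb_per_sec <= 0:
        raise ValueError("fill rate must be positive")
    capacity_mb = capacity_gb * 1024.0        # assumption: 1 GB = 1024 MB
    seconds_to_fill = capacity_mb / fill_rate_mb_per_sec
    return seconds_to_fill / 3600.0


# Hypothetical entry: 36 GB of cache filled at 1 MB/sec.
age = cache_age_hours(36.0, 1.0)
print(f"estimated cache age: {age:.2f} hours")
```

Under these assumed figures the cache fills in a little over 10 hours, after which it must begin replacing objects.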

Many cache administrators believe that a production cache should store about 2-3 days of traffic. Due to the differences between the ``accelerated'' benchmarking environment and real-world conditions, the 2-3 days rule of thumb probably corresponds to some 10 hours of cache age.
The cache capacity requirement depends on your environment. When configuring a caching system based on our performance reports, make sure you get enough disk storage to keep sufficiently ``old'' traffic. You may need to increase the price and re-compute performance/price ratios if a product you are considering does not have enough storage. You should also check that the product is actually available with the additional disk space. These adjustments may significantly affect the choice of a price-aware buyer.
To simulate real-world conditions, PolyMix-4 introduces an artificial delay on the server side. The server delays are normally distributed with a 2.5 sec mean and 1 sec deviation. These delays play a crucial role in creating a reasonable number of concurrent ``sessions'' in the cache.
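The following sketch illustrates why the server-side delay drives the number of concurrent transactions. It draws delays from a normal distribution with the parameters given above and applies Little's law (average number in the system equals arrival rate times mean time in the system). The request rate and miss ratio are hypothetical. Clamping negative samples to zero is our assumption, not a description of how Polygraph handles them.

```python
import random

MEAN_DELAY_SEC = 2.5   # server delay mean, from the report
DEVIATION_SEC = 1.0    # server delay deviation, from the report


def sample_server_delay(rng: random.Random) -> float:
    """Draw one server-side delay in seconds."""
    # Assumption: a negative draw is clamped to zero.
    return max(0.0, rng.gauss(MEAN_DELAY_SEC, DEVIATION_SEC))


def expected_open_transactions(miss_rate_per_sec: float, mean_delay_sec: float) -> float:
    """Little's law: mean concurrency = arrival rate * mean time in system."""
    return miss_rate_per_sec * mean_delay_sec


rng = random.Random(42)  # fixed seed so the sketch is repeatable
delays = [sample_server_delay(rng) for _ in range(100_000)]
mean_delay = sum(delays) / len(delays)

# Hypothetical load: 1000 req/sec with 45% of requests going to the server.
request_rate = 1000.0
miss_ratio = 0.45
open_transactions = expected_open_transactions(request_rate * miss_ratio, MEAN_DELAY_SEC)

print(f"sampled mean delay: {mean_delay:.2f} sec")
print(f"expected concurrent server-side transactions: {open_transactions:.0f}")
```

Without the artificial delay, the same request rate would keep far fewer transactions open at once, and the cache would be exercised much less realistically.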
To simulate WAN server side connections, we introduce packet delays (80 msec round trip) and packet loss (0.05%). These delays increase miss response times and, more importantly, reward caches for using persistent connections (TCP connection setup phase includes sending several packets that also incur the delay).
The delays, along with the hit ratio, affect transaction response time. The ideal mean response time for this test is impossible to calculate precisely because the model is too complex. We estimate the ideal mean response time at about 1.3 seconds. Mean response time in a no-proxy environment is about 2.8 seconds.
Absolute response time figures are important in understanding the benchmark environment, but are of little value when comparing the Cache-off results with a given real-world setup. Indeed, every particular cache deployment will have different hit ratios and server-side delays. Thus, while providing the mean response time measurements as a reference, we select the ``Response Time Improvement'' graph for the baseline presentation:

The above graph shows the relative reduction of mean response time achieved by the cache compared to a no-proxy (direct) test, or: ``How much faster will an average reply be if a cache is deployed?'' It shows the (direct - proxied)/direct ratio for mean response times. We hope that the ratios reported here will be close to the real-world performance of the tested products.
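As a minimal sketch of this ratio, the code below applies the (direct - proxied)/direct formula to the two reference figures quoted earlier: a no-proxy mean of about 2.8 seconds and an estimated ideal mean of about 1.3 seconds. The function name is ours. The result is an upper bound implied by those two estimates, not a measured value.

```python
def response_time_improvement(direct_sec: float, proxied_sec: float) -> float:
    """Relative reduction in mean response time compared to a no-proxy test."""
    if direct_sec <= 0:
        raise ValueError("direct response time must be positive")
    return (direct_sec - proxied_sec) / direct_sec


DIRECT_MEAN_SEC = 2.8          # no-proxy mean response time, from the report
IDEAL_PROXIED_MEAN_SEC = 1.3   # estimated ideal mean response time, from the report

ideal_improvement = response_time_improvement(DIRECT_MEAN_SEC, IDEAL_PROXIED_MEAN_SEC)
print(f"ideal response time improvement: {ideal_improvement:.1%}")
```

With these inputs the ideal improvement comes to roughly 54%. A proxy that is slower than a direct connection would produce a negative value.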
Hit ratios affect, but do not define, response times. In an ideal scenario, it takes a negligible amount of time to deliver a cache hit to the client. Fast cache hits decrease average response times. In the same unrealistic scenario, it takes only about 2.6 seconds to deliver a cache miss. In practice, both hits and misses may incur significant overheads.
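A simplified two-class model makes the point concrete: two caches with the same hit ratio can have different mean response times if their hit latencies differ. The model below is our own simplification. It treats every request as either a hit or a miss, so it is not expected to reproduce the 1.3 second estimate given earlier, which covers all request classes. The 0.5 second hit latency is hypothetical.

```python
def mean_response_time(hit_ratio: float, hit_time_sec: float, miss_time_sec: float) -> float:
    """Weighted mean of hit and miss response times (two-class simplification)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit ratio must be between 0 and 1")
    return hit_ratio * hit_time_sec + (1.0 - hit_ratio) * miss_time_sec


OFFERED_HIT_RATIO = 0.55   # offered hit ratio, from the report
IDEAL_MISS_SEC = 2.6       # ideal miss response time, from the report

# Same hit ratio, different hit latencies.
instant_hits = mean_response_time(OFFERED_HIT_RATIO, 0.0, IDEAL_MISS_SEC)
slow_hits = mean_response_time(OFFERED_HIT_RATIO, 0.5, IDEAL_MISS_SEC)  # hypothetical 0.5 sec hits

print(f"mean with instant hits: {instant_hits:.2f} sec")
print(f"mean with 0.5 sec hits: {slow_hits:.2f} sec")
```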
The hit and miss response time charts show that hits are primarily responsible for the differences in overall response time measurements.
Unfortunately, there is not much to say about the WebAxe-4 results since only two entries took the test. We won't insult you with bar charts showing just the two products. Please refer to the WebAxe-4 summary table in the previous section.
As you can see, both products have about the same price, but the Chamomile entry supports a significantly higher throughput than the Swell 1000.
The Swell 1000 product performs only slightly better than Chamomile in terms of response time (hits and misses), byte hit ratio, and bandwidth savings.
Although the Chamomile entry has six SCSI disks, they were not used for the WebAxe-4 test. Instead, the responses were cached only in memory. Of course, this is possible because the WebAxe-4 workload has a working set size of only 1GB.
The downtime test is designed to estimate the time it takes a product to recover from an unexpected condition such as power outage or software failure. Polygraph measures the time until the first miss, which approximates the minimum downtime of the cache.

Polygraph can also measure the time until the first hit. However, from a user's point of view, the time until the first miss is somewhat more important. As soon as the caching system is able to deliver misses, the user is able to access the Web again. Delivering hits is important to reduce outgoing bandwidth usage and from a quality-of-service point of view. All tested caches were able to deliver hits immediately or shortly after forwarding the first miss.
The Chamomile, Stratacache E-55, and iMimic entries were not able to complete their downtime tests. These products required manual intervention to reboot (i.e., pushing the power button) because their BIOSes do not allow for an automatic boot after the power has been turned back on. Such intervention is prohibited by the Cache-off rules.
The precision of this test is around five seconds.
Here are the configuration details for all tested products.
The ``Cache disks'' and ``Cache'' columns refer to configurations for the PolyMix-4 test. The two products that took the WebAxe-4 test were configured to use less disk space.
The iCache 2500 product was configured to use a GBit network card. All other products were using 100Mbit NICs.
Details about each product configuration, including networking gear specs and cache tuning parameters, are available on individual product pages, linked from row headings in the table above.
It is a Polyteam tradition to give cache-off participants a chance to comment on the results after they have seen the review draft. The comments below are verbatim vendor submissions. Polyteam has not verified any of the claims, promises, or speculations that these comments may contain.
ARA Networks participated in the 4th Cacheoff with JAGUAR2000, which was introduced last event. Since then, JAGUAR2000 has constantly developed for better performance and versatile functions. In this event, JAGUAR2000 showed the best peak throughput among 100 M peers. This proven performance is the result of ARA's constant endeavors, and thanks to it, JAGUAR2000 is highly regarded with its distinguished performance and functions meeting various needs of market.
ARA Networks tries to prove technical quantum leap with JAGUAR3000 cache engine, which will be tested out of race. JAGUAR3000, developed with MCT (Minimal Context-switching Thread, ARA's own architecture) that provides the optimized thread programming, is a high-performance cache engine. It utilizes the efficiency of SMP to the fullest extent. JAGUAR3000 also shows good scalability, and can be ported easily to various operating systems (Solaris, Linux, FreeBSD, Windows).
With the introduction of JAGUAR3000, JAGUAR2000 will be provided at much lowered price than estimated. ARA Networks will deliver the Cache-off entry performing 1500req/sec at around $9000. By interworking with already developed Streaming Media Cache or Dynamic Cache, it will be used for faster delivery of diverse contents. In addition, we will add further advanced features to JAGUAR2000 and develop derivatives equipped with JAGUAR2000 engine.
We'd more than appreciate your constant interest in ARA Networks, which advances every year through annual Cache-Off.
Broadband Gateway is a Japan-based CDN service provider. We are currently delivering CDN services to a wide range of customer in Japan and planning to market ITM solutions including BBGCache. BBGCache newly added to our product portfolio is a high-end Web caching solution for ISPs and enterprises. Designed on an optimal hardware architecture, BBGCache is about to hit the Japanese market with a variety of functions and enhanced scalability. In addition to the diversified caching functions basically provided, it has been already adopted in CDN services to provide special functions required in the CDN system.
BBGCache will be widely deployed as a differentiated solution to deliver value-added features such as virus scanning and contents filtering. Since it has employed Ploymix 4 and there was a miscommunication in class definition, its estimated price in the summary should be much lowered. BBGCache will be priced at around $3,000 when it is unveiled in Japan in January 2002. The solution will be applauded for its excellence in price, performance and functionality.
Chamomile is a WWW proxy system developed as a research product from scratch. It employs a multi-threaded architecture and highly optimized utilization of main memory to cache WWW objects. It works on UNIX variants, such as FreeBSD, Solaris, and Linux. Among them, we are developing it on Linux (kernel version 2.4) because of its low kernel overhead and efficient manipulation of threads. All the cache-off tests are done on Red Hat 7.1 with linux-2.4.13 kernel.
The original research goal of Chamomile was to achieve high performance especially as a reverse proxy and it was developed just for that purpose at the early stage. One of the key technologies there is its object replacement algorithm for memory cache. Chamomile achieved high hit ratio at the webaxe-4 test using just only main memory to cache objects. Of course, currently, Chamomile also works fine as a forward proxy. It leaves, however, much room for performance improvement, because it simply stores each web object into a single file on a file system. This weak point will be resolved by our future work.
As a comment of the cache-off test, we have to account for the host configuration at first. Because our server host did not reach Boulder in time, we could not try any tests with it. Fortunately, the polyteam let us use their server host (slightly old but very fine!!) and carry out polymix-4 and webaxe-4 tests for once respectively. In fact, our original host had two Athlon-MP 1.2GHz CPUs, 1GB of memory, and four 10krpm Ultra3-SCSI drives. The major difference between our original host configuration and that of the borrowed one is its CPU power.
Finally, we would like to greatly thank the polyteam for their kind efforts to complete our tests. Without their help, we could not publish any results. For additional information about chamomile, please contact Eiji Kawai.
Cintel Co., Ltd. is pleased to be able to demonstrate the excellence and performance of its iCache Web caching products at this important event.
Cintel iCache was among the top performer in its Peak Throughput, Hit Ratio, Minimum Downtime Improvement, Response Time Improvement and almost the other part. And, the competitive price for this caching server combined with outstanding overall performance metrics provide an excellent resource to a greater spectrum of individuals and businesses for optimal caching solution with an affordable blend of power and manageability. This new generation cache server with superior performance will be a standard for cache solutions targeted at small to medium size businesses, enterprise networks.
For more information on the entire line of Cintel Cache Appliance Solutions, visit http://www.cintel.co.kr. And if you have any question on our proud product iCache, please e-mail us.
CinTel thanks Polyteam for their dedicated efforts in developing the Polygraph benchmarking software and coordinating the Cacheoff event to a success.
We at iMimic are delighted by the metrics posted by our OEM partners:
- 1st Place (and 6 out of top 7): Highest Request Rate
- 1st Place (and 6 out of top 7): Highest Bandwidth
- 1st-6th Place: Highest Hit Throughput Per Price
- 1st-5th Place: Highest Bandwidth Per Price
- 1st-4th Place: Document Hit Ratio
- 1st-7th Place: Bandwidth Savings
- 1st-6th Place: Mean Response Time
- 1st-6th Place: Hit Response Time
- 1st-6th Place: Response Time Improvement
- 1st-4th Place: Throughput Per Rack Unit
- 1st-5th Place: Throughput Per Disk
Building on the success of past cache-offs, iMimic has again proven that our DataReactor Core software provides the best performance available at all points in the hardware spectrum. One OEM achieved 120 requests per second from an appliance little bigger than a cellphone; another captured the overall performance crown at 2700 requests per second, nearly twice the speed of the nearest competitor. In the bandwidth savings category, our OEMs won the top 7 places, and the top 6 in Hit Throughput per $1,000.
With our own entry, iMimic set a new record for performance on Linux, beating the previous record (also held by us) by 50% and increasing the gap between our systems and the closest Linux competitor to almost an order of magnitude difference in throughput. Our entry used a standard Red Hat Linux 7.2 installation and only one out of two processors available in the unit. This configuration is an ideal system for integrating value-added edge services, via the DataReactor platform or running on the operating system itself, such as SSL offloading, filtering, compression, or HTML/WML conversion. This single-box solution offers extremely high performance for proxy caching while still leaving resources available for other services.
iMimic would like to thank The Measurement Factory team for a very well-planned event.
KOTETU team of NAIST (Nara Institute of Science and Technology) would like to thank TMF for the opportunity to evaluate our system.
KOTETU is one of products from our study. Our goal is to solve problems in WWW caching and to discover secrets of caching systems through its implementation. KOTETU is an open source caching system for work group or department caching service. It is designed to run upon generic UNIX systems without special OS, filesystem and device. The Cache-Off is good place to evaluate the system.
By a trouble in transportation, our equipments arrival was delayed. In wednesday afternoon, we gave up to wait that because no enough time to test if they would arrive thursday or friday, and built an another entry machine using a TMF's memorial machine. At that time, our left time was short (2 days and several hours). We installed OS and KOTETU, and tune them before bench marking. We had not enough time to fine optimization for the machine. However, the results in this Cache-Off is better in that situations.
The machine is little older. You can find the machine in the report of 3rd Cache-Off. Since it is used in few years, its price in today market is not clear. TMF and we discuss the price of the machine and estimate as $2,000. The price of software is zero because KOTETU is an open software.
Money and rack space - are any two things more important to a cache buyer? With breakthrough price/performance and performance/rackspace metrics, the Pyramid iCache C - the only European cache to be tested - establishes itself as second to none in these areas. We were pleased to see that our iCache C achieved the highest throughput per $1,000, highest hit throughput per $1,000, and highest throughput per rack space unit.
A full rack of our iCache C product is capable of handling 53,592 PolyMix-4 requests per second, unachievable by any other participant. Such high performance demonstrates the iCache's readiness for even the most demanding network environments.
Like the rest of our iCache line, the iCache C is carefully engineered for excellent performance, high reliability, and a low total cost of ownership. To this end, the iCache C provided fast recovery from power interruptions, response time improvement within 3% of the overall winner, and a document hit ratio within 2% of the overall winner.
Pyramid has experience in a wide variety of cache deployments. Our one-disk model is well-suited for kiosks, stores, gas stations, and other CDN points of presence. For more information on the iCache C or any of our other models, please contact us or visit our website.
We'd like to thank The Measurement Factory for a well-organized and fair event.
Stratacache focuses on developing high performance caching, streaming media and content distribution appliance products for a broad range of industries. As you can see in the Cache-Off report, we provide the key technology components that can be used in a cost effective, distributed site caching or CDN infrastructure. Stratacache is also pioneering the use of Microcaching (the Stratacache Dart series) for small office or remote branch office sites (libraries, bank branches, kiosk clusters, Internet cafes, etc).
Unlike some of our competitors, our small enterprise products (the Stratacache Express and Stratacache Flyer units) are based on higher performance SCSI Ultra-3 disks. The use of SCSI does not necessarily show a performance difference in the Polygraph HTTP benchmark, but we have found that when serving streaming media content from Real Networks, Microsoft, Quicktime or MPEG 1, 2 or 4, that SCSI is of substantial benefit. Details on our large enterprise and carrier products, including the Stratacache Meteor, Metroliner and Superliner are available on our web site and additional Polygraph performance reports on these products are also available.
Stratacache would also like to take this opportunity to thank The Measurement Factory Team for their continued development of the Web Polygraph benchmark. The caching industry owes a debt of gratitude to this team for their work in building a benchmark that helps this industry continue to evolve.
We would also like to point out that if you are considering purchasing caching appliance products from companies such as Cisco, Network Appliance, Infolibria, Cacheflow, Inktomi, Dell, IBM, Compaq, 3COM, F5, or HP, NONE of these companies wanted you to be able to judge the performance and capabilities of their products in this open testing environment. Please understand that the Polygraph benchmark is an open testing platform that allows you as a customer to test a caching product yourself and make sure that any vendor's marketing claims meet the real performance capabilities you desire. If you want to know how product A compares with product B but don't have the time to work with the Polygraph benchmark, The Measurement Factory also holds private tests for customers where you can choose the products that you wish to compare and get a non-biased report from an independent third party.
We would first like to thank the Measurement Factory for their hard work and dedication providing an unbiased benchmarking event for web caching vendors. This year's event was the smoothest run yet, with few controversies and even fewer Polygraph bugs to be worked out at the last minute.
This year, we decided to continue our tradition of testing the latest Squid version available. So our test software was a daily snapshot of Squid from a few days before the event, a version destined to be version 2.5. Due to a few issues with the load-shedding features of this Squid version, our server unfortunately exhibited a rather low hit ratio. Older variants of Squid do not exhibit this problem, as shown by previous Swell cache-off entries. We will, of course, fix this problem before shipping systems based on this new Squid version.
Otherwise, we are pleased with the results. Squid continues to improve in performance, while adding new features and greater stability under extreme conditions. The proven compatibility and large feature set of Linux and Squid, combined with our easy to use web based management tools and great support, make the Tsunami server line an excellent caching value. Additional new features in the Tsunami system, including NNTP news caching, transparent FTP caching, our unique transparent bridging features, and cluster-capable management tools, make it a featureful and cost effective alternative to expensive proprietary solutions.
Finally, we would like to extend our gratitude to the Squid development team who make our presence at these events possible by building the software around which we have based our business.
TNC Labs prides itself on providing excellent performance at good values across all of our product lines. Our cache-off entry, the CE-1200, is no different: blistering performance (1st place out of all entrants in hit response time), for a reasonable price (3rd place in price/performance, out of all 12 entries). We have introduced a range of caches appropriate to needs of all sizes, from the 1-disk CE500 to the 10-disk CE5500.
Established in 1993, TNC Labs is a leading provider of a wide range of Internet access and data communication products and solutions specifically designed for remote offices, mobile professionals, small and medium enterprises, and multi-national companies. Through its team of experienced research and development engineers and software specialists, TNC Labs develops and manufactures a complete range of LAN/WAN products including the integrated Internet server, PCMCIA cards, LAN cards, modems and hubs. The newly added caching technology achieves the best combination of price-performance, response time and hit ratios.
"Our award-winning caches allow organizations to optimize their bandwidth usage and solve network bottlenecks and inefficiencies so that their existing infrastructure can be used to generate greater revenue" said See Eng Huat, Managing Director of TNC Labs. "Thanks to the efficiency of iMimic's DataReactor software, we can offer excellent performance at a fraction of the prices charged by some competitors. Our American headquarters location keeps us in touch with the cutting edge of the global caching marketplace."
For more information, please contact us or visit our website.
TNC Labs extends its thanks to The Measurement Factory team for a fair and well-run event. Only vendor-independent benchmarks enable customers to make the best decisions.
The majority of the rules were defined and agreed upon in conjunction with participants during the May 2001 organizational meeting in Denver, Colorado. Most of the rules are the same as in previous testing events. The core set of rules is available in the documentation of the fourth Cache-off. Here, we will highlight some of the more interesting and controversial provisions.
Tested products must be available for sale to the public at the prices given in this report within two months after the start of testing. This rule is difficult, if not impossible, to enforce. Nonetheless, if you discover that one of the products described here is not being offered, please let us know.
Tested products must use a TCP MSL value of at least 30 seconds. A lower setting can improve performance by recycling recently-used TCP ports at a higher rate. We verify MSL settings before running performance tests.
At least 28% of responses must be cache hits during the top2 phase of PolyMix-4. Some products may be able to trade lower hit ratio for higher throughput. PolyMix-4 and WebAxe-4 results with document hit ratio less than half of the ideal ratio are disqualified. Similarly, measured mean response time must not exceed no-proxy response time (about 2.8 seconds) by more than 100%.
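The two numeric limits in this rule can be written down as a simple check. The sketch below is only an illustration: the function name and constants are hypothetical, and it assumes that the 28% floor applies to the measured hit ratio in top2 and that the no-proxy response time is exactly 2.8 seconds.

```python
# Hypothetical constants taken from the rule quoted above.
MIN_HIT_RATIO = 0.28          # at least 28% of responses must be hits in top2
NO_PROXY_RESPONSE_SEC = 2.8   # approximate no-proxy mean response time
MAX_OVERHEAD = 1.00           # may exceed the no-proxy time by at most 100%


def qualifies(hit_ratio: float, mean_response_sec: float) -> bool:
    """Return True if a top2 measurement satisfies both limits."""
    max_response_sec = NO_PROXY_RESPONSE_SEC * (1.0 + MAX_OVERHEAD)
    return hit_ratio >= MIN_HIT_RATIO and mean_response_sec <= max_response_sec


print(qualifies(0.55, 1.4))   # within both limits
print(qualifies(0.20, 1.4))   # hit ratio too low
print(qualifies(0.55, 6.0))   # response time more than double the no-proxy time
```

Under these assumptions, the response time ceiling works out to about 5.6 seconds.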
There is no limit on the number of products that a single vendor can bring to be tested.
Companies that participate in the cache-off can publish new results two months after this report is released. This rule is in place to prevent a company from testing a ``token'' product at the cache-off and then publishing results for different products right after finding out how their competition performs.
Non-participating companies must wait three months. The rationale for this rule is similar. Some people feel that a company has an advantage if they can skip the cache-off, learn about their competition, and then publish a result from the same test shortly after.
We are in the process of re-evaluating these rules in order to get more products tested and more results published.
Any work that is derived from, or uses any of the cache-off results, or this report, must include the following reference to our official site:
A. Rousskov, M. Weaver, and D. Wessels, The Fourth Cache-off. Raw data and independent analysis at <http://www.measurement-factory.com/results/>.
Web Polygraph is a high-performance proxy benchmark. Polygraph is capable of generating a whole spectrum of Web proxy workloads that either approximate real-world traffic patterns, or are designed to stress a particular proxy component. Developed with the Cache-off needs in mind, Polygraph is able to generate complex, high request rate workloads with negligible overhead. Web Polygraph has been successfully used to debug, tune, and benchmark most caching products.
The Polygraph distribution includes two programs: polyclt and polysrv. Poly-client (-server) emits a stream of HTTP requests (responses) with given properties. The requested resources are called objects. URLs generated by Poly-client are built around object identifiers or oids. In short, oids determine many properties of the corresponding response, including response content length and cachability. These properties are usually preserved for a given object. For example, the response for an object with a given oid will have the same content length and cachability status regardless of the number of earlier requests for that object.
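To illustrate the idea that an oid alone determines the response properties, here is a minimal sketch. It is not Polygraph's actual algorithm: the hashing scheme, the function name, and the distribution parameters are all assumptions chosen for the example.

```python
import hashlib
import random


def object_properties(oid: int) -> dict:
    """Derive reply properties from the oid alone (hypothetical scheme)."""
    # Seed a private generator with a stable hash of the oid so that the
    # same oid always yields the same properties, in any process.
    digest = hashlib.sha256(str(oid).encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return {
        # Illustrative exponential size with a 300-byte floor.
        "content_length": max(300, int(rng.expovariate(1 / 4500.0))),
        # Illustrative 80% chance of being cachable.
        "cachable": rng.random() < 0.80,
    }


# Repeated requests for the same oid see identical properties.
print(object_properties(42) == object_properties(42))
```

Because nothing but the oid feeds the generator, no per-object state has to be stored to keep responses consistent across requests.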
While it runs, Polygraph collects and stores many statistics, including response rate, response time and size histograms, achieved hit ratio, and number of transaction errors. Some measurements are aggregated at five-second intervals, while others are aggregated over the duration of the whole phase.
For the Cache-off tests, we used version 2.7.4 of Web Polygraph. Web Polygraph is available to anyone at no charge in source code format.
The PolyMix environment has been modeling the following Web traffic characteristics since PolyMix-3:
These features were added for PolyMix-4:
With PolyMix-4, the clients emit requests with hostnames in the URLs. Previously, they always used IP addresses. This change is important because it makes the workload more realistic. We don't expect, however, the use of DNS names to have a significant effect on performance because DNS responses are cachable. If nothing else, it proves that the tested product properly supports DNS lookups.
We also added support for HEAD and POST requests to PolyMix-4. Again, this is primarily to ensure that the tested product supports these request methods. A product that does not support POST, for example, could easily pass the PolyMix-3 test, but would be worthless to most anyone trying to deploy that product in a production environment.
Another problem with PolyMix-3 is that the offered byte hit ratio is always larger than the offered document hit ratio. This is just the opposite of what we observe in real proxy traces. Production caches typically report byte hit ratio numbers that are at least 10% lower than document hit ratio numbers. For PolyMix-4, we've added a discriminator that causes the byte hit ratio (BHR) to be less than the document hit ratio (DHR). This change may have important performance implications for the workload and caches because it affects the size distribution of hits.
Finally, we've added aborted transactions to further increase the realism of PolyMix. Tested products must be able to deal with unexpected termination of both requests and responses. These aborted transactions do not contribute to the error counts.
The following figure shows the various phases and offered load levels for a PolyMix-4 test. Not counting the fill phase, the test takes about 12 hours. Filling the cache usually takes an additional 3-12 hours, depending on the product.
One of the rules of PolyMix-4 is that the request rate during the fill phase must not be greater than the peak rate (as used in top1 and top2). Otherwise, participants are free to choose virtually any fill rate they like. Usually, the selected fill rate is at least 50% of the peak request rate. We do not present the fill rate parameters in this report, but they can be derived from the logs.
Most measurements discussed in this report are taken from the top2 phase when the proxy is more likely to be in a steady state.
Object reply size distributions are different for different content types (see the table below). Reply sizes, as observed from the client side, range from 300 bytes to 5 MB with an overall mean of about 6.8 KB and a median of 4 KB. The server-side size distribution has a mean of 9 KB and a median of about 5 KB. The client- versus server-side difference is due to the fact that smaller objects are more popular. The reply size depends only on the oid. Thus, the same object always has the same reply size, regardless of the number of requests for that object.
Consult individual product reports for the actual size distributions measured at the cache-off. Also, prior to the Cache-off we ran some tests and logged the individual reply sizes. You can see the corresponding histogram.
Polygraph servers mark some of their responses as uncachable. The particular probability varies with content types (see the table below). Overall, the workload results in about 80% of all responses being cachable. The real world cachability varies from location to location. We have chosen 80% as a typical value that is close to many common environments.
A cachable response includes the following HTTP header field:
Cache-Control: public
An uncachable response includes the following HTTP header fields:
Cache-Control: private,no-cache
Pragma: no-cache
Object cachability depends only on the oid. The same oid is always cachable, or always uncachable.
Web Polygraph is capable of simulating realistic (complex) object expiration and modification conditions using Expires: and Last-Modified: HTTP headers. Each object is assigned a ``birthday'' time. An object goes through modification cycles of a given length. Modification and expiration times are randomly selected within each cycle. The corresponding parameters for the model are drawn from the user-specified distributions.
The Life-cycle model configuration in PolyMix-4 does not utilize all the available features. We restrict the settings to reduce the possibility that a cache serves a stale response. While stale objects are common in real traffic, caching vendors strongly believe that allowing them into the benchmark sends the wrong message to buyers.
Consequently, all Polygraph responses in PolyMix-4 carry modification and expiration information, and that information is correct. The real-world settings would be significantly different, but it is difficult to accurately estimate the influence of these settings on cache performance.
PolyMix-4 defines a mixture of content types. Each content type has the following properties:
| Type | Percentage | Reply Size | Cachability | Expiration |
|---|---|---|---|---|
| Image | 65.0% | exp(4.5KB) | 80% | logn(30day, 7day) |
| HTML | 15.0% | exp(8.5KB) | 90% | logn(7day, 1day) |
| Download | 0.5% | logn(300KB,300KB) | 95% | logn(0.5year, 30day) |
| Other | 19.5% | logn(25KB,10KB) | 72% | unif(1day, 1year) |
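As a quick consistency check on the table above, the overall cachability can be computed as a weighted average of the per-type values. The sketch assumes that the Percentage column gives each type's share of responses; the variable names are ours.

```python
# (share of responses, cachable fraction) per content type, from the table.
content_types = {
    "Image":    (0.650, 0.80),
    "HTML":     (0.150, 0.90),
    "Download": (0.005, 0.95),
    "Other":    (0.195, 0.72),
}

total_share = sum(share for share, _ in content_types.values())
overall_cachability = sum(
    share * cachable for share, cachable in content_types.values()
)

print(f"shares sum to {total_share:.3f}")
print(f"overall cachability: {overall_cachability:.1%}")
```

Under that assumption the weighted average comes to roughly 80%, in line with the overall cachability quoted for the workload.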
PolyMix-4 uses the same latency and packet loss parameters that we used for PolyMix-3. The Polygraph client and server machines are configured to use FreeBSD's DummyNet feature.
We configure Polygraph servers with 40 millisecond delays (per packet, incoming and outgoing), and with a 0.05% probability of dropping a packet. Server think times are normally distributed with a 2.5 second mean and a 1 second standard deviation. Note that the server think time does not depend on the oid. Instead, it is randomly chosen for every request.
We do not use packet delays or packet loss on Polygraph clients.
Conditional HTTP requests with an If-Modified-Since (IMS) header represent a significant portion of Web traffic (10%-20%). Generation of If-Modified-Since requests was significantly improved compared to earlier workloads. PolyMix-4 robots no longer use fake dates for the IMS header. Instead, the current last modification time (LMT) is used if a short ``304 Not Modified'' response is sought, and the previous LMT is used if a complete ``200 OK'' response is required. The portion of ``200 OK'' responses to IMS requests is controlled by a workload parameter and is set to 66%. Overall, 15% of PolyMix-4 robot requests are IMS requests.
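The date selection and the resulting traffic split can be sketched as follows. The function and variable names are hypothetical and timestamps are plain numbers for simplicity; this illustrates the rule described above and is not Polygraph code.

```python
IMS_SHARE = 0.15       # fraction of robot requests carrying If-Modified-Since
IMS_200_SHARE = 0.66   # fraction of IMS requests meant to get a full 200 OK


def ims_date(current_lmt: float, previous_lmt: float, want_200: bool) -> float:
    """Pick the If-Modified-Since timestamp for a conditional request."""
    # An older date means the object has changed since then, so the server
    # must send the full body; the current LMT allows a short 304 reply.
    return previous_lmt if want_200 else current_lmt


# Share of all robot requests falling into each IMS category.
ims_200_of_all = IMS_SHARE * IMS_200_SHARE
ims_304_of_all = IMS_SHARE * (1 - IMS_200_SHARE)

print(f"IMS requests answered with 200 OK: {ims_200_of_all:.1%} of all requests")
print(f"IMS requests answered with 304:    {ims_304_of_all:.1%} of all requests")
```

With the stated parameters, roughly 10% of all robot requests are IMS requests expecting a full response and roughly 5% expect a ``304 Not Modified'' reply.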
A cache may receive If-Modified-Since requests for the objects that are not in the cache and even for the objects that have never been seen by a cache. Caches should be able to deal with this perhaps somewhat unusual, but nevertheless real situation.
PolyMix-4 workload has a 58% offered hit ratio. In the workload definition, this is actually specified through the recurrence ratio (i.e., the probability of revisiting a Web object). The recurrence ratio must account for uncachable responses and special requests. In PolyMix-4, a recurrence ratio of 72% yields an offered hit ratio of 58%. Note that to simplify analysis, only ``basic'' requests are counted when hit ratio is computed; special requests (If-Modified-Since and Reload) are ignored because in many cases there is no reliable way to detect whether the response was served as a cache hit.
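A rough back-of-the-envelope check of these numbers, assuming that the offered hit ratio is approximately the recurrence ratio multiplied by the cachable fraction of responses (about 80% in this workload). This simplification ignores the special requests mentioned above.

```python
RECURRENCE_RATIO = 0.72    # probability of revisiting an object
CACHABLE_FRACTION = 0.80   # approximate share of cachable responses

# A revisit can only be a hit if the object was cachable in the first place.
offered_hit_ratio = RECURRENCE_RATIO * CACHABLE_FRACTION

print(f"approximate offered hit ratio: {offered_hit_ratio:.1%}")
```

The product is about 57.6%, which is consistent with the 58% offered hit ratio stated for the workload.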
Polygraph enforces the desired hit ratio by requesting objects that have been requested before, and should have been cached. There is no guarantee, however, that these objects are in the cache. Thus, our parameter (58%) is an upper limit. The hit ratio achieved by a proxy may be lower if it does not store some cachable objects, or purges previously cached objects before the latter are revisited. Various HTTP race conditions also make it difficult, if not impractical, to achieve ideal hit ratios.
PolyMix-4 introduces a ``hot subset'' simulation into the popularity model. At any given time, a 1% subset of the URL working set is dedicated to receive 10% of all requests. As the working set slides with time, the hot subset may jump to a new location so that all hot objects stay within the working set. This model is designed to simulate realistic Internet conditions, including ``flash crowds.'' We have not yet fully analyzed the effect of this hot subset model.
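The hot subset idea can be illustrated with a small sampler. This is a sketch under our own assumptions, not the Polygraph implementation: objects are numbered 0 to N-1, the hot subset is a contiguous range, and requests outside the dedicated 10% are drawn uniformly from the whole working set (so a few of them also land on hot objects).

```python
import random

HOT_SET_FRACTION = 0.01      # 1% of the working set is "hot"
HOT_REQUEST_FRACTION = 0.10  # 10% of requests are dedicated to the hot subset


def pick_object(working_set_size: int, hot_start: int, rng: random.Random) -> int:
    """Return the index of the next requested object."""
    hot_size = max(1, int(working_set_size * HOT_SET_FRACTION))
    if rng.random() < HOT_REQUEST_FRACTION:
        # Dedicated hot request: choose within the hot range only.
        return hot_start + rng.randrange(hot_size)
    # Ordinary request: choose anywhere in the working set.
    return rng.randrange(working_set_size)


rng = random.Random(1)
picks = [pick_object(100_000, 5_000, rng) for _ in range(100_000)]
hot_hits = sum(5_000 <= p < 6_000 for p in picks)
print(f"share of requests hitting the hot subset: {hot_hits / len(picks):.1%}")
```

Moving `hot_start` over time would model the hot subset jumping to a new location as the working set slides.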
A single Polygraph client machine supports many simulated robots. A robot can emulate various types of Web clients, from a human surfer to a busy peer cache. All robots in PolyMix-4 are configured identically, except that each has its own IP address. We limit the number of robots (and hence IP aliases) to 1000 per client machine.
A PolyMix-4 robot requests objects using a Poisson-like stream, except for embedded objects (images on HTML pages) that are requested simulating cache-less browser behavior. A limit on the number of simultaneously open connections is also supported, and may affect the request stream.
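A Poisson request stream is one in which the gaps between consecutive requests are independent and exponentially distributed. The sketch below generates such a stream for a single hypothetical robot. It ignores embedded objects and connection limits, both of which make the real PolyMix-4 stream only Poisson-like; the function name and rate are assumptions.

```python
import random


def request_times(rate_per_sec: float, duration_sec: float,
                  rng: random.Random) -> list:
    """Return request start times (seconds) for a Poisson stream."""
    times = []
    now = 0.0
    while True:
        # Exponential inter-arrival gap with mean 1 / rate.
        now += rng.expovariate(rate_per_sec)
        if now >= duration_sec:
            return times
        times.append(now)


times = request_times(rate_per_sec=50.0, duration_sec=100.0, rng=random.Random(1))
print(f"{len(times)} requests, about {len(times) / 100.0:.1f} per second")
```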
PolyMix-4 servers are configured identically, except that each has its own IP address.
Polygraph supports persistent connections on both client and server sides. PolyMix-4 robots close an ``active'' persistent connection right after receiving the N-th reply, where N is drawn from a Zipf(64) distribution. The robots will close an ``idle'' persistent connection if the per-robot connection limit has been reached and connections to other servers must be opened. The latter mimics browser behavior.
PolyMix-4 servers use a Zipf(16) distribution to close active connections. The servers also time out idle persistent connections after 15 seconds of inactivity, just like many real servers would do.
A detailed treatment of many PolyMix-4 features is available on the Polygraph Web site, along with the copies of workload configuration files.
We rented a former Pearl Street storefront in Boulder, Colorado for the Cache-off event. The building was quite cozy and the upper floor became quite warm with all the machines running.
Testing took place from Monday, November 12 through Friday, November 16, with some tests finishing on Saturday. As described in the rules, participants are guaranteed at least 55 hours of testing time. Vendors had access to the cache-off facility from 9 AM until 8 PM each day. Tests were often scheduled to run overnight.
For this Cache-off, we used our new PolyI tool to manage the tests. PolyI has a Web interface that allows us to easily create, start, monitor, and stop tests and to generate reports. The tool also handles distribution of FreeBSD kernels to clients and servers, as well as time synchronization and results archival tasks. While PolyI reduces test management work to a few mouse clicks, it is not required to run Polygraph tests or to reproduce the Cache-off results. We describe the internals of our benchmarking environment below.
We rented 100 PCs for use as Polygraph clients and servers. These machines are a variety of HP Vectras and no-name clones. Each had at least a 650 MHz Pentium III CPU, 256 MB of RAM, and an Intel EtherExpress PRO/100+ Ethernet card.
We use FreeBSD-4.3 as the base operating system for the Polygraph clients and servers. We make a number of changes to kernel parameters in order to support PolyMix-4. We provide participants with a custom-built FreeBSD distribution to simplify the installation process for them and reduce the chance of configuration mistakes. This software is available to the public from our web page.
The number of Polygraph machines varies with the product under test because peak request rates vary a lot among caching products. Each participant therefore informed us how many Polygraph client-server pairs they needed to drive their cache at its maximum capacity.
During the cache-off, we never use more than 500 requests per second per machine for official tests.
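The per-machine limit gives a simple way to estimate how many client-server pairs a given peak rate requires. The helper below is a hypothetical sketch; it assumes that the 500 requests per second limit applies to each client machine and that load is spread evenly across pairs.

```python
import math

MAX_RATE_PER_MACHINE = 500  # requests per second, per the limit stated above


def pairs_needed(peak_rate: float) -> int:
    """Smallest number of client-server pairs that keeps each under the limit."""
    return max(1, math.ceil(peak_rate / MAX_RATE_PER_MACHINE))


for rate in (130, 500, 1300):
    print(f"{rate} req/sec needs {pairs_needed(rate)} pair(s)")
```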
Each bench has a monitoring PC connected to the harness network. This PC is used to start Polygraph runs, display run-time statistics, collect logs after the completion of a run, and generate Polygraph reports.
We run the ntpd time server daemon on all Polygraph machines and the monitoring PCs. The monitoring PCs are synchronized periodically with a designated reference clock. We run ntpd on all machines rather than just synchronizing clients and servers before each test, as was done during the first cache-off. While running ntpd could introduce a small CPU overhead, we are concerned that without periodic synchronization, local clocks may drift apart during these long (15+ hours) tests.
Each test bench consists of Polygraph machines, the monitoring PC, the participant's proxy cache(s), and a network to tie them together. The networking equipment falls under the participant's domain. That is, each participant is responsible for providing the networking equipment needed to connect the Polygraph machines to the caches.
A new feature for PolyMix-4 is that the Polygraph agent IP addresses are bound to the system's loopback interfaces. This nifty trick avoids the problems with huge ARP tables that we had for PolyMix-3 tests. It also requires a number of static routes on clients, servers, and proxies so that each knows how to reach the agent addresses.
The following figure shows a typical bench configuration, with a flat network:
Each Polygraph machine requires a fast Ethernet port, so the participant must have enough ports to connect all of the Polygraph machines within the participant's cluster. The monitoring PC must have IP connectivity to all clients and servers at all times.
We run bidirectional netperf tests between each client-server pair to measure the raw TCP throughput. We also selectively execute Polygraph ``no-proxy'' runs to ensure that clusters can generate enough throughput to sufficiently drive the cache under test.
Before running any PolyMix-4 tests, we always test network throughput with netperf. All netperf tests showed satisfactory levels of raw network performance.
All ``no-proxy'' tests were successful, delivering desired throughput and negligible response time overheads.
| Item | Count |
|---|---|
| PCs rented | 100 + 3% spares |
| Vendors | 10 |
| Humans | 13 |
| Products tested | 14 |
| Floor space | 5000 ft² |
| 15-amp power circuits | 18 |
| 6-outlet power strips | 50 |
| Ethernet cables | approx. 200 |
| Games of ``Bust-A-Move'' played | at least 100 |
This section describes the official testing sequence. The complete sequence was executed at least once against all cache-off entries.
PolyMix-4 is the main performance test which generates the vast majority of the reported numbers. This test is discussed in the ``Cache-off Workload'' Section.
Note that the cache is filled as a part of the PolyMix-4 workload. Depending on the product, this can take anywhere from four to 20 hours.
WebAxe-4 is another performance test that we added for this Cache-off. It is designed to simulate traffic for a caching proxy in a surrogate role (i.e., as an HTTP server accelerator). In general, WebAxe-4 is an easier test because the working set size is much smaller (only 1GB) and the recurrence ratio is very high (90%).
The ``Downtime Test'' is performed only after a successful PolyMix-4 run. We use one client-server pair. During the first 10 minutes of the test, Polygraph creates a 3-10 req/sec load through the proxy. The power to all participant devices, including networking equipment, is then manually turned off. After about 5 seconds of ``downtime,'' the power is turned back on, and the measurement phase begins. We measure the times until the first miss and the first hit. Polygraph continues to emit the same light load during the entire test. The precision of this test is around 5 seconds.
It is important to note that the cache(s) and networking gear are plugged into power strips. We turn off the power strips and not the equipment boxes to simulate realistic conditions of an unexpected software, hardware, or power failure. Vendors are not allowed to assist the reboot process. UPS devices of any kind are not allowed during this test.
We realize that the downtime test and execution rules are simple, if not primitive. However, even this test provides very useful data to cache administrators. Depending on the installation environment and reported cache performance, one can decide whether to invest in UPS systems and/or redundant configurations. We will work on improving the workload for this test.
As we described earlier, all entries must have a Maximum Segment Lifetime (MSL) of at least 30 seconds, producing a TIME-WAIT state of at least 60 seconds (twice the MSL). Any product that fails this test is disqualified.
To determine the MSL on each product, we probe its TCP stack and monitor connection requests. If the system accepts a new connection with the same sequence number in under 60 seconds, it fails the test. The msl_test program is included in the Polygraph source distribution.
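The pass/fail decision follows from the fact that a TCP connection stays in TIME-WAIT for twice the MSL. The sketch below only illustrates that arithmetic. It is not the msl_test program, and the measurement tolerance parameter is our own assumption.

```python
REQUIRED_MSL_SEC = 30
REQUIRED_TIME_WAIT_SEC = 2 * REQUIRED_MSL_SEC  # TIME-WAIT lasts 2 * MSL


def passes_msl_test(observed_time_wait_sec: float,
                    tolerance_sec: float = 1.0) -> bool:
    """True if the observed TIME-WAIT duration meets the required minimum.

    tolerance_sec is a hypothetical allowance for measurement precision.
    """
    return observed_time_wait_sec >= REQUIRED_TIME_WAIT_SEC - tolerance_sec


print(passes_msl_test(60))  # meets the requirement
print(passes_msl_test(30))  # MSL of only 15 seconds: fails
```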
All cache-off entries reported TIME-WAIT state of 59-60 seconds.
Controversies are a regular feature of cache-off benchmarking, and this one is no exception. While we always do our best to learn from the past, inevitably new and unexpected situations arise. This time, there are several problems that are worth mentioning here.
The equipment belonging to two of the vendors did not arrive at the Cache-off location in time for testing. Apparently, their shipments suffered significant delays with U.S. customs. On Wednesday, JST and NAIST gave up on receiving their hardware and borrowed two old PC systems belonging to The Measurement Factory.
The pricing for the affected entries was done by the Measurement Factory using on-line resources for used and new computer parts. We tried to come up with a reasonable hardware price, but our estimation could not be precise. Please keep that in mind when looking at price/performance analysis for JST and NAIST entries.
The price of the BBGCache entry in the first draft of this report was based on the estimate made at the time of the cache-off. That draft was released exclusively to cache-off participants, except for the ARATech team that represented BBGCache. We waited for the final (and overdue) pricing confirmation from ARATech before providing their team with the draft. That was done to avoid an unfair situation where one participant could make price adjustments knowing the competition's prices. We hoped that since ARATech was not given access to the draft, other participants would not complain if the ARATech team needed to change the price.
Our hopes did not materialize. ARATech did adjust the price for the BBGCache entry, and that adjustment changed the order of products on the price/performance graphs. Those changes in order caused complaints from other vendors. The complaints were based primarily on the fact that the Measurement Factory could not guarantee that the draft was not leaked to the ARATech team by a friendly cache-off participant.
After a series of long and heated discussions, affected participants agreed to accept a change in price, but only to the level where the change does not alter the original order of price/performance graphs.
In retrospect, we should not have released the draft until all participants had confirmed their prices, even though waiting for the last confirmations would have delayed the work on the report.
ARATech brought a third entry to the Cache-off, but did not start any official tests on it. ARATech intends to bring the product back to our labs January 7, 2002 for a retest. Major configuration parameters and price for the product will not be changed till then. The results of the retest will be posted on our Web site.
For this Cache-off, we improved Polygraph's report generation tool. One improvement was to measure hit ratios by comparing the number of transactions and bytes on the client and server sides. Such a comparison yields more accurate ratios because it does not require client-side guesses about whether a transaction resulted in a hit. Such guesses are imprecise and can be made only for basic HTTP transactions. The old report generator (ReportGen) uses this imprecise technique to report hit ratios. The new tool reports hit ratios measured using both methods.
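The side-by-side comparison can be sketched as follows. The exact formulas used by the report tool are not given here, so this is an assumption: it treats every transaction (or byte) that the clients received but the servers did not send as served from the cache.

```python
def side_comparison_hit_ratios(client_xacts: int, server_xacts: int,
                               client_bytes: int, server_bytes: int):
    """Estimate document and byte hit ratios from client/server totals."""
    # Whatever the servers did not have to deliver is attributed to the cache.
    document_hit_ratio = 1.0 - server_xacts / client_xacts
    byte_hit_ratio = 1.0 - server_bytes / client_bytes
    return document_hit_ratio, byte_hit_ratio


# Made-up totals, for illustration only.
dhr, bhr = side_comparison_hit_ratios(1000, 450, 10_000_000, 6_000_000)
print(f"DHR = {dhr:.1%}, BHR = {bhr:.1%}")
```

Note that this approach needs no per-transaction hit detection on the client side, which is why it also covers transactions for which such detection is unreliable.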
We have chosen the new, more precise, measurement algorithm for the baseline presentation in this report. This choice confused some of the participants since they became accustomed to the numbers reported by the old tool. We felt that our choice was justified because the new measurements are more precise and because the old measurements are still available in new reports.
During the Cache-off we observed and fixed a bug relating to phase synchronization. The phase synchronization algorithm could malfunction if polysrv received requests in the opposite order from the one in which polyclt transmitted them. When this happened, the Polygraph agents could become confused and never synchronize their phases. To the best of our knowledge, the bug affected one test on one entry. The test was repeated.