9. Web Traffic Workloads and Characterization

9.1 What's the average size of a document?

A few different answers to this question...

Polygraph's mean reply size

Current Polygraph workloads have a mean reply size of about 10.7KB. The estimated mean is shown at the beginning of polyclt output:

000.01| Content distribution on server PolyMix-3-srv:
        content        planned%         likely%          error%
          image           65.00           65.46            0.70
           HTML           15.00           14.80           -1.31
       download            0.50            0.49           -1.31
          other           19.50           19.25           -1.31
expected average cachability: 80.01%
expected average object size: 10771.11Bytes

Log file trends

For a data point on long-term trends, you can look at

Note that medians (4.5KB) are usually much lower than means (10.5KB) in our environment. Recent standard PolyMix workloads reflect that fact, but you can configure Polygraph to use different size distributions, of course.
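To see why a median can sit so far below the mean, here is a minimal Python sketch. It assumes, purely for illustration, that object sizes follow a log-normal distribution; that is not a statement about the model any particular PolyMix workload uses. The two targets are the 4.5KB median and 10.5KB mean quoted above, taking 1KB as 1000 bytes.

```python
import math
import random
import statistics

# Figures quoted above, in bytes (assuming 1KB = 1000 bytes for simplicity).
MEDIAN = 4500.0
MEAN = 10500.0

# Assumption: sizes are log-normal. For that distribution,
#   median = exp(mu)  and  mean = exp(mu + sigma**2 / 2),
# so both parameters follow directly from the two quoted figures.
mu = math.log(MEDIAN)
sigma = math.sqrt(2.0 * math.log(MEAN / MEDIAN))

rng = random.Random(1)  # fixed seed so runs are repeatable
sizes = [rng.lognormvariate(mu, sigma) for _ in range(200_000)]

sample_median = statistics.median(sizes)
sample_mean = statistics.fmean(sizes)

print(f"mu={mu:.3f} sigma={sigma:.3f}")
print(f"sample median: {sample_median:.0f} bytes")
print(f"sample mean:   {sample_mean:.0f} bytes")
```

A small number of very large objects pulls the mean up while leaving the median almost untouched, which is why the two figures answer the "average size" question so differently.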

John's Study...

by John Judge

I did a survey some years ago as part of a paper:

J. Judge, H. W. P. Beadle, J. Chicharo, "Sampling HTTP Response Packets for Prediction of Web Traffic Volume Statistics," in Proc. Globecom'98, Nov. 1998 (postscript copy available).

I found that the mean size of an HTTP response varied considerably between the different proxy servers and networks sampled, from a high of 13824 bytes (sampled from one of the NLANR caches) to a low of 6491 bytes (from the Berkeley "Home IP" trace). The difference was not date related but did correlate slightly with the percentage of type 304 responses in the sample. In two of the traces the mean packet size varied noticeably over the day, with a tendency towards a larger mean packet size in the late evening and early morning.

I think browsing behaviour varies enough between user populations that statistics such as mean response packet size vary between different proxy servers. There even appears to be a dependence on the time of day.
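As a rough illustration of how such figures could be derived from a log, the Python sketch below computes the mean response size overall, with 304 responses excluded, and per hour of day. The records are made up for the example and do not come from any of the traces mentioned above; the 250-byte size for a 304 reply is likewise an assumption standing in for a headers-only response. Since a 304 (Not Modified) reply carries no body, a larger share of 304s is one plausible way for the mean to drop.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical records: (hour of day, HTTP status, response size in bytes).
# These values are invented for illustration; they are not from any trace.
records = [
    (9, 200, 12000), (9, 304, 250), (9, 200, 3000), (9, 304, 250),
    (14, 200, 8000), (14, 304, 250), (14, 200, 5000),
    (23, 200, 40000), (23, 200, 9000), (23, 304, 250),
]

def mean_size(rows):
    """Mean response size in bytes over (hour, status, size) rows."""
    return fmean(size for _, _, size in rows)

def share_304(rows):
    """Fraction of rows whose status is 304 (Not Modified)."""
    return sum(1 for _, status, _ in rows if status == 304) / len(rows)

overall = mean_size(records)
without_304 = mean_size([r for r in records if r[1] != 304])

# Group rows by hour to look for a time-of-day effect.
by_hour = defaultdict(list)
for row in records:
    by_hour[row[0]].append(row)

hourly = {
    hour: (mean_size(rows), share_304(rows))
    for hour, rows in sorted(by_hour.items())
}

print(f"mean, all responses:  {overall:.0f} bytes")
print(f"mean, excluding 304s: {without_304:.0f} bytes")
for hour, (mean, share) in hourly.items():
    print(f"hour {hour:02d}: mean {mean:.0f} bytes, 304 share {share:.0%}")
```

With real logs, the same grouping could be applied per proxy as well as per hour to compare user populations.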

