<html>
<head>
<title>MeasureIt Users Guide</title>
<style type="text/css">
.style1
{
color: #CC0000;
}
</style>
</head>
<body>
<h2> MeasureIt Users' Guide </h2>
<h3> Table Of Contents</h3>
<ol>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#Simple">Simple Usage: Understanding the costs of runtime primitives</a></li>
<li><a href="#HowMeasurements">How Measurements are Actually Taken</a>
<ul>
<li><a href="#Limits">Limits of accuracy</a> </li>
</ul>
</li>
<li><a href="#Unpacking">Unpacking the Source</a>
<ul>
<li><a href="#Rebuilding">Rebuilding MeasureIt</a> </li>
</ul>
</li>
<li><a href="#Debugging">Debugging Optimized Code In Visual Studio (Validating a benchmark)
</a> </li>
<li><a href="#NewBenchmarks">Making New Benchmarks</a>
<ul>
<li><a href="#Scaling">Scaling Small Benchmarks</a> </li>
<li><a href="#LoopCounts">Modifying the Loop Counts </a> </li>
<li><a href="#State">Benchmarks with State</a></li>
</ul>
</li>
<li><a href="#PowerOptions">Power Options</a></li>
<li><a href="#Contributions">Contributions</a></li>
</ol>
<h3><a name="Introduction">
Introduction</a></h3>
<p>
Most programmers should be doing more performance measurements as they design
and develop code.  It is impossible to create a high performance
application unless you understand the costs of the operations that you use.
Because the .NET framework is a virtual machine, the operations are more
'abstract' and thus it is even harder to reason about the cost of operations
from first principles.</p>
<p>
To solve this problem, it is often suggested that the documentation for
classes and methods include information on how expensive the operations are.
Unfortunately, this is problematic because
</p>
<ul>
<li>The set of classes and methods is large (and growing), making generating the
information expensive. </li>
<li>The cost is often not simple to describe.  Often it depends on multiple
arguments.  Often there is a large first-time cost.  Sometimes there is a
cache, which means random arguments have one cost, but repeated access to
arguments used in the recent past has a much lower cost.  Sometimes the cost
is highly input dependent (e.g., sorting). </li>
<li>The costs can vary depending on the scale of the problem (doing operations on
small vectors is typically CPU bound, while doing them on very large vectors becomes
CPU-cache bound). </li>
<li>The costs change over time due to changes in the code, or the underlying support
code (e.g., .NET Runtime or OS version). </li>
<li>The costs depend on configuration (did you NGEN, is it a multi-AppDomain
scenario, is your scenario fully trusted or not).</li>
<li>The costs depend on CPU architecture, which also changes over time (sometimes
dramatically).</li>
</ul>
<p>
These factors make it difficult to provide useful performance information as
'normal' documentation. As an alternative, the <b>MeasureIt</b> utility
attempts to make it really easy for you to measure the costs of operations that
are important for YOUR application.  These can then be run on the CPUs,
operating systems, runtimes, and software versions that are relevant to you (and you can see if
they vary or not).  You can also 'mock up' code that closely
resembles the performance-critical code paths in your application to ensure that
the types of inputs fed to the methods of interest are 'representative' of your
application. You can then experiment with variations on your
design and find the most efficient implementation.</p>
<h3>
<a class="" name="Simple">Simple Usage: The Costs of Runtime Primitives</a></h3>
<p>
<b>MeasureIt</b> comes with a set of 'built-in' performance benchmarks that
measure the 'basics' of the runtime. These benchmarks measure the speed of
method call, field access, and other common operations. Running MeasureIt
without any parameters will run all of these built-in benchmarks.
For example, running the command</p>
<blockquote>
MeasureIt</blockquote>
<p>
runs all the 'default' benchmarks.  After completion it displays an
HTML page with a table like the abbreviated one below.
</p>
<dl>
<dd>
<p>
Scaled where EmptyStaticFunction = 1.0 (6.1 nsec = 1.0 units)
</p>
<table border="1" class="">
<tr>
<th class="">
Name</th>
<th class="">
Median</th>
<th class="">
Mean</th>
<th class="">
StdDev</th>
<th class="">
Min</th>
<th class="">
Max</th>
<th class="">
Samples</th>
</tr>
<tr>
<td class="">
NOTHING [count=1000]</td>
<td class="">
-0.438</td>
<td class="">
-0.252</td>
<td class="">
0.541</td>
<td class="">
-0.438</td>
<td class="">
1.371</td>
<td class="">
10</td>
</tr>
<tr>
<td class="">
MethodCalls: EmptyStaticFunction() [count=1000 scale=10.0]</td>
<td class="">
1.000</td>
<td class="">
0.949</td>
<td class="">
0.080</td>
<td class="">
0.851</td>
<td class="">
1.060</td>
<td class="">
10</td>
</tr>
<tr>
<td class="">
Loop 1K times [count=1000]</td>
<td class="">
343.472</td>
<td class="">
359.915</td>
<td class="">
29.826</td>
<td class="">
341.746</td>
<td class="">
441.198</td>
<td class="">
10</td>
</tr>
<tr>
<td class="">
MethodCalls: EmptyStaticFunction(arg1,...arg5) [count=1000 scale=10.0]</td>
<td class="">
1.556</td>
<td class="">
1.549</td>
<td class="">
0.099</td>
<td class="">
1.403</td>
<td class="">
1.788</td>
<td class="">
10</td>
</tr>
<tr>
<td class="">
MethodCalls: aClass.EmptyInstanceFunction() [count=1000 scale=10.0]</td>
<td class="">
1.148</td>
<td class="">
1.160</td>
<td class="">
0.091</td>
<td class="">
1.070</td>
<td class="">
1.334</td>
<td class="">
10</td>
</tr>
<tr>
<td class="">
MethodCalls: aClass.Interface() [count=1000 scale=10.0]</td>
<td class="">
1.597</td>
<td class="">
1.664</td>
<td class="">
0.089</td>
<td class="">
1.592</td>
<td class="">
1.843</td>
<td class="">
10</td>
</tr>
</table>
</dd>
</dl>
<p>
Often only a small number of benchmarks are of interest. To allow users to
control which benchmarks are run, benchmarks are grouped into 'areas' which can
be specified on the command line. MeasureIt /? will give you a list
of all the available areas that can be specified. You can then simply list
the ones of interest to limit which benchmarks are run. For example to run
just the benchmarks related to method calls and field access you can do</p>
<blockquote>
MeasureIt MethodCalls FieldAccess</blockquote>
<h4>
Best practices and Caveats</h4>
<p>
Fundamentally the <b>MeasureIt</b> harness measures wall clock time.
Thus, in particular, if you measure on a uniprocessor and there are other
background tasks running, they will interfere with the measurements (this is
not as problematic on a multiprocessor).  Generally this is not a big
problem because sensitive benchmarks tend not to block and tend to complete in
the single quantum of CPU time given to a thread, so they tend not to be
interrupted.  Still, it is best to avoid other work on the
machine while the harness is running.
</p>
<p>
While MeasureIt does take several samples to highlight variability, it takes
all these samples in the same process and very close to each other in time.
Thus it is possible that the measurements may vary significantly when run in a
different process, or when spread out more in time within the process.
Be alert to this, and if you see unexplained variability in different runs
or under different conditions, keep in mind that the variability metrics that
MeasureIt provides are a <b>lower bound</b> on the true variability of the
measurement.
</p>
<h3>
<a class="" name="HowMeasurements">How Measurements are Actually Taken</a></h3>
<p>
MeasureIt goes to some trouble to minimize and highlight the most common sources
of measurement error. These include
</p>
<ol>
<li>'First time' costs like Just In Time (JIT) compilation, memory page faults, and
initialization code paths.  These are avoided by running the benchmark once
before measurement begins.</li>
<li>Timer resolution.  The timer MeasureIt uses typically has a resolution of <
1 usec.  Thus very small intervals (e.g., nanoseconds) need to be 'magnified' by repeating
them many times to bring the actual time being measured to something more like
100 usec or 1 msec.  Thus benchmarks have a 'count' which represents the number
of times the benchmark is repeated for a single measurement.  The value
presented, however, represents the time consumed by a single iteration (thus you
can ignore the count unless you are doing a measurement error analysis).
</li>
<li>Overhead associated with starting and stopping the counter (and looping over the
benchmark).  This is minimized by measuring an empty benchmark and subtracting
that value from every measurement as 'overhead'.  However, since there is
variability in this overhead, this can lead to negative times being reported for
very small intervals (< 10 usec).  Well-designed benchmarks should
never be measuring times this small (see point 2).</li>
<li>Variability associated with loops.  Very small loops (as might happen for
short benchmarks) can have execution times that vary depending on the exact
memory address of the loop.  It is very difficult to control for this
variability.  Instead, a robust solution is to 'macro expand' the benchmark a
number of times (typically 10 is enough), so that the time the benchmark takes is
large compared to the overhead of the loop itself.  However, it is useful to
report the number as if the benchmark was not cloned.  This is called the
'scale' of the benchmark, and the reported number is the measured time (of a
single loop) divided by the scale.  Like the count, it can be ignored unless
you are doing a measurement error analysis (the arithmetic is sketched after
this list). </li>
<li>Benchmark variability: The benchmark is run a number of times (typically 10),
and mean, median, min, max and standard deviation are reported to highlight the
variability in the measurement. </li>
</ol>
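<p>
Putting points 2 through 4 together, the number reported for each sample is roughly the raw
measured interval, minus the measured overhead, divided by both the loop count and the scale.
The sketch below illustrates this arithmetic only; the method and variable names are
illustrative and are not taken from MeasureIt's source.
</p>
<pre>    // Illustrative sketch (not MeasureIt's actual code): how a reported sample
    // is derived from the raw timings described in points 2-4 above.
    static double ReportedTime(double rawMsec,      // raw time for one timed interval
                               double overheadMsec, // raw time for the empty benchmark
                               int count,           // loop count (e.g., 1000)
                               int scale)           // times the code was cloned (e.g., 10)
    {
        // Subtract the start/stop and looping overhead, then normalize to a
        // single iteration of a single (un-cloned) copy of the benchmark code.
        return (rawMsec - overheadMsec) / (count * scale);
    }</pre>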
<p>
To highlight variability, the first line in the output table is the 'NOTHING'
benchmark.  It measures an empty benchmark body, and the time should be
zero if there were no measurement error.  Thus the numbers in this line give
us some idea of the best accuracy we can hope for.  In the example
above we can see that this measurement varied from -0.438 to +1.371, so numbers
smaller than that are dubious.  However, keep in mind that all
benchmarks with a scale are actually measuring a time that is bigger than is
actually reported.  For example, the 'EmptyStaticFunction' benchmark has a
scale of 10, which means that the reported number 1.0 was really measuring 10
units of time.  Thus errors of -0.438 or +1.371 are not 43% or 137%
errors but rather roughly 4.4% and 13.7% errors.  This is consistent with the
min and max values reported for 'EmptyStaticFunction' (which suggest 6% to 15%
error).  Thus the 'NOTHING' benchmark gives you a good idea of how much
error is an inherent part of the harness.
</p>
<p>
In addition to the error introduced by the harness itself, the benchmark will
have variability of its own. The standard way of dealing with this
is to find the mean (average) of the measurements or the median (middle value in
a sorted list). Generally the mean is a better measurement if you
use the number to compute an aggregate throughput for a large number of items.
The median is a better guess if you want a typical sample. The median also
has the advantage that it is more stable if the sample is noisy (e.g., has large
min or max values).  If the median and the mean differ significantly,
however, there is a good chance there is too much error for the data to be
valuable.
</p>
<h4>
<a class="" name="Limits">Limits of accuracy</a></h4>
<p>
The min and max statistics give you the lower and upper bounds of the expected
error of the measurements. The standard deviation is also a useful error
metric. It is very likely that the statistics for the error form
what statisticians call a 'normal distribution'. If this assumption
holds, then 68% of all measurements should be within one standard deviation of the
mean, and over 95% of all measurements should be within 2 standard deviations of
the mean.  For example, since the 'EmptyStaticFunction' benchmark has
a standard deviation of 0.08 and a mean of 0.949, our expectation is that
68% of all samples should be in the range 0.949-0.08 = 0.869 to 0.949+0.08 =
1.029, and that 95% of all samples should be within 0.949-2*0.08 = 0.789 and
0.949+2*0.08 = 1.109.  This is another useful way of quickly getting
a handle on expected error.
</p>
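<p>
If you want to compute these ranges yourself, the short sketch below (illustrative only,
not part of MeasureIt) prints them for any mean and standard deviation taken from the report.
</p>
<pre>    // Illustrative sketch: expected sample ranges under the normal-distribution assumption.
    static void PrintExpectedRanges(double mean, double stdDev)
    {
        Console.WriteLine("~68% of samples in [{0:F3}, {1:F3}]", mean - stdDev, mean + stdDev);
        Console.WriteLine("~95% of samples in [{0:F3}, {1:F3}]", mean - 2 * stdDev, mean + 2 * stdDev);
    }
    // For the 'EmptyStaticFunction' row above: PrintExpectedRanges(0.949, 0.08);</pre>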
<h3>
<a class="" name="Unpacking">Unpacking the Source</a></h3>
<p>
In performance work, the details matter!  The display for each benchmark
gives a short description of what the benchmark does, but if the
scenario is at all complicated, you will find yourself wondering exactly what is
being measured and under what conditions. Well, you can find out!
The MeasureIt.exe has embedded within it its complete source code.
To unpack it simply type
</p>
<ul>
<li>MeasureIt /edit</li>
</ul>
<p>
which will unpack the source code into a 'MeasureIt.src' directory and launch
the Visual Studio .SLN file (if you have Visual Studio installed).  Even without
Visual Studio you can view the source code with notepad or another editor.
A good place to start browsing the source code is the _ReadMe.cs file, which
contains an overview of the rest of the source code. If you use Visual Studio
I strongly recommend that you download the code 'hyperlinking' feature mentioned
at the top of the _ReadMe.cs file. This will allow you to navigate the code
base more easily.
</p>
<p>
All the benchmarks themselves live in the 'MeasureIt.cs' file and you can find
the code associated with a particular benchmark by searching for its name. For
example the benchmark for 'FullTrustCall' is
</p>
<pre> timer1000.Measure("FullTrustCall()", 1, delegate
{
PinvokeClass.FullTrustCall(0, 0);
});</pre>
<p>
As you can see, a benchmark is simply a call to the 'Measure()' method.
The Measure() method takes three parameters:</p>
<ol>
<li>The name of the benchmark. This can be any string. </li>
<li>The scale of the benchmark (1 in this case).  This is how many times the code of
interest is cloned within the body of the benchmark (to reduce measurement
error).  The number reported by the harness is the actual time divided by this
number.</li>
<li>A delegate representing the code to be measured. Typically the delegate {}
syntax is used to place the code inline. </li>
</ol>
<p>
In this case we see that the benchmark consists of a single call to the
'FullTrustCall()' method. Searching for the definition we find
</p>
<pre> [DllImport("mscoree", EntryPoint = "GetXMLElement"), SuppressUnmanagedCodeSecurityAttribute]
public static extern void FullTrustCall(int x, int y); </pre>
<p>
which gives you all the details of exactly what is meant by a full trust call
(basically using the SuppressUnmanagedCodeSecurityAttribute).  It also
shows that the actual native call happens to be a method called GetXMLElement in
the mscoree DLL.  There is a comment in this code that indicates that
this particular native method consists of exactly 2 instructions (determined by
stepping through it).  Thus all the information needed to understand
anomalous performance results is available.
</p>
<h4>
<a class="" name="Rebuilding">Rebuilding MeasureIt</a></h4>
<p>
<b>MeasureIt</b> is also set up to make changing it and rebuilding easy.
Inside Visual Studio this is as easy as invoking the build command (e.g., F6), but it
can also be done easily for those without Visual Studio by simply executing the
'build.bat' file in the MeasureIt.src directory.  The only
prerequisite for build.bat is that a current (V2.0 or later) version of the
<a href="http://msdn2.microsoft.com/en-us/netframework/default.aspx">.NET
Runtime</a> be installed (Windows Vista comes with it built in).
In either case, the build will automatically update the original MeasureIt.exe
file (including incorporating the updated source into it) when it is rebuilt.
Thus with one keystroke you can update your original MeasureIt.exe to
incorporate any changes you need.
</p>
<p>
The build currently has 12 'expected' warnings about links to files within
projects.  This results from the technique used to incorporate the source
into the EXE, and these warnings can be ignored. </p>
<h3>
<a class="" name="Debugging">Debugging Optimized Code In Visual Studio
(Validating a benchmark)</a></h3>
<p>
When measuring a benchmark that does very little (e.g., a single method call or
other operation), it is likely that the just in time (JIT) compiler may
transform the benchmark in a way that would yield misleading results.
For example, to measure a single method call the benchmark might call an empty
method, but unless special measures are taken the JIT compiler will optimize the
method away completely. Thus it is often a good idea to look at the
actual machine code associated with small benchmarks to confirm that you
understand the transformations that the runtime and JIT have done. In a
debugger like Visual Studio, it should be as easy as setting a breakpoint in
your benchmark code and switching to the disassembly window (Debug -> Windows
->Disassembly). Unfortunately, the default options for Visual Studio are
designed to ease debugging, not to do performance investigations, so you need to
change two options to make this work</p>
<ol>
<li>Clear the checkbox: Tools -> Options
-> Debugging -> General -> Suppress JIT
Optimization. By default this box is checked, which means that even when
debugging code that should be optimized, the debugger tells the runtime not to.
The debugger does this so that optimizations don't interfere with the inspection
of local variables, but it also means that you are not looking at the code that
is actually run. I always uncheck this option because I strongly believe that
debuggers should strive to only inspect, and not change the program being
debugged. Unsetting this option has no effect on code that was compiled 'Debug'
since the runtime would not have optimized that code anyway.</li>
<li>Clear the checkbox Tools -> Options -> Debugging -> General -> Enable Just My
Code. The 'Just My Code' feature instructs the debugger not to show you code
that you did not write. Generally this feature removes the 'clutter' of call
frames that are often not interesting to the application developer. However, this
feature assumes that any code that is optimized can't be yours (it assumes your
code is compiled 'Debug' or that 'Suppress JIT Optimization' is turned on). If you
allow JIT optimizations but don't turn off 'Just My Code' you will find that you
never hit any breakpoints because the debugger does not believe your code is
yours. </li>
</ol>
<p>
Once you have unset these options, they remain unset for ALL projects. Generally
this works out well, but it does mean that you may not get the Just My Code
feature when you want it. You may find yourself switching Just My Code on and
off as you go from debugging to performance evaluation and back (I personally am
happy leaving it off all the time).
</p>
<p>
Once you have set up Visual Studio, you can now set breakpoints in your
benchmark and view the disassembly to confirm that the benchmark is doing what
you think it should be. This technique is valuable any time a small
benchmark has unexpected performance characteristics.
</p>
<h3>
<a class="" name="NewBenchmarks">Making New Benchmarks</a></h3>
<p>
The real power of MeasureIt comes from the ease of adding new benchmarks.
In the simplest case, to add a new benchmark</p>
<ol>
<li>Add an 'area' to the program for your new benchmark by declaring a static method
of the form 'MeasureXXXX()' on the MeasureIt class. The XXXX is the name of your
new area. If this method is public then this benchmark will be run by default
(when no area is specified on the command line). If the method is private, then
the area will only be run when the area is explicitly mentioned on the command
line. </li>
<li>In this new method add one or more benchmarks by calling the 'Measure' method on
the 'timer1000' variable of the MeasureIt class. </li>
</ol>
<p>
That's it!  For example, we might want to learn more about the
costs of getting and setting environment variables.  To do this we could
add the following code to the MeasureIt class.</p>
<pre>    static public void MeasureEnvironmentVars()
    {
        timer1000.Measure("GetEnvironmentVariable('WinDir')", 1, delegate
        {
            Environment.GetEnvironmentVariable("WinDir");
        });
        timer1000.Measure("SetEnvironmentVariable('X', '1')", 1, delegate
        {
            Environment.SetEnvironmentVariable("X", "1");
        });
    }</pre>
<p>
The 'timer1000' variable is useful for benchmarks that take less than 100usec to
run. As its name suggests this timer will loop 1000 times over
the benchmark.  Thus it will extend the time to something
like 100 msec, which makes measurement error small but also does not make humans
impatient.  As mentioned before, the 'Measure()' method takes the name
of a benchmark, the scale, and a delegate representing the body of the
benchmark.  Often you can simply name the benchmark after the literal code
being run, but otherwise any short, descriptive name will do.  The
scale parameter is used for benchmarks that take only 50 or fewer instructions and
will be explained in the next section.  It should be 1 if the benchmark is
already large enough.  Finally, you have the delegate that represents the
benchmark itself.</p>
<p>
When writing the benchmark code it is important to realize that the method will
be called many times by the harness.  This works great for benchmarks that
are 'stateless', but it can be a problem if calling the benchmark
perturbs state in a way that would cause the performance to change (for example,
adding an entry to a table).  That case is discussed in a later section.
</p>
<h4>
<a class="" name="Scaling">Scaling Small Benchmarks</a>
</h4>
<p>
If the benchmark is less than 50 instructions or so, it is best to make the benchmark
bigger by cloning the benchmark a number of times in the delegate body being
measured. To keep the results easy to interpret, however, it
is important that the time measured is then divided by the number of times the
benchmark was cloned. That is the purpose of the 'scale' parameter of
the Measure() method. Here is an example used for method calls. 10
is passed for the scale because the benchmark is cloned 10 times.
</p>
<pre> timer1000.Measure("EmptyStaticFunction(arg1,...arg5)", <span
class="style1"><font color="#cc0000">10</font></span>, delegate
{
Class.EmptyStaticFunction5Arg(1, 2, 3, 4, 5);
Class.EmptyStaticFunction5Arg(1, 2, 3, 4, 5);
Class.EmptyStaticFunction5Arg(1, 2, 3, 4, 5);
Class.EmptyStaticFunction5Arg(1, 2, 3, 4, 5);
Class.EmptyStaticFunction5Arg(1, 2, 3, 4, 5);
Class.EmptyStaticFunction5Arg(1, 2, 3, 4, 5);
Class.EmptyStaticFunction5Arg(1, 2, 3, 4, 5);
Class.EmptyStaticFunction5Arg(1, 2, 3, 4, 5);
Class.EmptyStaticFunction5Arg(1, 2, 3, 4, 5);
Class.EmptyStaticFunction5Arg(1, 2, 3, 4, 5);
});
</pre>
<h4>
<a class="" name="LoopCounts">Modifying the Loop Counts</a></h4>
<p>
As its name suggests timer1000 runs the benchmark 1000 times for every
measurement. When the benchmark is larger than 100us, a loop count
of 1000 will make the benchmark run unnecessarily long.  While you can
change the loop count property on timer1000, it is generally not a good idea to do
so because it could easily lead to confusion about how many iterations are to be
done. Instead it is better to create new timers that are used when
different loop counts are needed. For example the following code creates a
new timer called timer1 that will only loop a single time for each measurement
and will take three (instead of the usual 10) samples for computing min, max,
and other statistics. This would be appropriate for benchmarks that take >
1sec to run.
</p>
<pre> timer1 = new MultiSampleCodeTimer(3, 1); // Take three samples, Loop count = 1
timer1.OnMeasure = logger.AddWithCount;</pre>
<p>
In the code above, in addition to creating the new timer, we also need to hook
the timer up to the HTML reporting infrastructure.  This is what the
second line does.  The 'OnMeasure' event is fired every time a timer has
new data, and the second line tells it to call the 'AddWithCount' method on
'logger' when this happens.  The logger then ensures that the data shows up
in the HTML report.
</p>
<p>
Using this technique you can change the loop count to whatever value keeps the
measured time long enough to avoid measurement error, but short enough to allow
efficient data collection. For very expensive benchmarks (> 1 min)
you may need to cut the number of measurements to only 1, but ideally you would
still do at least 3, so you can get some idea of the variability of the
benchmark.
</p>
<p>
Finally, by default timers 'prime' the benchmark before actually doing a
measurement.  This ensures that any 'first time' initialization is done before the
measurement and does not skew the results.  However, when benchmarks are very long (>
10 sec), this priming is expensive and often not interesting.  The
timers have a property called 'Prime' which by default is true but can be set to
false to avoid priming the benchmark and save time.
</p>
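<p>
For example, a sketch like the following combines the pieces described above (the timer1
example and the 'Prime' property) for a benchmark that takes a very long time to run.  The
'LongRunningBenchmark()' method is a hypothetical placeholder for your own code, not part
of MeasureIt.
</p>
<pre>    // Sketch only: a timer for very long benchmarks, with priming disabled.
    timer1 = new MultiSampleCodeTimer(3, 1);   // Take three samples, Loop count = 1
    timer1.OnMeasure = logger.AddWithCount;    // Hook the timer up to the HTML report
    timer1.Prime = false;                      // Skip the 'priming' run to save time
    timer1.Measure("LongRunningBenchmark()", 1, delegate
    {
        LongRunningBenchmark();                // Hypothetical placeholder for your code
    });</pre>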
<h4>
<a class="" name="State">Benchmarks with State</a></h4>
<p>
Benchmarks really need to return the system to the same state it was in
before the benchmark started.  However, even some very simple scenarios
don't have this property.  For this case we need a slightly more
complicated 'Measure' routine that knows how to 'reset' the state of the system
so that another run of the benchmark can be done.  For example, the
following code finds the average cost of an Add() while filling a dictionary with 1000 entries.
</p>
<pre>    Dictionary&lt;int, int&gt; intDict = new Dictionary&lt;int, int&gt;();
timer1.Measure("dictAdd(intKey, intValue)", <span class="style1"><font
color="#cc0000">1000</font></span>, // Scaled by 1000 because we do 1000 Add()s
delegate // Benchmark
{
for (int i = 0; i < 1000; i++)
intDict.Add(i, i);
},
delegate // Reset routine
{
intDict.Clear();
});
</pre>
<p>
In this example, we pass two delegates to the Measure() routine. The first
delegate is the benchmark code itself, which adds 1000 entries to a dictionary.
However, because we pass 1000 as the 'scale' parameter to Measure(), the
time will be divided by 1000, and thus the reported value is the average cost of adding a
single element.  At the end of every measurement, the harness calls the
second delegate to 'reset' the state; in this case we just call the
Clear() method.  It is true that there is some measurement error
because the cost of the loop is included in the measurement, but that error
should be small.  The result is that this code measures the
average time to add an element when adding 1000 elements (in order).
</p>
<p>
It is worth noting how useful it is for someone using this benchmark to see the
actual code for the benchmark. Because we know that 'Dictionary' is
implemented as a hash table, the fact that the entries were added in order
probably does not skew our results (although we should check by doing some
variations). However if the Dictionary was implemented as a sorted
list, adding entries in order would be much faster than the 'random' case, and
our results would be skewed. As I mentioned before, in performance work,
details matter, and it is good to be able to see (and change) the details as the
need arises.
</p>
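<p>
As a sketch of such a variation (it is not one of MeasureIt's built-in benchmarks, and the
benchmark name is just a placeholder), the same two-delegate pattern can be used to add the
same 1000 keys in a shuffled order and compare the result against the in-order case above.
</p>
<pre>    // Sketch of a variation: add the same 1000 keys in a shuffled order to check
    // whether insertion order affects the cost of Dictionary.Add().
    Dictionary&lt;int, int&gt; intDict = new Dictionary&lt;int, int&gt;();
    int[] keys = new int[1000];
    for (int i = 0; i < 1000; i++)
        keys[i] = i;
    Random random = new Random(12345);          // Fixed seed so every run uses the same order
    for (int i = keys.Length - 1; i > 0; i--)   // Fisher-Yates shuffle
    {
        int j = random.Next(i + 1);
        int temp = keys[i]; keys[i] = keys[j]; keys[j] = temp;
    }

    timer1.Measure("dictAdd(shuffledKey, intValue)", 1000, // Scaled by 1000 because we do 1000 Add()s
        delegate                                // Benchmark
        {
            for (int i = 0; i < 1000; i++)
                intDict.Add(keys[i], i);
        },
        delegate                                // Reset routine
        {
            intDict.Clear();
        });</pre>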
<h3>
<a class="" name="PowerOptions">Power Options</a></h3>
<p>
Modern computers have the capability of slowing the frequency of the CPU's clock
to save power. By default the operating system tries to
determine what the correct balance between power and performance is.  Thus
it throttles the CPU when the computer is inactive and unthrottles it when it is
being used heavily.  This adaptation, however, does take time
(typically tens of milliseconds), and always lags behind actual CPU usage.
While in general this adaptation is a good thing, it will lead to unreliable
benchmark numbers. Thus for benchmarking it is best to turn off this
power saving feature.</p>
<p>
By default, on Vista and later operating systems MeasureIt does exactly this, and
no additional user interaction is required.  For older operating systems
this needs to be done manually by going to the Control Panel's Power options and
selecting the 'High Performance' power scheme.
</p>
<p>
While MeasureIt makes certain that its own benchmarks are run on the 'High
Performance' power scheme, other benchmarks will not have this capability.
Thus as an additional service MeasureIt exposes the following options for
manually controlling the power scheme so that a non-MeasureIt benchmark can be
run under the correct power scheme.</p>
<ul>
<li>measureIt /setHighPerformance</li>
<li>measureIt /setBalenced</li>
<li>measureIt /setPowerSaver</li>
</ul>
<p>
The power options capability only works on Vista (or Win2008) and above.
</p>
<h3>
<a class="" name="Contributions">Contributions/h3>
<p>
If you are interested in contributing back any improvements you made to
MeasureIt, you should contact me at <a href="mailto:[email protected]">
[email protected]</a> or leave a message at
<a href="http://blogs.msdn.com/vancem/">my blog</a>.
</p>
<p>
</p>
</body>
</html>