A Monte Carlo simulation is done to determine an estimate of pi.
The area enclosed by a circle of radius 1 is pi. Based on this pi can be determined by a simple Monte Carlo simulation
- draw two random number x and y in the range [-1,+1]
- calculate
r = x*x + y*y
- increment a hit counter when
r <= 1
- repeat the above many times
- the estimate for pi is
4. * hit_count / try_count
The error of the estimate is given by the sampling error and will decrease with square root of the number of tries.
The random numbers are
pseudo-random
numbers produced by a
random number generator,
which is in fact a fully deterministic algorithm.
mcpi
uses the most simple
linear congruential generator of the form
X_n+1 = (69069 * X_n) mod 2^32
which can be implemented in C essentially as one liner (on a 32 bit system)
unsigned int seed = <some odd number>;
...
seed = 69069 * seed;
So it's just an unsigned int multiplication which overflows. The integer
number sequence covers the range [1,2^32-1]
and can be turned into a
floating point number sequence in the range [-1,+1]
by simple conversion
and re-scaling.
Such a sequence has still strong sequential correlations. To reduce them a simple shuffling technique with a pool of 128 numbers is used. The shuffling algorithm is simply
- take the upper 7 bits of the last generated raw random number
- use this as index in the shuffle array, use the stored number for output
- generate a new raw random number to replace the just delivered one
The simple seed = 69069 * seed
works fine in C, but can't be used in
languages with don't provide unsigned integer arithmetic. mcpi
uses
therefore double precision floating point arithmetic, which allows to
exactly perform integer operations in a portable way. This works
because the largest integer has 17+32=49 bits and the
IBM flaoting point format
as well as the
IEEE floating point format
can represent up to 53 bit integers exactly.
Taken all together mcpi
is essentially a double precision floating point
benchmark. Because a constant
random seed
is used the results are fully deterministic and should be the same for
all languages.
The codes are controlled by an input file with a first line in 3I10
format
and several following lines in 1I10
format
IDBGRR IDBGRN IDBGMC
NGO
NGO
...
0
where the first line enables (1) or disables (0) the debug printout of
IDBGRR
- generation of raw random numbers (integer; pre-shuffle)IDBGRN
- generation of final random numbers (float; post-shuffle)IDBGMC
- each throw at the circle with coordinates
and the following lines specify how many Monte Carlo tries NGO
should be done. A 0
or an end-of-file terminates the simulation.
See typical test run or
benchmark run input files.
Assembler - mcpi_asm.asm
The assembler implementation also uses double floating point arithmetic to generate the random numbers, so the code is still well comparable to the high level language implementations. However, the generation of the shuffle index and the modulo arithmetic utilize optimizations which are only possible in assembler.
Algol 60 - mcpi_a60.a60
mcpi_a60_*.JES
fails on tk4-
update 08 due to a compiler bug.
The code requires double precision floating point, which in IBM Algol 60
must be selected with the compiler option LONG
. Due to a bug in the
compiler this option is not recognized, single precision code is generated,
which is does not give proper results.
The bug is reported, see
turnkey-mvs posting.
A fix of the compiler is available from the maintainer, Tom Armstrong, see
turnkey-mvs/files/IEX10.zip, and must be installed before running mpci_a60*
jobs. This
fix will be included in tk4- update 09.
C - mcpi_cc.c
JCC
in the version coming with tk4- update 08
has a severe bug which leads to a wrong constant, as result the JCC runs give
wrong results.
The jobs directory contains three types of jobs for mcpi
named
mcpi_*_t.JES --> trace all random numbers are decisions for 10 rounds
mcpi_*_f.JES --> print summaries for larger statistics
Usually mcpi_*_t.JES
is used for a verification check and should produce
RR: 12345 852656805 : 0
RR: 852656805 3856269089 : 13711
RR: 3856269089 547813997 : 62014
RR: 547813997 2598048329 : 8809
RR: 2598048329 866408821 : 41780
... snip ...
RR: 1069324829 938992185 : 17196
RR: 938992185 1245056165 : 15100
RN: 0 852656805 0.19852463
RR: 1245056165 949059873 : 20022
RN: 25 3905624449 0.90934905
MC: -0.60295073 0.81869811 1.03381618 0
... snip ...
MC: 0.36922017 0.76149709 0.71620135 9
PI: 10 9 3.60000000 0.45840735 3782786201
mcpi_*_f.JES
should in general output the equivalent of
ntry nhit pi-est pi-err seed
PI: 100 77 3.08000000 0.06159265 3559066133
PI: 300 239 3.18666667 0.04507401 3212561425
PI: 1000 800 3.20000000 0.05840735 3843976237
PI: 3000 2371 3.16133333 0.01974068 3743766953
PI: 10000 7856 3.14240000 0.00080735 3545464689
PI: 30000 23534 3.13786667 0.00372599 1276758233
PI: 100000 78565 3.14260000 0.00100735 1236459481
PI: 300000 235566 3.14088000 0.00071265 1205541793
PI: 1000000 785254 3.14101600 0.00057665 1956597129
PI: 3000000 2355459 3.14061200 0.00098065 11667865
where the highest statistics point depends on the language.
An initial round of benchmark tests was done in December 2017
- on an Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz (Quad-Core with HT)
- using tk4- update 08
- staring hercules with
NUMCPU=2 MAXCPU=2 ./mvs
- using
CLASS=C
jobs, thus only one test job running at a time
The key result is the GO-step time of the mcpi_*_f
type jobs for different
compilers. The table is sorted from fastest to slowest results and shows
in the last column the time normalized to the fastest case (asm):
Compiler ID | job time | */asm |
---|---|---|
asm | 4.02 | 1.00 |
gcc | 7.40 | 1.84 |
pas | 9.93 | 2.47 |
forh | 10.86 | 2.70 |
forg | 13.08 | 3.25 |
pli | 19.69 | 4.89 |
sim | 36.83 | 9.16 |
forw | 41.35 | 10.29 |
a60 | 213.66 | 53.15 |
See also the benchmark summary for an overview table and a compiler ranking.
I got exposed to large scale Monte Carlo simulations in the mid 80's when I used the CERN software package GEANT-3 to study the performance of a TPC-like tracking detector. Soon after I embarked on some model studies on critical phenomena with percolation and Ising models.
When implementing the Swendsen-Wang algorithm on a DEC Alpha system I quickly realized that the random number generator consumes much of the total CPU time, so I searched for fast ones. However, in Metropolis sampling the quality of the random numbers can be essential. It turned out that LCG plus shuffle is fast but not good enough, especially for subtle quantities like the critical point.
Remembering these days lead to adding a simple Monte Carlo Simulation as test case, and determining pi is about as simple as one can get it. And LCG plus shuffle is for sure good enough for this purpose.