-
Notifications
You must be signed in to change notification settings - Fork 0
Fast parallel random number generator for GPUs
License
gbeliako/bcnrand
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
BCNRAND, last change 27/4/2014 This is the C language parallel version of the library bcnrand, which generates uniform random variates of type double on (0,1), using the linear Congruential Generator (LCG) by J. Borwein and D. Bailey. There are two generators, the basic and the combined. The basic generator is of a good quality and has the period of approx 10^15. Its advantage is that it is very fast, twice as fast if compared to the very basic (and poor quality) standard C function rand(). The combined generator is a bit slower (still faster than rand()), but it has the period of 10^23, and it passes all the tests in BigCrush suite of tests. The random numbers are not of cryptographic strength, but are well suited for numerical simulations, especially in parallel processes, where the sam sequence of random variate is obtained when using any number of threads. This is because of the valuable skip ahead feature. The seed is the starting position in the sequence of random variates. The implementation is in the .h file, which means that only include statement is needed to start using this generator. Random variates are generated in parallel on GPU (with Nvidia's CUDA). This is version 2 of this library. Version 2 no longer has the original BCN method, as the method based on Barrett reduction is superior. This version no longer uses double-double implementation of the seeding step, as a more efficient implementation was found. The random variates are generated by iterating z_{k+1} = 2^53 z_k mod 3^33. The actual random variate is computed by x = z_{k+1} * 3^-33. The period is approx 10^15. Random variates can be generated in parallel and in multiple threads, although this version of the code is not always guaranteed to be thread safe, see notes below. A combined LCG is also used to ensure even better statistical quality of the generated sequence. The combined generator is marginally slower than J. Borwein and D. Bailey's LCG, but gives a longer period and passes Birthday spacings and Closed pairs tests as well. The pseudorandom sequence passes the statistical tests of BigCrunch test of TestU01 suite. The description of the method is published in: Beliakov, G., Creighton, D., Johnstone, M. and Wilkin, T. 2013, Efficient implementation of Bailey and Borwein pseudo-random number generator based on normal numbers, Computer physics communications, vol. 184, no. 8, pp. 1999-2004. G. Beliakov, M. Johnstone, D. Creighton, T. Wilkin, 2012, An efficient implementation of Bailey and Borwein's algorithm for parallel random number generation on graphics processing units, Computing 94(5): 433-447. G. Beliakov, M. Johnstone, D. Creighton, T. Wilkin, Parallel random variates generator for GPUs based on normal numbers, ArXiv 1206.1187 http://arxiv.org/abs/1206.1187, see also http://www.deakin.edu.au/~gleb/bcn_random.html Usage: Follow the examples presented here. bcnrand offers two alternative methods for generating random variates BCN and Combined. BCN is the method based on our modified Barrett reduction. Combined is the method that used BCN and an auxiliary LCG The user desires to generate a sequence of pseudorandom variates starting from a chosen position (seed), using a number of parallel threads. Each thread generates part of the sequence. The random variates are either written to global memory, or used for any task, such as simulation. Step 1 is to generate an array of starting positions for each thread (seeds). It is accomplished by calling dim3 dimBlock(numThreadsPerBlock, 1, 1); dim3 dimGrid(numBlocks, 1, 1); Kernel_initGenerator<<< dimGrid, dimBlock >>>(d_SeedData, workPerThread, seed); workPerThread is a parameter that specifies the length of the subsequence of random variates for each thread. It is calculated as workPerThread = TotalNumOfElements/numBlocks/numThreadsPerBlock; NOTE that the parameter "seed" is the starting position of the first element of the sequence Step 2 is to generate random variates. It is accomplished by calling kernels with this code: //get starting seed uint64_t seed = (uint64_t)d_SeedData[blockIdx.x * blockDim.x * blockDim.y + tid]; for (int i = 0 ; i < WorkPerThread ; i++) { generated_value=bcnrandom_inline(&seed); } bcnrandom_inline can be replaced by randCombined(&seed, &seed1); See example kernels Kernel_CountValues, (Kernel_CountValues_Combined), which are used to calculate the number of random variates smaller than 0.9. These kernels are invoked by calling Kernel_CountValues<<<dimGrid, dimBlock, dimBlock.x * sizeof(unsigned int)>>>(d_OutputData, d_SeedData, workPerThread); where d_SeedData are precomputed at Step 1 seeds, and d_OutputData is the array of unsigned int of length numblocks. The complete code is in function InlineGeneration This kernel will use the combined generator, which has better statistical properties Kernel_CountValues_Combined<<<dimGrid, dimBlock, dimBlock.x * sizeof(unsigned int)>>>(d_OutputData, d_SeedData, d_SeedData1, workPerThread); To write generated values to global memory, see function TimeBarrettMethod. List of functions: bcnrandom_inline, randCombined - two methods called by kernels that actually perform generation steps Kernel_CountValues, Kernel_CountValues_Combined - example kernels that count the number of generated values smaler than 0.9. They call their respective methods from the list above, and multiply the returned values by 3^-33, then use the result Kernel_initGenerator - initialises the array of seeds (for each thread) Kernel_initGeneratorCombined - initialises the two arrays of seeds (for the bcn and the auxiliary generator) (for each thread) InlineGeneration - shows how to use the functions above Kernel_Opt - example kernel that writes generated values to an array in global memory TimeBarrettMethod - shows how to use the example kernels and times its execution, then prints the results Example and main file: See file bcnrand.cu This program is freeware. Please cite our work Beliakov, G., Creighton, D., Johnstone, M. and Wilkin, T. 2013, Efficient implementation of Bailey and Borwein pseudo-random number generator based on normal numbers, Computer physics communications, vol. 184, no. 8, pp. 1999-2004. G. Beliakov, M. Johnstone, D. Creighton, T. Wilkin, 2012, An efficient implementation of Bailey and Borwein's algorithm for parallel random number generation on graphics processing units, Computing 94(5): 433-447. http://arxiv.org/abs/1206.1187, http://www.deakin.edu.au/~gleb/bcn_random.html J. Borwein and D. Bailey's work is available from: http://crd.lbl.gov/~dhbailey/dhbpapers/normal-pseudo.pdf Copyright Gleb Beliakov, Tim Wilkin and Michael Johnstone, 2013
About
Fast parallel random number generator for GPUs
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published