Skip to content

gbeliako/bcnrand

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

    BCNRAND, last change 27/4/2014

	This is the C language parallel version of the library bcnrand, which  generates
	uniform random variates of type double on (0,1), using the linear
	Congruential Generator (LCG) by J. Borwein and D. Bailey. 

	There are two generators, the basic and the combined. The basic generator is of
	a good quality and has the period of approx 10^15. Its advantage is that it is
	very fast, twice as fast if compared to the very basic (and poor quality) standard 
	C function rand(). The combined generator is a bit slower (still faster than 
	rand()), but it has the period of 10^23, and it passes all the tests in BigCrush 
	suite of tests. The random numbers are not of cryptographic strength, but are well
	suited for numerical simulations, especially in parallel processes, where
	the sam sequence of random variate is obtained when using any number of threads.
	This is because of the valuable skip ahead feature. The seed is the starting 
	position in the sequence of random variates.

	The implementation is in the .h file, which means that only include statement 
	is needed to start using this generator.

	Random variates are generated in parallel on GPU (with Nvidia's CUDA).
	This is version 2 of this library. Version 2 no longer has the original 
	BCN method, as the method based on Barrett reduction is superior. This 
	version no longer uses double-double implementation of the seeding step, 
	as a more efficient implementation was found. 

 	The random variates are generated by iterating
	                       z_{k+1} = 2^53 z_k mod 3^33. 
	The actual random variate is computed by 
	                            x = z_{k+1} * 3^-33. 
	The period is approx 10^15. Random variates can be generated in parallel 
	and in multiple threads, although this version of the code is not always
	guaranteed to be thread safe, see notes below.

	A combined LCG is also used to ensure even better statistical quality
	of the generated sequence. The combined generator is marginally slower than
	J. Borwein and D. Bailey's LCG, but gives a longer period and passes
	Birthday spacings and Closed pairs tests as well.

	The pseudorandom sequence passes the statistical tests of BigCrunch 
	test of TestU01 suite. The description of the method is published in:

	Beliakov, G., Creighton, D., Johnstone, M. and Wilkin, T. 2013, Efficient 
	implementation  of Bailey and Borwein pseudo-random number generator based 
	on normal numbers, Computer physics communications, vol. 184, no. 8, pp. 1999-2004.

	G. Beliakov, M. Johnstone, D. Creighton, T. Wilkin, 2012, An efficient 
	implementation of  Bailey and Borwein's algorithm for parallel random number 
	generation on graphics processing units, Computing 94(5): 433-447.

	G. Beliakov, M. Johnstone, D. Creighton, T. Wilkin, Parallel random
	variates generator for GPUs based on normal numbers, ArXiv 1206.1187
	http://arxiv.org/abs/1206.1187,
	see also	http://www.deakin.edu.au/~gleb/bcn_random.html

	
	Usage: Follow the examples presented here.
	
	bcnrand offers two alternative methods for generating random variates 
	BCN and Combined. 

		BCN is the method based on our modified Barrett reduction.
		Combined is the method that used BCN and an auxiliary LCG 

		The user desires to generate a sequence of pseudorandom variates starting 
		from a chosen position (seed), using a number of parallel threads. Each 
		thread generates part of the sequence. The random variates are either 
		written to global memory, or used for any task, such as simulation.

     Step 1 is to generate an array of starting positions for each thread 
	 (seeds). It is accomplished by calling 
			dim3 dimBlock(numThreadsPerBlock, 1, 1);
			dim3 dimGrid(numBlocks, 1, 1);
			Kernel_initGenerator<<< dimGrid, dimBlock >>>(d_SeedData, workPerThread, seed);

		workPerThread is a parameter that specifies the length of the subsequence 
		of random variates for each thread.
		It is calculated as workPerThread = TotalNumOfElements/numBlocks/numThreadsPerBlock;
		NOTE that the parameter "seed" is the starting position of the first element of the sequence 

     Step 2 is to generate random variates. It is accomplished by calling kernels with this code:
			//get starting seed
			uint64_t seed = (uint64_t)d_SeedData[blockIdx.x * blockDim.x * blockDim.y + tid];
    			for (int i = 0 ; i < WorkPerThread ; i++)
   			{		
				generated_value=bcnrandom_inline(&seed);
        	}
		bcnrandom_inline can be replaced by randCombined(&seed, &seed1);	

		See example kernels Kernel_CountValues, (Kernel_CountValues_Combined), which 
		are used to calculate the number of random variates smaller than 0.9. These 
		kernels are invoked by calling
		
			Kernel_CountValues<<<dimGrid, dimBlock, dimBlock.x * sizeof(unsigned int)>>>(d_OutputData, d_SeedData, workPerThread);
		where d_SeedData are precomputed at Step 1 seeds, and d_OutputData is the array of 
		unsigned int of length numblocks. The complete code is in function InlineGeneration

		This kernel will use the combined generator, which has better statistical properties
		    Kernel_CountValues_Combined<<<dimGrid, dimBlock, dimBlock.x * sizeof(unsigned int)>>>(d_OutputData, d_SeedData, d_SeedData1, workPerThread);

	      To write generated values to global memory, see function TimeBarrettMethod. 


	List of functions:
		bcnrandom_inline, randCombined  - two methods called by kernels that actually 
		perform generation steps

		Kernel_CountValues, Kernel_CountValues_Combined - example kernels that count 
		the number of generated values smaler than 0.9. They call their respective methods from 
		the list above, and multiply the returned values by 3^-33, then use the result

		Kernel_initGenerator - initialises the array of seeds (for each thread)

		Kernel_initGeneratorCombined - initialises the two arrays of seeds 
		(for the bcn and the auxiliary generator) (for each thread)

		InlineGeneration - shows how to use the functions above

		Kernel_Opt   - example kernel that writes generated values to an array in global memory 

		TimeBarrettMethod - shows how to use the example kernels and times its execution, 
				then prints the results
			

	Example and main file:
		See file bcnrand.cu

	This program is freeware.


	Please cite our work
	Beliakov, G., Creighton, D., Johnstone, M. and Wilkin, T. 2013, Efficient implementation of Bailey and Borwein 
	pseudo-random number generator based on normal numbers, 
	Computer physics communications, vol. 184, no. 8, pp. 1999-2004.

	G. Beliakov, M. Johnstone, D. Creighton, T. Wilkin, 2012, An efficient implementation of 
	Bailey and Borwein's algorithm for parallel random number generation on graphics processing
	units, Computing 94(5): 433-447.

	http://arxiv.org/abs/1206.1187, http://www.deakin.edu.au/~gleb/bcn_random.html

	J. Borwein and D. Bailey's work is available from:
	http://crd.lbl.gov/~dhbailey/dhbpapers/normal-pseudo.pdf

	Copyright Gleb Beliakov, Tim Wilkin and Michael Johnstone, 2013






About

Fast parallel random number generator for GPUs

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published