
Commit a79e950

Author: William Song
Commit message: splash
1 parent 9353cac commit a79e950

File tree

294 files changed: 82709 additions & 0 deletions


splash2-1.0/.gitignore

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
# Object files
*.o

# Libraries
*.lib
*.a

# Shared objects (inc. Windows DLLs)
*.dll
*.so
*.so.*
*.dylib

# Executables
*.exe
*.out
*.app

splash2-1.0/README.SPLASH2

Lines changed: 349 additions & 0 deletions
@@ -0,0 +1,349 @@
Date: Oct 19, 1994

This is the directory for the second release of the Stanford Parallel
Applications for Shared-Memory (SPLASH-2) programs. For further
information contact [email protected].

PLEASE NOTE: Due to our limited resources, we will be unable to spend
much time answering questions about the applications.

splash.tar contains the tarred version of all the files. Grabbing this
file will get you everything you need. We also keep the files
individually untarred for partial retrieval. The splash.tar file is not
compressed, but the large files in it are. We attempted to compress the
splash.tar file to reduce the file size further, but this resulted in
a negative compression ratio.


DIFFERENCES BETWEEN SPLASH AND SPLASH-2:
----------------------------------------

The SPLASH-2 suite contains two types of codes: full applications and
kernels. Each of the codes utilizes the Argonne National Laboratories
(ANL) parmacs macros for parallel constructs. Unlike the codes in the
original SPLASH release, each of the codes assumes the use of a
"lightweight threads" model (which we hereafter refer to as the "threads"
model) in which child processes share the same virtual address space as
their parent process. In order for the codes to function correctly,
the CREATE macro should call the proper Unix system routine (e.g. "sproc"
in the Silicon Graphics IRIX operating system) instead of the "fork"
routine that was used for SPLASH. The difference is that processes
created with the Unix fork command receive their own private copies of
all global variables. In the threads model, child processes share the
same virtual address space, and hence all global data. Some of the
codes function correctly when the Unix "fork" command is used for child
process creation as well. Comments in the code header denote those
applications which function correctly with "fork."
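
The following standalone C program (an illustration only, not part of the
SPLASH-2 distribution) shows the difference: a child created with "fork"
increments a private copy of a global counter, while a child created as a
POSIX thread (one modern example of a "threads"-style creation routine)
increments the copy shared with its parent.

    #include <pthread.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int counter = 0;                    /* global data                */

    void *thread_child(void *arg)       /* threads model: shares      */
    {                                   /* the parent's globals       */
        (void) arg;
        counter++;
        return NULL;
    }

    int main(void)
    {
        pid_t pid;
        pthread_t t;

        pid = fork();                   /* fork model: child gets a   */
        if (pid == 0) {                 /* private copy of counter    */
            counter++;
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        printf("after fork:   counter = %d\n", counter);  /* prints 0 */

        pthread_create(&t, NULL, thread_child, NULL);      /* threads */
        pthread_join(t, NULL);          /* model: same address space  */
        printf("after thread: counter = %d\n", counter);  /* prints 1 */
        return 0;
    }

Compile with something like "cc -pthread": the first line of output
reports 0, the second reports 1.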


MACROS:
-------

Macros for the previous release of the SPLASH application suite can be
obtained via anonymous ftp to www-flash.stanford.edu. The macros are
contained in the pub/old_splash/splash/macros subdirectory. HOWEVER,
THE MACRO FILES MUST BE MODIFIED IN ORDER TO BE USED WITH SPLASH-2 CODES.
The CREATE macros must be changed so that they call the proper process
creation routine (see DIFFERENCES section above) instead of "fork."

In this macros subdirectory, macros and sample makefiles are provided
for three machines:

    Encore Multimax (CMU Mach 2.5: C and Fortran)
    SGI 4D/240 (IRIX System V Release 3.3: C only)
    Alliant FX/8 (Alliant Rev. 5.0: C and Fortran)

These macros work for us with the above operating systems. Unfortunately,
our limited resources prevent us from supporting them in any way or
even fielding questions about them. If they don't work for you, please
contact Argonne National Labs for a version that will. An e-mail address
to try might be [email protected]. An excerpt from
a message, received from Argonne, concerning obtaining the macros follows:

"The parmacs package is in the public domain. Approximately 15 people at
Argonne (or associated with Argonne or students) have worked on the
parmacs package at one time or another. The parmacs package is
implemented via macros using the M4 macro preprocessor (standard on most
Unix systems). Current distribution of the software is somewhat ad hoc.
Most C versions can be obtained from netlib (send electronic mail to
[email protected] with the message "send index from parmacs"). Fortran
versions have been emailed directly or sent on tape. The primary
documentation for the parmacs package is the book ``Portable Programs for
Parallel Processors'' by Lusk et al., Holt, Rinehart and Winston, 1987."

The makefiles provided in the individual program directories specify
a null macro set that will turn the parallel programs into sequential
ones. Note that we do not have a null macro set for FORTRAN.
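
To give a flavor of what a null macro set does, the sketch below writes
the idea as C preprocessor definitions. This is purely illustrative and
is NOT the distributed null macro set (which is written in m4, and whose
macro argument lists differ in detail): each parallel construct simply
expands to its sequential equivalent, so the program runs as an ordinary
single-process C program.

    /* Illustrative only -- NOT the distributed null macro set.       */
    /* Each parmacs construct expands to its sequential equivalent,   */
    /* so a code built this way runs as a single process.             */
    #include <stdlib.h>

    #define MAIN_ENV                        /* no environment needed  */
    #define MAIN_INITENV()                  /* nothing to set up      */
    #define MAIN_END              exit(0)
    #define CREATE(func)          func()    /* run the "child" inline */
    #define WAIT_FOR_END(n)                 /* no children to join    */
    #define LOCK(l)                         /* one process: no locks  */
    #define UNLOCK(l)
    #define BARRIER(b, n)                   /* nothing to wait for    */
    #define G_MALLOC(size)        malloc(size)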


CODE ENHANCEMENTS:
------------------

All of the codes are designed for shared address space multiprocessors
with physically distributed main memory. For these types of machines,
process migration and poor data distribution can decrease performance
to suboptimal levels. In the applications, comments indicating potential
enhancements can be found which will improve performance. Each potential
enhancement is denoted by a comment beginning with "POSSIBLE ENHANCEMENT".
The potential enhancements which we identify are:

(1) Data Distribution

    Comments are placed in the code indicating where directives should
    be placed so that data can be migrated to the local memories of
    nodes, thus allowing for remote communication to be minimized.

(2) Process-to-Processor Assignment

    Comments are placed in the code indicating where directives should
    be placed so that processes can be "pinned" to processors,
    preventing them from migrating from processor to processor.

In addition, to facilitate simulation studies, we note points in the
codes where statistics gathering routines should be turned on so that
cold-start and initialization effects can be avoided.
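
The pinning directives themselves are platform specific and are not part
of the SPLASH-2 codes. Purely as an illustration, on a modern Linux
system a process could bind itself to a single CPU with the
sched_setaffinity call, for example:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    /* Illustrative Linux-only sketch: bind the calling process to    */
    /* one CPU so the scheduler cannot migrate it between processors. */
    static int pin_to_cpu(int cpu)
    {
        cpu_set_t mask;

        CPU_ZERO(&mask);
        CPU_SET(cpu, &mask);
        /* A pid of 0 means "the calling process".                    */
        if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_setaffinity");
            return -1;
        }
        return 0;
    }

Here pin_to_cpu is a hypothetical helper; each process could call it with
its own process id right after creation. The actual directive to use
depends entirely on the target machine and operating system.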

As previously mentioned, processes are assumed to be created through calls
to a "threads" model creation routine. One important side effect is that
this model causes all global variables to be shared (whereas the fork model
causes all processes to get their own private copy of global variables).
In order to mimic the behavior of global variables in the fork model, many
of the applications provide arrays of structures that can be accessed by
process ID, such as:

    struct per_process_info {
        char pad1[PAD_LENGTH];
        unsigned start_time;
        unsigned end_time;
        char pad2[PAD_LENGTH];
    } PPI[MAX_PROCS];

In these structures, padding is inserted to ensure that the structure
information associated with each process can be placed on a different
page of memory, and can thus be explicitly migrated to that processor's
local memory system. We follow this strategy for certain variables since
these data really belong to a process and should be allocated in its local
memory. A programming model that had the ability to declare global private
data would have automatically ensured that these data were private, and
that false sharing did not occur across different structures in the
array. However, since the threads model does not provide this capability,
it is provided by explicitly introducing arrays of structures with padding.
The padding constants used in the programs (PAD_LENGTH in this example)
can easily be changed to suit the particular characteristics of a given
system. The actual data that is manipulated by individual applications
(e.g. grid points, particle data, etc.) is not padded, however.
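
As an illustration of how such a padding constant might be chosen (the
PAGE_SIZE value below is an assumption, not a SPLASH-2 constant), making
PAD_LENGTH a full page on each side of the useful fields guarantees that
the timing fields of two different processes never share a page:

    #include <stdio.h>

    #define PAGE_SIZE   4096            /* assumed page size in bytes */
    #define PAD_LENGTH  PAGE_SIZE       /* one page of pad per side   */
    #define MAX_PROCS   64

    /* Same shape as the example above: with a full page of padding   */
    /* before and after the timing fields, the useful data of two     */
    /* different array elements can never fall on the same page.      */
    struct per_process_info {
        char     pad1[PAD_LENGTH];
        unsigned start_time;
        unsigned end_time;
        char     pad2[PAD_LENGTH];
    } PPI[MAX_PROCS];

    int main(void)
    {
        printf("each per_process_info element occupies %zu bytes\n",
               sizeof(struct per_process_info));
        return 0;
    }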

Finally, for some applications we provide less-optimized versions of the
codes. The less-optimized versions utilize data structures that lead to
simpler implementations, but which do not allow for optimal data
distribution (and can thus generate false sharing).


REPORT:
-------

A report will be put together shortly describing the structure, function,
and performance characteristics of each application. The report will be
similar to the original SPLASH report (see the original report for the
issues discussed). The report will provide quantitative data (for two
different cache line sizes) for characteristics such as working set size
and miss rates (local versus remote, etc.). In addition, the report
will discuss cache behavior and synchronization behavior of the
applications. In the meantime, each application directory has a README
file that describes how to run that application, and most applications
also have comments in their headers describing how to run them.


README FILES:
-------------

Each application has an associated README file. It is VERY important to
read these files carefully, as they discuss the important parameters to
supply for each application, as well as other issues involved in running
the programs. In each README file, we discuss the impact of explicitly
distributing data on the Stanford DASH Multiprocessor. Unless otherwise
specified, we assume that the default data distribution mechanism is
round-robin page allocation.


PROBLEM SIZES:
--------------

For each application, the README file describes a recommended base
problem size that can both be simulated and is not unrealistically
small for a machine with up to 64 processors. For the purposes of
studying algorithm performance, the parameters associated with each
application can be varied. However, for the purposes of comparing
machine architectures, the README files describe which parameters can
be varied, and which should remain constant (or at their default
values) for comparability. If the specified "base" parameters are not
used, then reported results should explicitly state which parameters
were changed, what their new values are, and why they were changed.


CORE PROGRAMS:
--------------

Since the number of programs has increased over SPLASH, and since not
everyone may be able to use all the programs in a given study, we
identify some of the programs as "core" programs that should be used
in most studies for comparability. In the currently available set, these
core programs include:

(1) Ocean Simulation
(2) Hierarchical Radiosity
(3) Water Simulation with Spatial data structure
(4) Barnes-Hut
(5) FFT
(6) Blocked Sparse Cholesky Factorization
(7) Radix Sort

The less-optimized versions of the programs, when provided, should be
used only in addition to these.


MAILING LIST:
-------------

Please send a note to [email protected] if you have copied over
the programs, so that we can put you on a mailing list for update reports.


AUTHORSHIP:
-----------

The applications provided in the SPLASH-2 suite were developed by a number
of people. The report lists authors primarily responsible for the
development of each application code. The codes were made ready for
distribution and the README files were prepared by Steven Cameron Woo and
Jaswinder Pal Singh.


CODE CHANGES:
-------------

If modifications are made to the codes which improve their performance,
we would like to hear about them. Please send email to
[email protected] detailing the changes.


UPDATE REPORTS:
---------------

Watch this file for information regarding changes to codes and additions
to the application suite.


CHANGES:
--------
241+
242+
10-21-94: Ocean code, contiguous partitions, line 247 of slave1.C changed
243+
from
244+
245+
t2a[0][0] = hh3*t2a[0][0]+hh1*psi[procid][1][0][0];
246+
247+
to
248+
249+
t2a[0][0] = hh3*t2a[0][0]+hh1*t2c[0][0];
250+
251+
This change does not affect correctness; it is an optimization
252+
that was performed elsewhere in the code but overlooked here.
253+
254+
11-01-94: Barnes, file code_io.C, line 55 changed from
255+
256+
in_real(instr, tnow);
257+
258+
to
259+
260+
in_real(instr, &tnow);
261+
262+
11-01-94: Raytrace, file main.C, lines 216-223 changed from
263+
264+
if ((pid == 0) || (dostats))
265+
CLOCK(end);
266+
267+
gm->partime[0] = (end - begin) & 0x7FFFFFFF;
268+
if (pid == 0) gm->par_start_time = begin;
269+
270+
/* printf("Process %ld elapsed time %lu.\n", pid, lapsed); */
271+
272+
}
273+
274+
to
275+
276+
if ((pid == 0) || (dostats)) {
277+
CLOCK(end);
278+
gm->partime[pid] = (end - begin) & 0x7FFFFFFF;
279+
if (pid == 0) gm->par_start_time = begin;
280+
}
281+
282+
11-13-94: Raytrace, file memory.C
283+
284+
The use of the word MAIN_INITENV in a comment in memory.c causes
285+
m4 to expand this macro, and some implementations may get confused
286+
and generate the wrong C code.
287+
288+
11-13-94: Radiosity, file rad_main.C
289+
290+
rad_main.C uses the macro CREATE_LITE. All three instances of
291+
CREATE_LITE should be changed to CREATE.
292+
293+
11-13-94: Water-spatial and Water-nsquared, file makefile
294+
295+
makefiles were changed so that the compilation phases included the
296+
CFLAGS options instead of the CCOPTS options, which did not exist.
297+
298+
11-17-94: FMM, file particle.C
299+
300+
Comment regarding data distribution of particle_array data
301+
structure is incorrect. Round-robin allocation should be used.
302+
303+
11-18-94: OCEAN, contiguous partitions, files main.C and linkup.C
304+
305+
Eliminated a problem which caused non-doubleword aligned
306+
accesses to doublewords for the uniprocessor case.
307+
308+
main.C: Added lines 467-471:
309+
310+
if (nprocs%2 == 1) { /* To make sure that the actual data
311+
starts double word aligned, add an extra
312+
pointer */
313+
d_size += sizeof(double ***);
314+
}
315+
316+
Added same lines in file linkup.C at line numbers 100 and 159.
317+
318+
07-30-95: RADIX has been changed. A tree-structured parallel prefix
319+
computation is now used instead of a linear one.
320+
321+
LU had been modified. A comment describing how to distribute
322+
data (one of the POSSIBLE ENHANCEMENTS) was incorrect for the
323+
contiguous_blocks version of LU. Also, a modification was made
324+
that reduces false sharing at line 206 of lu.C:
325+
326+
last_malloc[i] = (double *) (((unsigned) last_malloc[i]) + PAGE_SIZE -
327+
((unsigned) last_malloc[i]) % PAGE_SIZE);
328+
329+
A subdirectory shmem_files was added under the codes directory.
330+
This directory contains a file that can be compiled on SGI machines
331+
which replaces the libsgi.a file distributed in the original SPLASH
332+
release.
333+
334+
09-26-95: Fixed a bug in LU. Line 201 was changed from
335+
336+
last_malloc[i] = (double *) G_MALLOC(proc_bytes[i])
337+
338+
to
339+
340+
last_malloc[i] = (double *) G_MALLOC(proc_bytes[i] + PAGE_SIZE)
341+
342+
Fixed similar bugs in WATER-NSQUARED and WATER-SPATIAL. Both
343+
codes needed a barrier added into the mdmain.C files. In both
344+
codes, the line
345+
346+
BARRIER(gl->start, NumProcs);
347+
348+
was added. In WATER-NSQUARED, it was added in mdmain.C at line
349+
84. In WATER-SPATIAL, it was added in mdmain.C at line 107.

splash2-1.0/README.md

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
SPLASH-2_Benchmark
==================

This repository is just another modified SPLASH-2 Benchmark source code repo, intended to help those who struggle to find a working copy of the SPLASH-2 source code.

I have already patched the original SPLASH-2 benchmark with the CAPSL patch, so the benchmarks now use the pthread library.

The CAPSL Modified SPLASH-2 Home Page: http://www.capsl.udel.edu/splash/index.html

### Usage

1. Enter the program directory, say `kernels/fft`.
2. Run `make`.

### Build Dependencies

+ m4
+ gcc

### Customize the source

I suggest you read the documentation included in the repo. You can change the build procedure by modifying `Makefile.config` under the `codes` directory.
