Add Modular fp #119
Conversation
I found a few things. We are still using `dble`; you can get rid of these calls:

```
[I]shb-m1pro: MFC/src $ grep -iR 'dble' ./*
./post_process/m_global_parameters.f90: Re_trans = dble(trans)
./post_process/m_global_parameters.f90: + dble(ir - 1)*DLOG(R0mx/R0mn)/dble(Npt - 1)
./pre_process/m_global_parameters.fpp: Re_trans = dble(trans)
./pre_process/m_global_parameters.fpp: + dble(ir - 1)*DLOG(R0mx/R0mn)/dble(nb - 1)
./simulation/m_global_parameters.fpp: Re_trans = dble(trans)
./simulation/m_global_parameters.fpp: + dble(ir - 1)*DLOG(R0mx/R0mn)/dble(nb - 1)
```

Please change these to lowercase `exp`/`sqrt`:

```
[I]shb-m1pro: MFC/src $ grep -R 'EXP(' ./*
./post_process/m_global_parameters.f90: c3 = (CEXP(c2) - CEXP(-c2))/(CEXP(c2) + CEXP(-c2)) ! TANH(c2)
./post_process/m_global_parameters.f90: R0mn = 0.8_wp*EXP(-2.8_wp*sd)
./post_process/m_global_parameters.f90: R0mx = 0.2_wp*EXP(9.5_wp*sd) + 1._wp
./post_process/m_global_parameters.f90: R0(ir) = EXP(phi(ir))
./post_process/m_global_parameters.f90: tmp = EXP(-0.5_wp*(phi(ir)/sd)**2)/SQRT(2._wp*pi)/sd
./post_process/m_global_parameters.f90: tmp = EXP(-0.5_wp*(phi(1)/sd)**2)/SQRT(2._wp*pi)/sd
./post_process/m_global_parameters.f90: tmp = EXP(-0.5_wp*(phi(Npt)/sd)**2)/SQRT(2._wp*pi)/sd
./pre_process/m_global_parameters.fpp: c3 = (CEXP(c2) - CEXP(-c2))/(CEXP(c2) + CEXP(-c2)) ! TANH(c2)
./pre_process/m_global_parameters.fpp: R0mn = 0.8_wp*EXP(-2.8_wp*sd)
./pre_process/m_global_parameters.fpp: R0mx = 0.2_wp*EXP(9.5_wp*sd) + 1._wp
./pre_process/m_global_parameters.fpp: R0(ir) = EXP(phi(ir))
./pre_process/m_global_parameters.fpp: tmp = EXP(-0.5_wp*(phi(ir)/sd)**2)/SQRT(2._wp*pi)/sd
./pre_process/m_global_parameters.fpp: tmp = EXP(-0.5_wp*(phi(1)/sd)**2)/SQRT(2._wp*pi)/sd
./pre_process/m_global_parameters.fpp: tmp = EXP(-0.5_wp*(phi(nb)/sd)**2)/SQRT(2._wp*pi)/sd
./simulation/m_global_parameters.fpp: c3 = (CEXP(c2) - CEXP(-c2))/(CEXP(c2) + CEXP(-c2)) ! TANH(c2)
./simulation/m_global_parameters.fpp: R0mn = 0.8_wp*EXP(-2.8_wp*sd)
./simulation/m_global_parameters.fpp: R0mx = 0.2_wp*EXP(9.5_wp*sd) + 1._wp
./simulation/m_global_parameters.fpp: R0(ir) = EXP(phi(ir))
./simulation/m_global_parameters.fpp: tmp = EXP(-0.5_wp*(phi(ir)/sd)**2)/SQRT(2._wp*pi)/sd
./simulation/m_global_parameters.fpp: tmp = EXP(-0.5_wp*(phi(1)/sd)**2)/SQRT(2._wp*pi)/sd
./simulation/m_global_parameters.fpp: tmp = EXP(-0.5_wp*(phi(nb)/sd)**2)/SQRT(2._wp*pi)/sd
```

And these:

```
[I]shb-m1pro: MFC/src $ grep -R 'SQRT(' ./*
./common/m_helper.f90: ntmp = SQRT((4._wp*pi/3._wp)*nR3/vftmp)
./post_process/m_global_parameters.f90: uu = SQRT(pl0/rhol0)
./post_process/m_global_parameters.f90: phi_vn = (1._wp + SQRT(mu_v/mu_n)*(M_n/M_v)**(0.25_wp))**2 &
./post_process/m_global_parameters.f90: /(SQRT(8._wp)*SQRT(1._wp + M_v/M_n))
./post_process/m_global_parameters.f90: phi_nv = (1._wp + SQRT(mu_n/mu_v)*(M_v/M_n)**(0.25_wp))**2 &
./post_process/m_global_parameters.f90: /(SQRT(8._wp)*SQRT(1._wp + M_n/M_v))
./post_process/m_global_parameters.f90: omegaN = SQRT(3._wp*k_poly*Ca + 2._wp*(3._wp*k_poly - 1._wp)/(Web*R0))/R0
./post_process/m_global_parameters.f90: c2 = CSQRT(c1)
./post_process/m_global_parameters.f90: tmp = EXP(-0.5_wp*(phi(ir)/sd)**2)/SQRT(2._wp*pi)/sd
./post_process/m_global_parameters.f90: tmp = EXP(-0.5_wp*(phi(1)/sd)**2)/SQRT(2._wp*pi)/sd
./post_process/m_global_parameters.f90: tmp = EXP(-0.5_wp*(phi(Npt)/sd)**2)/SQRT(2._wp*pi)/sd
./pre_process/m_global_parameters.fpp: uu = SQRT(pl0/rhol0)
./pre_process/m_global_parameters.fpp: phi_vn = (1._wp + SQRT(mu_v/mu_n)*(M_n/M_v)**(0.25_wp))**2 &
./pre_process/m_global_parameters.fpp: /(SQRT(8._wp)*SQRT(1._wp + M_v/M_n))
./pre_process/m_global_parameters.fpp: phi_nv = (1._wp + SQRT(mu_n/mu_v)*(M_v/M_n)**(0.25_wp))**2 &
./pre_process/m_global_parameters.fpp: /(SQRT(8._wp)*SQRT(1._wp + M_n/M_v))
./pre_process/m_global_parameters.fpp: omegaN = SQRT(3._wp*k_poly*Ca + 2._wp*(3._wp*k_poly - 1._wp)/(Web*R0))/R0
./pre_process/m_global_parameters.fpp: c2 = CSQRT(c1)
./pre_process/m_global_parameters.fpp: tmp = EXP(-0.5_wp*(phi(ir)/sd)**2)/SQRT(2._wp*pi)/sd
./pre_process/m_global_parameters.fpp: tmp = EXP(-0.5_wp*(phi(1)/sd)**2)/SQRT(2._wp*pi)/sd
./pre_process/m_global_parameters.fpp: tmp = EXP(-0.5_wp*(phi(nb)/sd)**2)/SQRT(2._wp*pi)/sd
./pre_process/m_patches.f90: ! q_prim_vf(E_idx)%sf(i,j,0) = (1._wp * (10._wp ** 5)) + 1._wp/SQRT(x_cc(i)**2+y_cc(j)**2) &
./pre_process/m_patches.f90: ! radius_pressure = SQRT(x_cc(i)**2) ! 1D
./pre_process/m_patches.f90: ! radius_pressure = SQRT(x_cc(i)**2 + cart_y**2) ! 2D
./pre_process/m_patches.f90: ! radius_pressure = SQRT(x_cc(i)**2 + cart_y**2 + cart_z**2) ! 3D
./simulation/m_qbmm.fpp: c = SQRT(c)
./simulation/m_qbmm.fpp: fup(1) = bu - SQRT(c2)
./simulation/m_qbmm.fpp: fup(2) = bu + SQRT(c2)
./simulation/m_global_parameters.fpp: uu = SQRT(pl0/rhol0)
./simulation/m_global_parameters.fpp: phi_vn = (1._wp + SQRT(mu_v/mu_n)*(M_n/M_v)**(0.25_wp))**2 &
./simulation/m_global_parameters.fpp: /(SQRT(8._wp)*SQRT(1._wp + M_v/M_n))
./simulation/m_global_parameters.fpp: phi_nv = (1._wp + SQRT(mu_n/mu_v)*(M_v/M_n)**(0.25_wp))**2 &
./simulation/m_global_parameters.fpp: /(SQRT(8._wp)*SQRT(1._wp + M_n/M_v))
./simulation/m_global_parameters.fpp: omegaN = SQRT(3._wp*k_poly*Ca + 2._wp*(3._wp*k_poly - 1._wp)/(Web*R0))/R0
./simulation/m_global_parameters.fpp: c2 = CSQRT(c1)
./simulation/m_global_parameters.fpp: tmp = EXP(-0.5_wp*(phi(ir)/sd)**2)/SQRT(2._wp*pi)/sd
./simulation/m_global_parameters.fpp: tmp = EXP(-0.5_wp*(phi(1)/sd)**2)/SQRT(2._wp*pi)/sd
./simulation/m_global_parameters.fpp: tmp = EXP(-0.5_wp*(phi(nb)/sd)**2)/SQRT(2._wp*pi)/sd
./simulation/m_data_output.fpp: nbub = SQRT((4._wp*pi/3._wp)*nR3/alf)
./simulation/m_data_output.fpp: nbub = SQRT((4._wp*pi/3._wp)*nR3/alf)
./simulation/m_bubbles.fpp: c_liquid = SQRT(n_tait*(myP + B_tait)/(myRho*(1._wp - alf)))
./simulation/m_bubbles.fpp: f_cgas = SQRT(tmp + (fntait - 1._wp)*fH)
```
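The motivation behind these review comments, that `dble(...)` hard-codes double precision while `real(..., wp)` and the generic lowercase intrinsics respect a single working-precision switch, can be sketched loosely in Python. Here `f32` emulates IEEE single-precision rounding via a struct round-trip, and `to_wp` plays the role of Fortran's kind-aware `real(x, wp)` (the function names are illustrative, not from the codebase):

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack('f', struct.pack('f', x))[0]

def to_wp(x: float, wp: str) -> float:
    """Analogue of Fortran's real(x, wp): honors the chosen working precision."""
    return f32(x) if wp == 'single' else x

one_third = 1.0 / 3.0
print(to_wp(one_third, 'double'))  # keeps all double-precision digits
print(to_wp(one_third, 'single'))  # rounded to ~7 significant digits
# A hard-coded dble()-style promotion cannot recover digits already lost:
print(float(f32(one_third)) == one_third)  # False
```

The same reasoning applies to `DLOG`/`CEXP`/`CSQRT` versus the generic `log`/`exp`/`sqrt`: the generic intrinsics resolve to the kind of their argument, so a single `wp` parameter controls precision everywhere.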
@anshgupta1234 would you mind giving this PR a review as well, just to check for typos that might be around? I looked, but another set of eyes would be useful. Thanks!
Didn't find anything else. Looking good!
@sbryngelson What do you think about moving the contents of …? I wonder how NVFortran will handle precision selection, because we specify …
From https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-ref-guide/#cmdln-m-opts-optimize. It's unclear to me when and how this "promotion" happens. In any case, should this feature be made available to the build system?

I feel like another way we could have implemented this would have been to use … The benefit of the approach @belericant took is that you can elect not to use working precision for kernels that need double precision: it's modularity at the cost of verbosity. Note that we could still achieve this with the other approach by adding/removing compiler flags depending on which file is being compiled.

One last remark: we may lose precision in a few places by being aggressive in the way we enforce working precision. Consider this example:

```fortran
x_cc(i) = x_domain%beg + (5._wp * (10._wp ** -(1)))*dx*real(2*i + 1, wp)
```
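One concrete way single precision can bite in exactly this kind of expression (a hedged illustration, not a claim about MFC's actual grid sizes): IEEE single has a 24-bit significand, so `real(2*i + 1, wp)` stops being exact once `2*i + 1` exceeds 2**24. Emulating single-precision rounding in Python with a struct round-trip:

```python
import struct

def f32(x: float) -> float:
    """Round an IEEE double to the nearest IEEE single."""
    return struct.unpack('f', struct.pack('f', x))[0]

i = 10_000_000
odd = float(2 * i + 1)   # 20000001: exact in double
print(f32(odd))          # 20000000.0: odd integers > 2**24 are not representable
print(f32(odd) == odd)   # False: the cell coordinate has shifted
```

For realistic per-direction grid sizes the index stays well below 2**24, so in practice the loss is more often the extra rounding in each intermediate product, which is presumably why aggressive enforcement of working precision is flagged here.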
@henryleberre I don't have a particular preference about this; seems fine.
This current setup might be messing with @belericant's tests, then, no? But to your main point: I think we will probably have to abandon any forced-precision flags. The reasoning is that we now have fine-grained control, which is what I would like. I'm not sure even compiling each module with different flags would do the trick here, because some "setup" routines like …
@henryleberre If I understand correctly, there shouldn't be a meaningful difference as long as the computation is backward stable and not a victim of catastrophic cancellation.
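The catastrophic-cancellation caveat can be made concrete with a textbook example (shown here in Python doubles; single precision hits the same wall at much smaller magnitudes):

```python
import math

x = 1e15
# Subtracting two nearly equal square roots cancels almost all digits:
naive = math.sqrt(x + 1.0) - math.sqrt(x)
# The algebraically identical form avoids the subtraction entirely:
stable = 1.0 / (math.sqrt(x + 1.0) + math.sqrt(x))
print(naive)   # only a couple of correct digits survive
print(stable)  # ~1.5811e-08, accurate to nearly full precision
```

Both expressions are backward stable on their own inputs; the naive one simply has a huge condition number for the subtraction, which is the distinction being drawn above.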
Is this ready for merge? @belericant
We should also check performance on CPU and GPU; perhaps @wilfonba can pass along his script for that. I would also be curious to see the performance numbers on one of those cases using single precision on a V100 or A100.
@wilfonba found issues exporting 3D runs in single precision, but that isn't due to the modifications necessary to have modular working precision. I don't know if it makes more sense for that to be fixed in this PR or in a new one.
What happens if you use wp=single and precision (output) = 1 or 2, and if you use wp=double and precision = 1 or 2? Do any of these cases break? Are they documented? @belericant @wilfonba
@sbryngelson That's the logic I still need to add.
Ok I will merge once that is complete. |
Numerical Schlieren doesn't seem to work with single working precision. When I run with double working precision and save single-precision silo files, it's nearly uniformly 1 after the first 10 time steps; but if I run with single precision and save in single precision, it's uniformly zero after the first 10 time steps.
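A plausible (unverified) mechanism for both symptoms, sketched with a single-precision emulation: a Schlieren-type field of the form exp(-c * g / g_max) saturates to a constant as soon as single-precision intermediates underflow or the gradient magnitude is lost. The formula and constants here are hypothetical stand-ins, not necessarily MFC's exact definition:

```python
import math
import struct

def f32(x: float) -> float:
    """Round an IEEE double to the nearest IEEE single (underflows to 0)."""
    return struct.unpack('f', struct.pack('f', x))[0]

# A steep exponent is fine in double but underflows IEEE single
# (smallest subnormal ~1.4e-45), so the field reads uniformly zero:
steep = math.exp(-110.0)
print(steep)        # ~1.7e-48 in double
print(f32(steep))   # 0.0 in single

# Conversely, if the gradient magnitude g itself rounds to zero,
# the exponent vanishes and the field reads uniformly one:
print(math.exp(-0.0))  # 1.0
```

If something like this is the cause, rescaling the exponent (or computing the normalization in double) would be the usual fix, though only the linked code can confirm.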
https://github.com/belericant/MFC/blob/4134a660e77c20e0639f5565e768768728caa39f/src/post_process/m_derived_variables.fpp#L573-L718 is the relevant code. It could be sensitive to precision issues, as well as to the value of …
@sbryngelson |
I don't know if these are reasonable or not, I haven't used the feature in a while. |
It's one of the example cases, so I'd assume they're reasonable, but I can run it longer to see.
…recision and silo precision in 1, 2, and 3 dimensions.
@sbryngelson Silo output works for all combinations now. For a 3D shock droplet with 2 million cells on two V100s with case optimization enabled, I get the following times per time step (over a 1,000-time-step run): double precision: 3.288e-2 s/step (~109,500 steps/hr); single precision: … for a 31.5% increase in steps/hr. I did have to comment out the calls to …
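As a sanity check on the quoted figures (the single-precision rate below is merely what the stated 31.5% increase implies; it is not an independently reported number):

```python
# Convert the reported seconds-per-step into steps-per-hour.
double_s_per_step = 3.288e-2
double_steps_hr = 3600.0 / double_s_per_step   # ~109,489, i.e. "~109,500 steps/hr"
single_steps_hr = double_steps_hr * 1.315      # implied by the +31.5%
print(round(double_steps_hr))  # 109489
print(round(single_steps_hr))
```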
I've seen this recently as well. It doesn't build fftw when using GPUs (because I guess it finds cuFFT?) but it will build fftw for the post-processing code for CPU purposes.
I don't understand what you mean here. What options? We don't have silo golden files (though we do use h5dump to check for obviously incorrect hdf5/silo files).
This will have to be fixed, of course.
Presumably, we would like the precision to be selected in the …
I have a check in m_checker that says single working precision and double precision silo files are incompatible because running a simulation in single precision and then requesting a double precision silo file doesn't make much sense. I could make it so that if single working precision is used and double silo precision is requested the single precision data is cast to double precision, but I think it makes more sense to just disallow that combination for practical purposes. I suppose allowing it would fix the testing problem.
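The check described above might look something like this sketch (the function and parameter names are hypothetical, not MFC's actual `m_checker` code; `silo_precision` follows the case-file convention of 1 = single-precision output, 2 = double-precision):

```python
def check_silo_precision(wp: str, silo_precision: int) -> None:
    """Reject output precisions that cannot add information.

    Double-precision silo files from a single-precision run would carry
    meaningless extra digits, so that combination is disallowed.
    """
    if wp == 'single' and silo_precision == 2:
        raise ValueError(
            'double-precision silo files are incompatible with '
            'single working precision')

check_silo_precision('double', 1)  # allowed: cast down on output
check_silo_precision('double', 2)  # allowed
check_silo_precision('single', 1)  # allowed
# check_silo_precision('single', 2) raises ValueError
```

Allowing the cast-up instead (single data widened to double on output) would fix the testing problem at the cost of implying precision that isn't there, which is the trade-off weighed above.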
I see how this would be desirable, but I don't think it's as straightforward as other case-file options. All of the working-precision differences appear to take effect at compile time, not run time, while all current case-file options are run time. I'm guessing trying to change precision at run time would cause a bunch of run-time errors, but I can check. (Edit: since the case file goes through `mfc.sh`, I think we could just have the toolchain catch this option and rebuild everything.) I'll work on getting the test cases to pass given all the changes, and talk with Henry/Ansh about toolchain changes for testing.
Came across a cool tool that takes math expressions and the expected ranges of each variable, and outputs the best way to write the expression with the least loss of precision. It may be useful if there are any calculations in the code that we suspect have problems with single precision.
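Tools in this family (presumably something like Herbie; the link didn't survive, so that's a guess) suggest rewrites like the classic one below, where an algebraically identical form sidesteps cancellation:

```python
import math

x = 1e-8
# 1 - cos(x) cancels catastrophically for small x: cos(1e-8) rounds to
# exactly 1.0 in double, so the naive form returns 0.0.
naive = 1.0 - math.cos(x)
# The algebraically identical 2*sin(x/2)**2 keeps its accuracy (~5e-17):
rewritten = 2.0 * math.sin(x / 2.0) ** 2
print(naive, rewritten)
```

The single-precision version of the same hazard kicks in at much larger `x`, which is exactly where such a tool could help audit the code.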
Seems like it could be useful! |
Commits:
- added ci
- added ci things2
- things
- added single precision to test suite and generated golden files
- added working precision to case_dicts, testing, and documentation
- fixed code not passing using double working precision
Addresses issue #42