TimeDateStamp in object files should be a deterministic hash #21095

Open
dkorpel opened this issue Mar 27, 2025 · 4 comments · May be fixed by #21191
Labels
Compiler:Backend glue code, optimizer, code generation

Comments

@dkorpel
Contributor

dkorpel commented Mar 27, 2025

#20985 once again highlighted the importance of reproducible builds. While investigating that, I noticed time-based differences in the built executables. That's because the backend writes a timestamp to the header:

    time_t f_timedat = 0;
    time(&f_timedat);
    uint symtable_offset;
    if (bigobj)
    {
        header.Sig1 = IMAGE_FILE_MACHINE_UNKNOWN;
        header.Sig2 = 0xFFFF;
        header.Version = 2;
        header.Machine = I64 ? IMAGE_FILE_MACHINE_AMD64 : IMAGE_FILE_MACHINE_I386;
        header.NumberOfSections = scnhdr_cnt;
        header.TimeDateStamp = cast(uint)f_timedat;

After looking into what that field is even used for, I found:

https://devblogs.microsoft.com/oldnewthing/20180103-00/?p=97705

One of the changes to the Windows engineering system begun in Windows 10 is the move toward reproducible builds. This means that if you start with the exact same source code, then you should finish with the exact same binary code.

The timestamp is really a unique ID that tells the loader, “The exports of this DLL have not changed since the last time anybody bound to it.” And a hash is a reproducible unique ID.

I think dmd should follow this example. Not just for MSCoff, also for ELF and MachObj.

@dkorpel dkorpel added the Compiler:Backend glue code, optimizer, code generation label Mar 27, 2025
@abulgit
Contributor

abulgit commented Apr 3, 2025

I think I understand what's going on. In mscoffobj.d, the code uses time(&f_timedat) to grab the current time and stick it in the object file header. That's why we get different binaries when compiling the same code at different times.

I'd like to fix this, but I'm not sure which way to go. One option is to just use a fixed value like 0 for the timestamp. That definitely makes builds deterministic: I tried it, compiled the same code at different times, and the object files had the same hash.

abulk@MSI MINGW64 ~/OneDrive/Desktop/hash
$ certutil -hashfile prog1.obj SHA256
SHA256 hash of prog1.obj:
52b8e353dd0a057a1ddd44af9c1bdb7154bf14a27ccda82b71c856a5f9e9d4ff
CertUtil: -hashfile command completed successfully.

abulk@MSI MINGW64 ~/OneDrive/Desktop/hash
$ certutil -hashfile prog2.obj SHA256
SHA256 hash of prog2.obj:
52b8e353dd0a057a1ddd44af9c1bdb7154bf14a27ccda82b71c856a5f9e9d4ff
CertUtil: -hashfile command completed successfully.

But is that the right approach to this issue, or do we need to create some kind of hash based on what's in the object file?

I would love to hear your suggestions.

@Geod24
Member

Geod24 commented Apr 3, 2025

Just use TimeStampInfo from the frontend: #11035

@abulgit
Contributor

abulgit commented Apr 3, 2025

@Geod24 Did you mean like this?

    time_t f_timedat;
    const(char)* source_date_epoch = getenv("SOURCE_DATE_EPOCH");
    if (source_date_epoch)
    {
        // Convert from string to time_t value
        f_timedat = cast(time_t)strtoul(source_date_epoch, null, 10);
    }
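
As posted, the snippet does not show what happens when SOURCE_DATE_EPOCH is unset or malformed. A minimal standalone sketch of one way to handle that, assuming a fallback to the current time is wanted; `timestampFrom` and `objectTimestamp` are hypothetical helper names, not existing dmd functions, and the sketch uses Phobos for brevity, which the backend itself may not want to depend on:

```d
import core.stdc.time : time, time_t;
import std.conv : ConvException, to;
import std.process : environment;

/// Hypothetical helper, not dmd code: pick the value for header.TimeDateStamp.
/// Uses `sourceDateEpoch` when it is a plain decimal number, else `fallback`.
time_t timestampFrom(string sourceDateEpoch, time_t fallback)
{
    if (sourceDateEpoch.length == 0)
        return fallback; // variable unset or empty

    try
    {
        return cast(time_t) sourceDateEpoch.to!ulong;
    }
    catch (ConvException)
    {
        return fallback; // not a valid unsigned decimal number
    }
}

/// Hypothetical wrapper: reads the environment and falls back to the
/// current time, which is what the existing time(&f_timedat) call yields.
time_t objectTimestamp()
{
    time_t now = 0;
    time(&now);
    return timestampFrom(environment.get("SOURCE_DATE_EPOCH"), now);
}
```

Keeping the parsing in a function that takes the string as a parameter makes the behavior easy to check without touching the process environment.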

@dkorpel
Contributor Author

dkorpel commented Apr 3, 2025

That was my first idea as well, but I don't like how you have to set an environment variable to make builds reproducible even if you don't use __TIMESTAMP__ and the like. Moreover, you'd manually need to change that variable along with your source changes or the loader won't behave correctly:

The timestamp is really a unique ID that tells the loader, “The exports of this DLL have not changed since the last time anybody bound to it.”

I'd rather follow Microsoft's example here and hash the binary.
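
A rough standalone sketch of what hashing the binary could look like, under several assumptions: the object file is already fully serialized into a byte buffer, it uses the regular (non-bigobj) COFF header where TimeDateStamp sits at byte offset 4, and SHA-256 truncated to its first four bytes is an acceptable ID. The hash function, the truncation, and the helper name `stampWithContentHash` are illustrative choices, not what dmd or Microsoft's toolchain actually does:

```d
import std.bitmanip : littleEndianToNative;
import std.digest.sha : sha256Of;

/// Offset of TimeDateStamp in a regular COFF file header:
/// Machine (2 bytes) + NumberOfSections (2 bytes) come first.
enum size_t timeDateStampOffset = 4;

/// Hypothetical helper, not dmd code: overwrite the timestamp field of a
/// serialized object file with a value derived from the file's own bytes.
uint stampWithContentHash(ubyte[] obj)
{
    enum size_t end = timeDateStampOffset + 4;
    assert(obj.length >= end, "buffer too small for a COFF header");

    // Zero the field first so the hash does not depend on whatever
    // timestamp was written there before.
    obj[timeDateStampOffset .. end] = 0;

    // Hash the whole file with the field zeroed.
    const ubyte[32] digest = sha256Of(obj);

    // Store the first four digest bytes as the (little-endian) field.
    obj[timeDateStampOffset .. end] = digest[0 .. 4];

    ubyte[4] first = digest[0 .. 4];
    return littleEndianToNative!uint(first);
}
```

With this approach, two buffers that differ only in the old timestamp end up byte-identical, while any change to the rest of the file yields a different ID with overwhelming probability. The bigobj header in the snippet from the first comment puts other fields ahead of TimeDateStamp, so it would need a different offset.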

3 participants