[WIP] Developing alphabetical identifier for future MPID use #1178

Open: wants to merge 4 commits into base: main

esoteric-ephemera (Collaborator)

Defines a new AlphaID class that could eventually replace or contain the current MPID class.

From internal discussions, the benefit of the MPID system was brevity when the system was relatively new: "mp-149" is easy to remember, whereas the numeric part of the current batch of MPIDs exceeds 3,000,000.

To replace / augment the current MPID system, we need an identifier that:

  • Can be sorted
  • Allows minting ID $N+1$ once $N$ IDs have been assigned
  • Is easy to remember

From this, it was suggested that a purely alphabetical string could be used instead (no digits, to avoid clashes with the current MPIDs). The string is read as a base-26 integer, with "a" = 0 through "z" = 25, i.e.:

  • "a" = $0 \times 26^0 = 0$
  • "bc" = $1 \times 26^1 + 2 \times 26^0 = 28$
  • "aaft" = $0 \times 26^3 + 0 \times 26^2 + 5 \times 26^1 + 19 \times 26^0 = 149$
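
To make the mapping above concrete, here is a minimal sketch of the conversion in both directions. The function names `alpha_to_int` and `int_to_alpha` and the `pad_len` argument are hypothetical and chosen for illustration; they are not taken from the AlphaID implementation in this PR.

```python
import string

# Assumption: lowercase letters only, with "a" -> 0 through "z" -> 25.
ALPHABET = string.ascii_lowercase


def alpha_to_int(identifier: str) -> int:
    """Read an alphabetical identifier as a base-26 integer."""
    value = 0
    for char in identifier:
        # ALPHABET.index raises ValueError for anything outside a-z.
        value = value * 26 + ALPHABET.index(char)
    return value


def int_to_alpha(value: int, pad_len: int = 0) -> str:
    """Write a non-negative integer as base-26 letters, left-padded with "a"."""
    if value < 0:
        raise ValueError("value must be non-negative")
    chars = []
    while value > 0:
        value, remainder = divmod(value, 26)
        chars.append(ALPHABET[remainder])
    # Zero has no digits from the loop, so fall back to a single "a".
    text = "".join(reversed(chars)) or "a"
    return text.rjust(pad_len, "a")


print(alpha_to_int("bc"))       # 28
print(alpha_to_int("aaft"))     # 149
print(int_to_alpha(149, 4))     # aaft
```

Because "a" plays the role of zero, leading "a" characters do not change the integer value, which is why "aaft" and "ft" both map to 149.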

The current implementation supports these features, as well as addition and subtraction, which are used during parsing to obtain the next identifier in sequence. To make the identifiers easy to remember, we may want to pad them with leading "a" characters (the analogue of leading zeroes) to a length of at least 6, which would give us $26^6 = 308,915,776$ total task IDs at that length (minus the ~3,100,000 that have currently been assigned).

Suggestions and discussion are welcome.
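
As a rough illustration of padding and sequential minting, the sketch below increments a padded identifier by one. The name `next_identifier` and the constant `PAD_LEN` are hypothetical; the AlphaID class in this PR exposes addition/subtraction instead, and its actual interface may differ.

```python
PAD_LEN = 6  # assumed fixed width; 26**6 == 308_915_776 distinct identifiers


def next_identifier(identifier: str, pad_len: int = PAD_LEN) -> str:
    """Return the identifier that follows `identifier` in base-26 order."""
    # Decode: "a" -> 0, ..., "z" -> 25, most significant letter first.
    value = 0
    for char in identifier:
        value = value * 26 + (ord(char) - ord("a"))
    value += 1
    # Encode the incremented value back into letters.
    chars = []
    while value:
        value, remainder = divmod(value, 26)
        chars.append(chr(ord("a") + remainder))
    # Left-pad with "a" (the zero digit) so all identifiers share one width.
    return "".join(reversed(chars)).rjust(pad_len, "a")


print(next_identifier("aaaaaa"))  # aaaaab
print(next_identifier("aaaaaz"))  # aaaaba
```

With a fixed width, plain string sorting agrees with integer order, so the "can be sorted" requirement holds without decoding. The sketch does not guard against overflow: incrementing "zzzzzz" yields a 7-character string.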

codecov-commenter commented Jan 24, 2025

Codecov Report

Attention: Patch coverage is 94.11765% with 4 lines in your changes missing coverage. Please review.

Project coverage is 90.20%. Comparing base (200c970) to head (638453f).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| emmet-core/emmet/core/mpid.py | 94.11% | 4 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1178      +/-   ##
==========================================
+ Coverage   90.18%   90.20%   +0.01%     
==========================================
  Files         147      147              
  Lines       14503    14570      +67     
==========================================
+ Hits        13080    13143      +63     
- Misses       1423     1427       +4     

