Extending Unicode superscript/subscript substitution to all formats #10591
Comments
My main worry is about the availability of the superscript glyphs in fonts. But I have no idea if this is a serious issue with modern fonts.
Looking at this just a little, I find it extremely confusing. For example, there is a Unicode block for superscripts and subscripts. I experimented with fonts and found that quite a few of the fonts I use don't have the glyphs for superscripted letters, though a few do.
Superscript digits 1-3 are in the Latin-1 Supplement block right after Basic Latin/ASCII. While the superscript/subscript digits are meant for general use, most of the superscript letters and the few subscript letters are meant for phonetic transcription, as is evident from the many phonetic “special” letters among them. Also, AFAIK not all Basic Latin letters have superscript equivalents, not to speak of other scripts, nor do they form regular upper/lower case pairs.

It might possibly make sense to use superscript digits, which are well supported by many fonts, for footnote references in plain output, but note that you won’t find them if you search for regular digits, which IMO is a serious enough drawback to not do it.

Anyway, below is a (TSV) list of all the “Latin” superscript and subscript digits. Note that the first three are in another block and also out of order relative to the others and each other! (Note how random support is in the font GitHub uses for code blocks! I use Noto Sans Mono in my terminal/Vim so I can see them all and more besides.)

² 2 U+00B2 SUPERSCRIPT TWO
³ 3 U+00B3 SUPERSCRIPT THREE
¹ 1 U+00B9 SUPERSCRIPT ONE
⁰ 0 U+2070 SUPERSCRIPT ZERO
⁴ 4 U+2074 SUPERSCRIPT FOUR
⁵ 5 U+2075 SUPERSCRIPT FIVE
⁶ 6 U+2076 SUPERSCRIPT SIX
⁷ 7 U+2077 SUPERSCRIPT SEVEN
⁸ 8 U+2078 SUPERSCRIPT EIGHT
⁹ 9 U+2079 SUPERSCRIPT NINE
₀ 0 U+2080 SUBSCRIPT ZERO
₁ 1 U+2081 SUBSCRIPT ONE
₂ 2 U+2082 SUBSCRIPT TWO
₃ 3 U+2083 SUBSCRIPT THREE
₄ 4 U+2084 SUBSCRIPT FOUR
₅ 5 U+2085 SUBSCRIPT FIVE
₆ 6 U+2086 SUBSCRIPT SIX
₇ 7 U+2087 SUBSCRIPT SEVEN
₈ 8 U+2088 SUBSCRIPT EIGHT
₉ 9 U+2089 SUBSCRIPT NINE
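
Because the code points listed above are not contiguous, a converter cannot add a fixed offset to a digit and needs an explicit table. The following is a minimal Python sketch of that idea, not pandoc's code; the names `footnote_marker` and `fold_for_search` are made up for illustration. It also shows that NFKC normalization maps these characters back to plain digits, which a search tool would have to apply to get around the search problem mentioned above.

```python
import unicodedata

# Explicit tables, indexed by digit value. Superscript 1-3 live in
# Latin-1 Supplement (U+00B9, U+00B2, U+00B3), the rest in U+2070..U+2079,
# so no single offset from "0" works for superscripts.
SUPERSCRIPT_DIGITS = "⁰¹²³⁴⁵⁶⁷⁸⁹"
SUBSCRIPT_DIGITS = "₀₁₂₃₄₅₆₇₈₉"

TO_SUPER = str.maketrans("0123456789", SUPERSCRIPT_DIGITS)
TO_SUB = str.maketrans("0123456789", SUBSCRIPT_DIGITS)


def footnote_marker(n: int) -> str:
    """Render a footnote number with Unicode superscript digits."""
    return str(n).translate(TO_SUPER)


def fold_for_search(text: str) -> str:
    """Fold compatibility characters (including super/subscript digits)
    back to their plain forms using NFKC normalization."""
    return unicodedata.normalize("NFKC", text)
```

Note that NFKC folds many other compatibility characters as well (ligatures, full-width forms), so it is better suited to building a search index than to rewriting the text itself.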
Superscript minus U+207B ⁻ is also really useful for scientific notation, e.g. 4.3×10⁻⁵.
Yes, Unicode added superscript/subscript characters for specific purposes over time, hence the variable font support.
Note that we already do use Unicode super/subscript digits in plain output.
In #9437 I tried a related idea in HTML specifically, and apart from the questionable value of adding Pandoc's 900th command-line option, the font coverage of superscript numbers in web-safe fonts (on my computer anyway) was spotty.
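
As a small illustration of the scientific-notation case, here is a Python sketch that renders an exponent with Unicode superscripts, including the superscript minus (U+207B) and plus (U+207A). The names `SUPERSCRIPT` and `scientific` are hypothetical and not part of pandoc.

```python
# Translation table for digits and signs; "⁻" is U+207B, "⁺" is U+207A.
SUPERSCRIPT = str.maketrans("0123456789-+", "⁰¹²³⁴⁵⁶⁷⁸⁹⁻⁺")


def scientific(mantissa: str, exponent: int) -> str:
    """Format mantissa × 10^exponent using Unicode superscript characters."""
    return f"{mantissa}×10{str(exponent).translate(SUPERSCRIPT)}"
```

With this sketch, `scientific("4.3", -5)` gives the string from the comment above, 4.3×10⁻⁵.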
If the goal is to have superscripts and subscripts that match the weight and size of the typeface, another way to achieve this is with the OpenType superscript and subscript features (sups and subs). For formats that allow you to activate font features, no modification to pandoc is necessary. For example, LaTeX has the realscripts package:
---
header-includes: |
  ```{=latex}
  \usepackage{realscripts}
  ```
---
For other formats, you might need to use a filter to replace superscript/subscript with a class (defined in CSS) or custom style (defined in a reference document) that will activate the relevant OpenType feature.
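
A filter of the kind described could be sketched as follows. This is a Python sketch that works directly on pandoc's JSON AST using only the standard library; it assumes the usual JSON encoding, in which a `Superscript` node carries a list of inlines and a `Span` carries an attribute triple plus a list of inlines. The class names `sups` and `subs` are hypothetical: a stylesheet would still have to define them (for instance with `font-feature-settings`), and the font must actually provide those features.

```python
import json

# Hypothetical class names; the CSS that activates the OpenType features
# for these classes is not part of this sketch.
CLASS_FOR = {"Superscript": "sups", "Subscript": "subs"}


def retag(node):
    """Recursively replace Superscript/Subscript nodes with classed Spans."""
    if isinstance(node, list):
        return [retag(item) for item in node]
    if isinstance(node, dict):
        node = {key: retag(value) for key, value in node.items()}
        tag = node.get("t")
        if isinstance(tag, str) and tag in CLASS_FOR:
            # Span content is [Attr, [Inline]]; Attr is [id, classes, key-values].
            return {"t": "Span", "c": [["", [CLASS_FOR[tag]], []], node["c"]]}
        return node
    return node


def filter_stream(src, dst):
    """Read a pandoc JSON document from src and write the retagged one to dst."""
    json.dump(retag(json.load(src)), dst)
```

To use it as a JSON filter, a script would call `filter_stream(sys.stdin, sys.stdout)` and be passed to pandoc with `--filter`. For docx output the same approach would need a `custom-style` attribute instead of a class.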
Pandoc has partial support for converting characters formatted as superscript or subscript to their Unicode equivalents, where possible:
pandoc/src/Text/Pandoc/Writers/Shared.hs
Line 443 in 1470b3a
This is applied to plain text only, but it would be helpful if the list could include more characters (see http://unicode.org/reports/tr30/datafiles/SuperscriptFolding.txt and https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts) and if this functionality could be made available in all formats.
Using a native Unicode character better matches the weight and size of a typeface, while applied superscript formatting results in an overly light weight. You can see the difference here between added formatting (3a, 4o) and Unicode (3ª, 4º), and it would be very useful not to have to worry about encoding these differently. In addition, as noted in jgm/citeproc#147, Unicode superscripts are automatically converted to manual formatting, meaning they need to be replaced again if one cares about this.
This should probably be optional rather than modifying the default smart behaviour, since some fonts do not have a full set of Unicode superscripts.
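
To make the request concrete, here is a rough Python sketch of how a wider substitution table could be derived and applied. It is not pandoc's implementation (that is the Haskell code in Shared.hs referenced above), and the names `build_table` and `substitute` are hypothetical. It assumes the table can be built from the `<super>` and `<sub>` compatibility decompositions in the Unicode Character Database, and that a run is substituted only when every character has an equivalent, so the caller can otherwise fall back to ordinary formatting.

```python
import sys
import unicodedata
from typing import Dict, Optional


def build_table(tag: str) -> Dict[str, str]:
    """Map base characters to a Unicode character whose compatibility
    decomposition is `tag` ("<super>" or "<sub>") plus that single base."""
    table: Dict[str, str] = {}
    for cp in range(sys.maxunicode + 1):
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[0] == tag:
            base = chr(int(parts[1], 16))
            # Several characters can share a base (e.g. ª and the modifier
            # letter small a); keep the lowest code point.
            table.setdefault(base, chr(cp))
    return table


SUPERSCRIPTS = build_table("<super>")
SUBSCRIPTS = build_table("<sub>")


def substitute(text: str, table: Dict[str, str]) -> Optional[str]:
    """Return the Unicode form of text, or None if any character lacks one."""
    if text and all(ch in table for ch in text):
        return "".join(table[ch] for ch in text)
    return None
```

Under this rule the ordinal examples above resolve to ª and º, while a run containing a character with no equivalent returns `None` and would keep its applied formatting. The exact table depends on the Unicode version shipped with the Python interpreter, and it says nothing about whether a given font has the glyphs.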