improve auto-cleanup/add latex parsing
StrangeGirlMurph committed Feb 23, 2024
1 parent 0e09ffb commit f5f77da
Showing 3 changed files with 14 additions and 12 deletions.
14 changes: 8 additions & 6 deletions docs/settings.md
@@ -91,12 +91,14 @@ Whether or not to use the articles title instead of the selected text for the '{
Default: `false`

### Stop auto-cleanup of intros
- Whether or not to stop auto-cleaning article intros for better readability. Many article intros, especially those containing math, are poorly formatted in the plain text the Wikipedia API returns. For example, the intro of [Total order](https://en.wikipedia.org/w/api.php?format=json&redirects=1&action=query&prop=extracts&exintro&explaintext&titles=Total%20order) contains many oddly placed spaces and line breaks. Therefore, every response is cleaned up a bit for better readability with the following code:
+ Whether or not to stop auto-cleaning/parsing article intros for better readability. Many article intros, especially those containing math, are poorly formatted in the plain text the Wikipedia API returns. For example, the intro of [Total order](https://en.wikipedia.org/w/api.php?format=json&redirects=1&action=query&prop=extracts&exintro&explaintext&titles=Total%20order) contains many oddly placed spaces and line breaks. Therefore, every response is cleaned up a bit for better readability with the following code:
```ts
- intro.replaceAll("\\displaystyle ", "") // removes all occurrences of '\displaystyle'
- .replaceAll("\n", "") // removes all line breaks
- .replaceAll(/ +/g, " ") // reduces all spaces to a single space
- .replaceAll(/\.\w/g, (e: string) => e.split("").join(" ")); // adds a space between any dot directly followed by a letter
+ intro
+ // turns all "{\displaystyle ... }" occurrences into proper LaTeX equations
+ .replaceAll(/{\\displaystyle [^\n]+}/g, (text: string) => "$" + text.slice(15,-1).trim() + "$")
+ .replaceAll("\n ", "") // removes the unnecessary line breaks
+ .replaceAll(/ \S /g, "") // removes the Unicode characters that stand in for the LaTeX
+ .replaceAll(/ +/g, " ") // collapses any leftover runs of spaces
```
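
As a rough illustration of what the new chain does, the sketch below wraps it in a standalone function and runs it on a small hand-made string. The name `cleanupIntro` and the sample text are assumptions for illustration (the sample is not real API output); the commit applies the chain inline in `getArticleIntros`. The braces are escaped and `replace` with global regexes stands in for `replaceAll` so the sketch compiles on older targets; the matches are intended to be the same.

```typescript
// Hypothetical standalone wrapper around the cleanup chain shown above.
function cleanupIntro(intro: string): string {
  return intro
    // "{\displaystyle X}" -> "$X$": drop the 15-character prefix and the closing brace
    .replace(/\{\\displaystyle [^\n]+\}/g, (text: string) => "$" + text.slice(15, -1).trim() + "$")
    // drop each line break together with the single space that follows it
    .replace(/\n /g, "")
    // drop any single non-space character that has a space on both sides
    .replace(/ \S /g, "")
    // collapse runs of spaces into one
    .replace(/ +/g, " ");
}

// Hand-made sample in the rough shape of a plain-text extract (an assumption).
const sample = "on some set \n  X\n  {\\displaystyle X}\n  which satisfies";
console.log(cleanupIntro(sample)); // on some set $X$ which satisfies
```

With the third pattern exactly as it is displayed here (one space on each side of `\S`), ordinary one-letter words are removed as well: `cleanupIntro("order is a binary relation")` gives `order isbinary relation`. If the pattern in the repository has different whitespace, this may not apply.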


@@ -106,7 +108,7 @@ intro.replaceAll("\\displaystyle ", "") // removes all occurrences of '\displays

gets turned into

- <pre style="white-space:pre-wrap;">In mathematics, a total order or linear order is a partial order in which any two elements are comparable. That is, a total order is a binary relation ≤ {\\leq } on some set X {X} , which satisfies the following for all a , b {a,b} and c {c} in X {X} : a ≤ a {a\\leq a} (reflexive). If a ≤ b {a\\leq b} and b ≤ c {b\\leq c} then a ≤ c {a\\leq c} (transitive). If a ≤ b {a\\leq b} and b ≤ a {b\\leq a} then a = b {a=b} (antisymmetric). a ≤ b {a\\leq b} or b ≤ a {b\\leq a} (strongly connected, formerly called total). Reflexivity (1.) already follows from connectedness (4.), but is required explicitly by many authors nevertheless, to indicate the kinship to partial orders. Total orders are sometimes also called simple, connex, or full orders. A set equipped with a total order is a totally ordered set; the terms simply ordered set, linearly ordered set, and loset are also used. The term chain is sometimes defined as a synonym of totally ordered set, but refers generally to some sort of totally ordered subsets of a given partially ordered set. An extension of a given partial order to a total order is called a linear extension of that partial order.</pre>
+ <pre style="white-space:pre-wrap;">In mathematics, a total order or linear order is a partial order in which any two elements are comparable. That is, a total order is a binary relation $\leq$ on some set $X$ , which satisfies the following for all $a,b$ and $c$ in $X$ :\n $a\leq a$ (reflexive).\nIf $a\leq b$ and $b\leq c$ then $a\leq c$ (transitive).\nIf $a\leq b$ and $b\leq a$ then $a=b$ (antisymmetric).\n $a\leq b$ or $b\leq a$ (strongly connected, formerly called total).Reflexivity (1.) already follows from connectedness (4.), but is required explicitly by many authors nevertheless, to indicate the kinship to partial orders.\nTotal orders are sometimes also called simple, connex, or full orders.A set equipped with a total order is a totally ordered set; the terms simply ordered set, linearly ordered set, and loset are also used. The term chain is sometimes defined as a synonym of totally ordered set, but refers generally to some sort of totally ordered subsets of a given partially ordered set.\nAn extension of a given partial order to a total order is called a linear extension of that partial order.</pre>


Default: `false`
4 changes: 2 additions & 2 deletions src/settings.ts
@@ -254,7 +254,7 @@ export class WikipediaSearchSettingTab extends PluginSettingTab {
}

addTemplateSettings(containerEl: HTMLElement) {
- for (const [i, template] of this.settings.templates.entries()) {
+ for (let [i, template] of this.settings.templates.entries()) {
const isDefaultTemplate = i == 0;

let setting = new Setting(containerEl);
@@ -265,8 +265,8 @@ export class WikipediaSearchSettingTab extends PluginSettingTab {
setting.addText((text) => {
if (isDefaultTemplate) text.setDisabled(true);
return text
- .setValue(isDefaultTemplate ? "Default Template" : template.name)
.setPlaceholder("Name")
+ .setValue(isDefaultTemplate ? "Default Template" : template.name)
.onChange(async (value) => {
template.name = value;
await this.plugin.saveSettings();
8 changes: 4 additions & 4 deletions src/utils/wikipediaAPI.ts
@@ -57,13 +57,13 @@ export async function getArticleIntros(
if (!response.query) return [];

return sortResponsesByTitle(titles, Object.values(response.query.pages)).map((page: any) => {
- const extract = page.extract.trim() ?? null;
+ const extract:string = page.extract.trim() ?? null;
if (extract && cleanup) {
return extract
- .replaceAll("\\displaystyle ", "")
- .replaceAll("\n", "")
+ .replaceAll(/{\\displaystyle [^\n]+}/g, (text: string) => "$" + text.slice(15,-1).trim() + "$")
+ .replaceAll("\n ", "")
+ .replaceAll(/ \S /g, "")
.replaceAll(/ +/g, " ")
- .replaceAll(/\.\w/g, (text: string) => text.split("").join(" "));
}
return extract;
});
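
One detail of the `extract` line may be worth a second look: `String.prototype.trim()` always returns a string, so the `?? null` fallback can never take effect, and a page object without an `extract` field would throw before the fallback is reached. A minimal sketch of a null-safe variant, assuming a hypothetical `PageLike` shape and helper name `readExtract` (neither is in the repository):

```typescript
// Hypothetical minimal shape of one entry of response.query.pages.
interface PageLike {
  extract?: string;
}

// Returns the trimmed extract, or null when the field is absent.
function readExtract(page: PageLike): string | null {
  // Optional chaining yields undefined for a missing field, which ?? maps to null.
  return page.extract?.trim() ?? null;
}

console.log(readExtract({ extract: "  Total order  " })); // Total order
console.log(readExtract({})); // null
```

An empty extract stays an empty string, which is falsy, so a guard such as `if (extract && cleanup)` would still skip it.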