You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: README.md
+4-4Lines changed: 4 additions & 4 deletions
Original file line number
Diff line number
Diff line change
@@ -13,7 +13,7 @@ This repository includes "high-confidence" pKa data digitized from three referen
13
13
-**Perrin**: International Union of Pure and Applied Chemistry, DD Perrin. *Dissociation Constants of Organic Bases in Aqueous Solution*; Butterworths, 1965
14
14
-**Perrin Supplement**: International Union of Pure and Applied Chemistry, DD Perrin. *Dissociation Constants of Organic Bases in Aqueous Solution, Supplement*; Butterworths, 1972
15
15
16
-
With permission from the copyright holder, the International Union of Pure and Applied Chemistry (IUPAC), the reference books were scanned, converted to digital data, checked for accuracy, and curated for accessibility, interoperability, and reusability between the dates of Friday, Sept. 10, 2021 and Thursday, Sept. 15, 2022 by Jonathan Zheng ([email protected]). Further processing and curation was performed between Jan. 8, 2024 to Feb. 22, 2024 by Jonathan Zheng and Olivier Lafontant-Joseph ([email protected]). Further updates were committed on: July 10th, 2024, and March 11th, 2025.
16
+
With permission from the copyright holder, the International Union of Pure and Applied Chemistry (IUPAC), the reference books were scanned, converted to digital data, checked for accuracy, and curated for accessibility, interoperability, and reusability between the dates of Friday, Sept. 10, 2021 and Thursday, Sept. 15, 2022 by Jonathan Zheng ([email protected]). Further processing and curation was performed between Jan. 8, 2024 to Feb. 22, 2024 by Jonathan Zheng and Olivier Lafontant-Joseph ([email protected]). Further updates were committed on: July 10th, 2024, and March 21st, 2025.
This material is available for reuse under a [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license with the following attribution: Reproduced by permission of International Union of Pure and Applied Chemistry.
This GitHub repository serves as a working copy for the dataset. Please refer to the Zenodo DOI badge linked at the top of this README.
47
47
@@ -62,7 +62,7 @@ This GitHub repository serves as a working copy for the dataset. Please refer to
62
62
63
63
Further data was collected that is not included in this dataset. "Low-confidence" data (i.e. only one source translating IUPAC name to SMILES; or different name translators gave different results) were excluded, as well as entries whose names could not be programatically translated, and dissociation types that could not be unambiguously assigned (i.e. pKaH versus pKa). Raw scans of the reference books are also excluded from this data set.
Please email [email protected]or post an issue on the GitHub page if any errors are discovered.
66
66
67
67
## Methodological Information
68
68
This dataset was obtained by scanning the reference books listed above, and then using OCR (Amazon Textract) to extract the images into text. After some light processsing of the intermediate data files, scripts were then written to process the text into an organized tabular format that resembles the layout of the textbook, to aid members of the Green Group at MIT manually check the digital data for fidelity to the original reference data. The digital tables were then processed again to extract the information to a convenient form for computational use, and to attempt to translate the IUPAC names to SMILES strings. This final format was then manually and programatically checked. *More information can be found in the report attached in the repository.*
@@ -193,7 +193,7 @@ The following table is necessary for this dataset to be indexed by
193
193
- International Union of Pure and Applied Chemistry', E. P Serjeant and Boyd Dempsey. 'Ionisation Constants of Organic Acids in Aqueous Solution'; Oxford/Pergamon, 1979 (Oxford IUPAC chemical data series)<br/>
194
194
- International Union of Pure and Applied Chemistry, DD Perrin. *Dissociation Constants of Organic Bases in Aqueous Solution*; Butterworths, 1965<br/>
195
195
- International Union of Pure and Applied Chemistry, DD Perrin. *Dissociation Constants of Organic Bases in Aqueous Solution, Supplement*; Butterworths, 1972<br/>
196
-
With permission from the copyright holder, the International Union of Pure and Applied Chemistry (IUPAC), the reference books were scanned, converted to digital data, checked for accuracy, and curated for accessibility, interoperability, and reusability between the dates of Friday, Sept. 10, 2021 and Thursday, Sept. 15, 2022 by Jonathan Zheng ([email protected]). Further processing and curation was performed between Jan. 8, 2024 to Feb. 22, 2024 by Jonathan Zheng and Olivier Lafontant-Joseph ([email protected]). Further updates were committed on: July 10th, 2024, and March 11th, 2025.
196
+
With permission from the copyright holder, the International Union of Pure and Applied Chemistry (IUPAC), the reference books were scanned, converted to digital data, checked for accuracy, and curated for accessibility, interoperability, and reusability between the dates of Friday, Sept. 10, 2021 and Thursday, Sept. 15, 2022 by Jonathan Zheng ([email protected]). Further processing and curation was performed between Jan. 8, 2024 to Feb. 22, 2024 by Jonathan Zheng and Olivier Lafontant-Joseph ([email protected]). Further updates were committed on: July 10th, 2024, and March 21st, 2025.
0 commit comments