From 5b01e8e90ab314448f23ccc6b77f52d2a552a2d8 Mon Sep 17 00:00:00 2001 From: Rasmus Kronberg Date: Thu, 11 Apr 2024 16:16:02 +0300 Subject: [PATCH] clear outputs of cells --- AI4Spec_Tutorial1.ipynb | 1968 +++++++++++++++++---------------------- AI4Spec_Tutorial2.ipynb | 734 +-------------- AI4Spec_Tutorial3.ipynb | 539 +---------- 3 files changed, 902 insertions(+), 2339 deletions(-) diff --git a/AI4Spec_Tutorial1.ipynb b/AI4Spec_Tutorial1.ipynb index b02f827..4745df9 100644 --- a/AI4Spec_Tutorial1.ipynb +++ b/AI4Spec_Tutorial1.ipynb @@ -1,1130 +1,846 @@ { - "cells": [ - { - "cell_type": "markdown", - "source": [ - "# AI4Spec 1: Predicting individual energy levels with kernel methods\n", - "\n", - "In these tutorial exercises, we review different AI approaches to predicting ground or excited state energy levels simply based on the atomic structure of materials. Here we focus on molecules and molecular orbitals, but the same principles apply to crystal structures and electronic bands.\n", - "\n", - "We start by predicting individual electronic states (Tutorial 1), proceed towards predicting multiple states at once (Tutorial 2) and finish by predicting entire spectral curves (Tutorial 3). The tutorials were prepared by Milica Todorović (University of Turku) and Kunal Ghosh (Aalto University).\n", - "\n", - "\n", - "\n", - "In the exercise below, our AI objective is to train a model to predict a single electronic energy level, HOMO for example. This is a classic example of supervised machine learning regression, where the objective (label) is a single floating-point number. Many AI models can address this task, from decision trees through kernel-based learning to neural networks. 
Here, we demonstrate the use of Kernel Ridge Regression, a relatively simple regression approach that has performed well on numerous prediction tasks in physics and chemistry **{Cite Mathias Rupp}**\n", - "\n", - "We use a large dataset of molecular structures and their computed HOMO energies to train the AI. Once the model is trained, it can be used as a tool: one can input a new molecule (previously unseen by the model) and instantly receive a value for its HOMO, within the expected error of the trained model.\n", - "\n", - "\n", - "\n", - "\n", - "\n" - ], - "metadata": { - "id": "Fl2hw46ADkH4" - } - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XegnO6aTwrmX" - }, - "source": [ - "## 1. HOMO energy prediction with kernel ridge regression\n", - "\n", - "\n", - "Here we will machine-learn the relationship between molecular structure (represented by the Coulomb matrix CM) and their HOMO (Highest Occupied Molecular Orbital) energy using KRR.\n", - "\n", - "This tutorial shows step by step how to load the data, visualize them, select the hyperparameters, train the model and validate it. We use the QM7 dataset of 7000 small organic molecules. The HOMO energies of all molecules were pre-computed with first principles quantum mechanical methods (DFT) to obtain the target data that our model can be trained on. Detailed descriptions and results for a similar dataset (QM9) can be found in [A. Stuke, et al. \"Chemical diversity in molecular orbital energy predictions with kernel ridge regression.\" J. Chem. Phys. 150. 
204121 (2019)](https://aip.scitation.org/doi/10.1063/1.5086105).\n", - "\n", - "\n", - "## Setup" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "qdphsPPRwwNF" - }, - "outputs": [], - "source": [ - "from sklearn.model_selection import GridSearchCV\n", - "from sklearn.model_selection import cross_validate\n", - "from sklearn.kernel_ridge import KernelRidge\n", - "from sklearn.metrics import r2_score\n", - "\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "import pandas as pd\n", - "import seaborn as sns\n", - "import json\n", - "import math, random\n", - "from scipy.sparse import load_npz\n", - "from matplotlib.colors import LinearSegmentedColormap\n", - "import time" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hi4ISia7wrmc" - }, - "source": [ - "### Load and visualize data\n", - "\n", - "At first, we load the data.\n", - "\n", - "The input data x is an array that contains all 7000 molecules of the QM7 dataset, represented by their Coulomb matrices, which were computed with the [Dscribe](https://www.sciencedirect.com/science/article/pii/S0010465519303042?via%3Dihub) package.\n", - "\n", - "The output data y is a list that contains the corresponding (pre-computed) HOMO energies." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "jTZ_1hpqLBK5", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "a15f3c92-4b6c-453a-e0d6-1a9e91e72112" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "--2023-11-09 13:22:58-- https://github.com/fullmetalfelix/ML-CSC-tutorial/raw/master/data/qm7/cm.npz\n", - "Resolving github.com (github.com)... 140.82.113.3\n", - "Connecting to github.com (github.com)|140.82.113.3|:443... connected.\n", - "HTTP request sent, awaiting response... 
302 Found\n", - "Location: https://raw.githubusercontent.com/fullmetalfelix/ML-CSC-tutorial/master/data/qm7/cm.npz [following]\n", - "--2023-11-09 13:22:58-- https://raw.githubusercontent.com/fullmetalfelix/ML-CSC-tutorial/master/data/qm7/cm.npz\n", - "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...\n", - "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n", - "HTTP request sent, awaiting response... 200 OK\n", - "Length: 3638980 (3.5M) [application/octet-stream]\n", - "Saving to: ‘cm.npz’\n", - "\n", - "cm.npz 100%[===================>] 3.47M --.-KB/s in 0.1s \n", - "\n", - "2023-11-09 13:22:58 (33.9 MB/s) - ‘cm.npz’ saved [3638980/3638980]\n", - "\n", - "--2023-11-09 13:22:58-- https://raw.githubusercontent.com/fullmetalfelix/ML-CSC-tutorial/master/data/qm7/HOMO.txt\n", - "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.110.133, ...\n", - "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.\n", - "HTTP request sent, awaiting response... 
200 OK\n", - "Length: 54610 (53K) [text/plain]\n", - "Saving to: ‘HOMO.txt’\n", - "\n", - "HOMO.txt 100%[===================>] 53.33K --.-KB/s in 0.02s \n", - "\n", - "2023-11-09 13:22:58 (3.18 MB/s) - ‘HOMO.txt’ saved [54610/54610]\n", - "\n" - ] - } - ], - "source": [ - "!wget https://github.com/fullmetalfelix/ML-CSC-tutorial/raw/master/data/qm7/cm.npz\n", - "!wget https://raw.githubusercontent.com/fullmetalfelix/ML-CSC-tutorial/master/data/qm7/HOMO.txt" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6tTcc8Mtwrmd", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "baf6dc4d-80fb-4b36-f3ba-026677ad11ee" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Number of molecules: 6926\n" - ] - } - ], - "source": [ - "x = load_npz(\"cm.npz\").toarray()\n", - "y = np.genfromtxt(\"HOMO.txt\")\n", - "\n", - "print(\"Number of molecules:\", len(y))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "yWa4pRHxwrmf" - }, - "source": [ - "Print the Coulomb matrix of a random molecule in the dataset." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "7CovKOElwrmg", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "68c822c8-6e05-431d-8223-791fc3851287" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "[36.858 26.966 14.341 11.895 7.709 7.345 5.506 5.493 2.845 2.279\n", - " 1.861 1.677 1.227 1.045 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 26.966 36.858 23.934 17.032 10.47 9.284 2.828\n", - " 2.83 5.483 2.795 2.79 2.176 1.669 1.281 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 14.341 23.934 36.858 28.805\n", - " 14.417 11.301 1.711 2.169 2.714 5.446 5.448 2.834 2.293 1.506\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 11.895\n", - " 17.032 28.805 53.359 30.553 20.365 1.581 1.825 2.501 3.319 3.39\n", - " 6.889 3.399 2.748 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 
7.709 10.47 14.417 30.553 36.858 32.739 1.095 1.175\n", - " 1.761 1.83 2.343 2.928 5.455 3.124 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 7.345 9.284 11.301 20.365 32.739\n", - " 53.359 1.085 1.132 1.606 1.574 1.822 2.628 3.422 6.802 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 5.506 2.828\n", - " 1.711 1.581 1.095 1.085 0.5 0.538 0.407 0.269 0.241 0.227\n", - " 0.176 0.155 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 5.493 2.83 2.169 1.825 1.175 1.132 0.538 0.5 0.324\n", - " 0.412 0.284 0.262 0.187 0.163 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 2.845 5.483 2.714 2.501 1.761 1.606\n", - " 0.407 0.324 0.5 0.322 0.389 0.32 0.286 0.218 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 2.279 2.795 5.446\n", - " 3.319 1.83 1.574 0.269 0.412 0.322 0.5 0.566 0.396 0.291\n", - " 0.217 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 1.861 2.79 5.448 3.39 2.343 1.822 0.241 0.284 0.389 0.566\n", - " 0.5 0.337 0.454 0.229 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 1.677 2.176 2.834 6.889 2.928 2.628 0.227\n", - " 0.262 0.32 0.396 0.337 0.5 0.338 0.418 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 1.227 1.669 2.293 3.399\n", - " 5.455 3.422 0.176 0.187 0.286 0.291 0.454 0.338 0.5 0.343\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.045\n", - " 1.281 1.506 2.748 3.124 6.802 0.155 0.163 0.218 0.217 0.229\n", - " 0.418 0.343 0.5 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 
0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n" - ] - } - ], - "source": [ - "rand_mol = random.randint(0, len(y) - 1)\n", - "\n", - "print(x[rand_mol])" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BKo-SoN2wrmh" - }, - "source": [ - "Visualize the Coulomb matrix of the random molecule." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "GJu2RtUhwrmh", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 502 - }, - "outputId": "1466b332-185d-4ad7-e2f6-c3696ceed6d5" - }, - "outputs": [ - { - "output_type": "display_data", - "data": { - "text/plain": [ - "
" - ] - }, - "metadata": {} - }, - { - "output_type": "display_data", - "data": { - "text/plain": [ - "
" - ], - "image/png": "\n" - }, - "metadata": {} - } - ], - "source": [ - "shape = (23, 23)\n", - "mat = x[rand_mol].reshape(shape)\n", - "\n", - "plt.figure()\n", - "plt.figure(figsize = (6,6))\n", - "plt.imshow(mat, origin=\"upper\", cmap='rainbow', vmin=-15, vmax=75, interpolation='nearest')\n", - "plt.colorbar(fraction=0.046, pad=0.04).ax.tick_params(labelsize=20)\n", - "plt.axis('off')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "n434DnNSwrmi" - }, - "source": [ - "Note that many of the outputs are zero in the Coulomb matrix. This is because the number of non-zero outputs depends on the size of the molecule, and all smaller molecules contain zero padding.\n", - "\n", - "Visualize the target data by plotting the distribution of HOMO energies in the dataset:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "YxjygJgJwrmj", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 510 - }, - "outputId": "c0683fc7-7a3d-490a-e012-bc6e7210c3a5" - }, - "outputs": [ - { - "output_type": "display_data", - "data": { - "text/plain": [ - "
" - ], - "image/png": "\n" - }, - "metadata": {} - }, - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Mean value of HOMO energies in QM7 dataset: -5.66 eV\n" - ] - } - ], - "source": [ - "plt.hist(y, bins=20, density=False, facecolor='blue')\n", - "plt.xlabel(\"Energy [eV]\")\n", - "plt.ylabel(\"Number of molecules\")\n", - "plt.title(\"Distribution of HOMO energies\")\n", - "plt.show()\n", - "\n", - "## mean value of distribution\n", - "print(\"Mean value of HOMO energies in QM7 dataset: %0.2f eV\" %np.mean(y))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "QpgK9ky-wrmk" - }, - "source": [ - "Before dividing the dataset into training and test set, we shuffle the data. This is because data are often stored in a logical order (e.g., certain types of molecules grouped one after each other). Simply taking the first part for training and the second for testing would not result in a well trained model, since the training set would not represent the test data well (and vice versa)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "6luJyqCKwrmk" - }, - "outputs": [], - "source": [ - "## shuffle the data\n", - "\n", - "c = list(zip(x, y))\n", - "random.shuffle(c)\n", - "\n", - "x, y = zip(*c)\n", - "\n", - "x = np.array(x)\n", - "y = np.array(y)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XQh-MAv5wrml" - }, - "source": [ - "Now, we divide the data into training and test set." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "wmnl8ZzAwrmm" - }, - "outputs": [], - "source": [ - "# decide how many samples to take from the database for training and testing\n", - "n_train = 1000\n", - "n_test = 1000\n", - "\n", - "# split data in training and test\n", - "# take first n_train molecules for training\n", - "x_train = x[0:n_train]\n", - "y_train = y[0:n_train]\n", - "\n", - "# take the next n_test data for testing\n", - "x_test = x[n_train:n_train + n_test]\n", - "y_test = y[n_train:n_train + n_test]" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "uEP7iSchwrmm" - }, - "source": [ - "Check that the training data resemble the test data well by plotting the distribution of HOMO energies for both sets. The distributions should be centered around the same mean value and have the same shape." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "w_H7kOBTwrmn", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 504 - }, - "outputId": "14677f4e-42d1-4de3-8360-dbadedd01a3d" - }, - "outputs": [ - { - "output_type": "display_data", - "data": { - "text/plain": [ - "
" - ], - "image/png": "\n" - }, - "metadata": {} - }, - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Mean value of HOMO energies in training set: -5.66 eV\n", - "Mean value of HOMO energies in test set: -5.66 eV\n" - ] - } - ], - "source": [ - "plt.hist(y_test, bins=20, density=False, alpha=0.5, facecolor='red', label='test set')\n", - "plt.hist(y_train, bins=20, density=False, alpha=0.5, facecolor='gray', label='training set')\n", - "plt.xlabel(\"Energy [eV]\")\n", - "plt.ylabel(\"Number of molecules\")\n", - "plt.legend()\n", - "plt.show()\n", - "\n", - "## mean value of distributions\n", - "print(\"Mean value of HOMO energies in training set: %0.2f eV\" %np.mean(y_train))\n", - "print(\"Mean value of HOMO energies in test set: %0.2f eV\" %np.mean(y_test))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "0FxS6Kazwrmo" - }, - "source": [ - "### Training\n", - "\n", - "In the training phase we use a kernel function to measure the distance between all pairs of molecules (represented by their Coulomb matrices) in the training set. We here employ one of two kernels, the Gaussian kernel or the Laplacian kernel. Both kernels have two hyperparameters: $\\alpha$ controls the penalty term and $\\gamma$ the kernel width.\n", - "\n", - "To find the optimal hyperparameters, we will do a grid search, i.e. we test the performance of a model trained with values of the hyperparameters that are spaced out on a grid in search space and choose the hyperparameters that yield the best performance.\n", - "\n", - "The model performance is quantified by splitting off part of the training set as validation set. We validate the model by making predictions on this validation set. This check of the model performance can be further refined with cross-validation, where the roles of training and validation sets alternate. The ratio can be varied, for example in 5-fold cross-validation, the training set is split in 5 equal parts. 
The model is trained on 80% of the data and validated on the other 20%. Then the roles of training and validation set rotate until each part has served as validation set exactly once.\n", - "\n", - "For our implementation, we will use the scikit-learn module.\n", - "\n", - "A more detailed optional explanation, which is not required for the solution of the exercise, can be found in the hidden cell below." - ] - }, - { - "cell_type": "markdown", - "source": [ - "The Gaussian kernel is given by\n", - "\n", - "\\begin{equation}\n", - "k_{Gaussian}(\\boldsymbol{x},\\boldsymbol{x}')=e^{-\\gamma||{\\boldsymbol{x}-\\boldsymbol{x}'}||_2^2},\n", - "\\end{equation}\n", - "\n", - "which employs the Euclidean distance as similarity measure. The parameter $\\gamma$ is defined as $\\frac{1}{2\\sigma^2}$, where $\\sigma$ is the standard deviation of the Gaussian kernel (kernel width). The Laplacian kernel is given by\n", - "\n", - "\\begin{equation}\n", - " k_{Laplacian}(\\boldsymbol{x},\\boldsymbol{x}')=e^{-\\gamma||{\\boldsymbol{x}-\\boldsymbol{x}'}||_1},\n", - "\\end{equation}\n", - "\n", - "which uses the 1-norm as similarity measure. Here, $\\gamma$ is defined as $\\frac{1}{\\sigma}$, where $\\sigma$ is the kernel width of the Laplacian kernel.\n", - "\n", - "In the KRR training phase with $N$ training molecules, the machine learns the relationship between the molecules (represented by their Coulomb matrix) and their corresponding (pre-computed) HOMO energies. It does so by employing a function $f(\\boldsymbol{x})$ that maps a training molecule $\\boldsymbol{x}$ to its reference HOMO energy:\n", - "\n", - "\\begin{equation}\n", - "f(\\boldsymbol{x}) = \\sum_{i=1}^N \\omega_i k(\\boldsymbol{x}, \\boldsymbol{x}_i) = HOMO^{ref}.\n", - "\\end{equation}\n", - "\n", - "For a given training molecule $\\boldsymbol{x}$, the distance to each molecule in the training set is computed by employing the kernel function $k$ (either Gaussian or Laplacian). 
Each kernel contribution (distance) is then weighted by a regression weight $\\omega_i$. The above function is thus given by the weighted sum of kernel contributions (sum over $N$ training molecules). The purpose of training is to fit the regression weights $\\omega_i$ so that $HOMO^{ref}$ is matched for each training molecule. In practice, the machine solves the minimization problem\n", - "\n", - "\n", - "\\begin{equation}\n", - " \\underset{\\omega}{min} \\sum_{i=1}^N (f(\\boldsymbol{x}_i) - HOMO^{ref}_i)^2 + \\alpha \\boldsymbol{\\omega}^T \\mathbf{K} \\boldsymbol{\\omega}\n", - "\\end{equation}\n", - "\n", - "for a vector $\\boldsymbol{\\omega} = (\\omega_1, \\omega_2, ..., \\omega_N) \\in \\mathbb{R}^N$ of regression weights. In KRR, the penalty term $ \\alpha \\boldsymbol{\\omega}^T \\mathbf{K} \\boldsymbol{\\omega}$ is added to the minimization problem in order to avoid over- and underfitting. Overfitting occurs when the model learns the training data too well, even the noise and other unimportant details. The model is unable to generalize to unseen data and therefore yields high prediction errors on the test data. Underfitting occurs when the model is too simple and does not learn the training data at all, and therefore is not able to predict test data well either. Both behaviours can be avoided by tuning the parameter $\\alpha \\in \\left[0,1\\right]$ to a reasonable value. This has to be done separately from training. Both the regularization parameter $\\alpha$ and the kernel width $\\gamma$ are so-called hyperparameters. Hyperparameters cannot be learned during training and have to be selected beforehand. However, it is not always obvious how to choose these hyperparameters and it often requires intuition or rules of thumb. We here employ a cross-validated grid search in order to find the best values for these two hyperparameters.\n", - "\n", - "In grid search, a part of the training set is split off as validation set. 
We set up a grid of pre-defined hyperparameter values and train the machine on the remaining training set, for each possible combination of $\\alpha$ and $\\gamma$ values. We validate each possible combination by making predictions on the validation set. The two hyperparameter values that yield the best performance (lowest error) are then selected for the final model to make predictions on the test set.\n", - "\n", - "In cross-validation, the roles of training and validation sets alternate. As described above, a part from the training set is split off as validation set. After training one combination of hyperparameters on the remaining training set and validating on the validation set, the validation set becomes the training set and vice versa, and the model is trained on the new training set and validated on the new validation set for the same combination of hyperparameters. The ratio can be varied, for example in 5-fold cross-validation, the training set is split in 5 equal parts. For each combination of hyperparameters, the model is trained on 80% of the data and validated on the other 20%. Then the roles of training and validation set rotate until each part has served as validation set exactly once. The final validation error for one particular combination of hyperparameters is computed as the mean from all 5 errors on the 5 validation sets. The combination with lowest average error is chosen for the final model.\n", - "\n", - "The cross-validated grid search routine is implemented in scikit-learn." 
- ], - "metadata": { - "id": "IqRlAI8983CM" - } - }, - { - "cell_type": "markdown", - "source": [ - "### KRR Code" - ], - "metadata": { - "id": "O5e2EytL8aDY" - } - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "hbcOapWDwrmp", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 778 - }, - "outputId": "1d28b29e-0eb8-4473-fa33-4ae2bd9ee3af" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Fitting 2 folds for each of 9 candidates, totalling 18 fits\n", - "[CV 1/2; 1/9] START alpha=0.0001, gamma=0.0001, kernel=laplacian................\n", - "[CV 1/2; 1/9] END alpha=0.0001, gamma=0.0001, kernel=laplacian;, score=-0.310 total time= 0.3s\n", - "[CV 2/2; 1/9] START alpha=0.0001, gamma=0.0001, kernel=laplacian................\n", - "[CV 2/2; 1/9] END alpha=0.0001, gamma=0.0001, kernel=laplacian;, score=-0.318 total time= 0.3s\n", - "[CV 1/2; 2/9] START alpha=0.0001, gamma=0.001, kernel=laplacian.................\n", - "[CV 1/2; 2/9] END alpha=0.0001, gamma=0.001, kernel=laplacian;, score=-0.313 total time= 0.3s\n", - "[CV 2/2; 2/9] START alpha=0.0001, gamma=0.001, kernel=laplacian.................\n", - "[CV 2/2; 2/9] END alpha=0.0001, gamma=0.001, kernel=laplacian;, score=-0.298 total time= 0.3s\n", - "[CV 1/2; 3/9] START alpha=0.0001, gamma=0.01, kernel=laplacian..................\n", - "[CV 1/2; 3/9] END alpha=0.0001, gamma=0.01, kernel=laplacian;, score=-1.578 total time= 0.3s\n", - "[CV 2/2; 3/9] START alpha=0.0001, gamma=0.01, kernel=laplacian..................\n", - "[CV 2/2; 3/9] END alpha=0.0001, gamma=0.01, kernel=laplacian;, score=-1.649 total time= 0.3s\n", - "[CV 1/2; 4/9] START alpha=0.001, gamma=0.0001, kernel=laplacian.................\n", - "[CV 1/2; 4/9] END alpha=0.001, gamma=0.0001, kernel=laplacian;, score=-0.305 total time= 0.3s\n", - "[CV 2/2; 4/9] START alpha=0.001, gamma=0.0001, kernel=laplacian.................\n", - "[CV 2/2; 4/9] END alpha=0.001, gamma=0.0001, 
kernel=laplacian;, score=-0.307 total time= 0.3s\n", - "[CV 1/2; 5/9] START alpha=0.001, gamma=0.001, kernel=laplacian..................\n", - "[CV 1/2; 5/9] END alpha=0.001, gamma=0.001, kernel=laplacian;, score=-0.313 total time= 0.4s\n", - "[CV 2/2; 5/9] START alpha=0.001, gamma=0.001, kernel=laplacian..................\n", - "[CV 2/2; 5/9] END alpha=0.001, gamma=0.001, kernel=laplacian;, score=-0.298 total time= 0.5s\n", - "[CV 1/2; 6/9] START alpha=0.001, gamma=0.01, kernel=laplacian...................\n", - "[CV 1/2; 6/9] END alpha=0.001, gamma=0.01, kernel=laplacian;, score=-1.579 total time= 0.5s\n", - "[CV 2/2; 6/9] START alpha=0.001, gamma=0.01, kernel=laplacian...................\n", - "[CV 2/2; 6/9] END alpha=0.001, gamma=0.01, kernel=laplacian;, score=-1.650 total time= 0.5s\n", - "[CV 1/2; 7/9] START alpha=0.01, gamma=0.0001, kernel=laplacian..................\n", - "[CV 1/2; 7/9] END alpha=0.01, gamma=0.0001, kernel=laplacian;, score=-0.301 total time= 0.5s\n", - "[CV 2/2; 7/9] START alpha=0.01, gamma=0.0001, kernel=laplacian..................\n", - "[CV 2/2; 7/9] END alpha=0.01, gamma=0.0001, kernel=laplacian;, score=-0.285 total time= 0.5s\n", - "[CV 1/2; 8/9] START alpha=0.01, gamma=0.001, kernel=laplacian...................\n", - "[CV 1/2; 8/9] END alpha=0.01, gamma=0.001, kernel=laplacian;, score=-0.313 total time= 0.5s\n", - "[CV 2/2; 8/9] START alpha=0.01, gamma=0.001, kernel=laplacian...................\n", - "[CV 2/2; 8/9] END alpha=0.01, gamma=0.001, kernel=laplacian;, score=-0.295 total time= 0.3s\n", - "[CV 1/2; 9/9] START alpha=0.01, gamma=0.01, kernel=laplacian....................\n", - "[CV 1/2; 9/9] END alpha=0.01, gamma=0.01, kernel=laplacian;, score=-1.586 total time= 0.3s\n", - "[CV 2/2; 9/9] START alpha=0.01, gamma=0.01, kernel=laplacian....................\n", - "[CV 2/2; 9/9] END alpha=0.01, gamma=0.01, kernel=laplacian;, score=-1.657 total time= 0.3s\n" - ] - }, - { - "output_type": "execute_result", - "data": { - "text/plain": 
[ - "GridSearchCV(cv=2, estimator=KernelRidge(),\n", - " param_grid=[{'alpha': array([0.0001, 0.001 , 0.01 ]),\n", - " 'gamma': array([0.0001, 0.001 , 0.01 ]),\n", - " 'kernel': ['laplacian']}],\n", - " scoring='neg_mean_absolute_error', verbose=1000)" - ], - "text/html": [ - "
GridSearchCV(cv=2, estimator=KernelRidge(),\n",
-              "             param_grid=[{'alpha': array([0.0001, 0.001 , 0.01  ]),\n",
-              "                          'gamma': array([0.0001, 0.001 , 0.01  ]),\n",
-              "                          'kernel': ['laplacian']}],\n",
-              "             scoring='neg_mean_absolute_error', verbose=1000)
" - ] - }, - "metadata": {}, - "execution_count": 10 - } - ], - "source": [ - "# set up grids for alpha and gamma hyperparameters.\n", - "# first value: lower bound; second value: upper bound;\n", - "# third value: number of points to evaluate (here set to '3' --> '-2', '-1' and '0' are evaluated)\n", - "# --> make sure to change third value as well when changing the bounds!\n", - "alpha = np.logspace(-4, -2, 3)\n", - "gamma = np.logspace(-4, -2, 3)\n", - "\n", - "cv_number = 2 ## choose into how many parts training set is divided for cross-validation\n", - "kernel = 'laplacian' # select kernel function here ('rbf': Gaussian kernel, 'laplacian': Laplacian kernel)\n", - "scoring_function = 'neg_mean_absolute_error' # it is called \"negative\" because scikit-learn interprets\n", - " # highest scoring value as best, but we want small errors\n", - "\n", - "## define settings for grid search routine in scikit-learn with above defined grids as input\n", - "\n", - "grid_search = GridSearchCV(KernelRidge(), #machine learning method (KRR here)\n", - " [{'kernel':[kernel],'alpha': alpha, 'gamma': gamma}],\n", - " cv = cv_number,\n", - " scoring = scoring_function,\n", - " verbose=1000) ## produces detailed output statements of grid search\n", - " # routine so we can see what is computed\n", - "\n", - "# call the fit function in scikit-learn which fits the Coulomb matrices in the training set\n", - "# to their corresponding HOMO energies.\n", - "grid_search.fit(x_train, y_train)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mTrweQ1Mwrmr" - }, - "source": [ - "### Grid search results\n", - "\n", - "Print out the average validation errors and corresponding hyperparameter combinations" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "066IjiWawrmr", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "de299c89-9c65-437c-c798-72bb63bd89e0" - }, - "outputs": [ - { - "output_type": "stream", - "name": 
"stdout", - "text": [ - "0.314 (+/-0.007) for {'alpha': 0.0001, 'gamma': 0.0001, 'kernel': 'laplacian'}\n", - "0.306 (+/-0.015) for {'alpha': 0.0001, 'gamma': 0.001, 'kernel': 'laplacian'}\n", - "1.613 (+/-0.071) for {'alpha': 0.0001, 'gamma': 0.01, 'kernel': 'laplacian'}\n", - "0.306 (+/-0.002) for {'alpha': 0.001, 'gamma': 0.0001, 'kernel': 'laplacian'}\n", - "0.306 (+/-0.015) for {'alpha': 0.001, 'gamma': 0.001, 'kernel': 'laplacian'}\n", - "1.614 (+/-0.071) for {'alpha': 0.001, 'gamma': 0.01, 'kernel': 'laplacian'}\n", - "0.293 (+/-0.016) for {'alpha': 0.01, 'gamma': 0.0001, 'kernel': 'laplacian'}\n", - "0.304 (+/-0.018) for {'alpha': 0.01, 'gamma': 0.001, 'kernel': 'laplacian'}\n", - "1.621 (+/-0.071) for {'alpha': 0.01, 'gamma': 0.01, 'kernel': 'laplacian'}\n" - ] - } - ], - "source": [ - "means = grid_search.cv_results_['mean_test_score']\n", - "stds = grid_search.cv_results_['std_test_score']\n", - "for mean, std, params in zip(-means, stds, grid_search.cv_results_['params']):\n", - " print(\"%0.3f (+/-%0.03f) for %r\" % (mean, std * 2, params))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "SrfhK2fTwrms" - }, - "source": [ - "Next, we visualize the grid search results by plotting a heatmap." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "T2swGFTLwrmt", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 488 - }, - "outputId": "e8b282a6-56fb-4a2e-93c0-bacf613f594e" - }, - "outputs": [ - { - "output_type": "display_data", - "data": { - "text/plain": [ - "
" - ], - "image/png": "\n" - }, - "metadata": {} - }, - { - "output_type": "stream", - "name": "stdout", - "text": [ - "The best combinations of parameters are {'alpha': 0.01, 'gamma': 0.0001, 'kernel': 'laplacian'} with a score of 0.293 eV on the validation set.\n" - ] - } - ], - "source": [ - "results = pd.DataFrame(grid_search.cv_results_)\n", - "#pd.DataFrame(grid_search.cv_results_)\n", - "\n", - "pvt = pd.pivot_table(results, values='mean_test_score',\n", - " index='param_gamma', columns='param_alpha')\n", - "heatmap = sns.heatmap(-pvt, annot=True, cmap='viridis', cbar_kws={'label': \"Mean absolute error [eV]\"})\n", - "figure = heatmap.get_figure()\n", - "plt.show()\n", - "\n", - "\n", - "print(\"The best combinations of parameters are %s with a score of %0.3f eV on the validation set.\"\n", - " % (grid_search.best_params_, -grid_search.best_score_))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-tv1CciKwrmu" - }, - "source": [ - "### Testing\n", - "\n", - "With the best combination of hyperparameters, the model is once again trained on the entire training set (this is done automatically in scikit-learn). Then, with the best combination of hyperparameters, predictions are made on the test set to evaluate the final model, which we will use for predictions.\n", - "\n", - "The mean absolute error of the predicted from the reference HOMO energies and $R^2$ score will be our measure for the quality of the fit.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "k8e9G1iNwrmv", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 504 - }, - "outputId": "42be6aaa-f279-436b-97d1-b4a137005585" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Mean absolute error on test set: 0.255 eV\n" - ] - }, - { - "output_type": "display_data", - "data": { - "text/plain": [ - "
" - ], - "image/png": "\n" - }, - "metadata": {} - }, - { - "output_type": "stream", - "name": "stdout", - "text": [ - "R^2 score on test set: 0.607\n" - ] - } - ], - "source": [ - "# predicted HOMO energies for all test molecules\n", - "\n", - "y_pred = grid_search.predict(x_test) # scikit-learn automatically takes the best combination\n", - " # of hyperparameters from grid search\n", - "\n", - "print(\"Mean absolute error on test set: %0.3f eV\" %(np.abs(y_pred-y_test)).mean())\n", - "\n", - "# do the regression plot\n", - "plt.plot(y_test, y_pred, 'o')\n", - "plt.plot([np.min(y_test),np.max(y_test)], [np.min(y_test),np.max(y_test)], '-')\n", - "plt.xlabel('reference HOMO energy [eV]')\n", - "plt.ylabel('predicted HOMO energy [eV]')\n", - "plt.show()\n", - "print(\"R^2 score on test set: %.3f\" % r2_score(y_test, y_pred))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "pYf2oHVbwrmw" - }, - "source": [ - "The $R^2$ score indicates how close the predicted energies in the test set are to the reference energies. The closer the points in the above figure are located to the diagonal, the better the predictions. Points on the diagonal (\"predicted energy\"=\"reference energy\") correspond to $R^2=1$. Therefore, $R^2$ values close to 1 indicate good model performance." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": true, - "id": "zcdZdkThwrmw" - }, - "source": [ - "###**Exercises**\n", - "\n", - "#### a. Grid search\n", - "\n", - "Increase the number and range of grid points used for grid search. Which combination of $\\alpha$ and $\\gamma$ works best? How does the computational time increase? Choose a reasonable number of grid points that don't take too long to evaluate." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "mqN4J1xVwrmx" - }, - "outputs": [], - "source": [ - "#Choose alpha and gamma on a finer logarithmically spaced grid. 
The other parameters can be chosen as above.\n", - "alpha = None\n", - "gamma = None\n", - "cv_number=None\n", - "kernel = None\n", - "scoring_function = None\n", - "\n", - "\n", - "grid_search = GridSearchCV(KernelRidge(), #machine learning method (KRR here)\n", - " [{'kernel':[kernel],'alpha': alpha, 'gamma': gamma}],\n", - " cv = cv_number,\n", - " scoring = scoring_function,\n", - " verbose=1000) ## produces detailed output statements of grid search routine\n", - "\n", - "grid_search.fit(x_train, y_train)\n", - "\n", - "results = pd.DataFrame(grid_search.cv_results_)\n", - "\n", - "pvt = pd.pivot_table(results, values='mean_test_score',\n", - " index='param_gamma', columns='param_alpha')\n", - "heatmap = sns.heatmap(-pvt, annot=True, cmap='viridis', cbar_kws={'label': \"Mean absolute error [eV]\"})\n", - "figure = heatmap.get_figure()\n", - "plt.show()\n", - "\n", - "print(\"The best combinations of parameters are %s with a score of %0.3f eV on the validation set.\"\n", - " % (grid_search.best_params_, -grid_search.best_score_))\n", - "\n", - "y_pred = grid_search.predict(x_test) # scikit-learn automatically takes the best combination of hyperparameters from grid search\n", - "\n", - "print(\"Mean absolute error on test set: %0.3f eV\" %(np.abs(y_pred-y_test)).mean())\n", - "\n", - "plt.plot(y_test, y_pred, 'o')\n", - "plt.plot([np.min(y_test),np.max(y_test)], [np.min(y_test),np.max(y_test)], '-')\n", - "plt.xlabel('reference HOMO energy [eV]')\n", - "plt.ylabel('predicted HOMO energy [eV]')\n", - "plt.show()\n", - "print(\"R^2 score on test set: %.3f\" % r2_score(y_test, y_pred))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Gsy36Rzowrmy" - }, - "source": [ - "#### b. Kernel function\n", - "\n", - "Use the Gaussian kernel instead of the Laplacian kernel. Which kernel leads to better model performance?" - ] - }, - { - "cell_type": "code", - "source": [ - "#Repeat the above calculation with the Gaussian kernel. 
The Gaussian kernel is identified with the string 'rbf'.\n", - "alpha = None\n", - "gamma = None\n", - "cv_number=None\n", - "kernel = None\n", - "scoring_function = None\n", - "\n", - "grid_search = GridSearchCV(KernelRidge(), #machine learning method (KRR here)\n", - " [{'kernel':[kernel],'alpha': alpha, 'gamma': gamma}],\n", - " cv = cv_number,\n", - " scoring = scoring_function,\n", - " verbose=1000) ## produces detailed output statements of grid search routine\n", - "\n", - "grid_search.fit(x_train, y_train)\n", - "\n", - "results = pd.DataFrame(grid_search.cv_results_)\n", - "\n", - "pvt = pd.pivot_table(results, values='mean_test_score',\n", - " index='param_gamma', columns='param_alpha')\n", - "heatmap = sns.heatmap(-pvt, annot=True, cmap='viridis', cbar_kws={'label': \"Mean absolute error [eV]\"})\n", - "figure = heatmap.get_figure()\n", - "plt.show()\n", - "\n", - "print(\"The best combinations of parameters are %s with a score of %0.3f eV on the validation set.\"\n", - " % (grid_search.best_params_, -grid_search.best_score_))\n", - "\n", - "y_pred = grid_search.predict(x_test) # scikit-learn automatically takes the best combination of hyperparameters from grid search\n", - "\n", - "print(\"Mean absolute error on test set: %0.3f eV\" %(np.abs(y_pred-y_test)).mean())\n", - "\n", - "plt.plot(y_test, y_pred, 'o')\n", - "plt.plot([np.min(y_test),np.max(y_test)], [np.min(y_test),np.max(y_test)], '-')\n", - "plt.xlabel('reference HOMO energy [eV]')\n", - "plt.ylabel('predicted HOMO energy [eV]')\n", - "plt.show()\n", - "print(\"R^2 score on test set: %.3f\" % r2_score(y_test, y_pred))" - ], - "metadata": { - "id": "srWUmjKapUMD" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "5yJUdzhewrmx" - }, - "source": [ - "#### c. Cross-validation\n", - "\n", - "For this exercise, choose the kernel that performed better in the previous exercise. Increase the number of folds used for cross-validation. 
Does the quality of the model increase? Take note as well of the increasing computational time and choose a number of folds that does not require too much computational time." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Jqy3rfT9wrmy" - }, - "outputs": [], - "source": [ - "#Repeat the above calculation with a higher number of cross-validation folds. Use the kernel that showed better performance.\n", - "alpha = None\n", - "gamma = None\n", - "cv_number=None\n", - "kernel = None\n", - "scoring_function = None\n", - "\n", - "grid_search = GridSearchCV(KernelRidge(), #machine learning method (KRR here)\n", - " [{'kernel':[kernel],'alpha': alpha, 'gamma': gamma}],\n", - " cv = cv_number,\n", - " scoring = scoring_function,\n", - " verbose=1000) ## produces detailed output statements of grid search routine\n", - "\n", - "grid_search.fit(x_train, y_train)\n", - "\n", - "results = pd.DataFrame(grid_search.cv_results_)\n", - "#pd.DataFrame(grid_search.cv_results_)\n", - "\n", - "pvt = pd.pivot_table(results, values='mean_test_score',\n", - " index='param_gamma', columns='param_alpha')\n", - "heatmap = sns.heatmap(-pvt, annot=True, cmap='viridis', cbar_kws={'label': \"Mean absolute error [eV]\"})\n", - "figure = heatmap.get_figure()\n", - "plt.show()\n", - "\n", - "print(\"The best combinations of parameters are %s with a score of %0.3f eV on the validation set.\"\n", - " % (grid_search.best_params_, -grid_search.best_score_))\n", - "\n", - "y_pred = grid_search.predict(x_test) # scikit-learn automatically takes the best combination of hyperparameters from grid search\n", - "\n", - "print(\"Mean absolute error on test set: %0.3f eV\" %(np.abs(y_pred-y_test)).mean())\n", - "\n", - "plt.plot(y_test, y_pred, 'o')\n", - "plt.plot([np.min(y_test),np.max(y_test)], [np.min(y_test),np.max(y_test)], '-')\n", - "plt.xlabel('reference HOMO energy [eV]')\n", - "plt.ylabel('predicted HOMO energy [eV]')\n", - "plt.show()\n", - "print(\"R^2 score 
on test set: %.3f\" % r2_score(y_test, y_pred))" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "jjoUSLWuwrmz" - }, - "source": [ - "#### d. Training set size\n", - "\n", - "Increase the size of the training set and plot the mean absolute error and $R^2$ score and training time on the test set as a function of the training set size (e.g. use 1000, 2000, 3000 etc. as training set size). For the MAE and $R^2$ values use a logarithmic time axis. Compare with the previous exercises.\n", - "\n", - "The optimal hyperparameter values for $\\alpha$ and $\\gamma$ can change throughout varying training set sizes. Therefore, when increasing the training set size, it is recommended to perform a cross-validated grid search for each training set size. For the sake of this exercise, we will limit ourselves to taking the optimal hyperparameters from the previous exercise as an approximation and not perform a grid search. Furthermore, we will not cross-validate our model in this exercise in the interest of time." - ] - }, - { - "cell_type": "code", - "source": [ - "#We keep the size of the test set constant in this exercise\n", - "n_test = 1000\n", - "\n", - "#We want to iterate over different training set sizes n_train.\n", - "#Prepare an iterator with reasonable choices of n_train.\n", - "n_train_iterator=range(None,None,None)\n", - "\n", - "#These lists we want to fill during the iteration with the mean absolute error,\n", - "#R^2 score and elapsed training time.\n", - "mae_list=[]\n", - "r2_list=[]\n", - "time_list=[]\n", - "\n", - "#In alpha and gamma, we want to save the best choice of parameters from\n", - "#the previous exercise. 
Compare with the exercise above, if you are unsure\n", - "#how to access them from grid_search.\n", - "alpha=None\n", - "gamma=None\n", - "\n", - "#Use the kernel that has shown better performance.\n", - "kernel=None\n", - "\n", - "for n_train in n_train_iterator:\n", - " x_train = x[0:n_train]\n", - " y_train = y[0:n_train]\n", - "\n", - " x_test = x[n_train:n_train + n_test]\n", - " y_test = y[n_train:n_train + n_test]\n", - "\n", - " #Here we save the starting time of the kernel ridge training.\n", - " start = time.time()\n", - "\n", - " #In the following we will perform the kernel ridge training without\n", - " #cross-validation. For this we use the Object KernelRidge, which\n", - " #has previously been used as the estimator in the cross-validation.\n", - " #We need to set alpha, gamma and the kernel. For documentation, see:\n", - " #https://scikit-learn.org/stable/modules/generated/sklearn.kernel_ridge.KernelRidge.html\n", - " kernel_ridge=None\n", - "\n", - " #After settung up the kernel_ridge object, we need to train with our\n", - " #training data using the method fit and make predictions on the test set\n", - " #using the method predict in analogy to the cross-validation example.\n", - "\n", - " #Fill in your answer here!\n", - "\n", - " #Here we save the ending time of the kernel ridge training.\n", - " end = time.time()\n", - "\n", - " #Compute from the starting and ending time, the elapsed training time and\n", - " #append it to the list\n", - " time_list.append(None)\n", - " #Append the mean absolute error to the following list. Confer with the\n", - " #previous exercise, if you are unsure how to compute it.\n", - " mae_list.append(None)\n", - " r2_list.append(r2_score(y_test, y_pred))\n", - "\n", - "#In the following plot the size of the training set versus the elapsed time,\n", - "#mean absolute error and R^2 score in three separate plots.\n", - "\n", - "#Fill in your answer here!\n", - "\n", - "#Here we plot the exact vs. 
the predicted HOMO energies for our largest training\n", - "#set size. Can you see the reduced error from this plot in comparison with\n", - "#the previous exercises?\n", - "plt.plot(y_test, y_pred, 'o')\n", - "plt.plot([np.min(y_test),np.max(y_test)], [np.min(y_test),np.max(y_test)], '-')\n", - "plt.xlabel('reference HOMO energy [eV]')\n", - "plt.ylabel('predicted HOMO energy [eV]')\n", - "plt.show()" - ], - "metadata": { - "id": "0o6HAqu4bYPb" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "#####If you did everything correctly, the plots should look something like the following. Note that because of the random data shuffling in the beginning, they will not look exactly the same.\n", - "\n", - "![image.png]()\n", - "\n", - "![image.png]()\n", - "\n", - "![image.png]()\n", - "\n", - "![image.png]()" - ], - "metadata": { - "id": "HNAUxo7IYwkX" - } - } - ], - "metadata": { - "anaconda-cloud": {}, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "Fl2hw46ADkH4" + }, + "source": [ + "# AI4Spec 1: Predicting individual energy levels with kernel methods\n", + "\n", + "In these tutorials exercises, we review different AI approaches to predicting ground or excited state energy levels simply based on the atomic structure of materials. Here we focus on molecules and molecular orbitals, but the sample principles apply to crystal structures and electronic bands.\n", + "\n", + "We start by predicting individual electronic states (Tutorial 1), proceed towards predicting multiple states at once (Tutorial 2) and finish by predicting entire spectral curves (Tutorial 3). The tutorials were prepared by Milica Todorović (University of Turku) and Kunal Ghosh (Aalto University).\n", + "\n", + "\n", + "\n", + "In the exercise below, our AI objective is to train a model to predict a single electronic energy level, HOMO for example. 
This is a classic example of supervised machine learning regression, where the objective (label) is a single floating-point number. Many AI models can address this task, from decision trees, via kernel-based learning to neural networks. Here, we demonstrate the use of Kernel Ridge Regression, a relatively simple regression approach that has performed well on numerous prediction tasks in physics and chemistry **{Cite Mathias Rupp}**\n", + "\n", + "We use a large dataset of molecular structures and their computed HOMO energies to train the AI. Once the model is trained, it can be used as a tool: one can input a new molecule (previously unseen by the model) and instantly receive a value for its HOMO, within the expected error of the trained model.\n", + "\n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XegnO6aTwrmX" + }, + "source": [ + "## 1. HOMO energy prediction with kernel ridge regression\n", + "\n", + "\n", + "Here we will machine-learn the relationship between molecular structure (represented by the Coulomb matrix CM) and their HOMO (Highest Occupied Molecular Orbital) energy using KRR.\n", + "\n", + "This tutorial shows step by step how to load the data, visualize them, select the hyperparameters, train the model and validate it. We use the QM7 dataset of 7000 small organic molecules. The HOMO energies of all molecules were pre-computed with first principles quantum mechanical methods (DFT) to obtain the target data that our model can be trained on. Detailed descriptions and results for a similar dataset (QM9) can be found in [A. Stuke, et al. \"Chemical diversity in molecular orbital energy predictions with kernel ridge regression.\" J. Chem. Phys. 150. 
204121 (2019)](https://aip.scitation.org/doi/10.1063/1.5086105).\n", + "\n", + "\n", + "## Setup" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "qdphsPPRwwNF" + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import GridSearchCV\n", + "from sklearn.model_selection import cross_validate\n", + "from sklearn.kernel_ridge import KernelRidge\n", + "from sklearn.metrics import r2_score\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import pandas as pd\n", + "import seaborn as sns\n", + "import json\n", + "import math, random\n", + "from scipy.sparse import load_npz\n", + "from matplotlib.colors import LinearSegmentedColormap\n", + "import time" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hi4ISia7wrmc" + }, + "source": [ + "### Load and visualize data\n", + "\n", + "At first, we load the data.\n", + "\n", + "The input data x is an array that contains all 7000 molecules of the QM7 dataset, represented by their Coulomb matrices, which were computed with the [Dscribe](https://www.sciencedirect.com/science/article/pii/S0010465519303042?via%3Dihub) package.\n", + "\n", + "The output data y is a list that contains the corresponding (pre-computed) HOMO energies." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { "colab": { - "provenance": [], - "collapsed_sections": [ - "0FxS6Kazwrmo" - ], - "toc_visible": true - }, - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.3" + "base_uri": "https://localhost:8080/" + }, + "id": "jTZ_1hpqLBK5", + "outputId": "a15f3c92-4b6c-453a-e0d6-1a9e91e72112" + }, + "outputs": [], + "source": [ + "!wget https://github.com/fullmetalfelix/ML-CSC-tutorial/raw/master/data/qm7/cm.npz\n", + "!wget https://raw.githubusercontent.com/fullmetalfelix/ML-CSC-tutorial/master/data/qm7/HOMO.txt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "6tTcc8Mtwrmd", + "outputId": "baf6dc4d-80fb-4b36-f3ba-026677ad11ee" + }, + "outputs": [], + "source": [ + "x = load_npz(\"cm.npz\").toarray()\n", + "y = np.genfromtxt(\"HOMO.txt\")\n", + "\n", + "print(\"Number of molecules:\", len(y))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yWa4pRHxwrmf" + }, + "source": [ + "Print the Coulomb matrix of a random molecule in the dataset." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "7CovKOElwrmg", + "outputId": "68c822c8-6e05-431d-8223-791fc3851287" + }, + "outputs": [], + "source": [ + "rand_mol = random.randint(0, len(y))\n", + "\n", + "print(x[rand_mol])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BKo-SoN2wrmh" + }, + "source": [ + "Visualize the Coulomb matrix of the random molecule." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 502 + }, + "id": "GJu2RtUhwrmh", + "outputId": "1466b332-185d-4ad7-e2f6-c3696ceed6d5" + }, + "outputs": [], + "source": [ + "shape = (23, 23)\n", + "mat = x[rand_mol].reshape(shape)\n", + "\n", + "plt.figure()\n", + "plt.figure(figsize = (6,6))\n", + "plt.imshow(mat, origin=\"upper\", cmap='rainbow', vmin=-15, vmax=75, interpolation='nearest')\n", + "plt.colorbar(fraction=0.046, pad=0.04).ax.tick_params(labelsize=20)\n", + "plt.axis('off')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "n434DnNSwrmi" + }, + "source": [ + "Note that many of the outputs are zero in the Coulomb matrix. This is because the number of non-zero outputs depends on the size of the molecule, and all smaller molecules contain zero padding.\n", + "\n", + "Visualize the target data by plotting the distribution of HOMO energies in the dataset:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 510 + }, + "id": "YxjygJgJwrmj", + "outputId": "c0683fc7-7a3d-490a-e012-bc6e7210c3a5" + }, + "outputs": [], + "source": [ + "plt.hist(y, bins=20, density=False, facecolor='blue')\n", + "plt.xlabel(\"Energy [eV]\")\n", + "plt.ylabel(\"Number of molecules\")\n", + "plt.title(\"Distribution of HOMO energies\")\n", + "plt.show()\n", + "\n", + "## mean value of distribution\n", + "print(\"Mean value of HOMO energies in QM7 dataset: %0.2f eV\" %np.mean(y))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QpgK9ky-wrmk" + }, + "source": [ + "Before dividing the dataset into training and test set, we shuffle the data. This is because data are often stored in a logical order (e.g., certain types of molecules grouped one after each other). 
Simply taking the first part for training and the second for testing would not result in a well trained model, since the training set would not represent the test data well (and vice versa)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6luJyqCKwrmk" + }, + "outputs": [], + "source": [ + "## shuffle the data\n", + "\n", + "c = list(zip(x, y))\n", + "random.shuffle(c)\n", + "\n", + "x, y = zip(*c)\n", + "\n", + "x = np.array(x)\n", + "y = np.array(y)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XQh-MAv5wrml" + }, + "source": [ + "Now, we divide the data into training and test set." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "wmnl8ZzAwrmm" + }, + "outputs": [], + "source": [ + "# decide how many samples to take from the database for training and testing\n", + "n_train = 1000\n", + "n_test = 1000\n", + "\n", + "# split data in training and test\n", + "# take first n_train molecules for training\n", + "x_train = x[0:n_train]\n", + "y_train = y[0:n_train]\n", + "\n", + "# take the next n_test data for testing\n", + "x_test = x[n_train:n_train + n_test]\n", + "y_test = y[n_train:n_train + n_test]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uEP7iSchwrmm" + }, + "source": [ + "Check that the training data resemble the test data well by plotting the distribution of HOMO energies for both sets. The distributions should be centered around the same mean value and have the same shape." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 504 + }, + "id": "w_H7kOBTwrmn", + "outputId": "14677f4e-42d1-4de3-8360-dbadedd01a3d" + }, + "outputs": [], + "source": [ + "plt.hist(y_test, bins=20, density=False, alpha=0.5, facecolor='red', label='test set')\n", + "plt.hist(y_train, bins=20, density=False, alpha=0.5, facecolor='gray', label='training set')\n", + "plt.xlabel(\"Energy [eV]\")\n", + "plt.ylabel(\"Number of molecules\")\n", + "plt.legend()\n", + "plt.show()\n", + "\n", + "## mean value of distributions\n", + "print(\"Mean value of HOMO energies in training set: %0.2f eV\" %np.mean(y_train))\n", + "print(\"Mean value of HOMO energies in test set: %0.2f eV\" %np.mean(y_test))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0FxS6Kazwrmo" + }, + "source": [ + "### Training\n", + "\n", + "In the training phase we use a kernel function to measure the distance between all pairs of molecules (represented by their Coulomb matrices) in the training set. We here employ one of two kernels, the Gaussian kernel or the Laplacian kernel. Both kernels have two hyperparameters: $\\alpha$ controls the penalty term and $\\gamma$ the kernel width.\n", + "\n", + "To find the optimal hyperparameters, we will do a grid search, i.e. we test the performance of a model trained with values of the hyperparameters that are spaced out on a grid in search space and choose the hyperparameters that yield the best performance.\n", + "\n", + "The model performance is quantified by splitting off part of the training set as validation set. We validate the model by making predictions on this validation set. This check of the model performance can be further refined with cross-validation, where the roles of training and validation sets alternate. The ratio can be varied, for example in 5-fold cross-validation, the training set is split in 5 equal parts. 
The model is trained on 80% of the data and validated on the other 20%. Then the roles of training and validation set rotate until each part has served as validation set exactly once.\n", + "\n", + "For our implementation, we will use the scikit-learn module.\n", + "\n", + "A more detailed optional explanation, which is not required for the solution of the exercise, can be found in the hidden cell below." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IqRlAI8983CM" + }, + "source": [ + "The Gaussian kernel is given by\n", + "\n", + "\\begin{equation}\n", + "k_{Gaussian}(\\boldsymbol{x},\\boldsymbol{x}')=e^{-\\frac{||{\\boldsymbol{x}-\\boldsymbol{x}'}||_2^2}{2\\gamma^2}},\n", + "\\end{equation}\n", + "\n", + "which employs the Euclidean distance as similarity measure. The parameter $\\gamma$ is defined as $\\frac{1}{2\\sigma^2}$, where $\\sigma$ is the standard deviation of the Gaussian kernel (kernel width). The Laplacian kernel is given by\n", + "\n", + "\\begin{equation}\n", + " k_{Laplacian}(\\boldsymbol{x},\\boldsymbol{x}')=e^{-\\frac{||{\\boldsymbol{x}-\\boldsymbol{x}'}||_1}{\\gamma}},\n", + "\\end{equation}\n", + "\n", + "which uses the 1-norm as similarity measure. Here, $\\gamma$ is defined as $\\frac{1}{\\sigma}$, where $\\sigma$ is the kernel width of the Laplacian kernel.\n", + "\n", + "In the KRR training phase with $N$ training molecules, the machine learns the relationship between the molecules (represented by their Coulomb matrix) and their corresponding (pre-computed) HOMO energies. 
It does so by employing a function $f(\\boldsymbol{x})$ that maps a training molecule $\\boldsymbol{x}$ to its reference HOMO energy:\n", + "\n", + "\\begin{equation}\n", + "f(\\boldsymbol{x}) = \\sum_{i=1}^N \\omega_i k(\\boldsymbol{x}, \\boldsymbol{x}_i) = HOMO^{ref},\n", + "\\end{equation}\n", + "\n", + "For a given training molecule $\\boldsymbol{x}$, the distance to each molecule in the training set is computed by employing the kernel function $k$ (either Gaussian or Laplacian). Each kernel contribution (distance) is then weighted by a regression weight $\\omega_i$. The above function is thus given by the weighted sum of kernel contributions (sum over $N$ training molecules). The purpose of training is to fit the regression weight $\\omega_i$ so that HOMO$_{ref}$ is matched for each training molecule. In practice, the machine solves the minimization problem\n", + "\n", + "\n", + "\\begin{equation}\n", + " \\underset{\\omega}{min} \\sum_{i=1}^N (f(\\boldsymbol{x}_i) - HOMO^{ref}_i)^2 + \\alpha \\boldsymbol{\\omega}^T \\mathbf{K} \\boldsymbol{\\omega}.\n", + "\\end{equation}\n", + "\n", + "for a vector $\\boldsymbol{\\omega} \\in \\mathbb{R}^N = (\\omega_1, \\omega_2, ..., \\omega_N)$ of regression weights. In KRR, the penalty term $ \\alpha \\boldsymbol{\\omega}^T \\mathbf{K} \\boldsymbol{\\omega}$ is added to the minimization problem in order to avoid over- and underfitting. Overfitting occurs when the model learns the training data too well, even the noise and other unimportant details. The model is unable to generalize on unseen data and therefore yields high prediction errors on the test data. Underfitting occurs when the model is too simple and does not learn the training data at all, and therefore is not able to predict test data well either. Both behaviours can be avoided by tuning the parameter $\\alpha \\in \\left[0,1\\right]$ to a reasonable value. This has do be done separately from training. 
Both the regularization parameter $\\alpha$ and the kernel width $\\gamma$ are so called hyperparameters. Hyperparameters cannot be learned during training and have to be selected beforehand. However, it is not always obvious how to choose these hyperparameters and it often requires intuition or rules of thumb. We here employ a cross-validated grid search in order to find the best values for these two hyperparameters.\n", + "\n", + "In grid search, a part of the training set is split off as validation set. We set up a grid of pre-defined hyperparameter values and train the machine on the remaining training set, for each possible combination of $\\alpha$ and $\\gamma$ values. We validate each possible combination by making predictions on the validation set. The two hyperparameter values that yield the best performance (lowest error) are then selected for the final model to make predictions on the test set.\n", + "\n", + "In cross-validation, the roles of training and validation sets alternate. As described above, a part from the training set is split off as validation set. After training one combination of hyperparameters on the remaining training set and validating on the validation set, the validation set becomes the training set and vice versa, and the model is trained on the new training set and validated on the new validation set for the same combination of hyperparameters. The ratio can be varied, for example in 5-fold cross-validation, the training set is split in 5 equal parts. For each combination of hyperparameters, the model is trained on 80% of the data and validated on the other 20%. Then the roles of training and validation set rotate until each part has served as validation set exactly once. The final validation error for one particular combination of hyperparameters is computed as the mean from all 5 errors on the 5 validation sets. 
The combination with lowest average error is chosen for the final model.\n", + "\n", + "The cross-validated grid search routine is implemented in scikit-learn." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "O5e2EytL8aDY" + }, + "source": [ + "### KRR Code" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 778 + }, + "id": "hbcOapWDwrmp", + "outputId": "1d28b29e-0eb8-4473-fa33-4ae2bd9ee3af" + }, + "outputs": [], + "source": [ + "# set up grids for alpha and gamma hyperparameters.\n", + "# first value: lower bound; second value: upper bound;\n", + "# third value: number of points to evaluate (here set to '3' --> '-2', '-1' and '0' are evaluated)\n", + "# --> make sure to change third value as well when changing the bounds!\n", + "alpha = np.logspace(-4, -2, 3)\n", + "gamma = np.logspace(-4, -2, 3)\n", + "\n", + "cv_number = 2 ## choose into how many parts training set is divided for cross-validation\n", + "kernel = 'laplacian' # select kernel function here ('rbf': Gaussian kernel, 'laplacian': Laplacian kernel)\n", + "scoring_function = 'neg_mean_absolute_error' # it is called \"negative\" because scikit-learn interprets\n", + " # highest scoring value as best, but we want small errors\n", + "\n", + "## define settings for grid search routine in scikit-learn with above defined grids as input\n", + "\n", + "grid_search = GridSearchCV(KernelRidge(), #machine learning method (KRR here)\n", + " [{'kernel':[kernel],'alpha': alpha, 'gamma': gamma}],\n", + " cv = cv_number,\n", + " scoring = scoring_function,\n", + " verbose=1000) ## produces detailed output statements of grid search\n", + " # routine so we can see what is computed\n", + "\n", + "# call the fit function in scikit-learn which fits the Coulomb matrices in the training set\n", + "# to their corresponding HOMO energies.\n", + "grid_search.fit(x_train, y_train)" + ] + }, + { + "cell_type": 
"markdown", + "metadata": { + "id": "mTrweQ1Mwrmr" + }, + "source": [ + "### Grid search results\n", + "\n", + "Print out the average validation errors and corresponding hyperparameter combinations" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "066IjiWawrmr", + "outputId": "de299c89-9c65-437c-c798-72bb63bd89e0" + }, + "outputs": [], + "source": [ + "means = grid_search.cv_results_['mean_test_score']\n", + "stds = grid_search.cv_results_['std_test_score']\n", + "for mean, std, params in zip(-means, stds, grid_search.cv_results_['params']):\n", + " print(\"%0.3f (+/-%0.03f) for %r\" % (mean, std * 2, params))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SrfhK2fTwrms" + }, + "source": [ + "Next, we visualize the grid search results by plotting a heatmap." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 488 + }, + "id": "T2swGFTLwrmt", + "outputId": "e8b282a6-56fb-4a2e-93c0-bacf613f594e" + }, + "outputs": [], + "source": [ + "results = pd.DataFrame(grid_search.cv_results_)\n", + "#pd.DataFrame(grid_search.cv_results_)\n", + "\n", + "pvt = pd.pivot_table(results, values='mean_test_score',\n", + " index='param_gamma', columns='param_alpha')\n", + "heatmap = sns.heatmap(-pvt, annot=True, cmap='viridis', cbar_kws={'label': \"Mean absolute error [eV]\"})\n", + "figure = heatmap.get_figure()\n", + "plt.show()\n", + "\n", + "\n", + "print(\"The best combinations of parameters are %s with a score of %0.3f eV on the validation set.\"\n", + " % (grid_search.best_params_, -grid_search.best_score_))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-tv1CciKwrmu" + }, + "source": [ + "### Testing\n", + "\n", + "With the best combination of hyperparameters, the model is once again trained on the entire training set (this is done automatically in 
scikit-learn). Predictions are then made on the test set to evaluate the final model, which we will use for predictions.\n", + "\n", + "The mean absolute error between the predicted and the reference HOMO energies and the $R^2$ score will be our measures of the quality of the fit.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 504 + }, + "id": "k8e9G1iNwrmv", + "outputId": "42be6aaa-f279-436b-97d1-b4a137005585" + }, + "outputs": [], + "source": [ + "# predicted HOMO energies for all test molecules\n", + "\n", + "y_pred = grid_search.predict(x_test) # scikit-learn automatically takes the best combination\n", + " # of hyperparameters from grid search\n", + "\n", + "print(\"Mean absolute error on test set: %0.3f eV\" %(np.abs(y_pred-y_test)).mean())\n", + "\n", + "# do the regression plot\n", + "plt.plot(y_test, y_pred, 'o')\n", + "plt.plot([np.min(y_test),np.max(y_test)], [np.min(y_test),np.max(y_test)], '-')\n", + "plt.xlabel('reference HOMO energy [eV]')\n", + "plt.ylabel('predicted HOMO energy [eV]')\n", + "plt.show()\n", + "print(\"R^2 score on test set: %.3f\" % r2_score(y_test, y_pred))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pYf2oHVbwrmw" + }, + "source": [ + "The $R^2$ score indicates how close the predicted energies in the test set are to the reference energies. The closer the points in the above figure are located to the diagonal, the better the predictions. If all points lie on the diagonal (\"predicted energy\"=\"reference energy\"), then $R^2=1$. Therefore, $R^2$ values close to 1 indicate good model performance." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": true, + "id": "zcdZdkThwrmw", + "jupyter": { + "outputs_hidden": true } + }, + "source": [ + "### **Exercises**\n", + "\n", + "#### a. 
Grid search\n", + "\n", + "Increase the number and range of grid points used for grid search. Which combination of $\\alpha$ and $\\gamma$ works best? How does the computational time increase? Choose a reasonable number of grid points that don't take too long to evaluate." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mqN4J1xVwrmx" + }, + "outputs": [], + "source": [ + "#Choose alpha and gamma on a finer logarithmically spaced grid. The other parameters can be chosen as above.\n", + "alpha = None\n", + "gamma = None\n", + "cv_number=None\n", + "kernel = None\n", + "scoring_function = None\n", + "\n", + "\n", + "grid_search = GridSearchCV(KernelRidge(), #machine learning method (KRR here)\n", + " [{'kernel':[kernel],'alpha': alpha, 'gamma': gamma}],\n", + " cv = cv_number,\n", + " scoring = scoring_function,\n", + " verbose=1000) ## produces detailed output statements of grid search routine\n", + "\n", + "grid_search.fit(x_train, y_train)\n", + "\n", + "results = pd.DataFrame(grid_search.cv_results_)\n", + "\n", + "pvt = pd.pivot_table(results, values='mean_test_score',\n", + " index='param_gamma', columns='param_alpha')\n", + "heatmap = sns.heatmap(-pvt, annot=True, cmap='viridis', cbar_kws={'label': \"Mean absolute error [eV]\"})\n", + "figure = heatmap.get_figure()\n", + "plt.show()\n", + "\n", + "print(\"The best combinations of parameters are %s with a score of %0.3f eV on the validation set.\"\n", + " % (grid_search.best_params_, -grid_search.best_score_))\n", + "\n", + "y_pred = grid_search.predict(x_test) # scikit-learn automatically takes the best combination of hyperparameters from grid search\n", + "\n", + "print(\"Mean absolute error on test set: %0.3f eV\" %(np.abs(y_pred-y_test)).mean())\n", + "\n", + "plt.plot(y_test, y_pred, 'o')\n", + "plt.plot([np.min(y_test),np.max(y_test)], [np.min(y_test),np.max(y_test)], '-')\n", + "plt.xlabel('reference HOMO energy [eV]')\n", + "plt.ylabel('predicted HOMO energy 
[eV]')\n", + "plt.show()\n", + "print(\"R^2 score on test set: %.3f\" % r2_score(y_test, y_pred))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Gsy36Rzowrmy" + }, + "source": [ + "#### b. Kernel function\n", + "\n", + "Use the Gaussian kernel instead of the Laplacian kernel. Which kernel leads to better model performance?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "srWUmjKapUMD" + }, + "outputs": [], + "source": [ + "#Repeat the above calculation with the Gaussian kernel. The Gaussian kernel is identified with the string 'rbf'.\n", + "alpha = None\n", + "gamma = None\n", + "cv_number=None\n", + "kernel = None\n", + "scoring_function = None\n", + "\n", + "grid_search = GridSearchCV(KernelRidge(), #machine learning method (KRR here)\n", + " [{'kernel':[kernel],'alpha': alpha, 'gamma': gamma}],\n", + " cv = cv_number,\n", + " scoring = scoring_function,\n", + " verbose=1000) ## produces detailed output statements of grid search routine\n", + "\n", + "grid_search.fit(x_train, y_train)\n", + "\n", + "results = pd.DataFrame(grid_search.cv_results_)\n", + "\n", + "pvt = pd.pivot_table(results, values='mean_test_score',\n", + " index='param_gamma', columns='param_alpha')\n", + "heatmap = sns.heatmap(-pvt, annot=True, cmap='viridis', cbar_kws={'label': \"Mean absolute error [eV]\"})\n", + "figure = heatmap.get_figure()\n", + "plt.show()\n", + "\n", + "print(\"The best combinations of parameters are %s with a score of %0.3f eV on the validation set.\"\n", + " % (grid_search.best_params_, -grid_search.best_score_))\n", + "\n", + "y_pred = grid_search.predict(x_test) # scikit-learn automatically takes the best combination of hyperparameters from grid search\n", + "\n", + "print(\"Mean absolute error on test set: %0.3f eV\" %(np.abs(y_pred-y_test)).mean())\n", + "\n", + "plt.plot(y_test, y_pred, 'o')\n", + "plt.plot([np.min(y_test),np.max(y_test)], [np.min(y_test),np.max(y_test)], '-')\n", + "plt.xlabel('reference 
HOMO energy [eV]')\n", + "plt.ylabel('predicted HOMO energy [eV]')\n", + "plt.show()\n", + "print(\"R^2 score on test set: %.3f\" % r2_score(y_test, y_pred))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5yJUdzhewrmx" + }, + "source": [ + "#### c. Cross-validation\n", + "\n", + "For this exercise, choose the kernel that performed better in the previous exercise. Increase the number of folds used for cross-validation. Does the quality of the model increase? Take note as well of the increasing computational time and choose a number of folds that does not require too much computational time." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Jqy3rfT9wrmy" + }, + "outputs": [], + "source": [ + "#Repeat the above calculation with a higher number of cross-validation folds. Use the kernel that showed better performance.\n", + "alpha = None\n", + "gamma = None\n", + "cv_number=None\n", + "kernel = None\n", + "scoring_function = None\n", + "\n", + "grid_search = GridSearchCV(KernelRidge(), #machine learning method (KRR here)\n", + " [{'kernel':[kernel],'alpha': alpha, 'gamma': gamma}],\n", + " cv = cv_number,\n", + " scoring = scoring_function,\n", + " verbose=1000) ## produces detailed output statements of grid search routine\n", + "\n", + "grid_search.fit(x_train, y_train)\n", + "\n", + "results = pd.DataFrame(grid_search.cv_results_)\n", + "#pd.DataFrame(grid_search.cv_results_)\n", + "\n", + "pvt = pd.pivot_table(results, values='mean_test_score',\n", + " index='param_gamma', columns='param_alpha')\n", + "heatmap = sns.heatmap(-pvt, annot=True, cmap='viridis', cbar_kws={'label': \"Mean absolute error [eV]\"})\n", + "figure = heatmap.get_figure()\n", + "plt.show()\n", + "\n", + "print(\"The best combinations of parameters are %s with a score of %0.3f eV on the validation set.\"\n", + " % (grid_search.best_params_, -grid_search.best_score_))\n", + "\n", + "y_pred = grid_search.predict(x_test) # scikit-learn 
automatically takes the best combination of hyperparameters from grid search\n", + "\n", + "print(\"Mean absolute error on test set: %0.3f eV\" %(np.abs(y_pred-y_test)).mean())\n", + "\n", + "plt.plot(y_test, y_pred, 'o')\n", + "plt.plot([np.min(y_test),np.max(y_test)], [np.min(y_test),np.max(y_test)], '-')\n", + "plt.xlabel('reference HOMO energy [eV]')\n", + "plt.ylabel('predicted HOMO energy [eV]')\n", + "plt.show()\n", + "print(\"R^2 score on test set: %.3f\" % r2_score(y_test, y_pred))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jjoUSLWuwrmz" + }, + "source": [ + "#### d. Training set size\n", + "\n", + "Increase the size of the training set and plot the mean absolute error and $R^2$ score on the test set, together with the training time, as a function of the training set size (e.g. use 1000, 2000, 3000 etc. as training set sizes). For the MAE and $R^2$ values use a logarithmic time axis. Compare with the previous exercises.\n", + "\n", + "The optimal hyperparameter values for $\\alpha$ and $\\gamma$ can change with the training set size. Therefore, when increasing the training set size, it is recommended to perform a cross-validated grid search for each training set size. For the sake of this exercise, we will limit ourselves to taking the optimal hyperparameters from the previous exercise as an approximation and not perform a grid search. Furthermore, we will not cross-validate our model in this exercise in the interest of time." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0o6HAqu4bYPb" + }, + "outputs": [], + "source": [ + "#We keep the size of the test set constant in this exercise\n", + "n_test = 1000\n", + "\n", + "#We want to iterate over different training set sizes n_train.\n", + "#Prepare an iterator with reasonable choices of n_train.\n", + "n_train_iterator=range(None,None,None)\n", + "\n", + "#These lists we want to fill during the iteration with the mean absolute error,\n", + "#R^2 score and elapsed training time.\n", + "mae_list=[]\n", + "r2_list=[]\n", + "time_list=[]\n", + "\n", + "#In alpha and gamma, we want to save the best choice of parameters from\n", + "#the previous exercise. Compare with the exercise above, if you are unsure\n", + "#how to access them from grid_search.\n", + "alpha=None\n", + "gamma=None\n", + "\n", + "#Use the kernel that has shown better performance.\n", + "kernel=None\n", + "\n", + "for n_train in n_train_iterator:\n", + " x_train = x[0:n_train]\n", + " y_train = y[0:n_train]\n", + "\n", + " x_test = x[n_train:n_train + n_test]\n", + " y_test = y[n_train:n_train + n_test]\n", + "\n", + " #Here we save the starting time of the kernel ridge training.\n", + " start = time.time()\n", + "\n", + " #In the following we will perform the kernel ridge training without\n", + " #cross-validation. For this we use the Object KernelRidge, which\n", + " #has previously been used as the estimator in the cross-validation.\n", + " #We need to set alpha, gamma and the kernel. 
For documentation, see:\n", + " #https://scikit-learn.org/stable/modules/generated/sklearn.kernel_ridge.KernelRidge.html\n", + " kernel_ridge=None\n", + "\n", + " #After setting up the kernel_ridge object, we need to train with our\n", + " #training data using the method fit and make predictions on the test set\n", + " #using the method predict in analogy to the cross-validation example.\n", + "\n", + " #Fill in your answer here!\n", + "\n", + " #Here we save the ending time of the kernel ridge training.\n", + " end = time.time()\n", + "\n", + " #Compute the elapsed training time from the starting and ending times and\n", + " #append it to the list\n", + " time_list.append(None)\n", + " #Append the mean absolute error to the following list. Refer to the\n", + " #previous exercise, if you are unsure how to compute it.\n", + " mae_list.append(None)\n", + " r2_list.append(r2_score(y_test, y_pred))\n", + "\n", + "#In the following, plot the size of the training set versus the elapsed time,\n", + "#mean absolute error and R^2 score in three separate plots.\n", + "\n", + "#Fill in your answer here!\n", + "\n", + "#Here we plot the exact vs. the predicted HOMO energies for our largest training\n", + "#set size. Can you see the reduced error from this plot in comparison with\n", + "#the previous exercises?\n", + "plt.plot(y_test, y_pred, 'o')\n", + "plt.plot([np.min(y_test),np.max(y_test)], [np.min(y_test),np.max(y_test)], '-')\n", + "plt.xlabel('reference HOMO energy [eV]')\n", + "plt.ylabel('predicted HOMO energy [eV]')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HNAUxo7IYwkX" + }, + "source": [ + "##### If you did everything correctly, the plots should look something like the following. 
Note that because of the random data shuffling in the beginning, they will not look exactly the same.\n", + "\n", + "![image.png]()\n", + "\n", + "![image.png]()\n", + "\n", + "![image.png]()\n", + "\n", + "![image.png]()" + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "colab": { + "collapsed_sections": [ + "0FxS6Kazwrmo" + ], + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" }, - "nbformat": 4, - "nbformat_minor": 0 -} \ No newline at end of file + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.18" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/AI4Spec_Tutorial2.ipynb b/AI4Spec_Tutorial2.ipynb index ad52a2b..1450ba5 100644 --- a/AI4Spec_Tutorial2.ipynb +++ b/AI4Spec_Tutorial2.ipynb @@ -36,7 +36,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": { "id": "qdphsPPRwwNF" }, @@ -66,7 +66,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -74,36 +74,7 @@ "id": "jTZ_1hpqLBK5", "outputId": "efa921b4-fb04-4fe9-dc78-8e039f748f37" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "--2024-03-04 21:34:38-- https://zenodo.org/records/10069732/files/coulomb_7000.npz\n", - "Resolving zenodo.org (zenodo.org)... 188.184.103.159, 188.184.98.238, 188.185.79.172, ...\n", - "Connecting to zenodo.org (zenodo.org)|188.184.103.159|:443... connected.\n", - "HTTP request sent, awaiting response... 
200 OK\n", - "Length: 47096264 (45M) [application/octet-stream]\n", - "Saving to: ‘coulomb_7000.npz.1’\n", - "\n", - "coulomb_7000.npz.1 100%[===================>] 44.91M 46.3MB/s in 1.0s \n", - "\n", - "2024-03-04 21:34:39 (46.3 MB/s) - ‘coulomb_7000.npz.1’ saved [47096264/47096264]\n", - "\n", - "--2024-03-04 21:34:40-- https://zenodo.org/records/10069732/files/energies_7000.npz\n", - "Resolving zenodo.org (zenodo.org)... 188.184.98.238, 188.185.79.172, 188.184.103.159, ...\n", - "Connecting to zenodo.org (zenodo.org)|188.184.98.238|:443... connected.\n", - "HTTP request sent, awaiting response... 200 OK\n", - "Length: 896264 (875K) [application/octet-stream]\n", - "Saving to: ‘energies_7000.npz.1’\n", - "\n", - "energies_7000.npz.1 100%[===================>] 875.26K 4.13MB/s in 0.2s \n", - "\n", - "2024-03-04 21:34:40 (4.13 MB/s) - ‘energies_7000.npz.1’ saved [896264/896264]\n", - "\n" - ] - } - ], + "outputs": [], "source": [ "!wget https://zenodo.org/records/10069732/files/coulomb_7000.npz\n", "!wget https://zenodo.org/records/10069732/files/energies_7000.npz" @@ -111,7 +82,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -119,15 +90,7 @@ "id": "6tTcc8Mtwrmd", "outputId": "e96ae7fe-8e7b-49f5-96d8-f54f9980ddb5" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Number of molecules: 7000\n" - ] - } - ], + "outputs": [], "source": [ "# Here we load all the data\n", "x = np.abs(np.load(\"coulomb_7000.npz\")['arr_0'])\n", @@ -154,7 +117,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -162,102 +125,7 @@ "id": "7CovKOElwrmg", "outputId": "63b5843a-2a4e-4ae5-92d3-69efb8bf4ce9" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "(29, 29)\n", - "[[36.858 23.545 19.643 14.298 12.615 8.577 10.93 14.165 5.46 
5.451\n", - " 5.466 2.245 2.124 2.203 1.135 2.263 2.071 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [23.545 36.858 33.353 23.482 19.429 13.106 16.645 23.444 2.74 2.752\n", - " 2.774 3.039 2.741 2.759 1.571 2.759 2.788 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [19.643 33.353 73.517 19.681 21.95 15.243 18.479 20.284 2.35 2.933\n", - " 2.975 8.241 2.959 2.359 1.972 2.415 3.243 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [14.298 23.482 19.681 36.858 33.39 15.238 15.014 14.537 2.16 2.15\n", - " 1.729 2.339 5.456 5.44 1.819 2.07 1.759 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [12.615 19.429 21.95 33.39 73.517 35.114 23.509 16.879 1.95 1.931\n", - " 1.732 2.55 3.934 3.857 3.966 2.322 2.135 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 8.577 13.106 15.243 15.238 35.114 36.858 33.114 15.338 1.343 1.247\n", - " 1.253 1.617 1.854 2.131 5.488 2.008 1.916 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [10.93 16.645 18.479 15.014 23.509 33.114 53.359 28.812 1.713 1.496\n", - " 1.667 1.825 1.835 2.231 3.402 3.348 3.366 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [14.165 23.444 20.284 14.537 16.879 15.338 28.812 36.858 2.122 1.714\n", - " 2.161 1.863 1.738 2.159 1.784 5.445 5.453 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 5.46 2.74 2.35 2.16 1.95 1.343 1.713 2.122 0.5 0.562\n", - " 0.561 0.273 0.313 0.396 0.18 0.4 0.298 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 5.451 2.752 2.933 2.15 1.931 1.247 1.496 1.714 0.562 0.5\n", - " 0.563 0.392 0.377 0.324 0.171 0.27 0.259 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 5.466 2.774 2.975 1.729 1.732 1.253 1.667 2.161 0.561 0.563\n", - " 0.5 0.328 0.262 0.268 0.17 0.346 0.369 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 2.245 3.039 8.241 2.339 2.55 1.617 1.825 1.863 0.273 0.392\n", - " 0.328 0.5 0.415 0.28 0.22 0.245 0.293 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 
]\n", - " [ 2.124 2.741 2.959 5.456 3.934 1.854 1.835 1.738 0.313 0.377\n", - " 0.262 0.415 0.5 0.561 0.247 0.256 0.234 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 2.203 2.759 2.359 5.44 3.857 2.131 2.231 2.159 0.396 0.324\n", - " 0.268 0.28 0.561 0.5 0.27 0.361 0.262 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 1.135 1.571 1.972 1.819 3.966 5.488 3.402 1.784 0.18 0.171\n", - " 0.17 0.22 0.247 0.27 0.5 0.253 0.248 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 2.263 2.759 2.415 2.07 2.322 2.008 3.348 5.445 0.4 0.27\n", - " 0.346 0.245 0.256 0.361 0.253 0.5 0.567 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 2.071 2.788 3.243 1.759 2.135 1.916 3.366 5.453 0.298 0.259\n", - " 0.369 0.293 0.234 0.262 0.248 0.567 0.5 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 
0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]\n", - " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", - " 0. 0. 0. 0. 0. 0. 0. 0. 0. ]]\n" - ] - } - ], + "outputs": [], "source": [ "rand_mol = random.randint(0, len(y))\n", "\n", @@ -279,7 +147,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", @@ -288,27 +156,7 @@ "id": "GJu2RtUhwrmh", "outputId": "0d52abe6-0262-4ac4-d53b-4f5e4d71ef3b" }, - "outputs": [ - { - "data": { - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "shape = (29, 29)\n", "mat = x[rand_mol].reshape(shape)\n", @@ -332,7 +180,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -340,19 +188,7 @@ "id": "u44zKs0_4IJv", "outputId": "603a67c1-86f4-4d47-a578-940126f085f4" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[-13.97184 -12.99997 -12.46609 -11.83582 -11.2511 -10.46114 -10.41363\n", - " -10.08263 -9.84337 -9.36033 -8.81967 -8.51355 -8.09764 -6.67936\n", - " -6.13557 -5.84277]\n", - "-5.84277\n", - "[-6.67936 -6.13557 -5.84277]\n" - ] - } - ], + "outputs": [], "source": [ "# Here we review the entire vector of orbital energies\n", "print(energies[rand_mol,:])\n", @@ -375,7 +211,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", @@ -384,26 +220,7 @@ "id": "YxjygJgJwrmj", "outputId": "89f2f6cc-45de-4473-a771-691cfc00f932" }, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Mean value of HOMO energies in QM9 dataset: -5.75 eV\n", - "[-9.42518 -8.12121 -6.48731 ... -5.94942 -5.47667 -5.69919]\n" - ] - } - ], + "outputs": [], "source": [ "plt.hist(y, bins=20, density=False, facecolor='blue')\n", "plt.xlabel(\"Energy [eV]\")\n", @@ -427,7 +244,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": { "id": "6luJyqCKwrmk" }, @@ -455,7 +272,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "metadata": { "id": "wmnl8ZzAwrmm" }, @@ -491,7 +308,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", @@ -500,27 +317,7 @@ "id": "w_H7kOBTwrmn", "outputId": "2a4683d7-30dc-4419-cf0a-e064632f88c2" }, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAj8AAAG2CAYAAACQ++e6AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAPYQAAD2EBqD+naQAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzs3XtclGX+//H3gHJQTkICsqKglocEPJvaFqapWG4ediu10jLt4KF0XZXNVKwVtSwz7Wzi7upabZnlFp5RMyRPiJpZmqdWwAxlAteRw/z+8Od8G0FkcIYZnNfz8bgfX+7TdX9m5Lu8u67rvm+D2Ww2CwAAwE14OLsAAACA6kT4AQAAboXwAwAA3ArhBwAAuBXCDwAAcCuEHwAA4FYIPwAAwK0QfgAAgFsh/AAAALdC+AEAAG7FqeEnOTlZHTt2lL+/v0JDQ9W/f38dOnTIsj8vL09jx45V8+bN5evrq0aNGmncuHHKz8+3asdgMJRZVqxYUd0fBwAA1ABODT+bN2/W6NGjtX37dq1bt05FRUXq1auXCgsLJUmnTp3SqVOn9PLLL2v//v1KSUlRamqqRowYUaatJUuWKDs727L079+/uj8OAACoAQyu9GLTn3/+WaGhodq8ebPuuOOOco/56KOP9NBDD6mwsFC1atWSdKnnZ+XKlQQeAABwTbWcXcBvXR7OCg4OrvCYgIAAS/C5bPTo0Xr88cfVpEkTPfnkk3r00UdlMBjKbcNkMslkMlnWS0tLlZeXp5CQkKueAwAAXIvZbNavv/6qiIgIeXjYMJhldhElJSXme+65x9ytW7erHvPzzz+bGzVqZP7rX/9qtX3mzJnmr776yrx7927z7Nmzzd7e3ubXXnvtqu1Mnz7dLImFhYWFhYXlBlhOnjxpU+ZwmWGvp556Sl9++aW++uorNWzYsMx+o9Gou+++W8HBwfrss89Uu3btq7Y1bdo0LVmyRCdPnix3/5U9P/n5+WrUqJFOnjypgICA6/8wAADA4YxGoyI
jI3Xu3DkFBgZW+jyXGPYaM2aMVq9erS1btpQbfH799Vf16dNH/v7+WrlyZYXBR5I6d+6sF154QSaTSd7e3mX2e3t7l7s9ICCA8AMAQA1j65QVp97tZTabNWbMGK1cuVIbN25UdHR0mWOMRqN69eolLy8vffbZZ/Lx8blmu5mZmapXr165AQcAALg3p/b8jB49WsuXL9eqVavk7++vnJwcSVJgYKB8fX0twef8+fP65z//KaPRKKPRKEmqX7++PD099fnnnys3N1e33XabfHx8tG7dOs2aNUsTJ0505kcDAAAuyqlzfq7WTbVkyRINHz5caWlp6t69e7nHHD16VFFRUUpNTVViYqIOHz4ss9msZs2a6amnntLIkSMrPfPbaDQqMDDQcicZAABwfVX9++0yE56difADAI5XUlKioqIiZ5eBGqR27dry9PS86v6q/v12iQnPAIAbl9lsVk5Ojs6dO+fsUlADBQUFKTw83K7P4SP8AAAc6nLwCQ0NVZ06dXiYLCrFbDbr/PnzOn36tCSpQYMGdmub8AMAcJiSkhJL8AkJCXF2OahhfH19JUmnT59WaGhohUNgtnDqre4AgBvb5Tk+derUcXIlqKku/+7Yc74Y4QcA4HAMdaGqHPG7Q/gBAABuhfADAADcChOeAQDOMWOGS18vPj5ebdq00fz58+1WwvDhw3Xu3Dl9+umndmvzSseOHVN0dLT27NmjNm3aOOw6NRk9PwAAwK0QfgAAuMLw4cO1efNmvfbaazIYDDIYDDp27Jgkaf/+/UpISJCfn5/CwsL08MMP68yZM5Zz//3vfysmJka+vr4KCQlRz549VVhYqBkzZmjp0qVatWqVpc20tLRyr3+1Ni5777331LJlS/n4+KhFixZ64403LPsuvyS8bdu2MhgMio+Pt/v3U9Mx7AUAwBVee+01ff/992rdurVmzpwp6dILtc+dO6e77rpLjz/+uF599VX973//0+TJk3X//fdr48aNys7O1uDBgzV37lwNGDBAv/76q7Zu3Sqz2ayJEyfq4MGDMhqNWrJkiSQpODi4zLUrakOSli1bpmnTpmnhwoVq27at9uzZo5EjR6pu3boaNmyYvvnmG3Xq1Enr16/XrbfeKi8vr+r74moIwg8A3CCu1otQGfQOWAsMDJSXl5fq1Kmj8PBwy/bLgWPWrFmWbe+//74iIyP1/fffq6CgQMXFxRo4cKAaN24sSYqJibEc6+vrK5PJZNXmlbKzsytsY/r06Zo3b54GDhwo6VJPz7fffqu3335bw4YNU/369SVJISEhFV7HnRF+AACopL1792rTpk3y8/Mrs+/IkSPq1auXevTooZiYGPXu3Vu9evXSH//4R9WrV6/S14iLi7tqG4WFhTpy5IhGjBihkSNHWs4pLi5WYGCgXT6jOyD8AABQSQUFBerXr5/mzJlTZl+DBg3k6empdevW6euvv9batWv1+uuv67nnnlNGRoZlLs61VNTG5acdv/vuu+rcuXOZ81A5THgGAKAcXl5eKikpsdrWrl07HThwQFFRUWrWrJnVUrduXUmXnkjcrVs3JSUlac+ePfLy8tLKlSuv2mZ5rtZGWFiYIiIi9OOPP5a5/uVwdXmOT2Wu467o+QEAoBxRUVHKyMjQsWPH5Ofnp+DgYI0ePVrvvvuuBg8erEmTJik4OFiHDx/WihUr9N5772nnzp3asGGDevXqpdDQUGVkZOjnn39Wy5YtLW2uWbNGhw4dUkhIiAIDA1W7dm2r62ZkZFTYRlJSksaNG6fAwED16dNHJpNJO3fu1NmzZzVhwgSFhobK19dXqampatiwoXx8fBgSuwI9PwAAlGPixIny9PRUq1atVL9+fZ04cUIRERHatm2bSkpK1KtXL8XExOjZZ59VUFCQPDw8FBAQoC1btqhv37665ZZbNHXqVM2bN08JCQmSpJEjR6p58+bq0KGD6tevr23btpW57rXaePzxx/Xee+9pyZIliomJ0Z133qmUlBR
Lz0+tWrW0YMECvf3224qIiNB9991XfV9aDWEwX753zo0ZjUYFBgYqPz9fAQEBzi4HAKrEFe/2unDhgo4eParo6Gj5+Pg45Bq4sVX0O1TVv9/0/AAAALdC+AEAAG6F8AMAANwK4QcAALgVwg8AAHArhB8AAOBWCD8AAMCtEH4AAIBbIfwAAAC3QvgBAMBBoqKiNH/+fMu6wWDQp59+etXjjx07JoPBoMzMzOu6rr3auVHxYlMAgHNkzaje68VW8/XKkZ2drXr16tm1zeHDh+vcuXNWoSoyMlLZ2dm66aab7HotezMYDFq5cqX69+9frdcl/AAAUE3Cw8Or5Tqenp7Vdq2ayKnDXsnJyerYsaP8/f0VGhqq/v3769ChQ1bHXLhwQaNHj1ZISIj8/Pw0aNAg5ebmWh1z4sQJ3XPPPapTp45CQ0P1l7/8RcXFxdX5UQAAN5B33nlHERERKi0ttdp+33336bHHHpMkHTlyRPfdd5/CwsLk5+enjh07av369RW2e+Ww1zfffKO2bdvKx8dHHTp00J49e6yOLykp0YgRIxQdHS1fX181b95cr732mmX/jBkztHTpUq1atUoGg0EGg0FpaWnlDntt3rxZnTp1kre3txo0aKApU6ZY/a2Mj4/XuHHjNGnSJAUHBys8PFwzZsyo8POkpaWpU6dOqlu3roKCgtStWzcdP37csn/VqlVq166dfHx81KRJEyUlJVmuGRUVJUkaMGCADAaDZb06ODX8bN68WaNHj9b27du1bt06FRUVqVevXiosLLQcM378eH3++ef66KOPtHnzZp06dUoDBw607C8pKdE999yjixcv6uuvv9bSpUuVkpKiadOmOeMjAQBuAH/605/0yy+/aNOmTZZteXl5Sk1N1dChQyVJBQUF6tu3rzZs2KA9e/aoT58+6tevn06cOFGpaxQUFOjee+9Vq1attGvXLs2YMUMTJ060Oqa0tFQNGzbURx99pG+//VbTpk3TX//6V3344YeSpIkTJ+r+++9Xnz59lJ2drezsbHXt2rXMtf773/+qb9++6tixo/bu3as333xTixcv1osvvmh13NKlS1W3bl1lZGRo7ty5mjlzptatW1du/cXFxerfv7/uvPNOZWVlKT09XaNGjZLBYJAkbd26VY888oieeeYZffvtt3r77beVkpKiv/3tb5KkHTt2SJKWLFmi7Oxsy3p1cOqwV2pqqtV6SkqKQkNDtWvXLt1xxx3Kz8/X4sWLtXz5ct11112SLn1JLVu21Pbt23Xbbbdp7dq1+vbbb7V+/XqFhYWpTZs2euGFFzR58mTNmDFDXl5ezvhoAIAarF69ekpISNDy5cvVo0cPSdK///1v3XTTTerevbskKS4uTnFxcZZzXnjhBa1cuVKfffaZxowZc81rLF++XKWlpVq8eLF8fHx066236qefftJTTz1lOaZ27dpKSkqyrEdHRys9PV0ffvih7r//fvn5+cnX11cmk6nCYa433nhDkZGRWrhwoQwGg1q0aKFTp05p8uTJmjZtmjw8LvWFxMbGavr06ZKkm2++WQsXLtSGDRt09913l2nTaDQqPz9f9957r5o2bSpJatmypWV/UlKSpkyZomHDhkmSmjRpohdeeEGTJk3S9OnTVb9+fUlSUFBQtQ/RudTdXvn5+ZKk4OBgSdKuXbtUVFSknj17Wo5p0aKFGjVqpPT0dElSenq6YmJiFBYWZjmmd+/eMhqNOnDgQLnXMZlMMhqNVgsAAL81dOhQffzxxzKZTJKkZcuW6cEHH7QEhYKCAk2cOFEtW7ZUUFCQ/Pz8dPDgwUr3/Bw8eFCxsbHy8fGxbOvSpUuZ4xYtWqT27durfv368vPz0zvvvFPpa/z2Wl26dLH0ykhSt27dVFBQoJ9++smyLTY21uq8Bg0a6PTp0+W2GRwcrOHDh6t3797q16+fXnvtNWVnZ1v27927VzNnzpSfn59lGTlypLKzs3X+/Hmb6rc3lwk/paWlevbZZ9W
tWze1bt1akpSTkyMvLy8FBQVZHRsWFqacnBzLMb8NPpf3X95XnuTkZAUGBlqWyMhIe38cAEAN169fP5nNZv3nP//RyZMntXXrVsuQl3RpyGnlypWaNWuWtm7dqszMTMXExOjixYt2q2HFihWaOHGiRowYobVr1yozM1OPPvqoXa/xW7Vr17ZaNxgMZeY9/daSJUuUnp6url276oMPPtAtt9yi7du3S7oUDpOSkpSZmWlZ9u3bpx9++MEq8DmDy9ztNXr0aO3fv19fffWVw6+VmJioCRMmWNaNRiMBCABgxcfHRwMHDtSyZct0+PBhNW/eXO3atbPs37Ztm4YPH64BAwZIuvTH/tixY5Vuv2XLlvrHP/6hCxcuWMLA5eDw22t07dpVTz/9tGXbkSNHrI7x8vJSSUnJNa/18ccfy2w2W3p/tm3bJn9/fzVs2LDSNZenbdu2atu2rRITE9WlSxctX75ct912m9q1a6dDhw6pWbNmVz23du3a16zdEVyi52fMmDFavXq1Nm3aZPWPEB4erosXL+rcuXNWx+fm5lrGB8PDw8vc/XV5/WpjiN7e3goICLBaAAC40tChQ/Wf//xH77//vlWvj3RpTswnn3yizMxM7d27V0OGDKmwl+RKQ4YMkcFg0MiRI/Xtt9/qiy++0Msvv1zmGjt37tSaNWv0/fff6/nnny8zMTgqKkpZWVk6dOiQzpw5o6KiojLXevrpp3Xy5EmNHTtW3333nVatWqXp06drwoQJlmE8Wx09elSJiYlKT0/X8ePHtXbtWv3www+WeT/Tpk3T3//+dyUlJenAgQM6ePCgVqxYoalTp1rVvmHDBuXk5Ojs2bNVqqMqnBp+zGazxowZo5UrV2rjxo2Kjo622t++fXvVrl1bGzZssGw7dOiQTpw4YRkX7dKli/bt22c1Jrlu3ToFBASoVatW1fNBAAA3pLvuukvBwcE6dOiQhgwZYrXvlVdeUb169dS1a1f169dPvXv3tuoZuhY/Pz99/vnn2rdvn9q2bavnnntOc+bMsTrmiSee0MCBA/XAAw+oc+fO+uWXX6x6gSRp5MiRat68uTp06KD69etr27ZtZa71u9/9Tl988YW++eYbxcXF6cknn9SIESOsgoit6tSpo++++06DBg3SLbfcolGjRmn06NF64oknJF2af7t69WqtXbtWHTt21G233aZXX31VjRs3trQxb948rVu3TpGRkWrbtm2Va7GVwWw2m6vtald4+umntXz5cq1atUrNmze3bA8MDJSvr68k6amnntIXX3yhlJQUBQQEaOzYsZKkr7/+WtKlW93btGmjiIgIzZ07Vzk5OXr44Yf1+OOPa9asWZWqw2g0KjAwUPn5+fQCAaix0tLSqnxufHy83er4rQsXLujo0aOKjo52+jwP1EwV/Q5V9e+3U+f8vPnmm5LK/j/dkiVLNHz4cEnSq6++Kg8PDw0aNEgmk0m9e/fWG2+8YTnW09NTq1ev1lNPPaUuXbqobt26GjZsmGbOnFldHwMAANQgTg0/lel08vHx0aJFi7Ro0aKrHtO4cWN98cUX9iwNAADcoFxiwjMAAEB1IfwAAAC3QvgBAABuhfADAADcCuEHAAC4FcIPAABwK4QfAADgVgg/AABUg6ioKM2fP7/Sx6elpclgMJR5vyWun8u81R0A4F6u53UcVWHrKzzi4+PVpk0bmwJLRXbs2KG6detW+viuXbsqOztbgYGBdrm+o9j7e6oOhB8AAKrIbDarpKREtWpd+89p/fr1bWrby8tL4eHhVS0NFWDYCwCAKwwfPlybN2/Wa6+9JoPBIIPBoGPHjlmGor788ku1b99e3t7e+uqrr3TkyBHdd999CgsLk5+fnzp27Kj169dbtXnlsJfBYNB7772nAQMGqE6dOrr55pv12WefWfZfOeyVkpKioKAgrVmzRi1btpSfn5/69Omj7OxsyznFxcUaN26cgoKCFBISosmTJ2vYsGHq37//VT/r8ePH1a9
am5sREREB4M1t5P+WkJCAMWPGYMeOHdizZw+2bNnySTE+evQIarUaMTEx722n0+kwYcIEWK1WCCGQnp7e7USLiNyHa7BE5BLt7e149OiRbHv3Tq2PWbx4MVpaWpCdnY3q6mrU1dXhxIkTyM3NhcPheG+/yZMnQ6fTIScnB9evX4fNZkN+fj6ANytL7zKbzdiwYQOEEJg5c2a3zzE9PR3jxo2DwWDAyZMnUV9fjwsXLiAvLw9XrlyRtTWZTDh8+DBKS0t5oTORl2HyQ0QuUVFRgaioKNmWmprqdP9+/frBZrPB4XAgIyMDCQkJWLZsGYKDgz94rYyvry/Kyspgt9vxzTffwGw2S3d7BQQEyNpmZ2dDrVYjOzu7S50zVCoVjh8/ju+//x65ubmIi4vDnDlz0NDQIK06vTVr1iz4+/sjMDAQBoOh28ciIvdRCSGEp4MgInIlm82G1NRU3L59GzqdTiqvr6+HTqdDdXU1kpKSPjhGQUEBysrKcO3aNbfFee7cOaSlpaG1tVV2zRARuRev+SGiL15paSk0Gg1iY2Nx+/ZtLF26FN99952U+HR0dODZs2fIz8/Ht99++9HE562//voLGo0Gv/76KxYtWuTSmOPj43Hnzh2XjklEzuHKDxF98Xbt2oW1a9fi3r17CA0NRXp6OgoLC6Vb0t+usMTFxeHgwYNISEj46JgtLS3SPy4MCwtD7969XRpzQ0MDOjo6AACDBw/mbfBEnxGTHyIiIlIUftUgIiIiRWHyQ0RERIrC5IeIiIgUhckPERERKQqTHyIiIlIUJj9ERESkKEx+iIiISFGY/BAREZGi/A+cwmA9GFG9awAAAABJRU5ErkJggg==\n", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Mean value of HOMO energies in training set: -5.74 eV\n", - "Mean value of HOMO energies in validation set: -5.72 eV\n", - "Mean value of HOMO energies in test set: -5.77 eV\n" - ] - } - ], + "outputs": [], "source": [ "plt.hist(Y_test, bins=20, density=False, alpha=0.5, facecolor='red', label='test set')\n", "plt.hist(Y_val, bins=20, density=False, alpha=0.5, facecolor='orange', label='validation set')\n", @@ -562,7 +359,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": null, "metadata": { "id": "mFuHLTSsXc_X" }, @@ -614,7 +411,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": null, "metadata": { "id": "-Hg5y-eaoWuw" }, @@ -751,7 +548,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", @@ -806,78 +603,7 @@ "id": "rQNzguz0oFUM", "outputId": "9f10889f-3123-4c4d-f9d6-9ec5c19c18c0" }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "GPU available: False, used: False\n", - "TPU available: False, using: 0 TPU cores\n", - "IPU available: False, using: 0 IPUs\n", - "HPU available: False, using: 0 HPUs\n", - "\n", - " | Name | Type | Params\n", - "--------------------------------------\n", - "0 | conv1_1 | Conv2d | 200 \n", - "1 | conv1_2 | Conv2d | 3.6 K \n", - "2 | conv1_3 | Conv2d | 724 \n", - "3 | pool | MaxPool2d | 0 \n", - "4 | fc1 | Linear | 197 \n", - "--------------------------------------\n", - "4.7 K Trainable params\n", - "0 Non-trainable params\n", - "4.7 K Total params\n", - "0.019 Total estimated model params size (MB)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Sanity Checking: 0it [00:00, ?it/s]torch.Size([500, 1, 29, 29])\n", - "torch.Size([1000, 1, 29, 29]) \n", - "Epoch 0: 67%|██████▋ | 16/24 [00:00<00:00, 
31.42it/s, loss=9.1, v_num=2] \n", - "Validation: 0it [00:00, ?it/s]\u001b[A\n", - "Validation: 0%| | 0/8 [00:00" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", @@ -1212,356 +911,7 @@ "id": "m7AEMR2M_cMi", "outputId": "2b4a0512-ede6-42c3-9a79-ad89aa673483" }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "INFO: GPU available: False, used: False\n", - "INFO:lightning.pytorch.utilities.rank_zero:GPU available: False, used: False\n", - "INFO: TPU available: False, using: 0 TPU cores\n", - "INFO:lightning.pytorch.utilities.rank_zero:TPU available: False, using: 0 TPU cores\n", - "INFO: IPU available: False, using: 0 IPUs\n", - "INFO:lightning.pytorch.utilities.rank_zero:IPU available: False, using: 0 IPUs\n", - "INFO: HPU available: False, using: 0 HPUs\n", - "INFO:lightning.pytorch.utilities.rank_zero:HPU available: False, using: 0 HPUs\n", - "INFO: \n", - " | Name | Type | Params\n", - "--------------------------------------\n", - "0 | conv1_1 | Conv2d | 200 \n", - "1 | conv1_2 | Conv2d | 3.6 K \n", - "2 | conv1_3 | Conv2d | 724 \n", - "3 | pool | MaxPool2d | 0 \n", - "4 | fc1 | Linear | 591 \n", - "--------------------------------------\n", - "5.1 K Trainable params\n", - "0 Non-trainable params\n", - "5.1 K Total params\n", - "0.021 Total estimated model params size (MB)\n", - "INFO:lightning.pytorch.callbacks.model_summary:\n", - " | Name | Type | Params\n", - "--------------------------------------\n", - "0 | conv1_1 | Conv2d | 200 \n", - "1 | conv1_2 | Conv2d | 3.6 K \n", - "2 | conv1_3 | Conv2d | 724 \n", - "3 | pool | MaxPool2d | 0 \n", - "4 | fc1 | Linear | 591 \n", - "--------------------------------------\n", - "5.1 K Trainable params\n", - "0 Non-trainable params\n", - "5.1 K Total params\n", - "0.021 Total estimated model params size (MB)\n" - ] - }, - { - "data": { - "application/vnd.jupyter.widget-view+json": { 
- "model_id": "da19202d546040338698cd8021ad3d20", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "Sanity Checking: | | 0/? [00:00" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", @@ -1702,7 +1032,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.12" + "version": "3.9.18" }, "widgets": { "application/vnd.jupyter.widget-state+json": { @@ -5814,5 +5144,5 @@ } }, "nbformat": 4, - "nbformat_minor": 1 + "nbformat_minor": 4 } diff --git a/AI4Spec_Tutorial3.ipynb b/AI4Spec_Tutorial3.ipynb index 01fe11e..eccb327 100644 --- a/AI4Spec_Tutorial3.ipynb +++ b/AI4Spec_Tutorial3.ipynb @@ -36,7 +36,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": { "id": "qdphsPPRwwNF" }, @@ -66,7 +66,7 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -74,36 +74,7 @@ "id": "jTZ_1hpqLBK5", "outputId": "86bd19b3-a735-4334-cbd0-e5416e7bc1db" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "--2024-03-04 21:47:36-- https://zenodo.org/records/10069732/files/coulomb_7000.npz\n", - "Resolving zenodo.org (zenodo.org)... 188.184.103.159, 188.184.98.238, 188.185.79.172, ...\n", - "Connecting to zenodo.org (zenodo.org)|188.184.103.159|:443... connected.\n", - "HTTP request sent, awaiting response... 200 OK\n", - "Length: 47096264 (45M) [application/octet-stream]\n", - "Saving to: ‘coulomb_7000.npz.3’\n", - "\n", - "coulomb_7000.npz.3 100%[===================>] 44.91M 43.4MB/s in 1.0s \n", - "\n", - "2024-03-04 21:47:38 (43.4 MB/s) - ‘coulomb_7000.npz.3’ saved [47096264/47096264]\n", - "\n", - "--2024-03-04 21:47:38-- https://zenodo.org/records/10069732/files/spectra_7000.npz\n", - "Resolving zenodo.org (zenodo.org)... 
188.184.98.238, 188.185.79.172, 188.184.103.159, ...\n", - "Connecting to zenodo.org (zenodo.org)|188.184.98.238|:443... connected.\n", - "HTTP request sent, awaiting response... 200 OK\n", - "Length: 16800264 (16M) [application/octet-stream]\n", - "Saving to: ‘spectra_7000.npz.1’\n", - "\n", - "spectra_7000.npz.1 100%[===================>] 16.02M 27.7MB/s in 0.6s \n", - "\n", - "2024-03-04 21:47:39 (27.7 MB/s) - ‘spectra_7000.npz.1’ saved [16800264/16800264]\n", - "\n" - ] - } - ], + "outputs": [], "source": [ "!wget https://zenodo.org/records/10069732/files/coulomb_7000.npz\n", "!wget https://zenodo.org/records/10069732/files/spectra_7000.npz" @@ -111,7 +82,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": { "id": "6tTcc8Mtwrmd" }, @@ -140,7 +111,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -148,17 +119,7 @@ "id": "aHF8AUmF54MI", "outputId": "3517a7da-9b21-4070-e13f-8d9f99cc19e3" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Number of molecules: 7000\n", - "(7000, 29, 29)\n", - "(7000, 300)\n" - ] - } - ], + "outputs": [], "source": [ "print(\"Number of molecules:\", len(y))\n", "print(x.shape)\n", @@ -176,7 +137,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -184,89 +145,7 @@ "id": "pst07BLjt3WC", "outputId": "1d32167a-bb45-4a69-e39a-fcdf89b886e4" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[3.04832655e-50 3.06806541e-49 3.01570308e-48 2.89489880e-47\n", - " 2.71393245e-46 2.48476615e-45 2.22173804e-44 1.94008617e-43\n", - " 1.65451241e-42 1.37797038e-41 1.12080633e-40 8.90311753e-40\n", - " 6.90676183e-39 5.23272240e-38 3.87170047e-37 2.79767105e-36\n", - " 1.97429655e-35 1.36065796e-34 9.15812126e-34 6.01983591e-33\n", - " 3.86441500e-32 
2.42272269e-31 1.48335295e-30 8.86964285e-30\n", - " 5.17950923e-29 2.95387378e-28 1.64519016e-27 8.94872406e-27\n", - " 4.75364755e-26 2.46611708e-25 1.24945682e-24 6.18229421e-24\n", - " 2.98743827e-23 1.40983754e-22 6.49770612e-22 2.92463671e-21\n", - " 1.28559645e-20 5.51897315e-20 2.31383714e-19 9.47388692e-19\n", - " 3.78830135e-18 1.47938690e-17 5.64208946e-17 2.10145037e-16\n", - " 7.64397590e-16 2.71544189e-15 9.42069175e-15 3.19187739e-14\n", - " 1.05616263e-13 3.41300286e-13 1.07711944e-12 3.31980474e-12\n", - " 9.99269925e-12 2.93747957e-11 8.43313118e-11 2.36442237e-10\n", - " 6.47416495e-10 1.73127204e-09 4.52137115e-09 1.15318456e-08\n", - " 2.87244641e-08 6.98763623e-08 1.66010220e-07 3.85182633e-07\n", - " 8.72825255e-07 1.93160511e-06 4.17485972e-06 8.81252655e-06\n", - " 1.81675787e-05 3.65793027e-05 7.19316555e-05 1.38151756e-04\n", - " 2.59149940e-04 4.74801636e-04 8.49668367e-04 1.48516488e-03\n", - " 2.53572527e-03 4.22911400e-03 6.89028598e-03 1.09670833e-02\n", - " 1.70546068e-02 2.59134880e-02 3.84758353e-02 5.58317942e-02\n", - " 7.91899922e-02 1.09807092e-01 1.48885504e-01 1.97443802e-01\n", - " 2.56170828e-01 3.25280562e-01 4.04388994e-01 4.92434925e-01\n", - " 5.87662791e-01 6.87677324e-01 7.89568351e-01 8.90091688e-01\n", - " 9.85881892e-01 1.07366742e+00 1.15046005e+00 1.21369831e+00\n", - " 1.26133673e+00 1.29188598e+00 1.30441906e+00 1.29856294e+00\n", - " 1.27449226e+00 1.23293286e+00 1.17517169e+00 1.10305917e+00\n", - " 1.01898532e+00 9.25812377e-01 8.26755415e-01 7.25214793e-01\n", - " 6.24577682e-01 5.28015551e-01 4.38308338e-01 3.57722671e-01\n", - " 2.87962132e-01 2.30194342e-01 1.85145792e-01 1.53243404e-01\n", - " 1.34773990e-01 1.30029832e-01 1.39410665e-01 1.63458596e-01\n", - " 2.02812115e-01 2.58077177e-01 3.29626272e-01 4.17349180e-01\n", - " 5.20390261e-01 6.36914793e-01 7.63949362e-01 8.97337122e-01\n", - " 1.03183720e+00 1.16137946e+00 1.27946313e+00 1.37966429e+00\n", - " 1.45619670e+00 1.50445740e+00 
1.52148592e+00 1.50627521e+00\n", - " 1.45989189e+00 1.38539118e+00 1.28754114e+00 1.17239846e+00\n", - " 1.04679665e+00 9.17816086e-01 7.92301269e-01 6.76476230e-01\n", - " 5.75687297e-01 4.94278128e-01 4.35579837e-01 4.01982955e-01\n", - " 3.95050348e-01 4.15631312e-01 4.63945630e-01 5.39619629e-01\n", - " 6.41671100e-01 7.68453295e-01 9.17577916e-01 1.08584207e+00\n", - " 1.26918468e+00 1.46269456e+00 1.66068688e+00 1.85685766e+00\n", - " 2.04451835e+00 2.21690461e+00 2.36754512e+00 2.49066792e+00\n", - " 2.58161409e+00 2.63722198e+00 2.65614236e+00 2.63904499e+00\n", - " 2.58868297e+00 2.50979162e+00 2.40881264e+00 2.29345224e+00\n", - " 2.17209955e+00 2.05314985e+00 1.94429207e+00 1.85183097e+00\n", - " 1.78011868e+00 1.73116656e+00 1.70449494e+00 1.69725495e+00\n", - " 1.70462551e+00 1.72045175e+00 1.73805563e+00 1.75111926e+00\n", - " 1.75452489e+00 1.74503446e+00 1.72170946e+00 1.68600602e+00\n", - " 1.64152600e+00 1.59345511e+00 1.54776583e+00 1.51029852e+00\n", - " 1.48585265e+00 1.47741924e+00 1.48566489e+00 1.50874117e+00\n", - " 1.54244571e+00 1.58071046e+00 1.61634601e+00 1.64193563e+00\n", - " 1.65075386e+00 1.63758480e+00 1.59933509e+00 1.53537073e+00\n", - " 1.44755098e+00 1.33997717e+00 1.21851288e+00 1.09015746e+00\n", - " 9.62364843e-01 8.42392979e-01 7.36750948e-01 6.50784387e-01\n", - " 5.88413430e-01 5.52015072e-01 5.42427659e-01 5.59049749e-01\n", - " 6.00007200e-01 6.62367853e-01 7.42389080e-01 8.35786963e-01\n", - " 9.38015891e-01 1.04454457e+00 1.15111073e+00 1.25393486e+00\n", - " 1.34987480e+00 1.43650903e+00 1.51214600e+00 1.57576779e+00\n", - " 1.62692648e+00 1.66561768e+00 1.69215693e+00 1.70708023e+00\n", - " 1.71108118e+00 1.70498582e+00 1.68975514e+00 1.66649687e+00\n", - " 1.63646428e+00 1.60102216e+00 1.56156833e+00 1.51941175e+00\n", - " 1.47562245e+00 1.43088142e+00 1.38536501e+00 1.33869787e+00\n", - " 1.28999757e+00 1.23801709e+00 1.18136972e+00 1.11880143e+00\n", - " 1.04946289e+00 9.73130335e-01 8.90333266e-01 
8.02364765e-01\n", - " 7.11173257e-01 6.19156982e-01 5.28899436e-01 4.42891995e-01\n", - " 3.63287987e-01 2.91722086e-01 2.29213393e-01 1.76153932e-01\n", - " 1.32370091e-01 9.72352617e-02 6.98083893e-02 4.89748052e-02\n", - " 3.35710311e-02 2.24822835e-02 1.47084137e-02 9.39969931e-03\n", - " 5.86764000e-03 3.57764687e-03 2.13060362e-03 1.23927728e-03\n", - " 7.04022632e-04 3.90616401e-04 2.11667822e-04 1.12020009e-04\n", - " 5.78987926e-05 2.92263096e-05 1.44081090e-05 6.93692371e-06\n", - " 3.26176376e-06 1.49783072e-06 6.71733203e-07 2.94207757e-07\n", - " 1.25844473e-07 5.25698216e-08 2.14467199e-08 8.54490087e-09\n", - " 3.32487066e-09 1.26346701e-09 4.68893659e-10 1.69944071e-10\n", - " 6.01532096e-11 2.07937425e-11 7.01984527e-12 2.31442654e-12]\n" - ] - } - ], + "outputs": [], "source": [ "print(y[500,:])" ] @@ -282,7 +161,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", @@ -291,18 +170,7 @@ "id": "NIM-5LzW5IoS", "outputId": "6e5bdeb5-76a0-4bc9-f587-6cb0c717edb5" }, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "# first we generate the energy x-axis in the range [-30, 0]eV\n", "x_ticks = np.linspace(0,-30, 300)\n", @@ -329,7 +197,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": { "id": "wmnl8ZzAwrmm" }, @@ -365,7 +233,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": { "id": "8cXiavHCuDVM" }, @@ -392,7 +260,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", @@ -401,18 +269,7 @@ "id": "t-NggJYtXGGs", "outputId": "301254b3-550f-4247-9194-961cbe50087f" }, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "plot = plt.figure(1)\n", "plot_mean_and_1std(plot, Y_train, color=\"green\", label=\"train\")\n", @@ -443,20 +300,11 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": null, "metadata": { "id": "7GUgjaEUzqkx" }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/PUHTI_TYKKY_VU7CbXh/miniconda/envs/env1/lib/python3.7/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", - " from .autonotebook import tqdm as notebook_tqdm\n" - ] - } - ], + "outputs": [], "source": [ "from torch import nn\n", "from torch.optim import Adam\n", @@ -500,7 +348,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": null, "metadata": { "id": "mFuHLTSsXc_X" }, @@ -598,7 +446,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", @@ -852,289 +700,6 @@ "outputId": "dc4c5416-36da-4c1d-db18-8bdea7326864" }, "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "GPU available: True (cuda), used: False\n", - "TPU available: False, using: 0 TPU cores\n", - "IPU available: False, using: 0 IPUs\n", - "HPU available: False, using: 0 HPUs\n", - "\n", - " | Name | Type | Params\n", - "--------------------------------------\n", - "0 | conv1_1 | Conv2d | 200 \n", - "1 | conv1_2 | Conv2d | 3.6 K \n", - "2 | conv1_3 | Conv2d | 724 \n", - "3 | pool | MaxPool2d | 0 \n", - "4 | fc1 | Linear | 59.1 K\n", - "--------------------------------------\n", - "63.6 K Trainable params\n", - "0 Non-trainable params\n", - "63.6 K Total params\n", - "0.255 Total estimated model params size (MB)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Epoch 0: 80%|████████ | 32/40 [00:01<00:00, 
30.22it/s, loss=0.275, v_num=6]\n", - "Validation: 0it [00:00, ?it/s]\u001b[A\n", - "Validation: 0%| | 0/8 [00:00)\n" - ] - } - ], + "outputs": [], "source": [ "X_test = torch.Tensor(X_test).reshape(-1, 29, 29).unsqueeze(1).float()\n", "Y_test = torch.Tensor(Y_test).float()\n", @@ -1200,7 +757,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" @@ -1208,15 +765,7 @@ "id": "q76RVoRjtNnc", "outputId": "3f8051f5-4526-40bb-fceb-a8858a47ff24" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tensor(0.1033, grad_fn=)\n" - ] - } - ], + "outputs": [], "source": [ "X_train_subset = torch.Tensor(X_train)[[500,1500],:].reshape(-1, 29, 29).unsqueeze(1).float()\n", "Y_train_subset = torch.Tensor(Y_train)[[500,1500],:].float()\n", @@ -1237,7 +786,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", @@ -1246,18 +795,7 @@ "id": "DrTsiyuDBjap", "outputId": "fa29da0b-28f4-4453-e5ac-3f7e86b6181b" }, - "outputs": [ - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "x_ticks = np.linspace(0,-30, 300)\n", "\n", @@ -1287,7 +825,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", @@ -1296,28 +834,7 @@ "id": "IZMqh36xGEmF", "outputId": "7a969154-e43e-455d-e68a-e105323b3497" }, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "plt.plot(x_ticks, Y_test[50,:], color=\"black\", label=\"Reference\")\n", "plt.plot(x_ticks, Y_pred_test[50,:].detach().numpy(), color=\"red\", label=\"Prediction\")\n", @@ -1356,7 +873,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.12" + "version": "3.9.18" }, "widgets": { "application/vnd.jupyter.widget-state+json": { @@ -8888,5 +8405,5 @@ } }, "nbformat": 4, - "nbformat_minor": 1 + "nbformat_minor": 4 }