Login/Sign Up for IBM Cloud: https://ibm.biz/BdfhxH/
Hands-On Guide: https://github.com/IBMDeveloperMEA/AI-Integrity-Improving-AI-models-with-Cortex-Certifai/blob/main/README.md
Slides: https://github.com/IBMDeveloperMEA/AI-Integrity-Improving-AI-models-with-Cortex-Certifai
Workshop Replay: https://www.crowdcast.io/e/integrityinai
This Repo is for the upcoming webinar AI Integrity: Improving AI models with Cortex Certifai - Register for the live stream and access the replay – https://www.crowdcast.io/e/integrityinai
Sign-up/Login to IBM Cloud - https://ibm.biz/BdfhxH/
If you are an existing user, please log in to IBM Cloud.
And if you are not, don't worry! We have got you covered! There are 3 steps to create your account on IBM Cloud:
- Enter your email and password.
- You will receive a verification link at the registered email address to verify your account.
- Fill in the personal information fields. Please make sure you select the country you are in when asked at any step of the registration process.
Explaining AI models is a difficult task that Cortex Certifai makes simpler. It evaluates AI models for robustness, fairness, and explainability, and allows users to compare different models or model versions for these qualities. Certifai can be applied to any black-box model, including machine learning and predictive models, and works with a variety of input datasets.
Data Scientists can create model scan definitions, which comprise the trained models that they want to evaluate for the parameters listed below.
Performance Metric: (e.g. Accuracy)
Robustness: How the model generalizes on new data.
Fairness by group: measures the bias in the data.
Explainability: measures the explanations provided for each model.
Explanations: display the change that must occur in a dataset with given restrictions to obtain a different outcome.
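To make the idea of an explanation concrete, here is a minimal sketch of a counterfactual search: it looks for the smallest change to one feature that flips a model's outcome, while respecting features that are not allowed to change. This is only an illustration of the concept described above, not Certifai's algorithm or API. The model, the feature names (`savings`, `amount`, `age`), and the helper `find_counterfactual` are all hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence

Applicant = Dict[str, float]


def toy_model(applicant: Applicant) -> str:
    """Hypothetical black-box classifier with a made-up scoring rule."""
    score = (0.5 * applicant["savings"]
             - 0.25 * applicant["amount"]
             + 2.0 * applicant["age"])
    return "approved" if score >= 100 else "denied"


def find_counterfactual(
    predict: Callable[[Applicant], str],
    applicant: Applicant,
    feature: str,
    step: float,
    max_steps: int,
    frozen: Sequence[str] = (),
) -> Optional[Applicant]:
    """Return the nearest variant of `applicant` (changing only `feature`)
    that gets a different outcome, or None if none is found.

    `frozen` lists features that must not change (the "given restrictions").
    """
    if feature in frozen:
        return None
    original_outcome = predict(applicant)
    # Widen the search one step at a time so the first hit is the smallest change.
    for n in range(1, max_steps + 1):
        for direction in (+1, -1):
            candidate = dict(applicant)  # copy, so the input is never mutated
            candidate[feature] = applicant[feature] + direction * n * step
            if predict(candidate) != original_outcome:
                return candidate
    return None


applicant = {"savings": 100.0, "amount": 200.0, "age": 30.0}
counterfactual = find_counterfactual(
    toy_model, applicant, "savings", step=10.0, max_steps=20
)
print(toy_model(applicant), "->", counterfactual)
```

With these made-up numbers the applicant is denied, and the search reports that raising `savings` from 100 to 180 would change the outcome. Passing `frozen=("age",)` and asking to vary `age` returns `None`, which mirrors a restriction on a feature that cannot change.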
Business decision makers are able to view the evaluation comparison through visualizations and scores to select the best models for business goals and to identify whether or not models meet thresholds for robustness, fairness, and/or explainability. Data Scientists can use the evaluation results for analysis to provide more trustworthy AI models.
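The threshold check described above can be sketched in a few lines. The scores, thresholds, and model names below are invented purely for illustration and are not real scan results. The helper names are hypothetical and not part of the Certifai Toolkit.

```python
from typing import Dict, List

Scores = Dict[str, float]


def failing_criteria(scores: Scores, thresholds: Scores) -> List[str]:
    """List every criterion whose score is missing or below its threshold."""
    return [
        name
        for name, minimum in thresholds.items()
        if name not in scores or scores[name] < minimum
    ]


def models_meeting_thresholds(
    all_scores: Dict[str, Scores], thresholds: Scores
) -> List[str]:
    """Return the names of models that pass every threshold, sorted by name."""
    return sorted(
        model
        for model, scores in all_scores.items()
        if not failing_criteria(scores, thresholds)
    )


# Made-up numbers, for illustration only.
thresholds = {"robustness": 70, "fairness": 70, "explainability": 60}
all_scores = {
    "model_a": {"robustness": 82, "fairness": 64, "explainability": 90},
    "model_b": {"robustness": 75, "fairness": 71, "explainability": 66},
}

for model, scores in all_scores.items():
    print(model, "fails on:", failing_criteria(scores, thresholds))
print("candidates:", models_meeting_thresholds(all_scores, thresholds))
```

In this made-up case `model_a` has the higher robustness and explainability scores but misses the fairness threshold, so only `model_b` remains a candidate.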
This code pattern demonstrates how to use the Certifai Toolkit to create scans that evaluate the performance of multiple predictive models on the IBM Watson Studio platform.
- Log in to Watson Studio powered by Spark, initiate Cloud Object Storage, and create a project.
- Upload the .csv data file to Object Storage.
- Load the Data File in Watson Studio Notebook.
- Install Cortex Certifai Toolkit in the Watson Studio Notebook.
- Visualize the explainability and interpretability of the AI model for the three different types of users.
- IBM Watson Studio: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
- IBM Cloud Object Storage: An IBM Cloud service that provides an unstructured cloud data store to build and deliver cost effective apps and services with high reliability and fast speed to market. This code pattern uses Cloud Object Storage.
- Artificial Intelligence: Any system which can mimic cognitive functions that humans associate with the human mind, such as learning and problem solving.
- Data Science: Systems and scientific methods to analyze structured and unstructured data in order to extract knowledge and insights.
- Analytics: Analytics delivers the value of data for the enterprise.
- Python: Python is a programming language that lets you work more quickly and integrate your systems more effectively.
We can run a Cortex Certifai scan either from Watson Studio or from the command line interface. This code pattern demonstrates how to run the scan using Watson Studio on two different machine learning techniques, regression and classification.
Download Certifai Toolkit
Toolkit Edition: You can sign up for free use of the Certifai Toolkit on the CognitiveScale website. A download link will be provided in the confirmation email.
- Create an account with IBM Cloud
- Create a new Watson Studio project
- Add Data
- Create the notebook
- Insert the data as dataframe
- Run the notebook
- Analyze the results
Sign up for IBM Cloud. Clicking on create a free account gives you a 30-day trial account.
Sign up for IBM's Watson Studio.
Click on New Project and select per below.
Define the project by giving a Name and hit 'Create'.
Clone this repo
Navigate to data/assets and save the file named german_credit_eval.csv to disk. The dataset is also available in the Certifai Toolkit that was downloaded in the previous step.
Click on Assets and select Browse and add the csv file from your file system.
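Before uploading, it can help to confirm that the saved file is intact. The sketch below uses only the Python standard library to count rows, flag rows with empty fields, and tally the values of an outcome column. The inline sample data and its column names (`age`, `amount`, `outcome`) are assumptions for illustration; check the header of the real german_credit_eval.csv for the actual names.

```python
import csv
import io
from typing import Dict, TextIO


def summarize_csv(handle: TextIO, outcome_column: str) -> Dict[str, object]:
    """Summarize a CSV: columns, row count, incomplete rows, outcome tallies."""
    reader = csv.DictReader(handle)
    columns = list(reader.fieldnames or [])
    if outcome_column not in columns:
        raise ValueError(f"column {outcome_column!r} not found in {columns}")

    rows = 0
    incomplete_rows = 0
    outcome_counts: Dict[str, int] = {}
    for row in reader:
        rows += 1
        # DictReader fills missing trailing fields with None.
        if any(row[col] is None or not row[col].strip() for col in columns):
            incomplete_rows += 1
        label = (row[outcome_column] or "").strip()
        outcome_counts[label] = outcome_counts.get(label, 0) + 1

    return {
        "columns": columns,
        "rows": rows,
        "incomplete_rows": incomplete_rows,
        "outcome_counts": outcome_counts,
    }


# Tiny made-up sample standing in for the real file.
sample = io.StringIO("age,amount,outcome\n35,1200,1\n,800,2\n41,3000,1\n")
print(summarize_csv(sample, outcome_column="outcome"))

# With the real file (the outcome column name is an assumption):
# with open("german_credit_eval.csv", newline="") as f:
#     print(summarize_csv(f, outcome_column="outcome"))
```

A non-zero `incomplete_rows` count or an unexpected set of outcome values is a hint to look at the file again before adding it to the project.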
- Open IBM Watson Studio.
- Go to the project and click on Add
- Click on Create notebook to create a notebook.
- Select the From URL tab.
- Enter a name for the notebook.
- Optionally, enter a description for the notebook.
- Enter this Notebook URL : https://github.com/IBM/blackbox-ai-models-explained-using-cortexcertifai/blob/main/notebooks/WS_classifier.ipynb
- Select the runtime (8 vCPU and 32GB RAM)
- Click the Create button.
After the notebook is imported, click on Not Trusted and select Yes to trust the source of the notebook.
This notebook has been created to demonstrate the steps for building the model using the Watson Studio platform. For other use cases, the notebook has to be created from scratch.
Click on the 0010 icon at the top right, which will bring up the data assets tab.
Click on the Insert to code dropdown and select the option Insert Pandas DataFrame.
When a notebook is executed, what is actually happening is that each code cell in the notebook is executed, in order, from top to bottom.
Each code cell is selectable and is preceded by a tag in the left margin. The tag format is In [x]:. Depending on the state of the notebook, the x can be:
- A blank, this indicates that the cell has never been executed.
- A number, this number represents the relative order this code step was executed.
- A *, this indicates that the cell is currently executing.
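The three tag states can be summarized as a small lookup. The function below is a hypothetical helper written only to illustrate how to read the tags; it is not part of Jupyter or Watson Studio.

```python
import re

# Matches tags such as "In [ ]:", "In [7]:" and "In [*]:".
_TAG_PATTERN = re.compile(r"^In \[(\s*|\*|\d+)\]:$")


def cell_state(tag: str) -> str:
    """Translate a notebook cell tag into a plain-language state."""
    match = _TAG_PATTERN.match(tag.strip())
    if match is None:
        raise ValueError(f"not a code cell tag: {tag!r}")
    marker = match.group(1).strip()
    if marker == "":
        return "never executed"
    if marker == "*":
        return "currently executing"
    return f"executed (step {int(marker)})"


for tag in ("In [ ]:", "In [*]:", "In [7]:"):
    print(tag, "->", cell_state(tag))
```

Because the number records execution order rather than position in the notebook, cells that were run out of order will show numbers that do not increase from top to bottom.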
There are several ways to execute the code cells in your notebook:
- One cell at a time.
  - Select the cell, and then press the Play button in the toolbar.
- Batch mode, in sequential order.
  - From the Cell menu bar, there are several options available. For example, you can Run All cells in your notebook, or you can Run All Below, which will start executing from the first cell under the currently selected cell, and then continue executing all cells that follow.
After we run all cells in the notebook, the scan results are uploaded to Object Storage, from which they can be downloaded by following these steps. Log in to IBM Cloud, navigate to Dashboard on the left-hand side and click on Storage. Click on the bucket name, which is an extension of the project name in Watson Studio, and select the scan_results.csv file to download it.
- Create a folder in your local file system, download this repo into the folder and unzip it.
Please make sure that you have installed Python version 3.6 or higher.
- Open a command prompt, cd into the notebooks subfolder and type jupyter notebook. When the notebook is launched, select the notebook named regressor.ipynb and run all the cells from top to bottom.
- After we run the cells, the scan is complete and the results are stored under the reports folder in the notebook's current directory.
- Open a new command prompt, cd into the reports folder and type the command certifai console reports. This will start the Flask server, and the UI is ready for review.
- Launch the UI at http://localhost:8000/ and the scan reports, along with comparative analysis, are ready for review and analysis.
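Two things in the steps above are easy to get wrong: the Python version and the location of the reports folder. A small preflight check, sketched below under the assumption that it is run from the notebooks folder after the scan, reports both problems before you start the console. The helper name `preflight_problems` is hypothetical, and the script does not call the Certifai CLI itself.

```python
import sys
from pathlib import Path
from typing import List, Tuple

MIN_PYTHON = (3, 6)  # minimum version stated in the steps above


def preflight_problems(
    python_version: Tuple[int, ...], reports_dir: Path
) -> List[str]:
    """Return human-readable problems; an empty list means ready to go."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in python_version[:2])
        problems.append(f"Python 3.6 or higher is required, found {found}")
    if not reports_dir.is_dir():
        problems.append(f"reports folder not found: {reports_dir}")
    return problems


# Assumes the current directory is the notebooks folder.
issues = preflight_problems(sys.version_info[:2], Path("reports"))
if issues:
    for issue in issues:
        print("Problem:", issue)
else:
    print("Ready: start the console from the reports folder.")
```

If the reports folder is reported missing, the most likely cause is that not all notebook cells have been run yet.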
Note: The scan results are dependent on the input dataset. If we change the input data, the scan results change accordingly.
Depending on the business requirements, we can choose the best model for production deployment. The scan result files in CSV format are also available for review under the certifai-scan-results folder.
This code pattern helps developers, machine learning engineers, data scientists, and architects compare multiple models and evaluate them under different criteria to select the best model for their requirements. We can also run remote scans from a Red Hat OpenShift cluster, provided storage is allocated on Amazon S3, GCP, or Azure.
This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.
Check the ASL FAQ link for more details
Thank you,
Sbusiso Mkhombe
Cloud Engineer, Hybrid Cloud Build Team
IBM Technology Sales