# Fabric Semantic Model Audit

## Overview

This tool provides a comprehensive audit of your Fabric semantic models.

The tool consists of three main components:

1. **The Notebook:**
   - Captures model metadata, query logs, dependencies, unused columns, cold cache performance, and resident statistics.
   - Generates star schema tables (DIM_ModelObject, DIM_Model, DIM_Report, DIM_User, FACT_ModelObjectQueryCount, FACT_ModelLogs, FACT_ModelObjectStatistics) stored in a lakehouse.
   - Includes robust error handling, scheduling, and clean-up functions to support continuous monitoring.

1. **The Power BI Template (PBIT File):**
   - Creates an interactive report from the star schema tables generated by the notebook.
   - Allows you to explore model performance, usage trends, and metadata changes through intuitive visuals.
   - Provides a ready-to-use template that you can customize further in Power BI Desktop.

1. **The PowerPoint File:**
   - Contains the background images and design elements used in the Power BI template.
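
The tables the notebook produces follow the usual star schema pattern: dimensions carry surrogate keys, and facts reference those keys. The actual notebook writes these as Delta tables via Spark; the sketch below is a plain-Python illustration of that dimension/fact relationship, where only the table names (DIM_Model, FACT_ModelLogs) come from this README and all helper names are hypothetical:

```python
from datetime import datetime, timezone

def build_dim_model(model_names):
    """Assign a surrogate key to each audited model (DIM_Model)."""
    return [{"ModelKey": i + 1, "ModelName": name}
            for i, name in enumerate(sorted(set(model_names)))]

def build_fact_logs(dim_model, log_entries):
    """Resolve raw log entries against DIM_Model to produce FACT_ModelLogs rows."""
    key_by_name = {row["ModelName"]: row["ModelKey"] for row in dim_model}
    return [{"ModelKey": key_by_name[e["model"]],
             "QueryDurationMs": e["duration_ms"],
             "CapturedAtUtc": datetime.now(timezone.utc).isoformat()}
            for e in log_entries if e["model"] in key_by_name]

dim = build_dim_model(["Sales", "Finance", "Sales"])
facts = build_fact_logs(dim, [{"model": "Sales", "duration_ms": 120}])
```

Because facts store only the surrogate key, the report template can join them back to the dimensions cheaply and dimension attributes can change without rewriting fact history.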

## Requirements

1. **Workspace Monitoring:**
   - Ensure that Workspace Monitoring is enabled in your Fabric environment.
   - Refer to [this blog post](https://blog.fabric.microsoft.com/blog/announcing-public-preview-of-workspace-monitoring) for setup guidance.

1. **Scheduled Execution:**
   - Schedule the notebook to run several times a day (e.g., six times) for detailed historical tracking.
   - Update run parameters (model names, workspaces, logging settings) at the top of the notebook.

1. **Lakehouse Attachment:**
   - Attach the appropriate Lakehouse in Fabric to store logs and historical data in Delta tables.
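
As an orientation for the run parameters mentioned above, a parameter cell typically looks something like the following. This is only a hypothetical sketch — every key name here is illustrative, and the real notebook's parameter names and options may differ:

```python
# Hypothetical run-parameter cell; the actual notebook's option names may differ.
audit_config = {
    "models_to_audit": [
        {"workspace": "Sales Analytics", "model": "Sales"},
        {"workspace": "Finance", "model": "GL Reporting"},
    ],
    "lakehouse_name": "AuditLakehouse",  # attached Lakehouse holding the Delta tables
    "capture_query_logs": True,
    "run_cold_cache_tests": False,       # clones/refreshes models; see Troubleshooting & Tips
    "max_workers": 4,
}

def validate_config(cfg):
    """Fail fast before a scheduled run wastes capacity on a bad configuration."""
    assert cfg["models_to_audit"], "At least one model must be listed"
    assert cfg["max_workers"] >= 1, "max_workers must be a positive integer"
    return True
```

Validating the configuration at the top of the notebook is useful for scheduled execution, since a misconfigured run otherwise fails only after several minutes of collection work.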

## Key Features

1. **Model Object & Metadata Capture:**
   - Retrieves and standardizes the latest columns and measures using Semantic Link and Semantic Link Labs.
   - Captures dependencies among model objects to give a comprehensive view of object usage.

1. **Query Log Collection:**
   - Captures both summary query counts and detailed DAX query logs.

1. **Unused Column Identification:**
   - Compares lakehouse and model metadata to identify unused columns in your model's source lakehouse.
   - Removing unused columns can improve data compression and query performance.

1. **Cold Cache & Resident Statistics:**
   - Deploys a cloned model to measure cold cache performance.
   - Records detailed resident statistics (e.g., memory load, sizes) for each column.

1. **Star Schema Generation:**
   - Produces a set of star schema tables, making it easy to integrate with reporting tools.

1. **Integrated Reporting Assets:**
   - **Power BI Template (PBIT):** Quickly generate an interactive report from the captured data.
   - **PowerPoint File:** Provides background images and design elements used in the report.
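
At its core, the unused-column comparison described above is a set difference between the columns in the source lakehouse tables and the columns actually present in the model. The notebook gathers both lists via Semantic Link; the sketch below shows only the comparison logic, with hypothetical function and column names:

```python
def find_unused_columns(lakehouse_columns, model_columns):
    """Return columns present in the source lakehouse table but absent from the model.

    The comparison is case-insensitive, since Lakehouse and model
    column casing can legitimately differ.
    """
    model_lower = {c.lower() for c in model_columns}
    return sorted(c for c in lakehouse_columns if c.lower() not in model_lower)

unused = find_unused_columns(
    ["OrderID", "OrderDate", "LegacyFlag", "InternalNote"],
    ["orderid", "OrderDate"],
)
# unused -> ['InternalNote', 'LegacyFlag']
```

Columns flagged this way are candidates for removal from the lakehouse table (or from the query that loads it), which shrinks the Delta files the model reads.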

## Why Use This Tool?

- **Comprehensive Auditing:**
  Automates the collection of historical metadata, query logs, and performance data for your Fabric semantic models.

- **Actionable Insights:**
  Identify obsolete columns, understand query performance trends, and monitor usage patterns to optimize your models.

- **Quickstart Reporting:**
  The provided PBIT file lets you start analyzing your models' audit logs right away.

- **Scalability and Flexibility:**
  The tool supports multiple models and scheduled runs, making it suitable for continuous monitoring in large-scale environments.

## Getting Started

1. Download the notebook from GitHub and upload it to a Fabric workspace.

1. Attach a Lakehouse that will be used to save the logs.

1. Update the list of models you want to audit.

1. Configure the rest of the settings in the config cell. There are a lot of options, so read carefully. 🙂

1. Run the notebook and collect the logs. The output under the collect_model_statistics() cell lets you follow along with the testing if you want to understand what is happening.

1. After the first run has finished, download the PBIT file and connect it to your lakehouse.

## Troubleshooting & Tips

- **Workspace Monitoring Setup:**
  Verify that your Fabric environment is properly configured if you experience issues with monitoring data.

- **Capacity Considerations:**
  Cold cache performance testing requires cloning and refreshing models. This feature is only recommended for Direct Lake or Import models. Ensure these operations do not impact production workloads.

- **Parameter Adjustments:**
  Customize parameters like `max_workers` and date range checking based on your model size and available resources.

- **Run History Cleanup:**
  Use the built-in functions to clean up incomplete runs or force-delete historical tables if necessary, but exercise caution as this will remove past data.
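
To illustrate what `max_workers` controls: auditing several models can be parallelized with a bounded worker pool, where the bound caps how much concurrent load the runs place on your capacity. A minimal stdlib sketch, with a placeholder in place of the notebook's real per-model work:

```python
from concurrent.futures import ThreadPoolExecutor

def audit_model(name):
    # Placeholder for the per-model work (metadata capture, query-log pull, ...).
    return f"{name}: done"

def run_audits(models, max_workers=4):
    # max_workers bounds how many models are audited concurrently: lower it on
    # small capacities to reduce load, raise it when auditing many small models.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(audit_model, models))

results = run_audits(["Sales", "Finance"], max_workers=2)
```

`ThreadPoolExecutor.map` preserves input order, so results line up with the model list regardless of which audit finishes first.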