Commit 1487a38

Merge pull request #53 from DAXNoobJustin/SemanticModelAudit

Add the Semantic Model Audit tool

2 parents e544cfc + dfe0582
10 files changed: +3861 -0 lines

tools/SemanticModelAudit/README.md

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
# Fabric Semantic Model Audit

## Overview

This tool is designed to provide a comprehensive audit of your Fabric semantic models.

The tool consists of three main components:

1. **The Notebook:**
   - Captures model metadata, query logs, dependencies, unused columns, cold cache performance, and resident statistics.
   - Generates star schema tables (DIM_ModelObject, DIM_Model, DIM_Report, DIM_User, FACT_ModelObjectQueryCount, FACT_ModelLogs, FACT_ModelObjectStatistics) stored in a lakehouse (see the sketch after this list).
   - Includes robust error handling, scheduling, and clean-up functions to support continuous monitoring.

1. **The Power BI Template (PBIT File):**
   - Creates an interactive report from the star schema tables generated by the notebook.
   - Allows you to explore model performance, usage trends, and metadata changes through intuitive visuals.
   - Provides a ready-to-use template that you can customize further in Power BI Desktop.

1. **The PowerPoint File:**
   - Contains the background images and design elements used in the Power BI template.
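The notebook's internals are not reproduced here, but as a rough illustration of the storage pattern, a Fabric notebook can persist captured audit rows to the attached lakehouse as Delta tables. This is a minimal sketch with illustrative column names, not the tool's actual schema:

```python
# Minimal sketch: appending captured audit rows to a lakehouse Delta table.
# `spark` is predefined in a Fabric notebook session.
from datetime import datetime, timezone

rows = [
    # (model_name, object_type, object_name, captured_at) -- illustrative
    # columns, not necessarily the tool's actual schema
    ("Sales Model", "Column", "FactSales[Amount]", datetime.now(timezone.utc)),
]
df = spark.createDataFrame(
    rows, "model_name string, object_type string, object_name string, captured_at timestamp"
)

# Append to one of the star schema tables generated by the notebook
df.write.format("delta").mode("append").saveAsTable("FACT_ModelLogs")
```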
## Requirements

1. **Workspace Monitoring:**
   - Ensure that Workspace Monitoring is enabled in your Fabric environment.
   - Refer to [this blog post](https://blog.fabric.microsoft.com/blog/announcing-public-preview-of-workspace-monitoring) for setup guidance.

1. **Scheduled Execution:**
   - Schedule the notebook to run several times a day (e.g., six times) for detailed historical tracking.
   - Update the run parameters (model names, workspaces, logging settings) at the top of the notebook.

1. **Lakehouse Attachment:**
   - Attach the appropriate Lakehouse in Fabric to store logs and historical data in Delta tables.
## Key Features

1. **Model Object & Metadata Capture:**
   - Retrieves and standardizes the latest columns and measures using Semantic Link and Semantic Link Labs.
   - Captures dependencies among model objects to get a comprehensive view of object usage.

1. **Query Log Collection:**
   - Captures both summary query counts and detailed DAX query logs.

1. **Unused Column Identification:**
   - Compares lakehouse and model metadata to identify unused columns in your model's source lakehouse (see the sketch after this list).
   - Removing unused columns yields better data compression and faster performance.

1. **Cold Cache & Resident Statistics:**
   - Deploys a cloned model to measure cold cache performance.
   - Records detailed resident statistics (e.g., memory load, sizes) for each column.

1. **Star Schema Generation:**
   - Produces a set of star schema tables, making it easy to integrate with reporting tools.

1. **Integrated Reporting Assets:**
   - **Power BI Template (PBIT):** Quickly generate an interactive report from the captured data.
   - **PowerPoint File:** Provides background images and design elements used in the report.
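To give a feel for the mechanics, here is a minimal sketch of pulling model metadata with Semantic Link and comparing it against a lakehouse table to flag unused columns. The dataset, workspace, and table names are placeholders, it assumes the model table name matches the lakehouse table name, and the actual notebook does considerably more (dependency analysis, standardization, logging):

```python
import sempy.fabric as fabric

# Placeholder names -- substitute your own model, workspace, and lakehouse table
DATASET, WORKSPACE, LAKEHOUSE_TABLE = "Sales Model", "Audit Workspace", "factsales"

# Latest columns (and, if needed, measures) via Semantic Link
model_columns = fabric.list_columns(dataset=DATASET, workspace=WORKSPACE)
model_measures = fabric.list_measures(dataset=DATASET, workspace=WORKSPACE)

# Columns the model exposes from this table (assumes the model table name
# matches the lakehouse table name)
used = {
    c.lower()
    for c in model_columns.loc[
        model_columns["Table Name"].str.lower() == LAKEHOUSE_TABLE, "Column Name"
    ]
}

# Columns present in the lakehouse source; `spark` is predefined in Fabric notebooks
source = {c.lower() for c in spark.table(LAKEHOUSE_TABLE).columns}

# Anything in the source that never surfaces in the model is a removal candidate
print(sorted(source - used))
```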
## Why Use This Tool?

- **Comprehensive Auditing:**
  Automates the collection of historical metadata, query logs, and performance data for your Fabric semantic models.

- **Actionable Insights:**
  Identify obsolete columns, understand query performance trends, and monitor usage patterns to optimize your models.

- **Quickstart Reporting:**
  The provided PBIT file lets you start analyzing your models' audit logs right away.

- **Scalability and Flexibility:**
  The tool is designed to support multiple models and run at scheduled intervals, making it suitable for continuous monitoring in large-scale environments.
## Getting Started

1. Download the notebook from GitHub and upload it to a Fabric workspace.
   ![sma-upload-notebook](media/sma-upload-notebook.png)

1. Attach a Lakehouse that will be used to save the logs.
   ![sma-attach-lakehouse](media/sma-attach-lakehouse.png)

1. Update the list of models you want to audit.
   ![sma-define-audit-models](media/sma-define-audit-models.png)

1. Configure the rest of the settings in the config cell. There are a lot of options, so read carefully. 🙂 (A rough sketch of the configuration's shape follows this list.)
   ![sma-configure-additional-args](media/sma-configure-additional-args.png)

1. Run the notebook and collect the logs. Under the collect_model_statistics() cell, you can follow along with the testing if you want to understand what is happening.
   ![sma-track-run](media/sma-track-run.png)

1. After the first run has finished, download the PBIT file and connect it to your lakehouse.
   ![sma-report](media/sma-report.png)
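As a hedged illustration only, the configuration in the config cell is shaped roughly like the following; the actual parameter names and options are defined in the notebook and may differ:

```python
# Illustrative configuration only -- the notebook's actual parameter names
# and options live in its config cell and may differ.
models_to_audit = [
    {"workspace": "Sales Workspace", "model": "Sales Model"},
    {"workspace": "Finance Workspace", "model": "Finance Model"},
]

run_cold_cache_tests = False  # clones and refreshes models; see Troubleshooting & Tips
max_workers = 4               # parallelism for metadata/statistics collection
days_to_check = 30            # query-log date range to scan
```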
## Troubleshooting & Tips

- **Workspace Monitoring Setup:**
  Verify that your Fabric environment is properly configured if you experience issues with monitoring data.

- **Capacity Considerations:**
  Cold cache performance testing requires cloning and refreshing models. This feature is only recommended for Direct Lake or Import models. Ensure these operations do not impact production workloads.

- **Parameter Adjustments:**
  Customize parameters like `max_workers` and the date range to check based on your model size and available resources.

- **Run History Cleanup:**
  Use the built-in functions to clean up incomplete runs or force-delete historical tables if necessary, but exercise caution, as this will remove past data (see the note after this list).
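The notebook ships its own cleanup helpers, which you should prefer. Purely to show the kind of operation involved (and why it is destructive), force-deleting a historical Delta table in the attached lakehouse boils down to something like this; the table name is illustrative:

```python
# Destructive: drops the historical table and all of its data.
# Prefer the notebook's built-in cleanup functions over raw SQL.
spark.sql("DROP TABLE IF EXISTS FACT_ModelLogs")
```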