diff --git a/accelerators/CICD/Git-base-deployments/README.md b/accelerators/CICD/Git-base-deployments/README.md
new file mode 100644
index 0000000..db604f9
--- /dev/null
+++ b/accelerators/CICD/Git-base-deployments/README.md
@@ -0,0 +1,183 @@
+# Microsoft Fabric CI/CD - Option 1: Git-based deployments
+
+
+
+
+With this option, all deployments originate from the Git repository. Each stage in the release pipeline has a dedicated primary branch (in the diagram, these stages are Dev, Test, and Prod), which feeds the appropriate workspace in Fabric.
+
+Once a PR to the Dev branch is approved and merged:
+
+1. A release pipeline is triggered to update the content of the Dev workspace. This process can also include a Build pipeline to run unit tests, but the actual upload of files is done directly from the repo into the workspace, using the Fabric Git APIs (see the sketch after this list). You might need to call other Fabric APIs for post-deployment operations that set specific configurations for this workspace or ingest data.
+2. A PR is then created to the Test branch. In most cases, the PR is created using a release branch that can cherry-pick the content to move into the next stage. The PR should include the same review and approval processes as any other in your team or organization.
+3. Another Build and release pipeline is triggered to update the Test workspace, using a process similar to the one described in the first step.
+4. A PR is created to the Prod branch, using a process similar to the one described in step 2.
+5. Another Build and release pipeline is triggered to update the Prod workspace, using a process similar to the one described in the first step.
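+
+As an illustration of step 1, the sketch below shows how a release pipeline could update a workspace from its connected branch using the Fabric Git REST APIs (Get Status followed by Update From Git). This is a minimal sketch, assuming a Microsoft Entra access token authorized for the Fabric API is available; the workspace ID is a placeholder, and in this accelerator the actual calls are made by the YAML pipelines and Python scripts described in the Setup section.
+
+```python
+# Minimal sketch: update a Fabric workspace from its connected Git branch.
+# Assumes FABRIC_TOKEN holds a Microsoft Entra access token authorized for the Fabric REST API.
+import os
+import requests
+
+FABRIC_API = "https://api.fabric.microsoft.com/v1"
+workspace_id = "<your-workspace-guid>"  # placeholder
+headers = {"Authorization": f"Bearer {os.environ['FABRIC_TOKEN']}"}
+
+# 1. Read the Git status to find the commits the workspace should sync to.
+status = requests.get(f"{FABRIC_API}/workspaces/{workspace_id}/git/status", headers=headers)
+status.raise_for_status()
+remote_commit = status.json()["remoteCommitHash"]
+workspace_head = status.json().get("workspaceHead")
+
+# 2. Trigger the update of the workspace content from the repository.
+body = {
+    "remoteCommitHash": remote_commit,
+    "workspaceHead": workspace_head,
+    "conflictResolution": {
+        "conflictResolutionType": "Workspace",
+        "conflictResolutionPolicy": "PreferRemote",  # matches the GroupDevOps variable used in this setup
+    },
+    "options": {"allowOverrideItems": True},
+}
+update = requests.post(f"{FABRIC_API}/workspaces/{workspace_id}/git/updateFromGit",
+                       headers=headers, json=body)
+update.raise_for_status()  # the operation completes asynchronously (202 Accepted)
+```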
+
+
+
+## When should you consider using option #1?
+
+
+- When you want to use your Git repo as the single source of truth, and the origin of all deployments.
+
+- When your team follows Gitflow as the branching strategy, including multiple primary branches.
+
+- When the upload from the repo can go directly into the workspace, without a build environment altering the files before deployment. You can still adjust the deployed content by calling APIs or running items in the workspace after deployment.
+
+For more information about Microsoft Fabric CI/CD workflow options, please visit the official documentation at this page:
+https://learn.microsoft.com/en-us/fabric/cicd/manage-deployment#option-1---git--based-deployments
+
+
+## Setup
+
+
+
+
+### Prerequisites
+
+
+
+1. Create 3 Microsoft Fabric workspaces, one for each environment: DEV, TEST, PROD. Minor adjustments to the setup and scripts are necessary if additional environments are introduced.
+
+    The workspace identifiers will be required in the setup. To find a workspace ID, check the URL and extract the GUID after "groups/", as shown in the picture below:
+
+ 
+
+
+2. Create a Microsoft Fabric workspace for controlling the CI/CD process. It will contain the pre- and post-deployment notebooks. For more information about this workspace, see the documentation in the **cicd-workspace** folder. The user running the CI/CD process should be an Admin or Member of the workspaces.
+
+3. Create an Azure DevOps project and a repository.
+
+ 3.1 In this project, create the following structure:
+
+ 
+
+    3.2 From the folder **project-workspace** in this **git repo**, download the .py and .yml files and import them into the **pipeline-scripts** folder of your DevOps project/repository. In the screenshot above, the repository is also named **project-workspace**.
+
+    3.3 Keep the **workspace** folder empty. It will be used to Git-enable the Microsoft Fabric DEV workspace.
+
+ 3.4 Create 3 Variable Groups and their respective variables.
+
+
+
+ | Variable Group name | Variable Names | Variable Value | Comment |
+ | -------- | ------- | ------- | ------- |
+ | DynamicGroup | FeatureBranch | | |
+ | GroupDevOps | ConflictResolutionPolicy | PreferRemote | |
+ | GroupDevOps | InitializationStrategy | PreferRemote | |
+ | GroupDevOps | MappingConnectionsFileName | mapping_connections.json | This json file will hold the mapping of connections between different stages |
+ | GroupDevOps | OnelakeRolesFileName | onelake_roles.json | This json file will hold the list of roles you wish to create in the target lakehouses |
+ | GroupDevOps | OnelakeRulesFileName | onelake_rules.json | This json file will hold the rules applied to the lakehouse tables/shortcut/folders defined for a role |
+    | GroupDevOps | OnelakeEntraMembersFileName | onelake_entra_members.json | This json file will hold the Entra ID principals assigned to a role |
+    | GroupDevOps | OnelakeItemMembersFileName | onelake_item_members.json | This json file will hold the lakehouse tables/shortcut/folders defined for a role |
+ | GroupDevOps | OrganizationName | | |
+ | GroupDevOps | ProjectName | | |
+ | GroupDevOps | RepositoryName | | |
+ | GroupDevOps | Stage1BrancheName | main | |
+ | GroupDevOps | Stage2BrancheName | | |
+ | GroupDevOps | Stage3BrancheName | | |
+ | GroupFabricWorkspaces | CiCdLakehouseId | | |
+ | GroupFabricWorkspaces | CiCdWorkspaceId | | |
+ | GroupFabricWorkspaces | Stage1WorkspaceId | | |
+ | GroupFabricWorkspaces | Stage2WorkspaceId | | |
+ | GroupFabricWorkspaces | Stage3WorkspaceId | | |
+
+
+
+    The files **mapping_connections.json**, **onelake_roles.json**, **onelake_rules.json**, **onelake_entra_members.json** and **onelake_item_members.json** are all secure files uploaded in your project, under Pipelines > Library > Secure files.
+    A helper notebook called **nb_extract_lakehouse_access.ipynb** helps you extract the OneLake roles defined in your lakehouses and generate the JSON files in your CI/CD lakehouse (see the CI/CD workspace README). Using the OneLake file explorer, you can download the files and make the required modifications.
+    The YAML pipelines expect these files to be present even if empty (when you do not wish to change the connections in your Fabric items or create custom OneLake roles in your target lakehouses). A hypothetical example of the mapping file structure is shown at the end of this prerequisites section.
+
+
+
+
+    The variable group **DynamicGroup** requires additional permissions because the variable **FeatureBranch** it contains is updated by pipeline executions.
+    Grant the "Administrator" permission to the Build Service account, as shown in the following screenshot:
+
+
+ 
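+
+The exact content of these secure files is consumed by the notebooks in the **cicd-workspace** folder; for the connection mapping, **nb_cicd_post_update_data_pipelines** looks up the columns **ConnectionStage1** to **ConnectionStage4**. The snippet below is a hypothetical example (the GUIDs are placeholders) of how a **mapping_connections.json** file could be produced:
+
+```python
+# Hypothetical sketch of mapping_connections.json: one entry per connection,
+# mapping the DEV connection id (Stage1) to its TEST/PROD counterparts.
+# The GUIDs are placeholders; leave a stage empty ("") when no mapping exists for it.
+import json
+
+mapping_connections = [
+    {
+        "ConnectionStage1": "11111111-1111-1111-1111-111111111111",  # DEV connection
+        "ConnectionStage2": "22222222-2222-2222-2222-222222222222",  # TEST connection
+        "ConnectionStage3": "33333333-3333-3333-3333-333333333333",  # PROD connection
+        "ConnectionStage4": "",                                      # unused fourth stage
+    },
+]
+
+with open("mapping_connections.json", "w") as f:
+    json.dump(mapping_connections, f, indent=2)
+```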
+
+
+
+### Git-enable the DEV workspace
+
+
+
+In the settings of your DEV workspace, under the Git integration tab, connect your workspace to the main branch of your Azure DevOps repository. Make sure to use a folder, as shown in the following screenshot.
+
+
+
+
+
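+If you prefer to script this step, the Fabric Git APIs also expose Connect and Initialize Connection endpoints. The sketch below is an illustration only (organization, project and workspace values are placeholders) and is not part of the pipelines shipped with this accelerator:
+
+```python
+# Minimal sketch: connect the DEV workspace to an Azure DevOps branch and folder.
+# Assumes FABRIC_TOKEN holds a Microsoft Entra access token authorized for the Fabric REST API.
+import os
+import requests
+
+FABRIC_API = "https://api.fabric.microsoft.com/v1"
+workspace_id = "<dev-workspace-guid>"  # placeholder
+headers = {"Authorization": f"Bearer {os.environ['FABRIC_TOKEN']}"}
+
+body = {
+    "gitProviderDetails": {
+        "gitProviderType": "AzureDevOps",
+        "organizationName": "<organization>",
+        "projectName": "<project>",
+        "repositoryName": "project-workspace",
+        "branchName": "main",
+        "directoryName": "/workspace",  # the empty folder created in step 3.3
+    }
+}
+resp = requests.post(f"{FABRIC_API}/workspaces/{workspace_id}/git/connect", headers=headers, json=body)
+resp.raise_for_status()
+
+# A newly connected workspace also needs its Git connection initialized once,
+# here with the same strategy as the GroupDevOps InitializationStrategy variable.
+init = requests.post(f"{FABRIC_API}/workspaces/{workspace_id}/git/initializeConnection",
+                     headers=headers, json={"initializationStrategy": "PreferRemote"})
+init.raise_for_status()
+```
+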
+
+
+### Create a branch policy on the **main** branch
+
+
+
+This policy is required to block direct commits on the main branch. It is also required to obtain the name of the source branch used in a PR.
+
+In particular, it allows the feature branch name to be captured during the automatic trigger of the **ci-get-set-feature-branch** pipeline.
+
+The following screenshot shows the branch policy setup:
+
+
+
+
+
+
+### Create the required YAML pipelines
+
+From the Pipelines tab in Azure DevOps, create the following pipelines by selecting an existing YAML file.
+
+
+
+**ci-get-set-feature-branch**
+
+To create this pipeline, select the **ci-get-set-feature-branch.yml** file located in /main/pipeline-scripts.
+This pipeline is triggered automatically when a PR from a feature branch to the main branch is created (before the PR is completed), and identifies on the fly the name of the feature branch used by the Fabric developer.
+The feature branch name is then stored in the **FeatureBranch** variable of the **DynamicGroup** variable group; it is required by the pre- and post-deployment steps of the **ci-update-workspace-dev** pipeline.
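+
+The actual logic is implemented in **ci-get-set-feature-branch.yml**; purely as an illustration of the mechanism (not the shipped script), a PR-triggered job can read the predefined **System.PullRequest.SourceBranch** variable and persist it into the variable group through the Azure DevOps REST API. The API version and authentication shown below are assumptions to validate against the Azure DevOps REST reference:
+
+```python
+# Illustrative sketch only: persist the PR source branch name into the DynamicGroup variable group.
+# Assumes a PAT in AZDO_PAT (the pipeline's System.AccessToken can also be used) and the group id.
+import os
+import requests
+
+organization = "<organization>"
+project = "<project>"
+group_id = 1  # id of the DynamicGroup variable group (placeholder)
+
+# In a PR-triggered run, System.PullRequest.SourceBranch looks like "refs/heads/ws_feature_x".
+feature_branch = os.environ["SYSTEM_PULLREQUEST_SOURCEBRANCH"].replace("refs/heads/", "")
+
+url = (f"https://dev.azure.com/{organization}/{project}/_apis/"
+       f"distributedtask/variablegroups/{group_id}?api-version=6.0-preview.2")
+body = {
+    "type": "Vsts",
+    "name": "DynamicGroup",
+    "variables": {"FeatureBranch": {"value": feature_branch}},
+}
+resp = requests.put(url, json=body, auth=("", os.environ["AZDO_PAT"]))
+resp.raise_for_status()
+```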
+
+PS: as part of this solution, it is mandatory that the developer gives the branch a name matching the FEATURE workspace name, as shown in the following screenshot:
+
+
+
+
+
+For a programmatic branch-out experience, please check the **amazing work** done by [Nick Hurt](https://www.linkedin.com/in/nick-hurt/) here:
+
+https://github.com/microsoft/fabric-toolbox/tree/main/accelerators/CICD/Branch-out-to-new-workspace
+
+
+
+
+
+**ci-update-workspace-dev**
+
+To create this pipeline, select the **ci-update-workspace-dev.yml** file located in /main/pipeline-scripts.
+This pipeline is automatically triggered when the PR is completed. It promotes the content of the FEATURE workspace to the DEV workspace and applies a logic similar to the **cd-update-workspace-test-prod** pipeline, meaning it runs pre- and post-deployment steps before and after the Git update of the DEV workspace.
+
+
+
+**cd-update-workspace-test-prod**
+
+To create this pipeline, select the **cd-update-workspace-test-prod.yml** file located in /main/pipeline-scripts.
+This pipeline is manually triggered after a PR is made between a source and a target branch.
+
+
+
+To deploy DEV to TEST, proceed as follows:
+- Create a PR from the **main** branch to the **test** branch
+- Manually run the pipeline, selecting the test branch as the source and unchecking the **PROD** stage.
+
+
+
+To deploy TEST to PROD, proceed as follows:
+- Create a PR from the **test** branch to the **prod** branch
+- Manually run the pipeline, selecting the prod branch as the source and unchecking the **TEST** stage.
+
+The following screenshot shows the selection to deploy to TEST:
+
+
+
+
diff --git a/accelerators/CICD/Git-base-deployments/cicd-workspace/CICD_Workspace_README.md b/accelerators/CICD/Git-base-deployments/cicd-workspace/CICD_Workspace_README.md
new file mode 100644
index 0000000..ada0545
--- /dev/null
+++ b/accelerators/CICD/Git-base-deployments/cicd-workspace/CICD_Workspace_README.md
@@ -0,0 +1,120 @@
+# The CI/CD workspace
+
+
+
+In this solution, the cicd-workspace folder serves as a placeholder for a collection of notebooks that execute logic within a CI/CD Fabric workspace.
+
+
+
+The following notebooks should be imported in a Fabric workspace of your choice:
+
+- nb_cicd_pre_deployment
+- nb_cicd_pre_update_lakehouses
+- nb_cicd_pre_update_warehouses
+- nb_cicd_post_deployment
+- nb_cicd_post_update_data_pipelines
+- nb_cicd_post_update_notebooks
+- nb_cicd_post_update_semantic_models
+- nb_helper
+- nb_extract_lakehouse_access
+- nb_prepare_cicd_workspace
+
+
+
+
+These notebooks will record their execution in a CI/CD lakehouse, which must be created in advance. Refer to the notebook **nb_prepare_cicd_workspace** for guidance on setting up the lakehouse.
+Additionally, they will perform pre- and post-deployment activities essential to the CI/CD process.
+The following sections provide a detailed explanation of these activities.
+
+## Pre-deployment activities
+
+
+
+These activities are initiated by the YAML pipelines during the execution of the **Run pre deployment steps - Lakehouses & Warehouses** step. This step runs the **pre_deployment.py** Python script, which then triggers the execution of the **nb_cicd_pre_deployment** notebook in the CI/CD workspace.
+
+
+
+The **Run pre-deployment steps - Lakehouses & Warehouses** step retrieves the appropriate inputs from variable groups based on the scenario (CI or CD). During the execution of **pre_deployment.py**, these variables are properly formatted and passed as parameters (JSON body) when triggering the **nb_cicd_pre_deployment** notebook via the Fabric REST API (**jobs/instances?jobType=RunNotebook**).
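+
+As a point of reference, a stripped-down version of that trigger could look like the sketch below. It assumes a Microsoft Entra access token for the Fabric REST API and shows only a subset of the notebook parameters; the real formatting is handled by **pre_deployment.py**.
+
+```python
+# Minimal sketch: trigger nb_cicd_pre_deployment as an on-demand notebook job.
+# Assumes FABRIC_TOKEN holds a Microsoft Entra access token authorized for the Fabric REST API.
+import os
+import requests
+
+FABRIC_API = "https://api.fabric.microsoft.com/v1"
+cicd_workspace_id = "<CiCdWorkspaceId>"           # from the GroupFabricWorkspaces variable group
+notebook_id = "<nb_cicd_pre_deployment item id>"  # placeholder
+
+body = {
+    "executionData": {
+        "parameters": {
+            # a subset of the parameters exposed by the notebooks in this accelerator
+            "pSourceWorkspaceId": {"value": "<Stage1WorkspaceId>", "type": "string"},
+            "pTargetWorkspaceId": {"value": "<Stage2WorkspaceId>", "type": "string"},
+            "pTargetStage": {"value": "Stage2", "type": "string"},
+            "pDebugMode": {"value": "no", "type": "string"},
+        }
+    }
+}
+resp = requests.post(
+    f"{FABRIC_API}/workspaces/{cicd_workspace_id}/items/{notebook_id}/jobs/instances?jobType=RunNotebook",
+    headers={"Authorization": f"Bearer {os.environ['FABRIC_TOKEN']}"},
+    json=body,
+)
+resp.raise_for_status()  # 202 Accepted; the job runs asynchronously in the CI/CD workspace
+```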
+
+
+
+The notebook **nb_cicd_pre_deployment** creates a DAG in which two other notebooks are called sequentially using **mssparkutils.notebook.runMultiple**, in the following order of precedence (a reduced sketch of this DAG follows the list below):
+
+- nb_cicd_pre_update_lakehouses
+- nb_cicd_pre_update_warehouses
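+
+The DAG follows the same pattern as the one built by **nb_cicd_post_deployment** (included in this accelerator). A reduced sketch, where the `p*` variables are the notebook parameters passed from the DevOps pipeline:
+
+```python
+# Reduced sketch of the runMultiple DAG built by nb_cicd_pre_deployment,
+# mirroring the pattern used in nb_cicd_post_deployment.
+common_args = {
+    "useRootDefaultLakehouse": True,
+    "pSourceWorkspaceId": pSourceWorkspaceId,
+    "pTargetWorkspaceId": pTargetWorkspaceId,
+    "pTargetStage": pTargetStage,
+    "pDebugMode": pDebugMode,
+}
+
+DAG = {
+    "activities": [
+        {"name": "nb_cicd_pre_update_lakehouses", "path": "nb_cicd_pre_update_lakehouses",
+         "timeoutPerCellInSeconds": int(pTimeoutPerCellInSeconds), "args": common_args},
+        {"name": "nb_cicd_pre_update_warehouses", "path": "nb_cicd_pre_update_warehouses",
+         "timeoutPerCellInSeconds": int(pTimeoutPerCellInSeconds), "args": common_args},
+    ],
+    "concurrency": 1,  # run one notebook at a time: lakehouses before warehouses
+    "timeoutInSeconds": int(pTimeoutInSeconds),
+}
+
+mssparkutils.notebook.runMultiple(DAG, {"displayDAGViaGraphviz": True})
+```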
+
+The notebook **nb_cicd_pre_update_lakehouses** performs the following activities:
+
+- Creates the lakehouse(s) in the target workspace if required
+- Identifies, in the source lakehouse, the managed tables, shortcuts (in the table and file sections of the lakehouse), folders, OneLake access roles, and SQL objects created against the SQL analytics endpoint of the lakehouse (views, functions, stored procedures, RLS-related objects such as security policies and predicates).
+- Handles the seeding of tables in the target lakehouse in full or incremental mode. The incremental mode handles changes at the managed-table level (new tables, altered tables: new columns, deleted columns, altered data types).
+- Handles the creation of shortcuts, folders and security roles in the target lakehouse
+- Handles the creation of the SQL objects
+
+The notebook **nb_cicd_pre_update_warehouses** performs the following activities:
+
+- Identifies changes in the source Warehouses
+- Applies the changes to the target warehouses
+
+This code executes only when an incremental change is deployed, meaning it is not required during an initial deployment.
+
+It is crucial that the pre-deployment step for the lakehouse is executed, as the subsequent step related to the Git update might fail if the Warehouse depends on the lakehouse.
+
+
+
+## Post-deployment activities
+
+
+
+These activities are triggered by the YAML pipelines during the execution of the **Run post-deployment steps - Notebooks & Data Pipelines & Semantic Models/Reports** step. This step runs the **post_deployment.py** Python script, which in turn triggers the execution of the **nb_cicd_post_deployment** notebook in the CI/CD workspace.
+
+The **Run post-deployment steps - Notebooks & Data Pipelines & Semantic Models/Reports** step retrieves the necessary inputs from the variables stored in different variable groups, depending on the scenario (CI or CD). These variables are properly formatted during the execution of **post_deployment.py** and passed as parameters (JSON body) when the **nb_cicd_post_deployment notebook** is executed via the Fabric REST API (**jobs/instances?jobType=RunNotebook**).
+
+
+
+The notebook **nb_cicd_post_deployment** creates a DAG in which three other notebooks are called sequentially using **mssparkutils.notebook.runMultiple**, in the following order of precedence:
+
+- nb_cicd_post_update_data_pipelines
+- nb_cicd_post_update_notebooks
+- nb_cicd_post_update_semantic_models
+
+PS: configure the parallelism required for the notebook execution based on your capacity thresholds. More information is available in the official documentation:
+https://learn.microsoft.com/en-us/fabric/data-engineering/spark-job-concurrency-and-queueing
+
+
+
+
+- The notebook **nb_cicd_post_update_data_pipelines** iterates over the data pipelines in the target workspace and changes the connections in each of them based on the provided mapping (source connection -> target connection).
+- The notebook **nb_cicd_post_update_notebooks** iterates over the notebooks in the target workspace and changes the default lakehouse and known warehouses in each notebook definition.
+- The notebook **nb_cicd_post_update_semantic_models** iterates over the semantic models in the target workspace and changes the Direct Lake connection (when the semantic model is a default or custom semantic model in Direct Lake mode), or changes the connections based on the provided mapping (source connection -> target connection) when the semantic model uses DirectQuery or Import mode.
+
+Each notebook performs the required activity only if at least one item of the required type is present in the target workspace.
+
+Without the post-deployment activities, the items mentioned above would still point to the lower environments (SQL connections, lakehouses, warehouses, etc.).
+
+
+## Helper notebooks
+
+- The **nb_helper** notebook contains a set of functions required during the execution of the pre- and post-deployment notebooks listed above.
+
+- The **nb_prepare_cicd_workspace** notebook helps set up the CI/CD workspace and rebind the notebooks listed above to the CI/CD lakehouse. The steps described in the notebook can also be performed manually.
+
+- The **nb_extract_lakehouse_access** notebook helps extract the OneLake roles defined in the source lakehouses (DEV workspace) by generating 4 JSON files: onelake_roles.json, onelake_rules.json, onelake_entra_members.json, onelake_item_members.json. These files can be used as templates for customized roles in the higher environments (TEST & PROD workspaces).
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_post_deployment.ipynb b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_post_deployment.ipynb
new file mode 100644
index 0000000..aeda924
--- /dev/null
+++ b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_post_deployment.ipynb
@@ -0,0 +1 @@
+{"cells":[{"cell_type":"markdown","source":["**Libraries**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"c2ee3b3a-f369-4f98-8e88-a04e999870cf"},{"cell_type":"code","source":["import pandas as pd\n","from datetime import datetime, timedelta\n","import sempy.fabric as fabric"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"151a5189-710d-4f0a-ad72-5ad9c422c817"},{"cell_type":"markdown","source":["**Define a logging dataframe**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"4788f8ff-57e3-44dc-82d2-225b1f99ba38"},{"cell_type":"code","source":["dfLogging = pd.DataFrame(columns = ['LoadId','NotebookId', 'NotebookName', 'WorkspaceId', 'CellId', 'Timestamp', 'ElapsedTime', 'Message', 'ErrorMessage'])\n","vContext = mssparkutils.runtime.context\n","vNotebookId = vContext[\"currentNotebookId\"]\n","vLogNotebookName = vContext[\"currentNotebookName\"]\n","vWorkspaceId = vContext[\"currentWorkspaceId\"] # where the notebook is running, to not confuse with source and target workspaces"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"121fc594-15d2-47d2-b4d0-f9609f50791b"},{"cell_type":"markdown","source":["**Parameters --> convert to code for debugging the notebook. otherwise, keep commented as parameters are passed from DevOps pipelines**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"f8421d0a-f588-4f23-bd13-4696e6f2fcc5"},{"cell_type":"markdown","source":["pSourceWorkspaceId = \"\"\n","pTargetWorkspaceId = \"\"\n","pTargetStage = \"Stage1\"\n","pDebugMode = \"yes\"\n","pTimeoutPerCellInSeconds = \"600\"\n","pTimeoutInSeconds = \"900\"\n","pProjectName = \"fabric-cicd\"\n","pFeatureBranch = \"NA\"\n","pMappingConnections = ''"],"metadata":{"nteract":{"transient":{"deleting":false}}},"id":"22da0c79-bfea-43ea-9d57-583771c4f22f"},{"cell_type":"markdown","source":["**Check if the source workspace passed from DevOps equals the feature branch name**\n","- This is a specific handling when a PR is done from the feature branch"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"331987f9-7061-4338-8e4a-934f0974fb5a"},{"cell_type":"code","source":["if pSourceWorkspaceId == pFeatureBranch:\n"," pSourceWorkspaceId = fabric.resolve_workspace_id(workspace=pFeatureBranch)"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"209a146b-fa1c-41bc-b05a-e258608ae3b9"},{"cell_type":"markdown","source":["**Define the DAG**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"6a8b6198-1624-4008-9f6d-a4133b4650ab"},{"cell_type":"code","source":["dagList = []\n","\n","# add to the DAG list nb_cicd_post_update_data_pipelines\n","dagList.append({\n"," \"name\": \"nb_cicd_post_update_data_pipelines\",\n"," \"path\": \"nb_cicd_post_update_data_pipelines\",\n"," \"timeoutPerCellInSeconds\": int(pTimeoutPerCellInSeconds),\n"," \"args\": {\n"," \"useRootDefaultLakehouse\": True,\n"," 
\"pSourceWorkspaceId\":pSourceWorkspaceId,\n"," \"pTargetWorkspaceId\":pTargetWorkspaceId,\n"," \"pTargetStage\":pTargetStage,\n"," \"pDebugMode\":pDebugMode,\n"," \"pProjectName\":pProjectName,\n"," \"pMappingConnections\": pMappingConnections\n"," }\n"," })\n","\n","# add to the DAG list nb_cicd_post_update_notebooks\n","dagList.append({\n"," \"name\": \"nb_cicd_post_update_notebooks\",\n"," \"path\": \"nb_cicd_post_update_notebooks\",\n"," \"timeoutPerCellInSeconds\": int(pTimeoutPerCellInSeconds),\n"," \"args\": {\n"," \"useRootDefaultLakehouse\": True,\n"," \"pSourceWorkspaceId\":pSourceWorkspaceId,\n"," \"pTargetWorkspaceId\":pTargetWorkspaceId,\n"," \"pDebugMode\":pDebugMode\n"," }\n"," })\n","\n","# add to the DAG list nb_cicd_post_update_semantic_models\n","dagList.append({\n"," \"name\": \"nb_cicd_post_update_semantic_models\",\n"," \"path\": \"nb_cicd_post_update_semantic_models\",\n"," \"timeoutPerCellInSeconds\": int(pTimeoutPerCellInSeconds),\n"," \"args\": {\n"," \"useRootDefaultLakehouse\": True,\n"," \"pSourceWorkspaceId\":pSourceWorkspaceId,\n"," \"pTargetWorkspaceId\":pTargetWorkspaceId,\n"," \"pTargetStage\":pTargetStage,\n"," \"pDebugMode\":pDebugMode,\n"," \"pProjectName\":pProjectName,\n"," \"pMappingConnections\": pMappingConnections\n"," }\n"," })\n","\n","DAG = { \"activities\": dagList,\"concurrency\": 1, \"timeoutInSeconds\": int(pTimeoutInSeconds) }\n"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"cb40613c-1d2b-4edf-8d1e-de25de315fec"},{"cell_type":"markdown","source":["**Run multiple**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"3345dfe0-84c7-4c33-9742-b726e28ef82e"},{"cell_type":"code","source":["try:\n"," mssparkutils.notebook.runMultiple(DAG, {\"displayDAGViaGraphviz\": True})\n","\n"," # logging\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'running the DAG', datetime.now(), None, vMessage, ''] \n","\n","except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'running the DAG', datetime.now(), None, vMessage, str(e)]\n"," if pDebugMode == \"yes\":\n"," print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"21fdc174-a457-4bf3-9b38-5fb7c8d8a68d"},{"cell_type":"markdown","source":["**Logging**"],"metadata":{"nteract":{"transient":{"deleting":false}}},"id":"89352377-24c6-4f57-9ba1-18532e7587ba"},{"cell_type":"code","source":["try:\n"," # perform the conversion of columns\n"," dfLogging = dfLogging.astype({\n"," \"LoadId\": \"string\",\t\n"," \"NotebookId\": \"string\", \t\n"," \"NotebookName\": \"string\", \n"," \"WorkspaceId\": \"string\", \n"," \"CellId\": \"string\", \n"," \"Timestamp\": \"datetime64[ns]\", \n"," \"ElapsedTime\": \"string\", \n"," \"Message\": \"string\", \n"," \"ErrorMessage\" : \"string\"\n"," })\n","\n"," # save panda dataframe to a spark dataframe \n"," sparkDF_Logging = spark.createDataFrame(dfLogging) \n","\n"," # save to the lakehouse\n"," sparkDF_Logging.write.mode(\"append\").format(\"delta\").option(\"mergeSchema\", \"true\").saveAsTable(\"staging.notebook_logging\")\n","\n","except Exception as e:\n"," vMessage = \"saving logs to the lakehouse failed\"\n"," if pDebugMode == \"yes\":\n"," 
print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"2daf311c-d307-4ddd-83eb-0d7853214cc8"},{"cell_type":"markdown","source":["**Exit notebook**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"4443d25b-5e32-4a66-baf0-4e6da76595ef"},{"cell_type":"code","source":["mssparkutils.notebook.exit(f\"Notebook <{vLogNotebookName}> run successfully. Check logging table in CI/CD lakehouse for more details.\")"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"e97c7158-a8fa-4913-84a0-024246cf9275"}],"metadata":{"language_info":{"name":"python"},"kernelspec":{"name":"synapse_pyspark","language":"Python","display_name":"Synapse PySpark"},"kernel_info":{"name":"synapse_pyspark"},"widgets":{},"microsoft":{"language":"python","language_group":"synapse_pyspark","ms_spell_check":{"ms_spell_check_language":"en"}},"nteract":{"version":"nteract-front-end@1.0.0"},"synapse_widget":{"version":"0.1","state":{}},"spark_compute":{"compute_id":"/trident/default","session_options":{"conf":{"spark.synapse.nbs.session.timeout":"1200000"}}},"dependencies":{"lakehouse":{},"environment":{}}},"nbformat":4,"nbformat_minor":5}
\ No newline at end of file
diff --git a/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_post_update_data_pipelines.ipynb b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_post_update_data_pipelines.ipynb
new file mode 100644
index 0000000..ba43989
--- /dev/null
+++ b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_post_update_data_pipelines.ipynb
@@ -0,0 +1 @@
+{"cells":[{"cell_type":"markdown","source":["**Helper notebook**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"2b41137c-c931-4ab3-8164-7011a881004a"},{"cell_type":"code","source":["%run nb_helper"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"},"jupyter":{"outputs_hidden":true}},"id":"45df1b7b-aa26-459d-a5fe-b8e435704198"},{"cell_type":"markdown","source":["**Define a logging dataframe**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"37b323db-d161-430e-99f7-f4f538fad214"},{"cell_type":"code","source":["dfLogging = pd.DataFrame(columns = ['LoadId','NotebookId', 'NotebookName', 'WorkspaceId', 'SourceWorkspaceName','TargetWorkspaceName','Item', 'CellId', 'Timestamp', 'ElapsedTime', 'Message', 'ErrorMessage'])\n","vContext = mssparkutils.runtime.context\n","vNotebookId = vContext[\"currentNotebookId\"]\n","vLogNotebookName = vContext[\"currentNotebookName\"]\n","vWorkspaceId = vContext[\"currentWorkspaceId\"] # where the notebook is running, to not confuse with source and target workspaces"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"fc585dc4-692b-4741-a978-b38f890273c3"},{"cell_type":"markdown","source":["**Parameters --> convert to code for debugging the notebook. otherwise, keep commented as parameters are passed from DevOps pipelines**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"8737f968-0f8f-4547-b548-c63dfd99b7ca"},{"cell_type":"markdown","source":["pSourceWorkspaceId = \"\"\n","pTargetWorkspaceId = \"\"\n","pTargetStage = \"Stage2\"\n","pDebugMode = \"yes\"\n","pProjectName = \"fabric-cicd\"\n","pMappingConnections = ''"],"metadata":{"nteract":{"transient":{"deleting":false}}},"id":"eed9f97f-a570-45cf-b270-334335d9adc7"},{"cell_type":"markdown","source":["**Resolve source and target workspace**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"957716f1-81f3-4f78-8a73-a97f58788db0"},{"cell_type":"code","source":["vSourceWorkspaceName = fabric.resolve_workspace_name(pSourceWorkspaceId)\n","vTargetWorkspaceName = fabric.resolve_workspace_name(pTargetWorkspaceId)\n","vSourceWorkspaceId = pSourceWorkspaceId\n","vTargetWorkspaceId = pTargetWorkspaceId"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"67120adb-033f-48b1-9476-928593c6090d"},{"cell_type":"markdown","source":["**List of data pipelines in source workspace**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"a912284e-2e7e-4f88-bfa3-87cf210e38c4"},{"cell_type":"code","source":["df_source_data_pipelines = labs.list_data_pipelines(workspace=vSourceWorkspaceName)"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"5c0fc35f-73f8-450a-836c-cc29f4e033d5"},{"cell_type":"markdown","source":["**Verify that there is a least one data pipeline in the source 
workspace**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"71feb824-6baa-4336-a707-761ac07697a6"},{"cell_type":"code","source":["if df_source_data_pipelines.empty:\n"," vMessage = f\"workspace have 0 data pipeline. post-update is not required.\"\n","\n"," # Display an exit message\n"," display(Markdown(\"### ✅ Notebook execution stopped successfully!\"))\n","\n"," # Exit without error\n"," mssparkutils.notebook.exit(vMessage)"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"f0fc4a5c-5a55-40d4-825a-8e5bd5e0554f"},{"cell_type":"markdown","source":["**Get the connections mapping between Stages and list existing fabric connections**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"75302767-6169-4f18-a3ff-41ee98a6f600"},{"cell_type":"code","source":["# get the mapping of connections between stages\n","mapping_connections_json = json.loads(pMappingConnections)\n","df_mapping_connections = pd.DataFrame(mapping_connections_json)\n","\n","# get the list of existing connections in the tenant. the list will be used for lookups \n","df_existing_connections = labs.list_connections()"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"},"collapsed":false},"id":"8df6f768-a06c-468f-8bd8-e5bb402f32fd"},{"cell_type":"markdown","source":["**Functions**\n","- validate_stage_connection_id\n","- find_connection_id\n","- update_pipeline_connections\n","- update_linked_services\n","- update_notebooks\n","- update_fabric_pipelines\n","- update_semantic_models\n","- update_data_pipeline_definition"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"b402021a-292c-4e01-bc2d-751b1c5f6367"},{"cell_type":"code","source":["def validate_stage_connection_id(connectionId):\n","\n"," if connectionId in df_existing_connections['Connection Id'].values:\n"," vMessage = f\"connection id <{connectionId}> is valid>\"\n"," print(f\"{vMessage}\") \n"," vConnectionValidation = \"valid\"\n"," else:\n"," vMessage = f\"connection id <{connectionId}> is unvalid>\"\n"," print(f\"{vMessage}\") \n"," vConnectionValidation = \"unvalid\"\n"," return vConnectionValidation"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"a57af11e-f747-41e8-94a4-6b7385ee3629"},{"cell_type":"code","source":["# function to find a connection id based on the target stage\n","# the csv provided with the mapping between stages is used, with the assomption of 4 stages maximun (dev, test, uat, prod)\n","def find_connection_id(devConnectionId, targetStage):\n","\n"," global df_mapping_connections\n","\n"," vMessage = f\"dev connection id is <{devConnectionId}>\"\n"," print(f\"{vMessage}\") \n","\n"," # filter the DataFrame based on a condition\n"," df_mapping_connections_filtered = df_mapping_connections[(df_mapping_connections['ConnectionStage1'] == devConnectionId)]\n","\n"," # extract the value of a target connection id\n"," # if the target connection cannot be found assign it the dev connection to avoid breaking the json definition of the pipeline\n"," if not df_mapping_connections_filtered.empty:\n","\n"," first_row = df_mapping_connections_filtered.iloc[0] # Get the first matching row\n","\n"," if 
targetStage == \"Stage2\":\n"," targetConnectionId = first_row[\"ConnectionStage2\"]\n","\n"," elif targetStage == \"Stage3\":\n"," targetConnectionId = first_row[\"ConnectionStage3\"]\n"," else:\n"," targetConnectionId = first_row[\"ConnectionStage4\"]\n","\n"," # if the stage column in the mapping has no value, assing NA\n"," targetConnectionId = \"NA\" if pd.isna(targetConnectionId) or targetConnectionId == \"\" else targetConnectionId\n","\n"," # validate that the stage connection exists\n"," vConnectionValidation = validate_stage_connection_id(targetConnectionId)\n","\n"," # if the validation of the connection fails , keep the dev connection\n"," if vConnectionValidation == \"unvalid\":\n"," targetConnectionId = devConnectionId\n","\n"," else:\n"," \n"," vMessage = f\"no valid connection found in the mapping matching the condition, source connection will be kept\"\n"," print(f\"{vMessage}\") \n","\n"," # assign the dev connection to the target connection\n"," targetConnectionId = devConnectionId\n","\n","\n"," # return the found values\n"," return targetConnectionId"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"e58d08ae-8d9b-450f-8e52-8a03323662dd"},{"cell_type":"code","source":["# function to parse the json of the pipeline and update connections\n","def update_pipeline_connections(obj, stage):\n","\n"," if isinstance(obj, dict):\n"," for key, value in obj.items():\n"," # if the key is a connection\n"," if key == \"connection\":\n"," \n"," # find the dev connection id (Stage1) \n"," devConnectionId = value\n","\n"," # lookup the requested stage connection id\n"," targetConnectionId = find_connection_id(devConnectionId = devConnectionId, targetStage=stage)\n","\n"," obj[key] = targetConnectionId\n"," else:\n"," update_pipeline_connections(value, stage)\n"," \n"," elif isinstance(obj, list):\n"," for item in obj:\n"," update_pipeline_connections(item, stage)\n"," \n"," # return pl_json"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"5ffe41db-bbc3-42ff-b7e2-50d1e5624ff9"},{"cell_type":"code","source":["# function to parse the json of the pipeline and update the Warehouse and Lakehouse linked services\n","def update_linked_services(obj):\n","\n"," if isinstance(obj, dict): # If the object is a dictionary\n","\n"," if \"linkedService\" in obj and isinstance(obj[\"linkedService\"], dict):\n"," properties = obj[\"linkedService\"].get(\"properties\", {})\n"," \n"," if properties.get(\"type\") == \"DataWarehouse\":\n"," \n"," type_properties = properties.get(\"typeProperties\", {})\n","\n"," # get the source values\n"," source_artifactId = type_properties.get(\"artifactId\", \"Not Found\")\n"," source_workspaceId = type_properties.get(\"workspaceId\", \"Not Found\")\n"," source_endpoint = type_properties.get(\"endpoint\", \"Not Found\")\n","\n"," # get the target values \n"," source_artifact_name = fabric.resolve_item_name(item_id=source_artifactId, workspace=vSourceWorkspaceId)\n"," target_artifact_id = fabric.resolve_item_id(item_name=source_artifact_name, type='Warehouse', workspace=vTargetWorkspaceId)\n"," artifact_url = f\"v1/workspaces/{vTargetWorkspaceId}/warehouses/{target_artifact_id}\"\n"," response = client.get(artifact_url)\n"," target_endpoint = response.json()['properties']['connectionString']\n"," target_values = {\n"," \"endpoint\": f\"{target_endpoint}\",\n"," \"artifactId\": f\"{target_artifact_id}\",\n"," 
\"workspaceId\": f\"{vTargetWorkspaceId}\"\n"," }\n","\n"," # update the properties using the target values\n"," type_properties[\"endpoint\"] = target_values[\"endpoint\"]\n"," type_properties[\"artifactId\"] = target_values[\"artifactId\"]\n"," type_properties[\"workspaceId\"] = target_values[\"workspaceId\"]\n","\n"," if properties.get(\"type\") == \"Lakehouse\":\n"," \n"," type_properties = properties.get(\"typeProperties\", {})\n","\n"," # get the source values\n"," source_artifactId = type_properties.get(\"artifactId\", \"Not Found\")\n"," source_workspaceId = type_properties.get(\"workspaceId\", \"Not Found\")\n","\n","\n"," # get the target values \n"," source_artifact_name = fabric.resolve_item_name(item_id = source_artifactId, workspace=vSourceWorkspaceId)\n"," target_artifact_id = fabric.resolve_item_id(item_name = source_artifact_name, type='Lakehouse', workspace=vTargetWorkspaceId)\n"," target_values = {\n"," \"artifactId\": f\"{target_artifact_id}\",\n"," \"workspaceId\": f\"{vTargetWorkspaceId}\"\n"," }\n","\n"," # update the properties using the target values\n"," type_properties[\"artifactId\"] = target_values[\"artifactId\"]\n"," type_properties[\"workspaceId\"] = target_values[\"workspaceId\"]\n"," \n"," # Recursively search all keys in the dictionary\n"," for key in obj:\n"," update_linked_services(obj[key])\n"," \n"," elif isinstance(obj, list): # If the object is a list, iterate over elements\n"," for item in obj:\n"," update_linked_services(item)"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"876f80b0-5c93-431e-9fb8-2293f7a682d2"},{"cell_type":"code","source":["# function to parse the json of the pipeline and update notebooks\n","def update_notebooks(obj):\n"," if isinstance(obj, dict): # If the object is a dictionary\n"," if obj.get(\"type\") == \"TridentNotebook\":\n"," type_properties = obj.get(\"typeProperties\", {})\n","\n"," # get the source values\n"," source_notebook_id = type_properties.get(\"notebookId\", \"Not Found\")\n"," vSourceWorkspaceId = type_properties.get(\"workspaceId\", \"Not Found\")\n","\n"," # get the target values \n"," source_notebook_name = fabric.resolve_item_name(item_id=source_notebook_id, workspace=vSourceWorkspaceId)\n"," target_notebook_id = fabric.resolve_item_id(item_name=source_notebook_name, type='Notebook', workspace=vTargetWorkspaceId)\n"," target_values = {\n"," \"notebookId\": f\"{target_notebook_id}\",\n"," \"workspaceId\": f\"{vTargetWorkspaceId}\"\n"," }\n","\n"," # update the properties using the target values\n"," type_properties[\"notebookId\"] = target_values[\"notebookId\"]\n"," type_properties[\"workspaceId\"] = target_values[\"workspaceId\"]\n","\n"," # Recursively search all keys in the dictionary\n"," for key in obj:\n"," update_notebooks(obj[key])\n","\n"," elif isinstance(obj, list): # If the object is a list, iterate over elements\n"," for item in obj:\n"," update_notebooks(item)\n","\n"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"78be19ba-2fdf-4a02-85e9-46c5c0c9324b"},{"cell_type":"code","source":["# function to parse the json of the pipeline and update invoked fabric pipelines\n","def update_fabric_pipelines(obj):\n"," if isinstance(obj, dict): # If the object is a dictionary\n"," if obj.get(\"type\") == \"InvokePipeline\":\n"," type_properties = obj.get(\"typeProperties\", {})\n","\n"," # get the source values\n"," operation_type = 
type_properties.get(\"operationType\", \"Not Found\")\n","\n"," if operation_type == \"InvokeFabricPipeline\":\n"," source_pipeline_id = type_properties.get(\"pipelineId\", \"Not Found\")\n"," vSourceWorkspaceId = type_properties.get(\"workspaceId\", \"Not Found\")\n","\n"," # get the target values \n"," source_pipeline_name = fabric.resolve_item_name(item_id=source_pipeline_id, workspace=vSourceWorkspaceId)\n"," target_pipeline_id = fabric.resolve_item_id(item_name=source_pipeline_name, type='DataPipeline', workspace=vTargetWorkspaceId)\n"," target_values = {\n"," \"pipelineId\": f\"{target_pipeline_id}\",\n"," \"workspaceId\": f\"{vTargetWorkspaceId}\"\n"," }\n","\n"," # update the properties using the target values\n"," type_properties[\"pipelineId\"] = target_values[\"pipelineId\"]\n"," type_properties[\"workspaceId\"] = target_values[\"workspaceId\"]\n","\n"," # Recursively search all keys in the dictionary\n"," for key in obj:\n"," update_fabric_pipelines(obj[key])\n","\n"," elif isinstance(obj, list): # If the object is a list, iterate over elements\n"," for item in obj:\n"," update_fabric_pipelines(item)"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"9c7eb454-f25c-4317-9c7c-a7d0ebb8dab1"},{"cell_type":"code","source":["# function to parse the json of the pipeline and update semantic models\n","def update_semantic_models(obj):\n"," if isinstance(obj, dict): # If the object is a dictionary\n"," if obj.get(\"type\") == \"PBISemanticModelRefresh\":\n"," type_properties = obj.get(\"typeProperties\", {})\n","\n"," # get the source values\n"," operation_type = type_properties.get(\"operationType\", \"Not Found\")\n","\n"," source_dataset_id = type_properties.get(\"datasetId\", \"Not Found\")\n"," vSourceWorkspaceId = type_properties.get(\"groupId\", \"Not Found\")\n","\n"," # get the target values \n"," source_dataset_name = fabric.resolve_item_name(item_id=source_dataset_id, workspace=vSourceWorkspaceId)\n"," target_dataset_id = fabric.resolve_item_id(item_name=source_dataset_name, type='SemanticModel', workspace=vTargetWorkspaceId)\n"," target_values = {\n"," \"datasetId\": f\"{target_dataset_id}\",\n"," \"groupId\": f\"{vTargetWorkspaceId}\"\n"," }\n","\n"," # update the properties using the target values\n"," type_properties[\"datasetId\"] = target_values[\"datasetId\"]\n"," type_properties[\"groupId\"] = target_values[\"groupId\"]\n","\n"," # Recursively search all keys in the dictionary\n"," for key in obj:\n"," update_semantic_models(obj[key])\n","\n"," elif isinstance(obj, list): # If the object is a list, iterate over elements\n"," for item in obj:\n"," update_semantic_models(item)"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"e58bca5a-c9f9-4048-b111-bf652e497dcd"},{"cell_type":"code","source":["# function to update the data pipeline definition\n","def update_data_pipeline_definition(\n"," name: str, pipelineContent: dict, workspace: Optional[str] = None\n","):\n"," \"\"\"\n"," Updates an existing data pipeline with a new definition.\n","\n"," Parameters\n"," ----------\n"," name : str\n"," The name of the data pipeline.\n"," pipelineContent : dict\n"," The data pipeline content (not in Base64 format).\n"," workspace : str, default=None\n"," The name of the workspace.\n"," Defaults to None which resolves to the workspace of the attached lakehouse\n"," or if no lakehouse attached, resolves to the workspace of the 
notebook.\n"," \"\"\"\n","\n"," # resolve the workspace name and id\n"," (vWorkspace, vWorkspaceId) = resolve_workspace_name_and_id(workspace)\n","\n"," # get the pipeline payload\n"," vPipelinePayload = base64.b64encode(json.dumps(pipelineContent).encode('utf-8')).decode('utf-8')\n"," \n"," # resolve the pipeline id\n"," vPipelineId = fabric.resolve_item_id(item_name=name, type=\"DataPipeline\", workspace=vWorkspace)\n","\n"," # prepare the request body\n"," vRequestBody = {\n"," \"definition\": {\n"," \"parts\": [\n"," {\n"," \"path\": \"pipeline-content.json\",\n"," \"payload\": vPipelinePayload,\n"," \"payloadType\": \"InlineBase64\"\n"," }\n"," ]\n"," }\n"," }\n","\n"," # response\n"," vResponse = client.post(\n"," f\"v1/workspaces/{vWorkspaceId}/items/{vPipelineId}/updateDefinition\",\n"," json=vRequestBody,\n"," )\n","\n"," lro(client, vResponse, return_status_code=True)\n","\n"," print(f\"{icons.green_dot} The '{name}' pipeline was updated within the '{vWorkspace}' workspace.\")\n"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"90bc6df9-4d7e-4145-9aca-6ec60ebf0a92"},{"cell_type":"markdown","source":["**Replacement of linked services, connections, notebooks, fabric pipelines, etc..**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"b0b6f71a-0e44-4470-be6e-0bed5c50a0d3"},{"cell_type":"code","source":["\n","# get the list of data pipelines in the target workspace\n","df_pipeline = labs.list_data_pipelines(vTargetWorkspaceName)\n","\n","# iterate over the data pipelines\n","for index, row in df_pipeline.iterrows():\n","\n"," vPipelineName = row['Data Pipeline Name']\n","\n"," # retrieve the pipeline json definition\n"," vPipelineJson = json.loads(labs.get_data_pipeline_definition(vPipelineName, vSourceWorkspaceName))\n"," # print(json.dumps(vPipelineJson, indent=4))\n","\n","\n"," # update linked services\n"," try:\n"," update_linked_services(vPipelineJson.get(\"properties\", {}).get(\"activities\", []))\n"," \n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vPipelineName, 'update linked services', datetime.now(), None, vMessage, ''] \n"," except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vPipelineName, 'update linked services', datetime.now(), None, vMessage, str(e) ] \n"," if pDebugMode == \"yes\":\n"," print(str(e))\n"," \n","\n"," # update connections\n"," try:\n"," update_pipeline_connections(vPipelineJson, pTargetStage)\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vPipelineName, 'update connections', datetime.now(), None, vMessage, ''] \n"," except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vPipelineName, 'update connections', datetime.now(), None, vMessage, str(e) ] \n"," if pDebugMode == \"yes\":\n"," print(str(e))\n","\n"," # update notebooks\n"," try:\n"," update_notebooks(vPipelineJson)\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, 
vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vPipelineName, 'update notebooks', datetime.now(), None, vMessage, ''] \n"," except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vPipelineName, 'update notebooks', datetime.now(), None, vMessage, str(e) ] \n"," if pDebugMode == \"yes\":\n"," print(str(e))\n","\n"," # update fabric pipeline \n"," try:\n"," update_fabric_pipelines(vPipelineJson)\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vPipelineName, 'update fabric pipeline', datetime.now(), None, vMessage, ''] \n"," except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vPipelineName, 'update fabric pipeline', datetime.now(), None, vMessage, str(e) ] \n"," if pDebugMode == \"yes\":\n"," print(str(e))\n","\n"," # update semantic models\n"," try:\n"," update_semantic_models(vPipelineJson)\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vPipelineName, 'update semantic models', datetime.now(), None, vMessage, ''] \n"," except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vPipelineName, 'update semantic models', datetime.now(), None, vMessage, str(e) ] \n"," if pDebugMode == \"yes\":\n"," print(str(e))\n","\n"," # update pipeline definition\n"," try:\n"," update_data_pipeline_definition(name=vPipelineName,pipelineContent=vPipelineJson, workspace=vTargetWorkspaceName)\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vPipelineName, 'update pipeline definition', datetime.now(), None, vMessage, ''] \n"," except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vPipelineName, 'update pipeline definition', datetime.now(), None, vMessage, str(e) ] \n"," if pDebugMode == \"yes\":\n"," print(str(e))\n","\n"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"773ca126-b15d-496d-b1b4-4796c665a7a1"},{"cell_type":"markdown","source":["**Logging**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"451e555d-94ce-4881-8d35-56da120dee20"},{"cell_type":"code","source":["try:\n"," # perform the conversion of columns\n"," dfLogging = dfLogging.astype({\n"," \"LoadId\": \"string\",\t\n"," \"NotebookId\": \"string\", \t\n"," \"NotebookName\": \"string\", \n"," \"WorkspaceId\": \"string\", \n"," \"CellId\": \"string\", \n"," \"Timestamp\": \"datetime64[ns]\", \n"," \"ElapsedTime\": \"string\", \n"," \"Message\": \"string\", \n"," \"ErrorMessage\" : \"string\"\n"," })\n","\n"," # save panda dataframe to a spark dataframe \n"," sparkDF_Logging = spark.createDataFrame(dfLogging) \n","\n"," # save to the lakehouse\n"," 
sparkDF_Logging.write.mode(\"append\").format(\"delta\").option(\"mergeSchema\", \"true\").saveAsTable(\"staging.notebook_logging_cicd\")\n","\n","except Exception as e:\n"," vMessage = \"saving logs to the lakehouse failed\"\n"," if pDebugMode == \"yes\":\n"," print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"10f47229-de50-4df4-869b-21d4c9b0f489"}],"metadata":{"language_info":{"name":"python"},"kernelspec":{"name":"synapse_pyspark","language":"Python","display_name":"Synapse PySpark"},"kernel_info":{"name":"synapse_pyspark"},"microsoft":{"language":"python","language_group":"synapse_pyspark","ms_spell_check":{"ms_spell_check_language":"en"}},"nteract":{"version":"nteract-front-end@1.0.0"},"synapse_widget":{"version":"0.1","state":{}},"spark_compute":{"compute_id":"/trident/default","session_options":{"conf":{"spark.synapse.nbs.session.timeout":"1200000"}}},"dependencies":{"lakehouse":{},"environment":{}}},"nbformat":4,"nbformat_minor":5}
\ No newline at end of file
diff --git a/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_post_update_notebooks.ipynb b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_post_update_notebooks.ipynb
new file mode 100644
index 0000000..63d6bb8
--- /dev/null
+++ b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_post_update_notebooks.ipynb
@@ -0,0 +1 @@
+{"cells":[{"cell_type":"markdown","source":["**Helper notebook**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"50b46991-79b4-43ed-a3c1-45caa9767963"},{"cell_type":"code","source":["%run nb_helper"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"b7e81de0-32ed-4552-b145-399179df0802"},{"cell_type":"markdown","source":["**Define a logging dataframe**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"6720a804-bf22-4e4e-b036-2b55ca989c01"},{"cell_type":"code","source":["dfLogging = pd.DataFrame(columns = ['LoadId','NotebookId', 'NotebookName', 'WorkspaceId', 'SourceWorkspaceName','TargetWorkspaceName','Item', 'CellId', 'Timestamp', 'ElapsedTime', 'Message', 'ErrorMessage'])\n","vContext = mssparkutils.runtime.context\n","vNotebookId = vContext[\"currentNotebookId\"]\n","vLogNotebookName = vContext[\"currentNotebookName\"]\n","vWorkspaceId = vContext[\"currentWorkspaceId\"] # where the notebook is running, to not confuse with source and target workspaces"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"10231433-2527-4504-8e95-22b1933657e2"},{"cell_type":"markdown","source":["**Parameters --> convert to code for debugging the notebook. otherwise, keep commented as parameters are passed from DevOps pipelines**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"e871a7f2-f46f-4d25-9f5e-4b482692db38"},{"cell_type":"markdown","source":["pSourceWorkspaceId = \"\"\n","pTargetWorkspaceId = \"\"\n","pDebugMode = \"yes\""],"metadata":{"nteract":{"transient":{"deleting":false}}},"id":"d53db2fe-ece6-4d3b-9186-125a40e9fcd7"},{"cell_type":"markdown","source":["**Resolve the source and target workspaces**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"f3e6b114-71a6-4a1a-b249-2beddde1b0c8"},{"cell_type":"code","source":["vSourceWorkspaceName = fabric.resolve_workspace_name(pSourceWorkspaceId)\n","vTargetWorkspaceName = fabric.resolve_workspace_name(pTargetWorkspaceId)\n","vSourceWorkspaceId = pSourceWorkspaceId\n","vTargetWorkspaceId = pTargetWorkspaceId"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"26ccafd9-5ce4-42d8-9769-d5417801ffc4"},{"cell_type":"markdown","source":["**List of notebooks in source workspace --> semantic link labs have no function as of 22.02.2025**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"be1db7ae-d044-43f7-b80b-d60094dd7517"},{"cell_type":"code","source":["df_source_items = fabric.list_items(workspace=vSourceWorkspaceName)\n","df_source_notebooks = df_source_items[df_source_items['Type']=='Notebook']"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"7a2cd073-cafc-47e3-9ee9-c47cbb8274cc"},{"cell_type":"markdown","source":["**Verify that there is a least one notebook in the source 
workspace**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"70d5da33-3dcc-403e-ae0c-23836edcfeb9"},{"cell_type":"code","source":["if df_source_notebooks.empty:\n"," vMessage = f\"workspace have 0 notebook. post-update is not required.\"\n","\n"," # Display an exit message\n"," display(Markdown(\"### ✅ Notebook execution stopped successfully!\"))\n","\n"," # Exit without error\n"," mssparkutils.notebook.exit(vMessage)"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"ff555d58-ae4d-4134-ad91-8753303f9878"},{"cell_type":"markdown","source":["**Update notebook dependencies**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"d82934ea-5272-43be-9087-8f976eac8b99"},{"cell_type":"code","source":["# get the list of data pipelines in the target workspace\n","df_notebooks = notebookutils.notebook.list(workspaceId=vTargetWorkspaceId)\n","# df_notebooks\n","\n","for notebook in df_notebooks:\n","\n"," # get the notebook id and display name\n"," vNotebookId = notebook.id\n"," vNotebookName = notebook.displayName\n"," \n","\n"," # get the current notebook definition\n"," vNotebookDefinition = notebookutils.notebook.getDefinition(name=vNotebookName, workspaceId=vSourceWorkspaceId) \n"," vNotebookJson = json.loads(vNotebookDefinition)\n","\n"," # update lakehouse dependencies\n"," try:\n","\n"," # check and remove any attached lakehouses\n"," if 'dependencies' in vNotebookJson['metadata'] \\\n"," and 'lakehouse' in vNotebookJson['metadata']['dependencies'] \\\n"," and vNotebookJson['metadata'][\"dependencies\"][\"lakehouse\"] is not None:\n","\n"," vCurrentLakehouse = vNotebookJson['metadata']['dependencies']['lakehouse']\n","\n"," if 'default_lakehouse_name' in vCurrentLakehouse:\n","\n"," vNotebookJson['metadata']['dependencies']['lakehouse'] = {}\n"," print(f\"attempting to update notebook <{vNotebookName}> with new default lakehouse: {vCurrentLakehouse['default_lakehouse_name']} in workspace <{vTargetWorkspaceName}>.\")\n","\n"," # update new notebook definition after removing existing lakehouses and with new default lakehouseId\n"," notebookutils.notebook.updateDefinition(\n"," name = vNotebookName,\n"," content = json.dumps(vNotebookJson), \n"," defaultLakehouse = vCurrentLakehouse['default_lakehouse_name'],\n"," defaultLakehouseWorkspace = vTargetWorkspaceId,\n"," workspaceId = vTargetWorkspaceId\n"," )\n","\n"," print(f\"updated notebook <{vNotebookName}> in workspace <{vTargetWorkspaceName}>.\")\n","\n"," else:\n"," print(f'no default lakehouse set for notebook <{vNotebookName}>, ignoring.')\n","\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vNotebookName, 'update lakehouse dependencies', datetime.now(), None, vMessage, ''] \n"," except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vNotebookName, 'update lakehouse dependencies', datetime.now(), None, vMessage, str(e) ] \n"," if pDebugMode == \"yes\":\n"," print(str(e))\n","\n"," # update warehouse dependencies\n"," try:\n"," if 'dependencies' in vNotebookJson['metadata'] and 'warehouse' in vNotebookJson['metadata']['dependencies']:\n"," \n"," #fetch 
existing details\n"," vCurrentWarehouse = vNotebookJson['metadata']['dependencies']['warehouse']\n"," vCurrentWarehouseId = vCurrentWarehouse['default_warehouse']\n"," vCurrentWarehouseName = fabric.resolve_item_name(item_id = vCurrentWarehouseId, workspace=vSourceWorkspaceId)\n"," vTargetWarehouseId = fabric.resolve_item_id(item_name = vCurrentWarehouseName, type='Warehouse', workspace=vTargetWorkspaceId)\n","\n"," if 'default_warehouse' in vCurrentWarehouse:\n","\n"," print(f\"attempting to update notebook {vNotebookName} with new default warehouse: {vTargetWarehouseId} in {vTargetWorkspaceName}\")\n"," \n"," # update new notebook definition after removing existing lakehouses and with new default lakehouseId\n"," vNotebookJson['metadata']['dependencies']['warehouse']['default_warehouse'] = vTargetWarehouseId\n"," for warehouse in vNotebookJson['metadata']['dependencies']['warehouse']['known_warehouses']:\n"," if warehouse['id'] == vCurrentWarehouseId:\n"," warehouse['id'] = vTargetWarehouseId\n"," # print(json.dumps(vNotebookJson, indent=4))\n"," notebookutils.notebook.updateDefinition(\n"," name = vNotebookName,\n"," content = json.dumps(vNotebookJson),\n"," workspaceId = vTargetWorkspaceId\n"," )\n"," print(f\"updated notebook {vNotebookName} in {vTargetWorkspaceName}\")\n","\n"," else:\n"," print(f\"no default warehouse was found in the source notebook {vNotebookName} there cannot set default for target\")\n","\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vNotebookName, 'update warehouse dependencies', datetime.now(), None, vMessage, ''] \n"," except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vNotebookName, 'update warehouse dependencies', datetime.now(), None, vMessage, str(e) ] \n"," if pDebugMode == \"yes\":\n"," print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"7c850199-4380-4369-8dba-a6631136d052"},{"cell_type":"markdown","source":["**Logging**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"82f8bea3-217b-490a-8fb3-8cd3eacd6fba"},{"cell_type":"code","source":["try:\n"," # perform the conversion of columns\n"," dfLogging = dfLogging.astype({\n"," \"LoadId\": \"string\",\t\n"," \"NotebookId\": \"string\", \t\n"," \"NotebookName\": \"string\", \n"," \"WorkspaceId\": \"string\", \n"," \"SourceWorkspaceName\" : \"string\",\n"," \"TargetWorkspaceName\" : \"string\",\n"," \"Item\":\"string\",\n"," \"CellId\": \"string\", \n"," \"Timestamp\": \"datetime64[ns]\", \n"," \"ElapsedTime\": \"string\", \n"," \"Message\": \"string\", \n"," \"ErrorMessage\" : \"string\"\n"," })\n","\n"," # save panda dataframe to a spark dataframe \n"," sparkDF_Logging = spark.createDataFrame(dfLogging) \n","\n"," # save to the lakehouse\n"," sparkDF_Logging.write.mode(\"append\").format(\"delta\").option(\"mergeSchema\", \"true\").saveAsTable(\"staging.notebook_logging_cicd\")\n","\n","except Exception as e:\n"," vMessage = \"saving logs to the lakehouse failed\"\n"," if pDebugMode == \"yes\":\n"," 
print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"9d6d6709-0ac8-482a-a8ef-be2ddcb4ce2a"}],"metadata":{"language_info":{"name":"python"},"kernelspec":{"name":"synapse_pyspark","language":"Python","display_name":"Synapse PySpark"},"kernel_info":{"name":"synapse_pyspark"},"widgets":{},"microsoft":{"language":"python","language_group":"synapse_pyspark","ms_spell_check":{"ms_spell_check_language":"en"}},"nteract":{"version":"nteract-front-end@1.0.0"},"synapse_widget":{"version":"0.1","state":{}},"spark_compute":{"compute_id":"/trident/default","session_options":{"conf":{"spark.synapse.nbs.session.timeout":"1200000"}}},"dependencies":{"lakehouse":{},"environment":{}}},"nbformat":4,"nbformat_minor":5}
\ No newline at end of file
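
The notebook added above reduces to one core operation per item: read the notebook definition from the source workspace, clear the stale lakehouse binding, and write the definition back into the target workspace with the default lakehouse re-attached by name. A minimal sketch of that step, using only the `notebookutils` calls that appear in the cells above and assuming a Fabric Spark session (workspace ids below are hypothetical placeholders):

```python
import json
# notebookutils is pre-provisioned in Fabric notebook sessions (no explicit import needed)

def repoint_default_lakehouse(notebook_name: str,
                              source_workspace_id: str,
                              target_workspace_id: str) -> None:
    """Sketch: re-attach a notebook's default lakehouse in the target workspace."""
    # read the definition as stored in the source workspace
    definition = json.loads(
        notebookutils.notebook.getDefinition(name=notebook_name,
                                             workspaceId=source_workspace_id))

    lakehouse = definition.get("metadata", {}).get("dependencies", {}).get("lakehouse") or {}
    default_name = lakehouse.get("default_lakehouse_name")
    if not default_name:
        print(f"no default lakehouse set for <{notebook_name}>, nothing to do")
        return

    # drop the source binding and let updateDefinition re-resolve it by name in the target
    definition["metadata"]["dependencies"]["lakehouse"] = {}
    notebookutils.notebook.updateDefinition(
        name=notebook_name,
        content=json.dumps(definition),
        defaultLakehouse=default_name,
        defaultLakehouseWorkspace=target_workspace_id,
        workspaceId=target_workspace_id,
    )

# usage (hypothetical ids):
# repoint_default_lakehouse("nb_load_sales", "<dev-workspace-guid>", "<test-workspace-guid>")
```
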
diff --git a/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_post_update_semantic_models.ipynb b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_post_update_semantic_models.ipynb
new file mode 100644
index 0000000..fc71061
--- /dev/null
+++ b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_post_update_semantic_models.ipynb
@@ -0,0 +1 @@
+{"cells":[{"cell_type":"markdown","source":["**Helper notebook**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"50b46991-79b4-43ed-a3c1-45caa9767963"},{"cell_type":"code","source":["%run nb_helper"],"outputs":[{"output_type":"display_data","data":{"application/vnd.livy.statement-meta+json":{"session_id":"220708dc-1330-4d2e-a379-bb193810ff23","spark_pool":null,"state":"finished","execution_finish_time":"2025-02-23T14:03:46.7763146Z","execution_start_time":"2025-02-23T14:03:37.4132795Z","parent_msg_id":"f6949997-599b-4d61-9300-2d50e981f38d","queued_time":"2025-02-23T14:03:37.2496055Z","livy_statement_state":"available","statement_ids":[82,83,84,85,86,87,88,89,90,91,92,93],"session_start_time":null,"normalized_state":"finished","statement_id":93},"text/plain":"StatementMeta(, 220708dc-1330-4d2e-a379-bb193810ff23, 93, Finished, Available, Finished)"},"metadata":{}},{"output_type":"stream","name":"stdout","text":["Warning: In reference nb_helper run, the default lakehouse of the main notebook will be the effective default lakehouse of the session during reference run. Recommend using absolute path to read/write lakehouse in the referenced notebooks.\n"]}],"execution_count":47,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"b7e81de0-32ed-4552-b145-399179df0802"},{"cell_type":"markdown","source":["**Define a logging dataframe**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"ae7a14a6-b3d7-4a75-9c26-df584197e083"},{"cell_type":"code","source":["dfLogging = pd.DataFrame(columns = ['LoadId','NotebookId', 'NotebookName', 'WorkspaceId', 'SourceWorkspaceName','TargetWorkspaceName','Item', 'CellId', 'Timestamp', 'ElapsedTime', 'Message', 'ErrorMessage'])\n","vContext = mssparkutils.runtime.context\n","vNotebookId = vContext[\"currentNotebookId\"]\n","vLogNotebookName = vContext[\"currentNotebookName\"]\n","vWorkspaceId = vContext[\"currentWorkspaceId\"] # where the notebook is running, to not confuse with source and target workspaces"],"outputs":[{"output_type":"display_data","data":{"application/vnd.livy.statement-meta+json":{"session_id":"220708dc-1330-4d2e-a379-bb193810ff23","spark_pool":null,"state":"finished","execution_finish_time":"2025-02-23T14:03:47.1741892Z","execution_start_time":"2025-02-23T14:03:46.9123716Z","parent_msg_id":"ff52f9bc-14c7-44f2-93d2-9baf9448e166","queued_time":"2025-02-23T14:03:36.9723489Z","livy_statement_state":"available","statement_ids":[94],"session_start_time":null,"normalized_state":"finished","statement_id":94},"text/plain":"StatementMeta(, 220708dc-1330-4d2e-a379-bb193810ff23, 94, Finished, Available, Finished)"},"metadata":{}}],"execution_count":48,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"1c029698-13c1-4981-9b99-239b79b7c42b"},{"cell_type":"markdown","source":["**Parameters --> convert to code for debugging the notebook. 
otherwise, keep commented as parameters are passed from DevOps pipelines**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"e871a7f2-f46f-4d25-9f5e-4b482692db38"},{"cell_type":"markdown","source":["pSourceWorkspaceId = \"35f100c1-d910-482c-9763-aaf500918816\"\n","pTargetWorkspaceId = \"2fc80f23-1f8f-4c00-b2de-507863e8def4\"\n","pTargetStage = \"Stage3\"\n","pDebugMode = \"yes\"\n","pProjectName = \"fabric-cicd\"\n","pMappingConnections = '[{\"ConnectionStage0\":\"0c0702d4-9c1d-435a-a03e-9635e1fbded8\",\"ConnectionStage1\":\"0c0702d4-9c1d-435a-a03e-9635e1fbded8\",\"ConnectionStage2\":\"feb079dc-6fe7-4f0c-9537-33d7fa72fcb4\",\"ConnectionStage3\":\"feb079dc-6fe7-4f0c-9537-33d7fa72fcb4\"},{\"ConnectionStage0\":\"a24fefc1-e5f4-4606-a3e1-a337b7056627\",\"ConnectionStage1\":\"a24fefc1-e5f4-4606-a3e1-a337b7056627\",\"ConnectionStage2\":null,\"ConnectionStage3\":null},{\"ConnectionStage0\":\"0c0702d4-9c1d-435a-a03e-9635e1fbded8\",\"ConnectionStage1\":\"0c0702d4-9c1d-435a-a03e-9635e1fbded8\",\"ConnectionStage2\":null,\"ConnectionStage3\":null},{\"ConnectionStage0\":\"b8d19a81-9f45-4eed-aef0-314a28c1b16f\",\"ConnectionStage1\":\"b8d19a81-9f45-4eed-aef0-314a28c1b16f\",\"ConnectionStage2\":null,\"ConnectionStage3\":null},{\"ConnectionStage0\":\"2c52c32b-1d27-4de6-852c-9fd8be27cad1\",\"ConnectionStage1\":\"39e95e92-8338-4ad9-8a97-14b39388349b\",\"ConnectionStage2\":null,\"ConnectionStage3\":null},{\"ConnectionStage0\":\"Sql.Database(''rs-synapse-dev-ondemand.sql.azuresynapse.net'', ''misc'')\",\"ConnectionStage1\":\"Sql.Database(''rs-synapse-dev-ondemand.sql.azuresynapse.net'', ''misc'')\",\"ConnectionStage2\":\"Sql.Database(''rs-synapse-dev-ondemand.sql.azuresynapse.net'', ''misc_new'')\",\"ConnectionStage3\":\"Sql.Database(''rs-synapse-dev-ondemand.sql.azuresynapse.net'', ''misc_new'')\"}]'\n"],"metadata":{"nteract":{"transient":{"deleting":false}}},"id":"8474d65d-0ead-43f6-93eb-418dc180d866"},{"cell_type":"markdown","source":["**Resolve source and target workspace**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"01d01936-704f-4403-b7c9-7f75d3d6f8ad"},{"cell_type":"code","source":["vSourceWorkspaceName = fabric.resolve_workspace_name(pSourceWorkspaceId)\n","vTargetWorkspaceName = fabric.resolve_workspace_name(pTargetWorkspaceId)\n","vSourceWorkspaceId = pSourceWorkspaceId\n","vTargetWorkspaceId = pTargetWorkspaceId"],"outputs":[{"output_type":"display_data","data":{"application/vnd.livy.statement-meta+json":{"session_id":"220708dc-1330-4d2e-a379-bb193810ff23","spark_pool":null,"state":"finished","execution_finish_time":"2025-02-23T14:03:47.9492584Z","execution_start_time":"2025-02-23T14:03:47.704142Z","parent_msg_id":"8d341e35-f272-40c2-8b08-0f71f26b0f90","queued_time":"2025-02-23T14:03:37.1789427Z","livy_statement_state":"available","statement_ids":[96],"session_start_time":null,"normalized_state":"finished","statement_id":96},"text/plain":"StatementMeta(, 220708dc-1330-4d2e-a379-bb193810ff23, 96, Finished, Available, Finished)"},"metadata":{}}],"execution_count":50,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"fe985d3b-334d-44fa-af89-1df0aecefa5d"},{"cell_type":"markdown","source":["**List of semantic models in source 
workspace**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"e5eb5ac2-276d-4a7c-9abd-e31e56e76532"},{"cell_type":"code","source":["df_source_semantic_models = fabric.list_datasets(workspace=vSourceWorkspaceName) "],"outputs":[{"output_type":"display_data","data":{"application/vnd.livy.statement-meta+json":{"session_id":"220708dc-1330-4d2e-a379-bb193810ff23","spark_pool":null,"state":"finished","execution_finish_time":"2025-02-23T14:03:48.84726Z","execution_start_time":"2025-02-23T14:03:48.5952192Z","parent_msg_id":"afa79b52-b75c-401c-9bfb-8e603a804f5d","queued_time":"2025-02-23T14:03:37.3474108Z","livy_statement_state":"available","statement_ids":[98],"session_start_time":null,"normalized_state":"finished","statement_id":98},"text/plain":"StatementMeta(, 220708dc-1330-4d2e-a379-bb193810ff23, 98, Finished, Available, Finished)"},"metadata":{}}],"execution_count":52,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"d2b105e2-4b0a-4791-bb75-0a42d846fc06"},{"cell_type":"markdown","source":["**Verify that there is a least one semantic model in the source workspace**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"dea24dd2-f2d9-4464-9936-932afbf7a953"},{"cell_type":"code","source":["if df_source_semantic_models.empty:\n"," vMessage = f\"workspace have 0 semantic model. post-update is not required.\"\n","\n"," # Display an exit message\n"," display(Markdown(\"### ✅ Notebook execution stopped successfully!\"))\n","\n"," # Exit without error\n"," mssparkutils.notebook.exit(vMessage)"],"outputs":[{"output_type":"display_data","data":{"application/vnd.livy.statement-meta+json":{"session_id":"220708dc-1330-4d2e-a379-bb193810ff23","spark_pool":null,"state":"finished","execution_finish_time":"2025-02-23T14:03:49.2619297Z","execution_start_time":"2025-02-23T14:03:49.0223076Z","parent_msg_id":"8068dba7-028b-4efe-9010-7e4151e5f5b2","queued_time":"2025-02-23T14:03:37.4649241Z","livy_statement_state":"available","statement_ids":[99],"session_start_time":null,"normalized_state":"finished","statement_id":99},"text/plain":"StatementMeta(, 220708dc-1330-4d2e-a379-bb193810ff23, 99, Finished, Available, Finished)"},"metadata":{}}],"execution_count":53,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"bd564dd7-00f5-4736-8026-f09a47900ede"},{"cell_type":"markdown","source":["**Get the connections mapping between Stages and list existing fabric connections**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"948cb91a-c635-4acd-9129-f2c8abc56142"},{"cell_type":"code","source":["# get the mapping of connections between stages\n","mapping_connections_json = json.loads(pMappingConnections)\n","df_mapping_connections = pd.DataFrame(mapping_connections_json)\n","\n","# get the list of existing connections in the tenant. 
the list will be used for lookups \n","df_existing_connections = labs.list_connections()"],"outputs":[{"output_type":"display_data","data":{"application/vnd.livy.statement-meta+json":{"session_id":"220708dc-1330-4d2e-a379-bb193810ff23","spark_pool":null,"state":"finished","execution_finish_time":"2025-02-23T14:03:50.2318672Z","execution_start_time":"2025-02-23T14:03:49.3877026Z","parent_msg_id":"382dff60-b8aa-4fbe-ad27-597797c2fff9","queued_time":"2025-02-23T14:03:37.5887578Z","livy_statement_state":"available","statement_ids":[100],"session_start_time":null,"normalized_state":"finished","statement_id":100},"text/plain":"StatementMeta(, 220708dc-1330-4d2e-a379-bb193810ff23, 100, Finished, Available, Finished)"},"metadata":{}}],"execution_count":54,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"39c3ece4-ec94-4090-a77f-138adf4eaf9a"},{"cell_type":"markdown","source":["**Functions**\n","- validate_stage_connection_id\n","- find_connection_id\n","- update_partition_source_expression"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"11923075-3df4-48d2-9955-34b7cc3d2993"},{"cell_type":"code","source":["def validate_stage_connection_id(connectionId):\n","\n"," if connectionId in df_existing_connections['Connection Id'].values:\n"," vMessage = f\"connection id <{connectionId}> is valid>\"\n"," print(f\"{vMessage}\") \n"," vConnectionValidation = \"valid\"\n"," else:\n"," vMessage = f\"connection id <{connectionId}> is unvalid>\"\n"," print(f\"{vMessage}\") \n"," vConnectionValidation = \"unvalid\"\n"," return vConnectionValidation"],"outputs":[{"output_type":"display_data","data":{"application/vnd.livy.statement-meta+json":{"session_id":"220708dc-1330-4d2e-a379-bb193810ff23","spark_pool":null,"state":"finished","execution_finish_time":"2025-02-23T14:03:50.6312457Z","execution_start_time":"2025-02-23T14:03:50.3567775Z","parent_msg_id":"4e030c86-1b48-4043-a670-810c90a82d2e","queued_time":"2025-02-23T14:03:37.6807067Z","livy_statement_state":"available","statement_ids":[101],"session_start_time":null,"normalized_state":"finished","statement_id":101},"text/plain":"StatementMeta(, 220708dc-1330-4d2e-a379-bb193810ff23, 101, Finished, Available, Finished)"},"metadata":{}}],"execution_count":55,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"e05e9dbd-8b38-4191-9406-532796c5a0d5"},{"cell_type":"code","source":["# function to find a connection id based on the target stage\n","# the csv provided with the mapping between stages is used, with the assomption of 4 stages maximun (dev, test, uat, prod)\n","def find_connection_id(devConnectionId, targetStage, validateConnection):\n","\n"," global df_mapping_connections\n","\n"," vMessage = f\"dev connection id is <{devConnectionId}>\"\n"," print(f\"{vMessage}\") \n","\n"," # filter the DataFrame based on a condition\n"," df_mapping_connections_filtered = df_mapping_connections[(df_mapping_connections['ConnectionStage1'] == devConnectionId)]\n","\n"," # extract the value of a target connection id\n"," # if the target connection cannot be found assign it the dev connection to avoid breaking the json definition of the pipeline\n"," if not df_mapping_connections_filtered.empty:\n","\n"," first_row = df_mapping_connections_filtered.iloc[0] # Get the first matching row\n","\n"," if targetStage == \"Stage2\":\n"," targetConnectionId = first_row[\"ConnectionStage2\"]\n"," elif targetStage == \"Stage3\":\n"," 
targetConnectionId = first_row[\"ConnectionStage3\"]\n"," else:\n"," targetConnectionId = first_row[\"ConnectionStage4\"]\n","\n"," # if the stage column in the mapping has no value, assing NA\n"," targetConnectionId = \"NA\" if pd.isna(targetConnectionId) or targetConnectionId == \"\" else targetConnectionId\n","\n","\n"," if validateConnection == \"yes\":\n","\n"," # validate that the stage connection exists\n"," vConnectionValidation = validate_stage_connection_id(targetConnectionId)\n","\n"," # if the validation of the connection fails , keep the dev connection\n"," if vConnectionValidation == \"unvalid\":\n"," targetConnectionId = devConnectionId\n","\n"," else:\n"," \n"," vMessage = f\"no valid connection found in the mapping matching the condition, source connection will be kept.\"\n"," print(f\"{vMessage}\") \n","\n"," # assign the dev connection to the target connection\n"," targetConnectionId = devConnectionId\n","\n","\n"," # return the found values\n"," return targetConnectionId"],"outputs":[{"output_type":"display_data","data":{"application/vnd.livy.statement-meta+json":{"session_id":"220708dc-1330-4d2e-a379-bb193810ff23","spark_pool":null,"state":"finished","execution_finish_time":"2025-02-23T14:03:51.0918565Z","execution_start_time":"2025-02-23T14:03:50.8249335Z","parent_msg_id":"628859f7-a06a-4703-8f67-2f45dfcb7af6","queued_time":"2025-02-23T14:03:37.8002077Z","livy_statement_state":"available","statement_ids":[102],"session_start_time":null,"normalized_state":"finished","statement_id":102},"text/plain":"StatementMeta(, 220708dc-1330-4d2e-a379-bb193810ff23, 102, Finished, Available, Finished)"},"metadata":{}}],"execution_count":56,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"1de4d5fa-459b-421b-92f9-566faa05aef4"},{"cell_type":"code","source":["# function for semantic models where the connection to the source system and database is an M code\n","def update_partition_source_expression(obj, targetStage):\n"," \n"," # iterate on tables \n"," for table in obj.get(\"model\", {}).get(\"tables\", []):\n","\n"," # iterate on partitions\n"," for partition in table.get(\"partitions\", []):\n","\n"," # extract the source and from the source the expression\n"," source = partition.get(\"source\", {})\n"," expression = source.get(\"expression\", [])\n"," \n"," # M expression are multi lines, iterate over lines and extract the pattern that matches \"Source = \"\n"," for i, line in enumerate(expression):\n"," \n"," # pattern\n"," vMatch = re.match(r'\\s*Source\\s*=\\s*(.*),', line)\n"," \n"," # if there is a match\n"," if vMatch:\n","\n"," # set the indentation\n"," vIndentation = \" \"\n","\n"," # get the connection\n"," # Power BI has hundreds of connectors and each has specifics exprections\n"," # extracting values based on each connector requires knowledge of the syntax\n"," # for simplicity, extract the full expression\n"," devConnectionId = vMatch.group(1).strip()\n"," # print(devConnectionId)\n","\n"," # get the mapping connection expression\n"," targetConnectionId = find_connection_id(devConnectionId=devConnectionId, targetStage=targetStage, validateConnection = 'no')\n"," print(f\"devConnectionId <{devConnectionId}>, targetConnectionId <{targetConnectionId}>\")\n","\n"," # set the expression\n"," expression[i] = f'{vIndentation}Source = {targetConnectionId},'\n"," break # stop the iteratin after the first match\n"," \n"," # return the \n"," return 
obj"],"outputs":[{"output_type":"display_data","data":{"application/vnd.livy.statement-meta+json":{"session_id":"220708dc-1330-4d2e-a379-bb193810ff23","spark_pool":null,"state":"finished","execution_finish_time":"2025-02-23T14:03:51.5463324Z","execution_start_time":"2025-02-23T14:03:51.2167428Z","parent_msg_id":"b535c213-9a52-44d1-8592-7581e18dc78e","queued_time":"2025-02-23T14:03:37.9216279Z","livy_statement_state":"available","statement_ids":[103],"session_start_time":null,"normalized_state":"finished","statement_id":103},"text/plain":"StatementMeta(, 220708dc-1330-4d2e-a379-bb193810ff23, 103, Finished, Available, Finished)"},"metadata":{}}],"execution_count":57,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"f7aeb147-3b1b-4ccb-80d4-a54bfe4cca0e"},{"cell_type":"markdown","source":["**Update direct lake model lakehouse connection**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"31638131-4528-4305-a2fd-cd29955d6b58"},{"cell_type":"code","source":["# get the list of semantic models in the workspace\n","df_target_semantic_models = fabric.list_datasets(workspace=vTargetWorkspaceName)\n","\n","# iterate over each dataset in the dataframe\n","for index, row in df_target_semantic_models.iterrows():\n","\n"," # get the semantic model name\n"," vSemanticModelName = row['Dataset Name']\n","\n","\n"," # update the connection of semantic models \n"," try:\n","\n"," # Check if the dataset is not the default semantic model\n"," if not labs.is_default_semantic_model(vSemanticModelName, vTargetWorkspaceId):\n"," \n"," print(f'updating semantic model <{vSemanticModelName}> connection in workspace <{vTargetWorkspaceName}>.')\n","\n"," # check if the semantic model has a direct lake lakehouse\n"," try:\n"," vDatasetDirectLakehouse=labs.directlake.get_direct_lake_lakehouse(\n"," dataset=vSemanticModelName, \n"," workspace= vTargetWorkspaceName,\n"," )\n"," vValidationDirectLake = \"valid\"\n"," \n"," except Exception as e:\n"," if \"SQL Endpoint not found\" in str(e):\n"," vValidationDirectLake = \"unvalid\"\n"," \n","\n"," # if the semantic lake has a direct lake lakehouse, update the connection and refresh it\n"," if vValidationDirectLake == \"valid\":\n","\n"," print(f'semantic model <{vSemanticModelName}> has a direct lake connection. using model.bim instead')\n"," \n"," # update the connection\n"," labs.directlake.update_direct_lake_model_connection(\n"," dataset=vSemanticModelName, \n"," workspace= vTargetWorkspaceName,\n"," source=labs.directlake.get_direct_lake_source(vSemanticModelName, workspace=vTargetWorkspaceName)[1], \n"," source_type=labs.directlake.get_direct_lake_source(vSemanticModelName, workspace=vTargetWorkspaceName)[0], \n"," source_workspace=vTargetWorkspaceName\n"," )\n"," \n"," # refresh the semantic mode (metadata only)\n"," labs.refresh_semantic_model(dataset=vSemanticModelName, workspace=vTargetWorkspaceName)\n","\n"," else:\n"," print(f'semantic model <{vSemanticModelName}> has no direct lake connection. 
using the json structure instead')\n","\n"," # get the current definition as in the source workspace\n"," semantic_model_json = labs.get_semantic_model_bim(dataset=vSemanticModelName, workspace=vSourceWorkspaceName)\n","\n"," # print(json.dumps(semantic_model_json, indent=4))\n","\n"," # replace M expressions using the connection mapping\n"," semantic_model_json_new = update_partition_source_expression(semantic_model_json, pTargetStage)\n","\n"," \n"," # update the semantic model from the new json structure\n"," labs.update_semantic_model_from_bim(\n"," dataset=vSemanticModelName, \n"," bim_file=semantic_model_json_new, \n"," workspace=vTargetWorkspaceName\n"," )\n"," \n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vSemanticModelName, 'update semantic model connection', datetime.now(), None, vMessage, ''] \n"," \n"," except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vSemanticModelName, 'update semantic model connection', datetime.now(), None, vMessage, str(e) ] \n"," if pDebugMode == \"yes\":\n"," print(str(e))\n"],"outputs":[{"output_type":"display_data","data":{"application/vnd.livy.statement-meta+json":{"session_id":"220708dc-1330-4d2e-a379-bb193810ff23","spark_pool":null,"state":"finished","execution_finish_time":"2025-02-23T14:05:51.602706Z","execution_start_time":"2025-02-23T14:05:43.1046401Z","parent_msg_id":"6334b920-0736-4863-8162-ed3813d9ee5a","queued_time":"2025-02-23T14:05:42.7362932Z","livy_statement_state":"available","statement_ids":[105],"session_start_time":null,"normalized_state":"finished","statement_id":105},"text/plain":"StatementMeta(, 220708dc-1330-4d2e-a379-bb193810ff23, 105, Finished, Available, Finished)"},"metadata":{}},{"output_type":"stream","name":"stdout","text":["updating semantic model connection in workspace .\nsemantic model has no direct lake connection. 
using the json structure instead\nNone\n\ndev connection id is \nSql.Database(\".\", \"AdventureWorksDW\") Sql.Database(\"localhost\", \"AdventureWorksDW\")\nNone\n\ndev connection id is \nSql.Database(\".\", \"AdventureWorksDW\") Sql.Database(\"localhost\", \"AdventureWorksDW\")\nNone\n\ndev connection id is \nSql.Database(\".\", \"AdventureWorksDW\") Sql.Database(\"localhost\", \"AdventureWorksDW\")\nNone\n\ndev connection id is \nSql.Database(\"rs-synapse-dev-ondemand.sql.azuresynapse.net\", \"misc\") Sql.Database(\"rs-synapse-dev-ondemand.sql.azuresynapse.net\", \"misc_new\")\nNone\n to semantic model <{vSemanticModelName}> in workspace <{vTargetWorkspaceName}>.')\n","\n"," labs.report.report_rebind(\n"," report=vReportName,\n"," dataset=vSemanticModelName, \n"," report_workspace=vTargetWorkspaceName, \n"," dataset_workspace=vTargetWorkspaceName\n"," )\n","\n","\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vReportName, 'update report connection', datetime.now(), None, vMessage, ''] \n"," except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, vSourceWorkspaceName, vTargetWorkspaceName, vReportName, 'update report connection', datetime.now(), None, vMessage, str(e) ] \n"," if pDebugMode == \"yes\":\n"," print(str(e))\n"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"7c850199-4380-4369-8dba-a6631136d052"},{"cell_type":"markdown","source":["**Logging**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"82f8bea3-217b-490a-8fb3-8cd3eacd6fba"},{"cell_type":"code","source":["try:\n"," # perform the conversion of columns\n"," dfLogging = dfLogging.astype({\n"," \"LoadId\": \"string\",\t\n"," \"NotebookId\": \"string\", \t\n"," \"NotebookName\": \"string\", \n"," \"WorkspaceId\": \"string\", \n"," \"SourceWorkspaceName\" : \"string\",\n"," \"TargetWorkspaceName\" : \"string\",\n"," \"Item\":\"string\",\n"," \"CellId\": \"string\", \n"," \"Timestamp\": \"datetime64[ns]\", \n"," \"ElapsedTime\": \"string\", \n"," \"Message\": \"string\", \n"," \"ErrorMessage\" : \"string\"\n"," })\n","\n"," # save panda dataframe to a spark dataframe \n"," sparkDF_Logging = spark.createDataFrame(dfLogging) \n","\n"," # save to the lakehouse\n"," sparkDF_Logging.write.mode(\"append\").format(\"delta\").option(\"mergeSchema\", \"true\").saveAsTable(\"staging.notebook_logging_cicd\")\n","\n","except Exception as e:\n"," vMessage = \"saving logs to the lakehouse failed\"\n"," if pDebugMode == \"yes\":\n"," print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"9d6d6709-0ac8-482a-a8ef-be2ddcb4ce2a"}],"metadata":{"language_info":{"name":"python"},"kernelspec":{"name":"synapse_pyspark","language":"Python","display_name":"Synapse 
PySpark"},"kernel_info":{"name":"synapse_pyspark"},"widgets":{},"microsoft":{"language":"python","language_group":"synapse_pyspark","ms_spell_check":{"ms_spell_check_language":"en"}},"nteract":{"version":"nteract-front-end@1.0.0"},"synapse_widget":{"version":"0.1","state":{}},"spark_compute":{"compute_id":"/trident/default","session_options":{"conf":{"spark.synapse.nbs.session.timeout":"1200000"}}},"dependencies":{"lakehouse":{"default_lakehouse":"d1073252-d492-4168-85ee-9dc395278b29","known_lakehouses":[{"id":"d1073252-d492-4168-85ee-9dc395278b29"}],"default_lakehouse_name":"cicdlakehouse","default_lakehouse_workspace_id":"4d4452c6-7faf-46b4-81fc-21f5bcc6bd42"},"environment":{}}},"nbformat":4,"nbformat_minor":5}
\ No newline at end of file
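
The semantic-model notebook above resolves stage-specific connections through the `pMappingConnections` parameter. The lookup itself is plain pandas: filter the mapping on the dev connection (`ConnectionStage1`), pick the column for the target stage, and fall back to the dev connection when no mapping exists so the model definition never breaks. A condensed, self-contained sketch of that lookup (the connection ids below are hypothetical):

```python
import json
import pandas as pd

# same shape as the pMappingConnections parameter above (values are placeholders)
mapping_json = '[{"ConnectionStage0":"dev-conn","ConnectionStage1":"dev-conn",' \
               '"ConnectionStage2":"test-conn","ConnectionStage3":"prod-conn"}]'
df_mapping = pd.DataFrame(json.loads(mapping_json))

def find_connection_id(dev_connection_id: str, target_stage: str) -> str:
    """Return the connection mapped to the target stage, or the dev connection as fallback."""
    rows = df_mapping[df_mapping["ConnectionStage1"] == dev_connection_id]
    if rows.empty:
        return dev_connection_id  # keep the source connection rather than break the definition
    target = rows.iloc[0].get(target_stage.replace("Stage", "ConnectionStage"), None)
    return dev_connection_id if pd.isna(target) or target == "" else target

print(find_connection_id("dev-conn", "Stage2"))  # -> test-conn
print(find_connection_id("unknown", "Stage3"))   # -> unknown (fallback to dev connection)
```
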
diff --git a/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_pre_deployment.ipynb b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_pre_deployment.ipynb
new file mode 100644
index 0000000..9634e01
--- /dev/null
+++ b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_pre_deployment.ipynb
@@ -0,0 +1 @@
+{"cells":[{"cell_type":"markdown","source":["**Libraries**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"c2ee3b3a-f369-4f98-8e88-a04e999870cf"},{"cell_type":"code","source":["import pandas as pd\n","from datetime import datetime, timedelta\n","import sempy.fabric as fabric"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"151a5189-710d-4f0a-ad72-5ad9c422c817"},{"cell_type":"markdown","source":["**Define a logging dataframe**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"4788f8ff-57e3-44dc-82d2-225b1f99ba38"},{"cell_type":"code","source":["dfLogging = pd.DataFrame(columns = ['LoadId','NotebookId', 'NotebookName', 'WorkspaceId', 'CellId', 'Timestamp', 'ElapsedTime', 'Message', 'ErrorMessage'])\n","vContext = mssparkutils.runtime.context\n","vNotebookId = vContext[\"currentNotebookId\"]\n","vLogNotebookName = vContext[\"currentNotebookName\"]\n","vWorkspaceId = vContext[\"currentWorkspaceId\"] # where the notebook is running, to not confuse with source and target workspaces"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"121fc594-15d2-47d2-b4d0-f9609f50791b"},{"cell_type":"markdown","source":["**Parameters --> convert to code for debugging the notebook. otherwise, keep commented as parameters are passed from DevOps pipelines**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"f8421d0a-f588-4f23-bd13-4696e6f2fcc5"},{"cell_type":"markdown","source":["pToken = \"\"\n","pSqlToken = \"\"\n","pSourceWorkspaceId = \"\"\n","pTargetWorkspaceId = \"\"\n","pDebugMode = \"yes\"\n","pFeatureBranch = \"\"\n","pOnelakeRoles = ''\n","pOnelakeRules = ''\n","pOnelakeEntraMembers = ''\n","pOnelakeItemMembers = ''"],"metadata":{"nteract":{"transient":{"deleting":false}}},"id":"22da0c79-bfea-43ea-9d57-583771c4f22f"},{"cell_type":"markdown","source":["**Access token**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"b3eda308-20f6-4ff1-87a3-5c71b228f3a1"},{"cell_type":"code","source":["vScope = \"https://analysis.windows.net/powerbi/api\"\n","\n","# get the access token \n","if pDebugMode == \"yes\":\n"," # in debug mode, use the token of the current user\n"," vAccessToken = mssparkutils.credentials.getToken(vScope)\n"," vSqlAccessToken = vAccessToken\n","else:\n"," # when the code is run from the pipelines, to token is generated in a previous step and passed as a parameter to the notebook\n"," vAccessToken = pToken \n"," vSqlAccessToken = pSqlToken"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"f3095581-b39f-40a9-8523-6800e182f4e9"},{"cell_type":"markdown","source":["**Check if the source workspace passed from DevOps equals the feature branch name**\n","- This is a specific handling when a PR is done from the feature 
branch"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"3cc40df1-4833-4b4a-8f16-a4c7399d0a0d"},{"cell_type":"code","source":["if pSourceWorkspaceId == pFeatureBranch:\n"," pSourceWorkspaceId = fabric.resolve_workspace_id(workspace=pFeatureBranch)"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"7a0a085c-fa84-4fe1-af36-b089366d95a9"},{"cell_type":"markdown","source":["**Define the DAG**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"6a8b6198-1624-4008-9f6d-a4133b4650ab"},{"cell_type":"code","source":["dagList = []\n","\n","# add to the DAG list nb_cicd_pre_update_lakehouses\n","dagList.append({\n"," \"name\": \"nb_cicd_pre_update_lakehouses\",\n"," \"path\": \"nb_cicd_pre_update_lakehouses\",\n"," \"timeoutPerCellInSeconds\": 300,\n"," \"args\": {\n"," \"useRootDefaultLakehouse\": True,\n"," \"pToken\": vAccessToken,\n"," \"pSqlToken\": vSqlAccessToken,\n"," \"pSourceWorkspaceId\":pSourceWorkspaceId,\n"," \"pTargetWorkspaceId\":pTargetWorkspaceId,\n"," \"pDebugMode\":pDebugMode,\n"," \"pOnelakeRoles\":pOnelakeRoles,\n"," \"pOnelakeRules\":pOnelakeRules,\n"," \"pOnelakeEntraMembers\":pOnelakeEntraMembers,\n"," \"pOnelakeItemMembers\":pOnelakeItemMembers,\n"," }\n"," })\n","\n","# add to the DAG list nb_cicd_pre_update_warehouses\n","dagList.append({\n"," \"name\": \"nb_cicd_pre_update_warehouses\",\n"," \"path\": \"nb_cicd_pre_update_warehouses\",\n"," \"timeoutPerCellInSeconds\": 300,\n"," \"args\": {\n"," \"useRootDefaultLakehouse\": True,\n"," \"pSqlToken\": vSqlAccessToken,\n"," \"pSourceWorkspaceId\":pSourceWorkspaceId,\n"," \"pTargetWorkspaceId\":pTargetWorkspaceId,\n"," \"pDebugMode\":pDebugMode\n"," },\n"," \"dependencies\": [\"nb_cicd_pre_update_lakehouses\"]\n"," })\n","\n","DAG = { \"activities\": dagList,\"concurrency\": 2, \"timeoutInSeconds\": 900 }\n"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"cb40613c-1d2b-4edf-8d1e-de25de315fec"},{"cell_type":"markdown","source":["**Run multiple**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"3345dfe0-84c7-4c33-9742-b726e28ef82e"},{"cell_type":"code","source":["try:\n"," mssparkutils.notebook.runMultiple(DAG, {\"displayDAGViaGraphviz\": True})\n","\n"," # logging\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'running the DAG', datetime.now(), None, vMessage, ''] \n","\n","except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'running the DAG', datetime.now(), None, vMessage, str(e)]\n"," if pDebugMode == \"yes\":\n"," 
print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"21fdc174-a457-4bf3-9b38-5fb7c8d8a68d"},{"cell_type":"markdown","source":["**Logging**"],"metadata":{"nteract":{"transient":{"deleting":false}}},"id":"89352377-24c6-4f57-9ba1-18532e7587ba"},{"cell_type":"code","source":["try:\n"," # perform the conversion of columns\n"," dfLogging = dfLogging.astype({\n"," \"LoadId\": \"string\",\t\n"," \"NotebookId\": \"string\", \t\n"," \"NotebookName\": \"string\", \n"," \"WorkspaceId\": \"string\", \n"," \"CellId\": \"string\", \n"," \"Timestamp\": \"datetime64[ns]\", \n"," \"ElapsedTime\": \"string\", \n"," \"Message\": \"string\", \n"," \"ErrorMessage\" : \"string\"\n"," })\n","\n"," # save panda dataframe to a spark dataframe \n"," sparkDF_Logging = spark.createDataFrame(dfLogging) \n","\n"," # save to the lakehouse\n"," sparkDF_Logging.write.mode(\"append\").format(\"delta\").option(\"mergeSchema\", \"true\").saveAsTable(\"staging.notebook_logging\")\n","\n","except Exception as e:\n"," vMessage = \"saving logs to the lakehouse failed\"\n"," if pDebugMode == \"yes\":\n"," print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"2daf311c-d307-4ddd-83eb-0d7853214cc8"},{"cell_type":"markdown","source":["**Exit notebook**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"4fcc01a8-abab-420b-b7ae-d72208eabcfd"},{"cell_type":"code","source":["mssparkutils.notebook.exit(f\"Notebook <{vLogNotebookName}> run successfully. Check logging table in CI/CD lakehouse for more details.\")"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"e97c7158-a8fa-4913-84a0-024246cf9275"}],"metadata":{"language_info":{"name":"python"},"kernelspec":{"name":"synapse_pyspark","language":"Python","display_name":"Synapse PySpark"},"kernel_info":{"name":"synapse_pyspark"},"widgets":{},"microsoft":{"language":"python","language_group":"synapse_pyspark","ms_spell_check":{"ms_spell_check_language":"en"}},"nteract":{"version":"nteract-front-end@1.0.0"},"synapse_widget":{"version":"0.1","state":{}},"spark_compute":{"compute_id":"/trident/default","session_options":{"conf":{"spark.synapse.nbs.session.timeout":"1200000"}}},"dependencies":{"lakehouse":{},"environment":{}}},"nbformat":4,"nbformat_minor":5}
\ No newline at end of file
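
The pre-deployment driver above is essentially a thin orchestrator: it builds a DAG of the lakehouse and warehouse pre-update notebooks and hands it to `mssparkutils.notebook.runMultiple`. A minimal sketch of that pattern, assuming it runs inside a Fabric Spark session where `mssparkutils` is available (parameter values are hypothetical):

```python
# Two pre-deployment notebooks run as a DAG; the warehouse step waits for the lakehouse step.
dag = {
    "activities": [
        {
            "name": "nb_cicd_pre_update_lakehouses",
            "path": "nb_cicd_pre_update_lakehouses",
            "timeoutPerCellInSeconds": 300,
            "args": {"pSourceWorkspaceId": "<dev-workspace-guid>",   # hypothetical ids
                     "pTargetWorkspaceId": "<test-workspace-guid>",
                     "pDebugMode": "no"},
        },
        {
            "name": "nb_cicd_pre_update_warehouses",
            "path": "nb_cicd_pre_update_warehouses",
            "timeoutPerCellInSeconds": 300,
            "args": {"pSourceWorkspaceId": "<dev-workspace-guid>",
                     "pTargetWorkspaceId": "<test-workspace-guid>",
                     "pDebugMode": "no"},
            "dependencies": ["nb_cicd_pre_update_lakehouses"],
        },
    ],
    "concurrency": 2,
    "timeoutInSeconds": 900,
}

# mssparkutils is provided by the Fabric runtime, as in the notebook above
mssparkutils.notebook.runMultiple(dag, {"displayDAGViaGraphviz": True})
```
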
diff --git a/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_pre_update_lakehouses.ipynb b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_pre_update_lakehouses.ipynb
new file mode 100644
index 0000000..a6071b2
--- /dev/null
+++ b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_pre_update_lakehouses.ipynb
@@ -0,0 +1 @@
+{"cells":[{"cell_type":"markdown","source":["**Helper notebook**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"c2ee3b3a-f369-4f98-8e88-a04e999870cf"},{"cell_type":"code","source":["%run nb_helper"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"151a5189-710d-4f0a-ad72-5ad9c422c817"},{"cell_type":"markdown","source":["**Define a logging dataframe**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"4788f8ff-57e3-44dc-82d2-225b1f99ba38"},{"cell_type":"code","source":["dfLogging = pd.DataFrame(columns = ['LoadId','NotebookId', 'NotebookName', 'WorkspaceId', 'CellId', 'Timestamp', 'ElapsedTime', 'Message', 'ErrorMessage'])\n","vContext = mssparkutils.runtime.context\n","vNotebookId = vContext[\"currentNotebookId\"]\n","vLogNotebookName = vContext[\"currentNotebookName\"]\n","vWorkspaceId = vContext[\"currentWorkspaceId\"] # where the notebook is running, to not confuse with source and target workspaces"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"121fc594-15d2-47d2-b4d0-f9609f50791b"},{"cell_type":"markdown","source":["**Parameters --> convert to code for debugging the notebook. otherwise, keep commented as parameters are passed from DevOps pipelines**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"f8421d0a-f588-4f23-bd13-4696e6f2fcc5"},{"cell_type":"code","source":["pToken = \"\"\n","pSqlToken = \"\"\n","pSourceWorkspaceId = \"\"\n","pTargetWorkspaceId = \"\"\n","pDebugMode = \"yes\"\n","pOnelakeRoles = ''\n","pOnelakeRules = ''\n","pOnelakeEntraMembers = ''\n","pOnelakeItemMembers = ''"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"22da0c79-bfea-43ea-9d57-583771c4f22f"},{"cell_type":"markdown","source":["**Resolve source and target workspace ids**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"11309f2b-0e81-45ac-8863-c8de477f4093"},{"cell_type":"code","source":["vSourceWorkspaceName = fabric.resolve_workspace_name(pSourceWorkspaceId)\n","vTargetWorkspaceName = fabric.resolve_workspace_name(pTargetWorkspaceId)"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"4637da24-8f50-480f-bf88-2c97a00a331b"},{"cell_type":"markdown","source":["**List source and target lakehouses**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"426304ae-789e-428b-a6fd-a3506ecfdbd9"},{"cell_type":"code","source":["df_source_lakehouses = labs.list_lakehouses(workspace=vSourceWorkspaceName)\n","df_target_lakehouses = labs.list_lakehouses(workspace=vTargetWorkspaceName)"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"982de745-074d-4d85-89b5-9a81be826d5b"},{"cell_type":"markdown","source":["**Verify that there is a least one 
lakehouse in the source**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"86be88a9-34b0-4f99-812e-6c322a6e9cc6"},{"cell_type":"code","source":["if df_source_lakehouses.empty:\n"," vMessage = f\"workspace have 0 lakehouse. pre-update is not required.\"\n","\n"," # Display an exit message\n"," display(Markdown(\"### ✅ Notebook execution stopped successfully!\"))\n","\n"," # Exit without error\n"," mssparkutils.notebook.exit(vMessage)"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"7323d7d7-a55d-4946-827f-8f743b58bde8"},{"cell_type":"markdown","source":["**Variables related to the logic**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"f35a0c6a-fc49-4862-b52f-4b6a887c8476"},{"cell_type":"code","source":["vApiVersion = \"v1\"\n","vShortcutConflictPolicy = \"Abort\"\n","if pOnelakeRoles == \"\":\n"," vCustomRoles = \"no\"\n","else:\n"," vCustomRoles = \"yes\""],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"},"tags":[]},"id":"668a4422-7fe8-4d2e-bdc0-4399f6908180"},{"cell_type":"markdown","source":["**Resolve source and target workspace ids**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"f89dd60e-920e-4f53-bf45-c461d9c6dac3"},{"cell_type":"code","source":["vSourceWorkspaceId = pSourceWorkspaceId\n","vTargetWorkspaceId = pTargetWorkspaceId"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"c3785a2b-a42b-4049-80d2-fa71a23a2a27"},{"cell_type":"markdown","source":["**Access token**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"b3eda308-20f6-4ff1-87a3-5c71b228f3a1"},{"cell_type":"code","source":["vScope = \"https://analysis.windows.net/powerbi/api\"\n","\n","# get the access token \n","if pDebugMode == \"yes\":\n"," # in debug mode, use the token of the current user\n"," vAccessToken = mssparkutils.credentials.getToken(vScope)\n"," vSqlAccessToken = vAccessToken\n","else:\n"," # when the code is run from DevOps, the token passed as a parameter\n"," vAccessToken = pToken \n"," vSqlAccessToken = pSqlToken"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"f3095581-b39f-40a9-8523-6800e182f4e9"},{"cell_type":"markdown","source":["**Base Url and Headers**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"fddaf6ce-2a97-4dba-a0b4-fca5205acbb8"},{"cell_type":"code","source":["vBaseUrl = f\"https://api.fabric.microsoft.com/{vApiVersion}/\"\n","vHeaders = {'Authorization': f'Bearer 
{vAccessToken}'}"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"7d9ffe60-f9f8-4a43-bc88-62b328dfdec0"},{"cell_type":"markdown","source":["**Functions**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"40817457-ef07-4ac4-90e1-4d68a93c9e4e"},{"cell_type":"code","source":["def input_for_full_deployment(df_source_lakehouse_columns):\n","\n"," # get the dataframe passed as parameter\n"," df = df_source_lakehouse_columns\n","\n"," # if the column has space in it, replace it with an underscore\n"," df[\"Column Name\"] = df[\"Column Name\"].str.replace(' ', '_')\n","\n"," # concat the column name and data tupe\n"," df[\"ColumnNameDataType\"] = df[\"Column Name\"] + \" \" + df[\"Data Type\"]\n","\n"," # group the columns\n"," df_grouped = df.groupby([\"WorkspaceTargetName\", \"Lakehouse Name\", \"Table Name\"])['ColumnNameDataType'].agg(','.join).reset_index()\n","\n"," # generate the sql statement\n"," df_grouped[\"SqlStatement\"] = \"CREATE TABLE \" + df_grouped[\"Lakehouse Name\"] + \".\" + df_grouped[\"Table Name\"] + \"(\" + df_grouped[\"ColumnNameDataType\"] + \")\"\n","\n"," # return the dataframe\n"," return df_grouped"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"a61f3026-9518-44c2-b564-8dfd6f2f900b"},{"cell_type":"code","source":["def input_for_incremental_deployment(df_source_lakehouse_columns_incremental, incremental_type):\n","\n","\n"," # get the dataframe passed as parameter\n"," df = df_source_lakehouse_columns_incremental\n","\n"," # if the column has space in it, replace it with an underscore\n"," df[\"Column Name\"] = df[\"Column Name\"].str.replace(' ', '_')\n","\n"," # concat the column name and data tupe\n"," df[\"ColumnNameDataType\"] = df[\"Column Name\"] + \" \" + df[\"Data Type\"]\n","\n","\n"," if incremental_type == \"alter table add column\":\n","\n"," # group the columns\n"," df_grouped = df.groupby([\"WorkspaceTargetName\", \"Lakehouse Name\", \"Table Name\"])['ColumnNameDataType'].agg(','.join).reset_index()\n","\n"," # generate the sql statement for adding columns\n"," df_grouped[\"SqlStatement\"] = \"ALTER TABLE \" + df_grouped[\"Lakehouse Name\"] + \".\" + df_grouped[\"Table Name\"] + \" ADD COLUMNS(\" + df_grouped[\"ColumnNameDataType\"] + \")\"\n","\n"," elif incremental_type == \"alter table drop column\":\n","\n"," # ALTER TABLE ALTER COLUMN does not work as of 02.2025\n"," # add this statement before droping the columns: ALTER TABLE data_types SET TBLPROPERTIES ('delta.columnMapping.mode' = 'name', 'delta.minReaderVersion' = '2','delta.minWriterVersion' = '5') \n","\n"," df_0 = df.groupby([\"WorkspaceTargetName\", \"Lakehouse Name\", \"Table Name\"])['Column Name'].agg(','.join).reset_index()\n"," df_0['SqlStatement'] = \"ALTER TABLE \" + df_0[\"Lakehouse Name\"] + \".\" + df_0[\"Table Name\"] + \" SET TBLPROPERTIES ('delta.columnMapping.mode'='name', 'delta.minReaderVersion'='2','delta.minWriterVersion'='5');\"\n"," df_0.drop(columns='Column Name', inplace=True)\n","\n"," df_1 = df.groupby([\"WorkspaceTargetName\", \"Lakehouse Name\", \"Table Name\"])['Column Name'].agg(','.join).reset_index()\n"," df_1[\"SqlStatement\"] = 
\"ALTER TABLE \" + df_1[\"Lakehouse Name\"] + \".\" + df_1[\"Table Name\"] + \" DROP COLUMNS(\" + df_1[\"Column Name\"] + \")\"\n","\n","\n"," df_union = pd.concat([df_0, df_1], ignore_index=True)\n"," df_grouped = df_union.groupby([\"WorkspaceTargetName\", \"Lakehouse Name\", \"Table Name\"])['SqlStatement'].apply(lambda x: '\\n'.join(x)).reset_index()\n"," \n","\n"," else:\n"," # ALTER TABLE ALTER COLUMN does not work as of 02.2025\n"," # add this statement before droping the columns: ALTER TABLE data_types SET TBLPROPERTIES ('delta.columnMapping.mode' = 'name', 'delta.minReaderVersion' = '2','delta.minWriterVersion' = '5')\n"," # the alternative is the following logic (example with 2 columns changing data types):\n"," # ALTER TABLE data_types SET TBLPROPERTIES ('delta.columnMapping.mode' = 'name', 'delta.minReaderVersion' = '2','delta.minWriterVersion' = '5')\n"," # ALTER TABLE data_types ADD COLUMNS (C_New INT, D_New INT)\n"," # UPDATE data_types SET C_New = C, D_New = C\n"," # ALTER TABLE data_types DROP COLUMNS (C,D)\n"," # ALTER TABLE data_types RENAME COLUMN C_New TO C;\n"," # ALTER TABLE data_types RENAME COLUMN D_New TO D;\n","\n"," # add the required columns for the logic\n"," df['InputForAddingColumns'] = df['Column Name'] + '_New ' + df['Data Type']\n"," df['InputForUpdatingColumns'] = df['Column Name'] + '_New =' + df['Column Name']\n"," df['InputForRenamingColumns'] = df['Column Name'] + '_New TO ' + df['Column Name']\n","\n"," df_0 = df.groupby([\"WorkspaceTargetName\", \"Lakehouse Name\", \"Table Name\"])['Column Name'].agg(','.join).reset_index()\n"," df_0['SqlStatement'] = \"ALTER TABLE \" + df_0[\"Lakehouse Name\"] + \".\" + df_0[\"Table Name\"] + \" SET TBLPROPERTIES ('delta.columnMapping.mode'='name', 'delta.minReaderVersion'='2','delta.minWriterVersion'='5');\"\n"," df_0.drop(columns='Column Name', inplace=True)\n","\n"," df_1 = df.groupby([\"WorkspaceTargetName\", \"Lakehouse Name\", \"Table Name\"])['InputForAddingColumns'].agg(','.join).reset_index()\n"," df_1['SqlStatement'] = 'ALTER TABLE ' + df_1[\"Lakehouse Name\"] + '.' + df_1[\"Table Name\"] + ' ADD COLUMNS (' + df_1['InputForAddingColumns'] +');'\n"," df_1.drop(columns='InputForAddingColumns', inplace=True)\n","\n"," df_2 = df.groupby([\"WorkspaceTargetName\", \"Lakehouse Name\", \"Table Name\"])['InputForUpdatingColumns'].agg(','.join).reset_index()\n"," df_2['SqlStatement'] = 'UPDATE ' + df_2[\"Lakehouse Name\"] + '.' + df_2[\"Table Name\"] + ' SET ' + df_2['InputForUpdatingColumns'] + ';'\n"," df_2.drop(columns='InputForUpdatingColumns', inplace=True)\n","\n"," df_3 = df.groupby([\"WorkspaceTargetName\", \"Lakehouse Name\", \"Table Name\"])['Column Name'].agg(','.join).reset_index()\n"," df_3['SqlStatement'] = 'ALTER TABLE ' + df_2[\"Lakehouse Name\"] + '.' 
+ df_2[\"Table Name\"] + ' DROP COLUMNS (' + df_3['Column Name'] +');'\n"," df_3.drop(columns='Column Name', inplace=True)\n","\n"," def generate_sql(group):\n"," return \";\\n\".join([f\"ALTER TABLE {row['Lakehouse Name']}.{row['Table Name']} RENAME COLUMN {row['InputForRenamingColumns']}\" for _, row in group.iterrows()]) + \";\"\n","\n"," df_4 = df.groupby([\"WorkspaceTargetName\", \"Lakehouse Name\", \"Table Name\"]).apply(generate_sql).reset_index(name='SqlStatement')\n","\n"," df_union = pd.concat([df_0, df_1, df_2, df_3, df_4], ignore_index=True)\n"," df_grouped = df_union.groupby([\"WorkspaceTargetName\", \"Lakehouse Name\", \"Table Name\"])['SqlStatement'].apply(lambda x: '\\n'.join(x)).reset_index()\n","\n","\n"," # return the dataframe\n"," return df_grouped"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"8efca722-b5f5-4f5a-a2ee-2e0dd517afae"},{"cell_type":"code","source":["def shortcut_body(target_type, name, path, target_location, target_subpath, target_connection_id):\n","\n"," vTargetType = \"\"\n"," if target_type == \"AdlsGen2\":\n"," vTargetType = \"adlsGen2\"\n"," elif target_type == \"AmazonS3\":\n"," vTargetType = \"amazonS3\"\n"," elif target_type == \"GoogleCloudStorage\":\n"," vTargetType = \"googleCloudStorage\"\n"," elif target_type == \"S3Compatible\":\n"," vTargetType = \"s3Compatible\"\n"," else:\n"," vTargetType = \"\"\n","\n"," shortcut_body = {\n"," \"name\": name,\n"," \"path\": path\n"," }\n","\n"," shortcut_specific_template_temp = {\n"," \"target\": {\n"," f\"{vTargetType}\": {\n"," \"location\": \"{location_}\",\n"," \"subpath\": \"{subpath_}\",\n"," \"connectionId\": \"{connectionId_}\"\n"," }\n"," }\n"," }\n","\n"," inputs = {\n"," \"location_\": target_location,\n"," \"subpath_\": target_subpath,\n"," \"connectionId_\": target_connection_id\n"," }\n","\n"," # replace the placeholders\n"," shortcut_specific_template = replace_placeholders_in_json(shortcut_specific_template_temp, inputs)\n"," \n"," # inject the specific template\n"," shortcut_body.update(shortcut_specific_template)\n"," # print(json.dumps(shortcut_body, indent=4))\n","\n"," return shortcut_body"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"e8531147-ecf7-4e0f-8d4c-7888c8b709f3"},{"cell_type":"code","source":["def onelake_shortcut_body(target_type, name, path, target_onelake_workspace_id, target_one_lake_item_id, target_onelake_path):\n","\n"," shortcut_body = {\n"," \"name\": name,\n"," \"path\": path\n"," }\n","\n"," shortcut_specific_template_temp = {\n"," \"target\": {\n"," \"oneLake\": {\n"," \"workspaceId\": \"{workspaceId_}\",\n"," \"itemId\": \"{itemId_}\",\n"," \"path\": \"{path_}\"\n"," }\n"," }\n"," }\n","\n"," inputs = {\n"," \"workspaceId_\": target_onelake_workspace_id,\n"," \"itemId_\": target_one_lake_item_id,\n"," \"path_\": target_onelake_path\n"," }\n","\n"," # replace the placeholders\n"," shortcut_specific_template = replace_placeholders_in_json(shortcut_specific_template_temp, inputs)\n"," \n"," # inject the specific template\n"," shortcut_body.update(shortcut_specific_template)\n"," # print(json.dumps(shortcut_body, indent=4))\n","\n"," return 
shortcut_body"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"080a9c11-1eff-43db-b2c2-d06a7fb4fdcc"},{"cell_type":"code","source":["def create_lakehouse(lakehouse_name, url, headers, operation, workspace_target_id, item_type, sleep_in_seconds, debug_mode):\n"," \n"," # create the json body\n"," body = {\n"," \"displayName\": f\"{lakehouse_name}\",\n"," \"type\": \"Lakehouse\",\n"," \"description\": f\"Lakehouse {lakehouse_name} created by fabric deployment notebook\"\n"," }\n","\n"," # create the lakehouse\n"," create_or_update_fabric_item(url, headers, body, 'post', operation, workspace_target_id, lakehouse_name, item_type, sleep_in_seconds, debug_mode)"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"30c2d4d0-cc7c-4fd6-ad49-1db625af298b"},{"cell_type":"code","source":["def create_notebook(notebook_name, url, headers, operation, workspace_target_id, item_type, sleep_in_seconds, debug_mode):\n","\n"," # create the json body\n"," body = {\n"," \"displayName\": f\"{notebook_name}\",\n"," \"type\": \"Notebook\",\n"," \"description\": f\"Notebook {notebook_name} created by fabric deployment notebook\"\n"," }\n","\n"," # create the notebook\n"," create_or_update_fabric_item(url, headers, body, 'post', operation, workspace_target_id, notebook_name, item_type, sleep_in_seconds, debug_mode)"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"82f2d13b-1f0a-4aba-af6f-5a140da2b9f0"},{"cell_type":"code","source":["# create the alchemy engine\n","def create_sqlalchemy_engine(connection_string : str):\n"," token = vSqlAccessToken\n"," SQL_COPT_SS_ACCESS_TOKEN = 1256\n","\n"," # the following code is required to structure the token for pyodbc.connect\n"," exptoken = b'';\n"," for i in bytes(token, \"UTF-8\"):\n"," exptoken += bytes({i});\n"," exptoken += bytes(1);\n"," tokenstruct = struct.pack(\"=i\", len(exptoken)) + exptoken;\n","\n"," return sqlalchemy.create_engine(\"mssql+pyodbc://\", creator=lambda: pyodbc.connect(connection_string, attrs_before = { SQL_COPT_SS_ACCESS_TOKEN:bytearray(tokenstruct) }))"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"8e3fa6a7-2e43-4acd-9512-175a8624ff74"},{"cell_type":"markdown","source":["**Identify source lakehouses**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"861bdf0d-c691-49a6-a021-4f96b7bcbe23"},{"cell_type":"code","source":["try:\n","\n"," # define the source lakehouse column dataframe\n"," df_source_lakehouse_columns = pd.DataFrame()\n"," df_source_lakehouse_tables = pd.DataFrame()\n","\n"," # iterate over the source lakehouses \n"," for lakehouse in df_source_lakehouses['Lakehouse Name']:\n","\n"," # 1. 
get the tables \n"," df = labs.lakehouse.get_lakehouse_tables(lakehouse = lakehouse, workspace = vSourceWorkspaceName)\n","\n"," # append the rows to the dataframe\n"," df_source_lakehouse_tables = pd.concat([df_source_lakehouse_tables, df], ignore_index=True)\n","\n","\n"," # 2. get the tables and columns\n"," df = labs.lakehouse.get_lakehouse_columns(lakehouse = lakehouse, workspace = vSourceWorkspaceName)\n","\n"," # append the rows to the dataframe\n"," df_source_lakehouse_columns = pd.concat([df_source_lakehouse_columns, df], ignore_index=True)\n","\n"," # add the target workspace to the lakehouse columns dataframe\n"," df_source_lakehouse_columns[\"WorkspaceTargetName\"] = vTargetWorkspaceName\n","\n"," # keep the required columns in the lakehouse tables datafreme\n"," columns_to_drop = [\"Workspace Name\", \"Format\", \"Type\", \"Location\"]\n"," df_source_lakehouse_tables = df_source_lakehouse_tables.drop(columns=columns_to_drop)\n","\n"," # logging\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'identify source lakehouses', datetime.now(), None, vMessage, ''] \n","except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'identify source lakehouses', datetime.now(), None, vMessage, str(e) ] \n"," if pDebugMode == \"yes\":\n"," print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"a5a89fdf-12ea-4cd0-9e5a-a35cea866569"},{"cell_type":"markdown","source":["**Create target lakehouses and notebooks --> at this stage, these would be empty shells**\n"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"a4509d76-7fae-46de-a136-85f96a6e4d5e"},{"cell_type":"code","source":["try:\n","\n"," # sleep time before checking the operation's status in post requests\n"," vSleepInSeconds = 30\n","\n"," # perform the deployment\n"," for lakehouse in df_source_lakehouses['Lakehouse Name']:\n","\n"," # set the lakehouse name, and the notebooks used to define the lakehouse content\n"," vLakehouseName = lakehouse\n"," vTargetNotebookName = \"nb_\" + vLakehouseName + \"_definition\"\n"," vTargetSqlNotebookName = \"nb_\" + vLakehouseName + \"_sql_definition\"\n","\n","\n"," # filter the target lakehouse dataframe on the current lakehouse\n"," df_target_lakehouses_in_scope = df_target_lakehouses[df_target_lakehouses['Lakehouse Name']==vLakehouseName] \n","\n"," # set the url\n"," vUrl = vBaseUrl + f\"workspaces/{vTargetWorkspaceId}/items\" \n","\n"," # if the target lakehouse dataframe is empty --> create the lakehouse\n"," if df_target_lakehouses_in_scope.empty:\n","\n","\n"," # create the lakehouse\n"," create_lakehouse(vLakehouseName, vUrl, vHeaders, \"creating\", vTargetWorkspaceId, \"lakehouse\", vSleepInSeconds, pDebugMode)\n","\n"," # create the correspondant notebook\n"," create_notebook(vTargetNotebookName, vUrl, vHeaders, \"creating\", vTargetWorkspaceId, \"notebook\", vSleepInSeconds, pDebugMode)\n","\n"," # create the correspondant sql notebook\n"," create_notebook(vTargetSqlNotebookName, vUrl, vHeaders, \"creating\", vTargetWorkspaceId, \"notebook\", vSleepInSeconds, pDebugMode)\n","\n"," else:\n"," # create the correspondant notebook\n"," 
create_notebook(vTargetNotebookName, vUrl, vHeaders, \"creating\", vTargetWorkspaceId, \"notebook\", vSleepInSeconds, pDebugMode)\n","\n"," # create the correspondant sql notebook\n"," create_notebook(vTargetSqlNotebookName, vUrl, vHeaders, \"creating\", vTargetWorkspaceId, \"notebook\", vSleepInSeconds, pDebugMode) \n","\n"," # logging\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'create target lakehouses and notebooks', datetime.now(), None, vMessage, ''] \n","\n","except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'create target lakehouses and notebooks', datetime.now(), None, vMessage, str(e) ] \n"," if pDebugMode == \"yes\":\n"," print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"07b2f563-7fc4-4d79-b4e6-c40002ce7441"},{"cell_type":"markdown","source":["**Source and target sql analytics endpoints**"],"metadata":{"nteract":{"transient":{"deleting":false}}},"id":"8d7f61c9-59bb-49e2-955f-66de779b1d9d"},{"cell_type":"code","source":["df_target_lakehouses = labs.list_lakehouses(workspace=vTargetWorkspaceName)\n","vSourceSqlEndpoint = df_source_lakehouses.loc[0, 'SQL Endpoint Connection String']\n","vTargetSqlEndpoint = df_target_lakehouses.loc[0, 'SQL Endpoint Connection String']"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"26b83e75-53fd-4d4a-a782-39b19276966d"},{"cell_type":"markdown","source":["**Identify source shortcuts, folders, access roles and sql objects**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"0b98147e-7e99-400d-8510-c10bc7d2dba2"},{"cell_type":"code","source":["try:\n","\n"," # define the dataframes\n"," df_shortcuts = pd.DataFrame()\n"," df_folders = pd.DataFrame()\n"," df_onelake_roles = pd.DataFrame()\n"," df_sql_objects = pd.DataFrame()\n"," df_routines = pd.DataFrame()\n","\n"," # get a token for storage\n"," vOnelakeHeaders = {\"authorization\": f\"bearer {mssparkutils.credentials.getToken('storage')}\"}\n","\n"," # iterate over the lakehouses\n"," for index, row in df_source_lakehouses.iterrows():\n","\n"," # set the lakehouse name and id\n"," vLakehouseName = row['Lakehouse Name']\n"," vLakehouseSourceId = row['Lakehouse ID']\n"," vLakehouseTargetId = labs.resolve_lakehouse_id(lakehouse=vLakehouseName, workspace=vTargetWorkspaceName)\n","\n","\n"," # 1. 
extract shortcuts \n"," vExtractionType = \"shortcuts\"\n"," vShortcutUrl = f\"workspaces/{vSourceWorkspaceId}/items/{vLakehouseSourceId}/shortcuts\"\n"," vUrl = vBaseUrl + vShortcutUrl\n","\n"," print(f\"extracting shortcuts for lakehouse {vLakehouseName}\")\n","\n"," # create the api global dataframe for shortcuts\n"," api_call_global_dataframe = pd.DataFrame()\n","\n"," try:\n"," \n"," # make the api call\n"," api_call_main(vUrl, vHeaders, pDebugMode, vExtractionType)\n"," api_call_global_dataframe['WorkspaceSourceId'] = vSourceWorkspaceId\n"," api_call_global_dataframe['WorkspaceTargetId'] = vTargetWorkspaceId \n"," api_call_global_dataframe['LakehouseTargetName'] = vLakehouseName\n"," api_call_global_dataframe['LakehouseTargetId'] = vLakehouseTargetId\n","\n","\n"," # concat to the correspondant dataframe\n"," df_shortcuts = pd.concat([df_shortcuts, api_call_global_dataframe], ignore_index=True)\n","\n"," # logging\n"," vMessage = f\"extracting shortcuts for lakehouse {vLakehouseName} succeeded\"\n"," print(vMessage)\n","\n"," except Exception as e:\n"," vMessage = f\"extracting shortcuts for lakehouse {vLakehouseName} failed\"\n"," print(vMessage)\n"," print(str(e))\n","\n","\n"," # 2. extract folders\n"," vExtractionType = \"file_system\"\n"," vUrl = f'https://onelake.dfs.fabric.microsoft.com/{vSourceWorkspaceName}/{vLakehouseName}.lakehouse/Files?recursive=True&resource=filesystem'\n"," print(f\"extracting folders for lakehouse {vLakehouseName}\")\n","\n"," # create the api global dataframe\n"," api_call_global_dataframe = pd.DataFrame()\n","\n"," try:\n"," \n"," # make the api call\n"," api_call_main(vUrl, vOnelakeHeaders, pDebugMode, vExtractionType)\n","\n"," api_call_global_dataframe['FolderName'] = api_call_global_dataframe['name'].replace(vLakehouseSourceId + \"/\", '', regex=True)\n"," api_call_global_dataframe_new = api_call_global_dataframe[['FolderName','isDirectory']]\n","\n"," api_call_global_dataframe_new['WorkspaceSourceId'] = vSourceWorkspaceId\n"," api_call_global_dataframe_new['WorkspaceTargetId'] = vTargetWorkspaceId \n"," api_call_global_dataframe_new['LakehouseTargetName'] = vLakehouseName\n"," api_call_global_dataframe_new['LakehouseTargetId'] = vLakehouseTargetId\n","\n"," df_folders_temp = api_call_global_dataframe_new[api_call_global_dataframe_new['isDirectory'] == 'true']\n","\n"," # concat to the correspondant dataframe\n"," df_folders = pd.concat([df_folders, df_folders_temp], ignore_index=True)\n","\n"," # logging\n"," vMessage = f\"extracting folders of lakehouse {vLakehouseName} succeeded\"\n"," print(vMessage)\n","\n"," except Exception as e:\n"," vMessage = f\"extracting files and folders of lakehouse {vLakehouseName} failed\"\n"," print(vMessage)\n"," print(str(e))\n","\n"," # 3. 
extract onelake access\n"," # if no custom roles are provided for the target lakehouses, use roles defined in the source lakehouses\n"," if vCustomRoles == \"no\":\n","\n"," vExtractionType = \"onelake_access\"\n"," vShortcutUrl = f\"workspaces/{vSourceWorkspaceId}/items/{vLakehouseSourceId}/dataAccessRoles\"\n"," vUrl = vBaseUrl + vShortcutUrl\n"," print(f\"extracting onelake access for lakehouse {vLakehouseName}\") \n","\n"," # create the api global dataframe for shortcuts\n"," api_call_global_dataframe = pd.DataFrame()\n","\n"," try:\n"," \n"," # make the api call\n"," api_call_main(vUrl, vHeaders, pDebugMode, vExtractionType)\n","\n"," api_call_global_dataframe['WorkspaceSourceId'] = vSourceWorkspaceId\n"," api_call_global_dataframe['WorkspaceTargetId'] = vTargetWorkspaceId \n"," api_call_global_dataframe['lakehouse'] = vLakehouseName\n"," api_call_global_dataframe['LakehouseTargetId'] = vLakehouseTargetId\n","\n","\n"," # concat to the correspondant dataframe\n"," df_onelake_roles = pd.concat([df_onelake_roles, api_call_global_dataframe], ignore_index=True)\n","\n"," # prepare the rules, entra members and item members dataframes\n"," df_role_rules = flatten_nested_json_df(df_onelake_roles[['id', 'decisionRules']].explode('decisionRules').dropna())\n"," condition_1 = (df_role_rules[\"decisionRules.permission.attributeName\"] == \"Action\") & (df_role_rules[\"decisionRules.permission.attributeValueIncludedIn\"] != \"Read\")\n"," df_role_rules_1 = df_role_rules[~condition_1]\n"," condition_2 = (df_role_rules_1[\"decisionRules.permission.attributeName\"] == \"Path\") & (df_role_rules_1[\"decisionRules.permission.attributeValueIncludedIn\"] == \"Read\")\n"," df_role_rules_2 = df_role_rules_1[~condition_2]\n"," df_role_rules = df_role_rules_2\n"," df_entra_members = flatten_nested_json_df(df_onelake_roles[['id', 'members.microsoftEntraMembers']].explode('members.microsoftEntraMembers').dropna())\n"," df_item_members = flatten_nested_json_df(df_onelake_roles[['id', 'members.fabricItemMembers']].explode('members.fabricItemMembers').dropna()) \n","\n"," # logging\n"," vMessage = f\"extracting onelake access for lakehouse {vLakehouseName} succeeded\"\n"," print(vMessage)\n","\n"," except Exception as e:\n"," vMessage = f\"extracting onelake access for lakehouse {vLakehouseName} failed\"\n"," print(vMessage)\n"," print(str(e))\n"," else: # use the parameters provided for the roles\n","\n"," # onelake roles\n"," onelake_roles = json.loads(pOnelakeRoles)\n"," df_onelake_roles = pd.DataFrame(onelake_roles)\n","\n"," # onelake rules\n"," role_rules = json.loads(pOnelakeRules)\n"," df_role_rules = pd.DataFrame(role_rules)\n","\n"," # onelake entra members\n"," entra_members = json.loads(pOnelakeEntraMembers)\n"," df_entra_members = pd.DataFrame(entra_members)\n","\n"," # onelake item members\n"," item_members = json.loads(pOnelakeItemMembers)\n"," df_item_members = pd.DataFrame(item_members) \n","\n","\n"," # 4. 
extaction sql objects created in the sql endpoint\n"," print(f\"extracting routines and views for lakehouse {vLakehouseName}\")\n"," vSqlStatement = \"\"\"\n"," SELECT \n"," a.ROUTINE_CATALOG AS LakehouseName, \n"," a.ROUTINE_SCHEMA AS SchemaName, \n"," a.ROUTINE_NAME AS ObjectName, \n"," '' AS DropStatement,\n"," REPLACE(a.ROUTINE_DEFINITION, 'CREATE', 'CREATE OR ALTER') AS CreateStatement,\n"," 'Routines' AS ObjectType\n"," FROM \n"," INFORMATION_SCHEMA.ROUTINES a\n"," UNION\n"," SELECT \n"," TABLE_CATALOG, \n"," TABLE_SCHEMA, \n"," TABLE_NAME, \n"," '' AS DropStatement,\n"," REPLACE(VIEW_DEFINITION, 'CREATE', 'CREATE OR ALTER') AS CreateStatement, \n"," 'View' AS ObjectType\n"," FROM \n"," INFORMATION_SCHEMA.VIEWS\n"," WHERE \n"," TABLE_SCHEMA NOT IN ('sys','queryinsights')\n"," \"\"\"\n","\n"," spark_df_sql_objects = spark.read.option(Constants.WorkspaceId, vSourceWorkspaceId).option(Constants.DatabaseName, vLakehouseName).synapsesql(vSqlStatement)\n"," df_sql_objects_temp = spark_df_sql_objects.toPandas()\n"," df_sql_objects = pd.concat([df_sql_objects, df_sql_objects_temp], ignore_index=True)\n","\n"," # 5. extraction of security policies\n"," print(f\"extracting security for lakehouse {vLakehouseName}\")\n"," vSqlStatement = f\"\"\"\n"," SELECT \n"," '{vLakehouseName}' AS LakehouseName,\n"," schema_name AS SchemaName, \n"," policy_name AS ObjectName,\n"," 'DROP SECURITY POLICY IF EXISTS ' + policy_name_new AS DropStatement,\n"," CONCAT(\n"," create_statement,\n"," policy_name_new,\n"," filter_predicate,\n"," CASE is_enabled\n"," WHEN 0 THEN ' WITH (STATE = OFF)'\n"," ELSE ' WITH (STATE = ON)'\n"," END \n"," ) AS CreateStatement,\n"," 'Security Policy' AS ObjectType\n"," FROM \n"," (\n"," SELECT \n"," schema_name, policy_name, create_statement, policy_name_new, is_enabled, STRING_AGG(filter_predicate, ',') AS filter_predicate\n"," FROM \n"," (\n"," SELECT \n"," pol_schema.name AS schema_name,\n"," pol.name as policy_name,\n"," 'CREATE SECURITY POLICY ' AS create_statement,\n"," '[' + pol_schema.name + '].[' + pol.name + ']' AS policy_name_new,\n"," pol.is_enabled,\n"," ' ADD FILTER PREDICATE ' \n"," + RIGHT(LEFT(pre.predicate_definition, LEN(pre.predicate_definition)-1),LEN(pre.predicate_definition)-2)\n"," + ' ON [' + obj_schema.name + '].[' + obj.name + ']' \n"," AS filter_predicate\n","\n"," FROM \n"," sys.security_policies pol\n"," INNER JOIN sys.schemas pol_schema \n"," ON pol_schema.schema_id = pol.schema_id\n"," INNER JOIN sys.security_predicates pre\n"," ON pre.object_id = pol.object_id\n"," INNER JOIN sys.objects obj \n"," ON obj.object_id = pre.target_object_id\n"," INNER JOIN sys.schemas obj_schema \n"," ON obj_schema.schema_id = obj.schema_id\n"," ) a\n"," GROUP BY \n"," schema_name, policy_name, create_statement, policy_name_new, is_enabled\n"," ) b\n"," \"\"\"\n","\n"," spark_df_sql_objects = spark.read.option(Constants.WorkspaceId, vSourceWorkspaceId).option(Constants.DatabaseName, vLakehouseName).synapsesql(vSqlStatement)\n"," df_sql_objects_temp = spark_df_sql_objects.toPandas()\n"," df_sql_objects = pd.concat([df_sql_objects, df_sql_objects_temp], ignore_index=True)\n","\n","\n"," # format shortcuts dataframe\n"," # get the column names\n"," columns = df_shortcuts.columns.values.tolist()\n","\n"," # iterate over the column name and rename after capitalizin the first letter\n"," for columnName in columns:\n","\n"," # split the column name, take the last item and upper case the first letter\n"," processed_column = process_column_name(columnName, '.')\n","\n"," # 
replace the column name in the dataframe\n"," df_shortcuts.rename(columns={columnName: processed_column}, inplace=True)\n","\n"," # logging\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'identify shortcuts, folders and access roles', datetime.now(), None, vMessage, ''] \n","\n","except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'identify shortcuts, folders and access roles', datetime.now(), None, vMessage, str(e) ] \n"," if pDebugMode == \"yes\":\n"," print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"f5e765a7-7893-4910-8a4e-4bba1ff3f3c0"},{"cell_type":"markdown","source":["**Exclude shortcuts from tables**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"30d7b884-a8e4-45ac-a5c1-84c525486db0"},{"cell_type":"code","source":["try:\n"," shortcuts_columns_subset = ['LakehouseTargetName', 'Name']\n"," df_shortcuts_tables = df_shortcuts[df_shortcuts['Path']=='/Tables'][shortcuts_columns_subset]\n"," df_shortcuts_tables.rename(columns={\"Name\":\"Table Name\",\"LakehouseTargetName\":\"Lakehouse Name\"},inplace=True)\n"," df_tables_exclude_shortcut = df_source_lakehouse_tables.merge(df_shortcuts_tables, on=df_source_lakehouse_tables.columns.tolist(), how='left', indicator=True)\n"," df_tables_exclude_shortcut = df_tables_exclude_shortcut[df_tables_exclude_shortcut['_merge'] == 'left_only'].drop(columns=['_merge'])\n"," df_source_lakehouse_columns = pd.merge(df_source_lakehouse_columns, df_tables_exclude_shortcut, on=[\"Lakehouse Name\", \"Table Name\"], how=\"inner\")\n"," # logging\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'exclude shortcuts from tables', datetime.now(), None, vMessage, ''] \n","\n","except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'exclude shortcuts from tables', datetime.now(), None, vMessage, str(e) ] \n"," if pDebugMode == \"yes\":\n"," print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"a9ec33b2-8cef-4cdf-b858-c219984deb16"},{"cell_type":"markdown","source":["**Exclude shortcuts from folders**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"aedfcece-cf90-4dce-8202-51ff4e28db79"},{"cell_type":"code","source":["try:\n"," shortcuts_columns_subset = ['WorkspaceSourceId','WorkspaceTargetId', 'LakehouseTargetName', 'LakehouseTargetId', 'Name', 'Path']\n"," df_shortcuts_folders = df_shortcuts[df_shortcuts['Path']=='/Files'][shortcuts_columns_subset]\n"," df_shortcuts_folders['FolderName'] = df_shortcuts_folders['Path'].replace(\"/\", '', regex=True) + '/' + df_shortcuts_folders['Name']\n"," df_shortcuts_folders['isDirectory'] = 'true'\n"," df_folders_exclude_shortcut = df_folders.merge(df_shortcuts_folders, on=df_folders.columns.tolist(), how='left', indicator=True)\n"," 
df_folders_exclude_shortcut = df_folders_exclude_shortcut[df_folders_exclude_shortcut['_merge'] == 'left_only'].drop(columns=['_merge'])\n"," # logging\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'exclude shortcuts from folders', datetime.now(), None, vMessage, ''] \n","\n","except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'exclude shortcuts from folders', datetime.now(), None, vMessage, str(e)]\n"," if pDebugMode == \"yes\":\n"," print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"f5dc886e-9c7c-424f-a9c2-4e9990e665b8"},{"cell_type":"code","source":["# define the dataframe containing the sql definitions for the deployment\n","df_input_deployment = pd.DataFrame()\n","\n","try:\n","\n"," # recreate the list of lakehouses \n"," df_target_lakehouses = labs.list_lakehouses(workspace = vTargetWorkspaceName)\n"," \n","\n"," # iterate over the target lakehouses \n"," for lakehouse in df_target_lakehouses['Lakehouse Name']:\n","\n"," # get the source lakehouse tables and columns \n"," df_source_lakehouse_columns_in_scope = df_source_lakehouse_columns[df_source_lakehouse_columns['Lakehouse Name']==lakehouse]\n","\n"," # get the lakehouse tables and columns\n"," # this call brings also table shortcut, they need to be excluded. df_shortcut_tables can be used to this effect.\n"," df_target_lakehouses_columns_temp = labs.lakehouse.get_lakehouse_columns(lakehouse = lakehouse, workspace = vTargetWorkspaceName)\n"," df_target_lakehouses_columns = df_target_lakehouses_columns_temp.merge(df_shortcuts_tables, on=['Lakehouse Name', 'Table Name'], how='left', indicator=True)\n"," df_target_lakehouses_columns = df_target_lakehouses_columns[df_target_lakehouses_columns['_merge'] == 'left_only'].drop(columns=['_merge'])\n","\n"," if df_target_lakehouses_columns.empty:\n","\n"," vMessage = f\"target lakehouse <{lakehouse} is empty. retrieve the full list of tables from the source lakehouse as an input for table definitions.\"\n"," print(vMessage)\n","\n"," # if the target lakehouse is empty, the source lakehouse definition is used for the target lakehouse\n"," df_target_lakehouses_columns = df_source_lakehouse_columns_in_scope\n"," # df_target_lakehouses_columns = pd.concat([df_target_lakehouses_columns, df_target_lakehouses_columns_temp], ignore_index=True)\n","\n"," # format the input for deployment\n"," df_input_deployment_temp = input_for_full_deployment(df_target_lakehouses_columns)\n","\n"," # concat to the dataframe\n"," df_input_deployment = pd.concat([df_input_deployment, df_input_deployment_temp], ignore_index=True)\n","\n"," else:\n"," \n"," vMessage = f\"target lakehouse <{lakehouse} is not empty. 
retrieve an increment list of changes from the source lakehouse as an input for table definitions.\"\n"," print(vMessage)\n","\n"," # align the structure of df to df_source_lakehouse_columns\n"," df_target_lakehouses_columns[\"WorkspaceTargetName\"] = vTargetWorkspaceName\n","\n"," # # replace the source workspace name by source workspace and the add the target workspace \n"," # # this will alow excluding rows in source not in target workspace\n"," # df_target_lakehouses_columns[\"WorkspaceName\"] = vSourceWorkspaceName\n"," # df_target_lakehouses_columns[\"WorkspaceTargetName\"] = vTargetWorkspaceName\n","\n"," # identify the incremental\n","\n"," # 1. tables in source but not in target lakehouse\n"," df_source_lakehouse_tables = df_source_lakehouse_columns_in_scope[['Lakehouse Name', 'Table Name']].drop_duplicates()\n"," df_target_lakehouse_tables = df_target_lakehouses_columns[['Lakehouse Name', 'Table Name']].drop_duplicates()\n"," df_tables_in_source_only_temp = pd.merge(df_source_lakehouse_tables, df_target_lakehouse_tables, on=['Lakehouse Name', 'Table Name'], how='left', indicator=True)\n"," df_tables_in_source_only_temp = df_tables_in_source_only_temp[df_tables_in_source_only_temp['_merge'] == 'left_only'].drop(columns=['_merge'])\n"," df_tables_in_source_only = pd.merge(df_source_lakehouse_columns_in_scope, df_tables_in_source_only_temp, on=['Lakehouse Name', 'Table Name'], how='inner') \n","\n"," \n"," # 2. tables in both source and target lakehouses but with a structural change (added columns, deleted columns, changed Data Type)\n"," \n"," # 2.1 find common tables\n"," df_tables_in_common = pd.merge(df_source_lakehouse_tables, df_target_lakehouse_tables, on=['Lakehouse Name', 'Table Name'], how='inner', indicator=True)\n","\n"," # 2.2 tables in common, source columns\n"," df_tables_in_common_source_columns = pd.merge(df_source_lakehouse_columns_in_scope, df_tables_in_common[['Lakehouse Name', 'Table Name']], on=['Lakehouse Name', 'Table Name'], how='inner') \n","\n"," # 2.2 tables in common, columns only in source --> ALTER TABLE ADD COLUMN\n"," df_tables_in_common_source_columns_only_temp = pd.merge(df_tables_in_common_source_columns,df_target_lakehouses_columns, on=['Lakehouse Name', 'Table Name', 'Column Name'], how='left', suffixes=('', '_target'))\n"," df_tables_in_common_source_columns_only_temp = df_tables_in_common_source_columns_only_temp[df_tables_in_common_source_columns_only_temp.isna().any(axis=1)]\n"," df_tables_in_common_source_columns_only = df_tables_in_common_source_columns_only_temp[df_tables_in_common_source_columns.columns] \n","\n"," # 2.3 tables in common, columns only in target --> ALTER TABLE DROP COLUMN\n"," # dropping columns requires the table property delta.columnMapping.mode = name\n"," # example ALTER TABLE lake.Table Name SET TBLPROPERTIES ('delta.columnMapping.mode' = 'name','delta.minReaderVersion' = '2','delta.minWriterVersion' = '5')\n"," df_tables_in_common_target_columns_only_temp = df_target_lakehouses_columns.merge(df_tables_in_common_source_columns, on=['Lakehouse Name', 'Table Name', 'Column Name'], how='left', suffixes=('', '_source'))\n"," df_tables_in_common_target_columns_only_temp = df_tables_in_common_target_columns_only_temp[df_tables_in_common_target_columns_only_temp.isna().any(axis=1)]\n"," df_tables_in_common_target_columns_only = df_tables_in_common_target_columns_only_temp[df_target_lakehouses_columns.columns]\n"," df_tables_in_common_target_columns_only\n","\n"," # 2.4, tables in common, columns in common, but data type 
changed --> ALTER TABLE ALTER COLUMN\n"," df_data_type_comparison = pd.merge(df_tables_in_common_source_columns, df_target_lakehouses_columns, on=[\"Lakehouse Name\", \"Table Name\", \"Column Name\"], suffixes=('', '_target'))\n"," df_data_type_comparison[\"is_different\"] = df_data_type_comparison[\"Data Type\"] != df_data_type_comparison[\"Data Type_target\"]\n"," df_tables_in_common_data_type_changed = df_data_type_comparison.loc[df_data_type_comparison[\"is_different\"], df_tables_in_common_source_columns.columns]\n","\n","\n"," if not df_tables_in_source_only.empty:\n"," df_input_deployment_1 = input_for_full_deployment(df_tables_in_source_only)\n"," else:\n"," df_input_deployment_1 = pd.DataFrame()\n","\n"," if not df_tables_in_common_source_columns_only.empty:\n"," df_input_deployment_2 = input_for_incremental_deployment(df_tables_in_common_source_columns_only, 'alter table add column')\n"," else:\n"," df_input_deployment_2 = pd.DataFrame()\n","\n"," if not df_tables_in_common_target_columns_only.empty:\n"," df_input_deployment_3 = input_for_incremental_deployment(df_tables_in_common_target_columns_only, 'alter table drop column')\n"," else:\n"," df_input_deployment_3 = pd.DataFrame()\n","\n"," if not df_tables_in_common_data_type_changed.empty:\n"," df_input_deployment_4 = input_for_incremental_deployment(df_tables_in_common_data_type_changed, 'alter table alter column')\n"," else:\n"," df_input_deployment_4 = pd.DataFrame()\n","\n"," # concatenate the different inputs if not empty\n"," dfs = [df_input_deployment_1, df_input_deployment_2, df_input_deployment_3, df_input_deployment_4]\n"," non_empty_dfs = [df for df in dfs if not df.empty]\n"," df_input_deployment_temp = pd.concat(non_empty_dfs, ignore_index=True) if non_empty_dfs else pd.DataFrame()\n"," \n"," # concat to the dataframe\n"," df_input_deployment = pd.concat([df_input_deployment, df_input_deployment_temp], ignore_index=True)\n","\n"," # # logging\n"," # vMessage = f\"succeeded\"\n"," # dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'identify tables in incremental mode', datetime.now(), None, vMessage, ''] \n","\n","except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'identify tables in incremental mode', datetime.now(), None, vMessage, str(e)]\n"," if pDebugMode == \"yes\":\n"," print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"a903de67-b235-48e4-8dc3-7d23739e4779"},{"cell_type":"markdown","source":["**Update the notebook with the definition of the tables and run it against the target lakehouse to define the tables**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"bd58c13a-80f1-4bda-8e7f-f7f54c2554b4"},{"cell_type":"code","source":["try:\n","\n","\n"," # sleep time before checking the operation's status in post requests\n"," vSleepInSeconds = 30\n","\n","\n"," # perform the deployment\n"," for lakehouse in df_source_lakehouses['Lakehouse Name']:\n","\n"," # set the lakehouse name\n"," vLakehouseName = lakehouse\n","\n"," # get the create table statements related to the current lakehouse\n"," df_lakehouse_table_statements_current = df_input_deployment[df_input_deployment['Lakehouse Name'] == 
vLakehouseName]\n","\n"," # update the target notebook and run it only if there are sql statements to run\n"," if not df_lakehouse_table_statements_current.empty:\n","\n"," # define the target notebook name\n"," vTargetNotebookName = \"nb_\" + vLakehouseName + \"_definition\"\n","\n","\n"," # notebook definition template\n"," json_notebook_definition_temp = {\n"," \"nbformat\": 4,\n"," \"nbformat_minor\": 5,\n"," \"cells\": [],\n"," \"metadata\": {\n"," \"language_info\": {\n"," \"name\": \"sql\"\n"," },\n"," \"dependencies\": {\n"," \"lakehouse\": {\n"," \"default_lakehouse\": \"{default_lakehouse_}\",\n"," \"default_lakehouse_name\": \"{default_lakehouse_name_}\",\n"," \"default_lakehouse_workspace_id\": \"{default_lakehouse_workspace_id_}\"\n"," }\n"," }\n"," }\n"," }\n","\n"," # set the url\n"," vUrl = vBaseUrl + f\"workspaces/{vTargetWorkspaceId}/items\"\n","\n"," # this part of the code works for full and incremental deployment\n"," # resolve lakehouse and notebook id\n"," vLakehouseTargetId = labs.resolve_lakehouse_id(lakehouse=vLakehouseName, workspace=vTargetWorkspaceName)\n"," vNotebookTargetId = fabric.resolve_item_id( item_name=vTargetNotebookName, type=\"Notebook\", workspace=vTargetWorkspaceName)\n"," # print(vNotebookTargetId)\n","\n"," # prepare the default inputs for the notebook definition\n"," default_inputs_for_notebook_definition = {\n"," \"default_lakehouse_\" : vLakehouseTargetId,\n"," \"default_lakehouse_name_\" : vLakehouseName,\n"," \"default_lakehouse_workspace_id_\" : vTargetWorkspaceId\n"," }\n","\n","\n","\n","\n"," # add a new cell in the notebood definition \n"," for sql_statement in df_lakehouse_table_statements_current['SqlStatement']:\n"," new_cell = {\n"," \"cell_type\": \"code\",\n"," \"source\": [sql_statement]\n"," }\n"," json_notebook_definition_temp[\"cells\"].append(new_cell)\n","\n"," # get the folders of the current lakehouse\n"," df_folders_current = df_folders_exclude_shortcut[df_folders_exclude_shortcut['LakehouseTargetName'] == vLakehouseName]\n","\n"," if not df_folders_current.empty:\n"," for folder in df_folders_current['FolderName']:\n"," new_cell = {\n"," \"cell_type\": \"code\",\n"," \"source\": [\n"," f\"\"\"%%pyspark\n"," mssparkutils.fs.mkdirs('{folder}')\"\"\"\n"," ]\n"," }\n"," json_notebook_definition_temp[\"cells\"].append(new_cell)\n","\n","\n"," # replace the placeholders\n"," json_notebook_definition = replace_placeholders_in_json(json_notebook_definition_temp, default_inputs_for_notebook_definition)\n","\n"," # final json definition\n"," json_notebook_definition_new = json.loads(json.dumps(json_notebook_definition, indent=4))\n"," # print(json.dumps(json_notebook_definition, indent=4))\n","\n"," # base64 encoding for the api call\n"," json_notebook_definition_new_encoded = base64.b64encode(json.dumps(json_notebook_definition_new, indent=4).encode('utf-8')).decode('utf-8')\n","\n","\n"," # 3. 
update the notebook definition\n","\n"," # set the url for the update\n"," vUrl = vBaseUrl + f\"workspaces/{vTargetWorkspaceId}/notebooks/{vNotebookTargetId}/updateDefinition\"\n","\n"," # set the body\n"," vJsonBody = {\n"," \"definition\": {\n"," \"format\": \"ipynb\",\n"," \"parts\": [\n"," {\n"," \"path\": \"notebook-content.py\",\n"," \"payload\": f\"{json_notebook_definition_new_encoded}\",\n"," \"payloadType\": \"InlineBase64\"\n"," }\n"," ]\n"," }\n"," }\n","\n","\n"," # update the notebook definition\n"," # the update notebook definition as of 19.11.2024 has an issue when executin the operation url when the response status code is 202\n"," # it returns an error although the update is successful\n"," create_or_update_fabric_item(vUrl, vHeaders, vJsonBody, 'post', \"updating\", vTargetWorkspaceId, vTargetNotebookName, \"Notebook\", vSleepInSeconds, pDebugMode) \n","\n"," # 4. run the notebook\n","\n"," # set the url\n"," vUrl = vBaseUrl + f\"workspaces/{vTargetWorkspaceId}/items/{vNotebookTargetId}/jobs/instances?jobType=RunNotebook\"\n","\n"," # run the notebook\n"," create_or_update_fabric_item(vUrl, vHeaders, None, 'post', \"executing\", vTargetWorkspaceId, vTargetNotebookName, \"Notebook\", vSleepInSeconds, pDebugMode) \n","\n"," # logging\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'create tables and folders in target lakehouses', datetime.now(), None, vMessage, ''] \n","\n","except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'create tables and folders in target lakehouses', datetime.now(), None, vMessage, str(e)]\n"," if pDebugMode == \"yes\":\n"," print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"96b47a0f-823f-4ca9-8689-20b144398311"},{"cell_type":"markdown","source":["**Use the commented cell to check a notebook definition if required**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"a222f20f-8a5e-4813-bf55-568a51a49599"},{"cell_type":"markdown","source":["nb = json.loads(\n"," notebookutils.notebook.getDefinition(\n"," \"nb_saleslake_definition_\", #nane of the notebook\n"," workspaceId=vTargetWorkspaceId\n"," )\n",")\n","print(json.dumps(nb, indent=4))"],"metadata":{"nteract":{"transient":{"deleting":false}}},"id":"4f01f115-0a6b-424e-92ab-439e84b8b0b1"},{"cell_type":"markdown","source":["**Create shortcuts in target lakehouses**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"8860fb53-1728-4105-9c73-6165598f11e3"},{"cell_type":"code","source":["try:\n"," vSleepInSeconds = 30\n"," for index, row in df_shortcuts.iterrows():\n","\n"," # set the variables\n"," # common inputs \n"," vName = row['Name']\n"," vPath = row['Path']\n"," vTargetType = row['TargetType']\n","\n"," # specific to onelake\n"," vTargetOneLakeWorkspaceId = row['TargetOneLakeWorkspaceId']\n"," vTargetOneLakeItemId = row['TargetOneLakeItemId']\n"," vTargetOneLakePath = row['TargetOneLakePath']\n","\n"," # specific to adls gen2\n"," vTargetAdlsGen2Location = row['TargetAdlsGen2Location']\n"," vTargetAdlsGen2Subpath = row['TargetAdlsGen2Subpath']\n"," vTargetAdlsGen2ConnectionId = 
row['TargetAdlsGen2ConnectionId']\n","\n"," # todo\n"," # specific to AmazonS3\n"," # specific to GoogleCloudStorage\n"," # specific to S3Compatible\n","\n"," # target lakehouse id\n"," vLakehouseTargetId = row['LakehouseTargetId']\n","\n"," # shortcut url\n"," vShortcutUrl = f\"workspaces/{vTargetWorkspaceId}/items/{vLakehouseTargetId}/shortcuts?shortcutConflictPolicy={vShortcutConflictPolicy}\"\n"," vUrl = vBaseUrl + vShortcutUrl\n","\n"," # request body\n"," if vTargetType in [\"AdlsGen2\", \"AmazonS3\", \"GoogleCloudStorage\", \"S3Compatible\"]:\n"," vJsonBody = shortcut_body(vTargetType, vName, vPath, vTargetAdlsGen2Location, vTargetAdlsGen2Subpath, vTargetAdlsGen2ConnectionId)\n"," elif vTargetType == \"OneLake\":\n"," vJsonBody = onelake_shortcut_body(vTargetType, vName, vPath, vTargetOneLakeWorkspaceId, vTargetOneLakeItemId, vTargetOneLakePath)\n"," else:\n"," # use case to be implemented\n"," vJsonBody = \"\"\n","\n"," # create the shortcut\n"," create_or_update_fabric_item(vUrl, vHeaders, vJsonBody, 'post', \"creating\", vTargetWorkspaceId, vName, \"Shortcut\", vSleepInSeconds, pDebugMode) \n","\n"," # logging\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'create shortcuts in target lakehouses', datetime.now(), None, vMessage, ''] \n","\n","except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'create shortcuts in target lakehouses', datetime.now(), None, vMessage, str(e)] \n"," if pDebugMode == \"yes\":\n"," print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"e3b8e454-ee46-4bdd-b1ac-2cee1665d98a"},{"cell_type":"markdown","source":["**Enable onelake security on target lakehouses**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"35d2e39b-7a61-4ebb-bd03-f9c5cfe3385a"},{"cell_type":"code","source":["try:\n"," # get a onelake token\n"," vOnelakeHeaders = {\"authorization\": f\"bearer {mssparkutils.credentials.getToken('storage')}\"}\n","\n"," # iterate over the target lakehouses\n"," for index, row in df_target_lakehouses.iterrows():\n","\n"," vLakehouseTargetId = row['Lakehouse ID']\n","\n"," vUrl = f'https://onelake.dfs.fabric.microsoft.com/v1.0/workspaces/{vTargetWorkspaceId}/artifacts/{vLakehouseTargetId}/security/enable'\n","\n"," vJsonBody = {\n"," \"enableOneSecurity\":\"true\"\n"," }\n","\n"," response = requests.post(vUrl, headers=vOnelakeHeaders, json=vJsonBody)\n","\n"," # logging\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'enable onelake security on target lakehouses', datetime.now(), None, vMessage, ''] \n","\n","except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'enable onelake security on target lakehouses', datetime.now(), None, vMessage, str(e)]\n"," if pDebugMode == \"yes\":\n"," 
print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"5f72c9e3-2877-4a72-9a16-e15feaf487f3"},{"cell_type":"markdown","source":["**Identify source lakehouses onelake roles**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"c525cf5c-3ed2-48ad-b522-4ad67926e0dd"},{"cell_type":"code","source":["try:\n","\n"," # if custom roles are provided\n"," if vCustomRoles == \"yes\":\n","\n"," # load the csv files\n"," df_onelake_access = pd.read_csv(vOnelakeRolesCsvPath)\n"," df_role_rules = pd.read_csv(vRoleRulesCsvPath)\n"," df_item_members = pd.read_csv(vItemMembersCsvPath)\n"," df_entra_members = pd.read_csv(vEntraMembersCsvPath)\n"," else:\n"," # 2. prepare the inputs for the creation\n"," df_role_rules = flatten_nested_json_df(df_onelake_access[['id', 'decisionRules']].explode('decisionRules').dropna())\n"," condition_1 = (df_role_rules[\"decisionRules.permission.attributeName\"] == \"Action\") & (df_role_rules[\"decisionRules.permission.attributeValueIncludedIn\"] != \"Read\")\n"," df_role_rules_1 = df_role_rules[~condition_1]\n"," condition_2 = (df_role_rules_1[\"decisionRules.permission.attributeName\"] == \"Path\") & (df_role_rules_1[\"decisionRules.permission.attributeValueIncludedIn\"] == \"Read\")\n"," df_role_rules_2 = df_role_rules_1[~condition_2]\n"," df_role_rules = df_role_rules_2\n"," df_entra_members = flatten_nested_json_df(df_onelake_access[['id', 'members.microsoftEntraMembers']].explode('members.microsoftEntraMembers').dropna())\n"," df_item_members = flatten_nested_json_df(df_onelake_access[['id', 'members.fabricItemMembers']].explode('members.fabricItemMembers').dropna())\n","\n"," # logging\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'identify source lakehouses onelake roles', datetime.now(), None, vMessage, ''] \n","\n","except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'identify source lakehouses onelake roles', datetime.now(), None, vMessage, str(e)]\n"," if pDebugMode == \"yes\":\n"," print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"831e0e4b-379e-47ed-8c8a-0c05875b916d"},{"cell_type":"markdown","source":["**Create onelake roles in target lakehouses**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"faeb02fe-be45-4c64-a2a3-fc91905f1b2e"},{"cell_type":"code","source":["try:\n","\n"," vSleepInSeconds = 30\n"," for index, row in df_onelake_roles.iterrows():\n","\n"," vRoleId = row['id']\n"," vRoleName = row['name']\n"," vLakehouseTargetName = row['lakehouse']\n"," vLakehouseTargetId = labs.resolve_lakehouse_id(lakehouse=vLakehouseTargetName, workspace=vTargetWorkspaceName)\n"," # print(vRoleId, vRoleName, vLakehouseTargetName, vLakehouseTargetId)\n","\n"," # updating the DefaultReader role via the API deletes it\n"," if vRoleName != \"DefaultReader\":\n"," # role template with decision rules, entra members and item members\n"," role_template = {\n"," \"value\": [\n"," 
{\n"," \"name\": \"{name_}\",\n"," \"decisionRules\": [\n"," ],\n"," \"members\": {\n"," \"microsoftEntraMembers\": [\n"," ],\n"," \"fabricItemMembers\": [\n"," ]\n"," }\n"," }\n"," ]\n"," }\n","\n"," # replace the role name\n"," role_input = {\n"," \"name_\": vRoleName\n"," }\n"," role_template = replace_placeholders_in_json(role_template, role_input)\n","\n"," # handle decision rules\n"," df_rules_current = df_role_rules[df_role_rules['id']==vRoleId]\n"," for effect in df_rules_current['decisionRules.effect'].drop_duplicates():\n","\n"," vEffectName = effect\n","\n"," # replace the effect name\n"," effect_input = {\n"," \"effect_\": vEffectName\n"," }\n"," rules_template = {\n"," \"effect\": \"{effect_}\",\n"," \"permission\": [\n"," ]\n"," }\n"," rules_template = replace_placeholders_in_json(rules_template, effect_input)\n","\n"," # handle effects\n"," df_effect_current = df_rules_current[(df_rules_current['id']==vRoleId) & (df_rules_current['decisionRules.effect']==vEffectName)]\n","\n"," for attribute in df_effect_current['decisionRules.permission.attributeName'].drop_duplicates():\n","\n"," vAttributeName = attribute\n","\n"," # replace the attribute\n"," permission_input = {\n"," \"attributeName_\": attribute\n"," }\n"," permissions_template = {\n"," \"attributeName\": \"{attributeName_}\",\n"," \"attributeValueIncludedIn\": [\n"," ] \n"," }\n"," permissions_template = replace_placeholders_in_json(permissions_template, permission_input)\n","\n"," # handle attributes\n"," df_attribute_current = df_effect_current[(df_effect_current['id']==vRoleId) & (df_effect_current['decisionRules.effect']==vEffectName) & (df_effect_current['decisionRules.permission.attributeName']==vAttributeName)] \n","\n"," for attribute_included_in in df_attribute_current['decisionRules.permission.attributeValueIncludedIn']:\n"," vAttributeIncludedIn = attribute_included_in\n"," permissions_template[\"attributeValueIncludedIn\"].append(vAttributeIncludedIn)\n","\n"," # append the attributes included in to the permission template\n"," rules_template['permission'].append(permissions_template)\n","\n"," # appedn the rules template to the decision rules in the role template\n"," role_template[\"value\"][0][\"decisionRules\"].append(rules_template)\n","\n","\n"," # handle the entra members\n"," df_entra_member_current = df_entra_members[df_entra_members['id']==vRoleId]\n"," for index, row in df_entra_member_current.iterrows():\n","\n"," vTenantId = row['members.microsoftEntraMembers.tenantId']\n"," vObjectId = row['members.microsoftEntraMembers.objectId']\n","\n"," # set the member template\n"," entra_members_template = {\n"," \"tenantId\": vTenantId,\n"," \"objectId\": vObjectId\n"," }\n","\n"," # append the member template to the role template\n"," role_template[\"value\"][0][\"members\"][\"microsoftEntraMembers\"].append(entra_members_template)\n","\n","\n"," # handle the fabric item members\n"," df_item_members_current = df_item_members[df_item_members['id']==vRoleId]\n"," for item_member in df_item_members_current['members.fabricItemMembers.sourcePath'].drop_duplicates():\n"," \n"," vSourcePath = item_member # row['members.fabricItemMembers.sourcePath']\n"," vTargetPath = vTargetWorkspaceId + \"/\" + vLakehouseTargetId \n","\n"," # replace the source path\n"," items_members_template = {\n"," \"sourcePath\": vTargetPath,\n"," \"itemAccess\": [\n"," ]\n"," }\n","\n"," # handle the item access\n"," df_item_access_current = 
df_item_members_current[df_item_members_current['members.fabricItemMembers.sourcePath']==vSourcePath] \n"," for item_access in df_item_access_current['members.fabricItemMembers.itemAccess'].drop_duplicates():\n","\n"," vItemAccess = item_access\n","\n"," # append the item access to the member template\n"," items_members_template[\"itemAccess\"].append(vItemAccess) \n","\n"," # append the fabric item template to the role template\n"," role_template[\"value\"][0][\"members\"][\"fabricItemMembers\"].append(items_members_template)\n","\n"," # print(json.dumps(role_template, indent=4))\n"," \n"," vJsonBody = role_template\n","\n"," # url\n"," vRoleUrl = f\"workspaces/{vTargetWorkspaceId}/items/{vLakehouseTargetId}/dataAccessRoles\"\n"," vUrl = vBaseUrl + vRoleUrl\n","\n"," create_or_update_fabric_item(vUrl, vHeaders, vJsonBody, 'put', \"creating/updating\", vTargetWorkspaceId, vRoleName, \"onelake role\", vSleepInSeconds, pDebugMode) \n","\n"," # logging\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'create onelake roles in target lakehouses', datetime.now(), None, vMessage, ''] \n","\n","except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'create onelake roles in target lakehouses', datetime.now(), None, vMessage, str(e)]\n"," if pDebugMode == \"yes\":\n"," print(str(e))\n"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"c509e9fc-4677-4a30-9160-3df7f0deadda"},{"cell_type":"markdown","source":["**Create the sql objects in the target SQL endpoint**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"1fcb077d-dcd8-4c82-a24a-80673ee68460"},{"cell_type":"code","source":["try:\n","\n","\n"," # sleep time before checking the operation's status in post requests\n"," vSleepInSeconds = 30\n","\n","\n"," # perform the deployment\n"," for lakehouse in df_source_lakehouses['Lakehouse Name']:\n","\n"," # set the lakehouse name\n"," vLakehouseName = lakehouse\n","\n"," # get the lakewarehouse id --> this is different than the lakehouse id\n"," vLakehouseWarehouseTargetId = df_target_lakehouses[df_target_lakehouses['Lakehouse Name']== 'lakehouse'].loc[0, 'SQL Endpoint ID']\n"," \n"," # define the target notebook name\n"," vTargetSqlNotebookName = \"nb_\" + vLakehouseName + \"_sql_definition\"\n","\n"," # get the create table statements related to the current lakehouse\n"," df_sql_objects_current = df_sql_objects[df_sql_objects['LakehouseName'] == vLakehouseName]\n","\n","\n"," # update the target notebook and run it only if there are sql statements to run\n"," if not df_sql_objects_current.empty:\n","\n","\n"," # notebook definition template --> this will be a TSQL notebook: add to it the lakewarehouse id as a default warehouse\n"," json_notebook_definition_temp = {\n"," \"nbformat\": 4,\n"," \"nbformat_minor\": 5,\n"," \"cells\": [],\n"," \"metadata\": {\n"," \"kernel_info\": {\n"," \"name\": \"sqldatawarehouse\"\n"," },\n"," \"kernelspec\": {\n"," \"name\": \"sqldatawarehouse\",\n"," \"language\": \"sqldatawarehouse\",\n"," \"display_name\": \"sqldatawarehouse\"\n"," },\n"," \"language_info\": {\n"," \"name\": \"sql\"\n"," },\n"," \"dependencies\": {\n"," \"warehouse\": {\n"," 
\"known_warehouses\": [\n"," {\n"," \"id\": \"{default_lakewarehouse_}\",\n"," \"type\": \"Lakewarehouse\"\n"," }\n"," ],\n"," \"default_warehouse\": \"{default_lakewarehouse_}\"\n"," },\n"," \"lakehouse\": {}\n"," }\n"," }\n"," }\n","\n"," # set the url\n"," vUrl = vBaseUrl + f\"workspaces/{vTargetWorkspaceId}/items\"\n","\n"," # this part of the code works for full and incremental deployment\n"," # resolve notebook id\n"," vTargetSqlNotebookId = fabric.resolve_item_id( item_name=vTargetSqlNotebookName, type=\"Notebook\", workspace=vTargetWorkspaceName)\n","\n"," # prepare the default inputs for the notebook definition\n"," default_inputs_for_notebook_definition = {\n"," \"default_lakewarehouse_\" : vLakehouseWarehouseTargetId\n"," }\n","\n"," # check if there are security policies defined\n"," # if yes:\n"," # 1. add a cell to drop the security policy --> this will allow altering the predicate function\n"," # 2. add all other cells to create view, functions, etc..\n"," # 3. add a cell to create the security policy\n","\n"," # create cells for droping security policies\n"," sql_objects_contain_security_policies = (df_sql_objects_current['ObjectType'] == 'Security Policy').any()\n"," if sql_objects_contain_security_policies:\n"," df_sql_objects_current_policies = df_sql_objects_current[df_sql_objects_current['ObjectType'] == 'Security Policy']\n","\n"," # iterate over the sql objects of the current lakehouse\n"," for index, row in df_sql_objects_current_policies.iterrows():\n","\n"," # get the ddl statement\n"," vSchemaName = row['SchemaName']\n"," vObjectName = row['ObjectName']\n"," vDropStatement = row['DropStatement']\n","\n"," print(f\"adding a drop security cell for <{vSchemaName}.{vObjectName}>.\")\n","\n"," new_cell = {\n"," \"cell_type\": \"code\",\n"," \"source\": [vDropStatement]\n"," }\n"," json_notebook_definition_temp[\"cells\"].append(new_cell)\n"," \n","\n"," # iterate over the sql objects of the current lakehouse\n"," for index, row in df_sql_objects_current.iterrows():\n","\n"," # get the ddl statement\n"," vSchemaName = row['SchemaName']\n"," vObjectName = row['ObjectName']\n"," vDropStatement = row['DropStatement']\n"," vCreateStatement = row['CreateStatement']\n"," vObjectType = row['ObjectType']\n","\n"," print(f\"adding a create cell for <{vSchemaName}.{vObjectName}>.\")\n","\n"," # add the create statement\n"," new_cell = {\n"," \"cell_type\": \"code\",\n"," \"source\": [vCreateStatement]\n"," }\n"," json_notebook_definition_temp[\"cells\"].append(new_cell) \n","\n","\n","\n"," # replace the placeholders\n"," json_notebook_definition = replace_placeholders_in_json(json_notebook_definition_temp, default_inputs_for_notebook_definition)\n","\n"," # final json definition\n"," json_notebook_definition_new = json.loads(json.dumps(json_notebook_definition, indent=4))\n"," # print(json.dumps(json_notebook_definition, indent=4))\n","\n"," # base64 encoding for the api call\n"," json_notebook_definition_new_encoded = base64.b64encode(json.dumps(json_notebook_definition_new, indent=4).encode('utf-8')).decode('utf-8')\n","\n","\n"," # 3. 
update the notebook definition\n","\n"," # set the url for the update\n"," vUrl = vBaseUrl + f\"workspaces/{vTargetWorkspaceId}/notebooks/{vTargetSqlNotebookId}/updateDefinition\"\n","\n"," # set the body\n"," vJsonBody = {\n"," \"definition\": {\n"," \"format\": \"ipynb\",\n"," \"parts\": [\n"," {\n"," \"path\": \"notebook-content.py\",\n"," \"payload\": f\"{json_notebook_definition_new_encoded}\",\n"," \"payloadType\": \"InlineBase64\"\n"," }\n"," ]\n"," }\n"," }\n","\n","\n"," # update the notebook definition\n"," # the update notebook definition as of 02.2025 has an issue when executin the operation url when the response status code is 202\n"," # it returns an error although the update is successful\n"," create_or_update_fabric_item(vUrl, vHeaders, vJsonBody, 'post', \"updating\", vTargetWorkspaceId, vTargetSqlNotebookName, \"Notebook\", vSleepInSeconds, pDebugMode) \n","\n"," # 4. run the notebook\n","\n"," # set the url\n"," vUrl = vBaseUrl + f\"workspaces/{vTargetWorkspaceId}/items/{vTargetSqlNotebookId}/jobs/instances?jobType=RunNotebook\"\n","\n"," # run the notebook\n"," create_or_update_fabric_item(vUrl, vHeaders, None, 'post', \"executing\", vTargetWorkspaceId, vTargetSqlNotebookName, \"Notebook\", vSleepInSeconds, pDebugMode) \n","\n"," # logging\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'create sql objects in target lakehouses', datetime.now(), None, vMessage, ''] \n","\n","except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'create sql objects in target lakehouses', datetime.now(), None, vMessage, str(e)]\n"," if pDebugMode == \"yes\":\n"," print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"b65162c3-a325-46c9-afba-a12ce2a51810"},{"cell_type":"markdown","source":["**Delete notebooks created in previous steps**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"59cb2579-8d6f-4e79-a8da-56406bb0fc62"},{"cell_type":"code","source":["try:\n"," for lakehouse in df_source_lakehouses['Lakehouse Name']:\n","\n"," # set the lakehouse name\n"," vLakehouseName = lakehouse\n","\n"," # define the target notebook name\n"," vTargetNotebookName = \"nb_\" + vLakehouseName + \"_definition\"\n"," vTargetSqlNotebookName = \"nb_\" + vLakehouseName + \"_sql_definition\"\n","\n"," # delete the notebooks\n"," notebookutils.notebook.delete(vTargetNotebookName, workspaceId=vTargetWorkspaceId)\n"," notebookutils.notebook.delete(vTargetSqlNotebookName, workspaceId=vTargetWorkspaceId)\n","\n"," # logging\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'delete temporary notebooks', datetime.now(), None, vMessage, ''] \n","\n","except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'delete temporary notebooks', datetime.now(), None, vMessage, str(e)]\n"," if pDebugMode == \"yes\":\n"," 
print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"6392d38d-47f1-47ea-92be-889ec260ec2e"},{"cell_type":"markdown","source":["**Logging**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"c3ac9c82-9c12-453b-b650-9d79ab90ca1b"},{"cell_type":"code","source":["try:\n"," # perform the conversion of columns\n"," dfLogging = dfLogging.astype({\n"," \"LoadId\": \"string\",\t\n"," \"NotebookId\": \"string\", \t\n"," \"NotebookName\": \"string\", \n"," \"WorkspaceId\": \"string\", \n"," \"CellId\": \"string\", \n"," \"Timestamp\": \"datetime64[ns]\", \n"," \"ElapsedTime\": \"string\", \n"," \"Message\": \"string\", \n"," \"ErrorMessage\" : \"string\"\n"," })\n","\n"," # save panda dataframe to a spark dataframe \n"," sparkDF_Logging = spark.createDataFrame(dfLogging) \n","\n"," # save to the lakehouse\n"," sparkDF_Logging.write.mode(\"append\").format(\"delta\").option(\"mergeSchema\", \"true\").saveAsTable(\"staging.notebook_logging_cicd\")\n","\n","except Exception as e:\n"," vMessage = \"saving logs to the lakehouse failed\"\n"," if pDebugMode == \"yes\":\n"," print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"95915092-87ee-4735-b698-571e2eb13cde"}],"metadata":{"language_info":{"name":"python"},"kernelspec":{"name":"synapse_pyspark","language":"Python","display_name":"Synapse PySpark"},"kernel_info":{"name":"synapse_pyspark"},"widgets":{},"microsoft":{"language":"python","language_group":"synapse_pyspark","ms_spell_check":{"ms_spell_check_language":"en"}},"nteract":{"version":"nteract-front-end@1.0.0"},"synapse_widget":{"version":"0.1","state":{}},"spark_compute":{"compute_id":"/trident/default","session_options":{"conf":{"spark.synapse.nbs.session.timeout":"1200000"}}},"dependencies":{"lakehouse":{},"environment":{}}},"nbformat":4,"nbformat_minor":5}
\ No newline at end of file
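The notebook diff above ends with the pattern that recurs throughout this accelerator: build an `ipynb` definition in memory, base64-encode it, push it through the Fabric REST API `updateDefinition` endpoint, and then trigger a `RunNotebook` job. The minimal sketch below isolates that call pattern; the workspace id, notebook id, token, and the one-cell definition are hypothetical placeholders, and in the real notebook the cells are generated from the extracted DDL statements.

```python
import base64
import json

import requests

# Hypothetical placeholders -- substitute real values (not taken from the diff above).
workspace_id = "<target-workspace-guid>"
notebook_id = "<target-notebook-guid>"
access_token = "<fabric-api-token>"

base_url = "https://api.fabric.microsoft.com/v1/"
headers = {"Authorization": f"Bearer {access_token}"}

# Minimal one-cell ipynb definition; the notebook above assembles this
# dynamically from the extracted CREATE/DROP statements.
notebook_definition = {"cells": [{"cell_type": "code", "source": ["SELECT 1"]}]}
payload = base64.b64encode(json.dumps(notebook_definition).encode("utf-8")).decode("utf-8")

body = {
    "definition": {
        "format": "ipynb",
        "parts": [
            {
                "path": "notebook-content.py",
                "payload": payload,
                "payloadType": "InlineBase64",
            }
        ],
    }
}

# 200 means the definition was updated synchronously; 202 means a long-running
# operation was started and its URL is returned in the Location header.
response = requests.post(
    f"{base_url}workspaces/{workspace_id}/notebooks/{notebook_id}/updateDefinition",
    headers=headers,
    json=body,
)
print(response.status_code, response.headers.get("Location"))
```

The inline comment in the notebook notes that, as of 02.2025, polling the operation URL for this particular endpoint can report an error even though the update succeeded, which is why the helper tolerates that case before running the notebook.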
diff --git a/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_pre_update_warehouses.ipynb b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_pre_update_warehouses.ipynb
new file mode 100644
index 0000000..d2ac6ab
--- /dev/null
+++ b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_cicd_pre_update_warehouses.ipynb
@@ -0,0 +1 @@
+{"cells":[{"cell_type":"markdown","source":["**Helper notebook**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"da95792c-4c46-4183-be8b-5b0e6b0f9ef9"},{"cell_type":"code","source":["%run nb_helper"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"b563414c-45f3-438c-b444-586d71b386d0"},{"cell_type":"markdown","source":["**Define a logging dataframe**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"c2c41a93-d0ff-4763-bf57-7269190714e0"},{"cell_type":"code","source":["dfLogging = pd.DataFrame(columns = ['LoadId','NotebookId', 'NotebookName', 'WorkspaceId', 'CellId', 'Timestamp', 'ElapsedTime', 'Message', 'ErrorMessage'])\n","vContext = mssparkutils.runtime.context\n","vNotebookId = vContext[\"currentNotebookId\"]\n","vLogNotebookName = vContext[\"currentNotebookName\"]\n","vWorkspaceId = vContext[\"currentWorkspaceId\"] # where the notebook is running, to not confuse with source and target workspaces"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"9ac8aeb9-3878-479a-ae29-640161732bc1"},{"cell_type":"markdown","source":["**Parameters --> convert to code for debugging the notebook. otherwise, keep commented as parameters are passed from DevOps pipelines**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"de3479eb-28dd-4ad6-b1a2-35466e20a214"},{"cell_type":"code","source":["\n","pSqlToken = \"\"\n","pSourceWorkspaceId = \"\"\n","pTargetWorkspaceId = \"\"\n","pDebugMode = \"yes\""],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"50083b29-bc68-42a8-b289-66e4867320c5"},{"cell_type":"markdown","source":["**Resolve source and target workspace ids**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"0bb45b09-4d2d-47cc-b665-31a4b9575918"},{"cell_type":"code","source":["vSourceWorkspaceName = fabric.resolve_workspace_name(pSourceWorkspaceId)\n","vTargetWorkspaceName = fabric.resolve_workspace_name(pTargetWorkspaceId)"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"5d6326ab-345b-4d5b-871b-a61d08a522c4"},{"cell_type":"markdown","source":["**List source and target warehouses**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"3cacbc51-47fe-4a0b-a0cb-5df287ceb2f5"},{"cell_type":"code","source":["df_source_warehouses = labs.list_warehouses(workspace=vSourceWorkspaceName)\n","df_target_warehouses = 
labs.list_warehouses(workspace=vTargetWorkspaceName)"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"eef84dce-fbc2-4566-b4a8-c1e98f4e0514"},{"cell_type":"markdown","source":["**Verify that there is a least one warehouse in the source or the target workspace --> if there are no warehouses, exit the notebook**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"43ee39c4-d1de-4fd6-ab5b-08a4e87bbd4b"},{"cell_type":"code","source":["if df_target_warehouses.empty or df_source_warehouses.empty:\n"," vMessage = f\"workspace or workspace have 0 warehouse. pre-update is not required\"\n"," print(vMessage)\n","\n","\n"," # Display an exit message\n"," display(Markdown(\"### ✅ Notebook execution stopped successfully!\"))\n","\n"," # Exit without error\n"," # sys.exit(0)\n"," # InteractiveShell.instance().ask_exit()\n"," mssparkutils.notebook.exit(vMessage)"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"b25ea850-66aa-422d-873f-efb4a37cc3f3"},{"cell_type":"markdown","source":["**Source and target sql analytics endpoints**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"867d321e-85e1-4a30-b21c-09da7ae0206e"},{"cell_type":"code","source":["vSourceSqlEndpoint = df_source_warehouses.loc[0, 'Connection Info']\n","vTargetSqlEndpoint = df_target_warehouses.loc[0, 'Connection Info']"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"4d85c781-45aa-414e-a1ad-51e16095064c"},{"cell_type":"markdown","source":["**Access Token**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"9aac2c16-29ff-4797-893f-7516002e5dc5"},{"cell_type":"code","source":["vScope = \"https://analysis.windows.net/powerbi/api\"\n","\n","# get the access token \n","if pDebugMode == \"yes\":\n"," # in debug mode, use the token of the current user\n"," vSqlAccessToken = mssparkutils.credentials.getToken(vScope)\n","else:\n"," # when the code is run from DevOps, the token passed as a parameter\n"," vSqlAccessToken = pSqlToken"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"e59e65a8-4156-4992-b7b9-2f682439d3f5"},{"cell_type":"markdown","source":["**Sql statement to get the tables and their columns**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"be2f4c31-195a-46e3-863b-ac1b508289d9"},{"cell_type":"code","source":["vSqlStatement = \"\"\"\n","SELECT \n","\t DB_NAME() as [DATABASE_NAME]\n","\t,c.TABLE_SCHEMA\n","\t,c.TABLE_NAME\n","\t,c.ORDINAL_POSITION\n","\t,c.COLUMN_NAME\n","\t, \n","\t'[' + DATA_TYPE + ']'\n","\t+ \n","\tCASE \n","\t\tWHEN DATA_TYPE IN ('tinyint', 'smallint', 'int', 'bigint','xml', 'smalldatetime', 'datetime', 'datetime2', 'bit', 'date', 'money', 'float', 'real') THEN ''\n","\t\tWHEN DATA_TYPE IN ('varchar', 'nvarchar', 'nchar', 'varbinary', 'char') \n","\t\tTHEN \n","\t\t\t'(' \n","\t\t\t+ 
\n","\t\t\tCASE CHARACTER_MAXIMUM_LENGTH \n","\t\t\t\tWHEN -1 THEN 'max'\n","\t\t\t\tELSE CAST(CHARACTER_MAXIMUM_LENGTH AS VARCHAR(10))\n","\t\t\tEND \n","\t\t\t+ ')'\n","\t\tWHEN DATA_TYPE IN ('numeric', 'decimal') THEN '(' + CAST(NUMERIC_PRECISION AS VARCHAR(10)) + ',' + CAST(NUMERIC_SCALE AS VARCHAR(10)) + ')'\n","\tEND \n","\tAS COLUMN_DEFINITION\n","FROM \n","\tINFORMATION_SCHEMA.COLUMNS c\n","\tINNER JOIN INFORMATION_SCHEMA.TABLES t \n","\t\tON c.TABLE_NAME = t.TABLE_NAME AND t.TABLE_TYPE = 'BASE TABLE'\n","WHERE\t\n","\tc.TABLE_SCHEMA NOT IN ('INFORMATION_SCHEMA','queryinsights','sys')\n","ORDER BY \n","\tc.TABLE_SCHEMA\n","\t,c.TABLE_NAME\n","\t,c.ORDINAL_POSITION\n","\"\"\""],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"550105fc-9579-4c6a-a048-a8d5a4d5ec32"},{"cell_type":"markdown","source":["**Functions**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"0d5e4909-aa21-4732-8645-38ffb6019ffe"},{"cell_type":"code","source":["# compare source and target dataframes, identify new and modified columns\n","def compare_dataframes(source_dataframe, target_dataframe, key_columns):\n","\n"," # Ensure both DataFrames have the same columns\n"," assert list(target_dataframe.columns) == list(source_dataframe.columns), \"DataFrames must have the same columns\"\n","\n"," source_dataframe_indexed = source_dataframe.set_index(key_columns)\n"," target_dataframe_indexed = target_dataframe.set_index(key_columns)\n","\n"," # columns in source but not in target --> added to source\n"," df_columns_only_in_source = source_dataframe_indexed.loc[~source_dataframe_indexed.index.isin(target_dataframe_indexed.index)].reset_index()\\\n","\n"," # # rows in target but not in source --> deleted from source\n"," # columns_only_in_target = target_dataframe_indexed.loc[~target_dataframe_indexed.index.isin(source_dataframe_indexed.index)].reset_index()\n","\n"," # columns in common but with a data type change\n"," df_common_rows = target_dataframe_indexed.index.intersection(source_dataframe_indexed.index)\n"," columns_with_type_change_list = []\n"," for index in df_common_rows:\n"," if not target_dataframe_indexed.loc[index].equals(source_dataframe_indexed.loc[index]): # Compare row values\n"," modified_row = source_dataframe_indexed.loc[[index]].reset_index() # Fetch modified row from B\n"," # modified_row[\"Change_Type\"] = \"modified\"\n"," columns_with_type_change_list.append(modified_row)\n","\n"," df_columns_with_type_change = pd.concat(columns_with_type_change_list, ignore_index=True) if columns_with_type_change_list else pd.DataFrame()\n","\n"," return df_columns_only_in_source, df_columns_with_type_change"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"b8259469-c19e-45ee-8a9d-1e32c9523ac4"},{"cell_type":"code","source":["# create the alchemy engine\n","def create_sqlalchemy_engine(connection_string : str):\n"," token = pSqlToken\n"," SQL_COPT_SS_ACCESS_TOKEN = 1256\n","\n"," # the following code is required to structure the token for pyodbc.connect\n"," exptoken = b'';\n"," for i in bytes(token, \"UTF-8\"):\n"," exptoken += bytes({i});\n"," exptoken += bytes(1);\n"," tokenstruct = struct.pack(\"=i\", len(exptoken)) + exptoken;\n","\n"," return 
sqlalchemy.create_engine(\"mssql+pyodbc://\", creator=lambda: pyodbc.connect(connection_string, attrs_before = { SQL_COPT_SS_ACCESS_TOKEN:bytearray(tokenstruct) }))"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"317fa76a-954f-44f4-a778-f5714a068056"},{"cell_type":"markdown","source":["**Get the definition of warehouse(s) tables in the source workspace**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"c55d7dd5-36b5-4114-81c8-15088f7593b3"},{"cell_type":"code","source":["df_source_warehouses_columns = pd.DataFrame()\n","try:\n","\n"," for index, row in df_source_warehouses.iterrows():\n","\n"," # get the current warehouse\n"," vWarehouseName = row['Warehouse Name']\n","\n"," # define the connection string for the alchemy engine\n"," vConnectionString = f\"Driver={{ODBC Driver 18 for SQL Server}};Server={vSourceSqlEndpoint};Database={vWarehouseName};\"\n"," # print(vConnectionString)\n","\n"," # create the sql engine\n"," sql_engine = create_sqlalchemy_engine(vConnectionString)\n","\n"," # connect to the engine\n"," with sql_engine.connect() as sql_connection:\n","\n"," # get the definition of the tables\n"," df_source_warehouses_columns_temp = pd.read_sql(vSqlStatement, sql_connection)\n","\n"," # append the rows to the dataframe\n"," df_source_warehouses_columns = pd.concat([df_source_warehouses_columns, df_source_warehouses_columns_temp], ignore_index=True)\n","\n"," # logging\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'identify source warehouses tables definition', datetime.now(), None, vMessage, ''] \n","except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'identify source warehouses tables definition', datetime.now(), None, vMessage, str(e) ] \n"," if pDebugMode == \"yes\":\n"," print(str(e))\n"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"d668d0da-a791-4980-8bcc-c9ed2e04d341"},{"cell_type":"markdown","source":["**Get the definition of warehouse(s) tables in the target workspace**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"a8b319cc-51b7-4b60-9c3b-1c8b537725c3"},{"cell_type":"code","source":["df_target_warehouses_columns = pd.DataFrame()\n","try:\n","\n"," for index, row in df_target_warehouses.iterrows():\n","\n"," # get the current warehouse\n"," vWarehouseName = row['Warehouse Name']\n","\n"," # define the connection string for the alchemy engine\n"," vConnectionString = f\"Driver={{ODBC Driver 18 for SQL Server}};Server={vTargetSqlEndpoint};Database={vWarehouseName}\"\n","\n"," # create the sql engine\n"," sql_engine = create_sqlalchemy_engine(vConnectionString)\n","\n"," # connect to the engine\n"," with sql_engine.connect() as sql_connection:\n","\n"," # get the definition of the tables\n"," df_target_warehouses_columns_temp = pd.read_sql(vSqlStatement, sql_connection)\n","\n"," # append the rows to the dataframe\n"," df_target_warehouses_columns = pd.concat([df_target_warehouses_columns, df_target_warehouses_columns_temp], ignore_index=True)\n","\n"," # logging\n"," vMessage = 
f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'identify target warehouses tables definition', datetime.now(), None, vMessage, ''] \n","except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'identify target warehouses tables definition', datetime.now(), None, vMessage, str(e) ] \n"," if pDebugMode == \"yes\":\n"," print(str(e))\n"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"fc5c9a5e-7e97-43c3-9fa2-1c5386cb00a4"},{"cell_type":"markdown","source":["**Build the logic for the sql statements**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"05658626-30ac-4f21-8b0c-e1d90d797c77"},{"cell_type":"code","source":["# key columns for the merge and comparison\n","key_columns = ['DATABASE_NAME', 'TABLE_SCHEMA', 'TABLE_NAME', 'COLUMN_NAME']\n","\n","# source and target comparison\n","df_columns_only_in_source, df_columns_with_type_changed = compare_dataframes(df_source_warehouses_columns, df_target_warehouses_columns, key_columns)\n","df_compare_dataframes_unioned = pd.concat([df_columns_only_in_source, df_columns_with_type_changed]).drop_duplicates()\n","\n","if not df_compare_dataframes_unioned.empty:\n"," df_columns_in_common_not_changed = df_source_warehouses_columns.merge(df_compare_dataframes_unioned, on=list(df_source_warehouses_columns.columns), how='left', indicator=True).query('_merge == \"left_only\"').drop('_merge', axis=1)\n"," df_columns_in_common_not_changed['SelectColumnStatement'] = df_columns_in_common_not_changed['COLUMN_NAME']\n","\n","# # select statement generation\n","if not df_columns_only_in_source.empty:\n"," df_columns_only_in_source['SelectColumnStatement'] = \"CAST(NULL AS \" + df_columns_only_in_source['COLUMN_DEFINITION'] + \") AS \" + df_columns_only_in_source['COLUMN_NAME']\n","else:\n"," df_columns_only_in_source = pd.DataFrame()\n","\n","if not df_columns_with_type_changed.empty:\n"," df_columns_with_type_changed['SelectColumnStatement'] = \"CAST([\" + df_columns_with_type_changed['COLUMN_NAME'] + \"] AS \" + df_columns_with_type_changed['COLUMN_DEFINITION'] + \") AS \" + df_columns_with_type_changed['COLUMN_NAME']\n","else:\n"," df_columns_with_type_changed = pd.DataFrame()\n","\n","\n","# sources tables that changed\n","if not df_compare_dataframes_unioned.empty:\n","\n"," # generate a distinct list of tables that changed\n"," df_changed_tables = df_compare_dataframes_unioned[['DATABASE_NAME', 'TABLE_SCHEMA', 'TABLE_NAME']].drop_duplicates()\n"," # build the sql statement to run against the target warehouse\n"," df_sql_statements = pd.concat([df_columns_only_in_source, df_columns_with_type_changed, df_columns_in_common_not_changed]).drop_duplicates().sort_values(by=['DATABASE_NAME', 'TABLE_SCHEMA', 'TABLE_NAME', 'ORDINAL_POSITION' ])\n"," df_sql_statements_grouped = df_sql_statements.groupby(['DATABASE_NAME', 'TABLE_SCHEMA', 'TABLE_NAME'])['SelectColumnStatement'].agg(','.join).reset_index()\n"," df_sql_statements_grouped[\"DropBackupTableStatement\"] = \"DROP TABLE IF EXISTS \" + df_sql_statements_grouped[\"TABLE_SCHEMA\"] + \".\" + df_sql_statements_grouped[\"TABLE_NAME\"] + \"_backup\"\n"," df_sql_statements_grouped[\"CtasStatement\"] = \"CREATE TABLE 
\" + df_sql_statements_grouped[\"TABLE_SCHEMA\"] + \".\" + df_sql_statements_grouped[\"TABLE_NAME\"] + \"_backup AS SELECT \" + df_sql_statements_grouped[\"SelectColumnStatement\"] + \" FROM \" + df_sql_statements_grouped[\"TABLE_SCHEMA\"] + \".\" + df_sql_statements_grouped[\"TABLE_NAME\"]\n"," df_sql_statements_grouped[\"DropTableStatement\"] = \"DROP TABLE IF EXISTS \" + df_sql_statements_grouped[\"TABLE_SCHEMA\"] + \".\" + df_sql_statements_grouped[\"TABLE_NAME\"]\n"," df_sql_statements_grouped[\"RenamingTableStatement\"] = \"EXEC sp_rename '\" + df_sql_statements_grouped[\"TABLE_SCHEMA\"] + \".\" + df_sql_statements_grouped[\"TABLE_NAME\"] + \"_backup', '\" + df_sql_statements_grouped[\"TABLE_NAME\"] + \"';\"\n","\n","else:\n"," df_changed_tables = pd.DataFrame()\n","\n","\n"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"89d40c51-79ba-417e-b2a9-8770ce44f46a"},{"cell_type":"markdown","source":["**Run the sql statements against the target sql endpoint**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"01282a03-e82a-459c-b0d1-9619dfce1906"},{"cell_type":"code","source":["\n","try:\n","\n"," # iterate through the source warehouses\n"," for index, row in df_source_warehouses.iterrows():\n","\n"," # get the current warehouse\n"," vWarehouseName = row['Warehouse Name']\n","\n"," if not df_changed_tables.empty:\n","\n"," # filter the changed tables on the current warehouse\n"," df_changed_tables_in_scope = df_changed_tables[df_changed_tables['DATABASE_NAME']==vWarehouseName]\n","\n"," # if the changed tables df is not empty\n"," if not df_changed_tables_in_scope.empty:\n","\n"," # define the connection string for the alchemy engine\n"," vConnectionString = f\"Driver={{ODBC Driver 18 for SQL Server}};Server={vTargetSqlEndpoint};Database={vWarehouseName}\"\n","\n"," # create the sql engine\n"," sql_engine = create_sqlalchemy_engine(vConnectionString)\n","\n"," # connect to the engine\n"," with sql_engine.connect() as sql_connection:\n","\n"," connection = sql_engine.raw_connection()\n"," cursor = connection.cursor()\n","\n","\n"," # iterate over tables that require an update\n"," for index_table, row_table in df_changed_tables_in_scope.iterrows():\n"," vChangedSchema = row_table['TABLE_SCHEMA']\n"," vChangedTable = row_table['TABLE_NAME']\n","\n"," # filter the sql statements on the current warehouse, schema and table\n"," df_sql_statements_in_scope = df_sql_statements_grouped[(df_sql_statements_grouped['DATABASE_NAME']==vWarehouseName) & (df_sql_statements_grouped['TABLE_SCHEMA']==vChangedSchema) & (df_sql_statements_grouped['TABLE_NAME']==vChangedTable)]\n","\n","\n"," # retrieve each of the sql statement and execute it\n"," vDropBackupTableStatement = df_sql_statements_in_scope.loc[0, 'DropBackupTableStatement'] + ';'\n"," print(f\"running statement: {vDropBackupTableStatement}\")\n"," cursor.execute(vDropBackupTableStatement)\n","\n"," vCtasStatement = df_sql_statements_in_scope.loc[0, 'CtasStatement'] + ';'\n"," print(f\"running statement: {vCtasStatement}\")\n"," cursor.execute(vCtasStatement)\n","\n"," vDropTableStatement = df_sql_statements_in_scope.loc[0, 'DropTableStatement'] + ';'\n"," print(f\"running statement: {vDropTableStatement}\")\n"," cursor.execute(vDropTableStatement)\n","\n"," vRenamingTableStatement = 
df_sql_statements_in_scope.loc[0, 'RenamingTableStatement'] + ';'\n"," print(f\"running statement: {vRenamingTableStatement}\")\n"," cursor.execute(vRenamingTableStatement)\n","\n"," # commit\n"," connection.commit()\n","\n"," else:\n"," vMessage = f\"no change detected in warehouse <{vWarehouseName}>\"\n"," if pDebugMode == \"yes\":\n"," print(vMessage)\n","\n"," else:\n"," vMessage = f\"no change detected in existings warehouses\"\n"," if pDebugMode == \"yes\":\n"," print(vMessage)\n","\n"," # logging\n"," vMessage = f\"succeeded\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'updating target warehouses tables definition', datetime.now(), None, vMessage, ''] \n","\n","except Exception as e:\n"," vMessage = f\"failed\"\n"," dfLogging.loc[len(dfLogging.index)] = [None, vNotebookId, vLogNotebookName, vWorkspaceId, 'updating target warehouses tables definition', datetime.now(), None, vMessage, str(e) ] \n"," if pDebugMode == \"yes\":\n"," print(str(e))\n"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"cc5b6252-376c-42de-a6af-331de022203c"},{"cell_type":"markdown","source":["**Logging**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"5c6cbe48-3cd5-4e62-ae79-50f2106e7857"},{"cell_type":"code","source":["try:\n"," # perform the conversion of columns\n"," dfLogging = dfLogging.astype({\n"," \"LoadId\": \"string\",\t\n"," \"NotebookId\": \"string\", \t\n"," \"NotebookName\": \"string\", \n"," \"WorkspaceId\": \"string\", \n"," \"CellId\": \"string\", \n"," \"Timestamp\": \"datetime64[ns]\", \n"," \"ElapsedTime\": \"string\", \n"," \"Message\": \"string\", \n"," \"ErrorMessage\" : \"string\"\n"," })\n","\n"," # save panda dataframe to a spark dataframe \n"," sparkDF_Logging = spark.createDataFrame(dfLogging) \n","\n"," # save to the lakehouse\n"," sparkDF_Logging.write.mode(\"append\").format(\"delta\").option(\"mergeSchema\", \"true\").saveAsTable(\"staging.notebook_logging_cicd\")\n","\n","except Exception as e:\n"," vMessage = \"saving logs to the lakehouse failed\"\n"," if pDebugMode == \"yes\":\n"," print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"b16dcf7c-22b8-4151-aeac-d7423a504fc3"}],"metadata":{"language_info":{"name":"python"},"kernelspec":{"name":"synapse_pyspark","language":"Python","display_name":"Synapse PySpark"},"kernel_info":{"name":"synapse_pyspark"},"microsoft":{"language":"python","language_group":"synapse_pyspark","ms_spell_check":{"ms_spell_check_language":"en"}},"nteract":{"version":"nteract-front-end@1.0.0"},"synapse_widget":{"version":"0.1","state":{}},"spark_compute":{"compute_id":"/trident/default","session_options":{"conf":{"spark.synapse.nbs.session.timeout":"1200000"}}},"dependencies":{"lakehouse":{},"environment":{}}},"nbformat":4,"nbformat_minor":5}
\ No newline at end of file
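`nb_cicd_pre_update_warehouses.ipynb` reconciles warehouse schemas before deployment: it reads `INFORMATION_SCHEMA` from both workspaces over their SQL analytics endpoints, detects added or retyped columns, and rebuilds each changed table in the target with a DROP / CTAS / `sp_rename` sequence. All SQL connections authenticate with an Entra access token passed to pyodbc via `SQL_COPT_SS_ACCESS_TOKEN`. Below is a compact sketch of that token-authenticated engine, assuming ODBC Driver 18 is installed; it packs the token as length-prefixed UTF-16-LE, an equivalent formulation of the byte-interleaving loop used in the notebook.

```python
import struct

import pyodbc
import sqlalchemy


def make_engine(sql_endpoint: str, database: str, access_token: str) -> sqlalchemy.engine.Engine:
    """Token-authenticated SQLAlchemy engine for a Fabric SQL endpoint (sketch)."""
    SQL_COPT_SS_ACCESS_TOKEN = 1256  # ODBC connection attribute for an Entra access token
    # The ODBC driver expects the token as length-prefixed UTF-16-LE bytes.
    token_bytes = access_token.encode("utf-16-le")
    token_struct = struct.pack(f"<i{len(token_bytes)}s", len(token_bytes), token_bytes)
    connection_string = (
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server={sql_endpoint};Database={database};"
    )
    return sqlalchemy.create_engine(
        "mssql+pyodbc://",
        creator=lambda: pyodbc.connect(
            connection_string, attrs_before={SQL_COPT_SS_ACCESS_TOKEN: token_struct}
        ),
    )
```

With such an engine, `pd.read_sql(vSqlStatement, engine.connect())` returns the per-column inventory that `compare_dataframes` diffs between the source and target warehouses.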
diff --git a/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_extract_lakehouse_access.ipynb b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_extract_lakehouse_access.ipynb
new file mode 100644
index 0000000..6b04155
--- /dev/null
+++ b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_extract_lakehouse_access.ipynb
@@ -0,0 +1 @@
+{"cells":[{"cell_type":"markdown","source":["**Helper notebook**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"e2928a34-d72f-4a04-b749-bff740706576"},{"cell_type":"code","source":["%run nb_helper"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"5209822b-78bc-4987-95e8-278ac02c706a"},{"cell_type":"markdown","source":["**Parameters**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"63228e99-f488-4a1b-812f-c44fa6fb0d6d"},{"cell_type":"code","source":["pSourceWorkspaceId = \"\"\n","pProjectPath = \"\" # path to the extraction in your cicd lakehouse"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"6a5402ee-853b-4658-bfd9-347bfc206d19"},{"cell_type":"markdown","source":["**Access token and base url**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"17d61c3b-873d-47f4-bfff-d19c8e307937"},{"cell_type":"code","source":["vScope = \"https://analysis.windows.net/powerbi/api\"\n","vAccessToken = mssparkutils.credentials.getToken(vScope)\n","vBaseUrl = f\"https://api.fabric.microsoft.com/v1/\"\n","vHeaders = {'Authorization': f'Bearer {vAccessToken}'}"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"0de08264-f382-48d5-9631-7f51be6f370d"},{"cell_type":"markdown","source":["**Resolve source workspace ids**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"07c1a29c-e46e-45d2-8fa3-1248df325b74"},{"cell_type":"code","source":["vSourceWorkspaceName = fabric.resolve_workspace_name(pSourceWorkspaceId)"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"1e2132ab-a7c9-40e3-b3ac-8a2dcce3e1b9"},{"cell_type":"markdown","source":["**List source lakehouses**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"a4ea3cd0-1c3b-4bae-baef-c973a7a83be0"},{"cell_type":"code","source":["df_source_lakehouses = labs.list_lakehouses(workspace=vSourceWorkspaceName)"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"33ebb478-b42c-4dbe-a6e9-fe341b2b1c96"},{"cell_type":"markdown","source":["**Identify current access roles**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"4183d5b8-042e-4f70-a7ac-f5005abe07a4"},{"cell_type":"code","source":["try:\n","\n"," df_onelake_access = pd.DataFrame()\n","\n"," # get a token for storage\n"," vOnelakeHeaders = {\"authorization\": f\"bearer {mssparkutils.credentials.getToken('storage')}\"}\n","\n"," # iterate over the lakehouses\n"," for index, row in df_source_lakehouses.iterrows():\n","\n"," # set the lakehouse name and id\n"," vLakehouseName = row['Lakehouse Name']\n"," vLakehouseSourceId = row['Lakehouse ID']\n"," vLakehouseTargetId = labs.resolve_lakehouse_id(lakehouse=vLakehouseName, workspace=vSourceWorkspaceName)\n","\n","\n"," # 3. 
extract onelake access\n"," vExtractionType = \"onelake_access\"\n"," vShortcutUrl = f\"workspaces/{pSourceWorkspaceId}/items/{vLakehouseSourceId}/dataAccessRoles\"\n"," vUrl = vBaseUrl + vShortcutUrl\n"," print(f\"extracting onelake access for lakehouse {vLakehouseName}\") \n","\n"," # create the api global dataframe for shortcuts\n"," api_call_global_dataframe = pd.DataFrame()\n","\n"," try:\n"," \n"," # make the api call\n"," api_call_main(vUrl, vHeaders, 'yes', vExtractionType)\n","\n"," api_call_global_dataframe['lakehouse'] = vLakehouseName\n","\n","\n"," # concat to the correspondant dataframe\n"," df_onelake_access = pd.concat([df_onelake_access, api_call_global_dataframe], ignore_index=True)\n","\n"," # logging\n"," vMessage = f\"extracting onelake access for lakehouse {vLakehouseName} succeeded\"\n"," print(vMessage)\n","\n"," except Exception as e:\n"," print(str(e))\n","\n","except Exception as e:\n"," print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"464c174b-6ea9-4820-8cb8-003be4e252de"},{"cell_type":"markdown","source":["**Extract rules, entra members and item members**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"25ffb360-cc72-41ed-b5d5-37c76a52a64f"},{"cell_type":"code","source":["# 2. prepare the inputs for the creation\n","df_role_rules = flatten_nested_json_df(df_onelake_access[['id', 'decisionRules']].explode('decisionRules').dropna())\n","condition_1 = (df_role_rules[\"decisionRules.permission.attributeName\"] == \"Action\") & (df_role_rules[\"decisionRules.permission.attributeValueIncludedIn\"] != \"Read\")\n","df_role_rules_1 = df_role_rules[~condition_1]\n","condition_2 = (df_role_rules_1[\"decisionRules.permission.attributeName\"] == \"Path\") & (df_role_rules_1[\"decisionRules.permission.attributeValueIncludedIn\"] == \"Read\")\n","df_role_rules_2 = df_role_rules_1[~condition_2]\n","df_role_rules = df_role_rules_2\n","df_entra_members = flatten_nested_json_df(df_onelake_access[['id', 'members.microsoftEntraMembers']].explode('members.microsoftEntraMembers').dropna())\n","df_item_members = flatten_nested_json_df(df_onelake_access[['id', 'members.fabricItemMembers']].explode('members.fabricItemMembers').dropna())"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"3b381651-72d2-4966-8a94-546b494d977f"},{"cell_type":"markdown","source":["**Generate the onelake_roles.json file**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"0409e1c6-ece4-4282-a5c5-722df8270f1a"},{"cell_type":"code","source":["df_onelake_roles = df_onelake_access.reset_index()[[\"index\", \"name\", \"id\", \"lakehouse\"]]\n","onelake_roles_json = df_onelake_roles.apply(lambda row: {col: row[col] for col in df_onelake_roles.columns if not pd.isna(row[col])}, axis=1).tolist()\n","\n","fileName = \"onelake_roles.json\"\n","filePath = f'Files/{pProjectPath}/{fileName}'\n","\n","with open(f'/lakehouse/default/{filePath}','w') as f:\n"," json.dump(onelake_roles_json,f)"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"591a4f92-c7bc-46e1-ad7a-b190d4b486ac"},{"cell_type":"markdown","source":["**Generate 
the onelake_rules.json file**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"3d4a8743-b094-4585-ad08-bf007a85453e"},{"cell_type":"code","source":["onelake_rules_json = df_role_rules.apply(lambda row: {col: row[col] for col in df_role_rules.columns if not pd.isna(row[col])}, axis=1).tolist()\n","\n","fileName = \"onelake_rules.json\"\n","filePath = f'Files/{pProjectPath}/{fileName}'\n","\n","with open(f'/lakehouse/default/{filePath}','w') as f:\n"," json.dump(onelake_rules_json,f)"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"2cc41dad-22b8-4f20-bf87-59f006c86ddf"},{"cell_type":"markdown","source":["**Generate the onelake_entra_members.json file**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"3c9e0e51-2cf1-4a4c-872c-9af81c4099ed"},{"cell_type":"code","source":["onelake_entra_members_json = df_entra_members.apply(lambda row: {col: row[col] for col in df_entra_members.columns if not pd.isna(row[col])}, axis=1).tolist()\n","\n","fileName = \"onelake_entra_members.json\"\n","filePath = f'Files/{pProjectPath}/{fileName}'\n","\n","with open(f'/lakehouse/default/{filePath}','w') as f:\n"," json.dump(onelake_entra_members_json,f)"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"a28e6b9f-94bd-404f-ba7b-725f112afc17"},{"cell_type":"markdown","source":["**Generate the onelake_item_members.json file**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"521f6ed6-5c7c-43c5-8171-fbe6eea76dd6"},{"cell_type":"code","source":["onelake_item_members_json = df_item_members.apply(lambda row: {col: row[col] for col in df_item_members.columns if not pd.isna(row[col])}, axis=1).tolist()\n","\n","fileName = \"onelake_item_members.json\"\n","filePath = f'Files/{pProjectPath}/{fileName}'\n","\n","with open(f'/lakehouse/default/{filePath}','w') as f:\n"," json.dump(onelake_item_members_json,f)"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"fc871385-5f0a-4590-b6da-48b2b7eddb44"},{"cell_type":"code","source":["pOnelakeRoles = json.dumps(onelake_roles_json, separators=(\",\", \":\"))\n","pOnelakeRules = json.dumps(onelake_rules_json, separators=(\",\", \":\"))\n","pOnelakeEntraMembers = json.dumps(onelake_entra_members_json, separators=(\",\", \":\"))\n","pOnelakeItemMembers = json.dumps(onelake_item_members_json, separators=(\",\", \":\"))"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"c442e39c-f123-4e0c-8e81-a6f0ff4c8ab8"},{"cell_type":"code","source":["# print(pOnelakeRoles)\n","# print(pOnelakeRules)\n","# print(pOnelakeEntraMembers)\n","# print(pOnelakeItemMembers)"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"06f266d8-7eba-466a-bc8c-7a52e1bb555b"}],"metadata":{"kernel_info":{"name":"synapse_pyspark"},"kernelspec":{"name":"synapse_pyspark","language":"Python","display_name":"Synapse 
PySpark"},"language_info":{"name":"python"},"microsoft":{"language":"python","language_group":"synapse_pyspark","ms_spell_check":{"ms_spell_check_language":"en"}},"nteract":{"version":"nteract-front-end@1.0.0"},"spark_compute":{"compute_id":"/trident/default","session_options":{"conf":{"spark.synapse.nbs.session.timeout":"1200000"}}},"dependencies":{"lakehouse":{}}},"nbformat":4,"nbformat_minor":5}
\ No newline at end of file
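`nb_extract_lakehouse_access.ipynb` snapshots the OneLake data access roles of every lakehouse in the source workspace and persists them as JSON files (`onelake_roles.json`, `onelake_rules.json`, and so on) in the CICD lakehouse, so a post-deployment step can recreate them in the target. The sketch below shows the underlying REST call in isolation, using the same `dataAccessRoles` endpoint as the notebook; the ids, token, and output file name are hypothetical placeholders.

```python
import json

import requests

# Hypothetical placeholders -- substitute real values.
workspace_id = "<source-workspace-guid>"
lakehouse_id = "<lakehouse-guid>"
access_token = "<fabric-api-token>"

url = (
    "https://api.fabric.microsoft.com/v1/"
    f"workspaces/{workspace_id}/items/{lakehouse_id}/dataAccessRoles"
)
response = requests.get(url, headers={"Authorization": f"Bearer {access_token}"})
response.raise_for_status()

# Each role in the response carries its decision rules plus Entra and
# Fabric-item members; the notebook above flattens these into separate files.
roles = response.json().get("value", [])
with open("onelake_roles_raw.json", "w") as f:  # hypothetical local file name
    json.dump(roles, f, indent=2)
```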
diff --git a/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_helper.ipynb b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_helper.ipynb
new file mode 100644
index 0000000..c048944
--- /dev/null
+++ b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_helper.ipynb
@@ -0,0 +1 @@
+{"cells":[{"cell_type":"markdown","source":["**Install semantic link labs**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"29733b6f-2b10-49e8-acec-6984fcb3e5e4"},{"cell_type":"code","source":["!pip install semantic-link-labs --quiet"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"5acd6578-510f-41ea-9f34-40093d5680d6"},{"cell_type":"markdown","source":["**Libraries**"],"metadata":{},"id":"6592e13a-2323-4d11-b107-c9f519194f46"},{"cell_type":"code","source":["import base64\n","import datetime as dt\n","import json\n","import os\n","import re\n","import struct\n","import time\n","from datetime import datetime, timedelta\n","from string import Template\n","from timeit import default_timer as timer\n","from typing import List, Optional, Tuple\n","\n","import numpy as np\n","import pandas as pd\n","import pyodbc\n","import requests\n","import sqlalchemy\n","from IPython.display import Markdown, display\n","from notebookutils import mssparkutils\n","from pyspark.sql import DataFrame\n","from pyspark.sql.functions import col, current_timestamp, lit\n","\n","import com.microsoft.spark.fabric\n","from com.microsoft.spark.fabric.Constants import Constants\n","\n","import sempy.fabric as fabric\n","import sempy_labs as labs\n","import sempy_labs._icons as icons\n","from sempy import fabric\n","from sempy.fabric.exceptions import FabricHTTPException, WorkspaceNotFoundException\n","from sempy_labs._helper_functions import _decode_b64, lro, resolve_workspace_name_and_id\n"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"4ec5ed38-9f17-43ef-a8a4-4deb1aaf853f"},{"cell_type":"markdown","source":["import sempy.fabric as fabric\n","import sempy_labs as labs\n","from sempy.fabric.exceptions import FabricHTTPException, WorkspaceNotFoundException\n","import json\n","import requests\n","import pandas as pd\n","import os\n","import datetime as dt\n","import time\n","from timeit import default_timer as timer\n","from datetime import datetime, timedelta\n","from string import Template\n","import base64\n","import re\n","import struct\n","import sqlalchemy\n","import pyodbc\n","from notebookutils import mssparkutils\n","import numpy as np\n","import sempy_labs as labs\n","from sempy import fabric\n","import com.microsoft.spark.fabric\n","from com.microsoft.spark.fabric.Constants import Constants\n","from IPython.display import display, Markdown\n","from pyspark.sql import DataFrame\n","from pyspark.sql.functions import col,current_timestamp,lit\n","from typing import Optional, Tuple, List\n","from sempy_labs._helper_functions import (resolve_workspace_name_and_id, lro, _decode_b64)\n","import sempy_labs._icons as icons"],"metadata":{"nteract":{"transient":{"deleting":false}}},"id":"72cd894e-204f-4edf-b44c-55c422f24f72"},{"cell_type":"markdown","source":["**Fabric client**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"58ee6402-63b1-4b0b-a0db-1468dd8ab9d1"},{"cell_type":"code","source":["client = 
fabric.FabricRestClient()"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"6d3bdc3d-ae65-437b-9da7-c6968fce0d74"},{"cell_type":"markdown","source":["**Function to flatten a nested json**\n","- Takes a pandas dataframe\n","- Search for columns of type list\n","- flatten the list"],"metadata":{},"id":"cf1c96a9-36bb-433c-bdae-e74a751c903a"},{"cell_type":"code","source":["def flatten_nested_json_df(df):\n","\n"," df = df.reset_index()\n","\n"," # search for columns to explode/flatten\n"," s = (df.applymap(type) == list).all()\n"," list_columns = s[s].index.tolist()\n","\n"," s = (df.applymap(type) == dict).all()\n"," dict_columns = s[s].index.tolist()\n","\n"," while len(list_columns) > 0 or len(dict_columns) > 0:\n"," new_columns = []\n","\n"," for col in dict_columns:\n","\n"," horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')\n"," horiz_exploded.index = df.index\n"," df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])\n"," new_columns.extend(horiz_exploded.columns) # inplace\n","\n"," for col in list_columns:\n","\n"," # explode lists vertically, adding new columns\n"," df = df.drop(columns=[col]).join(df[col].explode().to_frame())\n"," new_columns.append(col)\n","\n"," # check if there are still dict o list fields to flatten\n"," s = (df[new_columns].applymap(type) == list).all()\n"," list_columns = s[s].index.tolist()\n","\n"," s = (df[new_columns].applymap(type) == dict).all()\n"," dict_columns = s[s].index.tolist()\n","\n"," return df\n"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"fd93c03f-f80f-4ef2-926b-dd87c937d60d"},{"cell_type":"markdown","source":["**Function to upper case the first letter of a string**"],"metadata":{},"id":"d651a897-4d09-42ed-acb7-6f08f6229659"},{"cell_type":"code","source":["def convert_into_uppercase(string_val):\n"," return string_val.group(1) + string_val.group(2).upper()\n"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"49d6b1dd-edb0-4420-bd77-895292a3c8ca"},{"cell_type":"markdown","source":["**Function to split a string, take the last 2 items and capitalize their first character**\n","\n","Example: \n","input -> 'datamarts.users.datamartUserAccessRight'\n","output -> 'UsersDatamartUserAccessRight'\n","\n","or \n","input -> 'datamarts.users'\n","output -> 'DatamartsUsers'"],"metadata":{},"id":"7b8934ac-bdd6-4a07-8c03-de8fef552b52"},{"cell_type":"code","source":["def process_column_name(column_name, separator):\n"," list_values = column_name.split(separator)\n","\n"," len_list = len(list_values)\n","\n"," # iterate over the list\n"," for i in range(len(list_values)):\n","\n"," # current value \n"," current_value = list_values[i]\n","\n"," # upper case the first letter \n"," upper_case_value = re.sub(\"(^|\\s)(\\S)\", convert_into_uppercase, current_value) \n","\n"," # replace the column name in the dataframe\n"," list_values[i] = upper_case_value\n","\n"," list_values_joined = ''.join(list_values)\n"," return list_values_joined"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"7b159545-5345-479c-8471-fb6c5e5aa14a"},{"cell_type":"markdown","source":["**Time 
rounder**"],"metadata":{},"id":"390e0764-d917-4bf2-be3c-8c8c17d2ec4c"},{"cell_type":"code","source":["# function to round to the nearest 15min\n","def fnRoundMinDatetime(dt, delta):\n"," return datetime.min + round((dt - datetime.min) / delta) * delta\n","\n","def fnRoundHourDatetime(dt):\n"," # Rounds to nearest hour by adding a timedelta hour if minute >= 30\n"," return (dt.replace(second=0, microsecond=0, minute=0, hour=dt.hour)\n"," +timedelta(hours=dt.minute//30))"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"feaef5dc-0e8d-4ecd-aa8d-51ef36519fcd"},{"cell_type":"markdown","source":["**Function to add a user to a fabric workspace**"],"metadata":{},"id":"e25aff1d-54ed-4a84-b92c-e6b14700da9a"},{"cell_type":"code","source":["def add_user_to_fabric_workspace(baseUrl, workspaceId, userUPN, accessToken, waitTime):\n","\n"," vWorkspaceId = workspaceId\n"," vBaseUrl = baseUrl\n"," vUserUPN = userUPN\n"," vAccessToken = accessToken\n"," vWaitTime = waitTime\n","\n"," # log activity\n"," vMessage = f\"adding user <{vUserUPN}> as admin to workspace <{vWorkspaceId}>\"\n"," print(vMessage)\n","\n"," # inputs for post request\n"," vHeader = {'Content-Type':'application/json','Authorization': f'Bearer {vAccessToken}'} \n"," vJsonBody = {\n"," \"groupUserAccessRight\": \"Admin\",\n"," \"emailAddress\": vUserUPN\n"," }\n"," vAssignUrl = \"admin/groups/\" + vWorkspaceId + \"/users\"\n","\n","\n"," try:\n"," # post the assignment\n"," assignment_response = requests.post(vBaseUrl + vAssignUrl, headers=vHeader, json=vJsonBody)\n","\n"," # raise an error for bad status codes\n"," assignment_response.raise_for_status() \n","\n"," # get the status code and reason\n"," status_code = assignment_response.status_code\n"," status = assignment_response.reason\n","\n"," # check status\n"," if status_code == 200: \n","\n"," vMessage = f\"assigning user <{vUserUPN}> to workspace <{vWorkspaceId}> succeeded.\"\n"," print(f\"{vMessage}\")\n"," status = \"succeeded\"\n"," print(f\"sleeping {vWaitTime} seconds\")\n"," time.sleep(vWaitTime) # to avoid hitting the limit of the api\n","\n"," except requests.exceptions.HTTPError as errh: \n"," error_message = errh.args[0]\n"," vMessage = f\"assigning user <{vUserUPN}> to workspace <{vWorkspaceId}> failed. HTTP Error; error: <{error_message}>\"\n"," print(f\"{vMessage}\")\n"," status = \"failed\"\n","\n"," except requests.exceptions.ReadTimeout as errrt: \n"," vMessage = f\"assigning user <{vUserUPN}> to workspace <{vWorkspaceId}> failed. Time out; error: <{errrt}>\"\n"," print(f\"{vMessage}\")\n"," status = \"failed\"\n","\n"," except requests.exceptions.ConnectionError as conerr: \n"," vMessage = f\"assigning user <{vUserUPN}> to workspace <{vWorkspaceId}> failed. Connection error; error: <{conerr}>\"\n"," print(f\"{vMessage}\")\n"," status = \"failed\"\n","\n"," except requests.exceptions.RequestException as errex: \n"," vMessage = f\"assigning user <{vUserUPN}> to workspace <{vWorkspaceId}> failed. 
Exception request; error: <{errex}>\"\n"," print(f\"{vMessage}\")\n"," status = \"failed\"\n","\n","\n"," # return the status\n"," return status\n","\n","\n"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"1685dbe1-8f48-4bd4-a509-3b1c05c49940"},{"cell_type":"markdown","source":["**Function to remove a user from a fabric workspace**"],"metadata":{},"id":"6be8a1dd-7b17-4d4d-bda7-9ca03a97fc14"},{"cell_type":"code","source":["def remove_user_from_fabric_workspace(baseUrl, workspaceId, userUPN, accessToken, waitTime):\n","\n"," vWorkspaceId = workspaceId\n"," vBaseUrl = baseUrl\n"," vUserUPN = userUPN\n"," vAccessToken = accessToken\n"," vWaitTime = waitTime\n","\n"," # log activity\n"," vMessage = f\"deleting user <{vUserUPN}> from workspace <{vWorkspaceId}>\"\n"," print(vMessage)\n","\n","\n"," # inputs for post request\n"," vHeader = {'Content-Type':'application/json','Authorization': f'Bearer {vAccessToken}'} \n"," vDeleteUrl = \"admin/groups/\" + vWorkspaceId + \"/users/\" + vUserUPN\n","\n"," try:\n"," # post the assignment\n"," assignment_response = requests.delete(vBaseUrl + vDeleteUrl, headers=vHeader)\n","\n"," # raise an error for bad status codes\n"," assignment_response.raise_for_status() \n","\n"," # get the status code and reason\n"," status_code = assignment_response.status_code\n"," status = assignment_response.reason\n","\n"," # check status\n"," if status_code == 200: \n","\n"," vMessage = f\"deleting user <{vUserUPN}> from workspace <{vWorkspaceId}> succeeded.\"\n"," print(f\"{vMessage}\")\n"," status = \"succeeded\"\n"," print(f\"sleeping {vWaitTime} seconds\")\n"," time.sleep(vWaitTime) # to avoid hitting the limit of the api\n","\n","\n"," except requests.exceptions.HTTPError as errh: \n"," error_message = errh.args[0]\n"," vMessage = f\"deleting user <{vUserUPN}> from workspace <{vWorkspaceId}> failed. HTTP Error; error: <{error_message}>\"\n"," print(f\"{vMessage}\")\n"," status = \"failed\"\n","\n"," except requests.exceptions.ReadTimeout as errrt: \n"," vMessage = f\"deleting user <{vUserUPN}> from workspace <{vWorkspaceId}> failed. Time out; error: <{errrt}>\"\n"," print(f\"{vMessage}\")\n"," status = \"failed\"\n","\n"," except requests.exceptions.ConnectionError as conerr: \n"," vMessage = f\"deleting user <{vUserUPN}> from workspace <{vWorkspaceId}> failed. Connection error; error: <{conerr}>\"\n"," print(f\"{vMessage}\")\n"," status = \"failed\"\n","\n"," except requests.exceptions.RequestException as errex: \n"," vMessage = f\"deleting user <{vUserUPN}> from workspace <{vWorkspaceId}> failed. 
Exception request; error: <{errex}>\"\n"," print(f\"{vMessage}\")\n"," status = \"failed\"\n","\n","\n"," # return the status\n"," return status"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"3295de51-8b19-4eca-82af-87e326a748a7"},{"cell_type":"markdown","source":["**Function to call a fabric api and return the correspondant dataframe**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"f3788c30-4adf-4f93-8b9b-0ec73355b1f9"},{"cell_type":"code","source":["def perform_get_request(url, headers, debug_mode, extraction_type):\n","\n"," # the fabric client fails when making api calls to the onelake storage\n"," if extraction_type==\"file_system\":\n"," try:\n"," response = requests.get(url, headers = headers)\n"," return response\n"," except Exception as e:\n"," print(\"failed to call the api. exception:\", str(e))\n"," return None\n"," else:\n"," try:\n"," response = client.get(url, headers)\n","\n"," if response.status_code != 200:\n"," raise FabricHTTPException(response)\n"," else:\n"," return response\n","\n"," except FabricHTTPException as e:\n"," if debug_mode == \"yes\":\n"," print(\"failed to call the fabric api. exception:\", str(e))\n"," return None\n","\n","\n","def handle_response(response, debug_mode):\n"," if response is None:\n"," if debug_mode == \"yes\":\n"," print(\"response is None\")\n"," return None\n"," elif not response.text.strip():\n"," if debug_mode == \"yes\":\n"," print(\"response is empty\")\n"," return None\n"," else:\n"," try:\n"," # convert response to JSON\n"," response_data = response.json()\n"," response_content = json.loads(response.content)\n"," continuation_token = \"\" #response_data.get('continuationToken', None)\n"," continuation_uri = \"\" #response_data.get('continuationUri', None)\n"," return response_content, continuation_token, continuation_uri\n"," except ValueError:\n"," if debug_mode == \"yes\":\n"," print(\"failed to parse response as json\")\n"," return None\n","\n","\n","def json_to_dataframe(response_content, debug_mode, extraction_type):\n"," try:\n"," match extraction_type: \n"," case \"audit_logs\":\n"," result_dataframe = pd.concat([pd.json_normalize(x) for x in response_content['activityEventEntities']])\n"," return result_dataframe\n"," case \"domains\":\n"," result_dataframe = pd.concat([pd.json_normalize(x) for x in response_content['domains']])\n"," return result_dataframe\n"," case \"external_data_shares\":\n"," result_dataframe = pd.concat([pd.json_normalize(x) for x in response_content['value']])\n"," return result_dataframe\n"," case \"tenant_settings\":\n"," result_dataframe = pd.concat([pd.json_normalize(x) for x in response_content['tenantSettings']])\n"," return result_dataframe\n"," case \"capacities\":\n"," result_dataframe = pd.concat([pd.json_normalize(x) for x in response_content['value']])\n"," return result_dataframe\n"," case \"connections\":\n"," result_dataframe = pd.concat([pd.json_normalize(x) for x in response_content['value']])\n"," return result_dataframe\n"," case \"deployment_pipelines\":\n"," result_dataframe = pd.concat([pd.json_normalize(x) for x in response_content['value']])\n"," return result_dataframe\n"," case \"gateways\":\n"," result_dataframe = pd.concat([pd.json_normalize(x) for x in response_content['value']])\n"," return result_dataframe\n"," case \"shortcuts\":\n"," result_dataframe = pd.concat([pd.json_normalize(x) for x in 
response_content['value']])\n"," return result_dataframe\n"," case \"file_system\":\n"," result_dataframe = pd.concat([pd.json_normalize(x) for x in response_content['paths']])\n"," return result_dataframe\n"," case \"onelake_access\":\n"," result_dataframe = pd.concat([pd.json_normalize(x) for x in response_content['value']])\n"," return result_dataframe\n"," # case \"relations\":\n"," # result_dataframe = pd.concat([pd.json_normalize(x) for x in response_content['relations']])\n"," # return result_dataframe \n","\n"," except Exception as e:\n"," if debug_mode == \"yes\":\n"," print(f\"failed to generate the required dataframe. exception: {str(e)}\") \n"," \n","\n","def append_to_global_df(result_dataframe):\n"," global api_call_global_dataframe\n"," if not result_dataframe.empty:\n"," api_call_global_dataframe = pd.concat([api_call_global_dataframe, result_dataframe], ignore_index=True)\n","\n","def api_call_main(url, headers, debug_mode, extraction_type):\n","\n"," # set boolean vaule to continue to the next interval (in case response has a paging url)\n"," continue_to_next_interval = True\n","\n"," # while loop the boolean is true\n"," while continue_to_next_interval:\n","\n"," # perform the GET request\n"," response = perform_get_request(url, headers, debug_mode, extraction_type)\n"," # print(json.loads(response.text))\n","\n"," # # handle the response\n"," response_content, continuation_token, continuation_uri = handle_response(response, debug_mode)\n"," # print(response_content, continuation_token, continuation_uri)\n","\n"," # convert to a dataframe\n"," result_dataframe = json_to_dataframe(response_content, debug_mode, extraction_type)\n","\n"," # append to the global dataframe\n"," append_to_global_df(result_dataframe)\n","\n"," # while there is a continuation token, request the next continuation url\n"," # continuation_count = 0\n"," while continuation_token:\n"," # continuation_count +=1\n"," # print(f\"continuation {continuation_count}\")\n"," response = perform_get_request(continuation_uri, headers, debug_mode, extraction_type) \n"," response_content, continuation_token, continuation_uri = handle_response(response, debug_mode)\n"," result_dataframe = json_to_dataframe(response_content, debug_mode, extraction_type)\n"," append_to_global_df(result_dataframe)\n","\n"," # if no error exit the while loop\n"," continue_to_next_interval = False"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"a3668c60-dbb5-4e17-9925-af47e6c4a185"},{"cell_type":"markdown","source":["**Function to create a fabric item**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"33d587ba-36a1-44a0-a0e7-8bd335e0c150"},{"cell_type":"code","source":["def create_or_update_fabric_item(url, headers, body, call_type, operation, workspace_id, item_name, item_type, sleep_in_seconds, debug_mode):\n","\n"," vMessage = f\"{operation} {item_type} <{item_name}> in workspace <{workspace_id}>\"\n"," print(vMessage)\n"," \n"," if call_type == \"post\":\n","\n"," # # json body\n"," # vJsonBody = {\n"," # \"displayName\": f\"{item_name}\",\n"," # \"type\": f\"{item_type}\",\n"," # \"description\": f\"{item_type} {item_name} created by fabric notebook\"\n"," # }\n","\n"," try:\n"," # post the assignment\n"," if body is None:\n"," response = client.post(url, 
headers=headers)\n"," else:\n"," response = client.post(url, headers=headers, json=body)\n","\n"," if response.status_code not in (200, 201, 202):\n"," raise FabricHTTPException(response)\n"," else:\n","\n"," # check status\n"," if response.status_code == 201: # if status is 201 then the create item succeeded\n","\n"," vMessage = f\"{operation} {item_type} <{item_name}> in workspace <{workspace_id}> succeeded\"\n"," print(f\"{vMessage}\")\n","\n"," elif response.status_code == 202: # if status is 202 then the create item is in progress\n"," \n"," vMessage = f\"{operation} {item_type} <{item_name}> in workspace <{workspace_id}> is in progress\"\n"," print(vMessage)\n","\n"," # get the operation url from the header location\n"," # doc https://learn.microsoft.com/en-us/rest/api/fabric/articles/long-running-operation\n"," operation_url = response.headers.get(\"Location\")\n","\n"," # vMessage = f\"operation url: <{operation_url}>\"\n","\n"," # monitor the operation\n"," while True:\n","\n"," # sleep the specified time --> this wait time might need adjustment\n"," time.sleep(sleep_in_seconds) \n","\n"," # check the operation\n"," operation_response = client.get(operation_url, headers=headers) \n","\n"," if operation_response.status_code == 200:\n","\n"," vMessage = f\"{operation} {item_type} <{item_name}> in workspace <{workspace_id}> succeeded\"\n"," print(f\"{vMessage}\")\n"," break\n","\n"," else:\n"," vMessage = f\"{operation} {item_type} <{item_name}> in workspace <{workspace_id}> failed\"\n"," print(f\"{vMessage}\")\n"," break\n","\n"," else: # any other status is a failure\n"," vMessage = f\"{operation} {item_type} <{item_name}> in workspace <{workspace_id}> failed\"\n"," print(f\"{vMessage}\")\n","\n"," # retry: \n"," vMessage = f\"second attempt - {operation} {item_type} <{item_name}> in workspace <{workspace_id}>\"\n"," print(f\"{vMessage}\")\n"," create_item(url, headers, body, operation, workspace_id, item_name, item_type, sleep_in_seconds, debug_mode)\n","\n"," except FabricHTTPException as e:\n"," print(\"failed to call the fabric api. exception:\", str(e))\n"," return None\n"," else:\n"," try:\n"," response = requests.put(url, headers=headers, json=body)\n"," print(response.text)\n"," except Exception as e:\n"," print(\"failed to call the fabric api. 
exception:\", str(e))\n"," return None\n","\n"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"1f8beb4f-08be-4814-93ec-1b848593665c"},{"cell_type":"markdown","source":["**Function to recursively replace placeholders in a json object**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"8ba72b88-633e-4fdf-9557-312985e565c3"},{"cell_type":"code","source":["def replace_placeholders_in_json(obj, inputs_for_json):\n"," if isinstance(obj, dict):\n"," return {k: replace_placeholders_in_json(v, inputs_for_json) for k, v in obj.items()}\n"," elif isinstance(obj, list):\n"," return [replace_placeholders_in_json(item, inputs_for_json) for item in obj]\n"," elif isinstance(obj, str):\n"," for key, value in inputs_for_json.items():\n"," obj = obj.replace(f\"{{{key}}}\", str(value))\n"," return obj\n"," else:\n"," return obj"],"outputs":[],"execution_count":null,"metadata":{"jupyter":{"source_hidden":false,"outputs_hidden":false},"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"0f07bc69-1bdc-4df9-99c6-3f6cfd407a61"}],"metadata":{"language_info":{"name":"python"},"kernel_info":{"name":"synapse_pyspark"},"a365ComputeOptions":null,"sessionKeepAliveTimeout":0,"microsoft":{"language":"python","language_group":"synapse_pyspark","ms_spell_check":{"ms_spell_check_language":"en"}},"widgets":{},"nteract":{"version":"nteract-front-end@1.0.0"},"spark_compute":{"compute_id":"/trident/default","session_options":{"conf":{"spark.synapse.nbs.session.timeout":"1200000"}}},"dependencies":{"lakehouse":{}}},"nbformat":4,"nbformat_minor":5}
\ No newline at end of file
diff --git a/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_prepare_cicd_workspace.ipynb b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_prepare_cicd_workspace.ipynb
new file mode 100644
index 0000000..0e7f4ea
--- /dev/null
+++ b/accelerators/CICD/Git-base-deployments/cicd-workspace/nb_prepare_cicd_workspace.ipynb
@@ -0,0 +1 @@
+{"cells":[{"cell_type":"markdown","source":["**This notebook will do the following:**\n","- Create the required folders in the File section\n","- Bind the cicd notebooks to the cicd lakehouse"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"fe0183e9-77ea-4029-9444-ba65d61bba9a"},{"cell_type":"markdown","source":["**Manual step**\n","- Manually create a schema enabled cicd lakehouse \n","- Remove the existing attached lakehouse and attach the newly created cicd lakehouse to this notebook"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"ed6598db-a7be-4e69-8171-869553c8c998"},{"cell_type":"markdown","source":["**Variables**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"748fefea-2fd2-4c57-b901-56a0e8ae831e"},{"cell_type":"code","source":["vLakehouseName = \"cicdlakehouse\" # the name of your lakehouse\n","vCicdFolderName = \"cicd\"\n","vConnectionFolderName = \"connections\" # folder to host the extraction of exiting connections \n","vOnelakeAccessFolderName = \"onelake_access\"\n","vProjectName = \"fabric-cicd\" # replace by your ci\n","vDebugMode = \"yes\""],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"55e061f2-bcf0-4ddd-bc21-fc3a88672028"},{"cell_type":"markdown","source":["**Helper notebook**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"099e227e-1d3e-4e53-929f-0a8b4b2e6dbb"},{"cell_type":"code","source":["%run nb_helper"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"a02c4686-cc0f-411b-a084-8cec727db996"},{"cell_type":"markdown","source":["**Token and base url**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"d1351226-3083-4b4d-950d-0926db725ae6"},{"cell_type":"code","source":["vApiVersion = \"v1\"\n","vScope = \"https://analysis.windows.net/powerbi/api\"\n","vAccessToken = mssparkutils.credentials.getToken(vScope)\n","vBaseUrl = f\"https://api.fabric.microsoft.com/{vApiVersion}/\"\n","vHeaders = {'Authorization': f'Bearer {vAccessToken}'}"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"6cc2f2f2-698f-4ba7-aadb-d954225d57e4"},{"cell_type":"markdown","source":["**Resolve current workspace name and id**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"eb7e1857-8aed-4dff-bd8d-4c2f8d558f87"},{"cell_type":"code","source":["vWorkspaceName, vWorkspaceId = fabric.resolve_workspace_name_and_id()\n","print(vWorkspaceName, vWorkspaceId)"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"f11a7548-6085-4e0b-ba7d-51aa10a836a3"},{"cell_type":"markdown","source":["**Create a schema called staging**"],"metadata":{"nteract":{"transient":{"deleting":false}}},"id":"646661c7-8b91-466f-9187-b12b3b229100"},{"cell_type":"code","source":["%%sql\n","CREATE SCHEMA 
staging"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"sparksql","language_group":"synapse_pyspark"},"collapsed":false},"id":"6580b0c9-a7ce-48e2-b61c-4206ee3bb403"},{"cell_type":"markdown","source":["**Create the folders in the lakehouse**"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"3bc41d14-d70d-4c51-9fe9-ece7730a90d6"},{"cell_type":"code","source":["# this folder can be used to host a file containing the list of existing connections in a fabric tenant\n","# the list can help creating the mapping_connections.json secure file required for running post deployment steps in the yaml pipeline\n","# it is not specific to a particular project\n","notebookutils.fs.mkdirs(f\"Files/{vCicdFolderName}/{vConnectionFolderName}\") \n","\n","# this folder can be use to host the onelake access files generated by nb_extract_lakehouse_access\n","# these files can help the creation of dedicated roles in target lakehouses \n","# the folder should be specific to a project\n","notebookutils.fs.mkdirs(f\"Files/{vCicdFolderName}/{vOnelakeAccessFolderName}/{vProjectName}\") "],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"3fc003cd-763a-4950-9b7e-ecf7098b9450"},{"cell_type":"markdown","source":["**Manual step**\n","- For all projects --> extract the list of connections in the tenant\n","- Per project --> run nb_extract_lakehouse_access to extract the onelake roles definition for a specific project"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"08220090-7bfb-4900-ae28-ca955e489868"},{"cell_type":"markdown","source":["**Define the default lakehouse in the cicd notebooks:**\n","- nb_cicd_post_deployment\n","- nb_cicd_post_update_data_pipelines\n","- nb_cicd_post_update_notebooks\n","- nb_cicd_post_update_semantic_models\n","- nb_cicd_pre_deployment\n","- nb_cicd_pre_update_lakehouses\n","- nb_cicd_pre_update_warehouses"],"metadata":{"nteract":{"transient":{"deleting":false}},"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"77b859b4-422b-4c02-b46a-e9d1991409bb"},{"cell_type":"code","source":["vLakehouseId = fabric.resolve_item_id(vLakehouseName)\n","\n","# get the list of data pipelines in the target workspace\n","vNotebookList = [\n"," 'nb_cicd_post_deployment',\n"," 'nb_cicd_post_update_data_pipelines',\n"," 'nb_cicd_post_update_notebooks',\n"," 'nb_cicd_post_update_semantic_models',\n"," 'nb_cicd_pre_deployment',\n"," 'nb_cicd_pre_update_lakehouses',\n"," 'nb_cicd_pre_update_warehouses'\n","]\n","\n","df_notebooks = notebookutils.notebook.list(workspaceId=vWorkspaceId)\n","for notebook in df_notebooks:\n"," \n"," # get the notebook id and display name\n"," vNotebookId = notebook.id\n"," vNotebookName = notebook.displayName\n","\n"," if vNotebookName in vNotebookList:\n","\n"," # get the current notebook definition\n"," vNotebookDefinition = notebookutils.notebook.getDefinition(name=vNotebookName, workspaceId=vWorkspaceId) \n"," vNotebookJson = json.loads(vNotebookDefinition)\n","\n"," # update lakehouse dependencies\n"," try:\n","\n"," # check and remove any attached lakehouses\n"," if 'dependencies' in vNotebookJson['metadata'] \\\n"," and 'lakehouse' in vNotebookJson['metadata']['dependencies'] \\\n"," and vNotebookJson['metadata'][\"dependencies\"][\"lakehouse\"] is not None:\n","\n"," vCurrentLakehouse = 
vNotebookJson['metadata']['dependencies']['lakehouse']\n"," # print(vCurrentLakehouse)\n","\n"," if 'default_lakehouse_name' in vCurrentLakehouse:\n","\n"," vNotebookJson['metadata']['dependencies']['lakehouse'] = {}\n"," print(f\"attempting to update notebook <{vNotebookName}> with new default lakehouse: {vCurrentLakehouse['default_lakehouse_name']} in workspace <{vWorkspaceName}>.\")\n","\n"," # update new notebook definition after removing existing lakehouses and with new default lakehouseId\n"," notebookutils.notebook.updateDefinition(\n"," name = vNotebookName,\n"," content = json.dumps(vNotebookJson), \n"," defaultLakehouse = vLakehouseName, #vCurrentLakehouse['default_lakehouse_name'],\n"," defaultLakehouseWorkspace = vWorkspaceId,\n"," workspaceId = vWorkspaceId\n"," )\n","\n"," print(f\"updated notebook <{vNotebookName}> in workspace <{vWorkspaceName}>.\")\n","\n"," else:\n"," print(f'no default lakehouse set for notebook <{vNotebookName}>, ignoring.')\n","\n"," vMessage = f\"succeeded\"\n"," except Exception as e:\n"," vMessage = f\"failed\"\n"," if vDebugMode == \"yes\":\n"," print(str(e))"],"outputs":[],"execution_count":null,"metadata":{"microsoft":{"language":"python","language_group":"synapse_pyspark"}},"id":"1a9f2c16-2d5c-4e02-b600-b7290e71b5fb"}],"metadata":{"kernel_info":{"name":"synapse_pyspark"},"kernelspec":{"name":"synapse_pyspark","language":"Python","display_name":"Synapse PySpark"},"language_info":{"name":"python"},"microsoft":{"language":"python","language_group":"synapse_pyspark","ms_spell_check":{"ms_spell_check_language":"en"}},"nteract":{"version":"nteract-front-end@1.0.0"},"spark_compute":{"compute_id":"/trident/default","session_options":{"conf":{"spark.synapse.nbs.session.timeout":"1200000"}}},"dependencies":{"lakehouse":{}}},"nbformat":4,"nbformat_minor":5}
\ No newline at end of file
diff --git a/accelerators/CICD/Git-base-deployments/project-workspace/cd-update-workspace-test-prod.yml b/accelerators/CICD/Git-base-deployments/project-workspace/cd-update-workspace-test-prod.yml
new file mode 100644
index 0000000..2b3c45d
--- /dev/null
+++ b/accelerators/CICD/Git-base-deployments/project-workspace/cd-update-workspace-test-prod.yml
@@ -0,0 +1,272 @@
+pool:
+  vmImage: ubuntu-latest # the ubuntu-20.04 hosted image has been retired
+
+name: release pipeline
+
+variables:
+ - group: GroupFabricWorkspaces
+ - group: GroupDevOps
+
+trigger: none
+
+stages:
+
+ - stage: stage1
+ displayName: 'Stage 1 - Authentication'
+
+ jobs:
+
+ - job: interactive_login
+ displayName: 'Interactive login'
+
+ steps:
+
+ - task: Bash@3
+ displayName: 'Azure CLI - Interactive Login'
+ name: azure_cli_interactive_login
+ inputs:
+ targetType: 'inline'
+ script: |
+ echo "Triggering interactive Azure CLI login..."
+ az login --use-device-code
+
+ - task: Bash@3
+ displayName: 'Get access token for Fabric APIs'
+ name: get_access_token_fabric_api
+ inputs:
+ targetType: 'inline'
+ script: |
+ FABRIC_BEARER_TOKEN=$(az account get-access-token --resource https://api.fabric.microsoft.com/ --query accessToken -o tsv)
+ # Set the tokens as pipeline variables
+ echo "##vso[task.setvariable variable=FABRIC_BEARER_TOKEN;isOutput=true;]$FABRIC_BEARER_TOKEN"
+ echo "BEAR TOKEN-------$(FABRIC_BEARER_TOKEN)"
+
+ - task: Bash@3
+ displayName: 'Get access token for SQL Server'
+ name: get_access_token_sql_server
+ inputs:
+ targetType: 'inline'
+ script: |
+ SQL_BEARER_TOKEN=$(az account get-access-token --resource https://database.windows.net/ --query accessToken -o tsv)
+ # Set the tokens as pipeline variables
+ echo "##vso[task.setvariable variable=SQL_BEARER_TOKEN;isOutput=true;]$SQL_BEARER_TOKEN"
+ echo "BEAR TOKEN-------$(SQL_BEARER_TOKEN)"
+
+ - stage: stage2
+ displayName: 'Stage 2 - Deployment to TEST'
+ dependsOn:
+ - stage1
+
+ jobs:
+
+ - job: deployment
+ displayName: 'Deployment'
+ variables:
+ fabricToken: $[ stageDependencies.stage1.interactive_login.outputs['get_access_token_fabric_api.FABRIC_BEARER_TOKEN'] ]
+ sqlToken: $[ stageDependencies.stage1.interactive_login.outputs['get_access_token_sql_server.SQL_BEARER_TOKEN'] ]
+
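+        # fabricToken and sqlToken above read the isOutput variables published in stage1 using the
+        # pattern stageDependencies.<stageName>.<jobName>.outputs['<stepName>.<variableName>']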
+ steps:
+
+ - task: DownloadSecureFile@1
+ displayName: 'Download the mapping connections secure file'
+ name: mapping_connections_download
+ inputs:
+ secureFile: '$(MappingConnectionsFileName)'
+
+ - task: DownloadSecureFile@1
+ displayName: 'Download the onelake roles secure file'
+ name: onelake_roles_download
+ inputs:
+ secureFile: '$(OnelakeRolesFileName)'
+
+ - task: DownloadSecureFile@1
+ displayName: 'Download the onelake rules secure file'
+ name: onelake_rules_download
+ inputs:
+ secureFile: '$(OnelakeRulesFileName)'
+
+ - task: DownloadSecureFile@1
+ displayName: 'Download the onelake entra members secure file'
+ name: onelake_entra_members_download
+ inputs:
+ secureFile: '$(OnelakeEntraMembersFileName)'
+
+ - task: DownloadSecureFile@1
+ displayName: 'Download the onelake item members secure file'
+ name: onelake_item_members_download
+ inputs:
+ secureFile: '$(OnelakeItemMembersFileName)'
+
+ - task: UsePythonVersion@0
+ displayName: 'Install python dependencies'
+ name: install_dependencies
+ inputs:
+ versionSpec: '3.9'
+ addToPath: true
+ - script: |
+ python -m pip install --upgrade pip
+ python -m pip install requests
+ python -m pip install pandas
+ python -m pip install argparse
+ python -m pip install regex
+
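+      # pre_deployment.py triggers the nb_cicd_pre_deployment notebook in the CI/CD workspace, which
+      # runs the pre deployment steps for lakehouses and warehouses in the target workspace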
+ - task: PythonScript@0
+ displayName: "Run pre deployment steps - Lakehouses & Warehouses"
+ name: pre_deployment
+ inputs:
+ scriptSource: filePath
+ scriptPath: pipeline-scripts/pre_deployment.py
+ arguments: -oneLakeRolesFilePath "$(Agent.TempDirectory)/$(OnelakeRolesFileName)" -oneLakeRulesFilePath "$(Agent.TempDirectory)/$(OnelakeRulesFileName)" -oneLakeEntraMembersFilePath "$(Agent.TempDirectory)/$(OnelakeEntraMembersFileName)" -oneLakeItemMembersFilePath "$(Agent.TempDirectory)/$(OnelakeItemMembersFileName)"
+ env:
+ cicdWorkspaceId : '$(CiCdWorkspaceId)'
+ sourceWorkspaceId : '$(Stage1WorkspaceId)'
+ targetWorkspaceId : '$(Stage2WorkspaceId)'
+ fabricToken : $(fabricToken)
+ sqlToken : $(sqlToken)
+ projectName : $(ProjectName)
+ featureBranch : "NA"
+
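+      # git_update.py connects the target workspace to the stage branch, initializes the Git
+      # connection, updates the workspace from Git and, when disconnectGit is "yes", disconnects it again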
+ - task: PythonScript@0
+ displayName: "Git process to deploy artifacts to workspace"
+ name: "git_update"
+ inputs:
+ scriptSource: filePath
+ scriptPath: pipeline-scripts/git_update.py
+ env:
+ token : '$(fabricToken)'
+ targetWorspaceId: '$(Stage2WorkspaceId)'
+ organizationName: '$(OrganizationName)'
+ projectName: '$(ProjectName)'
+ repositoryName: '$(RepositoryName)'
+ brancheName : '$(Stage2BrancheName)'
+ initializationStrategy: '$(InitializationStrategy)'
+ conflictResolutionPolicy: '$(ConflictResolutionPolicy)'
+ disconnectGit: "yes"
+
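+      # post_deployment.py triggers the nb_cicd_post_deployment notebook in the CI/CD workspace, which
+      # runs the post deployment steps for notebooks, data pipelines and semantic models/reports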
+ - task: PythonScript@0
+ displayName: "Run post deployment steps - Notebooks & Data Pipelines & Semantic Models/Reports"
+ name: post_deployment
+ inputs:
+ scriptSource: filePath
+ scriptPath: pipeline-scripts/post_deployment.py
+ arguments: -mappingConnectionsFilePath "$(Agent.TempDirectory)/$(MappingConnectionsFileName)"
+ env:
+ cicdWorkspaceId : '$(CiCdWorkspaceId)'
+ sourceWorkspaceId : '$(Stage1WorkspaceId)'
+ targetWorkspaceId : '$(Stage2WorkspaceId)'
+ fabricToken : $(fabricToken)
+ sqlToken : $(sqlToken)
+ targetStage : "Stage2"
+ projectName : $(ProjectName)
+ featureBranch : "NA"
+
+ - stage: stage3
+ displayName: 'Stage 3 - Deployment to PROD'
+ dependsOn:
+ - stage1
+
+ jobs:
+
+ - job: deployment
+ displayName: 'Deployment'
+ variables:
+ fabricToken: $[ stageDependencies.stage1.interactive_login.outputs['get_access_token_fabric_api.FABRIC_BEARER_TOKEN'] ]
+ sqlToken: $[ stageDependencies.stage1.interactive_login.outputs['get_access_token_sql_server.SQL_BEARER_TOKEN'] ]
+
+ steps:
+
+ - task: DownloadSecureFile@1
+ displayName: 'Download the mapping connections secure file'
+ name: mapping_connections_download
+ inputs:
+ secureFile: '$(MappingConnectionsFileName)'
+
+ - task: DownloadSecureFile@1
+ displayName: 'Download the onelake roles secure file'
+ name: onelake_roles_download
+ inputs:
+ secureFile: '$(OnelakeRolesFileName)'
+
+ - task: DownloadSecureFile@1
+ displayName: 'Download the onelake rules secure file'
+ name: onelake_rules_download
+ inputs:
+ secureFile: '$(OnelakeRulesFileName)'
+
+ - task: DownloadSecureFile@1
+ displayName: 'Download the onelake entra members secure file'
+ name: onelake_entra_members_download
+ inputs:
+ secureFile: '$(OnelakeEntraMembersFileName)'
+
+ - task: DownloadSecureFile@1
+ displayName: 'Download the onelake item members secure file'
+ name: onelake_item_members_download
+ inputs:
+ secureFile: '$(OnelakeItemMembersFileName)'
+
+ - task: UsePythonVersion@0
+ displayName: 'Install python dependencies'
+ name: install_dependencies
+ inputs:
+ versionSpec: '3.9'
+ addToPath: true
+ - script: |
+ python -m pip install --upgrade pip
+ python -m pip install requests
+ python -m pip install pandas
+ python -m pip install argparse
+ python -m pip install regex
+
+ - task: PythonScript@0
+ displayName: "Run pre deployment steps - Lakehouses & Warehouses"
+ name: pre_deployment
+ inputs:
+ scriptSource: filePath
+ scriptPath: pipeline-scripts/pre_deployment.py
+ arguments: -oneLakeRolesFilePath "$(Agent.TempDirectory)/$(OnelakeRolesFileName)" -oneLakeRulesFilePath "$(Agent.TempDirectory)/$(OnelakeRulesFileName)" -oneLakeEntraMembersFilePath "$(Agent.TempDirectory)/$(OnelakeEntraMembersFileName)" -oneLakeItemMembersFilePath "$(Agent.TempDirectory)/$(OnelakeItemMembersFileName)"
+ env:
+ cicdWorkspaceId : '$(CiCdWorkspaceId)'
+ sourceWorkspaceId : '$(Stage2WorkspaceId)'
+ targetWorkspaceId : '$(Stage3WorkspaceId)'
+ fabricToken : $(fabricToken)
+ sqlToken : $(sqlToken)
+ projectName : $(ProjectName)
+ featureBranch : "NA"
+
+ - task: PythonScript@0
+ displayName: "Git process to deploy artifacts to workspace"
+ name: "git_update"
+ inputs:
+ scriptSource: filePath
+ scriptPath: pipeline-scripts/git_update.py
+ env:
+ token : '$(fabricToken)'
+ targetWorspaceId: '$(Stage3WorkspaceId)'
+ organizationName: '$(OrganizationName)'
+ projectName: '$(ProjectName)'
+ repositoryName: '$(RepositoryName)'
+ brancheName : '$(Stage3BrancheName)'
+ initializationStrategy: '$(InitializationStrategy)'
+ conflictResolutionPolicy: '$(ConflictResolutionPolicy)'
+ disconnectGit: "yes"
+
+ - task: PythonScript@0
+ displayName: "Run post deployment steps - Notebooks & Data Pipelines & Semantic Models/Reports"
+ name: post_deployment
+ inputs:
+ scriptSource: filePath
+ scriptPath: pipeline-scripts/post_deployment.py
+ arguments: -mappingConnectionsFilePath "$(Agent.TempDirectory)/$(MappingConnectionsFileName)"
+ env:
+ cicdWorkspaceId : '$(CiCdWorkspaceId)'
+ sourceWorkspaceId : '$(Stage2WorkspaceId)'
+ targetWorkspaceId : '$(Stage3WorkspaceId)'
+ fabricToken : $(fabricToken)
+ sqlToken : $(sqlToken)
+ targetStage : "Stage3"
+ projectName : $(ProjectName)
+ featureBranch : "NA"
+
+
+
\ No newline at end of file
diff --git a/accelerators/CICD/Git-base-deployments/project-workspace/ci-get-set-feature-branch.yml b/accelerators/CICD/Git-base-deployments/project-workspace/ci-get-set-feature-branch.yml
new file mode 100644
index 0000000..7528e49
--- /dev/null
+++ b/accelerators/CICD/Git-base-deployments/project-workspace/ci-get-set-feature-branch.yml
@@ -0,0 +1,80 @@
+pool:
+
+  vmImage: ubuntu-latest # the ubuntu-20.04 hosted image has been retired
+
+name: release pipeline
+
+variables:
+ - group: GroupFabricWorkspaces
+ - group: GroupDevOps
+ - group: DynamicGroup
+
+trigger: none # Disable CI trigger for direct pushes
+
+pr:
+ branches:
+ include:
+      - main # Trigger only when a PR targets the main branch (the branch feeding the Dev stage)
+
+
+stages:
+
+ - stage: stage1
+ displayName: 'Stage 1 - Get and set feature branch'
+
+ jobs:
+
+ - job: get_set_feature_branch
+ displayName: 'Get and set feature branch'
+
+ steps:
+
+ # Using the task syntax
+ - task: PowerShell@2
+ displayName: "Get the feature branch name"
+ name: "get_feature_branch_name"
+ inputs:
+ targetType: inline
+ script: |
+
+ $FEATURE_BRANCH_TEMP = $env:SYSTEM_PULLREQUEST_SOURCEBRANCH
+ $FEATURE_BRANCH = $FEATURE_BRANCH_TEMP -replace 'refs/heads/', ''
+ Write-Host "Feature branch is: $FEATURE_BRANCH"
+ Write-Host "##vso[task.setvariable variable=FEATURE_BRANCH;isOutput=true;]$FEATURE_BRANCH"
+
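+      # System.PullRequest.SourceBranch is only populated for pull request runs (for example builds
+      # queued by a branch policy); the task below stores the resolved branch name in the DynamicGroup variable group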
+ - task: AzureCLI@2
+ displayName: "Set the feature branch name in DynamicGroup"
+ name: "set_feature_branch_name"
+ inputs:
+ azureSubscription: 'ME-MngEnvMCAP471958-rsayegh-1(cdc2b911-3053-4dc6-a4f1-e979614c7cfd)' # Replace with your Azure DevOps service connection name
+ scriptType: pscore
+ scriptLocation: inlineScript
+ inlineScript: |
+ $organisationUrl = "$(System.TeamFoundationCollectionUri)"
+ $project = "$(System.TeamProject)"
+ $variableGroupName = "DynamicGroup"
+ $variableName = "FeatureBranch"
+ $variableValue = "$(get_feature_branch_name.FEATURE_BRANCH)"
+
+ # Get Variable Group ID
+ $variableGroupId = $(az pipelines variable-group list --organization $organisationUrl --project $project --query "[?name=='$variableGroupName'].id" --output tsv)
+
+ if (-not $variableGroupId) {
+ Write-Host "##vso[task.logissue type=error]Variable group '$variableGroupName' not found"
+ exit 1
+ }
+
+ # Update the variable in the Variable Group
+ az pipelines variable-group variable update `
+ --organization $organisationUrl `
+ --project $project `
+ --group-id $variableGroupId `
+ --name $variableName `
+ --value $variableValue
+
+ Write-Host "Successfully updated $variableName in $variableGroupName"
+ env:
+ AZURE_DEVOPS_EXT_PAT: $(System.AccessToken)
+
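+      # note: the identity behind System.AccessToken (the project build service account) needs
+      # permission to administer the DynamicGroup variable group for the update above to succeed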
+
+
\ No newline at end of file
diff --git a/accelerators/CICD/Git-base-deployments/project-workspace/ci-update-workspace-dev.yml b/accelerators/CICD/Git-base-deployments/project-workspace/ci-update-workspace-dev.yml
new file mode 100644
index 0000000..3ef3f37
--- /dev/null
+++ b/accelerators/CICD/Git-base-deployments/project-workspace/ci-update-workspace-dev.yml
@@ -0,0 +1,163 @@
+pool:
+  vmImage: ubuntu-latest # the ubuntu-20.04 hosted image has been retired
+
+name: release pipeline
+
+variables:
+ - group: GroupFabricWorkspaces
+ - group: GroupDevOps
+ - group: DynamicGroup
+
+trigger:
+- main
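+# a completed merge to main triggers this pipeline, which deploys the content of main to the DEV workspace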
+
+stages:
+
+ - stage: stage1
+ displayName: 'Stage 1 - Authentication'
+
+ jobs:
+
+ - job: interactive_login
+ displayName: 'Interactive login'
+
+ steps:
+
+ - task: Bash@3
+ displayName: 'Azure CLI - Interactive Login'
+ name: azure_cli_interactive_login
+ inputs:
+ targetType: 'inline'
+ script: |
+ echo "Triggering interactive Azure CLI login..."
+ az login --use-device-code
+
+ - task: Bash@3
+ displayName: 'Get access token for Fabric APIs'
+ name: get_access_token_fabric_api
+ inputs:
+ targetType: 'inline'
+ script: |
+ FABRIC_BEARER_TOKEN=$(az account get-access-token --resource https://api.fabric.microsoft.com/ --query accessToken -o tsv)
+ # Set the tokens as pipeline variables
+ echo "##vso[task.setvariable variable=FABRIC_BEARER_TOKEN;isOutput=true;]$FABRIC_BEARER_TOKEN"
+ echo "BEAR TOKEN-------$(FABRIC_BEARER_TOKEN)"
+
+ - task: Bash@3
+ displayName: 'Get access token for SQL Server'
+ name: get_access_token_sql_server
+ inputs:
+ targetType: 'inline'
+ script: |
+ SQL_BEARER_TOKEN=$(az account get-access-token --resource https://database.windows.net/ --query accessToken -o tsv)
+ # Set the tokens as pipeline variables
+ echo "##vso[task.setvariable variable=SQL_BEARER_TOKEN;isOutput=true;]$SQL_BEARER_TOKEN"
+ echo "BEAR TOKEN-------$(SQL_BEARER_TOKEN)"
+
+ - stage: stage2
+ displayName: 'Stage 2 - Deployment to DEV'
+ dependsOn:
+ - stage1
+
+ jobs:
+
+ - job: deployment
+ displayName: 'Deployment'
+ variables:
+ fabricToken: $[ stageDependencies.stage1.interactive_login.outputs['get_access_token_fabric_api.FABRIC_BEARER_TOKEN'] ]
+ sqlToken: $[ stageDependencies.stage1.interactive_login.outputs['get_access_token_sql_server.SQL_BEARER_TOKEN'] ]
+
+ steps:
+
+ - task: DownloadSecureFile@1
+ displayName: 'Download the mapping connections secure file'
+ name: mapping_connections_download
+ inputs:
+ secureFile: '$(MappingConnectionsFileName)'
+
+ - task: DownloadSecureFile@1
+ displayName: 'Download the onelake roles secure file'
+ name: onelake_roles_download
+ inputs:
+ secureFile: '$(OnelakeRolesFileName)'
+
+ - task: DownloadSecureFile@1
+ displayName: 'Download the onelake rules secure file'
+ name: onelake_rules_download
+ inputs:
+ secureFile: '$(OnelakeRulesFileName)'
+
+ - task: DownloadSecureFile@1
+ displayName: 'Download the onelake entra members secure file'
+ name: onelake_entra_members_download
+ inputs:
+ secureFile: '$(OnelakeEntraMembersFileName)'
+
+ - task: DownloadSecureFile@1
+ displayName: 'Download the onelake item members secure file'
+ name: onelake_item_members_download
+ inputs:
+ secureFile: '$(OnelakeItemMembersFileName)'
+
+ - task: UsePythonVersion@0
+ displayName: 'Install python dependencies'
+ name: install_dependencies
+ inputs:
+ versionSpec: '3.9'
+ addToPath: true
+ - script: |
+ python -m pip install --upgrade pip
+ python -m pip install requests
+ python -m pip install pandas
+ python -m pip install argparse
+ python -m pip install regex
+
+ - task: PythonScript@0
+ displayName: "Run pre deployment steps - Lakehouses & Warehouses"
+ name: pre_deployment
+ inputs:
+ scriptSource: filePath
+ scriptPath: pipeline-scripts/pre_deployment.py
+ arguments: -oneLakeRolesFilePath "$(Agent.TempDirectory)/$(OnelakeRolesFileName)" -oneLakeRulesFilePath "$(Agent.TempDirectory)/$(OnelakeRulesFileName)" -oneLakeEntraMembersFilePath "$(Agent.TempDirectory)/$(OnelakeEntraMembersFileName)" -oneLakeItemMembersFilePath "$(Agent.TempDirectory)/$(OnelakeItemMembersFileName)"
+ env:
+ cicdWorkspaceId : '$(CiCdWorkspaceId)'
+ sourceWorkspaceId : '$(FeatureBranch)' # feature branch, used to resolve the feature workspace
+ targetWorkspaceId : '$(Stage1WorkspaceId)' # dev workspace
+ fabricToken : $(fabricToken)
+ sqlToken : $(sqlToken)
+ projectName : $(ProjectName)
+ featureBranch : $(FeatureBranch)
+
+ - task: PythonScript@0
+ displayName: "Git process to deploy artifacts to workspace"
+ name: "git_update"
+ inputs:
+ scriptSource: filePath
+ scriptPath: pipeline-scripts/git_update.py
+ env:
+ token : '$(fabricToken)'
+ targetWorspaceId: '$(Stage1WorkspaceId)'
+ organizationName: '$(OrganizationName)'
+ projectName: '$(ProjectName)'
+ repositoryName: '$(RepositoryName)'
+ brancheName : '$(Stage1BrancheName)'
+ initializationStrategy: '$(InitializationStrategy)'
+ conflictResolutionPolicy: '$(ConflictResolutionPolicy)'
+ disconnectGit: "no"
+
+ - task: PythonScript@0
+ displayName: "Run post deployment steps - Notebooks & Data Pipelines & Semantic Models/Reports"
+ name: post_deployment
+ inputs:
+ scriptSource: filePath
+ scriptPath: pipeline-scripts/post_deployment.py
+ arguments: -mappingConnectionsFilePath "$(Agent.TempDirectory)/$(MappingConnectionsFileName)"
+ env:
+ cicdWorkspaceId : '$(CiCdWorkspaceId)'
+ sourceWorkspaceId : '$(FeatureBranch)' # feature branch, used to resolve the feature workspace
+ targetWorkspaceId : '$(Stage1WorkspaceId)' # dev workspace
+ fabricToken : $(fabricToken)
+ sqlToken : $(sqlToken)
+ targetStage : "Stage1"
+ projectName : $(ProjectName)
+ featureBranch : $(FeatureBranch)
\ No newline at end of file
diff --git a/accelerators/CICD/Git-base-deployments/project-workspace/git_update.py b/accelerators/CICD/Git-base-deployments/project-workspace/git_update.py
new file mode 100644
index 0000000..22f8814
--- /dev/null
+++ b/accelerators/CICD/Git-base-deployments/project-workspace/git_update.py
@@ -0,0 +1,353 @@
+# import argparse
+import json
+import requests
+import time
+import os
+
+# env variables
+token = os.environ['token']
+targetWorspaceId = os.environ['targetWorspaceId']
+organizationName = os.environ['organizationName']
+projectName = os.environ['projectName']
+repositoryName = os.environ['repositoryName']
+brancheName = os.environ['brancheName']
+initializationStrategy = os.environ['initializationStrategy']
+conflictResolutionPolicy = os.environ['conflictResolutionPolicy']
+requiredAction = "UpdateFromGit"
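+# directoryName is the folder of the repository that the Fabric workspace is connected to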
+directoryName = "workspace"
+disconnectGit = os.environ['disconnectGit']
+
+#####################
+# base url and header
+#####################
+vBaseUrl = 'https://api.fabric.microsoft.com/v1/'
+vHeader = {'Content-Type':'application/json','Authorization': f'Bearer {token}'}
+
+##################################
+# function to disconnect workspace
+##################################
+def git_disconnect(targetWorspaceId):
+
+ vMessage = f"disconnecting workspace <{targetWorspaceId}>"
+ print(vMessage)
+
+ # url
+ vUrl = f"workspaces/{targetWorspaceId}/git/disconnect"
+
+ try:
+ # post the assignment
+ response = requests.post(vBaseUrl + vUrl, headers=vHeader)
+
+ # Raise an error for bad status codes
+ response.raise_for_status()
+
+ # get the status code and reason
+ status_code = response.status_code
+ status = response.reason
+
+ # check status
+ if status_code == 200:
+
+ vMessage = f"disconnecting workspace <{targetWorspaceId}> succeeded"
+ print(f"{vMessage}")
+ status = "succeeded"
+
+
+ except requests.exceptions.HTTPError as err:
+
+ errorCode = err.response.status_code
+ errorMessage = err.response.reason
+
+ vMessage = f"disconnecting workspace <{targetWorspaceId}> failed. error code <{errorCode}> and error message <{errorMessage}>"
+ print(f"{vMessage}")
+ status = "failed"
+
+ return status
+
+
+###############################################
+# function to connect the workspace to the repo
+###############################################
+def git_connect(targetWorspaceId, organizationName, projectName, repositoryName, brancheName):
+
+ vMessage = f"connecting workspace <{targetWorspaceId}> to git"
+ print(vMessage)
+
+ # url
+ vUrl = f"workspaces/{targetWorspaceId}/git/connect"
+
+ # json body
+ vJsonBody = {
+ "gitProviderDetails": {
+ "organizationName": f"{organizationName}",
+ "projectName": f"{projectName}",
+ "gitProviderType": "AzureDevOps",
+ "repositoryName": f"{repositoryName}",
+ "branchName": f"{brancheName}",
+ "directoryName": f"{directoryName}"
+ }
+ }
+
+ try:
+ # post the assignment
+ response = requests.post(vBaseUrl + vUrl, headers=vHeader, json=vJsonBody)
+
+ # Raise an error for bad status codes
+ response.raise_for_status()
+
+ # get the status code and reason
+ status_code = response.status_code
+ status = response.reason
+
+ # check status
+        if status_code in (200, 204): # as of 23.03.2024 a 204 response also indicates success, although it is not in the API documentation
+
+ vMessage = f"connecting workspace <{targetWorspaceId}> to git succeeded"
+ print(f"{vMessage}")
+
+
+ except requests.exceptions.HTTPError as err:
+
+ errorCode = err.response.status_code
+ errorMessage = err.response.reason
+
+ vMessage = f"connecting workspace <{targetWorspaceId}> to git failed. error code <{errorCode}> and error message <{errorMessage}>"
+ print(f"{vMessage}")
+
+#######################################
+# function to initialise the connection
+#######################################
+def git_initialize(targetWorspaceId, initializationStrategy):
+
+ vMessage = f"initializing git for workspace <{targetWorspaceId}>"
+ print(vMessage)
+
+ # url
+ vUrl = f"workspaces/{targetWorspaceId}/git/initializeConnection"
+
+ # json body
+ vJsonBody = {
+ "initializationStrategy": f"{initializationStrategy}"
+ }
+ print(vJsonBody)
+
+ try:
+ # post the assignment
+ response = requests.post(vBaseUrl + vUrl, headers=vHeader, json=vJsonBody)
+
+ # Raise an error for bad status codes
+ response.raise_for_status()
+
+ # get the status code and reason
+ status_code = response.status_code
+ status = response.reason
+
+ # check status
+ if status_code == 200:
+
+ vMessage = f"initializing git for workspace <{targetWorspaceId}> succeeded"
+ print(f"{vMessage}")
+ remoteCommitHash = response.json().get('remoteCommitHash', '')
+ status = "succeeded"
+
+ if status_code == 202:
+
+ vMessage = f"initializing git for workspace <{targetWorspaceId}> - status 202"
+ print(vMessage)
+
+            # get the operation url from the Location header
+ # doc https://learn.microsoft.com/en-us/rest/api/fabric/articles/long-running-operation
+ operationUrl = response.headers.get("Location")
+
+ vMessage = f"operation url: <{operationUrl}>"
+ print(vMessage)
+
+ waitTime = 30 # Example value
+
+ # monitor the operation
+ while True:
+
+ # sleep the specified time --> this wait time might need adjustment
+ time.sleep(waitTime)
+ print(f"sleeping {waitTime} seconds")
+
+ # check the operation status --> sync of artifacts takes time
+ operationResponse = requests.get(operationUrl, headers=vHeader)
+ jsonOperation = operationResponse.text
+ operation = json.loads(jsonOperation)
+
+ print(f"operation response <{operation}>")
+
+ # check operation status and break if success or failure
+ if operation['status'] == "Succeeded" or operation['status'] == "Failed":
+ status = "succeeded" if operation['status'] == "Succeeded" else "failed"
+
+ if status == "succeeded":
+ vMessage = f"initializing git for workspace <{targetWorspaceId}> succeeded"
+ print(f"{vMessage}")
+ remoteCommitHash = operationResponse.json().get('remoteCommitHash', '')
+ if status == "failed":
+ vMessage = f"initializing git for workspace <{targetWorspaceId}> failed"
+ print(f"{vMessage}")
+ remoteCommitHash = ''
+
+ break
+
+ except requests.exceptions.HTTPError as err:
+
+ errorCode = err.response.status_code
+ errorMessage = err.response.reason
+
+ vMessage = f"initializing git for workspace <{targetWorspaceId}> failed. error code <{errorCode}> ; error message <{errorMessage}>"
+ print(f"{vMessage}")
+
+ remoteCommitHash = ''
+ status = "failed"
+
+
+ # return the status and the commit hash
+ return status, remoteCommitHash
+
+##########################################
+# function to update workspace from remote
+##########################################
+
+def git_update(targetWorspaceId, remoteCommitHash, conflictResolutionPolicy):
+
+ vMessage = f"updating workspace <{targetWorspaceId}> from remote"
+ print(vMessage)
+
+ # url
+ vUrl = f"workspaces/{targetWorspaceId}/git/updateFromGit"
+
+ # json body
+ vJsonBody = {
+ "remoteCommitHash": f"{remoteCommitHash}",
+ "conflictResolution": {
+ "conflictResolutionType": "Workspace",
+ "conflictResolutionPolicy": f"{conflictResolutionPolicy}",
+ },
+ "options": {
+ "allowOverrideItems": True
+ }
+ }
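+    # allowOverrideItems lets the update from Git override items that already exist in the workspace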
+
+ print(vJsonBody)
+
+ try:
+ # post the assignment
+ response = requests.post(vBaseUrl + vUrl, headers=vHeader, json=vJsonBody)
+
+ # Raise an error for bad status codes
+ response.raise_for_status()
+
+ # get the status code and reason
+ status_code = response.status_code
+ status = response.reason
+
+ # check status
+ if status_code == 200:
+
+ vMessage = f"updating workspace <{targetWorspaceId}> from remote succeeded"
+ print(f"{vMessage}")
+ remoteCommitHash = response.json().get('remoteCommitHash', '')
+ status = "succeeded"
+
+ if status_code == 202:
+
+ vMessage = f"updating workspace <{targetWorspaceId}> from remote - status 202"
+ print(vMessage)
+
+            # get the operation url from the Location header
+ # doc https://learn.microsoft.com/en-us/rest/api/fabric/articles/long-running-operation
+ operationUrl = response.headers.get("Location")
+
+ vMessage = f"operation url: <{operationUrl}>"
+ print(vMessage)
+
+ waitTime = 30 # Example value
+
+ # monitor the operation
+ while True:
+
+ # sleep the specified time --> this wait time might need adjustment
+ time.sleep(waitTime)
+ print(f"sleeping {waitTime} seconds")
+
+ # check the operation status --> sync of artifacts takes time
+ operationResponse = requests.get(operationUrl, headers=vHeader)
+ jsonOperation = operationResponse.text
+ operation = json.loads(jsonOperation)
+
+ print(f"operation response <{operation}>")
+
+ # check operation status and break if success or failure
+ if operation['status'] == "Succeeded" or operation['status'] == "Failed":
+ status = "succeeded" if operation['status'] == "Succeeded" else "failed"
+
+ if status == "succeeded":
+ vMessage = f"updating workspace <{targetWorspaceId}> from remote succeeded"
+ print(f"{vMessage}")
+
+ if status == "failed":
+ vMessage = f"updating workspace <{targetWorspaceId}> from remote failed"
+ print(f"{vMessage}")
+
+ break
+
+
+ except requests.exceptions.HTTPError as err:
+
+ errorCode = err.response.status_code
+ errorMessage = err.response.reason
+
+ vMessage = f"updating workspace <{targetWorspaceId}> from remote failed. error code <{errorCode}> ; error message <{errorMessage}>"
+ print(f"{vMessage}")
+
+ remoteCommitHash = ''
+ status = "failed"
+
+ # return the status and the commit hash
+ return status
+
+
+
+#############
+# git process
+#############
+
+try:
+ # step 0 - Disconnect Git if already connected
+ try:
+ statusDisconnect = git_disconnect(targetWorspaceId)
+ except Exception as e:
+ vMessage = f"disconnecting workspace <{targetWorspaceId}> failed. exception: {str(e)}"
+ print(f"{vMessage}")
+
+ # step 1 - Git - Connect
+ git_connect(targetWorspaceId, organizationName, projectName, repositoryName, brancheName)
+
+ # step 2 - Git - Initialize Connection
+ statusInitialization, remoteCommitHash = git_initialize(targetWorspaceId, initializationStrategy)
+ print(f"initialization status <{statusInitialization}>, remoteCommitHash <>{remoteCommitHash}")
+
+ # if the initialisation is successful, proceed further
+ if statusInitialization == "succeeded":
+
+ # step 3 - Git - Update From Git
+ statusUpdate = git_update(targetWorspaceId, remoteCommitHash, conflictResolutionPolicy)
+
+
+        # if the update is successful, disconnect the workspace from git
+        if statusUpdate == "succeeded":
+
+            # step 4 - Git - Disconnect
+            if disconnectGit == "yes":
+                statusDisconnect = git_disconnect(targetWorspaceId)
+                print(f"disconnect status <{statusDisconnect}>")
+
+except Exception as e:
+ vMessage = f"git process for workspace <{targetWorspaceId}> failed. exception: {str(e)}"
+ print(f"{vMessage}")
+
+
diff --git a/accelerators/CICD/Git-base-deployments/project-workspace/post_deployment.py b/accelerators/CICD/Git-base-deployments/project-workspace/post_deployment.py
new file mode 100644
index 0000000..9a0abcb
--- /dev/null
+++ b/accelerators/CICD/Git-base-deployments/project-workspace/post_deployment.py
@@ -0,0 +1,197 @@
+import os
+import requests
+import time
+import re
+import argparse
+import json
+
+
+
+# parser
+parser = argparse.ArgumentParser()
+parser.add_argument("-mappingConnectionsFilePath", type=str)
+args = parser.parse_args()
+
+
+# env variables
+cicdWorkspaceId = os.environ['cicdWorkspaceId']
+sourceWorkspaceId = os.environ['sourceWorkspaceId']
+targetWorkspaceId = os.environ['targetWorkspaceId']
+fabricToken = os.environ['fabricToken']
+sqlToken = os.environ['sqlToken']
+targetStage = os.environ['targetStage']
+projectName = os.environ['projectName']
+featureBranch = os.environ['featureBranch']
+mappingConnectionsFilePath = args.mappingConnectionsFilePath
+
+###############################################################
+# read the mapping connection json file and generate one line
+###############################################################
+with open(mappingConnectionsFilePath, "r") as file:
+ vMappingConnections = json.load(file)
+pMappingConnections = json.dumps(vMappingConnections, separators=(",", ":"))
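+# the compact separators produce a single-line JSON string that can be passed to the notebook as a parameter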
+
+
+#####################
+# base url and header
+#####################
+vBaseUrl = 'https://api.fabric.microsoft.com/v1/'
+vHeader = {'Content-Type':'application/json','Authorization': f'Bearer {fabricToken}'}
+
+
+#####################
+# Define variables
+#####################
+vWorkspaceId = cicdWorkspaceId
+vNotebookName = "nb_cicd_post_deployment"
+pSourceWorkspaceId = sourceWorkspaceId
+pTargetWorkspaceId = targetWorkspaceId
+pTargetStage = targetStage
+pTimeoutPerCellInSeconds = 300
+pTimeoutInSeconds = 900
+pProjectName = projectName
+pFeatureBranch = featureBranch
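+# pTimeoutPerCellInSeconds and pTimeoutInSeconds are forwarded to the notebook as parameters and may
+# need tuning when the post deployment notebook runs for a long time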
+
+#########################################
+# define the function to run the notebook
+#########################################
+def run_notebook(url, headers, body, operation, workspace_id, item_name, item_type, sleep_in_seconds):
+
+ vMessage = f"{operation} {item_type} <{item_name}> in workspace <{workspace_id}>"
+ print(vMessage)
+
+ try:
+
+ # post the assignment
+ if body is None:
+ response = requests.post(url, headers=headers)
+ else:
+ response = requests.post(url, headers=headers, json=body)
+
+ response.raise_for_status()
+
+ if response.status_code not in (200, 201, 202):
+ raise requests.exceptions.HTTPError(f"HTTP Error: {response.status_code} - {response.reason}")
+ else:
+
+ # check status
+ # if response.status_code == 201: # if status is 201 then the create item succeeded
+
+ # vMessage = f"{operation} {item_type} <{item_name}> in workspace <{workspace_id}> succeeded"
+ # print(f"{vMessage}")
+
+ if response.status_code == 202: # if status is 202 then the create item is in progress
+
+ vMessage = f"{operation} {item_type} <{item_name}> in workspace <{workspace_id}> is in progress"
+ print(vMessage)
+
+ # get the operation url from the header location
+ # doc https://learn.microsoft.com/en-us/rest/api/fabric/articles/long-running-operation
+ operation_url = response.headers.get("Location")
+ retry_after = int(response.headers.get("Retry-After"))
+
+ vMessage = f"waiting {retry_after} seconds before getting the operation status from url: <{operation_url}>"
+ print(f"{vMessage}")
+ time.sleep(retry_after)
+
+ # monitor the operation
+ while True:
+
+ try:
+
+ # check the operation
+ operation_response = requests.get(operation_url, headers=headers)
+ operation_response.raise_for_status()
+ operation_data = operation_response.json()
+
+ # Check if the API call is complete
+ status = operation_data.get("status")
+ if status in ["Cancelled", "Completed", "Failed", "Deduped"]:
+
+ vMessage = f"{operation} {item_type} <{item_name}> in workspace <{workspace_id}> finished with the status <{status}>."
+ print(f"{vMessage}")
+ break
+ else:
+ vMessage = f"{operation} {item_type} <{item_name}> in workspace <{workspace_id}> is still running with the status <{status}>."
+ print(f"{vMessage}")
+
+ except requests.exceptions.RequestException as e:
+ vMessage = f"calling operation url failed. exception: {e}"
+ print(f"{vMessage}")
+
+ # sleep the specified time --> this wait time might need adjustment based on your understanding of how long the notebook might run
+                    time.sleep(sleep_in_seconds)
+
+ else: # any other status is a failure
+ vMessage = f"{operation} {item_type} <{item_name}> in workspace <{workspace_id}> failed"
+ print(f"{vMessage}")
+
+ except Exception as e:
+ print("failed to call the fabric api. exception:", str(e))
+ return None
+
+
+
+##########################
+# Extract the notebook Id
+##########################
+vUrl = vBaseUrl + f"workspaces/{vWorkspaceId}/notebooks"
+response = requests.get( vUrl, headers=vHeader)
+response.raise_for_status()
+notebooks = response.json().get("value", [])
+vNotebookId = next((nb["id"] for nb in notebooks if nb["displayName"] == vNotebookName), None)
+print(f"notebook id {vNotebookId}")
+
+##########################
+# Run the notebook
+##########################
+
+# set the body
+vJsonBody = {
+ "executionData": {
+ "parameters": {
+ "pSourceWorkspaceId": {
+ "value": f"{pSourceWorkspaceId}",
+ "type": "string"
+ },
+ "pTargetWorkspaceId": {
+ "value": f"{pTargetWorkspaceId}",
+ "type": "string"
+ },
+ "pTargetStage": {
+ "value": f"{pTargetStage}",
+ "type": "string"
+ },
+ "pDebugMode": {
+ "value": "no",
+ "type": "string"
+ },
+ "pTimeoutPerCellInSeconds": {
+ "value": f"{pTimeoutPerCellInSeconds}",
+ "type": "string"
+ },
+ "pTimeoutInSeconds": {
+ "value": f"{pTimeoutInSeconds}",
+ "type": "string"
+ },
+ "pProjectName": {
+ "value": f"{pProjectName}",
+ "type": "string"
+ },
+ "pFeatureBranch": {
+ "value": f"{pFeatureBranch}",
+ "type": "string"
+ },
+ "pMappingConnections": {
+ "value": f"{pMappingConnections}",
+ "type": "string"
+ }
+ }
+ }
+}
+
+vSleepInSeconds=30
+vUrl = vBaseUrl + f"workspaces/{vWorkspaceId}/items/{vNotebookId}/jobs/instances?jobType=RunNotebook"
+run_notebook(vUrl, vHeader, vJsonBody, "executing", vWorkspaceId, vNotebookName, "Notebook", vSleepInSeconds)
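+# the RunNotebook job is asynchronous: the API returns 202 and run_notebook polls the operation URL
+# until the job reaches a terminal state (Completed, Failed, Cancelled or Deduped)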
+
+
diff --git a/accelerators/CICD/Git-base-deployments/project-workspace/pre_deployment.py b/accelerators/CICD/Git-base-deployments/project-workspace/pre_deployment.py
new file mode 100644
index 0000000..f2f9e16
--- /dev/null
+++ b/accelerators/CICD/Git-base-deployments/project-workspace/pre_deployment.py
@@ -0,0 +1,219 @@
+import os
+import requests
+import time
+import re
+import argparse
+import json
+
+# parser
+parser = argparse.ArgumentParser()
+parser.add_argument("-oneLakeRolesFilePath", type=str)
+parser.add_argument("-oneLakeRulesFilePath", type=str)
+parser.add_argument("-oneLakeEntraMembersFilePath", type=str)
+parser.add_argument("-oneLakeItemMembersFilePath", type=str)
+args = parser.parse_args()
+
+# env variables
+cicdWorkspaceId = os.environ['cicdWorkspaceId']
+sourceWorkspaceId = os.environ['sourceWorkspaceId']
+targetWorkspaceId = os.environ['targetWorkspaceId']
+fabricToken = os.environ['fabricToken']
+sqlToken = os.environ['sqlToken']
+projectName = os.environ['projectName']
+featureBranch = os.environ['featureBranch']
+oneLakeRolesFilePath = args.oneLakeRolesFilePath
+oneLakeRulesFilePath = args.oneLakeRulesFilePath
+oneLakeEntraMembersFilePath = args.oneLakeEntraMembersFilePath
+oneLakeItemMembersFilePath = args.oneLakeItemMembersFilePath
+
+###############################################################
+# read the onelake json files and generate 1 line from each
+###############################################################
+# onelake roles
+with open(oneLakeRolesFilePath, "r") as file:
+ vOnelakeRoles = json.load(file)
+pOnelakeRoles = json.dumps(vOnelakeRoles, separators=(",", ":"))
+
+# onelake rules
+with open(oneLakeRulesFilePath, "r") as file:
+ vOnelakeRules = json.load(file)
+pOnelakeRules = json.dumps(vOnelakeRules, separators=(",", ":"))
+
+# onelake entra members
+with open(oneLakeEntraMembersFilePath, "r") as file:
+ vOnelakeEntraMembers = json.load(file)
+pOnelakeEntraMembers = json.dumps(vOnelakeEntraMembers, separators=(",", ":"))
+
+# onelake item members
+with open(oneLakeItemMembersFilePath, "r") as file:
+ vOnelakeItemMembers = json.load(file)
+pOnelakeItemMembers = json.dumps(vOnelakeItemMembers, separators=(",", ":"))
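+# each file is reduced to a single-line JSON string so it can be passed to the notebook as a parameter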
+
+
+#####################
+# base url and header
+#####################
+vBaseUrl = 'https://api.fabric.microsoft.com/v1/'
+vHeader = {'Content-Type':'application/json','Authorization': f'Bearer {fabricToken}'}
+
+
+#####################
+# Define variables
+#####################
+vWorkspaceId = cicdWorkspaceId
+vNotebookName = "nb_cicd_pre_deployment"
+pSourceWorkspaceId = sourceWorkspaceId
+pTargetWorkspaceId = targetWorkspaceId
+pProjectName = projectName
+pFeatureBranch = featureBranch
+
+#########################################
+# define the function to run the notebook
+#########################################
+def run_notebook(url, headers, body, operation, workspace_id, item_name, item_type, sleep_in_seconds):
+
+ vMessage = f"{operation} {item_type} <{item_name}> in workspace <{workspace_id}>"
+ print(vMessage)
+
+ try:
+
+ # post the assignment
+ if body is None:
+ response = requests.post(url, headers=headers)
+ else:
+ response = requests.post(url, headers=headers, json=body)
+
+ response.raise_for_status()
+
+ if response.status_code not in (200, 201, 202):
+ raise requests.exceptions.HTTPError(f"HTTP Error: {response.status_code} - {response.reason}")
+ else:
+
+ # check status
+ # if response.status_code == 201: # if status is 201 then the create item succeeded
+
+ # vMessage = f"{operation} {item_type} <{item_name}> in workspace <{workspace_id}> succeeded"
+ # print(f"{vMessage}")
+
+ if response.status_code == 202: # if status is 202 then the create item is in progress
+
+ vMessage = f"{operation} {item_type} <{item_name}> in workspace <{workspace_id}> is in progress"
+ print(vMessage)
+
+ # get the operation url from the header location
+ # doc https://learn.microsoft.com/en-us/rest/api/fabric/articles/long-running-operation
+ operation_url = response.headers.get("Location")
+ retry_after = int(response.headers.get("Retry-After"))
+
+ vMessage = f"waiting {retry_after} seconds before getting the operation status from url: <{operation_url}>"
+ print(f"{vMessage}")
+ time.sleep(retry_after)
+
+ # monitor the operation
+ while True:
+
+ try:
+
+ # check the operation
+ operation_response = requests.get(operation_url, headers=headers)
+ operation_response.raise_for_status()
+ operation_data = operation_response.json()
+
+ # Check if the API call is complete
+ status = operation_data.get("status")
+ if status in ["Cancelled", "Completed", "Failed", "Deduped"]:
+
+ vMessage = f"{operation} {item_type} <{item_name}> in workspace <{workspace_id}> finished with the status <{status}>."
+ print(f"{vMessage}")
+ break
+ else:
+ vMessage = f"{operation} {item_type} <{item_name}> in workspace <{workspace_id}> is still running with the status <{status}>."
+ print(f"{vMessage}")
+
+ except requests.exceptions.RequestException as e:
+ vMessage = f"calling operation url failed. exception: {e}"
+ print(f"{vMessage}")
+
+ # sleep the specified time --> this wait time might need adjustment based on your understanding of how long the notebook might run
+                    time.sleep(sleep_in_seconds)
+
+ else: # any other status is a failure
+ vMessage = f"{operation} {item_type} <{item_name}> in workspace <{workspace_id}> failed"
+ print(f"{vMessage}")
+
+ except Exception as e:
+ print("failed to call the fabric api. exception:", str(e))
+ return None
+
+
+
+##########################
+# Extract the notebook Id
+##########################
+vUrl = vBaseUrl + f"workspaces/{vWorkspaceId}/notebooks"
+response = requests.get( vUrl, headers=vHeader)
+response.raise_for_status()
+notebooks = response.json().get("value", [])
+vNotebookId = next((nb["id"] for nb in notebooks if nb["displayName"] == vNotebookName), None)
+print(f"notebook id {vNotebookId}")
+
+##########################
+# Run the notebook
+##########################
+
+# set the body
+vJsonBody = {
+ "executionData": {
+ "parameters": {
+ "pToken": {
+ "value": f"{fabricToken}",
+ "type": "string"
+ },
+ "pSqlToken": {
+ "value": f"{sqlToken}",
+ "type": "string"
+ },
+ "pSourceWorkspaceId": {
+ "value": f"{pSourceWorkspaceId}",
+ "type": "string"
+ },
+ "pTargetWorkspaceId": {
+ "value": f"{pTargetWorkspaceId}",
+ "type": "string"
+ },
+ "pDebugMode": {
+ "value": "no",
+ "type": "string"
+ },
+ "pProjectName": {
+ "value": f"{pProjectName}",
+ "type": "string"
+ },
+ "pFeatureBranch": {
+ "value": f"{pFeatureBranch}",
+ "type": "string"
+ },
+ "pOnelakeRoles": {
+ "value": f"{pOnelakeRoles}",
+ "type": "string"
+ },
+ "pOnelakeRules": {
+ "value": f"{pOnelakeRules}",
+ "type": "string"
+ },
+ "pOnelakeEntraMembers": {
+ "value": f"{pOnelakeEntraMembers}",
+ "type": "string"
+ },
+ "pOnelakeItemMembers": {
+ "value": f"{pOnelakeItemMembers}",
+ "type": "string"
+ }
+ }
+ }
+}
+
+vSleepInSeconds=30
+vUrl = vBaseUrl + f"workspaces/{vWorkspaceId}/items/{vNotebookId}/jobs/instances?jobType=RunNotebook"
+run_notebook(vUrl, vHeader, vJsonBody, "executing", vWorkspaceId, vNotebookName, "Notebook", vSleepInSeconds)
+
diff --git a/accelerators/CICD/Git-base-deployments/resources/branch-out-new-workspace.png b/accelerators/CICD/Git-base-deployments/resources/branch-out-new-workspace.png
new file mode 100644
index 0000000..699ac7b
Binary files /dev/null and b/accelerators/CICD/Git-base-deployments/resources/branch-out-new-workspace.png differ
diff --git a/accelerators/CICD/Git-base-deployments/resources/deploy-to-test.png b/accelerators/CICD/Git-base-deployments/resources/deploy-to-test.png
new file mode 100644
index 0000000..c8f3f4b
Binary files /dev/null and b/accelerators/CICD/Git-base-deployments/resources/deploy-to-test.png differ
diff --git a/accelerators/CICD/Git-base-deployments/resources/git-based-deployment.png b/accelerators/CICD/Git-base-deployments/resources/git-based-deployment.png
new file mode 100644
index 0000000..4d47f6e
Binary files /dev/null and b/accelerators/CICD/Git-base-deployments/resources/git-based-deployment.png differ
diff --git a/accelerators/CICD/Git-base-deployments/resources/main-branch-policy.png b/accelerators/CICD/Git-base-deployments/resources/main-branch-policy.png
new file mode 100644
index 0000000..c3cc5de
Binary files /dev/null and b/accelerators/CICD/Git-base-deployments/resources/main-branch-policy.png differ
diff --git a/accelerators/CICD/Git-base-deployments/resources/repository-structure.png b/accelerators/CICD/Git-base-deployments/resources/repository-structure.png
new file mode 100644
index 0000000..ab2db1d
Binary files /dev/null and b/accelerators/CICD/Git-base-deployments/resources/repository-structure.png differ
diff --git a/accelerators/CICD/Git-base-deployments/resources/variable-group-permission.png b/accelerators/CICD/Git-base-deployments/resources/variable-group-permission.png
new file mode 100644
index 0000000..73ef93d
Binary files /dev/null and b/accelerators/CICD/Git-base-deployments/resources/variable-group-permission.png differ
diff --git a/accelerators/CICD/Git-base-deployments/resources/workpace-git-enablement.png b/accelerators/CICD/Git-base-deployments/resources/workpace-git-enablement.png
new file mode 100644
index 0000000..1350160
Binary files /dev/null and b/accelerators/CICD/Git-base-deployments/resources/workpace-git-enablement.png differ
diff --git a/accelerators/CICD/Git-base-deployments/resources/workspace-url.png b/accelerators/CICD/Git-base-deployments/resources/workspace-url.png
new file mode 100644
index 0000000..12f898f
Binary files /dev/null and b/accelerators/CICD/Git-base-deployments/resources/workspace-url.png differ