Azure Policy helps enforce organizational standards and assess the compliance of the resources created in Azure, making it a key component of Azure governance.

There are now a few built-in policy definitions for Data Factory and we’re going to take a look at one of those policy definitions in this blog post. We’ll also create and apply one custom policy definition to understand how it can be done.

Prerequisites

If you want to follow the examples described in this post, make sure you have the following items available:

  • An Azure subscription. You may register for a free Azure subscription if you don’t have one yet.
  • An Azure Resource Group where you will place the resources created. Please refer to Create a Resource Group.
  • An Azure Data Factory instance. Please refer to Create a Data Factory to see how to create a new instance using the portal.
  • For our custom example, we’ll need a Self-Hosted Integration Runtime (IR) but it can only be installed on 64-bit Windows machines (find the prerequisites here). The installer can be downloaded from this location and the setting up of the IR is described here. If you don’t have a 64-bit Windows machine, you can use an Azure VM for this purpose. The example should still be clear enough for you to understand it even if you don’t install the Self-Hosted IR.

Azure Policy - why is it important?

I won’t go into too much detail on Azure Policy here; I’ll leave that topic for another blog post. It’s important, though, to understand why Azure Policy is a key component of Azure governance.

We have all worked in companies or on projects that have a set of defined rules covering different topics: implementation procedures, security procedures, management procedures. We can probably also agree that if there are no steps in place to enforce those procedures, they tend not to be followed consistently, whether through unfamiliarity, oversight or simple disregard.

That’s where Azure Policy comes in and can help us. By using it, we can check for resource consistency, regulatory compliance, security, cost and management. Azure provides us a long list of built-in policy definitions that can be applied to different resources and we can always create new policy definitions that suit our needs.

These policy definitions use a JSON format to form the logic the evaluation uses to determine if a resource is compliant or not. If you want to read more about Azure Policy, the MS Docs are a great resource.

Assign our first built-in policy - use Key Vault to store Linked Services secrets

The first policy definition we’re going to use is a policy to check whether Azure Data Factory (ADF) linked services are using Key Vault for storing secrets.

Let’s first go to ADF and create some linked services that we’ll use in this example.

Create ADF Linked Services

  1. Go to the Azure Portal, select the ADF resource you created and choose the option Author and monitor. As an alternative, you can go straight to adf.azure.com and select your tenant, subscription and ADF resource.
Launch ADF
  2. In the ADF page, select the option Manage on the left-hand side, then the Linked Services option within the Connections section, and click on the New button.
Manage Linked Services
  3. Search for “azure sql database” and choose the option available.
Azure SQL DB Linked Service
  4. Fill in the following details and click the Create button. We’re obviously entering fake details, but for the purposes of this example we don’t need to specify real connections.
    • Name - AzureSqlDatabase
    • Account selection method - Enter manually
    • Fully qualified domain name - fakeserver.database.windows.net
    • Database name - fakedatabase
    • Authentication type - SQL authentication
    • User name - fakeuser
    • Password - fakepwd
Linked Service Details
  5. Create two more linked services, one for Azure Blob Storage and another for an Oracle database, using a random value for the password fields. Details can be seen below.

Oracle Linked Service
Blob Storage Linked Service
Oracle Details
Blob Storage Details

  6. Click on the Publish all button to save the newly created linked services.
Publish All

Quick Look at the Policy Definition

With the linked services created, it’s now time to assign the policy definition to our subscription/resource group.

Before assigning the policy, let’s just look briefly at some components of the policy to get an understanding of what it will be looking for. This policy has a long JSON definition so we’re just going to look at some bits of it and focus on the policyRule section.

In Azure Policy, the operator allOf has the same meaning as a logical AND (all conditions need to be true) and the operator anyOf has the same meaning as a logical OR (one or more conditions need to be true).
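To make the AND/OR behaviour concrete, here is a minimal Python sketch that models allOf and anyOf as combinators over simple predicate functions. It is only an illustration of the logic, not how Azure Policy is implemented; the helper names (all_of, any_of, is_linked_service and so on) and the flattened sample resource are assumptions made up for this example.

```python
from typing import Any, Callable, Dict

# A condition is modelled as a function from a resource (a dict) to a bool.
Condition = Callable[[Dict[str, Any]], bool]


def all_of(*conditions: Condition) -> Condition:
    """Logical AND: true only when every nested condition is true."""
    return lambda resource: all(cond(resource) for cond in conditions)


def any_of(*conditions: Condition) -> Condition:
    """Logical OR: true when at least one nested condition is true."""
    return lambda resource: any(cond(resource) for cond in conditions)


# Hypothetical conditions, for illustration only.
def is_linked_service(resource: Dict[str, Any]) -> bool:
    return resource.get("type") == "Microsoft.DataFactory/factories/linkedservices"


def has_password(resource: Dict[str, Any]) -> bool:
    return "Password=" in resource.get("connectionString", "")


def has_account_key(resource: Dict[str, Any]) -> bool:
    return "AccountKey=" in resource.get("connectionString", "")


# allOf wraps the type check and a nested anyOf, mirroring the rule's shape.
rule = all_of(is_linked_service, any_of(has_password, has_account_key))

resource = {
    "type": "Microsoft.DataFactory/factories/linkedservices",
    "connectionString": "Server=fakeserver;Password=fakepwd",
}
print(rule(resource))  # True
```

The nesting is the point here: the outer allOf fails as soon as the resource is not a linked service, no matter what the inner anyOf finds.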

The policy rule starts by defining that it’s going to look at all linked services. Because this condition is included in an allOf section, the policy rule will only evaluate to true if this condition and the following ones are all true.

Policy Rule First Condition

However, the next (long) section is an anyOf block, which means that only one of the many conditions specified needs to be true for this part of the policy rule to be fulfilled.

Looking at just one of those conditions evaluated, we can see that it’s looking for any linked services that have a property named connectionString and, if they do, whether that connection string contains some specific words such as AccountKey=, PWD=, Password=, CredString=, pwd=.

If this condition evaluates to true, we’ll know that we have a linked service whose definition is specifying the password of a connection directly in the linked service (even though it will show as an encrypted credential in the linked service JSON) and not making use of Key Vault to store that password.

Policy Rule Second Condition
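The connection string check described above can be sketched in a few lines of Python. This is a simplified illustration, not the built-in policy itself: the function name contains_inline_secret is hypothetical, the sketch uses plain substring matching, and it treats a connectionString that is not a plain string as having no inline secret, which is an assumption of this example.

```python
# Markers the post lists as being searched for in the connection string.
SECRET_MARKERS = ("AccountKey=", "PWD=", "Password=", "CredString=", "pwd=")


def contains_inline_secret(type_properties: dict) -> bool:
    """Return True when connectionString is a string holding a secret marker."""
    connection_string = type_properties.get("connectionString")
    if not isinstance(connection_string, str):
        # Assumption of this sketch: a missing or non-string value
        # carries no inline secret.
        return False
    return any(marker in connection_string for marker in SECRET_MARKERS)


# Fake values, matching the fake linked service created earlier.
inline = {
    "connectionString": "Server=fakeserver.database.windows.net;"
                        "Database=fakedatabase;User ID=fakeuser;Password=fakepwd"
}
no_secret = {"connectionString": "Server=fakeserver.database.windows.net"}

print(contains_inline_secret(inline))     # True
print(contains_inline_secret(no_secret))  # False
```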

Assign Policy Definition

We’ll assign this policy definition to the resource group we created and see the outcome of our resource compliance.

  1. Go to the Azure Portal and select the Policy service.
Azure Policy Service
  2. Select Definitions on the left side of the Azure Policy page. In the Type drop-down choose Built-in and in the Category drop-down choose Data Factory. A few policy definitions will be listed; choose the one named “[Preview] Azure Data Factory linked services should use Key Vault for storing secrets” (Note: by the time you read this blog post, the [Preview] prefix may have been removed from the name).
Choose Policy Definition
  3. The next page shows details of the policy definition such as its name, description, type and category, as well as the effects that can be selected for it. Click on the Assign button.
Assign
  4. In the Assign Policy page we need to go through a few tabs:
    • Basics tab:
      • Scope: Select the resource group you created in the prerequisites section
      • Assignment name: You can leave the default value or change it to whatever you’d rather call your assignment
      • Policy enforcement: Enabled (default value)
    • Parameters tab:
      • Effect: Audit (default value)
    • Remediation tab:
      • No changes needed
    • Non-compliance messages tab:
      • Non-compliance message: Insert the message that will show when a resource isn’t compliant
    • Review + create tab:
      • Click on the Create button

Basics tab
Parameters tab
Non-compliance messages tab
Review + Create tab

  5. If you now go back to the main Policy page and select the option Assignments, you can filter by the name you gave your assignment and see the newly created assignment.
Policy Assignments
  6. Selecting the option Compliance in the left-side menu leads to a page where we can filter again by assignment name and see how many resources are compliant with our policy. It may take 15 to 30 minutes for the policy assignment to kick in, and until it’s evaluated for the first time the compliance state will show as Not started.
Policy Compliance Not Started
  7. Once it has been evaluated, we’ll see a Non-compliant state and that none of our resources are compliant. Click on the assignment name to see more details.
Policy Non-Compliant
  8. Finally, the policy compliance page shows more details. We have an overall compliance state and the number of resources that are compliant, exempt or non-compliant. We also have a list, per resource, with its state, scope and the time of the last evaluation. Clicking on the Details link of our Azure SQL Database resource, we can see specific information such as the resource name, resource type, compliance state and the custom non-compliance message we entered when creating the assignment.
Non-compliance Details

Fix a Linked Service

After our assignment evaluation ran, we know that we have 3 resources in a non-compliant state. Let’s change that and make sure that one of these linked services gets changed to a compliant state.

Create Key Vault Linked Service

We’ll be using fake Key Vault details (we didn’t create one), which is enough to test our policy compliance. In a real-world scenario, however, you would have a proper Key Vault resource to store your secrets.

  1. Going back to ADF’s linked services page, select the option New and search for Key Vault.
Key Vault Linked Service
  2. Fill in the required details. Choose the option Enter manually and insert a Base URL such as https://mykeyvault.vault.azure.net/. Click on the option Create once you’re done.
Key Vault Linked Service Details

Change Azure SQL Database Linked Service Definition

  1. Choose the AzureSqlDatabase linked service created previously to open its definition.
Linked Service Details
  2. Edit the definition of this linked service as follows and click on the Save button.
    • Select Azure Key Vault instead of Connection string
    • For AKV linked service select the one created in the step above
    • For Secret name insert a random value such as AzureSqlDBConnectionString (this would be the name of the secret created in Key Vault)
Edit Azure SQL DB Linked Service Details
  3. Click on the Publish all button to save the most recent changes.
Publish All 2

Check Linked Services Compliance

Going back to the Policy Compliance page, filtering by our assignment name, we can see that we now have 2 out of 4 resources compliant (again, this may take a few minutes before the evaluation runs again).

Policy Compliance Revised

Checking the assignment details and filtering for compliant resources, the Azure SQL Database is now listed as a compliant resource and we know we’re following the best practices for this particular resource. The Key Vault itself is also in a compliant state. In a real-world scenario, the same would need to be done for the other two linked services created and in a non-compliant state.

Azure SQL DB Linked Service Compliant

Create a custom ADF policy definition

A few days ago I came across a question on Stack Overflow where someone was looking for a way to enforce the use of a Self-Hosted IR for Linked Services. This was the perfect opportunity to make use of this new feature for ADF and implement a custom policy to check for this requirement.

Understanding Linked Services JSON definition

The first thing we need to look at is the JSON definition of a linked service. We need to understand what to look for, and where, so we can come up with a custom policy.

The official documentation is pretty clear when it comes to this and we can see here that the JSON definition of a linked service will contain a block with property connectVia when the linked service makes use of a self-hosted integration runtime.

Linked Service JSON definition
{
    "name": "<Name of the linked service>",
    "properties": {
        "type": "<Type of the linked service>",
        "typeProperties": {
              "<data store or compute-specific type properties>"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
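As a quick illustration of what the custom policy will key on, the following Python sketch checks whether a linked service definition carries a connectVia block. The function name and the two sample definitions (including the runtime name SelfHostedIR and the empty typeProperties) are hypothetical and only follow the JSON shape shown above.

```python
import json


def has_connect_via(linked_service: dict) -> bool:
    """Return True when the definition contains a properties.connectVia block."""
    return "connectVia" in linked_service.get("properties", {})


# Hypothetical definitions following the documented JSON shape.
with_ir = json.loads("""
{
    "name": "Oracle",
    "properties": {
        "type": "Oracle",
        "typeProperties": {},
        "connectVia": {
            "referenceName": "SelfHostedIR",
            "type": "IntegrationRuntimeReference"
        }
    }
}
""")

without_ir = json.loads("""
{
    "name": "AzureSqlDatabase",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {}
    }
}
""")

print(has_connect_via(with_ir))     # True
print(has_connect_via(without_ir))  # False
```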

Build the policy definition

The custom policy definition can be found at this link. Let’s go through its different components.

We’ll start with the properties displayName, description, mode and metadata. The first two identify the policy and give some context about it. The mode property defines which resource types are evaluated by the policy definition, the two options being All and Indexed.

The metadata property provides optional information about the policy definition, such as category, which determines under which category the policy definition is displayed in the Azure portal.

General Definition
"displayName":"Azure Data Factory should use Self-Hosted Integration Runtimes for Linked Services definitions",
"description":"All linked services created in Data Factory should connect through a Self-Hosted Integration Runtime when possible",
"mode":"All",
"metadata": {
   "category": "Data Factory"
},

The next block we’re going to look at is the parameters section. This policy definition contains two parameters: one to define the type of effect used by the policy and another to define the list of linked service types to be considered for policy evaluation. This last parameter is important because not all linked services support a choice of integration runtime, Key Vault being one example.

For each parameter we need to define a name (effect for example), its type, metadata containing information like displayName or description and, optionally, a defaultValue and a list of allowedValues.

Parameters
"parameters": {
  "effect": {
    "type": "String",
    "metadata": {
      "displayName": "Effect",
      "description": "Enable or disable the execution of the policy"
    },
    "allowedValues": [
      "Audit",
      "Deny",
      "Disabled"
    ],
    "defaultValue": "Audit"
  },
  "allowedLinkedServiceResourceTypes": {
    "type": "Array",
    "metadata": {
      "displayName": "Linked services types to check for self-hosted IR",
      "description": "This parameter should contain the list of all possible types of linked services to check for the use of a self-hosted IR."
    },
    "allowedValues": [
      "AzureBlobFS",
      "AzureBlobStorage",
      "AzureSqlDatabase",
      "Oracle",
      "PostgreSql"
    ]
  }
},
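The way defaultValue and allowedValues interact can be sketched in Python as follows. This is a simplified model and not Azure's own validation logic: resolve_parameter is a hypothetical helper, and checking each element of an Array value against allowedValues is an assumption of this sketch.

```python
from typing import Any, Dict, Optional

# Trimmed-down copies of the two parameter definitions from the policy.
EFFECT_PARAMETER: Dict[str, Any] = {
    "type": "String",
    "allowedValues": ["Audit", "Deny", "Disabled"],
    "defaultValue": "Audit",
}
TYPES_PARAMETER: Dict[str, Any] = {
    "type": "Array",
    "allowedValues": ["AzureBlobFS", "AzureBlobStorage", "AzureSqlDatabase",
                      "Oracle", "PostgreSql"],
}


def resolve_parameter(definition: Dict[str, Any], supplied: Optional[Any] = None) -> Any:
    """Pick the supplied value or the default, then check it against allowedValues."""
    value = definition.get("defaultValue") if supplied is None else supplied
    if value is None:
        raise ValueError("no value supplied and no defaultValue defined")
    allowed = definition.get("allowedValues")
    if allowed is not None:
        # Assumption: for Array parameters every element must be allowed.
        candidates = value if isinstance(value, list) else [value]
        for item in candidates:
            if item not in allowed:
                raise ValueError(f"{item!r} is not in allowedValues")
    return value


print(resolve_parameter(EFFECT_PARAMETER))                     # Audit
print(resolve_parameter(EFFECT_PARAMETER, "Deny"))             # Deny
print(resolve_parameter(TYPES_PARAMETER, ["Oracle", "AzureSqlDatabase"]))
```

Note that the second parameter has no defaultValue, so in this model a value has to be supplied at assignment time.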

As you can see, the policy definition only considers five types of linked services. If you need to add more types to the policy definition, you can easily find the type value of a linked service by checking its JSON definition on the Linked services page in ADF.

Linked Service JSON Definition

Finally, we have the policyRule block where we define which conditions need to be true for the policy to be enforced.

There are 3 conditions that need to be true:

  • the resource type is a linked service
  • the property connectVia doesn’t exist in the JSON definition
  • the linked service type is one of the types selected when assigning the policy

Policy Rule
"policyRule": {
  "if": {
    "allOf": [
      {
        "field": "type",
        "equals": "Microsoft.DataFactory/factories/linkedservices"
      },
      {
        "field": "Microsoft.DataFactory/factories/linkedservices/connectVia",
        "exists": "false"
      },
      {
        "field": "Microsoft.DataFactory/factories/linkedservices/type",
        "in": "[parameters('allowedLinkedServiceResourceTypes')]"
      }
    ]
  },
  "then": {
    "effect": "[parameters('effect')]"
  }
}
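To see how the three conditions combine, here is a small Python simulation of the rule applied to linked services like the ones created in this post. It is a simplified model, not Azure Policy's evaluation engine: the function name is_flagged, the sample resources and the runtime name SelfHostedIR are hypothetical, and the sketch reads the linked service type from properties.type.

```python
LINKED_SERVICE_RESOURCE_TYPE = "Microsoft.DataFactory/factories/linkedservices"


def is_flagged(resource: dict, allowed_types: list) -> bool:
    """Return True when all three conditions of the policy rule hold."""
    properties = resource.get("properties", {})
    return (
        resource.get("type") == LINKED_SERVICE_RESOURCE_TYPE  # is a linked service
        and "connectVia" not in properties                     # no connectVia block
        and properties.get("type") in allowed_types            # type selected at assignment
    )


selected_types = ["AzureBlobFS", "AzureBlobStorage", "AzureSqlDatabase",
                  "Oracle", "PostgreSql"]

# Hypothetical resources modelled on the examples in this post.
resources = [
    {"name": "AzureSqlDatabase", "type": LINKED_SERVICE_RESOURCE_TYPE,
     "properties": {"type": "AzureSqlDatabase"}},
    {"name": "Oracle", "type": LINKED_SERVICE_RESOURCE_TYPE,
     "properties": {"type": "Oracle",
                    "connectVia": {"referenceName": "SelfHostedIR",
                                   "type": "IntegrationRuntimeReference"}}},
    {"name": "AzureKeyVault", "type": LINKED_SERVICE_RESOURCE_TYPE,
     "properties": {"type": "AzureKeyVault"}},
]

flagged = [r["name"] for r in resources if is_flagged(r, selected_types)]
print(flagged)  # ['AzureSqlDatabase']
```

The Key Vault linked service has no connectVia block either, but it is not flagged because its type is not among the selected types.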

More information about a policy definition structure can be found here.

Create a self-hosted integration runtime in Data Factory

As listed in the prerequisites section, you’ll need a self-hosted IR to test this example. If you followed the links available in that section, you should have a self-hosted integration runtime created in ADF that looks like this.

Self-Hosted Integration Runtime

If you weren’t able to create a self-hosted IR, just follow along; the explanation should still be clear.

Create a policy definition in the Portal

We’ll now go back to the Azure Portal and the Policy Definitions page, where we’ll select the option + Policy definition.

New Policy Definition

In the New Policy definition page, choose the subscription you’re using as the Definition location (it has to be a subscription or a management group), give the policy a Name and, in the Category option, choose Use existing and select Data Factory from the drop-down box.

In the Policy Rule box, paste the JSON code available on GitHub. Once all this is done, click on the option Save.

Policy Definition 1
Policy Definition 2

Back in the Policy Definitions page, filtering by Type custom, you’ll see the new policy definition created.

Custom Policies

Assign the Custom Policy Definition

We now need to create an assignment with the custom policy definition to evaluate the state of our resources against that policy.

  1. Go back to the Policy Definitions page, select the Type custom and click on the Name of the custom policy definition created in the previous section.
Custom Policies
  2. Select the Assign option.
Assign Custom Policy
  3. In the Assign Policy page the only required changes are:
    • Basics tab:
      • Scope: Select the resource group you created in the prerequisites section
    • Parameters tab:
      • Effect: Audit (default value)
      • Linked services types: Check all 5 available in the list
    • Non-compliance messages tab:
      • Non-compliance message: Insert the message that will show when a resource isn’t compliant
    • Review + create tab:
      • Click on the Create button

Custom Policy Basics tab
Custom Policy Parameters tab
Custom Policy Non-compliance messages tab
Custom Policy Review + Create tab

  4. Selecting the option Compliance, we can see how many resources are compliant with our new assignment (bearing in mind that it may take a few minutes for the policy to run). 3 out of 4 resources are in a non-compliant state, which is the expected result: none of our linked services uses a self-hosted IR, and the Key Vault linked service is only compliant because its type isn’t evaluated by this policy.
Custom Policy Compliance
  5. Checking the assignment details by clicking on its name, we can see the 3 linked services that are not compliant with this policy. If we click on the Details link, we can see our non-compliance message.
Custom Policy Compliance Details

Change one Linked Service to use a self-hosted integration runtime

Let’s now edit the definition of one of our Linked Services to see if its compliance state changes.

  1. Back in the ADF Linked Services page, click on the Oracle linked service.
Edit Oracle Linked Service
  2. In the option Connect via integration runtime, change the value to the recently created self-hosted IR. Because we changed a property, we need to enter a random password again. Click the Apply button after making these 2 changes.
Edit Oracle Linked Service Details
  3. If we look at the JSON definition of the Oracle Linked Service, we can see that it now includes the property connectVia, as expected.
Oracle Linked Service JSON Definition
  4. After waiting a few more minutes for the policy evaluation to run again, we now have 2 out of 4 resources in a compliant state. By clicking on the assignment name, we can look at the details page and see that the Oracle linked service is now compliant.
Custom Policy Compliance Re-evaluated
Custom Policy Compliance Details Re-evaluated

Wrapping Up

In this blog post you saw how to make use of the new Azure Policy definitions for Data Factory and how to create custom policy definitions that can be applied to Data Factory resources.

Don’t forget to delete any resources you’ve created to follow this blog post if you no longer need them.

I hope this post was useful, thanks for reading!