Automating Business Hierarchy Groups in Okta with Terraform

Introduction

Last week we covered how we structure Terraform, GitHub pipelines, and Okta workspaces. A brief summary is below.

Key Points
Workspace Segmentation

Dividing infrastructure into distinct workspaces to scale and prevent configuration drift.

Core Stack: Critical components like automation scripts, groups, applications, policies, and core infrastructure.

Standard Stack: Non-critical elements such as contributions from other teams, branding, and less essential applications.

Development Workflow

Develop: Individual local environments for initial development.

Feature/Change Validation: Testing and quality assurance in a preview environment.

Release: Deployment to the production environment.

Branch Protection

Enforces branch protection policies on main (production) branches to ensure stability.

Development and testing occur in separate branches to mitigate risks.

Multiple Repositories and Workspaces

Using multiple GitHub repositories and Terraform workspaces for self-contained modules.

Facilitates automated runs and reduces interdependencies.

Personal Development Tenants

Team members maintain personal tenants for local Terraform development.

Promotes isolated testing and minimizes the impact on shared environments.

That post outlined a structured approach to scaling and automating Okta environments with Terraform. Workspace segmentation, a staged workflow, and personal tenants together give us an organized framework that balances control with room to grow.

This post covers how we automatically scale our department groups with Terraform, using Okta Group Rules to assign membership. It includes a breakdown of the code and what each piece does.

The Implementation

Now let’s get on to the configuration.

Automate Group Memberships

A good group structure is one of the most important things you can have. This seems like the most straightforward task, but given the chaos of groups and organizational structures inside a company, it is anything but that. Below, I outline the complications we experienced and how we built it using Terraform.

Some of this was done with the help of ChatGPT, for example generating lowercase, hyphenated names for cities, offices, and countries. However, it is entirely doable without ChatGPT.

Automate Business Line Creation of Groups and their Memberships

This was one of the trickier pieces for us to accomplish. We have a large business line structure that can change at any time: 200+ business lines that shrink and expand depending on business needs. We have also carried a significant amount of tech debt, which we have been slowly cleaning up.

Requirements

An internal immutable reference ID for an object, which means that the ID cannot change.
Change management for the groups stays under our control, while everything within each group is still automated.
An external system (such as NetSuite, Workday, BambooHR, or a database) manages the organization of teams.
Groups have a naming scheme or policy that is easy to read, follow, and use by other company employees.

So how did we plan this?

First, we need to determine the data for each user. We could do this in various ways, but if we have a source of truth that is already feeding us this data, why not just query Okta directly for that information?

To make this easy, you can start that query with the following:

data "okta_users" "active_users" {
  search {
    expression = "status eq \"ACTIVE\""
  }
}

But let’s say we have a new inbound employee in the pipeline who will be a part of a new team that has not yet been created in Okta. Well, then, we need to also search for Staged and Provisioned users:

data "okta_users" "provisioned_users" {
  search {
    expression = "status eq \"PROVISIONED\""
  }
}

data "okta_users" "staged_users" {
  search {
    expression = "status eq \"STAGED\""
  }
}
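
As the number of statuses grows, these near-identical data blocks could also be generated from a list. The following is a minimal sketch rather than the configuration we run: the names `user_statuses`, `by_status`, and `all_users_by_status` are made up for illustration, and it assumes the `okta_users` data source is used with `for_each` like any other Terraform data source.

```hcl
locals {
  # Hypothetical list of the Okta user statuses we want to include
  user_statuses = ["ACTIVE", "PROVISIONED", "STAGED"]
}

# One okta_users query per status, keyed by the status string
data "okta_users" "by_status" {
  for_each = toset(local.user_statuses)

  search {
    expression = "status eq \"${each.key}\""
  }
}

locals {
  # Flatten the per-status results into a single list of users
  all_users_by_status = flatten([
    for status in local.user_statuses : data.okta_users.by_status[status].users
  ])
}
```

Adding another status then means adding one string to the list instead of copying a whole data block.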

That’s great. We now have a query of all Okta users with this data. How can we work with all of it simultaneously without worrying about individual resources?

The locals block below concatenates the results of each query into one list.

locals {
  # Combine all users from different status queries into one list
  all_users = concat(
    data.okta_users.active_users.users,
    data.okta_users.provisioned_users.users,
    data.okta_users.staged_users.users
  )
  # (the locals block continues in the snippets below)

This consolidates the results into a single list that we can work from, which pays off as we begin to handle larger amounts of data.

Now that the data is consolidated and easy to read, how do we pick out the pieces we actually need? Remember that the search returns each user’s full Okta profile, and we only care about bits and pieces of it.

We also have the requirement that each key ID maps to its corresponding readable name in the business structure, so how do we ensure that the pairing doesn’t shift incorrectly?

We need to map the data.

  attribute_pairs = [
    { level = "Cost Center", id_attribute = "companyNameBkeyIdCostCenter", non_id_attribute = "companyNameCostCenter", prefix = "companyName-cc" },
    { level = "Division", id_attribute = "companyNameBkeyIdDivision", non_id_attribute = "companyNameDivision", prefix = "companyName-div" },
  ]

The same mapping is easier to read as a table:

Level	ID Attribute	Non-ID Attribute	Prefix
Cost Center	companyNameBkeyIdCostCenter	companyNameCostCenter	companyName-cc
Division	companyNameBkeyIdDivision	companyNameDivision	companyName-div

The header row names the four fields of each mapping, and every row after it describes one level of the hierarchy: which profile attribute holds the immutable ID, which holds the readable name, and which prefix its groups receive.

So now that we have our list of mappings and how we want to process and manipulate the data structure, we need to generate the data from those mappings.

We do that by looping over every mapping and every user, extracting the two attribute values, and normalizing the readable name:

  # Generate tuples for each attribute pair using the consolidated user list
  generate_tuples = distinct(flatten([
    for pair in local.attribute_pairs : [
      for user in local.all_users : {
        level                = pair.level,
        attribute_name_id    = pair.id_attribute,
        attribute_name       = pair.non_id_attribute,
        attribute_id_value   = lookup(jsondecode(user.custom_profile_attributes), pair.id_attribute, ""),
        attribute_name_value = lookup(jsondecode(user.custom_profile_attributes), pair.non_id_attribute, ""),
        prefix               = pair.prefix,
        # Strip everything except letters and digits. The pattern is wrapped in
        # slashes so that replace() treats it as a regular expression.
        attribute_name_normalized = replace(lookup(jsondecode(user.custom_profile_attributes), pair.non_id_attribute, ""), "/[^a-zA-Z0-9]/", ""),
        attribute_name_lowercase  = lower(replace(lookup(jsondecode(user.custom_profile_attributes), pair.non_id_attribute, ""), "/[^a-zA-Z0-9]/", ""))
      }
      # Skip users that are missing either the ID or the readable name
      if lookup(jsondecode(user.custom_profile_attributes), pair.id_attribute, "") != ""
      && lookup(jsondecode(user.custom_profile_attributes), pair.non_id_attribute, "") != ""
    ]
  ]))
}

From there, we run a for_each loop over the generated tuples to create the desired groups and group rules.
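
To make the transformation concrete, here is a small worked example that runs the same lookup and normalization steps on a single hand-written profile. Everything prefixed with `example_` is hypothetical, including the ID `CC-1042` and the name `Research & Development`, and the sketch assumes non-alphanumeric characters are stripped with a slash-delimited regular expression.

```hcl
locals {
  # Hypothetical decoded profile; the values are invented for illustration
  example_profile = {
    companyNameBkeyIdCostCenter = "CC-1042"
    companyNameCostCenter       = "Research & Development"
  }

  # One entry from the mapping list
  example_pair = {
    level            = "Cost Center"
    id_attribute     = "companyNameBkeyIdCostCenter"
    non_id_attribute = "companyNameCostCenter"
    prefix           = "companyName-cc"
  }

  # Readable name as stored on the profile
  example_name = lookup(local.example_profile, local.example_pair.non_id_attribute, "")

  # Keep only letters and digits
  example_normalized = replace(local.example_name, "/[^a-zA-Z0-9]/", "")

  # Final group name: prefix plus the lowercased, normalized name
  example_group_name = "${local.example_pair.prefix}-${lower(local.example_normalized)}"
}

output "example_group_name" {
  value = local.example_group_name # "companyName-cc-researchdevelopment"
}
```

Because the snippet has no Okta dependencies, it can be dropped into an empty directory and planned on its own to experiment with naming rules.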

The complete code

The complete section of the code is as follows:

data "okta_users" "active_users" {
  search {
    expression = "status eq \"ACTIVE\""
  }
}

data "okta_users" "locked_out_users" {
  search {
    expression = "status eq \"LOCKED_OUT\""
  }
}

data "okta_users" "password_expired_users" {
  search {
    expression = "status eq \"PASSWORD_EXPIRED\""
  }
}

data "okta_users" "provisioned_users" {
  search {
    expression = "status eq \"PROVISIONED\""
  }
}

data "okta_users" "staged_users" {
  search {
    expression = "status eq \"STAGED\""
  }
}

data "okta_users" "suspended_users" {
  search {
    expression = "status eq \"SUSPENDED\""
  }
}

locals {
  # Combine all users from different status queries into one list
  all_users = concat(
    data.okta_users.active_users.users,
    data.okta_users.locked_out_users.users,
    data.okta_users.password_expired_users.users,
    data.okta_users.provisioned_users.users,
    data.okta_users.staged_users.users,
    data.okta_users.suspended_users.users
  )

  # Define a list of attribute pairs to process
  attribute_pairs = [
    { level = "Cost Center", id_attribute = "companyNameBkeyIdCostCenter", non_id_attribute = "companyNameCostCenter", prefix = "companyName-cc" },
    { level = "Division", id_attribute = "companyNameBkeyIdDivision", non_id_attribute = "companyNameDivision", prefix = "companyName-div" },
  ]

  # Generate tuples for each attribute pair using the consolidated user list
  generate_tuples = distinct(flatten([
    for pair in local.attribute_pairs : [
      for user in local.all_users : {
        level                     = pair.level,
        attribute_name_id         = pair.id_attribute,
        attribute_name            = pair.non_id_attribute,
        attribute_id_value        = lookup(jsondecode(user.custom_profile_attributes), pair.id_attribute, ""),
        attribute_name_value      = lookup(jsondecode(user.custom_profile_attributes), pair.non_id_attribute, ""),
        prefix                    = pair.prefix,
        # Strip everything except letters and digits. The pattern is wrapped in
        # slashes so that replace() treats it as a regular expression.
        attribute_name_normalized = replace(lookup(jsondecode(user.custom_profile_attributes), pair.non_id_attribute, ""), "/[^a-zA-Z0-9]/", ""),
        attribute_name_lowercase  = lower(replace(lookup(jsondecode(user.custom_profile_attributes), pair.non_id_attribute, ""), "/[^a-zA-Z0-9]/", ""))
      }
      if lookup(jsondecode(user.custom_profile_attributes), pair.id_attribute, "") != ""
      && lookup(jsondecode(user.custom_profile_attributes), pair.non_id_attribute, "") != ""
    ]
  ]))
}

resource "okta_group" "dynamic_org_structure_groups" {
  for_each = {
    for tuple in local.generate_tuples : "${tuple.attribute_name}_${tuple.attribute_id_value}" => tuple
  }

  name        = "${each.value.prefix}-${each.value.attribute_name_lowercase}"
  description = "Contains all Employees located in the ${each.value.attribute_name_value} ${each.value.level} Organizational Structure."
  custom_profile_attributes = jsonencode({
    "adminNotes"   = "Created by TF - your description here",
    "groupOwner"   = "HR",
    "groupDynamic" = true
  })
}

resource "okta_group_rule" "dynamic_org_structure_group_rules" {
  for_each = {
    for tuple in local.generate_tuples : "${tuple.attribute_name}_${tuple.attribute_id_value}" => tuple
  }

  name              = substr("TF - Rule for ${each.value.prefix}-${each.value.attribute_name_lowercase}", 0, 49)
  status            = "ACTIVE"
  group_assignments = [okta_group.dynamic_org_structure_groups[each.key].id]
  expression_type   = "urn:okta:expression:1.0"
  expression_value  = "user.${each.value.attribute_name_id}==\"${each.value.attribute_id_value}\""
  users_excluded    = []
}

Info

You will see some custom Group Schema Attributes; you could remove them from the code, add them to your local group schema, or edit them to suit your needs.
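
Before applying, it can help to preview the names the configuration will generate. The sketch below is an optional addition, not part of our original code: the local and output names are hypothetical, and it relies only on `local.generate_tuples` from the complete code above. It also flags a case worth watching for, where two different IDs normalize to the same group name.

```hcl
locals {
  # Collect every ID that would produce a given group name.
  # The trailing "..." enables grouping mode, so repeated names
  # gather into a list instead of raising a duplicate-key error.
  ids_by_group_name = {
    for tuple in local.generate_tuples :
    "${tuple.prefix}-${tuple.attribute_name_lowercase}" => tuple.attribute_id_value...
  }

  # Group names that more than one ID maps to
  colliding_group_names = {
    for name, ids in local.ids_by_group_name : name => ids
    if length(ids) > 1
  }
}

# Sorted list of every group name the for_each loop will create
output "planned_group_names" {
  value = sort(keys(local.ids_by_group_name))
}

# Empty when every group name belongs to exactly one ID
output "colliding_group_names" {
  value = local.colliding_group_names
}
```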

Now, if we ever need to prevent the destruction of any of these groups or group rules, we can add the following to the resource blocks:

  lifecycle {
    prevent_destroy = true
  }

This gives us a fully automated group management system whose groups can only be destroyed when we deliberately allow it. If only certain group hierarchies need protection, we can add a true/false option to the mapping for each level and filter on it. Because Terraform only accepts literal values inside lifecycle blocks, that filter has to select between separate resource blocks rather than toggle prevent_destroy per group.
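
One way to make protection selective is to split the groups across two resource blocks and filter each one on a flag. This is a minimal sketch under stated assumptions: `protect` is a hypothetical field that would be added to each level in the mapping and carried into every tuple, and `example_tuples`, `protected_groups`, and `unprotected_groups` are illustrative names that do not appear in our code.

```hcl
locals {
  # Hypothetical tuples; "protect" is an assumed extra field,
  # not part of the mapping shown earlier
  example_tuples = [
    { key = "cc_1001", name = "companyName-cc-finance", protect = true },
    { key = "div_2001", name = "companyName-div-platform", protect = false },
  ]
}

# Groups for protected levels: any plan that would destroy one fails
resource "okta_group" "protected_groups" {
  for_each = { for t in local.example_tuples : t.key => t if t.protect }

  name = each.value.name

  lifecycle {
    prevent_destroy = true
  }
}

# Groups for unprotected levels: removed when their tuple disappears
resource "okta_group" "unprotected_groups" {
  for_each = { for t in local.example_tuples : t.key => t if !t.protect }

  name = each.value.name
}
```

Note that flipping `protect` for an existing group moves it from one resource block to the other, so the state would need a matching move to avoid a destroy and recreate.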

Additionally, we can always switch this from Group Rules, which have their own scaling problems within Okta, to manual group assignment, as we have all of the user data. If we do this, we must swap out the group rule assignments and create a more complex assignment system using the okta_group_membership resource. This will be covered in a different blog post once we approach that.
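
As a rough preview of what that swap might look like, the sketch below replaces the group rule with direct assignment. It is an assumption-laden outline, not our implementation: it uses the Okta provider's plural `okta_group_memberships` resource rather than the singular one named above, the resource name `dynamic_org_structure_memberships` is made up, and it reuses `local.all_users`, `local.generate_tuples`, and the `okta_group` resource from the complete code.

```hcl
resource "okta_group_memberships" "dynamic_org_structure_memberships" {
  # Same keys as the okta_group resource, so each.key lines up with a group
  for_each = {
    for tuple in local.generate_tuples : "${tuple.attribute_name}_${tuple.attribute_id_value}" => tuple
  }

  group_id = okta_group.dynamic_org_structure_groups[each.key].id

  # Every user whose ID attribute matches this tuple's ID value
  users = [
    for user in local.all_users : user.id
    if lookup(jsondecode(user.custom_profile_attributes), each.value.attribute_name_id, "") == each.value.attribute_id_value
  ]
}
```

With this approach membership only changes when Terraform runs, whereas group rules are evaluated by Okta as profiles change.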

Update

Since writing this, we have improved a few things. For example, what if you need to also manually write in new business lines and teams?

variable "manual_entries" {
  # Hand-maintained entries; empty by default
  default = []
}

That is all that is needed at the top; you then concatenate the manual list of entries into the existing structure. This way, if a new business line is added before it shows up in the user data, we can enter it manually, then remove the manual entry once the source data catches up and let the rest of the structure keep it updated automatically.

I won’t be able to post the full list of updates we have made, but this should be sufficient to get you started.
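
The concatenation could look something like the sketch below. It is a guess at one workable shape rather than our exact code: it gives the variable an explicit type that assumes manual entries carry the same fields as the generated tuples, and `all_tuples` is a hypothetical local that the `for_each` expressions would then read instead of `local.generate_tuples`.

```hcl
# Typed version of the variable; the object shape is an assumption
# that mirrors the fields of the generated tuples
variable "manual_entries" {
  description = "Hand-written tuples for business lines not yet present in the user data"
  type = list(object({
    level                     = string
    attribute_name_id         = string
    attribute_name            = string
    attribute_id_value        = string
    attribute_name_value      = string
    prefix                    = string
    attribute_name_normalized = string
    attribute_name_lowercase  = string
  }))
  default = []
}

locals {
  # Generated tuples plus manual ones; distinct() drops a manual entry
  # that is identical to a generated one
  all_tuples = distinct(concat(local.generate_tuples, var.manual_entries))
}
```

A manual entry that shares an ID with a generated tuple but differs in any other field would produce a duplicate `for_each` key, which is one more reason to remove manual entries once the source of truth includes them.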

And that is it

Thanks for taking the time to read this. Be on the lookout next week for another blog post about automating and managing core Okta groups.

A lot will be covered over the next several parts, which together sum up how we have terraformed certain pieces of our Okta environment. If you have questions and are looking for a community resource, I highly recommend reaching out in #okta-terraform on the MacAdmins Slack, as I would say at least 30% (note, I made this statistic up) of the organizations using Terraform hang out in this channel. Otherwise, you can always find an alternative unofficial community for assistance or ideas.