What is a Landing Zone?
According to Google’s best practice and reference designs, a landing zone provides an Infrastructure-as-Code (IaC) driven approach to creating an organisation in GCP, and for subsequently deploying tenants and their resources to this GCP organisation. It enforces best practices and automation, whilst delegating appropriate levels of resource control to tenants.
Side note: What is a tenant? A tenant is an independent consumer of the platform. The tenant could be an individual, but is more likely to be a team that is responsible for an application, or a related set of applications.
Why Do We Want One?
Landing zones are a way for an organisation / enterprise to build its Google Cloud environment in a structured and consistent way, following a load of proven best practice. They ensure that all the tenants that run on the landing zone avoid re-inventing the wheel, use appropriate shared components, adhere to agreed policies, and only build their environments using approved IaC routes. The benefits include:
- Avoidance of unmanaged (Google) project sprawl, because projects are deployed within a standard “tenant” folder, with standard naming conventions, and with a standardised approach to labelling resources.
- Avoidance of unnecessary complexity, because all solutions align to a prescribed way of doing things.
- Speed, through automation and repeatability.
- Reliability, through automation, through immutable infrastructure, and through providing a common approach to monitoring, logging and alerting.
- Security. Your solutions are all hosted in an environment that provides a common approach to security, access control, and patching. This gives you a stronger security posture, and makes compliance easier to achieve.
LZ In More Detail
When you deploy a landing zone (LZ), you get…
- A repeatable and consistent way to deploy cloud services in your enterprise, using a standardised set of tools and infrastructure-as-code (IaC). Or, to put it another way… it stops everyone reinventing the wheel, and limits product sprawl. Curated and endorsed design blueprints are implemented as IaC.
- IaC capability in the form of:
  - A tenant factory: i.e. the ability to create a top-level folder for a tenant, and an associated service account for the tenant.
  - A project factory: i.e. the ability for each tenant to create its own projects, using its own service account.
  - Resources can only be deployed using the service account and IaC. The exception is sandbox projects, where experimentation is possible.
  - All driven by a CI/CD toolchain.
- Enforced use of infrastructure automation in order to ensure consistency and agility, to prevent configuration drift, and to align to the principles of automation and immutable infrastructure. Or, to put it another way… It means you get what you expected, every time! If you let engineers build environments using the Console, then you’ve lost any shot at consistency, and you’ve eliminated much of the benefit of using cloud. The LZ ensures that your resources can only be deployed using IaC, via service accounts.
- A complete organisational hierarchy that supports:
  - The creation of multiple tenants, where each tenant is a consumer of the platform with a clearly defined boundary, isolated from other tenants, and with autonomy to create and manage their own resources (to an appropriate degree).
  - The creation of (Google Cloud) projects within each tenancy.
- In a typical organisation resource hierarchy for an enterprise landing zone, top-level infrastructure management interests (i.e. the purview of the “Platform Team”) are separated into a folder called “org-mng”, whereas all tenant resources are deployed under a top-level folder called “LZiaB”. (This is just LZ-in-a-Box!) Under that, Prod is for production workloads, Non-PRD is for any non-prod workload (e.g. QA, Integration, Perf, etc), and Sandbox is for experimentation. Sandbox is exempt from the rule that resources can only be created using IaC and service accounts.
- A default set of organisation policies, which are aligned to best practices, and inherited down the resource hierarchy. These are used to enforce security policies. Typical policies might include:
  - Preventing the creation of “default” networks. (Because most organisations don’t need a default subnet in every region!)
  - Preventing the use of external IP addresses on compute instances. (Because this is a major security risk.)
  - Enforced use of OS Login, to ensure that SSH access to instances is done using IAM-controlled Google identities and with secure management of SSH keys.
  - Only allowing use of approved CIS-compliant shielded VMs.
- A predefined network topology.
  - For example, we might create a network topology that relies on a shared VPC, where solutions are deployed within service projects.
  - Or we might create a network topology that leverages the hub-and-spoke pattern. With hub-and-spoke, we provide a shared VPC as a hub, but tenants deploy resources to separate spoke VPCs. These VPCs are connected to the hub using network peering, or VPN.
  - The shared VPC component of the network topology hosts resources that need to be shared by multiple tenants running on your LZ. For example:
    - Any dedicated or partner interconnect that you may choose to implement, in order to provide private high-bandwidth low-latency RFC1918 communication to your on-premises network.
    - Any centralised network security controls, e.g. ingress and egress patterns.
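To make a couple of those bullets a bit more concrete, here’s a minimal Terraform sketch of the default organisation policies and of a shared VPC attachment. Treat it as an illustration under assumptions rather than a definitive implementation: the variable names (`org_id`, `host_project_id`, `tenant_project_id`) are hypothetical, and the choice of constraints simply mirrors the typical policies listed above.

```hcl
# Hypothetical inputs; the names are assumptions made for this sketch.
variable "org_id" {
  type = string # numeric organisation ID
}

variable "host_project_id" {
  type = string # project that owns the shared VPC
}

variable "tenant_project_id" {
  type = string # a tenant's service project
}

locals {
  # Boolean constraints that are simply switched on for the whole organisation.
  # Note: requireShieldedVm only enforces Shielded VM. Restricting tenants to
  # approved images would need a further constraint, such as
  # compute.trustedImageProjects.
  boolean_constraints = toset([
    "compute.skipDefaultNetworkCreation", # no "default" networks
    "compute.requireOsLogin",             # enforce OS Login
    "compute.requireShieldedVm",          # only Shielded VMs
  ])
}

resource "google_organization_policy" "boolean" {
  for_each   = local.boolean_constraints
  org_id     = var.org_id
  constraint = each.value

  boolean_policy {
    enforced = true
  }
}

# Deny external IP addresses on all compute instances.
resource "google_organization_policy" "no_external_ip" {
  org_id     = var.org_id
  constraint = "compute.vmExternalIpAccess"

  list_policy {
    deny {
      all = true
    }
  }
}

# Shared VPC: enable the host project, then attach a tenant service project.
resource "google_compute_shared_vpc_host_project" "host" {
  project = var.host_project_id
}

resource "google_compute_shared_vpc_service_project" "tenant" {
  host_project    = google_compute_shared_vpc_host_project.host.project
  service_project = var.tenant_project_id
}
```

Because organisation policies are inherited down the resource hierarchy, setting them once at the organisation level covers every tenant folder and project beneath it.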
TL;DR… It allows an organisation to get speed, repeatability, consistency, security, common experience, and a bunch of best practice.
How To Build an LZ?
There are a couple of ways you can do this.
Roll Your Own
Here I’ve created a simple demo where I create a very basic landing zone from scratch, using my own custom Terraform. It produces a simple organisation hierarchy, and is built in the following steps:
- The initial Google Cloud foundational steps are performed by hand.
- Then we run some bash scripts to create the Tenant / Project Factory.
- Then we use Terraform (TF) to deploy shared services and the CI/CD pipeline.
- Finally, we use the TF Tenant / Project factory to spin up a sample tenant, with a sample application.
It’s a bit primitive, and there are better ways!
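For a flavour of what a tenant / project factory boils down to, here’s a minimal Terraform sketch. It is not the code from the demo, just an illustration under assumptions: the variable names, the `sa-<tenant>` and `<tenant>-app-prod` naming conventions, and the label keys are all hypothetical.

```hcl
# Hypothetical inputs; the names are assumptions made for this sketch.
variable "parent_folder" {
  type = string # e.g. the Prod folder, in the form "folders/<id>"
}

variable "seed_project_id" {
  type = string # project that hosts the tenant service accounts
}

variable "billing_account" {
  type = string
}

variable "tenant" {
  type    = string
  default = "team-a"
}

# Tenant factory: one folder and one automation service account per tenant.
resource "google_folder" "tenant" {
  display_name = var.tenant
  parent       = var.parent_folder
}

resource "google_service_account" "tenant" {
  project      = var.seed_project_id
  account_id   = "sa-${var.tenant}"
  display_name = "IaC service account for tenant ${var.tenant}"
}

# Let the tenant's service account create projects, but only in its own folder.
resource "google_folder_iam_member" "project_creator" {
  folder = google_folder.tenant.name
  role   = "roles/resourcemanager.projectCreator"
  member = "serviceAccount:${google_service_account.tenant.email}"
}

# Project factory: a project in the tenant folder, with a standard name and labels.
# Project IDs are globally unique, so a real convention would likely add a prefix or suffix.
resource "google_project" "app" {
  name                = "${var.tenant}-app-prod"
  project_id          = "${var.tenant}-app-prod"
  folder_id           = google_folder.tenant.name
  billing_account     = var.billing_account
  auto_create_network = false

  labels = {
    tenant      = var.tenant
    environment = "prod"
  }
}
```

In a real factory the project would be created by a pipeline running as the tenant’s service account, which would also need permission to link the billing account (for example, `roles/billing.user`).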
Use Google’s Terraform Example Foundation
Google have created many Terraform modules as part of the Cloud Foundation Toolkit (CFT). The Terraform Example Foundation (TEF) is used to create an LZ, by composing many of these modules in a configurable way, and by setting a number of programmatic defaults.
The TEF is designed to be executed by following a number of documented steps, in order:
- Bootstrap → Seeding project (TF hosting and service accounts) + CI/CD Pipeline
- Organisation → Folder and project hierarchy
- Environments → Folder and projects for monitoring, secrets, shared and networking
- Networking → E.g. Shared VPC or Hub-and-Spoke, FW, etc.
- Tenants → Sample folders and service projects
But before you jump in and start implementing that…
Google’s Cloud Foundation Fabric and Fabric FAST
The Cloud Foundation Fabric provides a set of Terraform modules, a landing zone blueprint, a set of reference blueprints to achieve certain goals (e.g. working with Cloud SQL, Dataflow and BigQuery, GKE, etc), and FAST.
FAST is a production-ready landing zone blueprint implementation. It is a Terraform-based solution to bootstrapping and building a GCP LZ, from scratch.
It has a load of predefined defaults, which can easily be configured and overridden. FAST is more automated than the TEF and has fewer manual steps to trip you up. Additionally, FAST has been seeing much stronger contributions of late. This makes me think that TEF is effectively deprecated, in favour of FAST.
So, if you want to bootstrap and build your own Google Cloud LZ, I’d recommend using FAST.