Tooling

Terraform

Terraform is used to automate and scale infrastructure through IaC ( Infrastructure as Code ). In short, it's an abstraction layer for interfacing with service APIs using the HCL configuration syntax, which is very similar to JSON. Many provider plugins supply that API layer and can be used to create infrastructure and manage those services. This makes it easier to manage, as well as consume, information about infrastructure and its various components. Having more accessible information about infrastructure opens doors to automation and modularization of all resources managed by Terraform.

If a provider doesn't exist that fits a use case or service, you can always write your own plugin.
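
As a minimal sketch of what this looks like, here is a hypothetical provider plus resource in HCL ( the provider, region, and resource names are illustrative, not configuration from this repository ):

# Illustrative HCL only — names and values are placeholders.
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # always pin a provider version ( see Tips & Best practice )
    }
  }
}

provider "aws" {
  region = "us-east-1" # illustrative region
}

# Each resource block maps onto an object managed through the provider's API layer.
resource "aws_s3_bucket" "example" {
  bucket = "example-bucket-name" # illustrative bucket name
}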

Prerequisites

To make good use of this repository and what it can offer, you must meet these requirements and have these tools installed, with working knowledge or expertise of their usage.

Toolset

Required

Optional

Container tools

These tools are installed inside the container and can be leveraged in automation.

If you don't have those tools or aren't going to write a lot of Terraform, see the small change(s) section.

Skill Requirements

Aside from knowing how to use the tools listed in the prerequisites section, you'll be expected to understand the services you are working with and how they are used, as well as how to resolve any issues that come up in the development lifecycle. This includes manually validating Terraform against actual state in the service, finding resolutions to conflicts in service resources, debugging permissions, etc. If you don't feel comfortable, ask for help. You can reach out to #web-infrastructure-op, @cloudops, or @devops in Slack.

Reading Material

Working Concepts

The working concepts described within are critical to maintaining the foundation which makes this project ( and ourselves ) successful in our infrastructure automation operations.

Self service

This project is meant to be enjoyed by all and should embody the premise of self service as much as possible through a solid GitOps workflow. This empowers people by giving them the insight and control to get things done in a way that enriches everyone involved, directly and indirectly.

For instance, if you need to give yourself access to something or change a resource, you simply open a pull request, as in the sketch below.
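
Since parts of this repository manage GitHub org membership ( see the github-org directories mentioned later ), a hedged example of such a self-service change might look like the following HCL ( the resource addresses and username are hypothetical ):

# Hypothetical self-service change: add yourself to an existing team
resource "github_team_membership" "your_github_username" {
  team_id  = github_team.example_team.id # assumes a team resource already exists
  username = "your-github-username"
  role     = "member"
}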

Repository Structure & key files

Terraform
├── .github # For managing codeowner groups for specific parts of the codebase
├── helpers # Bin scripts which are used in the container
├── {service}
│   ├── {sub-service}
│   │   └── docs # Automated doc generation using 
├── .git-secrets # Regex strings/rules can be tweaked here to ensure that secrets don't sneak their way into version control
├── ci.sh # The runner used for Terraform and its tooling for the entire development lifecycle
├── Dockerfile # The primary container and tooling for this repository
├── Jenkinsfile # Configures and runs the Jenkins job for this repository; outlines every step in the development lifecycle
├── lefthook.yml # Automates steps in the workflow for ease of use
└── README.md # This doc ( the one you are reading ) contains the overall project documentation

Docker image

The docker image is an alpine linux based image built from the official hashicorp Terraform docker image, with a few select tools added for ease of use. Alpine is a lightweight linux distribution which is great for security and containerization.

Environments

PMC has the requirement to maintain multiple cloud environments simultaneously. Each environment is defined in the Jenkinsfile and is configured per AWS account. Each directory of Terraform configuration references an environment setup script in helpers which matches the name of the AWS account.

Each environment also runs with its own set of credentials inside of the container, so that there's no collision or conflict when context switching between accounts once the container has already been started.

GitOps workflow

GitOps is simply a workflow centered around git operations using {git|web}-hooks. It's a process which streamlines the lifecycle of a feature or bug in such a way that everything can be tracked down to commits and other meta information. It's a flexible, no nonsense approach to getting things done efficiently, saving time and taking a lot of the guesswork and mystery out of the process.

Development Lifecycle

The development lifecycle of a feature for this repository uses plans on feature branches and an apply on merge to main. Once the PR is approved you can merge into main. For environments that have a staging or development environment, the changed resources will be applied to those environments as well.

Once the pull request for a feature branch has passed the status checks for its plan action, a pull request against main can be opened.

If the status checks for a branch don't pass, then it's up to you to resolve, and/or coordinate with ops to resolve, any conflicts before a merge is possible.

If status checks pass and the pull request against main has been created, then you may merge your feature into main.

State

All state is stored, or should be configured to be stored, remotely in S3. These state files are versioned and specific to each run. Anytime you are working with state it's in a remote sense, so you will need access to AWS in order to interact with state.
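
As a hedged sketch, a remote S3 backend block looks like this ( bucket, key, and region are placeholders; use the values already configured in the directory you're working in ):

terraform {
  backend "s3" {
    bucket = "example-terraform-state" # placeholder bucket name
    key    = "service/sub-service/terraform.tfstate" # placeholder state path
    region = "us-east-1" # placeholder region
  }
}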

TFENV

tfenv is used for all configurations of Terraform in this repository. The majority of things are run with the latest default version defined in .terraform-version. This ensures most of our Terraform is run consistently and is up to date. In certain circumstances you can override that version by adding a .terraform-version inside of the directory you're working in, and it will use that version instead. This is helpful when certain Terraform lags behind or you need to quickly validate or test a different version on a particular section of code.
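
For example, to pin one directory to an older version ( the directory and version number here are illustrative ):

# Pin a single directory to a specific Terraform version ( values illustrative )
echo "0.13.7" > fastly/example-site/.terraform-version
# tfenv picks this file up automatically the next time Terraform runs in that directory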

Working process

The working process is going to be different depending on your needs and the level of involvement you want/need to undertake. For small changes you can work entirely within GitHub, while large changes will have a certain level of needs to be met before you can be effective.

Docker container

Each environment/account, a.k.a. TF_ENV, is run in a container self isolated from the other environments. This removes any chance of a mishap where one might conflict with other accounts, so each of those can be run simultaneously; they are run on the pre-commit hook defined in lefthook.yml.


Small change(s) 🐤

If you have just a small set of changes to make and don't want to worry about access or even cloning this repository, then there is a way to make this as painless as possible.

  1. Login to GitHub
  2. Navigate to file and click edit
    1. You can edit files directly with a URL like these:
      1. https://github.com/penske-media-corp/terraform/edit/main/dns/example/cname.tf
      2. https://github.com/penske-media-corp/terraform/edit/{branch-name}/{path-to-file}
  3. Submit PR and wait for both CI to complete and a review
  4. The team(s) responsible for the code will be automatically notified via Slack and added to your Pull Request according to ownership defined in .github/CODEOWNERS
    1. If you want to update/create a team of your own codeowners feel free to do so by simply updating the CODEOWNERS file and adding/removing members to teams defined in github-org/penske-media-corp or github-org/sheknows
  5. If your Terraform passes plan on your branch then it can be merged into main almost immediately once it's passed status checks
  6. Merge to main 🥳

Large change(s) 🐓

  1. Clone this repo
    1. git clone https://github.com/penske-media-corp/terraform.git && cd terraform
  2. Install lefthook githooks ( optional, but skipping them is not as much fun or as easy to work with )
    1. lefthook install
      1. If you don't want to use lefthook or you don't like its configuration, you may choose to either not install it or override the yml config with your own.
  3. Create a feature|bug|fix|etc.
    1. git checkout -b ...
  4. Write your terraform
    1. Write the HCL for the changes you need
      1. If you have general questions, most of the Terraform documentation is helpful for that
  5. Test your changes
    1. Access
      1. If you don't have keys or tokens for AWS then you cannot test the plan without pushing to your branch, which can then be validated in Jenkins. This is reasonable and is achieved by committing often while polling your branch in Jenkins for pushes, i.e. https://jenkins.sheknows.io/job/penske-media-corp/job/terraform/job/{BRANCH_NAME}/
      2. Take note of the current version, or the version of Terraform being run, for the Terraform you are working with
    2. GitHooks
      1. If you've installed lefthook and aws-vault then all you need to do is git commit -m "your message" and it will run against a diff on TF_DIFF_BRANCH ( main=default ), running only the Terraform you've been working with. If all of the checks pass and the plan looks ok then all that's left to do is git push to your branch.
      2. If you have not installed lefthook or want to run things manually, you will have to emulate the commands in the lefthook.yml file ( see the sketch after this list )
  6. Submit a PR by pushing your branch.
    1. If anything is not apparent or needs details, such as "please merge by..." or "infra does xyz", then please add those to the PR details for the reviewer.
    2. Review your own Terraform plan by clicking the details section in the PR comments. You may need to request RO access from Ops for Jenkins
    3. Once the plan and the code have been reviewed, the reviewer will indicate whether or not you are ready to merge the PR
    4. Merge the PR
    5. terraform apply is run for the main branch
      1. If issues come up on a merge where Terraform doesn't have permission or a resource is in conflict then it will need to be manually resolved either by yourself or Ops
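
As a hedged sketch of emulating the pre-commit hook manually ( the real commands live in lefthook.yml; this only illustrates the diff-then-plan idea, and assumes Terraform 0.14+ for -chdir ):

# Rough emulation of the pre-commit flow ( see lefthook.yml for the real commands )
# Find directories with Terraform changes relative to TF_DIFF_BRANCH ( default: main )
git diff --name-only "${TF_DIFF_BRANCH:-main}" -- '*.tf' | xargs -n1 dirname | sort -u |
while read -r dir; do
  terraform -chdir="$dir" fmt -check   # formatting check
  terraform -chdir="$dir" validate     # configuration validation ( requires init )
  terraform -chdir="$dir" plan         # review the plan before pushing
done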

Running in bulk

Sometimes you want to run things in bulk when there are no git diff changes. This could be to validate state files, or to apply changes in bulk when editing core templates/modules. For that there is a BULK_PATH env var which defaults to ./. Say, for instance, there's a core change in fastly/vcl: by default there's no script to apply it to all sites that use that configuration, because it's a template and has no state, so it will not run by itself.

  • To run all fastly configs you would simply run export BULK_PATH=./fastly && ./ci.sh pmc-it
  • To run for a particular fastly site you may run something like export BULK_PATH=./fastly/rollingstone && ./ci.sh pmc-it
  • To run for all dns on an account you may run something like export BULK_PATH=./dns && ./ci.sh pmc-it

Tips & Best practice

  • DO NOT alter the state of the container runtime environment variables in any of the bin scripts, such as TF_COMMAND or AWS_ACCESS_KEY_ID; this can be dangerous
  • Don't expect user input via variables at runtime; instead expect that the configuration should be run in automation without user interaction
  • Each stage of the Jenkinsfile has its environment defined, i.e. AWS_ACCESS_KEY_ID
  • Ensure that when working with a provider that you always specify a version
  • Ensure you have a VPN tunnel setup and working as a lot of the configurations use consul for keys or secrets
  • Environment defined variables should NOT be altered or changed inside of the docker-compose file itself; instead override them using the -e flag when running docker-compose ... ( see the sketch after this list )
  • Gather your secrets for the environment you need. You can use the Jenkinsfile for reference on which variables you need
  • If importing existing infrastructure for AWS, this tool https://github.com/dtan4/terraforming#run-as-docker-container- seems to get the job done and is included in our docker container for Terraform
  • If writing a shell script which is utilized by Terraform, it must be in sh format and not bash; if the logic available in sh is not enough then the script is too complicated
  • State, even when working locally, is stored remotely in S3
  • Think about the resources you're creating and the order you create them in
  • Use default environment configuration when available, such as the TF_* defaults that come with Terraform as well as provider environment vars such as GITHUB_TOKEN, instead of introducing a new var into the env such as GH_TOK
  • .gitattributes contains files which are locked with git lfs. More information can be found here
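
As a hedged example of the docker-compose override mentioned above ( the service name "terraform" and the variable are illustrative ):

# Override an env var for a single run instead of editing docker-compose.yml
docker-compose run -e TF_DIFF_BRANCH=my-branch terraform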

Access

Before you're able to run Terraform at all you need credentials for AWS to read/write the remote state. Ensure that the credentials you're working with are specific to running Terraform and not your personal credentials. You may also need API keys for other services that you're working with. Please refer to the docs directory within the Terraform that you're working with.

AWS

If you aren't planning on testing locally or worrying about credentials then you DO NOT need credentials, and everything can be run through webhooks in GitHub. However, if you are developing a feature, importing resources, or making a larger set of changes, then having access and running in development is in your best interest, so set up an AWS user with readonly access for plans. If you need to do more than run a plan, such as deploying or importing resources, then you will need a higher level of access. For access which is required outside of automation, please reach out in #web-infrastructure-op or #all-it to help point you in the right direction.

AWS Vault ( optional )

It's recommended that you use aws-vault to manage your AWS account credentials. This ensures more secure switching between accounts without compromising sensitive login information. You can then specify the environment to run on. If you need to work with multiple account credentials, spin up a separate TF container for each AWS account.
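
For example, a hedged sketch of running the CI wrapper through aws-vault ( the profile name is a placeholder for whichever account profile you've configured ):

# Supply short-lived credentials from aws-vault for a single run
aws-vault exec pmc-it -- ./ci.sh pmc-it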

Access

As a basic requirement, you will need service API keys as well as AWS access keys for running Terraform. You may obtain RO keys for the AWS terraforming user from Ops. AWS access is broken down into multiple accounts: SK, PMC, & IT, with dev/prod accounts for each. When setting up your account consider using an OTP or virtual MFA for the account.

Git Hooks and secrets

Secrets should never be committed into git, ever. There is a lefthook configuration available which uses git secrets to detect when sensitive credentials are added to git. To make that available, install both lefthook and git secrets and it should just work after that.

If you notice a secret pattern to add, please follow the guide on their github page to add a pattern. There is also already a file for it, .git-secrets, and a sketch of adding a pattern follows.
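
As a hedged example ( the pattern itself is illustrative ):

# Add a custom prohibited pattern, then scan the repo for matches
git secrets --add 'example_api_key\s*=\s*.+'
git secrets --scan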

Fixing broken or adding new run scripts for CI

Each directory with an AWS account script has a symlinked shell script from the helpers directory for each account it supports infrastructure on. These are fairly generic, and if that doesn't meet the use case then they can be easily overridden by adding a shell script with the same name as the one you want to overwrite. Existing but broken infrastructure files exist as prefixed files fix-* and can be fixed one by one by creating a branch, removing the prefix, and pushing to the repository, as sketched below.
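
A hedged sketch of fixing one such file ( the filename is hypothetical ):

# Fix a single broken run script by dropping the fix- prefix ( filename hypothetical )
git checkout -b fix/pmc-it-run-script
git mv fix-pmc-it.sh pmc-it.sh
git commit -m "Fix pmc-it run script" && git push -u origin HEAD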

State validation and downstream manual runs

To validate state on a particular directory you can build the job with params. There are two params relevant to state validation. The first is a checkbox for state validation that, when checked, will essentially run a plan. The second is the BULK_PATH variable. By default the BULK_PATH variable is set to ./, which is the root of the repository, and will run against all accounts on all directories. To target a particular directory for state validation, ensure that you've checked the STATE_VALIDATION option so that it runs a plan, and set BULK_PATH to something like ./github-org/wpcomvip; it will then run against that directory and any subdirectories.

Ensure that the params are correct for what you want to run, or it could have an adverse effect that's irreversible or at least causes a 911.

VALIDATE_STATE == true → terraform plan
BULK_PATH == ./ → repository root

To run an apply you would leave the VALIDATE_STATE checkbox blank and set a path such as ./fastly or ./github-org/wpcomvip. It won't run an apply against the root of the repo, which is a safety of sorts.

The relevant logic is here and here

The logic here is that everything by default runs a plan, and environment state is never altered once the container is started. This ensures that a logic error between plan & apply doesn't happen, or at least is less likely to happen by accident.