This post is what I wish I had read before replacing SSH access to the bastion in my Terraform project with SSM & EC2 Instance Connect.
Motivation
I'm a bit paranoid about security. Not as paranoid as Amelia, who's a cybersecurity consultant, but paranoid nonetheless.
I didn't always care about security, though; it took getting burnt.
Seven years ago, while working for an early-stage startup, I woke up one day to an email from DigitalOcean: one of our servers had been compromised and shut down by their team.
An attacker had identified an insecure port, gained access to the server, and enrolled it in a botnet by connecting it to a C&C server.
I was new to managing servers then; DigitalOcean didn't have their firewall feature yet, and I didn't know about firewalls anyway.
Determined to protect our servers from future attacks, I spent the next few days learning about security: iptables, firewalls, and general server hardening.
I've come a long way from that self-taught server-hardening crash course, but I still have a strong drive for security, and that's what motivated this exercise.
My use case
I had a terraform configuration with:
- An ECS cluster in a private subnet behind an ALB.
- A Postgres RDS instance in a private subnet.
- A bastion in a public subnet.
With this configuration, I would connect to the database and EC2 instances using SSH tunneling via the bastion.
The networking module
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 2.69.0"
name = "staff-advocacy-vpc"
cidr = var.vpc_cidr
azs = var.availability_zones
# ALB & Bastion
public_subnets = var.public_subnets_cidr
# ECS CLUSTER
private_subnets = var.private_subnets_cidr
# RDS CLUSTER
database_subnets = var.database_subnets_cidr
# DNS
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Terraform = "true"
Environment = var.environment
}
}
The database module
locals {
  create_test_resources = true
}

module "db" {
  source  = "terraform-aws-modules/rds/aws"
  version = "2.20.0"

  vpc_security_group_ids = [module.postgres_security_group.this_security_group_id]
  create_db_subnet_group = false
  db_subnet_group_name   = local.create_test_resources ? var.subnet_group_name : ""

  username   = var.database_username
  password   = var.database_password
  port       = var.database_port
  identifier = var.identifier
  name       = var.database_name

  engine         = "postgres"
  engine_version = var.database_engine_version

  create_db_option_group    = false
  create_db_parameter_group = false

  allocated_storage  = var.db_storage
  instance_class     = var.db_instance_class
  maintenance_window = var.db_maintenance_window
  backup_window      = var.db_backup_window

  tags = {
    Terraform   = "true"
    Environment = var.environment
  }
}
module "postgres_security_group" {
source = "terraform-aws-modules/security-group/aws//modules/postgresql"
version = "~> 3.0"
name = "${var.identifier}-sg"
vpc_id = var.vpc_id
# using computed_* here to get around count issues.
ingress_cidr_blocks = var.vpc_cidr_block
computed_ingress_cidr_blocks = var.vpc_cidr_block
number_of_computed_ingress_cidr_blocks = 1
ingress_rules = ["postgresql-tcp"]
egress_cidr_blocks = ["0.0.0.0/0"]
egress_rules = ["http-80-tcp", "https-443-tcp"]
tags = {
Terraform = "true"
Name = "${var.environment}-rds-sg"
Environment = var.environment
}
}
The bastion module
module "bastion" {
source = "terraform-aws-modules/ec2-instance/aws"
version = "2.16.0"
ami = "ami-0e2e14798f7a300a1"
name = var.name
associate_public_ip_address = true
instance_type = "t2.small"
vpc_security_group_ids = [module.bastion_security_group.this_security_group_id]
subnet_ids = var.vpc_public_subnets
key_name = var.bastion_key_name
}
module "bastion_security_group" {
source = "terraform-aws-modules/security-group/aws"
version = "3.1.0"
name = "${var.name}-sg"
vpc_id = var.vpc_id
ingress_cidr_blocks = ["0.0.0.0/0"]
ingress_rules = ["ssh-tcp"]
egress_cidr_blocks = ["0.0.0.0/0"]
egress_rules = ["postgresql-tcp", "http-80-tcp", "https-443-tcp"]
}
Fargate cluster
The full cluster module includes a lot more than the cluster and ALB resource definitions, but the rest is irrelevant here.
module "ecs_cluster" {
source = "terraform-aws-modules/ecs/aws"
name = "${var.name}-${var.environment}"
container_insights = true
capacity_providers = ["FARGATE", "FARGATE_SPOT"]
default_capacity_provider_strategy = [{
capacity_provider = "FARGATE"
weight = "1"
}]
tags = {
Environment = var.environment
}
}
resource "aws_lb" "main" {
name = "${var.name}-alb-${var.environment}"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.alb.id]
subnets = var.vpc_public_subnets
enable_deletion_protection = false
tags = {
Name = "${var.name}-alb-${var.environment}"
Environment = var.environment
}
}
resource "aws_lb" "main" {
name = "${var.name}-alb-${var.environment}"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.alb.id]
subnets = var.vpc_public_subnets
enable_deletion_protection = false
tags = {
Name = "${var.name}-alb-${var.environment}"
Environment = var.environment
}
}
resource "aws_alb_target_group" "main" {
name = "${var.name}-tg-${var.environment}"
port = 80
protocol = "HTTP"
vpc_id = var.vpc_id
target_type = "ip"
health_check {
healthy_threshold = "3"
interval = "30"
protocol = "HTTP"
matcher = "200"
timeout = "3"
path = var.health_check_path
unhealthy_threshold = "2"
}
tags = {
Name = "${var.name}-tg-${var.environment}"
Environment = var.environment
}
}
# Redirect HTTP traffic to the HTTPS listener
resource "aws_alb_listener" "http" {
  load_balancer_arn = aws_lb.main.id
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "redirect"

    redirect {
      port        = 443
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}

# Forward HTTPS traffic to the target group
resource "aws_alb_listener" "https" {
  load_balancer_arn = aws_lb.main.id
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-2016-08"
  certificate_arn   = var.alb_tls_cert_arn

  default_action {
    target_group_arn = aws_alb_target_group.main.id
    type             = "forward"
  }
}
To connect to my database, I use local SSH port forwarding:
# Run in the foreground
ssh -N ubuntu@<bastion_ip> -L 8888:rds-db.weoweio.ap-southeast-2.rds.amazonaws.com:5432
# Run in the background
ssh -Nf ubuntu@<bastion_ip> -L 9999:rds-db.weoweio.ap-southeast-2.rds.amazonaws.com:5432
This setup has served me well: with fail2ban, GuardDuty, and IP whitelisting via security groups, it works great.
Challenges with this configuration
- I have to keep OpenSSH updated to mitigate emerging attacks that target vulnerabilities in outdated versions.
- I have to manage SSH keys, and SSH key management is hard, especially with large teams.
- Auditing SSH sessions is painful.
- A public IP/DNS name on the bastion increases my attack surface.
The solution
EC2 Instance Connect
When I read about EC2 Instance Connect in 2019, a way to connect to an EC2 instance with temporary SSH keys, I was excited.
EC2 Instance Connect lets me manage SSH access via IAM.
It works by letting a user push a temporary public key to the EC2 instance via the CLI; the user then has 60 seconds to authenticate with the corresponding private key.
So instead of long-term SSH keys that live on the bastion, I can use temporary, disposable keys that are automagically discarded after 60 seconds.
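As a taste of the workflow, the ec2instanceconnectcli package (install instructions further down) wraps the whole dance in one command; the instance ID below is a placeholder:
# mssh generates a throwaway key, pushes it via EC2 Instance Connect,
# and opens an SSH session before the 60-second window closes
mssh ubuntu@i-0123456789abcdef0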
SSM
Using EC2 Instance Connect eliminates most of the problems listed above, except the last one.
Using the AWS Systems Manager AWS-StartSSHSession document, I can take security a notch higher by eliminating the public DNS name on the bastion.
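Under the hood this is just SSH's ProxyCommand pointed at the Session Manager plugin. A minimal ~/.ssh/config entry, following the pattern in the Session Manager docs (not part of my Terraform), looks like this:
# ~/.ssh/config — send SSH connections to EC2 instance IDs through SSM
host i-* mi-*
    ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
With that in place, a plain ssh ubuntu@i-... resolves through Session Manager, with no public IP or open port 22 involved.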
How to do this in Terraform
To do this in Terraform, I have to:
- Configure IAM permissions for the bastion — add an IAM instance profile that includes 2 policies:
- arn:aws:iam::aws:policy/EC2InstanceConnect
- arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
- Install EC2 Instance Connect on the bastion or update my AMI. I found that EC2 Instance Connect comes pre-installed on certain AMIs (Amazon Linux 2 from release 2.0.20190618, and Ubuntu from 20.04).
- Install the ec2instanceconnectcli package in my local environment (install commands after this list).
- Install the Session Manager plugin for the AWS CLI in my local environment.
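For reference, the installs look something like this; the pip package name is the real one, the Homebrew cask is one way to get the plugin on macOS (other platforms have installers in the AWS docs), and the apt package is for the bastion itself if the AMI doesn't bundle EC2 Instance Connect:
# EC2 Instance Connect CLI (provides the mssh wrapper)
pip install ec2instanceconnectcli
# Session Manager plugin for the AWS CLI (macOS/Homebrew shown)
brew install --cask session-manager-plugin
# Verify the plugin is on the PATH — prints an install-success message
session-manager-plugin
# On the bastion, only if the AMI doesn't ship it (Ubuntu package name)
sudo apt-get install -y ec2-instance-connect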
And some cleanup steps:
- Remove the SSH key pair on the bastion.
- Remove the SSH ingress rules from my bastion security group.
- Move the bastion from the public subnet to a private subnet.
The changes to the bastion module
module "bastion" {
source = "terraform-aws-modules/ec2-instance/aws"
version = "2.16.0"
ami = "ami-0e2e14798f7a300a1"
name = var.name
associate_public_ip_address = true
instance_type = "t2.small"
vpc_security_group_ids = [module.bastion_security_group.this_security_group_id]
# Move the bastion into a private subnet
subnet_ids = var.vpc_private_subnets
# remove the redundant ssh key pair
key_name = var.bastion_key_name
}
module "bastion_security_group" {
source = "terraform-aws-modules/security-group/aws"
version = "3.1.0"
name = "${var.name}-sg"
vpc_id = var.vpc_id
# remove redundant ssh ingress rule
ingress_cidr_blocks = ["0.0.0.0/0"]
ingress_rules = ["ssh-tcp"]
egress_cidr_blocks = ["0.0.0.0/0"]
egress_rules = ["postgresql-tcp", "http-80-tcp", "https-443-tcp"]
}
module "ec2_connect_role_policy" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role"
  version = "~> 3.7.0"

  role_name         = "${var.name}-ec2-connect-role"
  role_requires_mfa = false

  create_role             = true
  create_instance_profile = true

  trusted_role_services = ["ec2.amazonaws.com"]

  custom_role_policy_arns = [
    "arn:aws:iam::aws:policy/EC2InstanceConnect",
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
  ]
}
SSH Tunneling
To set up the SSH tunnel I:
- Generate a temporary SSH key.
- Use ec2-instance-connect to upload the key to the bastion.
- SSH to the server over Session Manager, using SSH ProxyCommand and the AWS CLI SSM plugin.
# Generate a temporary ssh key.
ssh-keygen -t rsa -f /ssh_key -N ''
# Use ec2-instance-connect to upload the key to the bastion
aws ec2-instance-connect send-ssh-public-key \
  --instance-id <instance_id> \
  --instance-os-user <os_user> \
  --availability-zone <az> \
  --ssh-public-key file:///ssh_key.pub
# SSH to the server over Session Manager using the AWS CLI SSM plugin
ssh <os_user>@<instance_id> -i /ssh_key -Nf \
  -L 9999:<rds_endpoint>:5432 \
  -o "StrictHostKeyChecking=no" \
  -o "UserKnownHostsFile=/dev/null" \
  -o ProxyCommand="aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters portNumber=%p --region <region>"
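Once the tunnel is up, the database answers on the local end of the forward; for example, with psql (credentials are whatever the RDS instance was created with):
# Connect to RDS through the local end of the SSM-backed tunnel
psql -h 127.0.0.1 -p 9999 -U <database_username> -d <database_name>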
My primary concern here is RDS access. I haven't needed to SSH into a Fargate-managed container in the ECS cluster, but I imagine it would work much the same way.