AWS Deployment
This guide covers deploying Opifex models on Amazon Web Services. Opifex provides Python modules for generating AWS infrastructure configurations -- EKS clusters, IAM roles, VPC networking, Secrets Manager, and CloudWatch monitoring. You use these modules to produce configuration dictionaries or export Terraform files, then apply them with your own infrastructure tooling.
Table of Contents
- Overview
- Prerequisites
- Container Image
- AWSDeploymentManager
- Kubernetes Manifests
- Deployment Workflow
- Monitoring
- Troubleshooting
Overview
The deployment infrastructure is provided as Python modules, not as ready-made YAML manifests or Helm charts. The key components are:
| Module | Purpose |
|---|---|
| opifex.deployment.cloud.aws | AWSDeploymentManager and AWSConfig for EKS/VPC/IAM configuration generation |
| opifex.deployment.kubernetes | ManifestGenerator for producing K8s Deployment, Service, and Ingress YAML |
| opifex.deployment.server | FastAPI model serving server |
| opifex.deployment.core_serving | InferenceEngine, ModelRegistry, DeploymentConfig |
Source: src/opifex/deployment/cloud/aws.py
Prerequisites
- An AWS account with permissions for EKS, EC2, VPC, IAM, CloudWatch
- AWS CLI v2 configured (aws configure)
- kubectl and eksctl installed
- Docker for building container images
- Python 3.12+ with uv and opifex installed locally
Container Image
The project Dockerfile builds a GPU-ready image based on nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04. It installs Python 3.12, uv, and all project dependencies.
# Build locally
docker build -t opifex:latest .
# Tag and push to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com
docker tag opifex:latest <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/opifex:latest
docker push <ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/opifex:latest
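The registry address in the tag and push commands follows the fixed ECR pattern of account ID, region, repository, and tag. As a minimal sketch, a helper like the one below could build that string once so the same value is reused when generating manifests. The function name ecr_image_uri is hypothetical and not part of Opifex.

```python
import re


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build an ECR image URI (hypothetical helper, not an Opifex API)."""
    # AWS account IDs are always 12 digits; fail early on a placeholder or typo.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"
```

The returned string can be passed to docker tag, docker push, and the image argument of ManifestGenerator.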
The default CMD runs the test suite. To start the serving API instead, override the command at runtime by passing the serving command as arguments to docker run.
AWSDeploymentManager
AWSDeploymentManager generates configuration dictionaries for AWS infrastructure. It does not call AWS APIs directly -- you export the configurations and apply them with Terraform, CDK, or the AWS CLI.
AWSConfig
from opifex.deployment.cloud.aws import AWSConfig, AWSDeploymentManager

config = AWSConfig(
    region="us-east-1",
    cluster_name="opifex-cluster",
    vpc_cidr="10.0.0.0/16",
    # Override defaults:
    node_group_config={
        "desired_size": 3,
        "max_size": 10,
        "min_size": 1,
        "instance_types": ["p3.2xlarge"],  # GPU instances for model serving
        "disk_size": 100,
        "capacity_type": "ON_DEMAND",
        "ami_type": "AL2_x86_64_GPU",
    },
)
AWSConfig fields:
- region -- AWS region (default: us-east-1)
- cluster_name -- EKS cluster name (default: opifex-cluster)
- vpc_cidr -- VPC CIDR block (default: 10.0.0.0/16)
- node_group_config -- dict with desired_size, max_size, min_size, instance_types, disk_size, capacity_type, ami_type
- network_config -- dict with availability_zones, private_subnets, public_subnets, enable_nat_gateway, enable_dns_hostnames
- security_config -- dict with enable_logging, log_types, enable_private_access, enable_public_access, public_access_cidrs
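When overriding network_config, the private and public subnets need to be non-overlapping blocks inside vpc_cidr. The sketch below uses the standard-library ipaddress module to carve such blocks out of the default 10.0.0.0/16 range. It assumes that private_subnets and public_subnets accept lists of CIDR strings, which the source does not confirm, and carve_subnets is a hypothetical helper.

```python
import ipaddress
from itertools import islice


def carve_subnets(vpc_cidr: str, new_prefix: int, count: int) -> list[str]:
    """Return the first `count` subnets of size /new_prefix inside vpc_cidr."""
    network = ipaddress.ip_network(vpc_cidr)
    blocks = [str(s) for s in islice(network.subnets(new_prefix=new_prefix), count)]
    if len(blocks) < count:
        raise ValueError(f"{vpc_cidr} cannot hold {count} /{new_prefix} subnets")
    return blocks


# Three private and three public /24 blocks, e.g. one of each per availability zone.
blocks = carve_subnets("10.0.0.0/16", new_prefix=24, count=6)
network_config = {
    # Assumption: these keys take CIDR strings.
    "private_subnets": blocks[:3],
    "public_subnets": blocks[3:],
}
```

Deriving the blocks from vpc_cidr rather than typing them by hand keeps the subnets valid if the VPC range changes.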
Configuration Generation Methods
manager = AWSDeploymentManager(config)
# Each method returns a dict suitable for Terraform, CloudFormation, or inspection
eks_config = manager.generate_eks_cluster_config()
node_config = manager.generate_node_group_config()
iam_roles = manager.generate_iam_roles()
vpc_config = manager.generate_vpc_config()
secrets_config = manager.generate_secrets_manager_config()
cloudwatch_config = manager.generate_cloudwatch_config()
| Method | Returns |
|---|---|
| generate_eks_cluster_config() | EKS cluster definition (version, VPC config, encryption, logging) |
| generate_node_group_config() | Node group with scaling, instance types, AMI |
| generate_iam_roles() | Cluster role, node role, and service role with inline policies |
| generate_vpc_config() | VPC, subnets (public/private), NAT gateways, security groups |
| generate_secrets_manager_config() | Secrets for database passwords, API keys, OAuth credentials |
| generate_cloudwatch_config() | Log groups, CPU/memory alarms, dashboard definition |
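Because each method returns a plain dict, one way to review the output before handing it to other tooling is to write every dict to its own JSON file. This is a minimal sketch under that assumption. dump_configs is a hypothetical helper, and it falls back to str() for any value that is not JSON-serializable.

```python
import json
from pathlib import Path


def dump_configs(configs: dict[str, dict], out_dir: Path) -> list[Path]:
    """Write each named config dict to <out_dir>/<name>.json for inspection."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, cfg in sorted(configs.items()):
        path = out_dir / f"{name}.json"
        # sort_keys gives stable output, which keeps diffs between runs readable.
        path.write_text(json.dumps(cfg, indent=2, sort_keys=True, default=str))
        written.append(path)
    return written
```

For example, dump_configs({"eks": eks_config, "vpc": vpc_config}, Path("./aws-config-review")) would produce eks.json and vpc.json that can be committed or diffed.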
Export as Terraform
from pathlib import Path
manager.export_terraform_config(Path("./terraform-aws"))
# Creates:
# terraform-aws/main.tf (provider + resources as JSON)
# terraform-aws/variables.tf (region, cluster_name, instance_types, desired_capacity)
# terraform-aws/outputs.tf (cluster_endpoint, CA cert, VPC ID, security group ID)
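Then apply with the standard Terraform workflow: run terraform init, terraform plan, and terraform apply from the terraform-aws directory.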
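If the export and the apply step should live in one Python script, the Terraform init, plan, and apply sequence could be driven as in the sketch below. It relies on Terraform's -chdir global option and a saved plan file. The function names and the tfplan file name are illustrative choices, not part of Opifex.

```python
import subprocess
from pathlib import Path


def terraform_commands(workdir: Path, plan_file: str = "tfplan") -> list[list[str]]:
    """Return the init/plan/apply commands for the exported directory."""
    chdir = f"-chdir={workdir}"
    return [
        ["terraform", chdir, "init"],
        # Saving the plan means apply executes exactly what was reviewed.
        ["terraform", chdir, "plan", f"-out={plan_file}"],
        ["terraform", chdir, "apply", plan_file],
    ]


def run_terraform(workdir: Path, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in terraform_commands(workdir):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

Calling run_terraform(Path("./terraform-aws")) only prints the commands. Pass dry_run=False once the plan output has been reviewed.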
Kubernetes Manifests
Use ManifestGenerator from opifex.deployment.kubernetes to programmatically generate Kubernetes YAML for Deployment, Service, and Ingress resources.
from pathlib import Path
from opifex.deployment.kubernetes import ManifestGenerator

gen = ManifestGenerator(
    namespace="opifex",
    app_name="opifex-api",
    image="<ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/opifex:latest",
)

# Generate individual manifests as dicts
deployment = gen.generate_deployment(
    replicas=3,
    cpu_request="1",
    memory_request="4Gi",
    cpu_limit="4",
    memory_limit="8Gi",
    port=8080,
    environment_variables={
        "JAX_PLATFORMS": "gpu",
        "OPIFEX_PORT": "8080",
        "OPIFEX_WORKERS": "2",
    },
)
service = gen.generate_service(port=8080, target_port=8080, service_type="LoadBalancer")
ingress = gen.generate_ingress(host="opifex.example.com", service_port=8080)

# Or export all manifests to YAML files at once
gen.export_all_manifests(Path("./k8s-manifests"), replicas=3, service_port=8080)
The generated deployment includes:
- Liveness and readiness probes pointing to /health
- Default JAX environment variables (JAX_PLATFORMS, XLA_PYTHON_CLIENT_MEM_FRACTION)
- Resource requests and limits
- Optional nodeSelector for GPU node targeting
Source: src/opifex/deployment/kubernetes/manifest_generator.py
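Before applying a generated manifest, it can be useful to confirm that the probes are where the serving container expects them. The sketch below reads the probe paths from a Deployment dict. It assumes the generated dict follows the standard Kubernetes Deployment schema (livenessProbe and readinessProbe with an httpGet handler on the first container), and probe_paths is a hypothetical helper.

```python
from typing import Optional


def probe_paths(deployment: dict) -> dict[str, Optional[str]]:
    """Return the HTTP path of each probe on the first container, or None."""
    container = deployment["spec"]["template"]["spec"]["containers"][0]
    paths = {}
    for probe in ("livenessProbe", "readinessProbe"):
        # A missing probe or a non-HTTP probe yields None instead of raising.
        http_get = (container.get(probe) or {}).get("httpGet") or {}
        paths[probe] = http_get.get("path")
    return paths
```

With the deployment dict from generate_deployment(), both entries would be expected to read /health. A None value indicates a probe that is absent or not HTTP-based.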
Additional Kubernetes modules:
- AutoScaler -- HPA/VPA configuration
- ResourceManager -- namespace and quota management
- KubernetesOrchestrator -- orchestration for production deployments
Source: src/opifex/deployment/kubernetes/__init__.py
Deployment Workflow
A typical end-to-end deployment:
- Build and push the container image to ECR (see Container Image).
- Generate the AWS infrastructure with AWSDeploymentManager.export_terraform_config() and apply it with Terraform.
- Configure kubectl for your EKS cluster with aws eks update-kubeconfig (the full command is listed under EKS Cluster Access in Troubleshooting).
- Generate and apply the K8s manifests:
# Generate from Python
python -c "
from pathlib import Path
from opifex.deployment.kubernetes import ManifestGenerator
gen = ManifestGenerator('opifex', 'opifex-api', '<ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/opifex:latest')
gen.export_all_manifests(Path('./k8s-manifests'), replicas=3, service_port=8080)
"
# Apply
kubectl create namespace opifex
kubectl apply -f k8s-manifests/
- Verify the deployment, for example by checking that the pods in the opifex namespace are running (kubectl get pods -n opifex).
GPU Node Configuration
For GPU workloads on EKS, use a GPU-enabled node group (p3.2xlarge, p3.8xlarge, or g4dn.* instances) with the AL2_x86_64_GPU AMI type. Install the NVIDIA device plugin:
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml
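Then target the GPU nodes with a node selector and add a GPU limit to your deployment manifest. The selector only matches nodes that carry the same label, so the accelerator label below must exist on your GPU node group:
deployment = gen.generate_deployment(
    environment_variables={"JAX_PLATFORMS": "gpu"},
    node_selector={"accelerator": "nvidia-gpu"},
)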
# Manually add GPU limits to the generated dict if needed:
deployment["spec"]["template"]["spec"]["containers"][0]["resources"]["limits"]["nvidia.com/gpu"] = "1"
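The direct assignment above raises KeyError if the container has no resources or limits entry yet. A slightly more defensive variant is sketched below. It assumes only the standard Deployment dict layout, and with_gpu_limit is a hypothetical helper rather than an Opifex API.

```python
import copy


def with_gpu_limit(deployment: dict, gpus: int = 1, container_index: int = 0) -> dict:
    """Return a copy of the Deployment dict with an nvidia.com/gpu limit set."""
    patched = copy.deepcopy(deployment)  # leave the generated dict untouched
    container = patched["spec"]["template"]["spec"]["containers"][container_index]
    # setdefault creates the nested dicts when the generator omitted them.
    limits = container.setdefault("resources", {}).setdefault("limits", {})
    limits["nvidia.com/gpu"] = str(gpus)
    return patched
```

Returning a copy keeps the original output of generate_deployment() available for comparison, for example when producing both CPU and GPU variants of the same manifest.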
Monitoring
AWSDeploymentManager.generate_cloudwatch_config() produces:
- Log groups: /aws/eks/<cluster>/cluster (30-day retention) and /aws/eks/<cluster>/application (7-day retention)
- Alarms: High CPU (>80%) and high memory (>85%) with 5-minute evaluation periods
- Dashboard: CloudWatch dashboard with CPU and memory utilization widgets
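To make the alarm thresholds concrete, the sketch below builds keyword arguments in the shape accepted by boto3's CloudWatch put_metric_alarm call. This is not necessarily the structure that generate_cloudwatch_config() emits. The alarm names, the use of the ContainerInsights namespace (which requires Container Insights to be enabled on the cluster), and the single 300-second evaluation period are assumptions for illustration.

```python
# Container Insights node-level metrics (assumed metric source).
_METRICS = {
    "cpu": "node_cpu_utilization",
    "memory": "node_memory_utilization",
}


def utilization_alarm(cluster_name: str, resource: str, threshold: float) -> dict:
    """Build put_metric_alarm-style kwargs for a CPU or memory alarm."""
    if resource not in _METRICS:
        raise ValueError(f"resource must be one of {sorted(_METRICS)}")
    return {
        "AlarmName": f"{cluster_name}-high-{resource}",  # hypothetical naming
        "Namespace": "ContainerInsights",
        "MetricName": _METRICS[resource],
        "Dimensions": [{"Name": "ClusterName", "Value": cluster_name}],
        "Statistic": "Average",
        "Period": 300,  # 5 minutes, in seconds
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }


alarms = [
    utilization_alarm("opifex-cluster", "cpu", 80.0),
    utilization_alarm("opifex-cluster", "memory", 85.0),
]
```

Each dict could be passed as keyword arguments to a boto3 CloudWatch client, or used as a reference when checking the generated configuration.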
The FastAPI server also exposes a /metrics endpoint returning request count, average latency, throughput, and uptime.
Troubleshooting
EKS Cluster Access
# Verify AWS identity
aws sts get-caller-identity
# Update kubeconfig
aws eks update-kubeconfig --region us-east-1 --name opifex-cluster
# Check nodes are ready
kubectl get nodes -o wide
GPU Not Available in Pods
# Verify NVIDIA device plugin is running
kubectl get pods -n kube-system -l name=nvidia-device-plugin-ds
# Check GPU allocation on nodes
kubectl describe nodes | grep -A5 "nvidia.com/gpu"
# Verify JAX sees GPU inside a pod
kubectl exec -it <pod-name> -n opifex -- python -c "import jax; print(jax.devices())"
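The one-liner above prints the device list, which still has to be read by eye. As a sketch, a small filter like the following could turn that into a pass/fail check at container startup. It assumes JAX device objects expose a platform attribute, and it accepts "gpu", "cuda", and "rocm" because the reported name may differ between JAX versions. accelerator_devices is a hypothetical helper.

```python
def accelerator_devices(devices) -> list:
    """Filter a device list (e.g. from jax.devices()) down to GPU devices."""
    gpu_platforms = {"gpu", "cuda", "rocm"}
    return [d for d in devices if getattr(d, "platform", "") in gpu_platforms]
```

Inside a pod this could be used as: import jax, then fail fast if accelerator_devices(jax.devices()) is empty, which usually means JAX fell back to the CPU backend.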