GCP Deployment¶
This guide covers deploying Opifex models on Google Cloud Platform. Opifex provides Python modules for generating GCP infrastructure configurations -- GKE clusters, Cloud IAM, VPC networking, Secret Manager, and Cloud Monitoring. You use these modules to produce configuration dictionaries or export Terraform files, then apply them with your own infrastructure tooling.
Table of Contents¶
- Overview
- Prerequisites
- Container Image
- GCPDeploymentManager
- Kubernetes Manifests
- Deployment Workflow
- Monitoring
- Troubleshooting
Overview¶
The deployment infrastructure is provided as Python modules, not as ready-made YAML manifests or Helm charts. The key components are:
| Module | Purpose |
|---|---|
opifex.deployment.cloud.gcp |
GCPDeploymentManager and GCPConfig for GKE/VPC/IAM configuration generation |
opifex.deployment.kubernetes |
ManifestGenerator for producing K8s Deployment, Service, and Ingress YAML |
opifex.deployment.server |
FastAPI model serving server |
opifex.deployment.core_serving |
InferenceEngine, ModelRegistry, DeploymentConfig |
Source: src/opifex/deployment/cloud/gcp.py
Prerequisites¶
- A GCP project with billing enabled
- Google Cloud SDK (
gcloud) configured (gcloud auth login) kubectlinstalled (gcloud components install kubectl)- Docker for building container images
- Python 3.12+ with
uvandopifexinstalled locally
Container Image¶
The project Dockerfile builds a GPU-ready image based on nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04. It installs Python 3.12, uv, and all project dependencies.
# Build locally
docker build -t opifex:latest .
# Tag and push to GCR
docker tag opifex:latest gcr.io/<PROJECT_ID>/opifex:latest
docker push gcr.io/<PROJECT_ID>/opifex:latest
# Or use Artifact Registry
docker tag opifex:latest <REGION>-docker.pkg.dev/<PROJECT_ID>/opifex/opifex:latest
docker push <REGION>-docker.pkg.dev/<PROJECT_ID>/opifex/opifex:latest
The default CMD runs the test suite. Override at runtime to start the serving API:
GCPDeploymentManager¶
GCPDeploymentManager generates configuration dictionaries for GCP infrastructure. It does not call GCP APIs directly -- you export the configurations and apply them with Terraform, Deployment Manager, or gcloud commands.
GCPConfig¶
from opifex.deployment.cloud.gcp import GCPConfig, GCPDeploymentManager
config = GCPConfig(
project_id="my-opifex-project",
region="us-central1",
zone="us-central1-a",
cluster_name="opifex-cluster",
# Override defaults:
node_pool_config={
"initial_node_count": 3,
"machine_type": "n1-standard-8",
"disk_size_gb": 100,
"auto_scaling": {"min_node_count": 1, "max_node_count": 10},
"gpu_config": {
"accelerator_type": "nvidia-tesla-t4",
"accelerator_count": 1,
},
},
)
GCPConfig fields:
project_id(required) -- GCP project IDregion-- GCP region (default:us-central1)zone-- GCP zone (default:us-central1-a)cluster_name-- GKE cluster name (default:opifex-cluster)node_pool_config-- dict withinitial_node_count,machine_type,disk_size_gb,auto_scaling,gpu_confignetwork_config-- dict withnetwork,subnetwork,authorized_networks,enable_private_nodessecurity_config-- dict withenable_workload_identity,enable_network_policy,enable_pod_security_policy,master_authorized_networks
GPU Recommendations¶
The Dockerfile uses CUDA 12.4, which requires compute capability 5.0+. Recommended GPUs:
| GPU | Accelerator Type | Use Case |
|---|---|---|
| T4 | nvidia-tesla-t4 |
Cost-effective inference |
| V100 | nvidia-tesla-v100 |
Training and inference |
| A100 | nvidia-a100-80gb |
Large-scale training |
Note: K80 GPUs are incompatible with CUDA 12.x and should not be used.
Configuration Generation Methods¶
manager = GCPDeploymentManager(config)
# Each method returns a dict suitable for Terraform or inspection
gke_config = manager.generate_gke_cluster_config()
iam_policy = manager.generate_iam_policy()
vpc_config = manager.generate_vpc_config()
secrets_config = manager.generate_secret_manager_config()
monitoring_config = manager.generate_monitoring_config()
| Method | Returns |
|---|---|
generate_gke_cluster_config() |
GKE cluster definition (node config, networking, workload identity, private cluster) |
generate_iam_policy() |
IAM bindings for container developer, secret accessor, monitoring editor, log writer |
generate_vpc_config() |
VPC, subnet with secondary IP ranges, firewall rules (internal, SSH, API) |
generate_secret_manager_config() |
Secret definitions with replication policies |
generate_monitoring_config() |
Notification channels, alert policies (CPU >80%, memory >90%) |
Export as Terraform¶
from pathlib import Path
manager.export_terraform_config(Path("./terraform-gcp"))
# Creates:
# terraform-gcp/main.tf (provider + resources as JSON)
# terraform-gcp/variables.tf (project_id, region, zone, cluster_name)
# terraform-gcp/outputs.tf (cluster_endpoint, CA cert, VPC name, subnet name)
Then apply with standard Terraform:
cd terraform-gcp
terraform init
terraform plan -var="project_id=my-opifex-project"
terraform apply -var="project_id=my-opifex-project"
Kubernetes Manifests¶
Use ManifestGenerator from opifex.deployment.kubernetes to programmatically generate Kubernetes YAML for Deployment, Service, and Ingress resources.
from pathlib import Path
from opifex.deployment.kubernetes import ManifestGenerator
gen = ManifestGenerator(
namespace="opifex",
app_name="opifex-api",
image="gcr.io/<PROJECT_ID>/opifex:latest",
)
# Generate individual manifests as dicts
deployment = gen.generate_deployment(
replicas=3,
cpu_request="1",
memory_request="4Gi",
cpu_limit="4",
memory_limit="8Gi",
port=8080,
environment_variables={
"JAX_PLATFORMS": "gpu",
"OPIFEX_PORT": "8080",
"OPIFEX_WORKERS": "2",
},
)
service = gen.generate_service(port=8080, target_port=8080, service_type="LoadBalancer")
ingress = gen.generate_ingress(host="opifex.example.com", service_port=8080)
# Or export all manifests to YAML files at once
gen.export_all_manifests(Path("./k8s-manifests"), replicas=3, service_port=8080)
The generated deployment includes:
- Liveness and readiness probes pointing to
/health - Default JAX environment variables (
JAX_PLATFORMS,XLA_PYTHON_CLIENT_MEM_FRACTION) - Resource requests and limits
- Optional
nodeSelectorfor GPU node targeting
Source: src/opifex/deployment/kubernetes/manifest_generator.py
Additional Kubernetes modules:
AutoScaler-- HPA/VPA configurationResourceManager-- namespace and quota managementKubernetesOrchestrator-- orchestration for production deployments
Source: src/opifex/deployment/kubernetes/__init__.py
Deployment Workflow¶
A typical end-to-end deployment:
-
Enable required APIs:
-
Build and push the container image to GCR or Artifact Registry (see Container Image)
-
Generate GCP infrastructure using
GCPDeploymentManager.export_terraform_config()and apply with Terraform -
Configure kubectl for your GKE cluster:
-
Install NVIDIA GPU device plugin (for GPU node pools):
-
Generate and apply K8s manifests:
python -c " from pathlib import Path from opifex.deployment.kubernetes import ManifestGenerator gen = ManifestGenerator('opifex', 'opifex-api', 'gcr.io/<PROJECT_ID>/opifex:latest') gen.export_all_manifests(Path('./k8s-manifests'), replicas=3, service_port=8080) " kubectl create namespace opifex kubectl apply -f k8s-manifests/ -
Verify the deployment:
Monitoring¶
GCPDeploymentManager.generate_monitoring_config() produces:
- Notification channels: email-based alert routing
- Alert policies: High CPU (>80% for 5 minutes) and high memory (>90% for 5 minutes) on
k8s_containerresources
The FastAPI server also exposes a /metrics endpoint returning request count, average latency, throughput, and uptime.
For GCP-native monitoring:
# Set up log-based metrics
gcloud logging metrics create opifex_errors \
--description="Opifex application errors" \
--log-filter='resource.type="k8s_container" AND severity="ERROR"'
Troubleshooting¶
GKE Cluster Access¶
# Verify authentication
gcloud auth list
# Update kubeconfig
gcloud container clusters get-credentials opifex-cluster --zone us-central1-a
# Check nodes are ready
kubectl get nodes -o wide
GPU Not Available in Pods¶
# Check GPU node pool exists
gcloud container node-pools list --cluster=opifex-cluster --zone=us-central1-a
# Verify NVIDIA device plugin is running
kubectl get pods -n kube-system -l k8s-app=nvidia-gpu-device-plugin
# Check GPU allocation on nodes
kubectl describe nodes | grep -A5 "nvidia.com/gpu"
# Verify JAX sees GPU inside a pod
kubectl exec -it <pod-name> -n opifex -- python -c "import jax; print(jax.devices())"
Quota Issues¶
# Check project quotas
gcloud compute project-info describe --project=<PROJECT_ID>
# Check regional GPU quota
gcloud compute regions describe us-central1 --project=<PROJECT_ID>