Containers in the Cloud: ECS vs EKS

Why Run Containers in the Cloud

Locally, you run docker compose up and everything works. In production, you have to decide:

How do you run containers across multiple servers?
How do you scale as load grows?
How do you deploy updates without downtime?
How do you recover from failures?

Container orchestrators address all of these problems.

┌─────────────────────────────────────────────────────────┐
│                 Container Orchestration                  │
│                                                         │
│  ┌──────────────────┐   ┌─────────┐   ┌─────────────┐ │
│  │   App Runner     │   │   ECS   │   │     EKS     │ │
│  │  (Zero config)   │   │ Fargate │   │ (Kubernetes)│ │
│  │                  │   │         │   │             │ │
│  │ Push image →     │   │ AWS-    │   │ Standard    │ │
│  │ get URL          │   │ native  │   │ K8s API     │ │
│  │                  │   │ simple  │   │ portable    │ │
│  └──────────────────┘   └─────────┘   └─────────────┘ │
│                                                         │
│  Complexity: Low ◄──────────────────────────► High      │
│  Control:    Low ◄──────────────────────────► Full      │
└─────────────────────────────────────────────────────────┘

Platform Comparison

Criterion	ECS Fargate	EKS	App Runner
Setup complexity	Medium	High	Minimal
Server management	None	Depends (Fargate or EC2 nodes)	None
Kubernetes knowledge	Not required	Required	Not required
Auto-scaling	Target tracking	HPA/VPA/KEDA	Built-in
Networking	VPC, ALB, Service Discovery	K8s Services, Ingress	Automatic
CI/CD	CodePipeline, GitHub Actions	ArgoCD, Flux, Helm	Auto-deploy from ECR
Multi-cloud	No	Yes (K8s is portable)	No
Control plane cost	Free	About $73/month ($0.10/hour per cluster)	Free
Compute cost	vCPU + memory per task	vCPU + memory per node	vCPU + memory + requests
Best for	Most projects	Complex microservice meshes	Simple APIs, MVPs

ECS Fargate in Detail

Core Concepts

┌──────────────────────────────────────────┐
│              ECS Cluster                  │
│                                          │
│  ┌────────────────────────────────────┐  │
│  │           Service: API             │  │
│  │  Desired count: 3                  │  │
│  │  ┌────────┐ ┌────────┐ ┌────────┐ │  │
│  │  │ Task 1 │ │ Task 2 │ │ Task 3 │ │  │
│  │  │ nginx  │ │ nginx  │ │ nginx  │ │  │
│  │  │ php    │ │ php    │ │ php    │ │  │
│  │  └────────┘ └────────┘ └────────┘ │  │
│  └────────────────────────────────────┘  │
│                                          │
│  ┌────────────────────────────────────┐  │
│  │        Service: Worker             │  │
│  │  Desired count: 2                  │  │
│  │  ┌────────┐ ┌────────┐            │  │
│  │  │ Task 1 │ │ Task 2 │            │  │
│  │  │ worker │ │ worker │            │  │
│  │  └────────┘ └────────┘            │  │
│  └────────────────────────────────────┘  │
└──────────────────────────────────────────┘

Cluster -- a logical grouping of services
Task Definition -- a versioned description of one or more containers that run together (image, CPU, memory, environment, ports)
Task -- a running instance of a Task Definition (analogous to docker run)
Service -- manages N tasks, maintains the desired count, and performs rolling updates
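
The Terraform listings in this article reference aws_ecs_cluster.main but never define it. A minimal sketch of such a cluster could look like the following. The resource name main matches those references; the cluster name pattern and the Container Insights setting are assumptions, not requirements.

```hcl
# Minimal ECS cluster (sketch). A cluster is only a logical grouping,
# so with Fargate there are no instances to configure here.
resource "aws_ecs_cluster" "main" {
  name = "${var.project}-${var.environment}" # assumed naming pattern

  # Optional assumption: publish CloudWatch Container Insights metrics
  setting {
    name  = "containerInsights"
    value = "enabled"
  }

  tags = {
    Name        = "${var.project}-${var.environment}"
    Environment = var.environment
  }
}
```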

Task Definition for a PHP Application

# ECR Repository for Docker images
resource "aws_ecr_repository" "app" {
  name                 = "${var.project}-api"
  image_tag_mutability = "IMMUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "AES256"
  }
}

# ECS Task Definition: PHP API (nginx + php-fpm sidecar)
resource "aws_ecs_task_definition" "api" {
  family                   = "${var.project}-api"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = var.environment == "prod" ? 1024 : 256
  memory                   = var.environment == "prod" ? 2048 : 512
  execution_role_arn       = aws_iam_role.ecs_execution.arn
  task_role_arn            = aws_iam_role.ecs_task.arn

  container_definitions = jsonencode([
    {
      name      = "nginx"
      image     = "${aws_ecr_repository.app.repository_url}:nginx-${var.image_tag}"
      essential = true
      cpu       = var.environment == "prod" ? 256 : 64
      memory    = var.environment == "prod" ? 512 : 128

      portMappings = [{
        containerPort = 80
        protocol      = "tcp"
      }]

      dependsOn = [{
        containerName = "php"
        condition     = "HEALTHY"
      }]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.api.name
          "awslogs-region"        = var.region
          "awslogs-stream-prefix" = "nginx"
        }
      }
    },
    {
      name      = "php"
      image     = "${aws_ecr_repository.app.repository_url}:php-${var.image_tag}"
      essential = true
      cpu       = var.environment == "prod" ? 768 : 192
      memory    = var.environment == "prod" ? 1536 : 384

      environment = [
        { name = "APP_ENV", value = var.environment },
        # NOTE: this embeds the database password in plain text in the task definition
        # (visible in the ECS console and in Terraform state). Prefer passing
        # DATABASE_URL through the secrets block below.
        { name = "DATABASE_URL", value = "postgresql://${var.db_user}:${var.db_pass}@${aws_db_instance.main.address}:5432/${var.db_name}" },
        { name = "REDIS_URL", value = "rediss://${aws_elasticache_replication_group.main.primary_endpoint_address}:6379" },
        { name = "MESSENGER_TRANSPORT_DSN", value = "sqs://sqs.${var.region}.amazonaws.com/${data.aws_caller_identity.current.account_id}/${var.project}-messages" },
      ]

      secrets = [
        { name = "APP_SECRET", valueFrom = "${aws_secretsmanager_secret.app.arn}:APP_SECRET::" },
        { name = "JWT_PASSPHRASE", valueFrom = "${aws_secretsmanager_secret.app.arn}:JWT_PASSPHRASE::" },
      ]

      healthCheck = {
        command     = ["CMD-SHELL", "php-fpm-healthcheck || exit 1"]
        interval    = 15
        timeout     = 5
        retries     = 3
        startPeriod = 30
      }

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.api.name
          "awslogs-region"        = var.region
          "awslogs-stream-prefix" = "php"
        }
      }
    }
  ])

  tags = {
    Name        = "${var.project}-api"
    Environment = var.environment
  }
}
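
The task definition above refers to an execution role and a log group that are not shown. The sketch below is one plausible way to define them. The role name and the log retention period are assumptions, and the inline policy assumes the secrets live in aws_secretsmanager_secret.app, as in the task definition.

```hcl
# Execution role: used by ECS itself to pull images, write logs, and read secrets
resource "aws_iam_role" "ecs_execution" {
  name = "${var.project}-ecs-execution" # assumed name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}

# AWS-managed policy covering ECR pulls and CloudWatch Logs writes
resource "aws_iam_role_policy_attachment" "ecs_execution" {
  role       = aws_iam_role.ecs_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Extra permission so the secrets block can resolve its valueFrom references
resource "aws_iam_role_policy" "ecs_execution_secrets" {
  name = "read-app-secret"
  role = aws_iam_role.ecs_execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["secretsmanager:GetSecretValue"]
      Resource = [aws_secretsmanager_secret.app.arn]
    }]
  })
}

# Log group referenced by the awslogs configuration of both containers
resource "aws_cloudwatch_log_group" "api" {
  name              = "/ecs/${var.project}-api"
  retention_in_days = 30 # assumed retention; adjust to your policy
}
```

The task role (aws_iam_role.ecs_task) is separate: it carries the permissions the application code needs at runtime, such as access to SQS.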

ECS Service with ALB and Auto-Scaling

# ECS Service
resource "aws_ecs_service" "api" {
  name            = "${var.project}-api"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.api.arn
  desired_count   = var.environment == "prod" ? 3 : 1
  launch_type     = "FARGATE"

  # Rolling update: start new tasks before stopping old ones, so capacity never drops
  deployment_maximum_percent         = 200
  deployment_minimum_healthy_percent = 100

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  network_configuration {
    subnets          = aws_subnet.private[*].id
    security_groups  = [aws_security_group.app.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.api.arn
    container_name   = "nginx"
    container_port   = 80
  }

  lifecycle {
    ignore_changes = [desired_count]  # Managed by auto-scaling
  }

  tags = {
    Name        = "${var.project}-api"
    Environment = var.environment
  }
}

# Auto-Scaling: target tracking on CPU
resource "aws_appautoscaling_target" "api" {
  max_capacity       = var.environment == "prod" ? 20 : 3
  min_capacity       = var.environment == "prod" ? 3 : 1
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.api.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

# Scale on CPU utilization
resource "aws_appautoscaling_policy" "api_cpu" {
  name               = "${var.project}-api-cpu-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.api.resource_id
  scalable_dimension = aws_appautoscaling_target.api.scalable_dimension
  service_namespace  = aws_appautoscaling_target.api.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value       = 70.0
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}

# Scale on request count per target
resource "aws_appautoscaling_policy" "api_requests" {
  name               = "${var.project}-api-request-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.api.resource_id
  scalable_dimension = aws_appautoscaling_target.api.scalable_dimension
  service_namespace  = aws_appautoscaling_target.api.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      resource_label         = "${aws_lb.main.arn_suffix}/${aws_lb_target_group.api.arn_suffix}"
    }
    target_value       = 1000.0
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}
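
The service registers its tasks in aws_lb_target_group.api, which is not defined in the listing. A sketch of a compatible target group follows. The VPC resource name, the /health path, and the timing values are assumptions. The one hard requirement is target_type = "ip", because Fargate tasks in awsvpc mode are registered by IP address rather than by instance ID.

```hcl
resource "aws_lb_target_group" "api" {
  name        = "${var.project}-api"
  port        = 80
  protocol    = "HTTP"
  vpc_id      = aws_vpc.main.id # assumed VPC resource name
  target_type = "ip"            # required for awsvpc network mode (Fargate)

  # A shorter draining period speeds up rolling deployments (default: 300 seconds)
  deregistration_delay = 30

  health_check {
    path                = "/health" # assumed health endpoint
    matcher             = "200"
    interval            = 15
    timeout             = 5
    healthy_threshold   = 2
    unhealthy_threshold = 3
  }
}
```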

Worker Service (for Queues)

# Worker Task Definition (single container, no nginx)
resource "aws_ecs_task_definition" "worker" {
  family                   = "${var.project}-worker"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 512
  memory                   = 1024
  execution_role_arn       = aws_iam_role.ecs_execution.arn
  task_role_arn            = aws_iam_role.ecs_task.arn

  container_definitions = jsonencode([{
    name      = "worker"
    image     = "${aws_ecr_repository.app.repository_url}:php-${var.image_tag}"
    essential = true

    command = ["php", "bin/console", "messenger:consume", "async", "--time-limit=3600", "--memory-limit=512M"]

    environment = [
      { name = "APP_ENV", value = var.environment },
      { name = "DATABASE_URL", value = "postgresql://${var.db_user}:${var.db_pass}@${aws_db_instance.main.address}:5432/${var.db_name}" },
      { name = "MESSENGER_TRANSPORT_DSN", value = "sqs://sqs.${var.region}.amazonaws.com/${data.aws_caller_identity.current.account_id}/${var.project}-messages" },
    ]

    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = aws_cloudwatch_log_group.worker.name
        "awslogs-region"        = var.region
        "awslogs-stream-prefix" = "worker"
      }
    }
  }])
}

# Worker service (scales based on SQS queue depth)
resource "aws_ecs_service" "worker" {
  name            = "${var.project}-worker"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.worker.arn
  desired_count   = var.environment == "prod" ? 2 : 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = aws_subnet.private[*].id
    security_groups  = [aws_security_group.app.id]
    assign_public_ip = false
  }

  lifecycle {
    ignore_changes = [desired_count]
  }
}

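# Scalable target for the worker service (required by the policy below)
resource "aws_appautoscaling_target" "worker" {
  max_capacity       = var.environment == "prod" ? 10 : 2 # example limits
  min_capacity       = var.environment == "prod" ? 2 : 1
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.worker.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

# Scale worker based on SQS queue depth
resource "aws_appautoscaling_policy" "worker_sqs" {
  name               = "${var.project}-worker-sqs-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.worker.resource_id
  scalable_dimension = aws_appautoscaling_target.worker.scalable_dimension
  service_namespace  = aws_appautoscaling_target.worker.service_namespace

  target_tracking_scaling_policy_configuration {
    customized_metric_specification {
      metric_name = "ApproximateNumberOfMessagesVisible"
      namespace   = "AWS/SQS"
      statistic   = "Average"
      dimensions {
        name  = "QueueName"
        value = "${var.project}-messages"
      }
    }
    target_value       = 10.0  # Aim for ~10 visible messages in the whole queue (this metric is not per worker)
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}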
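
The worker consumes from a queue named ${var.project}-messages, which the listings assume already exists. A sketch of that queue with a dead-letter queue, plus the task-role permissions the consumer would likely need, is shown below. The timeout, the retry count, and the exact list of SQS actions are assumptions; the actions a given Messenger transport configuration requires may differ.

```hcl
# Dead-letter queue for messages that repeatedly fail processing
resource "aws_sqs_queue" "messages_dlq" {
  name                      = "${var.project}-messages-dlq"
  message_retention_seconds = 1209600 # 14 days, the SQS maximum
}

resource "aws_sqs_queue" "messages" {
  name = "${var.project}-messages"

  # Should exceed the longest expected handling time of a single message
  visibility_timeout_seconds = 300 # assumed value

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.messages_dlq.arn
    maxReceiveCount     = 5 # assumed retry budget
  })
}

# Runtime permissions for the application (task role, not execution role)
resource "aws_iam_role_policy" "ecs_task_sqs" {
  name = "sqs-messages"
  role = aws_iam_role.ecs_task.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "sqs:GetQueueUrl",
        "sqs:GetQueueAttributes",
        "sqs:SendMessage",
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
      ]
      Resource = [aws_sqs_queue.messages.arn]
    }]
  })
}
```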

CI/CD for ECS (GitHub Actions)

# .github/workflows/deploy.yml
name: Deploy to ECS

on:
  push:
    branches: [main]

env:
  AWS_REGION: us-east-1
  ECR_REPOSITORY: myapp-api
  ECS_CLUSTER: myapp-prod
  ECS_SERVICE: myapp-api

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read

    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-deploy  # placeholder 12-digit account ID
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to ECR
        id: ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push images
        env:
          REGISTRY: ${{ steps.ecr.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }}
        run: |
          # Build PHP image
          docker build -f api/Dockerfile --target prod \
            -t $REGISTRY/$ECR_REPOSITORY:php-$IMAGE_TAG api/
          docker push $REGISTRY/$ECR_REPOSITORY:php-$IMAGE_TAG

          # Build Nginx image
          docker build -f api/docker/nginx/Dockerfile \
            -t $REGISTRY/$ECR_REPOSITORY:nginx-$IMAGE_TAG api/
          docker push $REGISTRY/$ECR_REPOSITORY:nginx-$IMAGE_TAG

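      - name: Render task definition (php image)
        id: task-def-php
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: ecs/task-definition.json
          container-name: php
          image: ${{ steps.ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}:php-${{ github.sha }}

      # Chain a second render so the nginx container also picks up the freshly built image
      - name: Render task definition (nginx image)
        id: task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: ${{ steps.task-def-php.outputs.task-definition }}
          container-name: nginx
          image: ${{ steps.ecr.outputs.registry }}/${{ env.ECR_REPOSITORY }}:nginx-${{ github.sha }}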

      - name: Deploy to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v2
        with:
          task-definition: ${{ steps.task-def.outputs.task-definition }}
          service: ${{ env.ECS_SERVICE }}
          cluster: ${{ env.ECS_CLUSTER }}
          wait-for-service-stability: true
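
The workflow authenticates through OIDC (id-token: write plus role-to-assume), so the github-deploy role must trust GitHub's identity provider. A sketch of that trust relationship in Terraform follows. It assumes the GitHub OIDC provider is already registered in the account, and the repository path my-org/my-repo is hypothetical.

```hcl
# Assumes the GitHub OIDC provider already exists in this AWS account
data "aws_iam_openid_connect_provider" "github" {
  url = "https://token.actions.githubusercontent.com"
}

resource "aws_iam_role" "github_deploy" {
  name = "github-deploy"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRoleWithWebIdentity"
      Principal = { Federated = data.aws_iam_openid_connect_provider.github.arn }
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          # Hypothetical repository; limits the role to workflows running on main
          "token.actions.githubusercontent.com:sub" = "repo:my-org/my-repo:ref:refs/heads/main"
        }
      }
    }]
  })
}
```

The role still needs a permissions policy of its own, typically covering ECR push, ecs:RegisterTaskDefinition, ecs:UpdateService, ecs:DescribeServices, and iam:PassRole for the task and execution roles.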

EKS: When You Need Kubernetes

When to Choose EKS

The team already knows Kubernetes
You need multi-cloud portability
Complex network policies (NetworkPolicy)
Custom operators (CRDs)
Service mesh (Istio, Linkerd)
Advanced deployments (canary releases with Flagger, progressive delivery)

Example: Deployment for PHP

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  labels:
    app: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: nginx
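          # Placeholder images (both containers): in real deployments pin an immutable tag
          # such as the git SHA instead of a "latest"-style tag
          image: 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:nginx-latest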
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "250m"
              memory: "256Mi"

        - name: php
          image: 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:php-latest
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"
          env:
            - name: APP_ENV
              value: "prod"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: database-url
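          # Port 80 is served by the nginx container; containers in a pod share one
          # network namespace, so both probes reach php-fpm through nginx.
          readinessProbe: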
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 30
            periodSeconds: 10

---
# HPA for auto-scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
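
The manifests above need a cluster to run on. For comparison with the ECS setup, a minimal Terraform sketch of an EKS cluster with one managed node group might look like this. The two IAM roles are hypothetical and not shown, and the instance type and node counts are assumptions.

```hcl
resource "aws_eks_cluster" "main" {
  name     = "${var.project}-${var.environment}"
  role_arn = aws_iam_role.eks_cluster.arn # hypothetical role with AmazonEKSClusterPolicy attached

  vpc_config {
    subnet_ids = aws_subnet.private[*].id
  }
}

resource "aws_eks_node_group" "default" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "default"
  node_role_arn   = aws_iam_role.eks_node.arn # hypothetical worker-node role
  subnet_ids      = aws_subnet.private[*].id
  instance_types  = ["t3.large"] # assumed instance type

  scaling_config {
    desired_size = 3
    min_size     = 3
    max_size     = 6
  }
}
```

Note that the HPA only changes the number of pods. Adding or removing nodes requires a separate component such as Cluster Autoscaler or Karpenter, and the CPU and memory metrics it reads depend on metrics-server being installed in the cluster.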

App Runner: for Simple Cases

# App Runner: simplest deployment
resource "aws_apprunner_service" "api" {
  service_name = "${var.project}-api"

  source_configuration {
    image_repository {
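      # Auto-deploy tracks this tag, so the repository must allow overwriting it;
      # that conflicts with image_tag_mutability = "IMMUTABLE" on the ECR repository defined earlier.
      image_identifier      = "${aws_ecr_repository.app.repository_url}:latest"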
      image_repository_type = "ECR"

      image_configuration {
        port = "80"
        runtime_environment_variables = {
          APP_ENV      = var.environment
          DATABASE_URL = "postgresql://${var.db_user}:${var.db_pass}@${aws_db_instance.main.address}:5432/${var.db_name}"
        }
      }
    }

    auto_deployments_enabled = true

    authentication_configuration {
      access_role_arn = aws_iam_role.apprunner_ecr.arn
    }
  }

  instance_configuration {
    cpu    = "1024"
    memory = "2048"
  }

  auto_scaling_configuration_arn = aws_apprunner_auto_scaling_configuration_version.api.arn

  network_configuration {
    egress_configuration {
      egress_type       = "VPC"
      vpc_connector_arn = aws_apprunner_vpc_connector.main.arn
    }
  }

  tags = {
    Name        = "${var.project}-api"
    Environment = var.environment
  }
}

Push an image → automatic deployment → you get a URL. No clusters, task definitions, or load balancers to manage.
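
The App Runner service above references an auto-scaling configuration and a VPC connector that are not shown. A sketch of both follows. The concurrency and instance limits are assumed values, and the connector reuses the private subnets and security group from the ECS examples.

```hcl
resource "aws_apprunner_auto_scaling_configuration_version" "api" {
  auto_scaling_configuration_name = "${var.project}-api"

  max_concurrency = 100 # concurrent requests per instance before scaling out (assumed)
  min_size        = 1
  max_size        = 10
}

# Lets the service reach private resources such as the RDS instance
resource "aws_apprunner_vpc_connector" "main" {
  vpc_connector_name = "${var.project}-connector"
  subnets            = aws_subnet.private[*].id
  security_groups    = [aws_security_group.app.id]
}
```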

Decision Matrix

Starting a new project?
  └── Need a quick start or an MVP?
  │    └── Yes → App Runner
  │    └── No
  │         └── Does the team know Kubernetes?
  │              └── Yes → EKS (portability + ecosystem)
  │              └── No → ECS Fargate (default choice)
  │
  └── Migrating from Docker Compose?
       └── Up to 5 services, simple networking → ECS Fargate
       └── More than 5 services, service mesh → EKS
       └── 1 service, just need a URL → App Runner

Recommendation: start with ECS Fargate, which covers the large majority of use cases (roughly 80% as a rule of thumb). Migrate to EKS only if you need portability or advanced Kubernetes features.