Tags: Palantir, Apollo, continuous deployment, air-gapped, Kubernetes, data platform, DevOps, microservices

Palantir Apollo Deep Dive: Continuous Deployment to Air-Gapped Environments

Analyze Palantir Apollo's deployment architecture for air-gapped military environments, and explore how Coomia DIP reproduces similar deployment capabilities with open-source technologies.

Coomia Team | Published on May 1, 2025 | 15 min read

#TL;DR

  • Apollo is Palantir's continuous deployment platform that automatically deploys hundreds of microservices to any environment -- from commercial clouds to air-gapped military facilities -- making it one of Palantir's least-known but most critical competitive moats.
  • Traditional CI/CD tools (Jenkins, ArgoCD, Spinnaker) cannot solve the core challenge: how do you achieve continuous delivery in fully air-gapped environments with no remote access? Apollo solves this through a pull-based architecture and environment-aware release channels.
  • Coomia DIP adopts a progressive evolution strategy from Docker Compose to K8s Operator, reproducing Apollo's core deployment philosophy within the open-source ecosystem while managing 11 infrastructure components.

#1. Why Deployment Capability Is Palantir's Hidden Moat

When people discuss Palantir, they typically focus on the Ontology, AI/ML capabilities, or data fusion. Few realize that deployment capability is Palantir's true "secret weapon."

Consider these scenarios:

Code
Scenario A: A National Defense Data Center
+-------------------------------------+
|  SCIF (Sensitive Compartmented       |
|        Information Facility)         |
|  +-----------------------------+    |
|  |  Completely air-gapped      |    |
|  |  No internet connection     |    |
|  |  Physical isolation         |    |
|  |  Must run full Foundry      |    |
|  +-----------------------------+    |
|  Classification: TS/SCI             |
+-------------------------------------+

Scenario B: A Naval Destroyer
+-------------------------------------+
|  Maritime Combat Environment         |
|  +-----------------------------+    |
|  |  Intermittent satellite     |    |
|  |  Extremely low bandwidth   |    |
|  |  Outages lasting weeks     |    |
|  |  Must run independently    |    |
|  +-----------------------------+    |
|  Hardware: Limited server racks     |
+-------------------------------------+

Scenario C: Commercial Cloud (AWS/Azure/GCP)
+-------------------------------------+
|  Standard SaaS Deployment            |
|  +-----------------------------+    |
|  |  Persistent connectivity   |    |
|  |  Elastic resources         |    |
|  |  Multi-tenant isolation    |    |
|  |  Auto-scaling required     |    |
|  +-----------------------------+    |
|  Compliance: SOC2, FedRAMP          |
+-------------------------------------+

The same software needs to be deployed to all three radically different environments. That is the problem Apollo was built to solve.

Traditional SaaS companies only manage cloud environments. Traditional defense contractors customize deployments per customer. Palantir must do both -- using a single, unified automation system.

#2. Apollo's Architecture

#2.1 Core Design Principles

Apollo's design is built on several key principles:

Code
Apollo Design Principles
================================================

1. Environment Agnostic
   Same artifact -> any environment

2. Pull-Based Architecture
   Environments pull updates; the center doesn't push

3. Declarative State
   Describe desired state, not execution steps

4. Progressive Delivery
   Canary -> limited scope -> full rollout

5. Self-Healing
   Detect drift, auto-remediate

#2.2 High-Level Architecture

Code
                    Apollo Control Plane
        +----------------------------------+
        |                                  |
        |  +----------+  +--------------+  |
        |  | Release   |  | Environment  |  |
        |  | Manager   |  | Registry     |  |
        |  +-----+-----+  +------+------+  |
        |        |               |          |
        |  +-----v---------------v-------+  |
        |  |    Deployment Orchestrator   |  |
        |  +-------------+---------------+  |
        |                |                  |
        |  +-------------v---------------+  |
        |  |    Artifact Repository       |  |
        |  |  (Docker Images, Configs)    |  |
        |  +-------------+---------------+  |
        +----------------+------------------+
                         |
            +------------+----------------+
            |            |                |
    +-------v--+  +------v---+  +--------v----+
    |  SaaS    |  | Private  |  | Air-Gapped  |
    |  Cloud   |  | Cloud    |  | Environment |
    |          |  |          |  |             |
    | +------+ |  | +------+ |  | +------+    |
    | |Apollo| |  | |Apollo| |  | |Apollo|    |
    | |Agent | |  | |Agent | |  | |Agent |    |
    | +--+---+ |  | +--+---+ |  | +--+---+    |
    |    |     |  |    |     |  |    |        |
    | +--v---+ |  | +--v---+ |  | +--v---+    |
    | |Local | |  | |Local | |  | |Local |    |
    | | K8s  | |  | | K8s  | |  | | K8s  |    |
    | +------+ |  | +------+ |  | +------+    |
    +----------+  +----------+  +-------------+
     Real-time     Periodic      Offline sync
      pull          pull

#2.3 Apollo Agent: The Autonomous Brain Inside Each Environment

Every deployment environment runs an Apollo Agent -- the linchpin of the entire system:

Code
Apollo Agent Internal Architecture
+--------------------------------------------+
|  Apollo Agent                              |
|                                            |
|  +--------------+  +-------------------+   |
|  | State Manager |  | Health Monitor    |   |
|  |              |  |                   |   |
|  | Desired state|  | Service health    |   |
|  | Current state|  | Resource usage    |   |
|  | State diff   |  | Error rate        |   |
|  +------+-------+  +--------+----------+   |
|         |                   |              |
|  +------v-------------------v----------+   |
|  |      Reconciliation Engine          |   |
|  |                                     |   |
|  |  if current_state != desired_state: |   |
|  |      compute minimal changeset      |   |
|  |      execute progressive deploy     |   |
|  |      validate deployment result     |   |
|  |      report status                  |   |
|  +------------------+------------------+   |
|                     |                      |
|  +------------------v------------------+   |
|  |      Local Artifact Cache           |   |
|  |  (operates independently offline)   |   |
|  +-------------------------------------+   |
+--------------------------------------------+
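The reconciliation step in the diagram can be sketched as a small diff function. Apollo's internals are not public, so every name here is invented for illustration; the point is only the shape of the state comparison:

```python
def reconcile(desired_state: dict, current_state: dict) -> list:
    """Compute a minimal changeset between two service -> version maps.

    Mirrors the 'compute minimal changeset' step in the agent diagram:
    returns (action, service, version) tuples for the agent to execute.
    Purely conceptual -- not Apollo's actual data model.
    """
    changes = []
    for service, version in desired_state.items():
        if service not in current_state:
            changes.append(("install", service, version))
        elif current_state[service] != version:
            changes.append(("upgrade", service, version))
    for service, version in current_state.items():
        if service not in desired_state:
            changes.append(("remove", service, version))
    return sorted(changes)
```

For example, `reconcile({"auth": "2.0.0"}, {"auth": "1.9.0", "legacy": "1.0.0"})` yields an upgrade for `auth` and a removal for `legacy`, and an empty diff means the agent has nothing to do.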

#3. Continuous Deployment in Air-Gapped Environments

#3.1 The Air-Gapped Challenge

Air-gapped environments are where Apollo truly demonstrates its value:

  • No internet connection whatsoever
  • No remote SSH or VPN access
  • Strict physical security (specific personnel with specific media entering specific rooms)
  • Update frequency may be weekly, monthly, or even quarterly

Code
Air-Gapped Update Workflow
================================================

Step 1: Prepare update bundle in connected environment
+--------------------------------+
|  Apollo Build System            |
|                                |
|  1. Collect all changed images |
|  2. Package config changes     |
|  3. Generate checksums (SHA)   |
|  4. Encrypt the bundle         |
|  5. Write to secure media      |
+----------------+---------------+
                 |
                 v
+--------------------------------+
|  Secure Media (Encrypted HDD)  |
|  +---------------------------+ |
|  |  manifest.json            | |
|  |  images/                  | |
|  |    +-- service-a:v2.3.1   | |
|  |    +-- service-b:v1.8.0   | |
|  |    +-- service-c:v4.1.2   | |
|  |  configs/                 | |
|  |    +-- env-specific.yaml  | |
|  |    +-- secrets.enc        | |
|  |  checksums.sha256         | |
|  +---------------------------+ |
+----------------+---------------+
                 | Physical transport
                 v
Step 2: Import and deploy in air-gapped env
+--------------------------------+
|  Air-Gapped Data Center         |
|                                |
|  1. Attach secure media        |
|  2. Verify integrity           |
|  3. Agent reads manifest       |
|  4. Delta-import images        |
|  5. Rolling update by deps     |
|  6. Health check & auto-rollback|
+--------------------------------+
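Step 2's "verify integrity" can be done with nothing more than the standard library, assuming the `checksums.sha256` file uses the conventional `sha256sum` layout (`<hex digest>  <relative path>` per line). The file names here come from the diagram above; the format details are an assumption, not Apollo's actual bundle spec:

```python
import hashlib
from pathlib import Path

def verify_bundle(bundle_dir: str) -> bool:
    """Verify every file listed in checksums.sha256 against its digest.

    Returns False on the first mismatch so the import can abort before
    any image is loaded. Illustrative sketch only.
    """
    root = Path(bundle_dir)
    for line in (root / "checksums.sha256").read_text().splitlines():
        digest, _, rel_path = line.strip().partition("  ")
        actual = hashlib.sha256((root / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False  # tampered or corrupted file: refuse the bundle
    return True
```

In practice the bundle is also signed and encrypted, so checksum verification is only the innermost of several integrity layers.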

#3.2 Release Channels

Apollo uses release channels to manage update cadence across environments:

Code
Release Channels
================================================

Timeline >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

dev      ####
         v2.4.0-dev.1 (immediate)

staging      ####
             v2.4.0-rc.1 (1 day after dev)

canary           ####
                 v2.4.0-rc.1 (5% traffic)

prod                 ####
                     v2.4.0 (canary stable 3d)

gov-cloud                    ####
                             v2.4.0 (prod+7d)

air-gap                              ####
                                     v2.4.0 (gov+14d)

Stability ^
          |  air-gap (most stable, highest lag)
          |  gov-cloud
          |  prod
          |  canary
          |  staging
          |  dev (newest, highest risk)
          +----------------------------> Time
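The channel cadence reduces to a cumulative lag per channel. The day counts below are illustrative sums read off the timeline (e.g. air-gap = gov-cloud + 14d), not Palantir's actual release schedule:

```python
from datetime import date, timedelta

# Illustrative cumulative delay (days after the dev cut) per channel,
# derived from the timeline above. Real cadences vary per customer.
CHANNEL_LAG_DAYS = {
    "dev": 0,
    "staging": 1,
    "canary": 1,
    "prod": 4,        # canary must be stable ~3 days first
    "gov-cloud": 11,  # prod + 7d
    "air-gap": 25,    # gov-cloud + 14d
}

def release_date(channel: str, dev_cut: date) -> date:
    """Earliest date a build cut on `dev_cut` can reach `channel`."""
    return dev_cut + timedelta(days=CHANNEL_LAG_DAYS[channel])
```

So a build cut on May 1 would, under these assumed lags, reach the air-gapped channel no earlier than May 26: `release_date("air-gap", date(2025, 5, 1))`.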

#4. Managing Microservices at Scale

#4.1 Palantir's Microservice Scale

Palantir Foundry consists of hundreds of microservices. Apollo must manage the dependency relationships, version compatibility, and deployment ordering across all of them.

Code
Foundry Microservice Dependency Graph (Simplified)
================================================

                  +-------------+
                  |   Gateway   |
                  +------+------+
                         |
          +--------------+---------------+
          |              |               |
   +------v------+ +----v-----+ +-------v------+
   | Auth Service| | Ontology | | Search       |
   | (AuthN)     | | Service  | | Service      |
   +------+------+ +----+-----+ +-------+------+
          |              |               |
          |       +------+-------+       |
          |       |      |       |       |
          |  +----v--+ +-v---+ +-v-----+|
          |  |Object | |Link | |Action ||
          |  |Store  | |Store| |Exec   ||
          |  +---+---+ +--+--+ +--+---+|
          |      |        |       |    |
          |      +----+---+       |    |
          |           |           |    |
   +------v-----------v-----------v----v---+
   |           Data Foundation              |
   |  (Spark, Parquet, Iceberg, HDFS)       |
   +----------------------------------------+

Apollo must understand:
- Auth must be ready before other services
- Ontology depends on Object Store and Link Store
- Upgrading Data Foundation requires pausing Pipelines
- Search Service can be upgraded independently

#4.2 Deployment Order Orchestration

Python
# Apollo Deployment Orchestration (Conceptual Code)

class DeploymentPlan:
    """Apollo deployment plan generator"""

    def generate_plan(self, current_state, desired_state):
        """
        Generate optimal deployment plan based on dependency graph.

        Rules:
        1. Infrastructure layer first
        2. Stateless services can be parallelized
        3. Stateful services upgrade serially
        4. Health check validation after each step
        """
        changes = self.diff(current_state, desired_state)
        graph = self.build_dependency_graph(changes)
        phases = self.topological_sort(graph)

        plan = []
        for phase in phases:
            parallel_group = []
            for service in phase:
                step = DeploymentStep(
                    service=service,
                    strategy=self.choose_strategy(service),
                    health_check=service.health_endpoint,
                    rollback_trigger=RollbackTrigger(
                        error_rate_threshold=0.05,
                        latency_p99_threshold_ms=500,
                        window_seconds=300
                    )
                )
                parallel_group.append(step)
            plan.append(parallel_group)

        return plan

    def choose_strategy(self, service):
        """Choose deployment strategy based on service characteristics"""
        if service.is_stateful:
            return RollingUpdate(max_unavailable=0, max_surge=1)
        elif service.is_critical:
            return CanaryDeployment(
                initial_percentage=5,
                increment=10,
                interval_minutes=5
            )
        else:
            return BlueGreen()
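The `topological_sort` call above carries most of the weight. A minimal standalone version (Kahn's algorithm, grouping services into parallel-deployable phases) might look like the following, exercised against a simplified slice of the section 4.1 dependency graph; the service names are taken from that diagram, everything else is a sketch:

```python
def deployment_phases(deps: dict) -> list:
    """Group services into ordered phases. Every service in a phase has
    all of its dependencies satisfied by earlier phases, so each phase
    can deploy in parallel (rule 2 in the docstring above)."""
    remaining = {svc: set(d) for svc, d in deps.items()}
    phases = []
    while remaining:
        # Services with no unmet dependencies are ready to deploy now.
        ready = sorted(svc for svc, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle detected")
        phases.append(ready)
        for svc in ready:
            del remaining[svc]
        for d in remaining.values():
            d.difference_update(ready)
    return phases

# Simplified slice of the Foundry graph from section 4.1
deps = {
    "data_foundation": set(),
    "auth": set(),
    "object_store": {"data_foundation"},
    "link_store": {"data_foundation"},
    "ontology": {"object_store", "link_store"},
}
```

Here `auth` and `data_foundation` deploy first (in parallel), the two stores next, and `ontology` last -- matching the ordering constraints listed under the dependency graph.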

#5. Canary Deployments and Automatic Rollback

#5.1 Canary Deployment Flow

Code
Canary Deployment Detailed Flow
================================================

Phase 1: Deploy canary instance (5% traffic)
+-----------------------------------------+
|  Service v1.0 #################### 95%   |
|  Service v1.1 #                    5%   |
|                                         |
|  Metrics:                               |
|  - Error rate: v1.0=0.1%, v1.1=0.08%   |
|  - P99 latency: v1.0=45ms, v1.1=42ms   |
|  - CPU usage: v1.0=35%, v1.1=33%        |
+-----------------------------------------+
          | 5 min observation [OK]
          v
Phase 2: Expand canary (25% traffic)
+-----------------------------------------+
|  Service v1.0 ###############      75%   |
|  Service v1.1 #####                25%   |
|                                         |
|  Metrics:                               |
|  - Error rate: v1.0=0.1%, v1.1=0.09%   |
|  - P99 latency: v1.0=45ms, v1.1=40ms   |
+-----------------------------------------+
          | 5 min observation [OK]
          v
Phase 3: Majority traffic (75%)
+-----------------------------------------+
|  Service v1.0 #####                25%   |
|  Service v1.1 ###############      75%   |
+-----------------------------------------+
          | 10 min observation [OK]
          v
Phase 4: Full cutover (100%)
+-----------------------------------------+
|  Service v1.1 #################### 100%  |
|                                         |
|  Keep v1.0 instances 30 min for rollback |
+-----------------------------------------+
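The four phases above reduce to a simple loop: shift traffic, observe, and bail out at the first failed check. A sketch, where the `metrics_ok` callback stands in for the error-rate and latency comparison (names invented for illustration):

```python
def run_canary(metrics_ok, stages=(5, 25, 75, 100)):
    """Advance a canary through traffic stages, stopping at the first
    stage whose observation window fails its metric checks.

    Returns ('promoted', 100) on success, or ('rollback', pct) naming
    the traffic split at which the deployment should be rolled back.
    """
    for pct in stages:
        # A real system would shift pct% of traffic to the new version,
        # then wait out the observation window before evaluating metrics.
        if not metrics_ok(pct):
            return ("rollback", pct)
    return ("promoted", 100)
```

For example, `run_canary(lambda pct: pct < 75)` simulates metrics degrading once the new version takes majority traffic, returning `("rollback", 75)`.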

#5.2 Automatic Rollback Triggers

YAML
# Apollo Rollback Strategy Configuration (Conceptual)
rollback:
  triggers:
    - metric: error_rate
      threshold: 0.05          # Error rate exceeds 5%
      window: 5m
      action: immediate_rollback

    - metric: latency_p99
      threshold: 500ms         # P99 latency exceeds 500ms
      window: 5m
      comparison: baseline     # Compare against baseline version
      action: pause_and_alert

    - metric: pod_restart_count
      threshold: 3             # Pod restarts exceed 3 times
      window: 10m
      action: immediate_rollback

    - metric: memory_usage
      threshold: 90%           # Memory usage exceeds 90%
      window: 3m
      action: pause_and_alert

  rollback_strategy:
    type: fast_rollback        # Immediately switch to old version
    preserve_old_version: 30m  # Keep old version for 30 minutes
    notify:
      - channel: slack
      - channel: pagerduty
        severity: P2
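Evaluating these triggers is a few lines of code. This sketch mirrors the YAML structure above but deliberately omits window aggregation and the baseline comparison; the dictionary keys are an assumed flattening of the config, not a real Apollo schema:

```python
def fired_actions(samples: dict, triggers: list) -> list:
    """Return the action of every trigger whose current metric sample
    exceeds its threshold. Windows and baselines omitted for brevity."""
    actions = []
    for trig in triggers:
        value = samples.get(trig["metric"])
        if value is not None and value > trig["threshold"]:
            actions.append(trig["action"])
    return actions

# Flattened versions of the first two triggers in the YAML above
triggers = [
    {"metric": "error_rate", "threshold": 0.05, "action": "immediate_rollback"},
    {"metric": "latency_p99_ms", "threshold": 500, "action": "pause_and_alert"},
]
```

With samples of an 8% error rate and 420 ms P99 latency, only the error-rate trigger fires, so the evaluator returns `["immediate_rollback"]`.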

#6. How Apollo Manages Palantir's Own Infrastructure

Apollo is not only used for customer deployments -- all of Palantir's own infrastructure is managed by Apollo. This means Apollo even deploys and manages itself.

Code
Apollo Bootstrapping
================================================

Layer 0: Bare Metal / Cloud VM
+------------------------------------+
|  OS + Container Runtime            |
+------------------+-----------------+
                   | Apollo Bootstrap
                   v
Layer 1: Apollo Core
+------------------------------------+
|  Apollo Agent (minimal version)    |
|  Apollo State Store                |
|  Apollo Artifact Cache             |
+------------------+-----------------+
                   | Apollo self-deploys
                   v
Layer 2: Infrastructure Services
+------------------------------------+
|  Kubernetes Control Plane          |
|  Container Registry (internal)     |
|  Certificate Manager               |
|  DNS Service                       |
|  Monitoring (Prometheus + Grafana) |
+------------------+-----------------+
                   | Apollo deploys apps
                   v
Layer 3: Foundry Platform Services
+------------------------------------+
|  Auth Service                      |
|  Ontology Service                  |
|  Data Foundation                   |
|  Pipeline Engine                   |
|  ... (hundreds of microservices)   |
+------------------------------------+

#7. Comparison with Mainstream CI/CD Tools

| Dimension               | Apollo                  | ArgoCD                 | Spinnaker             | FluxCD            |
|-------------------------|-------------------------|------------------------|-----------------------|-------------------|
| Air-gapped deployment   | Native support          | Not supported          | Not supported         | Not supported     |
| Pull-based architecture | Yes                     | Yes                    | No (push)             | Yes               |
| Multi-environment mgmt  | Unified control plane   | Multi-cluster config   | Supported but complex | Multi-repo needed |
| Canary deployment       | Built-in + auto metrics | Requires Argo Rollouts | Built-in              | Requires Flagger  |
| Microservice dependency | Built-in dep graph      | Sync Wave based        | Pipeline based        | Kustomize based   |
| Self-healing            | Deep self-healing       | Basic sync             | None                  | Basic sync        |
| Offline update bundles  | Auto-generated          | Not supported          | Not supported         | Not supported     |
| Scale                   | Hundreds of services    | Medium                 | Large scale           | Small-medium      |
| Learning curve          | High (internal tool)    | Medium                 | High                  | Low               |
| Open source             | No                      | Yes                    | Yes                   | Yes               |

While Apollo is powerful, it remains Palantir's proprietary internal tool -- unavailable to enterprises directly. For organizations seeking similar capabilities while maintaining technology independence, Coomia DIP offers a progressive deployment approach built entirely on open-source technologies, ensuring zero vendor lock-in.

#Why Traditional Tools Fall Short

Code
Hidden Assumptions of Traditional CI/CD
================================================

Jenkins / GitLab CI / GitHub Actions:
  Assumption 1: Build system can reach target env   <- Air-gap breaks this
  Assumption 2: Network connectivity is stable       <- Ships/aircraft break this
  Assumption 3: Deployment status is observable      <- Classified envs break this
  Assumption 4: Rollback = redeploy old version      <- State migrations break this

ArgoCD / FluxCD:
  Assumption 1: Git repo is accessible to cluster    <- Air-gap breaks this
  Assumption 2: Container registry is online         <- Needs local registry
  Assumption 3: Kubernetes API is remotely reachable <- Classified envs break this

Apollo's Solution:
  [OK] Does not assume network connectivity
  [OK] Does not assume remote access
  [OK] Local autonomy + offline updates
  [OK] State-aware upgrade strategies

#8. How Coomia DIP Handles Deployment

#8.1 Design Philosophy: Progressive Evolution

Coomia DIP adopts a progressive deployment strategy, evolving from simple to complex:

Code
Coomia DIP Deployment Evolution
================================================

Phase 1: Docker Compose (Current)
+----------------------------------------+
|  docker-compose.yml                    |
|                                        |
|  Use cases:                            |
|  - Development environments           |
|  - Small POC deployments              |
|  - Single-machine deployment          |
|                                        |
|  Strengths: Simple, zero deps, fast    |
|  Limits: No auto-scaling, no HA       |
+----------------------------------------+
                   |
                   v
Phase 2: K8s Helm Charts (Planned)
+----------------------------------------+
|  helm/mds/                             |
|  +-- Chart.yaml                        |
|  +-- values.yaml                       |
|  +-- values-dev.yaml                   |
|  +-- values-prod.yaml                  |
|  +-- templates/                        |
|      +-- control-plane/                |
|      +-- data-plane/                   |
|      +-- intelligence-plane/           |
|                                        |
|  Use cases:                            |
|  - Production Kubernetes deployment    |
|  - Multi-environment configuration     |
|                                        |
|  Strengths: K8s ecosystem, declarative |
|  Limits: Requires K8s ops capability   |
+----------------------------------------+
                   |
                   v
Phase 3: K8s Operator (Long-term Goal)
+----------------------------------------+
|  Coomia DIP Operator                   |
|                                        |
|  apiVersion: mds.coomia.com/v1         |
|  kind: MDSCluster                      |
|  spec:                                 |
|    controlPlane:                        |
|      replicas: 3                       |
|    dataPlane:                           |
|      sparkWorkers: 5                   |
|    intelligencePlane:                   |
|      reasoningEngines: 2               |
|                                        |
|  Use cases:                            |
|  - Enterprise deployment              |
|  - Automated Day 2 Operations         |
|                                        |
|  Strengths: Fully automated, self-heal |
+----------------------------------------+

#9. Real-World Case Study: Apollo in Military Scenarios

#9.1 Carrier Strike Group Deployment

Code
Carrier Strike Group Foundry Deployment Topology
================================================

        Satellite Comm Link (intermittent)
              ^
              |
              |    +-------------------------+
              |    | Shore-Based Apollo       |
              |    | Control Center           |
              |    | (continuously updates    |
              |    |  artifact repository)    |
              |    +-------------------------+
              |
    ----------+------------ At Sea -----------
              |
     +--------v----------------------------+
     |  CVN-XX Aircraft Carrier            |
     |  +--------------------------+       |
     |  | Foundry (Full Version)    |       |
     |  | - All microservices       |       |
     |  | - Local data lake         |       |
     |  | - Apollo Agent            |       |
     |  | - GPU cluster (AI/ML)     |       |
     |  +--------------------------+       |
     +--------------+----------------------+
                    | Fleet network
          +---------+---------+
          |         |         |
   +------v--+ +---v-----+ +-v------+
   | DDG-XX  | | CG-XX   | |SSN-XX  |
   |Destroyer| | Cruiser  | |Submarine|
   |         | |         | |        |
   | Foundry | | Foundry | |Foundry |
   |(Compact)| |(Compact)| |(Minimal)|
   +---------+ +---------+ +--------+

   Full:    ~200 microservices, server room required
   Compact: ~50 core microservices, 2-3 servers
   Minimal: ~15 critical microservices, single hardened server

#10. Deployment Philosophy Learned from Apollo

#10.1 Deployment as a Product

The most important lesson Apollo teaches us: deployment is not an appendage of development -- deployment itself is a product.

Code
Traditional Thinking:
  Code -> Build -> Test -> "Throw it over the wall to ops"

Apollo Thinking:
  Code + Deploy Spec -> Build + Deploy Validation -> Test + Deploy Test -> Auto-Deploy

  Deployment is a first-class citizen:
  +----------------------------------+
  |  Each service defines:           |
  |  1. Business logic               |
  |  2. API contract                 |
  |  3. Deploy constraints           |
  |     (resources, deps, ordering)  |
  |  4. Health check criteria        |
  |  5. Rollback conditions          |
  |  6. Data migration scripts       |
  |                                  |
  |  Missing any = release blocked   |
  +----------------------------------+
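"Missing any = release blocked" is a mechanically checkable gate. A toy version of that check, with field names invented to match the checklist above:

```python
# Hypothetical first-class deployment fields, one per checklist item above.
REQUIRED_SPEC_FIELDS = (
    "business_logic",
    "api_contract",
    "deploy_constraints",
    "health_check",
    "rollback_conditions",
    "data_migrations",
)

def missing_spec_fields(service_spec: dict) -> list:
    """Return the deployment fields a service spec is missing.

    A non-empty result blocks the release -- deployment metadata is a
    gating artifact, not optional documentation.
    """
    return [f for f in REQUIRED_SPEC_FIELDS if f not in service_spec]
```

A spec that defines only its business logic would be blocked with five missing fields, which is exactly the cultural point: the deploy spec ships with the code or the code does not ship.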

#Key Takeaways

  1. Apollo's core innovation is not CI/CD itself, but "environment-agnostic continuous deployment" -- it solves the unified deployment problem from SaaS to air-gapped military environments, a fundamental blind spot for traditional CD tools.

  2. Pull-based architecture + local autonomy is the only viable approach for air-gapped deployment -- Apollo Agent runs independently in each environment, maintaining service operations and basic self-healing even when completely disconnected from the control plane.

  3. Coomia DIP chose a pragmatic progressive path -- starting from Docker Compose and evolving toward Helm Charts and a K8s Operator, building Apollo-like deployment capabilities on open-source toolchains while managing 11 infrastructure components.

#Want Palantir-Level Capabilities? Try Coomia DIP

Palantir's technology vision is impressive, but its steep pricing and closed ecosystem put it out of reach for most organizations. Coomia DIP is built on the same Ontology-driven philosophy, delivering an open-source, transparent, and privately deployable data intelligence platform.

  • AI Pipeline Builder: Describe in natural language, get production-grade data pipelines automatically
  • Business Ontology: Model your business world like Palantir does, but fully open
  • Decision Intelligence: Built-in rules engine and what-if analysis for data-driven decisions
  • Open Architecture: Built on Flink, Doris, Kafka, and other open-source technologies — zero lock-in

👉 Start Your Free Coomia DIP Trial | View Documentation
