Senior / Lead / Principal Platform Engineer (DevOps / Cloud Infrastructure)

CB Smart Recruit
2 days ago
Full-time
On-site
Los Angeles, California, United States
$200,000 - $300,000 USD yearly
Cloud Computing & DevOps

Location: West Hollywood / Los Angeles, CA
Work Model: On-site (5 days per week)
Employment Type: Full-Time
Compensation: $200,000–$300,000+ USD (depending on experience and seniority), plus a competitive sign-on bonus.

Applicants must be legally authorized to work in the United States. Visa sponsorship is not available for this role.

About the Opportunity

Our client is a well-funded, early-stage AI company building a next-generation intelligence platform for high-stakes, real-world decision making.

The platform ingests and fuses data from satellite feeds, autonomous sensors, logistics networks, enterprise systems, and open-source intelligence (OSINT) to power production AI/ML workloads, knowledge graphs, and intelligent decision-making systems.

This is not a traditional SaaS, DevOps, or chatbot company. The engineering team is building production AI infrastructure where reliability, scalability, security, and developer productivity are mission-critical.

We're looking for a Senior, Lead, or Principal Platform Engineer who enjoys building platforms—not simply maintaining them. You'll own the cloud infrastructure, Kubernetes platform, CI/CD and GitOps workflows, infrastructure automation, and internal developer platform that enables engineering teams to build and deploy production AI systems at scale.

This is a highly collaborative, hands-on engineering role with significant ownership and influence over the platform architecture.

The Role

As a Platform Engineer, you'll design, build, and operate the infrastructure that powers complex AI/ML workloads, while creating the internal tooling and platform capabilities that help software engineers move faster and more reliably.

The ideal candidate has a strong software engineering foundation, deep cloud infrastructure expertise, and experience owning production Kubernetes environments from design through day-to-day operations.

Key Responsibilities

Platform Engineering

  • Design, build, and operate scalable cloud infrastructure supporting production AI/ML workloads.
  • Own Kubernetes infrastructure, including architecture, networking, security, upgrades, scaling, and operational reliability.
  • Build and evolve an internal developer platform that improves engineering productivity and deployment velocity.
  • Develop self-service infrastructure and automation that enables engineering teams to ship software quickly and safely.
  • Continuously improve developer experience through platform engineering best practices.

Cloud Infrastructure & DevOps

  • Design and implement modern CI/CD and GitOps workflows for production environments.
  • Build reusable Infrastructure-as-Code solutions using Terraform and related tooling.
  • Architect highly available, resilient, and cost-efficient cloud infrastructure.
  • Drive adoption of containerization, Kubernetes, and cloud-native infrastructure across engineering teams.
  • Support AI-powered development workflows using tools such as Claude Code, Cursor, and GitHub Copilot.

AI Infrastructure

  • Build and optimize infrastructure supporting GPU-accelerated machine learning workloads.
  • Improve GPU provisioning, scheduling, utilization, and resource management.
  • Support scalable infrastructure for model training, inference, and AI services deployed in production.
  • Partner closely with AI engineers to optimize platform performance and reliability.

Reliability & Operations

  • Lead the investigation and resolution of complex production incidents across cloud infrastructure, Kubernetes, networking, and applications.
  • Perform root-cause analysis and implement long-term improvements that increase reliability.
  • Build comprehensive monitoring, alerting, logging, and observability solutions.
  • Drive platform reliability, performance optimization, and operational excellence.

Collaboration & Architecture

  • Partner with software engineers, AI engineers, security teams, and technical leadership on platform architecture decisions.
  • Produce technical design documentation for major infrastructure initiatives.
  • Champion engineering best practices around automation, scalability, security, testing, and reliability.
  • Evaluate emerging technologies that improve infrastructure capabilities and developer productivity.

Required Qualifications

  • Bachelor's degree in Computer Science, Software Engineering, Information Technology, or a related technical discipline (Master's preferred).
  • 5+ years of experience building and operating production cloud infrastructure in a Platform Engineering, DevOps, or Site Reliability Engineering (SRE) role.
  • Strong software engineering foundation with experience building automation, tooling, services, or developer platforms using Python, Go, Bash, or similar languages.
  • Demonstrated ownership of production Kubernetes clusters, including architecture, networking, upgrades, scaling, and operational support.
  • Hands-on experience designing and building Infrastructure-as-Code solutions using Terraform, including authoring reusable modules.
  • Strong experience designing and building CI/CD and GitOps pipelines—not simply maintaining existing pipelines.
  • Deep experience with Google Cloud Platform (GCP) and/or AWS.
  • Strong understanding of container technologies such as Docker and of container orchestration with Kubernetes.
  • Experience building and operating production-scale distributed systems.
  • Strong troubleshooting skills across cloud infrastructure, Kubernetes, networking, and applications.
  • Experience with observability platforms such as Prometheus, Grafana, Datadog, ELK, or equivalent.
  • Excellent communication and collaboration skills.

Preferred Qualifications

Experience with one or more of the following is highly desirable:

  • AI/ML infrastructure and GPU-accelerated workloads.
  • NVIDIA GPU infrastructure and CUDA environments.
  • Internal developer platforms and self-service infrastructure.
  • GitOps methodologies.
  • AI-native development tools such as Claude Code, Cursor, GitHub Copilot, or Codex.
  • Security-focused environments including DevSecOps practices.
  • Air-gapped, sovereign, or highly regulated deployment environments.
  • Defense, aerospace, government, or other mission-critical industries.
  • FedRAMP, ITAR, CMMC, or similar compliance frameworks.
  • Serverless architectures and distributed systems.

What We're Looking For

Successful candidates will demonstrate:

  • A platform engineering mindset with experience designing, building, and owning infrastructure—not simply maintaining existing environments.
  • A strong software engineering foundation and passion for automation.
  • Experience building platforms and internal tooling that improve developer productivity.
  • Excellent systems thinking across cloud infrastructure, Kubernetes, networking, security, and distributed systems.
  • A high level of ownership and comfort working in fast-moving environments with significant technical responsibility.
  • A pragmatic approach to balancing reliability, scalability, security, and developer experience.

Compensation & Benefits

  • Base salary: $200,000–$300,000+, depending on experience and seniority.
  • Competitive sign-on bonus.
  • Comprehensive benefits package.
  • Opportunity to join a well-funded, high-growth AI company at an early stage with significant technical ownership.
  • Long-term career growth with opportunities to take on broader platform and infrastructure leadership responsibilities as the organization continues to scale.

Why Join?

  • Build production infrastructure powering real-world AI systems—not internal IT or traditional enterprise DevOps.
  • Own the Kubernetes platform, developer experience, and cloud infrastructure that enables AI engineers to move faster.
  • Work alongside a highly technical engineering team solving challenging platform and infrastructure problems.
  • Support GPU-accelerated AI/ML workloads deployed in production.
  • Help shape the technical foundation of a rapidly growing AI company where engineering quality, ownership, and innovation are highly valued.

If you're passionate about Platform Engineering, cloud infrastructure, Kubernetes, automation, and building the systems that power next-generation AI applications, we'd love to hear from you.


Package Details

  • Competitive base salary of $200,000–$300,000+ USD (depending on experience and seniority)
  • Competitive sign-on bonus
  • Comprehensive benefits package
  • Significant technical ownership
  • Opportunity to join a well-funded, early-stage AI company building next-generation AI infrastructure

Candidates located anywhere in the U.S. are encouraged to apply. Please note that the role is on-site in West Hollywood / Los Angeles five days per week and that relocation assistance is not provided. The company offers a competitive sign-on bonus for successful hires.