Senior / Lead / Principal Platform Engineer (DevOps / Cloud Infrastructure)

CB Smart Recruit

2 days ago

Full-time

On-site

Los Angeles, California, United States

$200,000 - $300,000 USD yearly

Cloud Computing & DevOps

Location: West Hollywood / Los Angeles, CA
Work Model: On-site (5 days per week)
Employment Type: Full-Time
Compensation: $200,000–$300,000+ USD (depending on experience and seniority), plus a competitive sign-on bonus.

Applicants must be legally authorized to work in the United States. Visa sponsorship is not available for this role.

About the Opportunity

Our client is a well-funded, early-stage AI company building a next-generation intelligence platform for high-stakes, real-world decision making.

The platform ingests and fuses data from satellite feeds, autonomous sensors, logistics networks, enterprise systems, and open-source intelligence (OSINT) to power production AI/ML workloads, knowledge graphs, and intelligent decision-making systems.

This is not a traditional SaaS, DevOps, or chatbot company. The engineering team is building production AI infrastructure where reliability, scalability, security, and developer productivity are mission-critical.

We're looking for a Senior, Lead, or Principal Platform Engineer who enjoys building platforms, not simply maintaining them. You'll own the cloud infrastructure, Kubernetes platform, CI/CD and GitOps workflows, infrastructure automation, and internal developer platform that enable engineering teams to build and deploy production AI systems at scale.

This is a highly collaborative, hands-on engineering role with significant ownership and influence over the platform architecture.

The Role

As a Platform Engineer, you'll design, build, and operate the infrastructure that powers complex AI/ML workloads, while creating the internal tooling and platform capabilities that help software engineers move faster and more reliably.

The ideal candidate has a strong software engineering foundation, deep cloud infrastructure expertise, and experience owning production Kubernetes environments from design through day-to-day operations.

Key Responsibilities

Platform Engineering

Design, build, and operate scalable cloud infrastructure supporting production AI/ML workloads.
Own Kubernetes infrastructure, including architecture, networking, security, upgrades, scaling, and operational reliability.
Build and evolve an internal developer platform that improves engineering productivity and deployment velocity.
Develop self-service infrastructure and automation that enables engineering teams to ship software quickly and safely.
Continuously improve developer experience through platform engineering best practices.

Cloud Infrastructure & DevOps

Design and implement modern CI/CD and GitOps workflows for production environments.
Build reusable Infrastructure-as-Code solutions using Terraform and related tooling.
Architect highly available, resilient, and cost-efficient cloud infrastructure.
Drive adoption of containerization, Kubernetes, and cloud-native infrastructure across engineering teams.
Support AI-powered development workflows using tools such as Claude Code, Cursor, GitHub Copilot, or similar technologies.

AI Infrastructure

Build and optimize infrastructure supporting GPU-accelerated machine learning workloads.
Improve GPU provisioning, scheduling, utilization, and resource management.
Support scalable infrastructure for model training, inference, and AI services deployed in production.
Partner closely with AI engineers to optimize platform performance and reliability.

Reliability & Operations

Lead the investigation and resolution of complex production incidents across cloud infrastructure, Kubernetes, networking, and applications.
Perform root-cause analysis and implement long-term improvements that increase reliability.
Build comprehensive monitoring, alerting, logging, and observability solutions.
Drive platform reliability, performance optimization, and operational excellence.

Collaboration & Architecture

Partner with software engineers, AI engineers, security teams, and technical leadership on platform architecture decisions.
Produce technical design documentation for major infrastructure initiatives.
Champion engineering best practices around automation, scalability, security, testing, and reliability.
Evaluate emerging technologies that improve infrastructure capabilities and developer productivity.

Required Qualifications

Bachelor's degree in Computer Science, Software Engineering, Information Technology, or a related technical discipline (Master's preferred).
5+ years of experience building and operating production cloud infrastructure in a Platform Engineering, DevOps, or Site Reliability Engineering (SRE) role.
Strong software engineering foundation with experience building automation, tooling, services, or developer platforms using Python, Go, Bash, or similar languages.
Demonstrated ownership of production Kubernetes clusters, including architecture, networking, upgrades, scaling, and operational support.
Hands-on experience designing and building Infrastructure-as-Code solutions using Terraform, including authoring reusable modules.
Strong experience designing and building CI/CD and GitOps pipelines—not simply maintaining existing pipelines.
Deep experience with Google Cloud Platform (GCP) and/or AWS.
Strong understanding of containerization and container orchestration technologies, including Docker and Kubernetes.
Experience building and operating production-scale distributed systems.
Strong troubleshooting skills across cloud infrastructure, Kubernetes, networking, and applications.
Experience with observability platforms such as Prometheus, Grafana, Datadog, ELK, or equivalent.
Excellent communication and collaboration skills.

Preferred Qualifications

Experience with one or more of the following is highly desirable:

AI/ML infrastructure and GPU-accelerated workloads.
NVIDIA GPU infrastructure and CUDA environments.
Internal developer platforms and self-service infrastructure.
GitOps methodologies.
AI-native development tools such as Claude Code, Cursor, GitHub Copilot, or Codex.
Security-focused environments including DevSecOps practices.
Air-gapped, sovereign, or highly regulated deployment environments.
Defense, aerospace, government, or other mission-critical industries.
FedRAMP, ITAR, CMMC, or similar compliance frameworks.
Serverless architectures and distributed systems.

What We're Looking For

Successful candidates will demonstrate:

A platform engineering mindset with experience designing, building, and owning infrastructure—not simply maintaining existing environments.
A strong software engineering foundation and passion for automation.
Experience building platforms and internal tooling that improve developer productivity.
Excellent systems thinking across cloud infrastructure, Kubernetes, networking, security, and distributed systems.
A high level of ownership and comfort working in fast-moving environments with significant technical responsibility.
A pragmatic approach to balancing reliability, scalability, security, and developer experience.

Compensation & Benefits

Base salary: $200,000–$300,000+, depending on experience and seniority.
Competitive sign-on bonus.
Comprehensive benefits package.
Opportunity to join a well-funded, high-growth AI company at an early stage with significant technical ownership.
Long-term career growth with opportunities to take on broader platform and infrastructure leadership responsibilities as the organization continues to scale.

Why Join?

Build production infrastructure powering real-world AI systems—not internal IT or traditional enterprise DevOps.
Own the Kubernetes platform, developer experience, and cloud infrastructure that enable AI engineers to move faster.
Work alongside a highly technical engineering team solving challenging platform and infrastructure problems.
Support GPU-accelerated AI/ML workloads deployed in production.
Help shape the technical foundation of a rapidly growing AI company where engineering quality, ownership, and innovation are highly valued.

If you're passionate about Platform Engineering, cloud infrastructure, Kubernetes, automation, and building the systems that power next-generation AI applications, we'd love to hear from you.

Package Details

Competitive base salary of $200,000–$300,000+ USD (depending on experience and seniority)
Competitive sign-on bonus
Comprehensive benefits package
Significant technical ownership
The opportunity to join a well-funded, early-stage AI company building next-generation AI infrastructure

Candidates located anywhere in the U.S. are encouraged to apply. Please note that the role is on-site in West Hollywood / Los Angeles five days per week and that relocation assistance is not provided. The company offers a competitive sign-on bonus for successful hires.
