Information Technology | Full Time | Verified

Senior AI Infrastructure Engineer

Nexus Future Systems
San Francisco
Estimated Salary
USD 180,000 – USD 260,000
Live Update
1 July 2026
Deadline
1 Jul 2027

Job Description

Are you ready to architect the foundation of the next technological era? Nexus Future Systems is seeking a visionary Senior AI Infrastructure Engineer to build the resilient backbone required for the rapid evolution of Artificial General Intelligence by 2026.

We are not just maintaining legacy systems; we are constructing the distributed ecosystems that will train, deploy, and scale trillion-parameter models globally. If you are passionate about bridging the gap between cutting-edge machine learning research and rock-solid production engineering, we want you on our team.

Why this role matters:
We are preparing for a paradigm shift in compute. You will be instrumental in designing the infrastructure that powers the autonomous agents and generative systems of tomorrow.

Responsibilities

  • Design and implement high-availability, distributed training clusters capable of handling massive-scale LLM workloads.
  • Optimize inference pipelines for real-time, edge-computing applications and low-latency deployments.
  • Collaborate closely with ML researchers to integrate novel neural architectures into production environments.
  • Implement robust observability, logging, and monitoring solutions (Prometheus, Grafana, Datadog) to ensure system health.
  • Maintain data sovereignty, security compliance (SOC 2, GDPR), and disaster recovery protocols for sensitive AI workloads.
  • Automate infrastructure provisioning and management using Infrastructure as Code (Terraform, Ansible).

Qualifications

  • 5+ years of experience in systems engineering or software development with a specific focus on AI/ML infrastructure.
  • Deep proficiency in Python, C++, or Rust with a strong understanding of memory management.
  • Strong expertise in containerization (Docker, Kubernetes) and cloud platforms (AWS, GCP, or Azure).
  • Experience with model serving frameworks (Triton Inference Server, TorchServe) and hardware acceleration (NVIDIA GPUs, TPUs).
  • BS or MS in Computer Science, Electrical Engineering, or a related technical field.
  • Proven track record of working in fast-paced, high-growth startup environments.

Required Skills

Python, Kubernetes, AWS, Machine Learning, Docker, PyTorch, C++, Distributed Systems, Linux, Terraform

Ready to Take This Challenge?

Make sure your resume is ready. Submit your application now before the deadline.
