Job Description
Are you ready to architect the foundation of the next technological era? Nexus Future Systems is seeking a visionary Senior AI Infrastructure Engineer to build the resilient backbone that the rapid evolution of Artificial General Intelligence will require by 2026.
We are not just maintaining legacy systems; we are constructing the distributed ecosystems that will train, deploy, and scale trillion-parameter models globally. If you are passionate about bridging the gap between cutting-edge machine learning research and rock-solid production engineering, we want you on our team.
Why this role matters
We are preparing for a paradigm shift in compute. You will be instrumental in designing the infrastructure that will power tomorrow's autonomous agents and generative systems.
Responsibilities
- Design and implement high-availability, distributed training clusters capable of handling massive-scale LLM workloads.
- Optimize inference pipelines for real-time edge-computing applications and other low-latency deployments.
- Collaborate closely with ML researchers to integrate novel neural architectures into production environments.
- Implement robust observability, logging, and monitoring solutions (Prometheus, Grafana, Datadog) to ensure system health.
- Ensure data sovereignty and compliance with security and privacy standards (SOC 2, GDPR) for sensitive AI workloads, and maintain disaster recovery protocols.
- Automate infrastructure provisioning and management using Infrastructure as Code (Terraform, Ansible).
Qualifications
- 5+ years of experience in systems engineering or software development with a specific focus on AI/ML infrastructure.
- Deep proficiency in Python, C++, or Rust with a strong understanding of memory management.
- Strong expertise in containerization (Docker, Kubernetes) and cloud platforms (AWS, GCP, or Azure).
- Experience with model serving frameworks (Triton Inference Server, TorchServe) and hardware accelerators (NVIDIA GPUs, TPUs).
- BS or MS in Computer Science, Electrical Engineering, or a related technical field.
- Proven track record of working in fast-paced, high-growth startup environments.