Job Description
Are you ready to architect the backbone of tomorrow's Artificial General Intelligence? Nexus Horizon Labs is looking for a visionary Senior AI Infrastructure Engineer to lead our high-performance computing initiatives. As we prepare for the massive scalability demands of 2026 and beyond, you will be at the forefront of building resilient, secure, and lightning-fast AI ecosystems.
We are not just building software; we are engineering the future. Join a team of elite engineers dedicated to pushing the boundaries of what is possible in deep learning, quantum-ready architectures, and next-gen cloud integration.
Responsibilities
- Architect Scalable Infrastructure: Design and deploy high-availability GPU clusters optimized for massive model training and inference workloads.
- Optimize Training Pipelines: Implement advanced data pipelines and distributed training strategies to shorten training time and maximize compute efficiency.
- Cloud & Hybrid Strategy: Lead the migration and management of complex cloud environments (AWS/Azure/GCP) with a focus on cost optimization and security compliance.
- System Reliability: Build automated monitoring and alerting systems to support 99.99% uptime (roughly 53 minutes of downtime per year) for critical AI services.
- Future-Proofing: Research and prototype technologies relevant to the 2026 tech landscape, including edge computing and federated learning.
- Team Mentorship: Guide junior engineers and conduct code reviews to maintain the highest standards of engineering excellence.
Qualifications
- Experience: 5+ years in systems engineering, backend development, or infrastructure architecture.
- Programming: Proficiency in Python, C++, and shell scripting with a deep understanding of low-level system optimization.
- Cloud Expertise: Strong hands-on experience with Kubernetes, Docker, and major cloud providers (AWS, Azure, GCP).
- AI Stack: Familiarity with PyTorch, TensorFlow, and MLOps tools (MLflow, Kubeflow).
- Problem Solving: Proven ability to troubleshoot complex, multi-node distributed systems issues under pressure.
- Communication: Excellent written and verbal communication skills for technical documentation and stakeholder presentations.