Job Description
Shape the Future of Intelligence
Are you ready to architect the systems that will define 2026 and beyond? Nexus Horizon Labs is on the hunt for a visionary Senior AI Infrastructure Architect to lead our next-generation AI deployment strategy. We are not just building software; we are engineering the backbone of tomorrow's autonomous systems and generative AI ecosystems. If you are obsessed with scalability, performance, and cutting-edge technology, we want to meet you.
As a key member of our elite engineering team, you will bridge the gap between theoretical AI models and robust, production-grade infrastructure. You will ensure our platforms are resilient, secure, and ready to handle the exponential growth of data in the coming decade.
Responsibilities
- Architect & Deploy: Design and implement high-availability, distributed AI infrastructure using cloud-native technologies (Kubernetes, Docker, AWS/GCP).
- Future-Proofing: Lead the strategic roadmap for infrastructure upgrades to support 2026+ computational demands and generative model scaling.
- Performance Optimization: Continuously monitor, tune, and optimize system performance to ensure low latency and high throughput for AI inference.
- Collaboration: Partner with data scientists and ML engineers to translate model requirements into scalable engineering solutions.
- Security & Compliance: Enforce rigorous security protocols and data governance standards across all AI workloads.
- Team Leadership: Mentor junior architects and engineers, fostering a culture of innovation and technical excellence.
Qualifications
- Education: Master's degree in Computer Science, Engineering, or a related field; PhD preferred.
- Experience: 7+ years of experience in software engineering, with at least 3 years specifically in AI infrastructure or high-scale distributed systems.
- Technical Stack: Deep expertise in Python, Go, or Java; proficiency in containerization (Docker/K8s) and cloud platforms.
- AI Knowledge: Strong understanding of machine learning frameworks (TensorFlow, PyTorch) and model deployment strategies.
- Problem Solving: Exceptional ability to troubleshoot complex system failures and design systems that are resilient to failure.
- Communication: Excellent verbal and written communication skills for cross-functional stakeholder management.