Join NVIDIA as a DevOps Engineer
Are you passionate about cutting-edge technology and cloud infrastructure? NVIDIA, a leader in AI, gaming, and GPU innovations, is on the hunt for talented DevOps Engineers to join their team in Bengaluru, India. This is your opportunity to contribute to revolutionary advancements in computing technology.
About the Role: Site Reliability Engineer – GPU Cloud
NVIDIA’s DevOps team plays a critical role in ensuring the smooth operation and scalability of their GPU cloud platform, which supports a wide range of AI, machine learning, and graphics workloads. In this role, you’ll be at the forefront of creating robust, reliable systems that power some of the most advanced computing environments in the world.
What you’ll be doing
The NVIDIA GPU cloud is a hosted platform for internal R&D teams and external AI/ML stack customers. This SRE team is accountable for the setup, management, reliability, and availability of this infrastructure, which spans thousands of GPU nodes.
As an SRE, you are responsible for:
- Providing scalable, robust, service-oriented infrastructure automation, monitoring, and analytics solutions for NVIDIA’s on-prem and cloud-based GPU infrastructure.
- Owning the whole life cycle of new tools and services, from requirements gathering to design documentation, validation, and deployment.
- Providing customer support on a rotation basis.
What we need to see
- A minimum of 3 years of experience automating and managing large-scale distributed system software deployments in on-prem/cloud environments.
- Proficiency in at least one of Go, Python, Perl, C++, Java, or C.
- Strong command of Terraform, Kubernetes, and cloud infrastructure administration.
- Excellent debugging and troubleshooting skills.
- Excellent interpersonal and written communication skills.
- B.E. in Computer Science or a related technical field involving coding (e.g., physics or mathematics).
Ways to stand out from the crowd
- Ability to decompose complex requirements into simple tasks and reuse existing solutions to implement most of them.
- Unit testing and benchmarking are an integral part of your code.
- Ability to reason and choose the best possible algorithm to meet scaling and availability challenges.
How to Apply
Excited to become part of NVIDIA’s journey? Don’t wait! Apply for the Site Reliability Engineer – GPU Cloud position today by visiting their official career site:
👉 Apply Now
For more jobs, visit the careerflock official page.