Position Summary
We are seeking a highly skilled and dedicated Level 3 Technical Support Specialist to join our team. The ideal candidate will have extensive experience in troubleshooting and maintaining complex IT infrastructures. This position requires expertise in a wide range of technologies, including Ubuntu & KVM, Windows Server 2019, MS SQL Server, Kubernetes, Containers (Production & Staging), OpenStack, Ceph Storage, Huawei Servers, Switches, and Firewalls, NVIDIA CUDA & CUDA Libraries, Kubeflow, Docker, Mellanox switches, NVIDIA Spectrum switches, and NVIDIA A100 Servers. The candidate must be adept at quickly identifying and resolving technical issues to ensure optimal system performance and user satisfaction.
Responsibilities
· Provide Level 3 technical support and troubleshooting for platforms including Ubuntu & KVM, Windows Server 2019, and MS SQL Server.
· Manage and maintain Kubernetes clusters and container environments in both production and staging settings. This includes scaling, monitoring, and securing containerized applications to ensure high availability and performance.
· Oversee the deployment, configuration, and maintenance of OpenStack environments and Ceph Storage solutions to support scalable and resilient cloud infrastructure.
· Maintain and troubleshoot Huawei Servers, Switches, and Firewalls, ensuring secure and efficient network operations. Implement best practices for network security and performance optimization.
· Implement and optimize NVIDIA CUDA & CUDA Libraries, Kubeflow, and Docker environments for high-performance computing tasks. This involves developing and tuning machine learning models and applications for maximum efficiency.
· Configure and manage advanced networking equipment, including Mellanox and NVIDIA Spectrum switches, and maintain NVIDIA A100 Servers. Ensure seamless integration and operation within the broader IT infrastructure.
· Collaborate with cross-functional teams, including developers, network engineers, and system administrators, to identify and resolve complex technical issues. Provide mentorship and guidance to junior support staff.
· Conduct root cause analysis for recurring issues and implement long-term solutions. Document processes, procedures, and troubleshooting steps to enhance the knowledge base.
· Participate in on-call rotations to provide 24/7 support for critical systems. Ensure timely resolution of incidents and minimal downtime for business operations.
· Stay current with emerging technologies and industry trends. Continuously improve technical skills through training, certifications, and hands-on experience.
Qualifications
· Bachelor of Science in Computer Science or a related field.
· Minimum of 5 years of experience in technical support and troubleshooting, with a focus on enterprise-grade IT environments.
· Strong knowledge of operating systems (Ubuntu, Windows Server) and virtualization and container technologies (KVM, Docker).
· Experience with database management (MS SQL Server), including performance tuning, backup/restore operations, and disaster recovery planning.
· Proficiency in cloud and storage platforms (OpenStack, Ceph Storage) and their integration with on-premises infrastructure.
· Expertise in server and networking equipment (Huawei Servers, Switches, Firewalls; Mellanox switches, NVIDIA Spectrum switches), including setup, configuration, and troubleshooting.
· Experience with high-performance computing (NVIDIA CUDA & CUDA Libraries, Kubeflow, NVIDIA A100 Servers), including the deployment and optimization of GPU-accelerated applications.
· Strong understanding of orchestration tools (Kubernetes) and their use in managing microservices architectures.
· Excellent problem-solving and communication skills, with the ability to convey complex technical concepts to non-technical stakeholders.
· Ability to work independently and as part of a team, managing multiple priorities and projects simultaneously.
Certifications (Preferred)
· Certified Kubernetes Administrator (CKA)
· Microsoft Certified: Azure Solutions Architect Expert
· Huawei Certified ICT Professional (HCIP)
· Docker Certified Associate (DCA)
Professional Affiliations
· Member of the Linux Foundation
· Member of the Cloud Native Computing Foundation (CNCF)