GBB is looking for an HPC Senior System Integrator/System Administrator who is highly motivated, creative, and innovative, and who has a strong passion for cluster administration, system integration, and validation of HPC clusters. In this role, you will have an opportunity to influence the overall integration, delivery, and management of the largely open-source HPC products and solutions.
These systems and servers incorporate Intel and AMD processors, NVIDIA GPUs, Intel Xeon Phi coprocessors, storage, InfiniBand, graphics, and Linux software. Knowledge of parallel processing (problem decomposition and work distribution), parallel programming (e.g., use of MPI and OpenMP), and computer architecture is required.
Minimum Qualifications:
Bachelor’s degree in Computer Science, Computer Engineering, Computational Science, an equivalent mathematical science, or a related field, with 3+ years of relevant work experience; or an equivalent combination of education, training, and experience
3+ years of experience developing software on Linux
3+ years of experience with HPC clusters and systems integration
Job responsibilities include:
Installing, configuring, fine-tuning, and troubleshooting multi-vendor, multi-site Linux HPC servers.
Building and deploying open source software and software from vendors/partners.
Diagnosing and resolving system operational problems quickly and effectively.
Verifying full system operation, including network, system, and storage performance.
Configuring the scheduling and queuing system.
Assisting technical support with technical questions and problems encountered by customers.
Coordinating with vendors to resolve hardware and software problems.
Documenting system administration procedures for routine and complex tasks on wikis.
Maintaining and monitoring the security of the HPC systems and servers.
Establishing and maintaining effective working relationships with coworkers, managers, and clients.
Traveling for a limited number of on-site cluster system installations or maintenance visits.
Desired Skills and Experience:
Knowledge of and experience with building, configuring, and administering Linux (especially Ubuntu, CentOS, and RHEL)
Expert knowledge of parallel distributed file systems such as Lustre and IBM GPFS
Excellent knowledge of networking and cluster-based distributed computing.
Ability to deploy open-source and commercial HPC platforms.
Strong scripting skills (Bash, Perl, or Python); general programming ability is an advantage.
Experience diagnosing and debugging complex HPC hardware and software issues, and proposing workarounds when applicable.
Effective troubleshooting and problem-solving skills, including root cause analysis.
Specific areas of desired expertise include:
Cluster solutions integration and administration
Linux (CentOS, Ubuntu, RHEL) operating systems and OS components for HPC clusters
Cluster provisioning, systems management, resource management middleware
Cluster interconnect fabrics and software stack
HPC Cluster storage solutions
Parallel programming models for HPC clusters
Fortran and C programming
High-performance sequential programming
Multicore/multi-CPU programming in C and C++
GPGPU architectures and programming
Multi-node cluster networks
Hands-on experience with SMD codes, HPF, MPI (mpicc), LINPACK, LAPACK, CUDA codes, Open MPI, schedulers, and Ganglia