I. JOB OVERVIEW
Job Description Summary: |
GW Information Technology (GW IT) provides empowering tools and caring support for all members of The George Washington University (GW) community. We are focused on driving digital transformation and innovation to enable the academic and operational excellence of our students, faculty, staff, and researchers. At GW IT, we are committed to cultivating a team culture that values diversity, inclusion, respect, and collaboration, and invests in each of our team members to grow in their technology and career skills.
Research Technology Services (RTS) is a team of Research Computing and Data (RCD) professionals within GW IT that provides and supports various cyberinfrastructure (CI) systems and services in support of GW's research mission. The RTS service portfolio includes high-performance computing clusters, public cloud infrastructure, purpose-built computational, data management, and storage platforms, and application and workflow support across a wide array of research disciplines. All RTS members have a blend of "system-facing" and "researcher-facing" duties, with diverse responsibilities across Applications, System Administration, User Support, High-performance Computing, Research Data Management, Networking, Cloud Computing, and Cybersecurity.
As a Lead High Performance Computing (HPC) Engineer, you will be responsible for designing, implementing, and maintaining high-performance computing systems to meet the computational needs of the RTS. In collaboration with other HPC engineers, this senior position is accountable for the operations of multiple HPC systems and contributes to the strategic planning for next-generation services aligned with High Performance Computing services. As part of an advanced team of engineers, this role works closely with the GW research community to define and deliver HPC and related advanced compute and storage infrastructure to support rapidly evolving research needs. The Lead HPC Engineer also engages directly on research projects to understand and consult on the best options available, and serves as the highest tier of support escalation for operational issues. This position develops and conducts advanced training and mentors other engineers on the team to enhance the interdisciplinary capabilities of the research technology support organization.
While the Lead HPC Engineer position's focus is more on the HPC service than other RTS services, the Lead HPC Engineer is encouraged to gain advanced knowledge and specialization in a broad range of the listed domains. As a leader in these areas within the RTS, this position should actively contribute to the support and adoption of new technologies within the Research Technology Services team.
Responsibilities Include:
- Follow industry standards to plan and execute system-wide changes, meet with other IT stakeholders, and interact with other teams within GW IT to ensure the HPC environment is operating optimally.
- Proactively monitor and gather statistics on the HPC infrastructure to identify problems and system issues, and work with RTS and GW IT engineers to resolve any bottlenecks in the systems.
- Keep up with new research areas and new technologies to stay ahead of the needs of researchers.
Research Computing and Data (RCD)
- Categorize and match research computing demands with appropriate platforms (e.g., cloud, HPC, and HTC) to aid researchers and stakeholders in planning to meet their research objectives.
- Assist researchers in achieving compliance when storing and handling restricted or regulated data.
- Create and update knowledge base articles, FAQs, and support documentation.
High-Performance Computing (HPC) and Big Data
- Implement and maintain job schedulers, resource managers, and data transfer solutions for efficient HPC and big data operation.
- Design, build, and manage scalable HPC clusters and storage solutions to support a wide range of research and computational workloads.
- Lead the integration of HPC systems with big data platforms (e.g., Hadoop, Spark) as needed to process and analyze large datasets.
- Collaborate with research scientists to optimize software implementations and workflows for HPC environments, enhancing performance and scalability.
- Participate in outreach events and workshops to educate and update users about HPC developments, tools, and resources.
- Support team members in ongoing HPC enhancements, maintenance and upgrades.
- Work with others on complex R&D projects involving teams of scientific researchers, hackers, and developers.
Cloud Computing
- Architect, deploy, and manage cloud-based HPC solutions (AWS, Azure, Google Cloud) for scalable, on-demand research computing resources.
- Lead efforts to migrate traditional HPC workloads to cloud environments while maintaining performance and cost-effectiveness.
- Ensure seamless integration between on-premise HPC systems and cloud infrastructure, enabling hybrid computing models.
- Ensure monitoring of the cloud infrastructure and services, respond to alerts, and take appropriate actions to maintain system performance and availability.
Networking
- Support the architecture and maintenance of high-speed networking solutions (InfiniBand, Ethernet) for low-latency, high-bandwidth data transfers in HPC environments.
- Collaborate with networking teams to ensure secure, robust, and scalable data transfer between HPC clusters, storage systems, and research facilities.
- Implement and monitor security best practices for data integrity, confidentiality, and regulatory compliance in HPC and cloud computing systems.
- Troubleshoot network connectivity, routing, and switching issues.
AI & Machine Learning Integration
- Maintain an understanding of ML/AI products and technologies.
- Work closely with data scientists and researchers to integrate AI/ML workloads with HPC and cloud infrastructures.
- Optimize AI model training and deployment using GPU-accelerated computing, distributed training frameworks, and HPC architectures.
- Lead AI/ML-related projects that require high-performance computing resources for tasks such as model training, inference, and data analysis.
Research Computing & Collaboration
- Collaborate with researchers, faculty, and technical teams to understand scientific workflows and compute requirements.
- Provide technical leadership, mentorship, and training to junior engineers and researchers in HPC and cloud computing best practices.
- Engage in strategic planning to enhance research computing capabilities, including capacity planning, infrastructure upgrades, and the adoption of emerging technologies.
Applications
- Collaborate with users to ensure applications run efficiently, meeting both performance requirements and user expectations.
- Provide training to users, as needed.
- Install and deploy applications as needed by users.
- Troubleshoot and resolve application-related issues.
- Support planning and execution of application upgrades and deployments.
Data Management Plans
- Develop and implement data management plans that comply with NIST/CMMC standards, ensuring proper data handling, storage, and retention.
- Educate researchers and stakeholders on data management best practices and policies.
- Audit current practices and deployments to ensure they are consistent with industry standards.
Storage Systems
- Manage storage devices to store and retrieve large volumes of data for various research computing applications and platforms.
- Optimize storage systems for performance, scalability, and disaster recovery.
- Research and propose new storage systems and methodologies for use with HPC and other RTS systems.
Cybersecurity and Identity
- Work with the security team to conduct regular security assessments and reviews of the cyberinfrastructure to identify vulnerabilities and risks. Recommend and implement security postures and protocols to mitigate potential threats and breaches.
- Work with the GW Information Security team to support infosec-related activities.
- Facilitate consistent identity and group management throughout research cyberinfrastructure with Enterprise Active Directory.
Performs other related duties as assigned. The omission of specific duties does not preclude the supervisor from assigning duties that are logically related to the position.
While the position is designated at the GW Ashburn campus, RTS team members may have the option of choosing either Ashburn or Foggy Bottom as their primary location. Team members are regularly expected to travel between the campuses, regardless of their primary location. |
Minimum Qualifications: |
Qualified candidates will hold a Bachelor's degree in an appropriate area of specialization plus 5 years of relevant professional experience, or a Master's degree or higher in a relevant area of study plus 3 years of relevant professional experience. Degree must be conferred by the start date of the position. Degree requirements may be substituted with an equivalent combination of education, training, and experience. |
Additional Required Licenses/Certifications/Posting Specific Minimum Qualifications: |
|
Preferred Qualifications: |
- Experience in a large-scale production high performance computing environment.
- Familiarity with a variety of HPC subject-area concepts and practices in the context of academic research, including a basic understanding of sponsored research compliance requirements.
- Strong expertise in HPC technologies, including parallel computing architectures, job scheduling systems (e.g., Slurm), and interconnect technologies (e.g., InfiniBand, Ethernet).
- Proficiency in programming languages commonly used in scientific computing.
- Experience with HPC storage systems, file systems (e.g., Lustre, GPFS), and data management strategies.
- Excellent leadership and communication skills, with the ability to effectively collaborate with stakeholders across the organization.
- Knowledge of security best practices and experience implementing security controls in HPC environments.
- Excellent oral and written communication skills; ability to prepare and present comprehensive presentations to IT and business executives.
- Demonstrated experience working in an environment with rapidly changing job priorities.
- Strong analytical and troubleshooting skills.
- Ability to creatively improve workflows and processes.
- Experience scripting in Perl, Python, or Bash.
- Experience with Linux kernel modules, preferably for Lustre, NVIDIA GPUs, and Mellanox InfiniBand cards.
- Familiarity with the Simple Linux Utility for Resource Management (Slurm) workload manager, or other job schedulers, including the setup and maintenance of a multi-factor fair-share priority scheme.
- Familiarity with virtualization environments for front-end and maintenance image management.
- Familiarity with ticket tracking systems and service level management.
|
Hiring Range |
$92,790.58 - $150,696.60 |
GW Staff Approach to Pay |
How is pay for new employees determined at GW? |
Healthcare Benefits
GW offers a comprehensive benefit package that includes medical, dental, vision, life & disability insurance, time off & leave, retirement savings, tuition, well-being and various voluntary benefits. For program details and eligibility, please visit https://hr.gwu.edu/benefits-programs.
II. JOB DETAILS
Campus Location: |
Ashburn, Virginia |
College/School/Department: |
GW IT |
Family |
Information Technology |
Sub-Family |
High Performance Computing |
Stream |
Individual Contributor |
Level |
Level 3 |
Full-Time/Part-Time: |
Full-Time |
Hours Per Week: |
40 + |
Work Schedule: |
Monday - Friday, 8 am - 5 pm |
Will this job require the employee to work on site? |
Yes |
Employee Onsite Status |
Hybrid |
Telework: |
Yes |
Required Background Check: |
Criminal History Screening, Education/Degree/Certifications Verification, Social Security Number Trace, and Sex Offender Registry Search |
Special Instructions to Applicants: |
Employer will not sponsor for employment Visa status |
Internal Applicants Only? |
No |
Posting Number: |
S013625 |
Job Open Date: |
02/24/2025 |
Job Close Date: |
|
If temporary, grant funded, Sponsored Project funded or limited term appointment, position funded until: |
|
Background Screening |
Successful Completion of a Background Screening will be required as a condition of hire. |
EEO Statement: |
The university is an Equal Employment Opportunity/Affirmative Action employer that does not unlawfully discriminate in any of its programs or activities on the basis of race, color, religion, sex, national origin, age, disability, veteran status, sexual orientation, gender identity or expression, or on any other basis prohibited by applicable law. |
|