Platform & HPC Data Engineer
MQ Prime is a Virginia-based small business specializing in Cyber Engineering and Software Development. Built on decades of expertise, we support Commercial and Government clients, providing the design, development, and implementation of cutting-edge solutions. Our personnel support cyber solutions across the Government and continue to develop capabilities that fill operational gaps.
MQ Prime offers a salary and benefits package that surpasses industry standards, along with a varied and expanding portfolio of programs at multiple classification levels to support employee growth. We want you to grow as we do. Come join us!
We are looking for a skilled Platform and HPC Data Engineer to support the design, implementation, and optimization of data management solutions in high-performance computing (HPC) environments. The ideal candidate will have extensive experience with a variety of file systems, data labeling/tagging systems, and the configuration of a wide range of storage appliances. This role involves ensuring that data workflows, storage configurations, and metadata management are efficient, scalable, and aligned with organizational and government security requirements.
Principal Responsibilities:
- Platform and HPC Data Engineering: Design and implement data management systems and architectures for HPC platforms, focusing on optimizing data flow, storage, and access in large-scale computing environments
- File System Management: Oversee the configuration, maintenance, and optimization of distributed file systems (e.g., Lustre, IBM Spectrum Scale (GPFS), NFS) and storage solutions used in HPC environments to ensure efficient performance, scalability, and reliability
- Data Labeling and Tagging: Implement and manage metadata-driven systems for data labeling/tagging, including developing strategies for classifying, indexing, and organizing datasets to enhance data discoverability, access control, and auditing
- Storage Appliance Configuration: Configure and maintain various storage appliances (e.g., NetApp, Dell EMC, HPE) and integrated storage solutions. Ensure that storage devices are optimized for performance, capacity, and availability within the HPC ecosystem
- Data Integration and Workflow Optimization: Integrate data storage and management systems with HPC clusters, ensuring seamless data flow between compute nodes and storage appliances. Optimize data pipelines to support high-throughput workloads and minimize I/O bottlenecks
- Performance Tuning: Monitor and improve the performance of storage systems, focusing on I/O throughput, latency, and efficient resource allocation. Use performance metrics to guide optimizations across storage appliances and file systems
- Security and Compliance: Implement security best practices for data access, protection, and management, ensuring compliance with government regulations and internal data governance policies. Configure encryption, access controls, and secure data sharing methods
- Automation and Scripting: Develop and maintain automation scripts (e.g., in Python, Bash, or Perl) to streamline storage configuration, data labeling/tagging, and system monitoring tasks. Automate processes related to data integration and HPC platform management
- Collaboration and Support: Work closely with data scientists, HPC administrators, software developers, and other technical staff to support ongoing projects. Provide expertise in troubleshooting data storage issues and ensuring optimal system performance
- Documentation and Reporting: Maintain thorough documentation of storage configurations, file system setups, data labeling/tagging procedures, and performance optimization strategies. Provide regular reports on system health, data management processes, and improvements made
Minimum Requirements:
- Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field
- 7+ years of experience managing data infrastructure in HPC environments, with expertise in file systems, storage appliances, and data workflows
- Hands-on experience with distributed file systems, including Lustre, IBM Spectrum Scale (GPFS), NFS, and others commonly used in HPC settings
- Proven experience with storage appliance configuration (e.g., NetApp, Dell EMC, HPE, or similar systems), including performance tuning, capacity management, and reliability
- Strong experience implementing data labeling/tagging systems, metadata management, and structuring large datasets for efficient access and compliance
- Knowledge of high-performance networking technologies (e.g., InfiniBand, RDMA) and their role in data transfer and storage optimization
- Familiarity with large-scale data transfer tools and protocols such as GridFTP, rsync, and NFS
- Willingness to work on-site in Herndon, VA
- TS/SCI clearance with CI Polygraph
Preferred Qualifications:
- Master’s degree in Computer Science, Information Technology, Engineering, or a related field
- Experience with cloud storage integration or hybrid cloud environments, with knowledge of cloud-native storage solutions (e.g., AWS S3, Ceph, OpenShift)
- Familiarity with HPC schedulers (e.g., SLURM, PBS, Torque) and their interaction with data storage systems
- Understanding of data protection mechanisms, including data replication, backup strategies, and disaster recovery in HPC environments
- Experience with containerization (Docker, Singularity) in an HPC context for data processing and application deployment
- Experience with machine learning or data science workflows in HPC environments