Site Reliability Engineer
MQ Prime is a Virginia-based small business specializing in cyber engineering and software development. Drawing on decades of experience, we support commercial and government clients with the design, development, and implementation of cutting-edge solutions. Our personnel support cyber solutions across the Government and continue to develop capabilities to fill operational gaps.
MQ Prime offers a salary and benefits package that surpasses industry standards while also providing a varied and expanding portfolio of programs at multiple classification levels to enable employee growth. We want you to grow as we do. Come join us!
We are seeking a Site Reliability Engineer (SRE). In our OpenShift PaaS organization, you will be responsible for ensuring the availability, performance, and scalability of our OpenShift environments. You will collaborate with development, operations, and product teams to automate processes, build robust monitoring systems, and enhance the overall reliability of our platforms.
Principal Responsibilities:
- Design, implement, and maintain highly available OpenShift clusters to support mission-critical applications
- Develop and maintain automation scripts and tools to streamline deployment, scaling, and recovery processes using tools like Ansible, Terraform, and Helm
- Build and enhance monitoring and alerting systems (e.g., Prometheus, Grafana, ELK)
- Respond to and resolve incidents, conducting post-mortem analyses to identify root causes
- Analyze and optimize system performance, ensuring minimal latency and maximum throughput
- Work closely with development teams to implement DevOps best practices, CI/CD pipelines, and platform enhancements
- Ensure platforms meet security and compliance requirements by integrating tools for vulnerability scanning, policy enforcement, and logging
Minimum Requirements:
- Bachelor’s degree in Computer Science or Engineering
- 5+ years of experience as an SRE, DevOps Engineer, or in a related role
- Expertise in OpenShift or Kubernetes platform administration
- Strong knowledge of Linux systems, networking, and containerization technologies (Docker)
- Proficiency in scripting languages such as Python, Bash, or Go
- Experience with CI/CD pipelines (e.g., Jenkins, GitLab CI/CD)
- Familiarity with monitoring and logging tools like Prometheus, Grafana, ELK, or Splunk
- Willing to work on-site in Herndon, VA
- TS/SCI clearance with CI Poly
Preferred Qualifications:
- OpenShift certification (e.g., Red Hat Certified Specialist in OpenShift Administration)
- Experience with cloud platforms (AWS, Azure, or GCP)
- Knowledge of service mesh technologies (Istio, Linkerd)
- Strong understanding of microservices and distributed systems architecture