Working at ICF means applying a passion for meaningful work with intellectual rigor to help solve the leading issues of our day. Smart, compassionate, innovative, committed, ICF employees tackle unprecedented challenges to benefit people, businesses, and governments around the globe. We believe in collaboration, mutual respect, open communication, and opportunity for growth. If you’re seeking to make a difference in the world, visit www.icf.com/careers to find your next career. ICF—together for tomorrow.
The Site Reliability Engineer supports and is directly accountable for managing specific digital solutions for our clients. For these solutions, this person is responsible for health, uptime, and the satisfaction of key client stakeholders in order to deliver exceptional results within the organization.
Who you are:
A Problem Solver. You like solving puzzles, whether they are technology-related, process-related, or sometimes even people-related.
Action-Oriented. You get stuff done. You don’t wait around. You don’t rest till things are resolved.
Continuous Improvement Obsessed. You know when to operate and when to automate. You believe that it can always be made better, faster, and stronger!
A Self-Starter. Quick to learn and adapt, you are intrinsically motivated.
What you get to do:
Incident & Problem Management: Troubleshoot and investigate incidents. Assist with root cause analysis and remediation by reviewing logs and code logic.
Develop & Modify: Assist in Infrastructure as Code (IaC) development efforts. Create and modify system management tools for monitoring/alerting and manage environments across cloud providers (AWS, Azure, Rackspace) and on-premises.
Investigate: Act as a point of escalation, lead investigations, assess true impact, and conclude root cause analysis and remediation.
Application Deployment: Deploy applications and server updates to various environments and troubleshoot issues using application and server logs. Prepare documentation for deployment purposes and respond to any implementation issues as needed.
Solve Problems: Perform and review health checks, identify and escalate trends. Act on all application level alerts and assist with service restoration.
Manage Ecosystems and SLAs: Understand dependencies within a client ecosystem and service level expectations.
Monitor: Monitor & ensure action is taken on all application level alerts. Manage multiple cases involving a variety of technologies, protocols, and equipment.
System Management: Manage performance, capacity, licensing, and patching, and work to maintain these within acceptable benchmarks for specific clients/assets for applications installed within a client ecosystem.
Automation: Automate daily operations and stay current with technology advancements.
Collaborate: Work closely with software development and project management teams to support application deployments, change requests, and post-launch support activities.
What you’ll need to be successful:
2+ years of experience in the Information Technology sector
Experience being part of a technical application support team
1+ years using Linux; certification (LPIC-2, LFCE, RHCE) preferred
Experience in coding or scripting with Java, .NET, Ruby, or Python is required
Experience with Continuous Integration/Deployment tools such as Jenkins, Travis, Bamboo preferred
Chef, Salt, or Ansible experience required; Chef preferred
Understanding of and hands-on experience with major cloud platforms such as AWS and/or Azure is required
AWS Certifications (SysOps, Solutions Architect, or Developer) or Microsoft Certifications (70-532, -533, -534) required
Ability to identify, analyze, and drive problems to resolution, and to handle multiple complex issues simultaneously while communicating effectively across teams and with external clients
Demonstrated ability to understand and deliver detailed technical documentation
Previous experience with Adobe Experience Manager (AEM), Sitecore, or Hybris considered an asset