Apply by: 02.10.2023
The mission of the NOC at Getty Images is to provide 24x7 support for our production systems, minimizing customer impact by identifying, resolving, and (when necessary) escalating issues as quickly as possible. This support is not limited to technical restoration; we also advise, coordinate resources, and provide leadership through strict adherence to our Incident Management process.
When not actively responding to alerts or running incidents, we work on projects that improve the team's responsiveness and sharpen our technical skills.
Who You Are:
You enjoy working with a diverse set of technologies and take pride in attention to detail. You are looking to develop new skills and strengthen existing ones as you prepare for the next steps in your career.
Your Next Challenge:
- Work with a wide range of tools while taking initiative on an incident bridge.
- Respond quickly to new alerts, triaging, resolving, or escalating them as needed.
- Interact and build relationships with teams across Getty Images, for example:
  - Working with the Sales Ops team to dig into a website issue.
  - Advising the App Dev team while performing a code rollback.
  - Notifying the DBAs that you identified and resolved SQL blocking on a critical system.
  - Reporting to Security that you disabled an account after investigating suspicious activity.
  - Coordinating with Facilities, Network, Storage, and local Service Desk teams to shut down and power up remote office equipment during planned or unexpected power outages.
  - Assisting with core system patching.
- Contribute to Getty's success by resolving issues, which shortens the duration of customer-impacting incidents or prevents impact entirely. You will also play a pivotal role in incident triage, helping our engineers focus on project work during the day and get uninterrupted sleep at night by escalating only the issues that need immediate attention.
- Thoroughly learn how our systems are interconnected, allowing you to quickly craft a notification that details an incident's full scope and impact.
What You’ll Need:
- Prior Operations Support experience, preferably in an e-commerce environment, and a demonstrated understanding of cloud services and tools, specifically AWS.
- Strong written communication skills with the ability to quickly craft incident notifications for technical and non-technical audiences.
- Ability to maintain detailed ticket work notes and contribute to the team knowledge base with clear, concise troubleshooting articles.
- Experience handling stressful, time-sensitive situations, with the ability to look at an alarm console full of critical alerts and calmly triage them.
- Ability to resolve alerts using documented procedures when available and employ sharp problem-solving skills to troubleshoot and resolve undocumented alerts, only escalating to the next tier after exhausting all other options.
- Ability to lead high-priority incident bridges. Previous experience with major incidents (as a participant or an observer) is a plus.
Nice to Have:
- AWS Certification (Cloud Practitioner or greater).
- High-level knowledge of incident management best practices.
- A clear understanding of databases, specifically Microsoft SQL Server, and how to provide first-level support for them.
- Knowledge of IT security and its best practices, such as those covered by CompTIA Security+.
- Experience using Splunk to write complex searches and create detailed dashboards.