Engineering Manager, SRE

Greater LA Area

ringDNA is seeking an Engineering Manager to help build the Site Reliability Engineering (SRE) team.

The SRE team works around the clock to keep ringDNA and our customers protected. As manager of the SRE team, you are responsible for detecting and resolving production incidents within minutes. You will meet this objective by monitoring the company’s core services, reacting to problems, and proactively addressing issues before they impact performance, security, or availability. You must be a hands-on leader who can build out the company’s SRE capabilities and team as needed.

What You'll Do

You will use your balance of technical expertise, leadership skills, and managerial experience to build SRE core capabilities for ringDNA and eventually supervise the day-to-day responsibilities of front-line Site Reliability Engineers.

You will set technical direction on incident bridges and marshal resources accordingly. You will ensure that investigations are following appropriate troubleshooting paths, and that monitoring, triage and change execution processes are optimal.

You will drive continuous improvement while streamlining how we run our operations. This will involve building and maintaining strong relationships with connected areas of the business and ensuring the SRE team is a key stakeholder in any process or procedural enhancement.

The leader in this role must demonstrate a strong focus on engineering and infrastructure operations practices and service ownership, along with agile leadership and people management skills.

Your day-to-day responsibilities include:

  • Keep all user-facing services and ringDNA production systems running smoothly 24/7/365
  • Act in key support roles during major incidents
  • Lead RCAs and partner with the Engineering and Product Management teams to permanently fix issues
  • Drive the team to be proactive in diagnostics, detection and configuration of applications
  • Build competencies in the SRE team to respond to incidents in a timely manner and identify root causes
  • Work successfully across teams (Engineering, Product Management, QA) by fostering positive, influential relationships
  • Automate manual and repetitive processes to support SRE objectives
  • Help to scale infrastructure from a technical and financial planning perspective
  • Continue to mature the company’s disaster recovery strategy
  • Fully leverage our existing logging and monitoring services and propose new ones as needed
  • Lead the evolution of our incident and change management processes

Who You Are 

  • 7+ years of Infrastructure Engineering or Operations experience
  • 3+ years managing Site Reliability, NOC, or mixed operations teams, preferably in globally distributed environments
  • Expertise in AWS and related services
  • Experience on a 24/7/365 operations team, managing data centers and infrastructure
  • Passion for teamwork and collaboration, adaptability, communication, problem solving, customer focus, results, and innovation
  • Strong understanding of enterprise monitoring systems and their administration, such as New Relic and Sumo Logic
  • Track record of team building, including employee development with experience successfully coaching individuals to achieve goals
  • Experience with Salesforce
  • Background in Incident Management and strong understanding of ITIL service operations and SCRUM methodologies
  • Experience designing, developing, debugging, and operating resilient distributed systems

Technology we use

  • Engineering
  • Product
    • Java (Languages)
    • JavaScript (Languages)
    • Python (Languages)
    • SQL (Languages)
    • jQuery (Libraries)
    • React (Libraries)
    • Backbone.js (Frameworks)
    • Play (Frameworks)
    • MySQL (Databases)
    • Redis (Databases)
    • Google Analytics (Analytics)
    • GainsightPX (Analytics)
    • Amplify (Analytics)
    • Illustrator (Design)
    • InVision (Design)
    • Photoshop (Design)
    • Sketch (Design)
    • Figma (Design)
    • Confluence (Management)
    • JIRA (Management)
    • Zeplin (Management)

Location

Located in the heart of Sherman Oaks, ringDNA’s modern office provides easy access to great food and fun activities right outside our front door.

An Insider's view of ringDNA

How would you describe the company’s work-life balance?

One of the most enjoyable things about ringDNA is the work/life balance which is supported by my manager and leadership. In my time here, I’ve found no pressure to work long hours/weekends but found genuine empathy and care among colleagues and management. This has been crucial in my transition to a new industry within a fast-growing startup.

Chelsea Wojes

Customer Success Manager

What are ringDNA Perks + Benefits

Culture

  • Friends outside of work
  • Eat lunch together
  • Daily stand up
  • Open door policy
  • Open office floor plan

Health Insurance & Wellness Benefits

  • Flexible Spending Account (FSA)
  • Disability Insurance
  • Dental Benefits
  • Vision Benefits
  • Health Insurance Benefits
  • Life Insurance

Retirement & Stock Options Benefits

  • 401(K)
  • Company Equity
  • Performance Bonus

Child Care & Parental Leave Benefits

  • Flexible Work Schedule

Vacation & Time Off Benefits

  • Generous PTO
  • Paid Holidays
  • Paid Sick Days

Perks & Discounts

  • Casual Dress
  • Company Outings
  • Game Room
  • Stocked Kitchen
  • Some Meals Provided
  • Happy Hours
  • Parking: We offer employees free on-site garage parking.
  • Relocation Assistance

Professional Development Benefits

  • Lunch and learns: Acme Co. hosts lunch and learn meetings once per quarter.
  • Promote from within