Site Reliability Engineer job at Intrado Life & Safety, Inc. Ontario, CA

Job Description

About Us:

Intrado is dedicated to saving lives and protecting communities, helping them prepare for, respond to, and recover from critical events.

Today, our cutting-edge SaaS company is at the forefront of transforming the 911 emergency response continuum with next generation data-driven software. Intrado’s solutions allow enterprises, call takers, dispatchers, and first responders to make more informed decisions, respond quickly and safely, and ultimately serve their communities better.

Responsibilities/Qualifications:

In this Site Reliability Engineering (SRE) role, you’ll partner closely with development and business teams to create effective monitoring, alerting, and observability solutions that improve system performance and visibility. You’ll support production systems, troubleshoot complex issues, and help drive long-term stability through proactive incident management and automation. You'll get to design secure, cost-effective, and reliable cloud infrastructure.

Reliability Engineering & System Operations

Design, implement, and maintain scalable, reliable production systems.
Troubleshoot and resolve complex application and system issues.
Collaborate with development teams to build features with reliability, observability, and performance in mind.
Apply Site Reliability Engineering (SRE) best practices including Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs).

Monitoring & Observability

Develop and maintain monitoring, alerting, synthetic testing, and dashboards to ensure visibility into system health.
Configure agents for metrics/log collection and manage incident notification channels.
Analyze trends and recurring issues to drive proactive improvements.

Cloud Infrastructure Management

Manage and optimize AWS/Azure environments in staging and production.
Collaborate with architecture, development, and finance teams to design secure, cost-effective, and reliable cloud infrastructure.

Incident & Problem Management

Participate in 24/7 on-call rotations, quickly respond to production incidents, and identify root causes.
Lead post-mortems and implement long-term fixes.
Escalate and communicate issues as appropriate.

Automation & Tooling

Automate repetitive operational tasks and improve system efficiency.
Build and maintain deployment and configuration tools.
Work with CI/CD tools such as GitHub Actions.

Collaboration & Customer Focus

Partner with product and development teams to prioritize and resolve production-impacting issues.
Support internal teams with tools and insights for efficient self-service.
Ensure timely resolution of tickets and clear communication with stakeholders.

Architecture & Documentation

Review technical documentation (HLDs/FRDs) to identify potential issues early.
Maintain knowledge of product platforms and usage patterns.

What You Bring:

Education: Bachelor’s in Computer Science, MIS, or related field (or equivalent experience).
Experience: 2+ years in application support; experience in development, databases, or systems administration preferred.
Cloud: Hands-on expertise in AWS and/or Azure (GCP a plus).
Languages: Skilled in one or more languages (Python, Go, Java, Ruby, JavaScript); scripting with Bash or Python.
Monitoring Tools: Experience with tools like Datadog, Splunk, and New Relic, including dashboard creation and performance monitoring.
Systems & Networking: Strong Linux/Unix skills; SQL, VPN, TCP/IP, FTP/SMTP troubleshooting.
Containers & IaC: Production-level experience with Kubernetes and Terraform.
SRE Practices: Knowledge of SLIs/SLOs/SLAs, CI/CD, and automation strategies.
Soft Skills: Excellent problem-solving, communication, and collaboration.
Mindset: Continuous improvement focus with a proactive approach to reliability.

Total Rewards:

Want to love where you work? At Intrado, we offer a comprehensive benefits package that includes what you'd expect (medical, dental, vision, life and disability coverage, paid time off, a Registered Retirement Savings Plan (RRSP) with employer matching contributions, and flexible spending accounts), and several benefits that go above and beyond: tuition reimbursement, paid parental leave, access to a comprehensive library of personal and professional training resources, employee discounts, insurance coverage, and more! Apply today to join us in work worth doing!

Job Tags

Full time, Internship, Long term contract, Flexible hours
