Lead Site Reliability Engineer
SLO Adoption Program — Designed and now lead a portfolio-wide SLO program connecting
reliability metrics to client experience outcomes. Spans discovery sessions with product and
engineering leads, executive alignment, and a governance model defining
shared ownership between engineering and product. Produced executive communications that the
director adopted for VP and C-level audiences.
Margarita Hour — Identified and resolved a critical silent production failure:
users experienced daily session interruptions with no alert firing and no incident declared.
Traced the root cause to token expiry with no graceful retry, fixed it, and used the
finding to build the organizational case for proactive reliability monitoring.
Agentic Workflow Platform (FlowFoundry) — Built an agentic workflow platform on Azure Functions and
Azure OpenAI during innovation week to establish an organizational SME position in agentic AI.
The platform informed the company's AI architecture direction. Led the adoption evaluation of
GitHub Agentic Workflows and Datadog MCP Server as organizational standards. Designed a hybrid
deterministic/agentic execution model and JSON-driven workflow definitions.
Portfolio Modernization Program — Independently identified the need for, and now lead, a
portfolio-wide modernization beginning with Azure Function Apps. Assess services for fragility,
observability gaps, and standards drift. Decompose findings into epics and user stories
sequenced to support SLO program observability requirements.
Engineering Standards — Established DB resiliency standards (Polly, command timeouts,
connection pool sizing) enforced by CI fitness functions. Led the decommissioning of
Application Insights and the migration to Datadog. Implemented secret scanning standards
across Azure DevOps (ADO) and GitHub.