Wells Fargo is seeking a Lead Software Engineer.
**In this role, you will:**
- Lead complex technology initiatives including those that are companywide with broad impact
- Act as a key participant in developing standards and companywide best practices for engineering complex, large-scale technology solutions across technology engineering disciplines
- Design, code, test, debug, and document for projects and programs
- Review and analyze complex, large-scale technology solutions for tactical and strategic business objectives, enterprise technological environment, and technical challenges that require in-depth evaluation of multiple factors, including intangibles or unprecedented technical factors
- Make decisions in developing standards and companywide best practices for engineering and technology solutions that require an understanding of industry best practices and new technologies, influencing and leading the technology team to meet deliverables and drive new initiatives
- Collaborate and consult with key technical experts, senior technology team, and external industry groups to resolve complex technical issues and achieve goals
- Lead projects, teams, or serve as a peer mentor
**Required Qualifications:**
- 5+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education

**Desired Qualifications:**
- Experience in **Software Engineering, SRE, DevOps, or Platform Engineering**.
- Strong proficiency in **Python for automation and tooling**.
- Hands‑on experience with **Grafana, Prometheus, and Splunk** in production environments.
- Solid understanding of **SLIs, SLOs, dashboards, alerting, and observability best practices**.
- Experience applying **AI/ML concepts** to monitoring, alerting, or operational analytics.
- Strong knowledge of **Linux, networking, and distributed systems**.
- Experience with **Cloud platforms and Kubernetes/OpenShift**.
- Proven experience leading incidents, RCAs, and reliability initiatives.
- Experience building **custom Prometheus exporters** or advanced Grafana dashboards.
- Strong Splunk expertise (search, dashboards, alerts, log pipelines).
- Experience operationalizing ML models for observability (AIOps).
- Familiarity with **CI/CD, Terraform, Ansible**, and enterprise automation platforms.
**Reliability & Availability Engineering**
- Own and improve **availability, performance, scalability, and resilience** of production systems.
- Define, monitor, and manage **SLIs/SLOs and error budgets** to guide reliability investments.
- Lead capacity planning, performance testing, failover readiness, and disaster‑recovery design.
**Observability & Monitoring (Grafana / Prometheus / Splunk)**
- Design and operate a **comprehensive observability stack** using:
  + **Prometheus** for metrics collection and alerting
  + **Grafana** for dashboards, visualization, and SLO tracking
  + **Splunk** for log aggregation, troubleshooting, and incident forensics
- Build and maintain **golden dashboards** and actionable alerts aligned to business impact.
- Reduce alert fatigue through signal‑based monitoring and correlation of metrics, logs, and traces.
- Partner with application teams to define **instrumentation standards** for metrics and logging.
- Use observability data to improve **MTTD, MTTR, and reliability outcomes**.
**Automation & Python Engineering**
- Develop **Python‑based automation** for monitoring, alert remediation, deployments, scaling, and recovery.
- Build self‑healing workflows integrated with Prometheus alerts and Splunk signals.
- Create reusable automation frameworks and internal SRE tooling.
- Embed automation into CI/CD pipelines to improve deployment safety and reliability.
**AI/ML‑Driven Reliability (AIOps)**
- Apply **AI/ML techniques** to observability and operations use cases, including:
  + Anomaly detection on Prometheus metrics
  + Log pattern analysis and correlation in Splunk
  + Predictive capacity and trend forecasting
  + Noise reduction and intelligent alerting
- Partner with data and platform teams to operationalize ML models in production.
- Evaluate and integrate AIOps capabilities into the observability ecosystem.
**Incident Management & RCA**
- Serve as **incident commander and senior escalation point** for P1/P2 incidents.
- Lead **blameless post‑incident reviews (PIRs)** backed by Grafana metrics and Splunk evidence.
- Drive corrective and preventive actions to completion.
**Platform & Application Partnership**
- Collaborate with platform, application, cloud, and SRE teams to embed reliability and observability by design.
- Influence architectural decisions to ensure systems are **observable, scalable, and operable**.
- Provide SRE guidance during major releases, migrations, and modernization initiatives.
**Security, Risk & Compliance**
- Ensure observability and automation comply with enterprise security and audit requirements.
- Support resilience validation, failover drills, and business continuity testing.
- Mentor and guide SRE and software engineers.
- Define standards for observability, automation, reliability, and incident response.
- Act as the technical authority for complex production and platform issues.
16 Jun 2026

***Job posting may come down early due to volume of applicants.***

**We Value Equal Opportunity**
Wells Fargo is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other legally protected characteristic.
Employees support our focus on building strong customer relationships balanced with a strong risk mitigating and compliance-driven culture which firmly establishes those disciplines as critical to the success of our customers and company. They are accountable for execution of all applicable risk programs (Credit, Market, Financial Crimes, Operational, Regulatory Compliance), which includes effectively following and adhering to applicable Wells Fargo policies and procedures, appropriately fulfilling risk and compliance obligations, timely and effective escalation and remediation of issues, and making sound risk decisions. There is emphasis on proactive monitoring, governance, risk identification and escalation, as well as making sound risk decisions commensurate with the business unit’s risk appetite and all risk and compliance program requirements.
Candidates applying to job openings posted in Canada: Applications for employment are encouraged from all qualified candidates, including women, persons with disabilities, aboriginal peoples and visible minorities. Accommodation for applicants with disabilities is available upon request in connection with the recruitment process.
**Applicants with Disabilities**
To request a medical accommodation during the application or interview process, visit Disability Inclusion at Wells Fargo.
**Drug and Alcohol Policy**
Wells Fargo maintains a drug free workplace. Please see our Drug and Alcohol Policy to learn more.
**Wells Fargo Recruitment and Hiring Requirements:**
a. Third-Party recordings are prohibited unless authorized by Wells Fargo.
b. Wells Fargo requires you to directly represent your own experiences during the recruiting and hiring process.