Self-Healing with ITSI - Ashish Yadav Senior Software Developer | TIAA - Splunk Conf
© 2020 SPLUNK INC.
Forward-Looking Statements
During the course of this presentation, we may make forward-looking statements regarding future events or plans of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results may differ materially. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, it may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements made herein. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionalities described or to include any such feature or functionality in a future release. Splunk, Splunk>, Data-to-Everything, D2E and Turn Data Into Doing are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names or trademarks belong to their respective owners. © 2020 Splunk Inc. All rights reserved.
Who's this dude? Ashish
• Senior Splunk Developer @ TIAA
• 9+ years of experience with Splunk
• Expertise in Splunk Enterprise Security & IT Service Intelligence
• Internet of Things (IoT) and Splunk Augmented Reality
• Avid blogger & automation enthusiast
• Author of a book on Splunk: Advanced Splunk
Where do I Work?
Part of the IT Operation Stability & Innovation Team @ TIAA
What does my team do?
• Managing & maintaining the Splunk infrastructure: 10 TB/day, multi-site, multi-clustering environment
• Event routing & event aggregation
• Infrastructure, incident and application management & monitoring
• Event-driven action
• 6000+ incidents on an average monthly basis
Problems & Challenges
• Known Issues
• Maintenance Mode
• Monitoring
• Siloed Teams
• Alert Storms & False Positives
• Automation Loop
• Cybersecurity Challenges
• Manual Remediation
• Unified Automation Orchestrator
Team Brainstorming: What did we do?
• Incident Data Mining: the last one year of data
• Identification of Noise Makers: the top talkers identified were Windows Service, Disk Space Full, Coherence Autoload and VM-F5 Server Certificate Issue
• Building Use-Cases
• Building Classification & Framework: creating the Self-Healing Framework
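The noise-maker identification step amounts to counting incidents per category over the mining window and ranking them. A minimal Python sketch, assuming each incident record has already been classified into a category; the `Incident` type, its fields and `top_talkers` are hypothetical, not TIAA's schema:

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


@dataclass
class Incident:
    opened: datetime  # when the incident was opened
    category: str     # hypothetical: category assigned during classification


def top_talkers(
    incidents: Iterable[Incident], now: datetime, days: int = 365, n: int = 3
) -> List[Tuple[str, int]]:
    """Return the n most frequent categories among incidents opened in the last `days` days."""
    cutoff = now - timedelta(days=days)
    counts = Counter(i.category for i in incidents if i.opened >= cutoff)
    return counts.most_common(n)
```

The categories at the top of this ranking are the candidates for self-healing use cases.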
Event-driven Approach
Monitor → Triggers → Rules → Actions → Remediation
• Monitor: monitor the log sources via Splunk
• Triggers: define a trigger when something fails
• Rules: fetch the rules
• Actions: act based on the rules
• Remediation: the problem is taken care of!
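The Monitor → Triggers → Rules → Actions chain can be sketched in Python as a small dispatcher. This is an illustration only: the `Event` fields, the `RULES` table and the `restart_service` placeholder are hypothetical, and a real action would call an orchestrator rather than simply return `True`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Event:
    host: str
    check: str   # hypothetical: name of the monitored check, e.g. "windows_service"
    status: str  # "ok" or "failed"


def restart_service(event: Event) -> bool:
    # Placeholder action: a real one would call an orchestrator and report its result.
    return True


# Rules: which action remediates which failing check (hypothetical table).
RULES: Dict[str, Callable[[Event], bool]] = {"windows_service": restart_service}


def is_trigger(event: Event) -> bool:
    """Trigger: fire only when something fails."""
    return event.status == "failed"


def handle(event: Event) -> Optional[bool]:
    """Monitor -> Trigger -> Rules -> Action.

    Returns the action's result, or None when nothing was triggered or no rule matched.
    """
    if not is_trigger(event):
        return None
    action = RULES.get(event.check)
    if action is None:
        return None  # no rule for this check: leave it to a human
    return action(event)
```

Keeping the rules in a table separate from the trigger logic means new use cases are added as data rather than code, which is the same motivation as storing rules in a KV Store.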
How Did We Do It? The Self-Healing Framework
• Splunk ITSI: Self-Healing Correlation Search and Aggregation Policy
• Data Enrichment: Rule Based Engine (KV Store) and Maintenance Mode (KV Store)
• Self-Healing outcome: Successful → Notify; Failed → Create Incident
• Self-Healing Monitoring: Correlation Search
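Putting the framework pieces together, one alert's path (maintenance-mode check, rule lookup, remediation, then notify or create an incident) might look like the sketch below. It assumes the two KV Store collections can be read as simple lookups; `RULE_ENGINE`, `MAINTENANCE_HOSTS`, `self_heal` and the outcome strings are hypothetical stand-ins, and the behaviour when no rule matches is an assumption the slides do not cover.

```python
from typing import Callable, Dict, Set

# Stand-ins for the two KV Store collections (hypothetical contents).
RULE_ENGINE: Dict[str, str] = {"disk_space_full": "cleanup_temp_files"}
MAINTENANCE_HOSTS: Set[str] = {"db01"}


def self_heal(host: str, issue: str, run_action: Callable[[str, str], bool]) -> str:
    """Route one alert and return its outcome:
    "skipped_maintenance", "no_rule", "notified" or "incident_created"."""
    # Enrichment from Maintenance Mode: never touch hosts under maintenance.
    if host in MAINTENANCE_HOSTS:
        return "skipped_maintenance"
    # Enrichment from the Rule Based Engine: find the remediation for this issue.
    action = RULE_ENGINE.get(issue)
    if action is None:
        return "no_rule"  # assumption: unmatched issues follow the normal incident path
    # Run the remediation; success -> notify, failure -> create an incident.
    return "notified" if run_action(host, action) else "incident_created"
```

Passing `run_action` in as a function keeps the routing logic testable without a live orchestrator.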
What Changed? How did we address the problems?
Problem → Solution
• Alert Storms & False Positives → ITSI Event Aggregation
• Manual Remediation & High MTTR → Automatic Remediation & Low MTTR
• Cybersecurity Challenges → Unified Automation Orchestrator
• Siloed Teams → Not a Standard or a Solution, but a Framework
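To illustrate why event aggregation cuts alert storms, here is a simplified Python grouping that folds repeated alerts for the same host and check into one episode until a quiet gap passes. It is not how ITSI aggregation policies are implemented; the `Alert`/`Episode` types, `group_alerts` and the 600-second gap are assumptions for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Alert:
    time: float  # epoch seconds
    host: str
    check: str


@dataclass
class Episode:
    key: Tuple[str, str]
    alerts: List[Alert] = field(default_factory=list)


def group_alerts(alerts: List[Alert], gap_seconds: float = 600) -> List[Episode]:
    """Group alerts with the same (host, check) into one episode while they keep
    arriving within `gap_seconds` of the previous alert in that group."""
    open_episodes: Dict[Tuple[str, str], Episode] = {}
    episodes: List[Episode] = []
    for alert in sorted(alerts, key=lambda a: a.time):
        key = (alert.host, alert.check)
        current = open_episodes.get(key)
        # Start a new episode for a new key or after a long enough quiet period.
        if current is None or alert.time - current.alerts[-1].time > gap_seconds:
            current = Episode(key)
            open_episodes[key] = current
            episodes.append(current)
        current.alerts.append(alert)
    return episodes
```

Acting once per episode instead of once per alert is what turns a storm of duplicates into a single remediation.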
What Changed? How did we address the challenges?
Challenge → Solution
• What if the automation was unsuccessful? → End-to-End Integration Framework
• What if my framework failed? → End-to-End Monitoring of the Framework
• Automation should not go into an infinite loop → Orchestrators & ITSI Event Aggregation
• How to address maintenance mode? → Rule Based Engine (KV Store)
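One way to keep automation out of an infinite loop is to cap remediation attempts per host and issue within a sliding time window and escalate once the cap is hit. A minimal sketch with hypothetical names (`RemediationGuard`, `max_attempts`, `window_seconds`); the slides credit orchestrators and ITSI event aggregation, not this exact mechanism.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class RemediationGuard:
    """Allow at most `max_attempts` automated fixes per (host, issue) in a sliding
    time window; beyond that, the caller should escalate instead of retrying."""

    def __init__(self, max_attempts: int = 3, window_seconds: float = 3600):
        self.max_attempts = max_attempts
        self.window = window_seconds
        self._attempts: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, host: str, issue: str, now: float) -> bool:
        history = self._attempts[(host, issue)]
        # Drop attempts that have aged out of the window.
        while history and now - history[0] >= self.window:
            history.popleft()
        if len(history) >= self.max_attempts:
            return False  # cap reached: stop automating and escalate
        history.append(now)
        return True
```

When `allow` returns `False`, the caller would create an incident instead of retrying, which matches the "Failed → Create Incident" path of the framework.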
Results: We can do better!!!
• 80% Noise Reduction
• 800 Human Hours Saved Yearly
• 90% Increased Efficiency
• 99% Reliable and Scalable
The Journey Ahead . . . !!!
• Detect: Monitor, Trigger, Rules
• Act: Remediation, MS Teams, JIRA
• Predict: Pattern Analysis, Outlier Detection, Clustering
• Prevent: Generate Warnings, Impact Prediction
Please provide feedback via the SESSION SURVEY