Performance management at Danske Bank - For Internal Use 10.6.2014

Disclaimer
• The information given can be based on FACTS
• The information given may NOT apply to your company
• Don’t hesitate to ask questions ☺

Agenda
• The Situation
• The Mainframe Environment
• Evolution in online transactions
• Governance & Process & Tools & Competence
• Optimising to be attractive as a platform
• Next step in Performance Management - SPSS

About Danske Bank & the Mainframe environment
Danske Bank
• Medium-size bank with approx. 7 million customers
• Commercial, retail and investment bank, established 1871
• Branch offices in Northern Europe: DK, SE, No, Fi, UK, IR, Li, Lt, Es...
Mainframe Environment
• Evolution in transactions/MIPS from 2001-2013 (chart)
• 34000 MIPS
• 3 Parallel Sysplexes (G1-G3): G1 Production (G3 availability centre), G2 Development
• Cics, DB2 and MQ based online environment
• 100 million transactions per day
• 14k transaction codes
• 3000+ transactions per second (peak)
• 100k production batch jobs per day
• Concurrent batch/online
• 1500+ developers: 1000+ in DK, 500+ in IN
• Programming languages: PL/I (primary), Cobol, C, Asm, Java and EGL

The Situation
Competitive situation: Mainframe OR Windows
Mainframe capabilities:
• Run without problems
• Provide cost reductions
• Provide new technology
Strengths:
• Discipline
• Competence
• Resolve
• Cheapest platform
The mainframe is running 70-80% of the production

About the Mainframe environment - Design points
The original design criteria:
• One Bank
• One System
• One Infrastructure
Cloned setup:
• Cloned systems
• Cloned infrastructure
• Cloned applications
Workload distribution:
• Sysplex Distributor
• Via shared MQ
• Cisco load balancers (being replaced)
Cics regions:
• Approx. 150 Cics regions in production
• All Cics 5.1
Data:
• DB2 is the only data storage
- Enables batch and online concurrency (with the right policies)
- It comes with a cost
Re-use (saves development and infrastructure cost):
• The same components are used in online and batch
• Components are called via a home-grown SOA-like infrastructure (from 199n)
• High resilience risk impact when key components are changed
• Some components are used in more than 10,000 transactions
• Component hierarchy > 40 levels
• Regulatory requirements for many countries are embedded in the same programs
• Code and test complexity is impacted

Resilience, Availability & Performance
Resilience & Availability
• 2 sites, 5 km apart, with z196 machines
• GDPS1: 1 Production Sysplex (G1), systems M1nn
• GDPS3: 1 Production Availability Sysplex (G3), systems M3nn
• GDPS2: 1 Development Sysplex (G2), systems MVnn
• Misc. Sysplexes for insurance & sandbox
• When the Production Sysplex is taken "out", we switch to the Availability Sysplex
• The Availability Sysplex is kept current by Qrep (Q Replication)
Performance
• Danske Bank relies on response times < 0,25 seconds to be able to serve customers at peak
• Bottleneck: Cics storage (31-bit)
• Cics setup (not cloud - yet ☺)

About the Mainframe environment
Key issues with the Danske Bank setup
Failure in components:
• Failure in 1 key component can bring the online system down
• Failures are typically bad performance (CPU) or bad response time
The effect of bad performance:
• Virtual storage exhaustion in Cics
• Transactions are too slow
• CPU starvation
• The CPU cost of running the transactions increases with concurrency (up to 50%)
The protection mechanisms:
• Workload balancing in the network
• Overload control: inspection of transaction queues
• Workload balancing in shared MQ
• WLM health indicator
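The idea of combining overload control (inspecting transaction queues) with a health indicator can be illustrated with a small routing function. This is a minimal sketch under stated assumptions, not Danske Bank's implementation: the `Region` type, the `MAX_QUEUED` limit, the 0-100 health scale and the region names are all hypothetical. It shows the principle of checking queue depth before routing and shedding load when every region is saturated, since more concurrent work raises the CPU cost per transaction.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical limit: a region with this many queued transactions or more
# is treated as overloaded and gets no new work.
MAX_QUEUED = 50


@dataclass
class Region:
    name: str
    queued: int  # transactions currently waiting in the region's queue
    health: int  # hypothetical WLM-style health indicator, 0 (sick) to 100 (healthy)


def pick_region(regions: List[Region]) -> Optional[Region]:
    """Return the healthiest non-overloaded region, or None to shed load."""
    candidates = [r for r in regions if r.health > 0 and r.queued < MAX_QUEUED]
    if not candidates:
        # Every region is overloaded or unhealthy: reject rather than
        # pile more concurrent work onto Cics.
        return None
    # Prefer the highest health; break ties with the shortest queue.
    return max(candidates, key=lambda r: (r.health, -r.queued))


regions = [Region("CICSA", 10, 100), Region("CICSB", 60, 100), Region("CICSC", 5, 50)]
print(pick_region(regions).name)  # CICSA
```

Returning `None` instead of queuing more work is a deliberate choice in this sketch; the slide lists overload control next to workload balancing but does not say how rejected work is handled.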

Evolution in Danske Bank online transactions
Transaction types:
• 3270: green on black
• Simple Web: simple HTML
• Integrated Product: rich transaction spanning several apps
• Self Service: one "screen", one confirmation
Problem:
• We can't buy more virtual storage
• More Cics regions will not fix the problem
• Usage of 64-bit is the best option

Evolution in online transactions
Something would have to be done:
• Change to another platform, language or container
- Expensive
- Risk profile (it might take too long)
• Optimize the existing applications in place
- Lower cost
- Lower risk
(Chart: million transactions/day)
Cics is not the "end-point"

Danske Bank - Changing the organisation
Management & Leadership
Management also wanted savings ☺
• Incentives
• Reporting
• IT cost dashboard
• IT cost targets
Prevention
• Education
• Self Control
• Tooling
Control
• Screening
• Tooling
• Management
Healing
• Resolve
• Competence
• Management
1500 programmers; 5-10 system developers ("A few good men")

Danske Bank - Changes to the organisation
(Org chart) MIPS contract with the CIO ($$)
• CIO, Directors
• Development manager, Project manager, Solution architect, Technical architect, System manager
• Local DBA, Local developer
• Performance Center of Excellence
• Development Model: Process & Performance Guidelines

Danske Bank - Changes to the development process
CIO organisational checkpoint events: Solution Review → Performance & Resilience Analysis → Construction consultation → Performance Evaluation → Performance DBA Review
Performance Evaluation - disadvantages:
• 3 man-weeks to 3 man-months of work for the Performance Center of Excellence
• Bottleneck for the project

Danske Bank - Changes to the development process
What if the local DBA & Performance Expert could do the Performance Evaluation?
• Development Model: Process & Guidelines
• Threshold dependent application development
• Local DBA
• Local Performance expert
• Tooling (real-time SMF data with a ruler)

Threshold dependent application development
The concept
• By categorising the cost- and robustness-driving factors, we are able to determine:
- Relevance of optimization requirements
- Operational cost
(Diagram: usage intensity → categorisation table → performance requirements)
• Performance requirements have been balanced according to:
- Effectiveness
- Cost
(Diagram: effort vs. effect)

Threshold dependent application development
The concept - performance requirements (Cics)
Categorisation table (transactions per hour):
• Very high volume (TGV): > 230400/hour
• High volume (Intercity): > 36000/hour
• Very frequent (Regional): > 7200/hour
• Frequent (Metro): > 1800/hour
• Rare (Locomotive): > 36/hour
• Seldom used
Performance requirements per category:
Cics key value                Very high   High        Very freq.  Frequent    Rare        Seldom used
Response time                 0,1s        0,2s        0,5s        1s          5s          10s
CPU time                      0,05s       0,1s        0,2s        0,5s        2,5s        5s
#Change mode                  5           20          50          500         1000        5000
Storage consumption (EUDSA)   2 Mb        2 Mb        2 Mb        4 Mb        6 Mb        10 Mb
#LINK                         5           40          100         1.000       5.000       10.000
#Dispatch                     10          30          100         500         1.000       2.000
#DB2 calls                    50          100         300         1.000       5.000       10.000
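The categorisation and threshold check can be sketched in Python as follows. It is a minimal illustration, not Danske Bank's tooling: the function names, the metric keys and the rule that a measured value above the limit counts as a violation are assumptions; the volume bounds and limits are copied from the categorisation and requirements table above.

```python
# Lower bounds (transactions/hour) from the categorisation table; anything
# at or below the last bound falls into "Seldom used".
CATEGORIES = [
    ("Very high volume", 230400),  # TGV
    ("High volume", 36000),        # Intercity
    ("Very frequent", 7200),       # Regional
    ("Frequent", 1800),            # Metro
    ("Rare", 36),                  # Locomotive
]

# Hypothetical metric keys, in the row order of the requirements table.
METRICS = ("response_s", "cpu_s", "change_mode", "eudsa_mb", "link", "dispatch", "db2_calls")

# One tuple per category, in METRICS order.
LIMITS = {
    "Very high volume": (0.1, 0.05, 5, 2, 5, 10, 50),
    "High volume":      (0.2, 0.1, 20, 2, 40, 30, 100),
    "Very frequent":    (0.5, 0.2, 50, 2, 100, 100, 300),
    "Frequent":         (1.0, 0.5, 500, 4, 1000, 500, 1000),
    "Rare":             (5.0, 2.5, 1000, 6, 5000, 1000, 5000),
    "Seldom used":      (10.0, 5.0, 5000, 10, 10000, 2000, 10000),
}


def categorise(tx_per_hour: int) -> str:
    """Map an hourly transaction volume to its usage category."""
    for name, lower_bound in CATEGORIES:
        if tx_per_hour > lower_bound:
            return name
    return "Seldom used"


def violations(tx_per_hour: int, measured: dict) -> list:
    """List the metrics whose measured value exceeds the category limit."""
    limits = LIMITS[categorise(tx_per_hour)]
    return [m for m, limit in zip(METRICS, limits) if measured.get(m, 0) > limit]


print(categorise(5000))  # Frequent
print(violations(5000, {"response_s": 0.8, "cpu_s": 0.6, "db2_calls": 200}))  # ['cpu_s']
```

The point of the table shows up directly in the code: the same 0,6s of CPU that violates the "Frequent" limit would pass in the "Rare" or "Seldom used" categories.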

Danske Bank - Changes to the process
CIO organisational checkpoint events: Solution Review → Performance & Resilience Analysis → Construction consultation → Performance Evaluation → Performance DBA Review
What if a "small" change in a highly re-used component had a very bad effect.....!!
Critical Elements

Critical elements
• The mainframe environment in Danske Bank has a high degree of REUSE
- Programs and modules are used by many different workloads
• Example: the Customer Information module is called from
- Online: Customer portal, Netbank, Business online
- Batch: Interest, Securities, Behaviour score
- Changing the Customer Information module can potentially affect all Danske Bank production: availability and cost structure
• The introduction of Critical Elements gave Danske Bank a pro-active warning and approval mechanism for vital programs and modules
• The development areas are now asking to add their own selection of important modules to "Critical Elements"

Critical elements at Danske Bank
• Statistics from Cics and DB2 are correlated and prioritized with the following concerns in mind:
- Usage across many development areas
- High usage
• Experience-based knowledge is added:
- Commonly used modules like CPUSYST and USCACHE
• The list is re-generated every month
• The list is incorporated into our Change Control System (CCS)
• The list is incorporated into ADS (Application Diagnostic System)
(Diagram: Cics statistics, experience and DB2 statistics feed a selection engine that produces the Critical elements list for CCS and ADS)
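A selection engine along these lines could be sketched as below. It is a hypothetical illustration, not the actual engine: the input tuple shape, the function name, the thresholds `min_areas` and `min_calls`, and module names such as `CUSTINFO` and `RATECALC` are assumptions; only CPUSYST and USCACHE come from the slide.

```python
from collections import defaultdict

# Experience-based additions named on the slide.
EXPERIENCE_LIST = {"CPUSYST", "USCACHE"}


def select_critical_elements(usage, min_areas=3, min_calls=1_000_000):
    """Pick modules used by many development areas or called very often.

    usage: iterable of (module, development_area, calls) tuples, i.e. Cics and
    DB2 statistics already correlated per module (hypothetical input shape).
    Returns module names, highest priority first.
    """
    areas = defaultdict(set)
    calls = defaultdict(int)
    for module, area, n in usage:
        areas[module].add(area)
        calls[module] += n

    selected = {m for m in calls if len(areas[m]) >= min_areas or calls[m] >= min_calls}
    selected |= EXPERIENCE_LIST  # experience-based knowledge is always added

    def priority(m):
        # Broadest usage first, then highest call volume, then name.
        return (-len(areas.get(m, ())), -calls.get(m, 0), m)

    return sorted(selected, key=priority)


usage = [
    ("CUSTINFO", "Online", 600_000), ("CUSTINFO", "Batch", 500_000),
    ("RATECALC", "Netbank", 10), ("RATECALC", "Batch", 10), ("RATECALC", "Securities", 10),
    ("LOCAL1", "Netbank", 10),
]
print(select_critical_elements(usage))  # ['RATECALC', 'CUSTINFO', 'CPUSYST', 'USCACHE']
```

Regenerating the list monthly then amounts to rerunning the function on the latest month of statistics.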

Critical Elements: how does it work?
• When the developer opens a Critical Element, an information message is given by RDz:
- "This is a Critical Element - please be aware......"
• The day after the Critical Element is moved to SYST, the System Manager from the development area is presented with a report stating that a new version of a Critical Element has been moved to SYSTem Test
• MAO is informed when the Critical Element is packaged for production implementation
- MAO will contact the local DBA/Performance expert in the development area to verify NFR quality (performance, resilience)
- Typical questions could be:
• Have you verified that the Critical Element is not abending (FADUMP)?
• Have you verified performance before and after the Critical Element was modified?
• What are the changes being implemented?
- After verification, the CCS change is approved by ITSM
(Diagram: ADS, Local DBA, Local Performance, CCS, SYST, PROD, MAO)
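The approval flow can be modelled as a small state machine with one gate. This sketch is an assumption-based illustration, not the CCS implementation: the `Stage` names, the `advance` function and the check keys in `REQUIRED_CHECKS` are hypothetical and only paraphrase the steps and typical MAO questions above.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    OPENED_IN_RDZ = 1  # RDz shows "This is a Critical Element - please be aware"
    MOVED_TO_SYST = 2  # next day: report to the area's System Manager
    PACKAGED = 3       # MAO is informed and contacts the local DBA/performance expert
    VERIFIED = 4       # NFR quality (performance, resilience) confirmed
    APPROVED = 5       # CCS change approved by ITSM


# Hypothetical keys for the typical MAO questions listed on the slide.
REQUIRED_CHECKS = ("no_abend_fadump", "performance_before_after", "changes_described")

_ORDER = list(Stage)                  # definition order
_NEXT = dict(zip(_ORDER, _ORDER[1:]))  # each stage -> following stage


def advance(stage: Stage, checks: Optional[dict] = None) -> Stage:
    """Move a Critical Element change to its next stage.

    Leaving PACKAGED requires every MAO check to be answered positively;
    this is the gate that stops an unverified change reaching production.
    """
    if stage not in _NEXT:
        raise ValueError("change is already approved")
    if stage is Stage.PACKAGED:
        checks = checks or {}
        missing = [c for c in REQUIRED_CHECKS if not checks.get(c)]
        if missing:
            raise ValueError(f"NFR verification incomplete: {missing}")
    return _NEXT[stage]
```

For example, `advance(Stage.PACKAGED, {"no_abend_fadump": True})` raises `ValueError` because two of the three checks are still open.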

Maintaining and upgrading developer competence
A Performance Center of Excellence in DENMARK and in INDIA each supports local DBAs, local performance experts and the development community.
Networks:
• Developers
• Technical Architects
• Solution Architects
• Local DBA
- How to use DB2 correctly
- 5 all-day meetings every year
- Webcasts
- Tests
• Local Performance experts
- How to measure performance
- Optimizing applications
- Optimizing design
- Using tools
- 4 all-day meetings every year
- Bring your own problem
- Test

Danske Bank - Changes to the organisation
(Org chart) MIPS contract with the CIO ($$)
• CIO, Directors
• Development manager, Project manager, Solution architect, Technical architect, System manager
• Local DBA, Local developer
• Performance Center of Excellence
• Development Model: Process & Performance Guidelines
Levers: Incentive, Rules, Tools, Competence, Guidance, Egg-slicer

Healing - Governance and Optimisation - Runtime
• LoopDetector
• Automatic Shit Detector
• ASU time inside Cics
• Operator
• // TIME= limitations
• Automated OverLoad Control
• Automated Test Threadsafe
• Mandatory Re-compile
• Mandatory Re-Bind
• No Batch in peak hour
• Commit Enforcement
• Automatic APA reports

Optimisation - Runtime
Some of the GameChangers:
1. Performance Center of Competence
2. Management backing
3. Threadsafe
4. DB2 10
5. Re-Compile (Architecture level)
6. Universal Caching System
7. Consolidation
8. APA
9. HIS
10. Contract

Optimisation - Runtime: UCS (Universal Caching System)
Diagram: two systems, MVS1 and MVS2, each running UCS with the same components, connected via XCF:
• Application program
• API stub
• PC functions
• Name/Token
• XCF synchronization queue
• Cache dataspaces
• Garbage collector
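The slide does not describe how UCS keeps the caches on MVS1 and MVS2 consistent. One common scheme is invalidate-on-write plus a garbage collector for expired entries; the Python sketch below illustrates that scheme only. The `NodeCache` class, the time-to-live, `connect` and the peer list are hypothetical stand-ins for the cache dataspaces and XCF synchronization, not the UCS API.

```python
import time


class NodeCache:
    """One system's cache; the dict stands in for a cache dataspace."""

    def __init__(self, name, ttl_s=60.0, clock=time.monotonic):
        self.name = name
        self.ttl_s = ttl_s
        self.clock = clock
        self.entries = {}  # key -> (value, expiry time)
        self.peers = []    # caches on the other systems in the sysplex

    def put(self, key, value):
        self.entries[key] = (value, self.clock() + self.ttl_s)
        # Stand-in for XCF synchronization: tell peers their copy is stale.
        for peer in self.peers:
            peer.invalidate(key)

    def get(self, key):
        item = self.entries.get(key)
        if item is None or item[1] <= self.clock():
            return None  # miss, or expired but not yet collected
        return item[0]

    def invalidate(self, key):
        self.entries.pop(key, None)

    def collect_garbage(self):
        """Drop expired entries; returns how many were removed."""
        now = self.clock()
        expired = [k for k, (_, expiry) in self.entries.items() if expiry <= now]
        for k in expired:
            del self.entries[k]
        return len(expired)


def connect(*caches):
    """Make every cache a peer of every other cache."""
    for c in caches:
        c.peers = [p for p in caches if p is not c]


mvs1, mvs2 = NodeCache("MVS1"), NodeCache("MVS2")
connect(mvs1, mvs2)
mvs2.put("rate", "old")
mvs1.put("rate", "new")  # invalidates the stale copy on MVS2
print(mvs2.get("rate"), mvs1.get("rate"))  # None new
```

Invalidation keeps cross-system traffic small (only keys travel, not values), at the price of a miss on the other system after each update; a replicate-on-write design would make the opposite trade-off.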

Optimisation - Runtime: DB2 10 and Consolidation
• From 8 to 4 z/OS systems in the Parallel Sysplex
• DB2 group from 11 to 6
• MQ group from 14 to 4
• Cics group from 300+ to 150

Danske Bank - Optimizing the runtime
To continue to keep costs low, we have decided to:
• Improve overall system performance through better balancing
- Detailed studies have shown great potential
• Improve abnormality detection
- Using SPSS, IT analytics is helping to achieve a faster reaction
- Using SPSS, we can balance our systems to be more efficient
• SPSS and IT analytics are the enabler for Danske Bank

Danske Bank - Optimizing the runtime
To improve overall system performance we have used HIS (Hardware Instrumentation Services):
• In an uncontrolled environment we are able to achieve:
Report for processor: 00, interval: 54.093953 microsecs.
• Cycles in interval: 235.755.942.040 / 88.817.162.070 (37,67%)
• Instructions executed: 33.690.832.484 / 13.145.421.063 (39,01%)
• Cycles per instruction: 6,99 / 6,75
• Mips rate: 745,06 / 771,55
RNI report (Total / Instruction / Data):
• L1 miss per 100 instr.: 5,48 / 3,28 / 2,19
• L1 miss p/100 problem: 4,12 / 2,37 / 1,75
- % From L2 (in CPU): 57,78% / 69,04% / 40,91%
- % From L3 (on CHIP): 27,42% / 23,80% / 32,86%
- % From L3 (on BOOK): 2,06% / 0,00% / 5,16%
- % From L3 (off BOOK): 0,50% / 0,00% / 1,25%
- % From L4 (on BOOK): 5,63% / 3,89% / 8,24%
- % From L4 (off BOOK): 0,64% / 0,54% / 0,79%
- % From Memory (local): 1,38% / 0,60% / 2,54%
- % From Memory (remote): 4,55% / 2,09% / 8,22%
DAT report (Total / Instruction / Data):
• TLB1 miss per 100 instr.: 0,96 / 0,37 / 0,59
- TLB1 cycles per miss: 118,54 / 162,99 / 91,09
- TLB1 CPU miss %: 16,42% / 8,62% / 7,80%
• TLB2 miss per 100 instr.: 0,49
- PTE % miss of all TLB: 51,46%
- STE % miss of all TLB: 13,06%
- STE/LPS % of all TLB: 0,48%

Danske Bank - Optimizing the runtime
To improve overall system performance we have used HIS (Hardware Instrumentation Services):
• In a controlled environment we are able to achieve a higher MIPS rate:
Report for processor: 01, interval: 99.230032 microsecs.
• Cycles in interval: 515.053.492.600 / 239.845.022.743 (46,56%)
• Instructions executed: 160.000.349.748 / 111.401.729.213 (69,62%)
• Cycles per instruction: 3,21 / 2,15
• Mips rate: 1.622,42 / 2.422,32
RNI report (Total / Instruction / Data):
• L1 miss per 100 instr.: 3,02 / 1,50 / 1,51
• L1 miss p/100 problem: 1,06 / 0,24 / 0,82
- % From L2 (in CPU): 66,09% / 78,48% / 53,76%
- % From L3 (on CHIP): 25,48% / 18,79% / 32,13%
- % From L3 (on BOOK): 1,96% / 0,00% / 3,91%
- % From L3 (off BOOK): 0,80% / 0,00% / 1,61%
- % From L4 (on BOOK): 4,01% / 1,89% / 6,12%
- % From L4 (off BOOK): 0,33% / 0,09% / 0,58%
- % From Memory (local): 0,27% / 0,09% / 0,45%
- % From Memory (remote): 1,02% / 0,63% / 1,41%
DAT report (Total / Instruction / Data):
• TLB1 miss per 100 instr.: 0,40 / 0,14 / 0,26
- TLB1 cycles per miss: 68,88 / 86,74 / 59,07
- TLB1 CPU miss %: 8,70% / 3,88% / 4,82%
• TLB2 miss per 100 instr.: 0,14
- PTE % miss of all TLB: 36,37%
- STE % miss of all TLB: 7,17%
- STE/LPS % of all TLB: 0,30%
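The "Cycles per instruction" and "Mips rate" lines in both HIS reports can be reproduced from the cycle and instruction counts. In the sketch below, CPI is cycles divided by instructions, and Mips rate is a clock frequency in MHz divided by CPI, both truncated to two decimals. The 5208 MHz clock (the z196 runs at about 5.2 GHz) and the truncation are our assumptions, chosen because together they match every printed value; the function names are hypothetical.

```python
import math


def trunc2(x: float) -> float:
    """Truncate (not round) to two decimals, which is what the reports appear to do."""
    return math.floor(x * 100) / 100


def his_summary(cycles: int, instructions: int, clock_mhz: float = 5208.0):
    """Derive CPI and 'Mips rate' from HIS cycle and instruction counts.

    clock_mhz is an assumption: 5208 MHz (the z196's ~5.2 GHz clock)
    reproduces the figures printed in both reports.
    """
    cpi = trunc2(cycles / instructions)
    mips = trunc2(clock_mhz / cpi)
    return cpi, mips


print(his_summary(235_755_942_040, 33_690_832_484))   # (6.99, 745.06)  uncontrolled
print(his_summary(515_053_492_600, 160_000_349_748))  # (3.21, 1622.42) controlled
```

Read this way, the jump in Mips rate from about 745 to about 1622 comes entirely from the drop in cycles per instruction (6,99 to 3,21), which lines up with the lower L1 and TLB1 miss rates in the controlled report.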

Danske Bank - Using Business Analytics for System Mgmt.
Pre-conditions:
• "Old school" rules & regulations will not make it alone
• It is an investment:
- Effort inside Danske Bank: 3-4 man-years in 2013
- Consultancy from IBM: 3-4 man-months in 2013
- Software cost
- Servers
- MIPS
• It will be a long-term effort

Danske Bank - Using Business Analytics for System Mgmt.
Why is Danske Bank doing this?
We need to detect abnormalities in production much faster than today in:
• Cics
- To stop bad transaction behaviour
- To prevent bad code from coming into production
- To start or stop Cics regions
• DB2
- To alert the DBA when access paths are bad
• Batch
- To ensure batch jobs on the critical path finish in time
- To prevent bad program/SQL changes from coming into production
• STC
- Detect abnormalities in consumption patterns for STCs: IP stack, MQ, AOC
• Detect abnormalities in the APPLLOG

Danske Bank - Using Business Analytics for System Mgmt.
The use cases in 2013:
• Realtime SMF record scoring:
- Cics transactions (SMF110)
- DB2 plans & packages (SMF101)
- SMF30 (STC & batch)
- Simulated SMF30
• Application log:
- We have a consolidated application log across platforms, systems & applications
• Preventing NFR violations from coming into production

Danske Bank - Using Business Analytics for System Mgmt.
The modeling platform:
• SPSS Modeler client on Windows
• IBM SPSS Collaboration Services (Windows): model production and calibration
• IBM SPSS Modeler Server TEST (Windows)
• Data server: filtering of new and old SMF data

Danske Bank - Using Business Analytics for System Mgmt.
Runtime environment: how to "score a model"
(Diagram components: live SMF, SMFSURF, DBLaunch, DB2 scoring via SQL, RDSMON, pack/unpack of the RDS control block)

Danske Bank - Using Business Analytics for System Mgmt.
Runtime environment - general considerations:
• Scoring every SMF record is pointless
• Scoring frequency will be based on a prepared usage pattern
Scoring policy (used by SMFSURF):
Name    Freq    Latest scoring (GMT timestamp)
ABC     1
123     8
XYZ     1500
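One way to read the scoring policy is as per-name sampling. The sketch below assumes that `Freq` means "score one in every Freq records" and that `Name` identifies a workload; the slide defines neither, so the `ScoringPolicy` class and `should_score` method are hypothetical.

```python
from datetime import datetime, timezone


class ScoringPolicy:
    """Hypothetical reading of the SMFSURF scoring policy table.

    freq is read here as "score one record in every freq records" for that
    name; the slide does not define the unit, so this is an assumption.
    """

    def __init__(self, freqs):
        self.freqs = dict(freqs)                            # name -> freq
        self.seen = {name: 0 for name in self.freqs}        # records seen per name
        self.latest = {name: None for name in self.freqs}   # last scoring time (UTC/GMT)

    def should_score(self, name, now=None):
        if name not in self.freqs:
            return False  # no policy entry: never score
        self.seen[name] += 1
        # Score the 1st, (freq+1)th, (2*freq+1)th ... record for this name.
        if (self.seen[name] - 1) % self.freqs[name] != 0:
            return False
        self.latest[name] = now or datetime.now(timezone.utc)
        return True


policy = ScoringPolicy({"ABC": 1, "123": 8, "XYZ": 1500})
print(sum(policy.should_score("123") for _ in range(16)))  # 2
```

With this reading, "ABC" is scored on every record, while "XYZ" costs one model evaluation per 1500 records, which is how the policy keeps scoring from becoming a MIPS consumer in its own right.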

Danske Bank - Using Business Analytics for System Mgmt.
• Why is it so difficult to get hold of LIVE SMF data?
• Why can't SMFSURF "subscribe" to SMF records of a particular type?
• Can SMF be warehoused in memory?
• Maybe in flash?
(Diagram components: SMF data, SMFSURF, DBLaunch, DB2 scoring via SQL, pack/unpack)

Danske Bank - Using Business Analytics for System Mgmt.
An example: normal peak day, April 2 (chart).

Summary:
• You can achieve remarkable results in performance and cost - but you have to believe ☺