WHITE PAPER

Automated transformation of ETL, data warehouse, and analytics to Snowflake

Address key challenges and move your data warehouse to the cloud
Enterprises are increasingly moving to next-generation cloud data warehouses to reduce infrastructure administration overhead, achieve business agility, and enable uncompromising simplicity. A cloud-based data warehouse like Snowflake provides a decoupled architecture, eliminates the need for remodeling, and facilitates unified data across hybrid sources. Enterprises moving to Snowflake gain several benefits, including full SQL support, serverless architecture, strong partnerships with BI and ETL tools, and ease of maintenance.

However, enterprises face multiple challenges in dealing with code, business logic, and analytics jobs while moving to Snowflake. For example, workloads must have an exact target-native equivalent to match production performance SLAs. To achieve this, enterprises need to:

- Thoroughly assess the existing inventory of workloads
- Identify the chain of workloads to be moved
- Match the source and target data
- Convert scripts, business logic, reporting logic, etc.
- Validate the migrated logic before putting it into production

Automation toolsets can effectively deal with these intricacies to make the migration seamless and risk-free.

Key considerations when migrating to Snowflake

There are several factors that enterprises need to consider when deciding their migration path.

Business considerations:
• Avoid functional, operational, or end-user disruptions
• Reuse data, code, business logic, reports, database views, etc. from the legacy environment in the cloud
• Manage risk by planning for phased offload and comprehensive validation of migrated workloads
• Ensure a vendor-agnostic approach
• Assess the existing inventory of workloads to decide the migration scale
Technical considerations:
• Visualize and identify what needs to be migrated in a phased manner
• Leverage automation for prescriptive recommendations, code transformation, and optimization
• Optimize poor-performing and resource-intensive workloads
• Identify technical debt in existing schema, code, etc.
• Identify complex interdependencies between the workloads and the future-state architecture
• Automate logic transformation to a target-native engine of your choice
• Run workloads on dual environments until the new environment and applications stabilize
• Accelerate decommissioning of legacy systems after the parallel-run period

Key questions to ask:
• Which workloads can be migrated with minimal effort?
• How can we leverage our existing investments?
• What is the extent of automation possible?
• What is the level of risk and uncertainty involved?
• Which workloads should be migrated as-is, which can be optimized for performance, and which need a complete overhaul?

A typical migration checklist:
• REUSE – Embrace what you already have
• AUTOMATE – Leverage automation for faster time-to-value
• OPTIMIZE – Meet performance SLAs on cloud
• CERTIFY – Validate migrated workloads before putting them into production
Migrate ‘as-is’ or ‘total re-engineering’?

Whether to move data and processes in one bulk operation or deploy a staged approach depends on several factors, including the nature of your current data analytics platform, the types and number of data sources, and your future business plans. What you need is an intelligent solution that helps you strike a fine balance between the two approaches, attain agility and reliability, and make your existing workloads work best in the new environment. This fine balance creates a win-win situation for end-to-end data warehouse modernization. It provides an opportunity to:

1. Migrate already-optimized workloads as-is
2. Fine-tune expensive, resource-intensive, and poor-performing workloads
3. Archive or retire unimportant or unused workloads
4. Completely re-engineer workloads that contain poor logic

Key challenges when migrating from a legacy data warehouse to Snowflake

The transition from any RDBMS to Snowflake is not easy. Enterprises have typically built ETL pipelines to push data into legacy warehouses, customized visualization tools to pull data out of them, and designed client applications that depend closely on warehouse data. The top challenges faced when migrating workloads from a legacy environment to Snowflake are:

• Risk of moving mission-critical applications already in production
• Multiple ETL/ELT jobs in progress on the legacy environment
• Identifying optimal cloud data architecture components
• Transitioning to cloud-native capabilities (native schedulers, ingestion, governance, metadata management, etc.)
• Manual transformation of data types and SQL compliance
• Query (semantic and syntactic) and data validation
• Decommissioning legacy systems
Addressing the key challenges

Data type mapping and schema conversion

One of the key challenges when migrating to Snowflake is matching RDBMS data types to Snowflake data types. This requires creating the database structure, and typically involves using DDL exports from the enterprise data warehouse, converting them to Snowflake-compatible DDL, and executing it. The Impetus Workload Transformation Solution automatically transforms more than 95% of the RDBMS table definitions/schema to their Snowflake equivalents. Any remaining DDL scripts are then converted manually by our experts, completing the end-to-end transformation.

The tool maps all data types and handles a variety of complex use cases automatically. For instance, for Teradata-to-Snowflake transformations, it can handle complex data types such as FLOAT BETWEEN, PERIOD, TIME WITH TIME ZONE, CLOB, BLOB, VARBYTE, and many more.

In addition to automated schema conversion, database views built on top of the schema are also auto-converted. However, recursive views need manual intervention for optimum performance. A comprehensive migration is achieved when the tool intelligently handles interdependencies between entities such as tables, views, and queries. The tool produces a graphical dependency structure highlighting all the entities directly recommended for migration and the entities that depend on them.

Automated logic conversion

How does automated conversion help simplify the transformation journey, mitigate migration risks, and accelerate time-to-market? To understand this better, let's take the example of converting RDBMS source code into Snowflake SQL and Python. Here are some of the areas where automation brings immense value:

• SQL query conversion – Automated conversion of SQL queries to SnowSQL
• PL/SQL query conversion – Automated conversion of PL/SQL statements, including arguments, variables, exception handling, etc. across various statement types.
Other conditional statements, loops, dynamic SQL, cursors, etc. are handled with equal dexterity.
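To make the data type mapping described above concrete, here is a minimal sketch of how a converter might translate Teradata column types into Snowflake equivalents. The mapping table is a simplified illustration of the general technique, not the actual rule set used by the Impetus solution.

```python
# Illustrative sketch of Teradata-to-Snowflake column type mapping.
# The mapping table below is a simplified assumption for illustration,
# not the actual rules applied by the Impetus solution.
TERADATA_TO_SNOWFLAKE = {
    "CLOB": "VARCHAR",         # Snowflake has no CLOB; VARCHAR holds up to 16 MB
    "BLOB": "BINARY",          # binary large objects map to BINARY
    "VARBYTE": "BINARY",
    "PERIOD(DATE)": "VARCHAR"  # no PERIOD type; often remodeled as two columns or a string
}

def convert_column_type(td_type: str) -> str:
    """Map a Teradata column type to a Snowflake equivalent (best effort)."""
    normalized = td_type.strip().upper()
    # Types that both engines share (INTEGER, DATE, etc.) pass through unchanged.
    return TERADATA_TO_SNOWFLAKE.get(normalized, normalized)

print(convert_column_type("clob"))     # VARCHAR
print(convert_column_type("INTEGER"))  # INTEGER
```

A production converter would additionally parse full DDL statements, handle parameterized types such as DECIMAL(p, s), and emit complete Snowflake-compatible CREATE TABLE scripts.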
Script conversion (BTEQ, FLOAD, and FEXP) is handled similarly: a variety of script types are automatically converted into Python + SnowSQL, including complex UDFs and keywords such as ERRORCODE, ACTIVITYCOUNT, etc., across a variety of statement types.

A systematic approach to Snowflake transformation

The Impetus Workload Transformation Solution brings together data-driven decision support, automation, and cloud data platform expertise to address these challenges through a four-step process.

STEP 1: Assessment and prescription
• Automated legacy data warehouse inventory and profiling
• Identification of workloads (metadata, data, etc.) and dependencies
• Creation of an optimized schema (clustering keys, Parquet format/file size for S3 uploads, etc.)
• Grouping of workloads into migration units

STEP 2: Transformation
• Up to 90% automated code conversion to SnowSQL
• Automated data migration to an optimized schema
• Automated handling of data types, nested views, intervals, loops, UDFs, procedures, etc.
• Creation of patterns for the target platform (ingestion, data sync, reconciliation, lineage, security, orchestration, etc.)
• Auto-generation of patterns for newer migrations
• Query editing for optimized fixes and performance tuning

STEP 3: Validation
• Pipeline-based automated validation of the transformed code
  – Row- and cell-level validation of code and error reporting
  – Pluggable validation transformation for instant verification of transformed code
• Data-based validation of transformed code
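The row- and cell-level validation described in STEP 3 can be approximated with row-count and per-row checksum comparisons between the source and target tables. The sketch below uses Python's built-in sqlite3 purely so the example is self-contained; in a real migration the two connections would point at the legacy warehouse and Snowflake.

```python
# Minimal sketch of row- and cell-level data validation between a
# source and target table: compare row counts, then per-row checksums.
# sqlite3 is used only to keep the example self-contained; a real
# migration would query the legacy warehouse and Snowflake instead.
import sqlite3
import hashlib

def table_fingerprint(conn, table):
    """Return (row_count, set of per-row MD5 digests) for a table."""
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY 1").fetchall()
    digests = {hashlib.md5(repr(r).encode()).hexdigest() for r in rows}
    return len(rows), digests

def validate(source_conn, target_conn, table):
    src_count, src_hashes = table_fingerprint(source_conn, table)
    tgt_count, tgt_hashes = table_fingerprint(target_conn, table)
    if src_count != tgt_count:
        return f"FAIL: row count mismatch ({src_count} vs {tgt_count})"
    if src_hashes != tgt_hashes:
        return "FAIL: cell-level mismatch"
    return "PASS"

# Demo: two identical in-memory tables validate cleanly.
src, tgt = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for c in (src, tgt):
    c.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    c.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])
print(validate(src, tgt, "t"))  # PASS
```

A pipeline-based validator would run such checks automatically after each transformed migration unit, reporting mismatches down to the offending rows.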
STEP 4: Execution
• Deliver a target-specific executable package
  – Cloud-native orchestration and execution on production
• Optimal performance through parallel execution
  – Parallel execution recommendations through exhaustive data-driven assessment
  – Generation of required artifacts in the transformation output
  – Parallel execution of the generated artifacts on production
• Productionalization support
  – End-to-end transitioning into production and operationalization
  – Capacity optimization
  – Environment stabilization through a parallel-run period
  – Implicit data governance and compliance on cloud

The Impetus Workload Transformation Solution creates a fine balance between migrating as-is and total re-engineering. It helps eliminate technical debt and ensures agility and quality while moving your legacy data warehouse to Snowflake.

[Figure: The Impetus Workload Transformation Approach – ASSESS, TRANSFORM, VALIDATE, EXECUTE – from the enterprise data warehouse to a modern data platform: inventory listing, lineage and data dependency analysis, target-specific recommendations, capacity planning, and resource estimation; auto-transformation of DML, DDL, procedures, ETL, jobs, and data; pipeline-based schema-, metadata-, and data-based validation; and an executable package with cloud-native orchestrators, parallel execution, CI/CD transition support, repeatability, and extensibility]
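The parallel execution of generated artifacts described in STEP 4 can be sketched with a simple thread pool running independent migration units concurrently. The worker function below is a placeholder of my own; in a real deployment each task would submit its generated SnowSQL statements through a Snowflake session.

```python
# Hedged sketch of parallel execution of generated migration artifacts.
# run_artifact is a hypothetical stand-in for executing one generated
# SnowSQL script; artifact names here are invented for illustration.
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_artifact(name: str) -> str:
    # Placeholder: a real implementation would open a Snowflake session
    # and execute the generated statements for this migration unit.
    return f"{name}: done"

artifacts = ["load_customers", "load_orders", "build_views"]

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(run_artifact, a): a for a in artifacts}
    results = sorted(f.result() for f in as_completed(futures))

print(results)
```

The key design point is that only artifacts without interdependencies are scheduled concurrently; the dependency structure produced during assessment determines which units can safely run in parallel.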
Benefits of the Impetus Workload Transformation Solution
• Reuse all your existing investments
• Automatically transform decades of effort in 12-20 weeks
• Fast and reliable end-to-end EDW transformation
• 50% time and cost savings compared to manual migration
• Strategize between migrating as-is and total re-engineering to achieve the maximum with the least effort
• Proven to reduce development, testing, and validation effort compared to manual migration
• Extensive experience in delivering projects for Fortune 100 companies
• Caters to any industry, domain, or use case
• Up to 60% more cost-effective, with massive savings and faster time-to-value
• Meet performance SLAs
• Decrease risk and uncertainty
• Maximize ROI

Impetus Technologies is focused on enabling a unified, clear, and present view for the intelligent enterprise through data warehouse modernization, unification of data sources, self-service ETL, advanced analytics, and BI consumption. For more than a decade, Impetus has been the 'Partner of Choice' for several Fortune 500 enterprises in transforming their data and analytics lifecycle. The company brings together a unique mix of software products, consulting services, and technology expertise. Our solutions include the industry's only platform for the automated transformation of legacy systems to the cloud/big data environment and StreamAnalytix, a self-service ETL and machine learning platform. To learn more, visit or write to us at .

© 2020 Impetus Technologies, Inc. All rights reserved. Product and company names mentioned herein may be trademarks of their respective companies. Jan 2021