Data Extraction and Ingestion: The 40 companies to watch in 2021
About InsTech London

InsTech London was founded in 2015 and has grown to become a leading intelligence network that is shaping the future of insurance and risk management. We connect the industry to the technology, data and analytics providers that are driving and influencing change through innovation. The two executive partners, Matthew Grant and Robin Merttens, each have over 30 years' experience of bringing new technologies into the global insurance market and draw on an extensive network of consultants and collaborators.

InsTech London runs regular events (live and digital), provides market commentary and insights, and produces a weekly podcast. In addition, we provide advisory services to our members, from ad-hoc recommendations to in-depth consulting studies. We are supported by (and grateful to) our corporate membership of over 130 companies, and an extended community reaching 20,000 people who keep us honest and informed about what is happening in insurance, technology and beyond.

Report authors

This report has been led by Matthew Grant, supported by the InsTech London Research and Insight team: Rebekah Bostan, Henry Gale and Ali Smedley. Our work is further informed by the more than 200 discussions we have each month with companies in, and supporting, the global insurance industry.

Disclaimer & copyright

The information in this report is drawn from a variety of sources, including our own experience, interviews, and live discussions at our events with founders, executives, investors and others active in this area. Further information has been gathered from public sources such as companies' own websites and news items. We have not independently verified all the information in this report and InsTech London assumes no responsibility for its accuracy and completeness. This report is for information only and is not intended to be used as advice or recommendations beyond general observations of trends and themes. Reproduction of this report, in full or in part, in any media without the written permission of InsTech London is prohibited.

InsTech London reports

Ingestion and data extraction is one of the major themes we have identified this year that we believe will drive change in insurance over the next decade. This is the sixth report to be released. Our previous reports, free to members, are available from the following links:

Insurance: to embed, or not to embed (July 2021)
No-Code/Low-Code – A Bridge from Legacy to Digital? (May 2021)
Location Intelligence 2021 – the Companies to Watch (March 2021)
E-Trading Platforms: Challenges, Opportunities and Imperative (January 2021)
Parametric Insurance – 2021 outlook and the companies to watch (October 2020)

To learn more about InsTech London, see our forthcoming themes, review recordings of our live events or discuss hosting an event with us, visit www.instech.london or contact us at hello@instech.london.
Introduction

Those of us living at the sharp end of innovation are frequently talking about the value of data: how it can enrich the customer experience, make pricing fairer and generally make the world a better place. Unfortunately, even though the insurance industry has now developed an appetite for data, it is still often being shared between partners in a way that hasn't changed much since computers arrived on our desks in the early nineties. This is becoming a serious choke point for the industry, a data trap, keeping costs high, amplifying errors and throttling innovation.

We believe the problem will get sorted, eventually. For the next few years, though, spreadsheets and PDF files, often with no standard structure or format, will continue to proliferate. So, for this report we're going below deck and peering into the clanking engine room of insurance to discover how data is being extracted, ingested and organised. We've set out to discover whether the promised land of efficient data transfer is within our grasp, or whether we are going to be floundering at sea for many years to come.

Insurance companies need the data they receive from their partners (brokers, clients and other third parties) to improve their decision-making. Risk selection, underwriting, claims, policy administration and, increasingly, regulatory reporting are all being improved with better data. If insurers can get it.

Before shipping containers became widespread in the 1960s, cargo was shipped around the world in boxes and bags. Goods were frequently damaged or stolen in transit, and dockyards around the world employed hundreds of thousands of people to load and stack the loose goods. Standardised containers changed all of that. Yet despite the efforts of organisations such as ACORD, and the existence of technology to whizz data around the world safely and cheaply, insurers and brokers in 2021 are still sharing key business information the same way they did in the 1990s.

Faced with the onslaught of emails, PDFs and spreadsheets, organisations from the biggest tech companies (Amazon, Microsoft and Google) all the way down to start-ups only a few months old are offering solutions to insurers. This is a problem that shouldn't really exist. It is spawning a group of companies we shouldn't need but which, for the next few years, we are going to become increasingly dependent upon.

We've discovered new technology that is improving data extraction, ingestion and organisation. More processes are being automated, allowing richer data sets to be harnessed and reducing manual processing. Complex tasks can be performed with artificial intelligence (AI) and machine learning to produce accurate and comprehensive datasets. Yet we still have a long way to go.

Once again, we've spread our net wide and deep for this report: wide to track down the companies offering industrial-strength, and often commoditised, solutions across multiple industries, and deep to spend time with those we know well, to whom we are grateful for sharing their experience and solutions.

This report is the sixth in our ongoing series from InsTech London. We've aimed to identify the main companies in this field, and the ones to look out for, but if you feel we've missed a company that should have been in this report, please do contact us. Better still, join as a member and let us share your stories too.

Matthew Grant
Partner, InsTech London
Contents

Why this matters
Key lessons for effective ingestion and extraction
What is data extraction, ingestion, and organisation?
How and why data is being exchanged
Technologies for data extraction and ingestion
Who needs ingestion?
Extraction and ingestion expertise
Words of advice: choosing your ingestion and extraction partner
Future state
Companies listed in this report
Past reports
Why this matters

If any of the following are of interest to you, keep reading:

• Where are the problems with sharing data, and why do they exist?
• What solutions exist for turning the unstructured data in emails, PDFs and spreadsheets into a structured digital format?
• Which companies are transforming messy unstructured data into actionable, useful information?
• How reliable are artificial intelligence (AI) and automated data processing? When is human intervention required?
• Which types of insurance coverage are the worst when it comes to sharing data, and which are the best?
• What industry initiatives exist, and which are being tested and expanded?

We have attempted to identify where organisations are creating specific capabilities to support extraction and ingestion analytics for insurers. Using public sources to get reliable, explicit information on the capabilities and qualities of companies providing technology services is hard, sometimes impossible. One of the services we offer to our corporate members, and therefore in turn to you as the reader of our reports, is to invest the time interviewing our members about what they are doing and, where possible or permitted, identifying real clients and case studies. These member companies are covered in the profile section at the end of this report, and in most cases you will find more detailed interviews via the links to our website.

For companies that are not members, but which we believe are offering ingestion and extraction services to insurers worth being aware of, we have provided our own assessment of their capabilities as best we can. By necessity such entries are shorter and less insightful.
[Logo charts: "Data Extraction and Ingestion: the InsTech London members" and "Data Extraction and Ingestion: the watch list"]
At some point in the future the problem of data extraction and ingestion will disappear. Blockchain, or simpler distributed ledger technology (DLT), has the potential to eliminate the need for transferring data in spreadsheets or PDFs altogether. Rather than packaging up the data and sending it to the next person in the chain, it makes more sense to have the data reside in one single place and to ensure that whoever is best positioned to define the data keeps it updated and fresh. DLT will also eliminate the need for intermediaries, lowering transaction costs. This immutable record can then be made available, using access controls, so that eligible third parties can come in and extract the data they need, when they need it. Less push, more pull.

Until then, we all need to make do with what we are being offered. Advances in AI and machine learning are creating more powerful tools. Various market bodies recognise that the frictional cost of sharing and accessing data is a major cost for the industry, and collaboration across insurers and their partners is encouraged to address the problem effectively. The Future at Lloyd's strategy explicitly specified the need for better collaboration to define and implement better data standards.

"Specialty insurance data is particularly problematic as it typically passes through several intermediaries, lacks consistent data standards and is liable to manual entry errors which can lead to a degradation in quality"
Nick Mair, CEO, DQPro

As is often the case in many industries, it is sometimes the more complicated problems that get solved first. Tractable, for example, is using image recognition on photos sent from mobile phones to assign a cost to damage in motor claims, and is expanding around the world. Parsyl is using sensor data to trigger payment for vaccines damaged in transit and has its own syndicate at Lloyd's. Flyreel is using videos taken on mobile phones to identify hazards in its customers' homes.

COVID-19 has now proved that insurance in London can survive without having to exchange physical pieces of paper. We don't anticipate, and certainly don't hope for, a similar "big bang" event to dramatically "fix" this data problem. Instead, we are likely to see a gradual nibbling away at the blockages in the global data exchange, supported by occasional rapid advances in data standards and formats in single business lines. The most recent example is insurance of crypto assets.

"As buildings are digitised, new data sources can inform risk and be shared with the markets for underwriting. Those sensors can also be used to reduce risk: predict and prevent. It's not going to happen overnight."
Hemant Shah, Co-founder & CEO, Archipelago
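To make the single-record, access-controlled model described at the start of this section more concrete, here is a minimal sketch in Python. It illustrates the idea only: the hash-chaining scheme and all names are our own assumptions, not a description of any real market system.

```python
import hashlib
import json
import time

class SharedRiskRecord:
    """Append-only, hash-chained record of a risk, held in one place.

    Each entry references the hash of the previous one, so tampering is
    detectable; eligible parties pull the data they need on demand
    instead of receiving copies of spreadsheets by email.
    """

    def __init__(self):
        self.entries = []     # the immutable history
        self.readers = set()  # parties granted read access

    def grant_access(self, party: str) -> None:
        self.readers.add(party)

    def append(self, author: str, data: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else "genesis"
        body = {"author": author, "data": data,
                "timestamp": time.time(), "prev_hash": prev_hash}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append({**body, "hash": digest})
        return digest

    def read_latest(self, party: str) -> dict:
        if party not in self.readers:
            raise PermissionError(f"{party} is not an eligible reader")
        return self.entries[-1]["data"]

# Illustrative flow: the broker keeps the record fresh; the insurer pulls it.
record = SharedRiskRecord()
record.grant_access("insurer_a")
record.append("broker", {"insured": "Example Co", "limit": 5_000_000})
print(record.read_latest("insurer_a"))
```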
What is data extraction, ingestion, and organisation?

Various terms are used for data manipulation. For the purpose of this report, we're defining the main activities as follows:

Extraction is retrieving words, numbers or pictures from disparate sources such as emails, PDFs and spreadsheets. This data may be in a structured, standardised format such as ACORD, or it could be in unstructured or one-off formats.

Ingestion refers to the transformation processes that turn the extracted information into structured data that can more easily and effectively be used for decision-making or record-keeping. This information is usually then stored in one or more systems and databases.

Organisation of data is the act of validating and assigning attributes to the data so that it can be used immediately or in the future. This is a major area, incorporating data science, data enhancement and AI. We touch briefly on those areas, but we mainly consider organisation of data where it more explicitly relates to extraction and ingestion.

Artificial: Data Ingestion Case Study

The issue: A global broker had identified the need to reduce manual effort and digitise processes. One of its most manually intensive processes is the rekeying of data from insurance policies into existing systems. Artificial proposed a solution for extracting and structuring data points from market reform contract (MRC) insurance policies. A key requirement was to take strings or sentences of information and transform these into the structured data format required by the target system.

Artificial's solution: Artificial took a ground-up approach to configuring all the broker's insurance policy processes. An extraction model was developed to first digitise and then structure all aspects of the document. The original policy document could then be stored as a digital version, using the widely used JSON format. The work required the following steps:

• The appropriate data points required by the broking systems were identified, along with their relevant structure, format and validation rules (such as policy limit structures). The digital policy document had to be able to cope with up to 16 fields relating to the policy and 8 fields relating to each underwriter subscribing to the policy.
• A confidence score was then derived for each data point, with supporting workflow within the platform enabling "human-in-the-loop" interaction for items with insufficient confidence scores.
• The post-processing (structuring) component of the model was configured to meet the customer's specific needs.
• Data could be shared via an API.

Pilot results:

• The extraction, structuring and processing model took between 1 and 5 minutes for each insurance slip. Multiple instances of the model can run in parallel, so it scales easily.
• The pilot exceeded the broker's expectations for accuracy and is now being considered across a much larger portfolio than initially identified.
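To illustrate the kind of output the case study describes, a digitised slip might look something like the structure below, with a confidence score attached to every extracted field so that low-confidence items can be routed for human review. This is a hypothetical sketch: the field names, values and layout are our own assumptions, not Artificial's actual schema.

```python
import json

# Hypothetical extraction output for one MRC slip. Every field carries
# the extracted value plus the model's confidence, so a downstream
# workflow can route low-confidence items for "human-in-the-loop" review.
extracted_slip = {
    "policy": {
        "unique_market_reference": {"value": "B0001ABC2021", "confidence": 0.98},
        "limit": {"value": {"amount": 10_000_000, "currency": "USD"},
                  "confidence": 0.91},
        "period": {"value": {"from": "2021-01-01", "to": "2021-12-31"},
                   "confidence": 0.95},
    },
    "underwriters": [
        {"name": {"value": "Example Syndicate 1234", "confidence": 0.97},
         "written_line": {"value": 0.25, "confidence": 0.62}},  # likely escalated
    ],
}

print(json.dumps(extracted_slip, indent=2))
```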
How and why data is being exchanged

Workflow

Data and information are used across the full insurance life cycle. Typically, once data has arrived at an insurance organisation and been extracted, there should be fewer problems sharing the data internally. Some areas in the insurance value chain have more standardised processes than others, but we've found examples of ingestion and extraction being widely used (or needed) in the following areas (the original chart also shows the focus area of the surveyed companies on a 0-100% scale):

• Point of underwriting: underwriting and risk selection
• General policy administration: policy administration, including enrolment and onboarding of new clients
• Compliance: compliance checks
• Loss modelling analytics: appraisal of insured assets
• Claims: claim processing

Notes: based on analysis of 40 companies. Source: InsTech London © 2021

Data types

By far the most common mediums for exchanging data are still spreadsheets and PDF files. Direct connection between systems and providers using Application Programming Interfaces (APIs) is becoming more popular, but usually only where the technology has been created in the last decade. Words and numbers dominate the type of information being shared, but signatures and handwritten endorsements scribbled onto existing documents also need to be accounted for. Images are also starting to be used in claims.

Who's looking at you now: satellite and aerial imagery

Satellite and other aerial imagery is increasingly being used for risk assessment and claims. We covered this extensively in our Location Intelligence report. The two main ways that data is collected are Light Detection and Ranging (LIDAR), typically captured by aircraft and drones, and Synthetic Aperture Radar (SAR). Satellites using SAR can see through clouds and smoke and can be used at night. These are used by companies such as ICEYE and McKenzie Intelligence Services. Aerial data is now widely available, and in some cases provided free, for example from the European Sentinel-2 satellite, but processing can be expensive and resource intensive.
In considering all types of communication across every aspect of insurance, we've seen the examples shown in the table below.

Data sources and types:

Unstructured data
• Text: emails, free text fields, contracts, scanned documents, email attachments, text within images, text within videos, PDFs, Word documents
• Numerical: tables, spreadsheets
• Other: free form text, CSV files
• Audio and visual: audio files, image files, static images, phone calls, fax, recorded videos, live streams

Semi-structured data
• Sensors: satellite imagery, aerial imagery, sensor data

Source: InsTech London © 2021

The importance of near real-time flood data

Major flooding has been seen across Europe over the summer of 2021, as well as in other parts of the world. In some areas, the insured loss is likely to reach record levels. Managing and reducing flood risk going forward requires not only action by governments but also improved data, tools and technology. Sensors are playing an important role in speeding up the claims process. For near real-time data, ICEYE provides flood depth data at the building level within 24 hours of a flood's peak via its satellite constellation. McKenzie Intelligence Services processes satellite imagery and other data sources, such as drones and on-the-ground "human intelligence", with machine-learning algorithms. The company can provide data at one-metre resolution anywhere in the world.

We have launched a dedicated flood insurance newsletter, Flood Focus, featuring news, interviews and insights from companies in the space. If you would like to keep up to date with the latest developments in flood modelling and technology, please click here to sign up.
Looking ahead to sensor data

There is massive potential for integrating sensor data into the underwriting risk assessment process, ongoing monitoring, supporting clients with risk management, and claims. Many buildings and most complex machinery have sensors. Our phones have sensors. Yet although the data exists, most insurers are still struggling to find credible use cases for data coming from the Internet of Things (IoT) at scale.

Chubb is one of the most advanced insurers in using IoT data to support its clients. Sean Ringsted, Chief Digital Officer of Chubb, gave examples of how sensors can be used to tackle specific tasks when we spoke to him recently. Organisations such as Shepherd are helping to organise and make sense of incoming property performance data, converting it into actionable insights for insurers.

We are releasing a future report on the use of sensors for large commercial assets and how this impacts insurance and risk management. Here is a taster of the specific examples we will be discussing:

• Smoke detectors providing alerts for fire are being offered by insurers to their clients to help reduce risks, but so far the data itself is not being widely used for underwriting.
• Water shut-off valves are being installed in new buildings, although we are not yet aware of direct uses for underwriting. Escape of water represents 50% of water damage in the US.
• Temperature variation monitoring is being used to identify whether food or pharmaceuticals are spoilt during transit, as we discussed with Ben Hubbard, founder of Parsyl.
• Building resonance frequency can be measured to determine whether a building is at risk of collapse following an earthquake, an area for which Safehub is providing sensors, as explained by co-founder Andy Thompson.
• Biometric data is already available from wearables such as watches and is being linked to insurance premium reductions. Companies such as Vitality, whom we interviewed, are amongst the leaders in this field.
• Earthquake shaking data, as provided by the United States Geological Survey (USGS), is a key part of parametric insurance. Kate Stillwell explained how the data is converted to colour-coded maps and why this forms the basis for the coverage provided by her company Jumpstart Recovery.
• Windspeed data is another essential source of information, used for parametric hurricane triggers around the world. New Paradigm Underwriters is capturing this data, but as Evan Glassman reveals in our discussion, conventional windspeed recorders often fail during hurricanes.
Technologies for data extraction and ingestion

All of the technology used for the primary data extraction from PDF files, spreadsheets and other mediums has been developed outside insurance, and it has wide applications across many other industries. We have only covered companies in this report where we have come across insurance-specific applications.

Acronyms and definitions

Data ingestion and extraction is described with a wide variety of terminology and acronyms. The definitions below are not comprehensive, but should guide you through the main processes and technologies.

Text

RPA: robotic process automation. A generic term for automating high-volume repetitive tasks to replace or significantly reduce the need for human input. Introducing RPA has been at the heart of many insurers' drive to create efficiencies and replace people with machines. Sophisticated RPA can extract data based on keywords and rules, verify claims, and integrate different data sources. RPA can also be applied to policy administration and data collection for underwriting. It's a big topic, but we're only interested in the data ingestion element for this report.

IPA: intelligent process automation. Unlike RPA, IPA uses AI to automate tasks. When documents contain unstructured data, RPA's keyword- or rule-based automation can be prone to error. IPA can understand context and meaning to automate tasks such as claims processing more accurately. It is also known as cognitive process automation (CPA). The design of IPA requires a deep understanding of how the insurance workflow operates and, for many insurance markets, an appreciation of the specific language and nuances of how insurance organisations communicate with each other. Intelligent document processing (IDP) is a type of IPA which uses technologies such as optical character recognition (OCR) and NLP to convert documents into structured data.

NLP: natural language processing. NLP is used to read raw digital data and seeks to understand it in the way a human would. A simple form of NLP is "syntax", which aims to verify and improve the grammar of a written statement. "Semantics" goes further and attempts to extract context from free-form text and convert this into a structure, classification and order that can then be analysed. One common challenge for data extraction in insurance is where the policy conditions (deductibles, limits, exclusions) may be explained in writing rather than following a pre-determined and accepted format. This is particularly true in speciality or other forms of insurance where there are complex and unique coverages, sometimes covering multiple pages of a contract.

One type of NLP is sentiment analysis, which quantifies how positive or negative the language in an extract is. This could be used to understand how customers feel about their insurer. Another is named entity recognition (NER), which detects key entities and data within text, including names, locations, times and dates. Natural language understanding (NLU) and natural language interpretation (NLI) are extensions of NLP focusing on context and intent. NLP applications enable the technology to be trained to extract relevant information for underwriting. More sophisticated NLP can learn as it goes, teaching itself. NLP is hard to get right and is potentially error-prone. Every application using NLP should be able to assign a confidence level to what has been extracted.
ITI: intelligent text ingestion. A generic term used to describe the other processes identified here which convert unstructured text data into structured data, including OCR, intelligent character recognition (ICR), intelligent word recognition (IWR) and NLP.

Source: InsTech London © 2021
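As a small illustration of the NER technique described above, the open-source spaCy library can pull organisations, places, dates and monetary amounts out of free-form submission text. This is a generic, hedged sketch (it assumes spaCy and its small English model are installed), not a tool used by any of the companies profiled here.

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

submission = ("Please quote for Example Shipping Ltd, warehouse at Rotterdam, "
              "policy period 1 January 2022 to 31 December 2022, "
              "limit of USD 5 million.")

# Each detected entity carries a label such as ORG, GPE, DATE or MONEY,
# which a pipeline could map onto underwriting fields.
for ent in nlp(submission).ents:
    print(f"{ent.label_:8} {ent.text}")
```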
Audio

ASR: automatic speech recognition. ASR technology turns an audio recording of speech into machine-readable text. Speech recognition could be used by an insurer to record phone calls with customers and apply NLP. Technology has advanced rapidly in this area.

Source: InsTech London © 2021

"At InsTech London we're enthusiasts for Otter.AI, which demonstrates the power of converting the spoken word into a coherent readable document, stripping out many of the common human conversational foibles."

Image and video

Image recognition technology: images can be analysed to retrieve information about what the image depicts. The technology may be applied in claims processing to assess the loss based on a photograph of a damaged insured asset, such as a car. This is one area where insurance-specific technology organisations such as Tractable and Flyreel are making strong advances.

Video recognition technology: analyses videos to retrieve information on the items and events depicted in the video. Applications in underwriting include assessing the nature and value of objects to be covered. CCTV and dashcam footage may be used in claims. Whilst more complicated to get right than static image recognition, there are clearly additional benefits from video, which can reveal the full story of a claim event and enable a focus on specific frames at key points. Video analysis can also be used to determine the size of properties, converting images into dimensioned drawings, and potentially even rendering construction types and finishes in a digital format.

Source: InsTech London © 2021

"Tools that use and access policyholder information in combination with video collaboration or pictures have become very important to insurers, not just on the underwriting side, but also to process claims."
Mark Anquillare, CCO, Verisk
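As a hedged sketch of the ASR step described above, this is roughly how a recorded claims call could be transcribed with Google Cloud's Speech-to-Text client library, ready for the NLP techniques covered earlier. The file name, audio settings and choice of vendor are illustrative assumptions, not a recommendation.

```python
# pip install google-cloud-speech  (requires Google Cloud credentials)
from google.cloud import speech

client = speech.SpeechClient()

with open("claims_call.wav", "rb") as f:  # hypothetical call recording
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-GB",
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    best = result.alternatives[0]
    # The transcript (and its confidence) can now be passed to NLP,
    # e.g. entity extraction or sentiment analysis.
    print(f"{best.confidence:.2f}  {best.transcript}")
```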
Who needs ingestion?

[Diagram omitted. Source: InsTech London © 2021]

All parties in the insurance value chain handle large amounts of data. Insurance transactions cause data to flow through as many as seven different organisations, from the policyholder to the reinsurer, with lots of brokers in the middle. It's a bumpy journey. Some data gets mixed up; other data gets lost in transit. It's not uncommon for new information to creep in. Lacking a single coherent backbone for the transfer, most organisations need some form of data extraction and ingestion processes. Here is a brief guide to some of the different types of entities responsible for ingesting and sharing data.

Insurers

Personal lines

In a competitive market, insurers of all sizes must improve their speed of response to customers to remain competitive, and accuracy of risk selection is paramount. In personal lines, insurers may benefit from direct relationships with customers. It is now possible to shrink, to almost zero, the frictional cost and other barriers to getting the data personal lines insurers need. Whilst the data problem is not entirely solved in this area, competitive forces will drive the next generation of enhancements. PDFs have no place in personal lines insurance.

The SME market

At the smaller end of the small and medium-sized enterprise (SME) market, sole traders and companies with fewer than 10 employees, the insurance transaction is starting to look more like a personal lines buying experience. The risk characteristics are broadly homogenous within different trades, and buyers can be nudged into buying existing insurance products rather than requiring the bespoke contracts needed for more complex insurance transactions. Insurance is being bought both direct and via brokers.

In the UK, Polaris, and in the US, ACORD, are providing standards for simple commercial policies. The data may still come in a PDF format, but there is some consistency between brokers and some of the underlying formats are recognised standards. Complexity will be squeezed out, and competitive insurance pricing will increase the alignment between the data available from clients and how this is passed to insurers. Likewise, the claims experience is simpler and lends itself to further automation.

On the other hand, it is still hard for speciality market insurers. Many of these insurers are at the end of the game of pass-the-parcel with wholesale brokers and retail brokers in the US and then London. They are largely at the mercy of the data and formats they receive. If they ask for different data, or decline to write business because the formats are not what they want, they don't get shown the business.

Submissions are commonly sent via email and contain insurance applications, exposure schedules, loss runs and various other attachments. In many cases, critical information is within the body of the email itself. This is the area where extraction and ingestion services are most needed. Technology solutions can automate the extraction of data from all these submission documents, including from the email itself.
The emergence of the algorithmic underwriter, whereby data is automatically entered and then underwritten by an insurer, will require, and drive, better data. Canopius, Ki, Beazley and others are launching algorithmic underwriting units which reduce the human costs, and hence can offer more competitive underwriting pricing. This will give them the edge in demanding better data from their brokers and agents.

Mid-market and large commercial

As the value of the insured's assets increases, the information about those assets gets more complicated. Surveyors and risk engineers will often perform surveys, but these are rarely provided in consistent formats. Data is usually buried in lengthy reports shared in hard copy or PDFs. Schedules of locations may be provided in PDFs. Companies such as Archipelago have been founded to solve this problem by extracting the data at source, from the end client, and facilitating the entire data transaction via a single platform.

Brokers

Brokers are often looked at as being able to improve the standards of data and how it is shared. It is true that brokers often sit at the intersection between the client and the insurer, but changing the data that is received and passing it on to insurers is not as straightforward as it seems.

First, for the brokers, manipulating data carries a cost, and may expose them to liabilities if errors are introduced. The formats they are receiving from their clients are very varied, and there are still no agreed standards amongst insurers in how they receive data. Some of the major brokers see strategic benefits in improving the data they are receiving. Furthermore, data extraction can be used to investigate market trends and establish benchmarks to better serve clients. Improving the quality and format of data helps with getting faster, and maybe better, quotes from insurers.

The reinsurance brokers are leading the pack here by some way, particularly in catastrophe property underwriting. The three main brokers, Aon, Guy Carpenter and Willis, have established a key role in supporting their insurance clients to improve their own data quality, create consistent standards and promote the sharing of information. This happened over 20 years as RMS and Verisk (AIR) launched global catastrophe models and, by default, defined two proprietary standards that were adopted by the market.

The increasing move to e-placing platforms (see our recent report on e-trading platforms) and the desire by the brokers to support the new algorithmic insurers will increasingly drive greater adoption of electronic standards. In the short term, though, we see only limited evidence of brokers using ingestion and extraction tools themselves.

Nearly 20 years after early e-trading initiatives proved to be a false dawn for insurance, interest in the platforms space has been firmly reignited. Our 'E-Trading Platforms: Challenges, Opportunities and Imperative' report examines what we can learn from the industry's past attempts, how current platforms align with the needs of speciality (re)insurers, and the role Lloyd's and brokers have to play going forward. Click here for more information.
Extraction and ingestion expertise

Both established companies and start-ups are offering services to support data extraction and ingestion. Some companies are tackling specific lines of business; others focus on key parts of the value chain. Different expertise is required for underwriting and claims. We have categorised the companies offering data ingestion, extraction and data management services into the following sub-groups.

The monster munchers

The tech giants such as Google, Amazon and Microsoft offer generic services such as OCR through their cloud platforms (Google Cloud, AWS and Azure). The sophistication and low cost of these services are hard to match. We are not aware of any insurance-specific companies focusing only on OCR, and most specialist data companies are tapping into these industrial-strength capabilities. Other companies exist in these categories but are out of scope for this report.

The necessary and the needed

Many of the companies undertaking extraction and ingestion are doing it as a requirement to offer more sophisticated analyses or to perform a different core function. They rarely set out to become ingestors and extractors, but have found that this is necessary to get data into their applications and extract value for their clients. Many of the companies we feature have these characteristics. We believe these organisations are amongst those most likely to succeed at creating robust ingestion processes, because ingestion is a critical component of their more complex (and hence higher-fee) tools. If the ingestion doesn't work, then neither will the core products.

When describing their services, organisations such as Archipelago (obtaining data directly from insured corporations) or Cytora (providing triage services for commercial underwriters) tend to focus on the higher-order value benefits they are offering. Ingestion is usually an important, but secondary, application. Artificial is embedding ingestion into its broader analytical platform (see case study).

"Insurers had been documenting claims, including photos of damaged vehicles, for decades. There were millions of examples sitting on servers. We got access to data from our global customer base to train the AI."
Adrien Cohen, Co-founder & President, Tractable
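As a hedged sketch of how a specialist data company might tap the "monster munchers" for OCR, this is roughly what extracting text lines from a scanned submission with AWS Textract looks like. The file name and region are assumptions, and equivalent services exist on Google Cloud and Azure.

```python
# pip install boto3  (requires AWS credentials with Textract access)
import boto3

textract = boto3.client("textract", region_name="eu-west-2")

with open("scanned_submission.png", "rb") as f:  # hypothetical scan
    doc_bytes = f.read()

response = textract.detect_document_text(Document={"Bytes": doc_bytes})

# Textract returns "blocks"; LINE blocks carry the recognised text plus
# a confidence score that downstream validation can act on.
for block in response["Blocks"]:
    if block["BlockType"] == "LINE":
        print(f'{block["Confidence"]:5.1f}  {block["Text"]}')
```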
The insurance insiders

Often of smaller scale, these companies have a clear focus on the business problem, and we can assume they bring expertise in handling large volumes of data accurately and efficiently. As with all early-stage companies, though, there are risks that these organisations have too narrow a focus, or applications that are hard to scale beyond one area into another. Start-ups and scale-ups such as Convr, CLARA Analytics and Groundspeed are creating solutions specifically designed for insurance documents and use cases.

In-house and in control

A core group of companies have developed outsourced analytical and data cleansing functions that have enabled insurers to avoid having to tackle the data ingestion problem themselves. Xceedance, RMS and Verisk all have teams of data analysts, frequently offshore in India or other countries. In-house tools have been developed to help with the data extraction. These tools are not commercially available, but we believe there is a lot that can be learnt from companies such as these that are tackling this problem at scale across multiple different data inputs and use cases.

The standard bearers

There are a small number of companies that are valiantly trying to provide standards across the whole market. The most widely known is ACORD, whose standards are commonly used in the US but still have limited support around the world. In the UK, Polaris provides standards for commercial insurance. Default standards have emerged in catastrophe modelling from RMS and Verisk. The London Insurance Market Operations & Strategic Sourcing organisation (LIMOSS) sources and operates market services for the London insurance market. These companies are slowly winning the standards battle but have not yet achieved enough escape velocity to convince us that they alone will drive widespread adoption of standards. For those looking to understand the full scope of standards available, we recommend the work that QOMPLX is doing with the ReQoncile initiative, which seeks to catalogue existing available standards. InsTech London is represented on the advisory board of ReQoncile.

The health and hygiene inspectors

The companies in this category are not explicitly focusing on ingestion and extraction, but they play a critical role in ensuring that ingested data is fit for purpose. In the same way that you wouldn't drive a car without a functioning dashboard of dials and warning lights, insurers need to be able to validate that the data they are receiving is trusted and accurate. A sense of humility is essential for even the most sophisticated AI companies and their tools: the AI needs to be intelligent enough to admit what it does not know or cannot categorise. Precisely and DQPro license tools that help insurers identify errors in the data received from their business partners, with capabilities to flag issues for manual review.
Words of advice: choosing your ingestion and extraction partner

Much of the advice for identifying a technology company to partner with for your ingestion and extraction needs is true of choosing any technology partner. Whether you are an insurer looking to license a service, a technology company building one, or a consultant making recommendations, these are our suggestions for your checklist.

Confused, or needing clarification? Talk to us about InsTech London corporate membership: hello@instech.london
1. Start with your data organisation strategy

Data extraction and ingestion processes should fit within a coherent data organisation strategy. This includes how data is collected, stored, and made available to those that need it. A focus on the most critical and higher-value business decisions that will be influenced by the data helps define where to start.

2. Is the proposed application integrated?

There is no point solving one problem and creating a new one. New data solutions must either integrate with existing business practices or be part of creating a new and improved process that fits the insurer's goals. Cloud-based and API integrations are prerequisites. Companies with a data organisation strategy will recognise the need for flexible product configurations to accommodate future changes.

3. Who are the vendor's existing partners and customers?

Early adopters and those comfortable with life on the bleeding edge may not require evidence of other partners or customers, but on average 80% of technology buyers are looking for some proof points of existing success with other companies before committing to new technology.

4. Which standards does this new technology work with and support?

Data standards are becoming increasingly open source, or available cheaply via ACORD, Polaris and others. Not every underwriting process or claims assessment uses standards, but many do. Companies building new technology can overlook the business use cases and standards available if they are not familiar with the insurance applications. If you want to benefit from standards that are being used, check that the tools have the capability to work with them.

5. Is there a precedent for your business application and use case?

Specific data extraction and ingestion solutions should be tailored to the input data a company receives and the intended application. Documents submitted by clients require different solutions to those submitted by brokers. Products suitable for companies receiving lots of image and video data may be unsuitable for those whose main challenge is interpreting contracts. Some solutions will be specific to an application in the workflow, such as underwriting, claims processing or policy administration. Sometimes it will be necessary to configure applications to your solutions, but it's better to discover that early.

6. What is the evidence of industry experience, and are there ancillary products?

How well do the people building and supporting the tools understand your needs? As we identified above, ingestion can be bundled with other applications that rely on effective ingestion to work. Is there proof that data can move cleanly along the whole value chain, from original import to ultimate decision-making and recording in a system of record? Companies that have been around for 10 years or more, such as VIPR, are likely to have a breadth of established clients and experience, and, as we have seen recently with Charles Taylor and CoreLogic, it is not only the start-ups that are getting injections of fresh investment to enhance their product offerings.

7. Do you get confidence scoring and escalation?

Technology-enabled data extraction and ingestion solutions may be less error-prone than manual rekeying, but these applications are still subject to error.
Companies looking to use data extraction solutions should consider the implications of different types of errors, including words or numbers wrongly transcribed by character recognition technology and documents being incorrectly classified by natural language processing. Some solution providers, such as Artificial and Eigen Technologies, give confidence scores for each data point. Another method for managing error is using a post-ingestion data monitoring system like DQPro. (A minimal sketch of confidence-based escalation follows at the end of this list.)

8. Can you try before you buy?

When buying from an established company with proven solutions for business needs that match your own, it may not be necessary to run a proof of concept (POC). For newer applications and early-stage companies, avoid committing to longer-term contracts until the technology is proven. A POC trial is a possibility but may still involve costly set-up fees and lengthy internal approvals for insurers. Sometimes it's better all round for the users to license the product but ensure there are provisions for early termination if the solution doesn't deliver as expected.

9. Is there a market solution already?

Insurer marketplaces and organisations are seeing the benefit of creating central data cleansing and ingestion services. Charles Taylor InsureTech provides a data manager available to all insurers in Lloyd's, operated in conjunction with LIMOSS. Some of these are free or lower cost and are worth reviewing alongside a bespoke solution.

10. Ask us

We are continuously monitoring what is happening in this area and are regularly approached by companies that may not have been included in our reports. If you would like to discuss any of the companies in this report, understand more generally what is happening in this area, or share information about your technology or needs, please contact us at hello@instech.london.
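Returning to point 7, here is a minimal sketch of confidence-based escalation: fields above a threshold flow straight through, while the rest join a human review queue. The threshold and field names are illustrative assumptions; real products tune these per field and per use case.

```python
REVIEW_THRESHOLD = 0.85  # assumed cut-off; in practice tuned per field

def route_fields(extracted: dict) -> tuple[dict, dict]:
    """Split extracted fields into auto-accepted and human-review queues."""
    accepted, needs_review = {}, {}
    for field, item in extracted.items():
        bucket = accepted if item["confidence"] >= REVIEW_THRESHOLD else needs_review
        bucket[field] = item["value"]
    return accepted, needs_review

fields = {
    "insured_name": {"value": "Example Co", "confidence": 0.97},
    "policy_limit": {"value": 5_000_000, "confidence": 0.61},  # escalated
}
auto, review = route_fields(fields)
print("auto-accepted:", auto)
print("for human review:", review)
```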
Future state

Ultimately, the problem of data ingestion can only be solved once the industry is willing to reject data sent in formats that are not standardised and not digital. Whilst the applications of AI will get more sophisticated, it is a poor testament to the industry as a whole that an entire sub-category of companies has had to emerge to solve a problem that, for the most part, the industry has created itself. There will always be a need for companies like Tractable or Flyreel to deal with the complex nature of damaged cars or the usual chaos of our homes and possessions. Surely, though, as an industry, we can figure out a way to pass data between consenting parties without messing it up in the process?

What did we miss?

This is the sixth in our ongoing series of reports reviewing the state of technology in the insurance market. Our previous reports have been very well received, but inevitably we cannot and do not cover every company and every technology in the broader areas we review. We will be returning to this theme regularly, so please do tell us what we missed and what you would like to see in future reports at hello@instech.london.
Extraction and ingestion companies: the InsTech London members

The companies listed below are amongst the leaders in data extraction and ingestion in the insurance space. We talk to these companies at least once a month and interviewed them specifically for this report. Examples of InsTech London engagement with these companies can be found at the end of each individual company profile.

• 360Globalnet
• ACORD
• Allphins
• Archipelago
• Artificial
• Charles Taylor InsureTech
• Cytora
• DQPro
• Eigen Technologies
• EY
• LIMOSS
• Polaris
• RMS
• Tractable
• Verisk
• VIPR
• Xceedance
360GLOBALNET
Founded: 2010
Head Office: United Kingdom
Click here for InsTech Member Profile
Data Types: text, tables, video, static image
Workflow: point of underwriting, loss modelling and analytics, compliance, general policy administration, claims

Introduction

360Globalnet has built a cloud-hosted digital insurance claims platform using no-code technology. The initial focus of the company was on acquiring video, imagery and unstructured data technology with the aim of improving, automating and speeding up the claims process, as well as detecting more fraudulent claims. The company has processed over 3.5 million claims for major insurers in the USA, Europe and the Far East. It works in areas including motor, personal injury and business interruption.

Ingestion problem being solved

Its 360Retrieve product ingests all unstructured data, including documents, emails, file notes, text within images and free-text fields from case management, making all of it searchable and capable of interrogation and analysis. 360Globalnet functionality includes document interrogation and retrieval, unstructured data analysis, document classification and semi-structured data extraction. It is also able to undertake entity matching. 360Retrieve is also designed to reduce fraud through the analysis of claims data.

Products: 360Retrieve
Insurance clients: Direct Line, esure, ERS, Mulsanne
Partners: Synectics, Percayso, KPMG, DXC
Non-insurance clients: Thames Valley Police and Lancashire Constabulary

InsTech London discussion

Report: No-Code/Low-Code Platforms - A Bridge from Legacy to Digital? (May 2021)
Report: Location Intelligence 2021 - the Companies to Watch (March 2021)
Live chat event: Data is the new oil: fracking unstructured content (December 2020)
Live chat event: No Code Insurance Platforms. Hype or Game Changing? (September 2020)
Live chat event: Technology for Commercial Property: The Future Has Arrived (May 2020)
ACORD
Founded: 1970
Head Office: United States
Data Types: tables
Workflow: point of underwriting, claims

Introduction

The Association for Cooperative Operations Research and Development (ACORD) is a not-for-profit organisation providing data standards to the international insurance industry. Its goal is to make straight-through processing easier. Its more than 36,000 members include brokers, (re)insurers and technology companies. Prompted by its members, ACORD is now focused on ways to promote standards adoption beyond the US. It has created a technology arm, ACORD Solutions Group (ASG).

Ingestion problem being solved

Inconsistent data standards between organisations slow the ingestion and management of data. ASG provides components to technology vendors to implement in their own products. ASG's partner vendors cover 95% of the London Market. Brokers, insurers and reinsurers can also license ASG technology directly.

Transcriber is a mapping and transformation tool for document extraction. It converts risk, policy and financial account documentation, including schedules, bordereaux and invoices, from PDF or Excel format to ACORD formats for placing, premium accounting and claims transactions. All ACORD forms and revisions are included. It also includes a visual mapping tool to integrate non-ACORD documents.

Converter uses natural language processing to load data from ACORD structured formats and non-ACORD unstructured formats into ACORD-compliant messages.

Conductor provides a data exchange for global and London market messaging.

Transcriber, Converter and Conductor form part of ASG's ADEPT data exchange portal.

Products: ADEPT (Transcriber, Converter, Conductor)
Insurance clients: Aon, Marsh, Zurich, Hannover Re, Munich Re, Swiss Re
Partners: DXC Technology, VIPR, Sapiens, Sequel, Guidewire, Whitespace
ALLPHINS
Founded: 2018
Head Office: France
Click here for InsTech Member Profile
Data Types: tables
Workflow: point of underwriting, loss modelling and analytics, claims

Introduction

Allphins is a data analytics and technology platform, primarily intended for reinsurers. The initial focus has been on allowing (re)insurers to manage energy-related exposures offshore, onshore and in renewables. It now offers services in other classes of risk, including political, credit, terror and cyber.

Ingestion problem being solved

Exposure and risk data can come from disparate sources in a range of formats, including spreadsheets and outputs from several different types of underwriting systems. These risks need to be consolidated in one place so that underwriters can interrogate and analyse them. Allphins has created a machine learning algorithm that allows insurers to digitise incoming submissions and organise the data.

Allphins also provides its own data to complement what is available to insurers. Enriching data in this way is only possible when the specific asset or company being insured is recognised. Allphins uses natural language processing to recognise, for example, the same assets that have been named differently. Input data can be accepted in many different formats in order to identify the risks, exposures and insureds. The Allphins tool enriches the data so that it can be analysed for different aggregation scenarios. Allphins has a database built with open-source data, which allows insurers to access more attributes related to specific locations, assets or companies than are available in the data received from the broker or end clients.

Allphins offers a range of output formats for its clients. Aggregation of the loss potential across multiple risks is complex and must take into account geographical events, supply chain events and sector events, for example. Allphins has invested in building a strong user interface for data visualisation, making the data and tool accessible to underwriters for risk selection and underwriting, including exposure analysis over time or versus premium. This helps underwriters optimise the selection of their policies, weighing premium against the marginal impact on exposure scenarios, to optimise their capacity.

Allphins' platform also helps insurers understand their exposure when a claim occurs. The company is exploring further services relating to claims, including using past claims events to test a company's books against typical claims and predicting risks based on claims. Allphins participated in Cohort 3 of the Lloyd's Lab.

Products: Allphins Energy Platform, Political and Credit Risk Platform, Cyber Platform
Insurance clients: TransRe, MSAmlin, Greenlight Re, Ariel Re, Arch Re, Blenheim
Non-insurance clients: none

InsTech London discussion

Report: Location Intelligence 2021 - the Companies to Watch (March 2021)
ARCHIPELAGO
Founded: 2018
Head Office: California
Click here for InsTech Member Profile
Data Types: tables, text
Workflow: point of underwriting, loss modelling and analytics

Introduction

Archipelago is a risk data platform helping large commercial property owners manage their data, assess their risks and efficiently connect to their insurers. Archipelago's platform digitises risk, enriches data, and connects risk managers, brokers and insurers during renewals.

Ingestion problem being solved

Insurers struggle to source trusted, high-quality information about commercial property to inform their underwriting decisions. The best data about these exposures lies with the customers, the owners and managers of these properties, but it is often buried inside documents. This data has also been hard to share in the traditional formats that generally get passed to brokers and insurers.

Archipelago helps owners extract critical underwriting data from internal systems (structured data) and documents (unstructured data). The company offers a secure, immutable, on-platform link to the insurers, who can then analyse the data with confidence in its provenance. Model-ready files with complete, high-quality data are available to support the underwriting process. By creating a more standardised process for underwriting, risk selection and assessment of large commercial property submissions, insurers benefit from increased efficiency and gain greater confidence in the basis on which underwriting decisions are made.

Insurance clients: C.N.A.
Non-insurance clients: Prologis, JLL, Alexandria

InsTech London discussion

Podcast: InsTech London Podcast interview with Hemant Shah, episode 145 (July 2021)
Report: Location Intelligence 2021 - the Companies to Watch (March 2021)
ARTIFICIAL
Founded: 2013
Head Office: United Kingdom
Click here for InsTech Member Profile
Data Types: tables, text
Workflow: point of underwriting, general policy administration, compliance

Introduction

Artificial provides an insurance platform for data ingestion and augmentation, digital underwriting and policy management. The platform, artificialOS, helps insurers to structure incoming data in a digital format. This information can then be augmented with third-party and internal data sets to enable underwriters to assign a score to individual risks and accept, refer or decline submissions automatically. The company also provides tools for digital distribution that can be built on top of existing legacy platforms or on Artificial's core systems. Artificial partners with market-leading OCR providers. It regularly tests these technologies and has the flexibility to change providers to take advantage of new developments.

Ingestion problem being solved

Artificial's technology enables its clients to extract customisable core data points from submission emails, contracts, schedules of values or bordereaux files. The relevant document can be consumed securely by email or API. Artificial is able to use third-party data sources to complement the extracted data and give a broader picture of risks. Examples include integration with sanctions checking, address-matching services and the client's own internal sources.

The extracted data then flows to the Artificial Appetite Engine, which can be used to make granular automated decisions about the risk under consideration. The customer can decide which factors are important for each decision, as well as the minimum confidence level associated with them. When the given factors have a confidence score below the set threshold, the system will flag that the results need to be reviewed manually. Artificial describes this as "human-in-the-loop".

Insurance clients: Chaucer, Aon, Convex, AXIS
Partners: Capita, AWS

InsTech London discussion

Podcast: David King, Founder of Artificial Labs: Organising the world's data (February 2020)