Data Extraction and Ingestion: The 40 companies to watch in 2021

About InsTech London
InsTech London was founded in 2015 and has grown to become a leading intelligence network that is
shaping the future of insurance and risk management. We connect industry to technology, data, and
analytics providers that are driving and influencing change through innovation. The two executive
partners, Matthew Grant and Robin Merttens, each have over 30 years’ experience of bringing new
technologies into the global insurance market and draw on an extensive network of consultants and
collaborators. InsTech London runs regular events (live and digital), provides market commentary
and insights, and produces a weekly podcast. In addition, we provide advisory services to our members, from
ad-hoc recommendations to in-depth consulting studies. We are supported by (and grateful to) our
more than 130 corporate members, and an extended community of over 20,000 people
who keep us honest and informed about what is happening in insurance, technology and beyond.

Report authors
This report has been led by Matthew Grant, supported by the InsTech London Research and
Insight team – Rebekah Bostan, Henry Gale and Ali Smedley. Our work is further informed by the
more than 200 discussions we have each month with companies in, and supporting, the global
insurance industry.

Disclaimer & copyright
The information in this report is drawn from a variety of sources. This includes our own experience,
interviews, and live discussions at our events with founders, executives, investors, and others active
in this area. Further information has been gathered from public sources such as a company’s own
websites and news items. We have not independently verified all the information in this report and
InsTech London assumes no responsibility for the accuracy and completeness of what is written
here. This report is for information only and not intended to be used as advice or recommendations
beyond general observations of trends and themes. The reproduction of this report, in full or in part,
in any media without the written permission of InsTech London is prohibited.

InsTech London reports
Ingestion and data extraction is one of the major themes we have identified this year that we
believe will be driving change in insurance over the next decade. This is the sixth report to be
released. Our previous reports are available from the following links, free to members.

Insurance: to embed, or not to embed July 2021

No-Code/Low-Code – A Bridge from Legacy to Digital? May 2021

Location Intelligence 2021 – the Companies to Watch March 2021

E-Trading Platforms: Challenges, Opportunities and Imperative January 2021

Parametric Insurance – 2021 outlook and the companies to watch October 2020

To learn more about InsTech London, our forthcoming themes, review
recordings of our live events or to discuss hosting an event with us, you can
find us at www.instech.london and contact us at hello@instech.london.
Introduction
Those of us living at the sharp end of innovation are frequently talking
about the value of data, how this can enrich the customer experience,
make pricing fairer and generally make the world a better place.

Unfortunately, even though the insurance industry has now developed
an appetite for data it’s still often being shared between partners
in a way that hasn’t changed much since computers arrived on our
desks in the early nineties. This is becoming a serious choke point for
the industry - a data trap - keeping costs high, amplifying errors and
throttling innovation.

We believe the problem will get sorted, eventually. For the next few
years spreadsheets and PDF files, often with no standard structure or
format, will still proliferate. So, for this report we’re going below deck
and peering into the clanking engine room of insurance to discover how data is being extracted, ingested,
and organised. We’ve set out to discover whether the promised land of efficient data transfer is within our
grasp, or whether we are going to be floundering at sea for many years to come.

Insurance companies need the data they are receiving from their partners (brokers, clients and other
third-parties) to improve their decision-making. Risk selection, underwriting, claims, policy administration
and increasingly, regulatory reporting are all being improved with better data. If insurers can get it.

Before shipping containers became widespread in the 1960s, cargo was shipped around the world in
boxes and bags. Goods were frequently damaged or stolen in transit, and dockyards around the world
employed hundreds of thousands of people to load and stack the loose goods. Containerisation standardised
how goods are packaged and moved; insurance data is still waiting for its equivalent. Despite the efforts of
organisations such as ACORD, and the technology existing to whizz data around the world safely and
cheaply, insurers and brokers in 2021 are still sharing key business information in the same way they did
in the 1990s.

Faced with the onslaught of emails, PDFs and spreadsheets, organisations from the biggest technology
companies (Amazon, Microsoft and Google) all the way down to start-ups only a few months old are offering
solutions to insurers. This is a problem that shouldn’t really exist. It’s spawning a group of companies we
shouldn’t need but which, for the next few years, we’re going to become increasingly dependent upon.

We’ve discovered new technology that is improving data extraction, ingestion, and organisation. More
processes are being automated, allowing richer data sets to be harnessed and reducing manual processing.
Complex tasks can be performed with artificial intelligence (AI) and machine learning to produce accurate
and comprehensive datasets. Yet, we still have a long way to go.

Once again, we’ve spread our net wide and deep for this report. Wide, to track down the companies that
are offering industrial-strength, and often commoditised, solutions across multiple industries; and deep, to
spend time with those we know well and to whom we are grateful for sharing their experience and solutions.

This report is the sixth in our ongoing series of reports from InsTech London. We’ve aimed to identify the
main companies in this field, and the ones to look out for, but if you feel we’ve missed a company that
should have been in this report please do contact us. Better still, join as a member and let us share your
stories too.

                                                                                            Matthew Grant
                                                                                  Partner, InsTech London

Contents

Why this matters

Key lessons for effective ingestion and extraction

What is data extraction, ingestion, and organisation?

How and why data is being exchanged

Technologies for data extraction and ingestion

Who needs ingestion?

Extraction and ingestion expertise

Words of advice: choosing your ingestion and extraction partner

Future state

Companies listed in this report

Past reports

Why this matters
If any of the following are of interest to you, keep reading:

• Where are the problems with sharing data and why do these exist?

• What solutions are there for converting the unstructured data in emails, PDFs, and spreadsheets
  into a structured digital format?

• Which companies are transforming messy unstructured data into actionable, useful information?

• How reliable are artificial intelligence (AI) and automated data processing? When is human
  intervention required?

• Which types of insurance coverage are the worst when it comes to sharing data, and which are best?

• What industry initiatives exist and are being tested and expanded?

We have attempted to identify where organisations are creating specific capabilities to support
extraction and ingestion analytics for insurers. Using public sources to get reliable, explicit information
on the capabilities and qualities of companies providing technology services is hard, sometimes
impossible. One of the services we offer to our corporate members, and therefore in turn to you as the
reader of our reports, is to invest time in interviewing our members about what they are doing and,
where possible or permitted, identifying real clients and case studies. These member companies are
covered in the profile section at the end of this report, and in most cases you will find more detailed
interviews accessible via the links to our website.

For companies that are not members, but which we believe are offering ingestion and extraction services
to insurers that are worth being aware of, we have provided our own assessment of their capabilities as
best we can. By necessity such entries are shorter and less insightful.

Data Extraction and Ingestion: the InsTech London members

Data Extraction and Ingestion: the watch list

Key lessons for effective ingestion and extraction

At some point in the future the problem of data extraction and ingestion will disappear. Blockchain, or
simpler Distributed Ledger Technology (DLT), has the potential to eliminate the need for transferring data
in spreadsheets or PDFs altogether. Rather than packaging up the data and sending it to the next person in the
chain, it makes more sense to have the data reside in one single place and ensure that whoever is best
positioned to define the data keeps it updated and fresh. DLT will also eliminate the need for
intermediaries, lowering transaction costs. This immutable record can then be made available to allow
eligible third-parties to come in and extract the data they need when they need it, using access controls.
Less pull, more push.

Until then, we all need to make do with what we are being offered. Advances in AI and machine learning
are creating more powerful tools. Various market bodies recognise that the frictional cost of sharing
and accessing data is a major burden for the industry. Collaboration across insurers and their partners is
encouraged to address the problem effectively. The Future at Lloyd’s strategy explicitly specified the
need for collaboration to define and implement better data standards.

          “Specialty insurance data is particularly problematic as
          it typically passes through several intermediaries, lacks
       consistent data standards and is liable to manual entry errors
                 which can lead to a degradation in quality”
                                         Nick Mair, CEO, DQPro

As is often the case in many industries, it is the more complicated problems that sometimes get solved
first. Tractable, for example, is using image recognition on photos sent from mobile phones to assign a
cost to damage in motor claims and is expanding around the world. Parsyl is using sensor data to trigger
payment for vaccines damaged in transit and has its own syndicate in Lloyd’s. Flyreel is using videos taken
on mobile phones to identify hazards in its customers’ homes.

COVID-19 has now proved that insurance in London can survive without having to exchange physical
pieces of paper. We don’t anticipate, and certainly don’t hope for, a similar “big bang” event to dramatically
“fix” this data problem. Instead, we are likely to see a gradual nibbling away at the blockages in the global
data exchange, supported by occasional rapid advances in data standards and formats in single
business lines. The most recent example is the insurance of crypto assets.

          “As buildings are digitised, new data sources can inform
        risk and be shared with the markets for underwriting. Those
        sensors can also be used to reduce risk: predict and prevent.
                     It’s not going to happen overnight.”
                            Hemant Shah, Co-founder & CEO, Archipelago

What is data extraction, ingestion,
and organisation?
Various terms are used for data manipulation. For the purpose of this report, we’re defining the main
activities as follows:

Extraction is retrieving words, numbers, or pictures from disparate sources such as emails, PDFs, and
spreadsheets. This data may be in a structured, standardised format such as ACORD, or it could be in
unstructured or one-off formats.

Ingestion refers to the transformation processes that turn the extracted information into structured data
that can more easily and effectively be used for decision making or record keeping. This information is
usually then stored in one or more systems and databases.

Organisation of data is the act of validating and assigning attributes to the data so that it can be used
immediately or in the future. This is a major area, incorporating data science, data enhancement and AI.
We touch briefly on those areas, but we mainly consider organisation of data where it more explicitly
relates to extraction and ingestion.

   Artificial: Data Ingestion Case Study
   The Issue: A global broker had identified the need to reduce manual effort and digitise processes.
   One of their most manually intensive processes is the rekeying of data from insurance policies
   into existing systems. Artificial proposed a solution for extracting and structuring data points
   from market reform contract (MRC) insurance policies. A key requirement was to take strings or
   sentences of information and transform these into a structured data format required by the target
   system.

   Artificial’s solution: Artificial took a ground-up approach to configuring all the broker’s
   insurance policy processes. An extraction model was developed to first digitise and then structure
   all aspects of the document. The original policy document could then be stored as a digital version
   in the widely used JSON format.

   The work to do this required the following steps:

   • The appropriate data points required by the broking systems were identified, along with their
     relevant structure, format, and validation rules (such as policy limit structures). The digital
     policy document had to be able to cope with up to 16 fields relating to the policy and 8 fields
     relating to each underwriter subscribing to the policy.
   • A confidence score was then derived for each data point with supporting workflow within the
     platform enabling “human-in-the-loop” interaction for items that had insufficient confidence
     scores.
   • The post-processing (structuring) component of the model was configured to meet the
     customer’s specific needs.
   • Data could be shared via an API.

   Pilot results

   • The extraction, structuring, and processing model took between 1 and 5 minutes for each
     insurance slip. Multiple instances of the model can run in parallel, so it scales easily.
   • The pilot exceeded the broker’s expectations in relation to accuracy, and the solution is now being
     considered across a much larger portfolio than initially identified.
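
   For illustration only, the sketch below shows what a structured digital policy record with per-field
   confidence scores might look like, and how fields below a confidence threshold could be routed for
   human-in-the-loop review. The field names, values and threshold are hypothetical and are not taken
   from Artificial’s configuration.

```python
import json

# Hypothetical output of an extraction model: each data point pulled from an
# MRC policy document carries its extracted value and a model confidence score.
extracted_policy = {
    "unique_market_reference": {"value": "B0999EXAMPLE2021", "confidence": 0.97},
    "policy_limit": {"value": {"amount": 5_000_000, "currency": "USD"}, "confidence": 0.91},
    "inception_date": {"value": "2021-07-01", "confidence": 0.88},
    "insured_name": {"value": "Example Shipping Ltd", "confidence": 0.62},
}

CONFIDENCE_THRESHOLD = 0.85  # illustrative cut-off only


def fields_needing_review(policy: dict, threshold: float) -> list:
    """Return the fields whose confidence falls below the threshold, so a
    human-in-the-loop workflow can verify them before the record is pushed
    to the target system."""
    return [name for name, point in policy.items() if point["confidence"] < threshold]


print("Escalated for manual review:", fields_needing_review(extracted_policy, CONFIDENCE_THRESHOLD))
print(json.dumps(extracted_policy, indent=2))
```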

How and why data is being exchanged
Workflow
Data and information are used across the full insurance life cycle. Typically, once data has arrived at an
insurance organisation and been extracted, there should be fewer problems sharing the data internally.

Some areas in the insurance value chain have more standardised processes than others, but we’ve found
examples of ingestion and extraction being widely used (or needed) in the following areas:

  [Bar chart: focus area of surveyed companies, showing the percentage (0-100%) of the 40 companies
  active in each workflow.]

  Workflows                       Description

  Point of underwriting           Underwriting and risk selection

  General policy administration   Policy administration including enrolment and onboarding of new clients

  Compliance                      Compliance checks

  Loss modelling analytics        Appraisal of insured assets

  Claims                          Claim processing

  Notes: Based on analysis of 40 companies
  Source: InsTech London                                                          © 2021 InsTech London

Data types
By far the most common mediums for exchanging data are still spreadsheets and PDF files. Direct
connection between systems and providers using Application Programming Interfaces (APIs) is becoming
more popular, but usually only where the technology has been created in the last decade.

Words and numbers dominate the type of information being shared, but signatures and handwritten
endorsements scribbled onto existing documents also need to be accounted for. Images are also starting
to be used in claims.
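
As a rough illustration of the two exchange routes described above, the sketch below contrasts ingesting
a spreadsheet attachment with pulling the same information from a JSON API. The file name, endpoint and
field names are hypothetical.

```python
# Illustrative only: the file name, URL and response structure are made up.
import pandas as pd
import requests

# Route 1: a schedule of values arrives as a spreadsheet and has to be parsed,
# with column names and formats varying from one sender to the next.
schedule = pd.read_excel("schedule_of_values.xlsx")
print(schedule.head())

# Route 2: the same information exposed over an API arrives already structured,
# so no extraction step is needed.
response = requests.get("https://api.example-broker.com/v1/schedules/12345", timeout=30)
response.raise_for_status()
locations = response.json()["locations"]
print(f"{len(locations)} locations received via API")
```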

   Who’s looking at you now: satellite and aerial imagery
   Satellite and other aerial imagery is increasingly being used for risk assessment and claims.
   We covered this extensively in our Location Intelligence report. The two main ways that data
   is collected are Light Detection and Ranging (LIDAR), which is typically captured by aircraft and
   drones, and Synthetic Aperture Radar (SAR). Satellites using SAR can see through clouds and
   smoke and can be used at night. These are used by companies such as ICEYE and McKenzie
   Intelligence Services. Aerial data is now widely available, and in some cases provided free,
   for example from the European Sentinel-2 satellites, but processing can be expensive and
   resource intensive.

In considering all types of communication across every aspect of insurance, we’ve seen examples shown
in the table below.

  Data sources: emails, email attachments, PDFs, Word documents, spreadsheets, CSV files, audio files,
  image files, fax

  Unstructured data        Type
  Text                     Emails, free text fields, contracts, scanned documents, text within images,
                           text within videos
  Numerical                Tables, free form text, other
  Audio and visual         Static images, phone calls, recorded videos, live streams

  Semi-structured data     Type
  Sensor data              Satellite imagery, aerial imagery, sensors

  Source: InsTech London                                                          © 2021 InsTech London

  The importance of near real-time flood data

  Major flooding has been seen across Europe over the summer of 2021, as well as in other parts of
  the world. In some areas, the insured loss is likely to reach record levels. Managing and reducing the
  flood risk going forward requires not only actions by governments but also improved data, tools, and
  technology. Sensors are playing an important role in speeding up the claims process.

  For near real-time data, ICEYE provides flood depth data at the building level within 24 hours of
  a flood’s peak via its satellite constellation. McKenzie Intelligence Services uses machine-learning
  algorithms to process satellite imagery and other data sources such as drones and on-the-ground
  “human intelligence”. The company can provide data at one-metre resolution anywhere in
  the world.

  We have launched a dedicated flood insurance newsletter, Flood Focus, featuring news, interviews,
  and insights from companies in the space. If you would like to keep up to date with the latest
  developments in flood modelling and technology, please click here to sign up.

Looking ahead to sensor data
There is massive potential for integrating sensor data into the underwriting risk assessment
process, ongoing monitoring, supporting clients with risk management and for claims. Many
buildings and most complex machinery have sensors. Our phones have sensors.

Yet although the data exists, most insurers are still struggling to find credible use cases
for the data coming from the Internet of Things (IOT) at scale. Chubb is one of the most
advanced insurers in using IOT data to support its clients. Sean Ringsted, Chief Digital
Officer of Chubb, gave examples of how sensors can be used to tackle specific tasks when
we spoke to him recently.

Organisations such as Shepherd are helping organise and make sense of the incoming property
performance data, converting it into actionable insights for insurers.

We are releasing a future report on the use of sensors for large commercial assets and how
this impacts insurance and risk management. Here is a taster of specific examples we will be
discussing:

• Smoke detectors to provide alerts for fire are being offered by insurers to their clients to
  help reduce risks but so far, the data itself is not being widely used for underwriting.

• Water shut off valves are being installed in new buildings, although we are not yet aware of
  direct uses for underwriting. Escape of water represents 50% of water damage in the US.

• Temperature variation monitoring is being used to identify if food or pharmaceuticals are
  spoilt during transit, as we discussed with Ben Hubbard, founder of Parsyl.

• Building resonance frequency can be measured to determine if a building is at risk
  of collapse following an earthquake, an area that Safehub is providing sensors for, as
  explained by co-founder Andy Thompson.

• Biometric data is already available from wearables such as watches and is linked to
  insurance premium reductions. Companies such as Vitality, whom we interviewed, are
  amongst the leaders in this field.

• Earthquake shaking as provided by the United States Geological Survey (USGS) is a
  key part of parametric insurance. Kate Stillwell explained how the data is converted to
  colour-coded maps and why this forms the basis for the coverage provided by her company
  Jumpstart Recovery.

• Windspeed data is another essential source of information used for parametric hurricane
  triggers around the world. New Paradigm Underwriters are capturing this data, but as
  Evan Glassman reveals in our discussion, conventional windspeed recorders often fail
  during hurricanes.

Technologies for data extraction and ingestion
All of the technology used for the primary data extraction from the various PDF files, spreadsheets and
other mediums has been developed outside of insurance. It has wide applications across many other
industries. We have only covered the companies in this report where we have come across
insurance-specific applications.

Acronyms and Definitions
Data ingestion and extraction is described with a wide variety of terminology and acronyms. The following
definitions are not comprehensive, but should guide you through the main processes and technologies.

  Text
  Acronym                      Terminology

  RPA                          Robotic process automation
  A generic term for automating high-volume repetitive tasks to replace or significantly reduce the need for
  human input. Introducing RPA has been at the heart of many insurers’ drive to create efficiencies and replace
  people with machines.
  Sophisticated RPA can extract data based on keywords and rules, verify claims, and integrate different data
  sources. RPA can also be applied to policy administration and data collection for underwriting. It’s a big topic,
  but we’re only interested in the data ingestion element for this report.

  IPA                          Intelligent process automation
  Unlike RPA, IPA uses AI to automate tasks. When documents contain unstructured data, RPA’s keyword or
  rule-based automation can be prone to error. IPA can understand context and meaning to automate tasks
  such as claims processing more accurately. Also known as cognitive process automation (CPA).
  The design of IPA requires a deep understanding of how the insurance workflow operates and, for many
  insurance markets, an appreciation of the specific language and nuances of how insurance organisations
  communicate with each other. Intelligent document processing (IDP) is a type of IPA which uses technologies
  such as optical character recognition (OCR) and NLP to convert documents into structured data.

  NLP                          Natural language processing
  NLP is used to read raw digital data and seeks to understand it in a way a human would. A simple form of NLP
  is “syntax” which aims to verify and improve the grammar of the written statement. “Semantics” goes further
  and attempts to extract context from free form text and convert this into a structure, classification and order
  that can then be analysed.
  One common challenge for data extraction in insurance is where the policy conditions (deductibles, limits,
  exclusions), may be explained in writing, rather than following a pre-determined and accepted format. This
  is particularly true in speciality or other forms of insurance where there are complex and unique coverages
  sometimes covering multiple pages of a contract.
  One type of NLP is sentiment analysis. This quantifies how positive or negative the language in an extract is.
  This could be used to understand how customers feel about their insurer. Another is named entity recognition
  (NER), which detects key entities and data within text including names, locations, times, and dates. Natural
  language understanding (NLU) or natural language interpretation (NLI) are extensions of NLP focusing on the
  context and intent.
  NLP applications enable the technology to be trained to extract relevant information for underwriting. More
  sophisticated NLP can learn as it goes, teaching itself. NLP is hard to get right and is potentially error-prone.
  Every application using NLP should be able to assign a confidence level to what has been extracted.

  ITI                          Intelligent text ingestion
  A generic term used to describe the other processes identified here which convert unstructured text data into
  structured data, including OCR, Intelligent character recognition (ICR), Intelligent word recognition (IWR) and
  NLP.

 Source: InsTech London                                                                    © 2021 InsTech London
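
To make the named entity recognition idea above concrete, here is a minimal sketch using the open-source
spaCy library on a made-up submission sentence. The model and text are illustrative; a production pipeline
would use a model trained on insurance documents and map the generic entity labels onto underwriting fields.

```python
# Illustrative NER sketch using spaCy's small general-purpose English model.
import spacy

nlp = spacy.load("en_core_web_sm")

submission = (
    "Please quote for Example Shipping Ltd, 12 Fenchurch Street, London. "
    "Policy to incept on 1 January 2022 with a limit of USD 5,000,000."
)

doc = nlp(submission)
for ent in doc.ents:
    # ent.label_ is a generic tag (ORG, GPE, DATE, MONEY, ...); mapping these
    # onto underwriting fields is where the insurance-specific work lies.
    print(f"{ent.text!r:40} -> {ent.label_}")
```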

Audio
 Acronym                     Terminology

 ASR                         Automatic speech recognition

 ASR technology turns an audio recording of speech into machine-readable text. Speech recognition could be
 used by an insurer to record phone calls with customers and apply NLP. Technology has advanced rapidly in
 this area.

Source: InsTech London                                                                  © 2021 InsTech London
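
As an illustration of ASR, the sketch below transcribes a recorded call with the open-source Whisper model;
the audio file name is hypothetical, and the resulting transcript would typically be passed on to an NLP step
such as the one sketched above.

```python
# Illustrative ASR sketch using the open-source openai-whisper package.
import whisper

model = whisper.load_model("base")            # small general-purpose model
result = model.transcribe("claims_call.wav")  # returns a dict with the transcript

# The raw transcript can now be fed into an NLP pipeline for entity extraction.
print(result["text"])
```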

          “At InsTech London we’re enthusiasts for Otter.AI which
          demonstrates the power of converting spoken word into
          a coherent readable document, stripping out many of the
                  common human conversational foibles.”

 Image and video
 Terminology

 Image recognition technology

 Images can be analysed to retrieve information about what the image depicts. The technology may be applied in
 claims processing to assess the loss based on a photograph of a damaged insured asset, such as a car. This is
 one area where insurance specific technology organisations such as Tractable and Flyreel are making strong
 advances.

 Video recognition technology

 Analyses videos to retrieve information on the items and events depicted in the video. Applications in
 underwriting include assessing the nature and value of objects to be covered. CCTV and dashcam footage
 may be used in claims. Whilst more complicated to get right than static image recognition, there are clearly
 additional benefits from video that can reveal the full story of a claim event and enable a focus on specific
 frames at key points. Video analysis can also be used to determine the size of properties, converting images
 into dimensioned drawings, potentially even rendering construction types and finishes in a digital format.

Source: InsTech London                                                                  © 2021 InsTech London
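
For a flavour of what general-purpose image recognition looks like in code, the sketch below runs a
pretrained torchvision classifier over a claims photo. The file name is hypothetical, and ImageNet labels
are far too generic for real claims work; companies such as Tractable train models on labelled damage
photos instead.

```python
# Illustrative image classification sketch with a pretrained torchvision model.
import torch
from PIL import Image
from torchvision import models, transforms

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("claim_photo.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)

with torch.no_grad():
    probabilities = torch.softmax(model(batch), dim=1)[0]

top_prob, top_class = probabilities.max(dim=0)
print(f"Most likely ImageNet class index: {top_class.item()} (p={top_prob.item():.2f})")
```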

            “Tools that use and access policyholder information in
            combination with video collaboration or pictures have
              become very important to insurers, not just on the
                underwriting side, but also to process claims.”
                                    Mark Anquillare, CCO, Verisk

Who needs ingestion?


All parties in the insurance value chain handle large amounts of data. Insurance transactions cause data
to flow through as many as seven different organisations, from the policyholder to the reinsurer, with lots
of brokers in the middle. It’s a bumpy journey. Some data gets mixed up; other data gets lost in transit. It’s
not uncommon for new information to creep in. Lacking a single coherent backbone for the transfer, most
organisations need some form of data extraction and ingestion process.

Here is a brief guide to some of the different types of entities responsible for ingesting and sharing data.

Insurers

Personal lines

In a competitive market, insurers of all sizes must improve their speed of response to customers to
remain competitive, and accuracy of risk selection is paramount. In personal lines, insurers may benefit
from relationships directly with customers. It is now possible to shrink, to almost zero, the frictional
cost and other barriers to getting the data personal lines insurers need. Whilst the data problem is not
entirely solved in this area, competitive forces will drive the next generation of enhancements. PDFs
have no place in personal lines insurance.

The SME

At the smaller end of the small and medium sized enterprise (SME) market, sole traders and companies
with fewer than 10 employees, the insurance transaction is starting to look more like a personal lines
buying experience. The risk characteristics are broadly homogenous within different trades, and buyers
can be nudged into buying existing insurance products rather than the bespoke contracts required for
more complex insurance transactions. Insurance is being bought direct and via brokers.

In the UK Polaris and in the US ACORD are providing standards for simple commercial policies. The data
may still come in a PDF format, but there is some consistency between brokers and some of the underlying
formats are recognised standards. Complexity will be squeezed out and competitive insurance pricing will
increase the alignment between the data available from clients and how this is being passed to insurers.
Likewise, the claims experience is simpler and lends itself to further automation.

On the other hand, it is still hard for speciality market insurers. Many of these insurers are at the end of
the game of pass the parcel with wholesale brokers and retail brokers in the US and then London. They are
largely at the mercy of the data and formats they receive. If they ask for different data or decline to write
business because the formats are not what they want, they don’t get shown the business.

Submissions are commonly sent via email and contain insurance applications, exposure schedules, loss
runs and various other attachments. In many cases, critical information is within the body of the email
itself. This is an area where extraction and ingestion services are most needed. Technology solutions can
automate the extraction of data from all these submission documents, including from the email itself.
The emergence of the algorithmic underwriter, whereby data is automatically entered and then
underwritten by an insurer, will require, and drive, better data. Canopius, Ki, Beazley and others are
launching algorithmic underwriting units which reduce the human costs, and hence can offer more
competitive underwriting pricing. This will give them the edge in demanding better data from their
brokers and agents.

Mid-market and large commercial

With the increasing value and size of the insured’s assets, the information about these assets gets
more complicated. Surveyors and risk engineers will often perform surveys, but these are rarely
provided in consistent formats. Data is usually buried in lengthy reports shared in hard copy or
PDFs. Schedules of locations may be provided in PDFs. Companies such as Archipelago have been
founded to solve this problem by extracting the data at source, from the end client, and facilitating
the entire data transaction via a single platform.

Brokers

The brokers are often looked at as being able to improve the standards of data and how it is shared.
It is true that brokers often sit at the intersection between the client and the insurer, but changing
the data that is received and passing it on to insurers is not as straightforward as it seems.

First, for the brokers, manipulating data carries a cost, and may expose them to liabilities if errors
are introduced. The formats they are receiving from their clients are very varied, and there are
still no agreed standards amongst insurers in how they receive data. Some of the major brokers see
strategic benefits in improving the data they are receiving. Furthermore, data extraction can be
used to investigate market trends and establish benchmarks to better serve clients. Improving the
quality and format of data helps with getting faster, and maybe better, quotes from insurers.

The reinsurance brokers are leading the pack here by some way, particularly in catastrophe property
underwriting. The three main brokers, Aon, Guy Carpenter and Willis, have established a key role in
supporting their insurance clients to improve their own data quality, create consistent standards and
promote the sharing of information. This happened over 20 years as RMS and Verisk (AIR) launched
global catastrophe models and, by default, defined two proprietary standards that were adopted by
the market.

The increasing move to e-placing platforms (see our recent report on e-trading platforms) and the
desire by the brokers to support the new algorithmic insurers will increasingly drive greater adoption
of electronic standards. In the short term though, we see only limited evidence of brokers using
ingestion and extraction tools themselves.

                                                         Nearly 20 years after early e-trading initiatives
                                                         proved to be a false dawn for insurance,
                                                         interest in the platforms space has been firmly
                                                         reignited. Our ‘E-Trading Platforms: Challenges,
                                                         Opportunities and Imperative’ report examines
                                                         what we can learn from the industry’s past
                                                         attempts, how current platforms align with the
                                                         needs of speciality (re)insurers, and the role
                                                         Lloyd’s and brokers have to play going forward.
            Click here for more information

Extraction and ingestion expertise
Both established companies and start-ups are offering services to support data extraction and ingestion.
Some companies are tackling specific lines of business, others focusing on key parts of the value chain.
Different expertise is required for underwriting and claims.

We have categorised the companies offering data ingestion, extraction and data management services
into sub-groups as follows.

  The monster munchers

  The tech giants such as Google, Amazon and Microsoft offer generic services such as OCR through
  their cloud platforms (Google Cloud, AWS, and Azure). The sophistication and low cost of these
  services are hard to match.

  Company examples: We are not aware of any insurance-specific companies focusing only on OCR,
  and most specialist data companies are tapping into these industrial-strength capabilities. Other
  companies exist in these categories but are out of scope for this report.
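
As a rough illustration of tapping into these industrial-strength services, the sketch below sends a scanned
document to AWS Textract and collects the recognised lines of text. The file name is hypothetical, it requires
AWS credentials to run, and Google Cloud Vision or Azure's equivalents follow a similar pattern.

```python
# Illustrative cloud OCR sketch using AWS Textract via boto3.
import boto3

textract = boto3.client("textract", region_name="eu-west-1")

with open("scanned_policy_schedule.png", "rb") as f:
    document_bytes = f.read()

response = textract.detect_document_text(Document={"Bytes": document_bytes})

# Textract returns a list of "blocks"; LINE blocks carry the recognised text.
lines = [block["Text"] for block in response["Blocks"] if block["BlockType"] == "LINE"]
print("\n".join(lines))
```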

  The necessary and the needed

  Many of the companies undertaking extraction and ingestion are doing it as a requirement to offer
  more sophisticated analyses or perform a different core function. They rarely set out to become
  ingestors and extractors but have found that this is necessary to get the data into their applications
  and extract value for their clients. Many of the companies we feature have these characteristics.
  We believe these organisations are amongst those most likely to be successful at creating
  robust ingestion processes, because ingestion is a critical component of their more complex (and
  hence higher-fee) tools. If the ingestion doesn’t work, then neither will the core products.

  Company examples: When describing their services, organisations such as Archipelago (obtaining
  data directly from insured corporations) or Cytora (providing triage services for commercial
  underwriters) tend to focus on the higher-order value benefits they are offering. Ingestion is usually
  an important, but secondary, application. Artificial is embedding ingestion into its broader analytical
  platform (see case study).

               “Insurers had been documenting claims, including
             photos of damaged vehicles, for decades. There were
             millions of examples sitting on servers. We got access
            to data from our global customer base to train the AI.”
                      Adrien Cohen, Co-founder & President, Tractable

The insurance insiders
Often of smaller scale, these companies have a clear focus on the business problem and, we can assume,
bring expertise to handling large volumes of data accurately and efficiently. As with all early-stage companies
though, there are risks that these organisations have too narrow a focus, or applications that are hard to scale
beyond one area into another.

Company examples: Start-ups and scale-ups such as Convr, CLARA Analytics and Groundspeed are creating
solutions specifically designed for insurance documents and use cases.

In-house and in control
A core group of companies have developed outsourced analytical and data cleansing functions that have
enabled insurers to avoid having to tackle the data ingestion problem themselves.

Company examples: Xceedance, RMS and Verisk all have teams of data analysts, frequently offshore in
India or other countries. In-house tools have been developed to help with the data extraction. These tools
are not commercially available, but we believe there is a lot that can be learnt from companies such as
these that are tackling this problem at scale across multiple different data inputs and use cases.

The standard bearers
There are a small number of companies that are valiantly trying to provide standards across the whole market.

Company examples: The most widely known is ACORD, which has standards commonly used in the US but
still with limited support around the world. In the UK, Polaris provides standards for commercial insurance.
Default standards have emerged in catastrophe modelling from RMS and Verisk. The London Insurance
Market Operations & Strategic Sourcing organisation (LIMOSS) sources and operates market services for
the London insurance market. These companies are slowly winning the standards battle but still have not
achieved enough escape velocity to convince us that they alone will drive widespread adoption of standards.
For those looking to understand the full scope of standards available, we recommend the work that QOMPLX
are doing with the ReQoncile initiative, which seeks to catalogue existing available standards. InsTech London
is represented on the advisory board of ReQoncile.

The health and hygiene inspectors
The companies in this category are not explicitly focussing on ingestion and extraction, but they play a critical
role in ensuring that ingested data is fit for purpose. In the same way that you wouldn’t drive a car without a
functioning dashboard of dials and warning lights, insurers need to be able to validate that the data they are
receiving is trusted and accurate.
A sense of humility is essential even for the most sophisticated AI companies: their tools should be
intelligent enough to admit to what they do not know or cannot categorise.

Company examples: Precisely and DQPro license tools to help insurers identify errors in the data received
from their business partners, with capabilities to flag the issues for manual review.

Words of advice:
Choosing your ingestion and extraction partner
Much of the advice related to identifying a technology company to partner with when considering your
ingestion and extraction needs is true of any technology partner. Whether you are an insurer looking to
license a service, a technology company building one or a consultant making recommendations, these
are our suggestions for your checklist.

                                                                              Confused or needing
                                                                                 clarification?

                                                                         Talk to us about InsTech London
                                                                              corporate membership
                                                                               hello@instech.london

1. Start with your data organisation strategy
Data extraction and ingestion processes should fit within a coherent data organisation strategy. This includes how
data is collected, stored, and made available to those that need it. A focus on the most critical and higher value
business decisions that will be influenced by the data helps define where to start.

2. Is the proposed application integrated?
There is no point solving one problem and creating a new one. New data solutions must either integrate with existing
business practices or be part of creating a new and improved process that fits the insurers’ goals. Cloud-based and
API integrations are prerequisites. Companies with a data organisation strategy will recognise the need for flexible
product configurations to accommodate future changes.

3. Who are the vendor’s existing partners and customers?
Early adopters and those comfortable with life on the bleeding edge may not require evidence of other partners or
customers, but on average 80% of technology buyers are looking for some proof points of existing success with
other companies before committing to new technology.

4. Which standards does this new technology work with and support?
Data standards are becoming increasingly open source, or available cheaply via ACORD, Polaris, and others. Not
every underwriting process or claims assessment uses standards, but many do. Companies building new technology
can overlook the business use cases and standards available if they are not familiar with the insurance applications.
If you want to benefit from standards that are being used, check that the tools have the capability to work with these.

5. Is there a precedent for your business application and use case?
Specific data extraction and ingestion solutions should be tailored to the input data a company receives and the
intended application. Documents submitted by clients require different solutions to those submitted by brokers.
Products suitable for companies receiving lots of image and video data may be unsuitable for those whose main
challenge is interpreting contracts. Some solutions will be specific to an application in the workflow, such as
underwriting, claims processing or policy administration. Sometimes it will be necessary to configure applications
to your solutions, but it’s better to discover that early.

6. What is the evidence of industry experience and are there ancillary products?
How well do the people building and supporting the tools understand your needs? As we identified above, ingestion
can be bundled in with other applications that rely on effective ingestion to work. Is there proof that data can move
cleanly along the whole value chain from original import to ultimate decision making and recording in a system
of record? Companies that have been around for 10 years or more, such as VIPR, are likely to have a breadth of
established clients and experience and, as we have seen recently with Charles Taylor and CoreLogic, it is not only
the start-ups that are getting injections of fresh investments to enhance their product offerings.

7. Do you get confidence scoring and escalation?
Technology-enabled data extraction and ingestion solutions may be less error-prone than manual rekeying, but
these applications are still subject to error. Companies looking to use data extraction solutions should consider the
implications of different types of errors, including words or numbers wrongly transcribed by character recognition
technology and documents being incorrectly classified by natural language processing. Some solution providers,
such as Artificial and Eigen Technologies, give confidence scores for each data point. Another method for managing
error is using a post-ingestion data monitoring system like DQPro.
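
For illustration, a post-ingestion check in the spirit of the monitoring tools mentioned above might look like
the sketch below; the record fields and rules are hypothetical and are not taken from any vendor's product.

```python
# Illustrative post-ingestion data-quality rules for an extracted policy record.
from datetime import date

record = {
    "policy_ref": "XY2021000123",
    "inception_date": date(2021, 7, 1),
    "expiry_date": date(2021, 6, 30),   # deliberately inconsistent
    "policy_limit": None,               # missing value
}


def validate(rec: dict) -> list:
    """Return human-readable issues that should be flagged for manual review."""
    issues = []
    if rec.get("policy_limit") is None:
        issues.append("policy_limit is missing")
    if rec["expiry_date"] <= rec["inception_date"]:
        issues.append("expiry_date is not after inception_date")
    return issues


print(validate(record))
```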

8. Can you try before you buy?
When buying from an established company with proven solutions for business needs that match your own, then
it may not be necessary to run a proof of concept (POC). For newer applications and early-stage companies, avoid
committing to longer term contracts until the technology is proven. A POC trial is a possibility but may still involve
costly set-up fees and lengthy internal approvals for insurers. Sometimes it’s better all round for the users to license
the product but ensure there are provisions for early termination if the solution doesn’t deliver as expected.

9. Is there a market solution already?
Insurer marketplaces and organisations are seeing the benefit of creating central data cleansing and ingestion
services. Charles Taylor InsureTech provides a data manager available to all insurers in Lloyd’s, operated in
conjunction with LIMOSS. Some of these are free or lower cost and are worth reviewing alongside a bespoke solution.

10. Ask us
We are continuously monitoring what is happening in this area and are regularly being approached by companies
that may not have been included in our reports. If you would like to discuss any of the companies in this report, more
generally understand what is happening in this area, or are interested in sharing information about your technology
or needs, please contact us at hello@instech.london.

Future state
Ultimately the problem of data ingestion can only be solved once the industry is willing to reject data
being sent in formats that are not standardised and not digital. Whilst the applications of AI will get more
sophisticated, it is a poor testament to the industry as a whole that an entire sub-category of companies
has had to emerge to solve a problem that, for the most part, the industry has created itself. There will
always be a need for companies like Tractable or Flyreel to deal with the complex nature of damaged cars
or the usual chaos of our homes and possessions. Surely though, as an industry, we can figure out a way
to pass data between consenting parties without messing it up in the process?

   What did we miss?
   This is the sixth in our ongoing series of reports reviewing the state of technology in the insurance
   market. Our previous reports have been very well received, but inevitably we cannot and do not
   cover every company and every technology in the broader areas we review. We will be returning to
   this theme regularly, so please do tell us what we missed and what you would like to see in future
   reports at hello@instech.london.

Extraction and ingestion companies – the InsTech London members

The companies listed below are amongst the leaders in data extraction and ingestion in the insurance
space. We are talking to these companies at least once a month and interviewed them specifically for this
report. Examples of InsTech London engagement with these companies can be found at the end of each
individual company profile.

• 360Globalnet

• ACORD

• Allphins

• Archipelago

• Artificial

• Charles Taylor InsureTech

• Cytora

• DQPro

• Eigen Technologies

• EY

• LIMOSS

• Polaris

• RMS

• Tractable

• Verisk

• VIPR

• Xceedance

360GLOBALNET

  Founded: 2010 | Head Office: United Kingdom | Click here for InsTech Member Profile
  Data Types: Text, Tables, Video, Static image
  Workflow: Point of underwriting, Loss modelling and analytics, Compliance, General policy administration, Claims

Introduction
360Globalnet has built a cloud-hosted digital insurance claims platform using no-code technology.
The company's initial focus was on acquiring video, imagery and unstructured data technology
with the aim of improving, automating and speeding up the claims process, as well as detecting more
fraudulent claims.

The company has processed over 3.5 million claims for major insurers in the USA, Europe, and the Far
East. It works in areas including motor, personal injury and business interruption.

Ingestion problem being solved
Its 360Retrieve product ingests all unstructured data, including documents, e-mails, file notes, text
within images and free-text fields from case management systems, making all of it searchable and
available for interrogation and analysis.

360Globalnet functionality includes document interrogation and retrieval, unstructured data analysis,
document classification and semi-structured data extraction. It is also able to undertake entity matching.

360Retrieve is also designed to reduce fraud through the analysis of claims data.
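
To illustrate the general pattern behind making free-text claims material searchable, the sketch below
builds a simple inverted index in Python. It is a minimal illustration of the concept only; the function
names and sample data are invented and this is not how 360Retrieve itself is implemented.

    # Minimal sketch of an inverted index over free-text claim notes.
    # Illustrative only: the sample data and function are invented and
    # do not reflect how 360Retrieve is actually built.
    from collections import defaultdict

    def build_index(documents):
        """Map each normalised word to the set of document ids containing it."""
        index = defaultdict(set)
        for doc_id, text in documents.items():
            for word in text.lower().split():
                index[word.strip(".,;:()")].add(doc_id)
        return index

    claim_notes = {
        "CLM-001": "Rear bumper damaged in car park collision",
        "CLM-002": "Water leak under kitchen sink, flooring damaged",
    }
    index = build_index(claim_notes)
    print(sorted(index["damaged"]))   # ['CLM-001', 'CLM-002']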

  Products: 360Retrieve
  Insurance clients: Direct Line, esure, ERS, Mulsanne
  Partners: Synectics, Percayso, KPMG, DXC
  Non-insurance clients: Thames Valley Police and Lancashire Constabulary

                                       InsTech London discussion

  Report: No-Code/Low-Code Platforms - A Bridge from Legacy to Digital? – May 2021

  Report: Location Intelligence 2021 - the Companies to Watch – March 2021

  Live chat event: Data is the new oil: fracking unstructured content – December 2020

  Live chat event: No Code Insurance Platforms. Hype or Game Changing? – September 2020

  Live chat event: Technology for Commercial Property: The Future Has Arrived – May 2020

ACORD

  Founded: 1970 | Head Office: United States
  Data Types: Tables
  Workflow: Point of underwriting, Claims

Introduction
The Association for Cooperative Operations Research and Development (ACORD) is a not-for-profit
organisation providing data standards to the international insurance industry. Its goal is to make straight-
through processing easier. Its more than 36,000 members include brokers, (re)insurers and technology
companies. Prompted by its members, ACORD is now focused on ways to promote standards adoption
beyond the US. It has created a technology arm, ACORD Solutions Group (ASG).

Ingestion problem being solved
Inconsistent data standards between organisations slow the ingestion and management of data. ASG
provides components to technology vendors to implement in their own products. ASG’s partner vendors
cover 95% of the London Market. Brokers, insurers and reinsurers can also license ASG technology
directly.

Transcriber is a mapping and transformation tool for document extraction. It converts risk, policy and
financial account documentation, including schedules, bordereaux and invoices, from PDF or Excel
format to ACORD formats for placing, premium accounting and claims transactions. All ACORD forms
and revisions are included. It also includes a visual mapping tool to integrate non-ACORD documents.

Converter uses natural language processing to load data from ACORD structured formats and non-
ACORD unstructured formats into ACORD-compliant messages. Conductor provides a data exchange
for global and London market messaging. The components Transcriber, Converter and Conductor form
part of ASG’s ADEPT data exchange portal.
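
The core mapping-and-transformation idea behind a tool of this kind can be sketched in a few lines of
Python. The source headers, the mapping and the target field names below are invented for illustration;
they are not the real ACORD schema or the Transcriber API.

    # Simplified sketch of mapping a row from a broker spreadsheet onto a
    # single agreed set of field names. The headers, mapping and target
    # fields are invented for illustration and are not the ACORD schema.
    SOURCE_TO_STANDARD = {
        "Insured": "insured_name",
        "Inception": "policy_inception_date",
        "Gross Premium": "gross_premium_amount",
    }

    def to_standard_message(row):
        """Re-key a source row so downstream systems see consistent field names."""
        return {
            standard: row[source]
            for source, standard in SOURCE_TO_STANDARD.items()
            if source in row
        }

    row = {"Insured": "Acme Logistics Ltd", "Inception": "2021-07-01", "Gross Premium": "150000"}
    print(to_standard_message(row))
    # {'insured_name': 'Acme Logistics Ltd', 'policy_inception_date': '2021-07-01',
    #  'gross_premium_amount': '150000'}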

  Products: ADEPT (Transcriber, Converter, Conductor)
  Insurance clients: Aon, Marsh, Zurich, Hannover Re, Munich Re, Swiss Re
  Partners: DXC Technology, VIPR, Sapiens, Sequel, Guidewire, Whitespace

                                        InsTech London discussion

ALLPHINS

  Founded: 2018 | Head Office: France | Click here for InsTech Member Profile
  Data Types: Tables
  Workflow: Point of underwriting, Loss modelling and analytics, Claims

Introduction
Allphins is a data analytics and technology platform, primarily intended for reinsurers. The initial focus
has been on allowing (re)insurers to manage energy-related exposures across offshore, onshore and
renewables. It is now offering services in other classes of risk including political, credit, terror and cyber.

Ingestion problem being solved
Exposure and risk data can come from disparate sources in a range of formats. These can include
spreadsheets and outputs from several different types of underwriting systems. These risks need to be
consolidated in one place in order for them to be interrogated and analysed by underwriters.

Allphins has created a machine learning algorithm that allows insurers to digitise incoming submissions
and organise the data. Allphins also provides its own data to complement what is available to insurers.
Enriching data in this way is only possible when the specific asset or company being insured is
recognised. Allphins uses natural language processing to recognise, for example, the same asset when
it has been named differently. Input data can be accepted in many different formats in order to identify
the risks, exposures and insureds. The Allphins tool enriches the data so that it can be analysed for
different aggregation scenarios. Allphins has a database built with open-source data, which gives
insurers access to more attributes related to specific locations, assets or companies than are available
in the data received from brokers or end clients. Allphins offers a range of output formats for its clients.
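
A very rough illustration of the entity-matching idea, recognising that two differently written names
refer to the same asset, is sketched below. The normalisation rules, similarity measure and threshold are
assumptions made for the example and are not Allphins' actual algorithm.

    # Rough sketch of entity matching: deciding whether two differently
    # written names refer to the same asset. Normalisation, similarity
    # measure and threshold are illustrative assumptions only.
    from difflib import SequenceMatcher

    def normalise(name):
        # Lower-case and drop punctuation before comparing.
        cleaned = "".join(c for c in name.lower() if c.isalnum() or c.isspace())
        return " ".join(cleaned.split())

    def same_asset(name_a, name_b, threshold=0.85):
        """Treat two names as the same asset if their normalised forms are similar enough."""
        ratio = SequenceMatcher(None, normalise(name_a), normalise(name_b)).ratio()
        return ratio >= threshold

    print(same_asset("Platform Alpha-1 (North Sea)", "platform alpha 1, north sea"))  # True
    print(same_asset("Platform Alpha-1", "Refinery Beta"))                            # False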

Aggregation of the loss potential across multiple risks is complex and must take into account
geographical events, supply chain events and sector events, for example. Allphins has invested
in building a strong user interface for data visualisation and to make the data and tool accessible
to underwriters for risk selection and underwriting, including exposure analysis over time or versus
premium. This helps underwriters select policies by weighing premium against the marginal impact on
exposure scenarios, making the best use of their capacity.

Allphins’ platform also helps insurers understand their exposure when a claim occurs. The company is
exploring further services relating to claims. These include using past claims events to test a company’s
books against typical claims and predicting risks based on claims. Allphins participated in Cohort 3 of
the Lloyd’s Lab.

  Products: Allphins Energy Platform, Political and Credit Risk Platform, Cyber Platform
  Insurance clients: TransRe, MSAmlin, Greenlight Re, Ariel Re, Arch Re, Blenheim
  Non-insurance clients: None

                                          InsTech London discussion

  Report: Location Intelligence 2021 - the Companies to Watch – March 2021

ARCHIPELAGO

  Founded: 2018 | Head Office: California | Click here for InsTech Member Profile
  Data Types: Tables, Text
  Workflow: Point of underwriting, Loss modelling and analytics

Introduction
Archipelago is a risk data platform helping large commercial property owners manage their data, assess
their risks, and efficiently connect to their insurers. Archipelago’s platform digitises risk, enriches data,
and connects risk managers, brokers, and insurers during renewals.

Ingestion problem being solved
Insurers struggle to source trusted, high-quality information about commercial property to inform their
underwriting decisions. The best data about these exposures lies within the customers, the owners and
managers of these properties, but it is often buried inside documents.

This data has also been hard to share in the traditional formats that are generally passed to brokers and
insurers. Archipelago helps owners extract critical underwriting data from internal systems (structured
data) and documents (unstructured data). The company offers insurers a secure, immutable, on-platform
link to the data, which they can analyse with confidence in its provenance. Model-ready files with
complete, high-quality data are available to support the underwriting process.

By creating a more standardised process for underwriting, risk selection and assessment for large
commercial property submissions, insurers benefit from increased efficiency and gain greater
confidence in the basis on which underwriting decisions are made.
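
As a loose illustration of the idea of assembling a model-ready record while preserving where each value
came from, the sketch below merges a field from a structured system of record with fields extracted from
documents. The field names and merge rule are assumptions made for the example, not Archipelago's
actual data model.

    # Loose sketch of assembling a "model-ready" property record while
    # recording provenance. Field names and merge rule are illustrative
    # assumptions, not Archipelago's actual data model.
    def merge_with_provenance(system_record, extracted_from_documents):
        """Prefer values from the structured system of record; fall back to
        document-extracted values, tagging each with its source."""
        merged = {}
        for field in sorted(set(system_record) | set(extracted_from_documents)):
            if field in system_record:
                merged[field] = {"value": system_record[field], "source": "system_of_record"}
            else:
                merged[field] = {"value": extracted_from_documents[field],
                                 "source": "document_extraction"}
        return merged

    system_record = {"address": "1 Main Street, Oakland, CA", "year_built": 1998}
    extracted = {"roof_type": "metal deck", "year_built": 1997}
    record = merge_with_provenance(system_record, extracted)
    print(record["roof_type"])   # {'value': 'metal deck', 'source': 'document_extraction'}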

  Insurance clients: C.N.A.
  Non-insurance clients: Prologis, JLL, Alexandria

                                        InsTech London discussion

  Podcast: InsTech London Podcast interview with Hemant Shah – episode 145 – July 2021

  Report: Location Intelligence 2021 - the Companies to Watch – March 2021

ARTIFICIAL

  Founded: 2013 | Head Office: United Kingdom | Click here for InsTech Member Profile
  Data Types: Tables, Text
  Workflow: Point of underwriting, General policy administration, Compliance

Introduction
Artificial provides an insurance platform for data ingestion and augmentation, digital underwriting and
policy management. The platform, artificialOS, helps insurers to structure incoming data in a digital
format. This information can then be augmented with third-party and internal data sets to enable
underwriters to assign a score to individual risks and accept, refer or decline submissions automatically.
The company also provides tools for digital distribution that can be built on top of existing legacy
platforms or on Artificial's core systems. Artificial partners with market-leading OCR providers, regularly
tests these technologies and retains the flexibility to change providers to take advantage of new
developments.

Ingestion problem being solved
Artificial's technology enables its clients to extract customisable core data points from submission
emails, contracts, schedules of values or bordereaux files. The relevant documents can be consumed
securely by email or API.

Artificial is able to use third-party data sources to complement the extracted data and give a broader
picture of risks. Examples include integration with sanctions checking, address-matching services, and
the client’s own internal sources.

The extracted data then flows to the Artificial Appetite Engine, which can be used to make granular
automated decisions about the risk under consideration. The customer can decide which factors are
important for each decision, as well as the minimum confidence level associated with them. When a
given factor has a confidence score below the set threshold, the system flags that the results need
to be reviewed manually. Artificial describes this as “human-in-the-loop”.
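
A minimal sketch of this kind of confidence-gated triage is shown below. The field names, threshold and
routing function are assumptions made for the example; they are not Artificial's actual Appetite Engine
API.

    # Minimal sketch of confidence-gated triage ("human-in-the-loop").
    # Field names, threshold and routing logic are illustrative
    # assumptions, not Artificial's actual Appetite Engine.
    from dataclasses import dataclass

    @dataclass
    class ExtractedField:
        name: str
        value: str
        confidence: float   # 0.0 - 1.0, reported by the extraction step

    def route_submission(fields, min_confidence=0.85):
        """Allow automated handling only if every decision-critical field
        meets the configured confidence threshold; otherwise flag for review."""
        low = [f.name for f in fields if f.confidence < min_confidence]
        return ("manual_review", low) if low else ("automated", [])

    fields = [
        ExtractedField("insured_name", "Acme Logistics Ltd", 0.97),
        ExtractedField("total_sum_insured", "12,500,000", 0.62),
    ]
    print(route_submission(fields))   # ('manual_review', ['total_sum_insured'])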

  Insurance clients: Chaucer, Aon, Convex, AXIS
  Partners: Capita, AWS

                                       InsTech London discussion

  Podcast: David King: Founder of Artificial Labs: Organising the world’s data – February 2020
