The cyber data conundrum part 1: cyber risk data in insurance

In today’s highly interconnected world, every enterprise has cyber risk. As technologies evolve, this risk becomes more multi-dimensional and inevitably generates ever more cyber risk data. Using this data, enterprises, their IT teams, and third-party security providers work tirelessly to manage and mitigate that risk.

However, no amount of cyber risk mitigation or management can completely eliminate cyber risk. This is where enterprises must look to transfer their risk. Enter cyber insurance.

The challenge cyber insurers then face is fourfold: identifying which enterprises or accounts are “bad” risks they would like to avoid, which are “good” risks they would like to add to their portfolios, what types of coverage they are willing to offer the insured, and how much premium to charge so that it is commensurate with the enterprise’s level of risk.

In order to differentiate between these risks, having access to data and analytics is key. Insurers need to be able to make informed risk decisions using the appropriate information.

In this blog series all about The Cyber Data Conundrum, I’ll be discussing how insurers should manage the collation, analysis and curation of cybersecurity data, as well as how CyberCube’s unique approach meets the needs of the cyber insurance industry. In the first part of the series, I’ll highlight some of the challenges that insurers face when using cybersecurity data for cyber insurance.

How can you get the signals you need?

Essentially, signals are built by processing raw data. Raw data is unstructured, unassociated data gathered through a variety of means, including port scanning, DNS sinkholes, and dark web intelligence. It may carry one or more identifiers, such as an IP range or address, a domain, or an email address, but the observations remain rough: petabytes of unmatched, unmapped, fragmented, duplicated, and noisy data in disaggregated storage.

The process of collecting data is also critical. The complexity of the data collection activity by itself requires enormous effort and attention to detail. For the sake of scalability, many large-scale data collection efforts rely on simplified steps of the communication processes to establish the presence of servers or services. If improperly executed, this could lead to spurious observations that may not truly be indicative of a company’s risk posture.

Once the data is properly collected, the data engineering effort needed to make it usable for insurance begins. The data needs to be appropriately contextualized, which means processing, cleaning, and deduplicating it, then associating and mapping it to specific companies. Only then can it be correlated and further analyzed to develop the relevant signals that underwriters need to understand the risks they’re evaluating.
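To make the cleaning, deduplicating, and mapping steps concrete, here is a minimal Python sketch. It is an illustration only: the `DOMAIN_OWNER` lookup table, the field names, and the sample observations are all hypothetical, and real-world attribution of assets to companies is far more involved than a dictionary lookup.

```python
from collections import defaultdict

# Hypothetical asset-to-company lookup; building this mapping is the hard part in practice.
DOMAIN_OWNER = {
    "example.com": "Example Corp",
    "example.org": "Example Corp",
    "acme.test": "Acme Ltd",
}

def normalize(obs):
    """Clean one raw observation: trim whitespace and lowercase the domain."""
    return (obs["domain"].strip().lower(), obs["port"], obs["date"])

def map_to_companies(raw_observations):
    """Clean, deduplicate, and attribute observations to companies."""
    seen = set()
    by_company = defaultdict(list)
    unmapped = []
    for obs in raw_observations:
        key = normalize(obs)
        if key in seen:  # drop exact duplicates after cleaning
            continue
        seen.add(key)
        company = DOMAIN_OWNER.get(key[0])
        if company is None:
            unmapped.append(key)  # keep for review instead of guessing an owner
        else:
            by_company[company].append(key)
    return dict(by_company), unmapped

raw = [
    {"domain": "Example.com ", "port": 3389, "date": "2024-01-01"},
    {"domain": "example.com", "port": 3389, "date": "2024-01-01"},  # duplicate once cleaned
    {"domain": "example.org", "port": 22, "date": "2024-01-01"},
    {"domain": "unknown.test", "port": 23, "date": "2024-01-01"},
]
mapped, unmapped = map_to_companies(raw)
```

Here the two `example.com` rows collapse into one observation, both Example Corp domains roll up to the same company, and the unrecognized domain is set aside rather than attributed to anyone.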

If you are an insurance carrier, an MGA, or any other type of underwriting shop, you have a few options when it comes to implementing cyber risk signals into your underwriting workflow:

1. Collect the raw data and build everything from scratch yourself
2. Contract with a third-party firm that collects, cleans, processes and maps the raw data on your behalf, and then build the signals yourself
3. Partner with a third-party firm that collects, cleans, processes, and maps the raw data, and then generates signals on your behalf. You can then apply those raw signals using your own understanding of cyber risk

Given the complexity, domain expertise, and in-house resources needed for options one and two, partnering with a signal and cyber data vendor to build signals on your behalf is becoming increasingly common.

What to consider with a data vendor

By using third-party cyber risk signals and scores, the underwriting process can be more efficient, and underwriters can be better informed with a unique view of risk — especially if the time available to spend on an account is limited due to either the volume of submissions or the size and complexity of the submission. Signals and scores provide an additional method and lens for risk triage, vulnerability identification, data corroboration and overall cyber risk hygiene.

For those looking to partner, there are multiple aspects you should keep in mind when evaluating data and signals for your insurance underwriting, including, but not limited to:

- What is the signal and/or score measuring?
- How was the score/signal built?
- What is the resolution of the signal (and resolution of the underlying data)?
- What is the signal mapped and attributed to (domain, IP, company, etc.)?
- What is the relevance of this signal for cyber underwriting?
- What is the correlation of this signal to the frequency of an event?
- What is the correlation of this signal to the severity of an event?

It’s important to understand how the third-party data, signals, and scores truly suit your underwriting needs before choosing a partner you can trust. Answering these questions, and understanding not only how you will apply signals and scores to your underwriting process but also how much these signals will improve the outcomes of underwriting decisions, should guide your decision.
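As a rough illustration of the frequency and severity questions above, the sketch below compares accounts with and without a signal. The account records are made up, and `frequency_lift` and `severity_ratio` are hypothetical helper names, not a vendor’s or CyberCube’s methodology.

```python
from statistics import mean

# Made-up book of business: 4 accounts with the signal present, 8 without.
accounts = (
    [
        {"signal": True, "claim": 1, "loss": 300_000},
        {"signal": True, "claim": 1, "loss": 500_000},
        {"signal": True, "claim": 0, "loss": 0},
        {"signal": True, "claim": 0, "loss": 0},
    ]
    + [{"signal": False, "claim": 1, "loss": 200_000}]
    + [{"signal": False, "claim": 0, "loss": 0}] * 7
)

def frequency_lift(accounts):
    """Claim rate with the signal divided by the claim rate without it."""
    flagged = [a["claim"] for a in accounts if a["signal"]]
    unflagged = [a["claim"] for a in accounts if not a["signal"]]
    # Assumes both groups are non-empty and the unflagged group has at least one claim.
    return mean(flagged) / mean(unflagged)

def severity_ratio(accounts):
    """Average loss per claim with the signal divided by the average without it."""
    flagged = [a["loss"] for a in accounts if a["signal"] and a["claim"]]
    unflagged = [a["loss"] for a in accounts if not a["signal"] and a["claim"]]
    # Assumes each group has at least one claim.
    return mean(flagged) / mean(unflagged)

lift = frequency_lift(accounts)       # 0.5 / 0.125 = 4.0
severity = severity_ratio(accounts)   # 400,000 / 200,000 = 2.0
```

A lift or ratio near 1.0 would suggest the signal says little about claims. With so few accounts, though, numbers like these are only a starting point for the questions to put to a vendor.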

The misapplication of real-time data

An underwriter is trained and conditioned to sort swiftly through assessment questionnaires and policy language, and values efficiency in making well-informed risk decisions. Cyber risk modeling tools and cyber risk signals that cut through the noise are essential to achieving this.

However, underwriters are facing a Cyber Data Conundrum: the volume of data available makes it difficult to know which signals are important. Constant streams of high-resolution data can become overwhelming; ultimately, less is more. Effective underwriting comes down to having the right data.

There is a common sentiment in the cyber insurance underwriting market that underwriters need real-time signals and real-time cybersecurity data. By extension, the sentiment suggests that data is only relevant to underwriters if it is delivered with high fidelity and reflects the most recent snapshot. While the most recent daily or hourly view of a company’s performance has some relevance in certain contexts, such as cyber monitoring and mitigation, data at that granularity is noisier and less relevant to insurance underwriting than you may think.

Generally, a daily or real-time feed of observations is less fit for most insurance purposes. There are a variety of reasons: the observations can be overly technical, they are not always contextualized into cyber policy and claim considerations, they do not capture or smooth out false positives and fleeting observations, and they do not offer recommendations for remediating controls or questions to ask. Let’s take a look at why.

Real-time data: is it statistically significant?

Given its recency, real-time data may not yet have established itself as statistically significant and relevant to the likelihood of a claim. A daily view may overlook one of the most critical aspects of a signal’s relevance and lift in an underwriter’s decision making: how material is this observation, and will it materialize into a claim? When evaluating a daily-resolution or real-time feed, you should ask how the signal has been validated, and whether that resolution is necessary for all signal types.
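One way to ask whether a signal has been validated is a simple significance test on claim rates. The sketch below uses a two-proportion z-test with a normal approximation, which is a deliberately simple stand-in and is itself unreliable on very small samples. The function name and all counts are hypothetical.

```python
import math

def two_proportion_p_value(claims_a, n_a, claims_b, n_b):
    """Two-sided p-value for a difference in claim rates (normal approximation)."""
    rate_a, rate_b = claims_a / n_a, claims_b / n_b
    pooled = (claims_a + claims_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - cdf)

# A brand-new signal with little history: 2 claims in 4 flagged accounts vs 1 in 8 unflagged.
p_new_signal = two_proportion_p_value(2, 4, 1, 8)

# The same claim rates observed over a much longer history.
p_established = two_proportion_p_value(200, 400, 100, 800)
```

Both cases show the same apparent fourfold difference in claim rates, but only the second has enough history behind it to be distinguishable from chance at conventional thresholds. That is the gap a very recent observation has not yet had time to close.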

Real-time data can be noisy

Daily-resolution observations also carry a lot of noise. For example, a finding may already be in a company’s pipeline for remediation, so users of this data must keep returning to the dataset to see whether it has changed.
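A simple way to smooth out fleeting observations is to look at how persistently a finding appears over a trailing window rather than at a single day. The sketch below is illustrative only: the 30-day window and the 0.5 threshold are arbitrary assumptions, and the function names are hypothetical.

```python
def persistence(daily_flags, window=30):
    """Share of the most recent `window` days on which the finding was observed."""
    recent = daily_flags[-window:]
    if not recent:
        return 0.0
    return sum(recent) / len(recent)

def is_material(daily_flags, window=30, threshold=0.5):
    """Treat a finding as material only if it persists across the window."""
    return persistence(daily_flags, window) >= threshold

# One flag per day, oldest first. Both findings are "present" in today's snapshot.
fleeting = [False] * 27 + [True] * 3     # appeared three days ago
persistent = [False] * 5 + [True] * 25   # open for 25 of the last 30 days

fleeting_share = persistence(fleeting)       # 3 / 30
persistent_share = persistence(persistent)   # 25 / 30
```

A real-time snapshot would report both findings identically. The windowed view separates the one that may already be on its way to remediation from the one that has lingered.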

Real-time data needs context

Context also matters. What does a binary observation for a company really indicate? Most insurance underwriting is not done in a vacuum. Underwriters are seeking to contextualize this signal and compare it to other companies and their observations, just as they compare one risk to another. Does having a few vulnerabilities matter? The question is best answered with deeper context about the observation, its persistence, and the efforts to resolve it over time.
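To illustrate the peer comparison, the sketch below ranks a company’s count of open vulnerabilities against a peer group. The peer counts are made up, and `peer_percentile` is a hypothetical helper; how peers are chosen (industry, size, and so on) is an assumption left open here.

```python
def peer_percentile(value, peer_values):
    """Percentage of peers with a strictly lower value than this company."""
    below = sum(1 for v in peer_values if v < value)
    return 100.0 * below / len(peer_values)

# Made-up counts of open vulnerabilities across ten peer companies.
peers = [0, 1, 1, 2, 3, 5, 8, 13, 21, 40]

few = peer_percentile(3, peers)     # 4 of 10 peers have fewer -> 40.0
many = peer_percentile(40, peers)   # 9 of 10 peers have fewer -> 90.0
```

On its own, “three open vulnerabilities” is hard to interpret. Placed against peers, it sits below the middle of the group, which is a very different message from the company at the 90th percentile.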

Real-time data in insurance

Real-time data does have a place in insurance, however, provided its limitations are understood. With those limitations in mind, real-time data can be selectively designed and curated for the appropriate application by underwriters, for specific workflows, and for other tailored needs and use cases. One such approach will be discussed later in this blog series.

Get the data you need. Leave the rest

In this blog, Cyber Data Conundrum Part 1, I’ve identified some of the challenges you may face when attempting to apply cybersecurity data for insurance underwriting.

While data is necessary to better distinguish “bad” risks from “good” risks, it’s essential that your underwriters are using the right data, which depends on your cyber risk analytics solution. Many solutions in the market today can overwhelm underwriters with an ocean of cyber risk data, leaving them to drown in irrelevant information.

At CyberCube, we have years of experience in building models, analytics, and signals that are continuously refined to help the insurance industry understand its insureds’ cyber risk and stay current on the ever-changing threat landscape.

To learn more about using statistically significant signals to best understand cyber risk, check out our free report, Evaluating Cyber Risk Signals as Indicators of Future Incidents.

Stay tuned for part two of the Cyber Data Conundrum series — What signals say about the state of the dam.
