
Benford's Law: Potential Applications for Insider Threat Detection

Aamo, I., "On the Use of Benford's Law to Detect JPEG Biometric Data Tampering." *Journal of Information Security*, 8, 2017, 240-256. https://file.scirp.org/pdf/JIS_2017071914213246.pdf

Reese, M., "Why Cyber Security Should Care About Benford's Law." *LinkedIn*, 2019. https://www.linkedin.com/pulse/why-cyber-security-should-care-benfords-law-mindy-reese

Sarkar, T., "What is Benford's Law and Why Is It Important for Data Science?" *Towards Data Science*, 2018. https://towardsdatascience.com/what-is-benfords-law-and-why-is-it-important-for-data-science-312cb8b61048

Detecting anomalous network activity is a powerful way to discover insider threat activities, but it takes considerable effort to establish baseline traffic and process traffic data. This post explores how a mathematical law, already used in forensic accounting, may help detect insider activity without the effort of traditional anomaly detection.

Benford's law of anomalous numbers states that generally, in naturally occurring collections of numbers, the leading digit is likely to be small. The resulting downward-sloping curve can be used as a baseline for determining whether a dataset is genuine or fabricated.

Accountants often compare the leading digits of financial transaction data, such as ledger entries, to a Benford curve to spot anomalies that may indicate fraud. The same technique can be used to detect irregular network activity and other data that may indicate malicious insider activity.

Mathematics

Benford's law is grounded in base-10 logarithms. A number *x* begins with digit *d* when the fractional part of *log*10(*x*) lies between *log*10(*d*) and *log*10(*d*+1), an interval of length *log*10(*d*+1) - *log*10(*d*) = **log10(1 + 1/d)**. That length is the probability that *d* is the leading digit. Plugging in the digits 1 through 9 shows that each subsequent digit is less likely to be the leading digit.
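As a minimal sketch of the formula above, the expected share of each leading digit can be computed directly. The names `benford_probability` and `BENFORD` are illustrative and do not come from the original post.

```python
import math

def benford_probability(d: int) -> float:
    """Expected share of values whose leading digit is d (1-9) under Benford's law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

# Expected share for every possible leading digit.
BENFORD = {d: benford_probability(d) for d in range(1, 10)}

for d, p in BENFORD.items():
    print(f"{d}: {p:.3f}")
```

The nine probabilities sum to 1 because the nine logarithmic intervals tile the range from *log*10(1) = 0 to *log*10(10) = 1.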

*Figure 1: Logarithmic Intervals of Leading Digits, Based on log10(x)*

The size of the number doesn't matter. Whether you're dealing with five-digit or two-digit numbers, the *probability* of a given leading digit can be predicted for data fitting Benford assumptions by looking at the first two decimals of the **base-10 log** of the number.

Consider 1,002: *log*10(1,002) ≈ 3.000868. The first two decimals fall within the .00-.30 interval, which holds the base-10 log values of numbers with a leading digit of 1. The width of this interval reflects the fact that about 30 percent of naturally occurring numbers that fit the Benford assumptions have a leading digit of 1. Similarly, consider 52: *log*10(52) ≈ 1.716003. The first two decimals fall within the .70-.78 interval, which holds the base-10 log values of numbers with a leading digit of 5.
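The two worked examples can be checked with a short sketch that looks up which logarithmic interval the fractional part of *log*10(*x*) falls into. The function names here are hypothetical, and values that sit exactly on an interval boundary (such as 20) may be affected by floating-point rounding.

```python
import math

def log_interval(d: int) -> tuple[float, float]:
    """Lower and upper edge of the log10 interval for leading digit d."""
    return math.log10(d), math.log10(d + 1)

def leading_digit_via_log(x: float) -> int:
    """Find the leading digit of a positive number from the fractional part of log10(x)."""
    if x <= 0:
        raise ValueError("x must be positive")
    fraction = math.log10(x) % 1.0  # e.g. log10(52) = 1.716... gives 0.716...
    for d in range(1, 10):
        low, high = log_interval(d)
        if low <= fraction < high:
            return d
    return 9  # only reached if the fraction rounds up to exactly 1.0

print(leading_digit_via_log(1002))  # 1
print(leading_digit_via_log(52))    # 5
```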

*Table 1: Example of Base-10 Logs for Leading Digits 1-9*

The conclusion from all this math: numbers in a dataset that fits all the Benford assumptions should follow this distribution of leading digits, with 1 being the most common and 9 being the least.

*Figure 2: Probability Distribution of Leading Digits Under Benford's Law*

For a conclusion on a Benford curve to be valid, **the data must (1) be numeric, (2) be randomly generated, (3) be large, and (4) represent magnitudes of events.** Many types of data fit these assumptions, including population counts, accounting data, and network traffic. **Data comprising numbers used as identifiers**, such as phone numbers and Social Security numbers, **violates the assumption that the data is generated randomly.**

*Figure 3: Leading Digit Distribution of Population Data*

Benford's law is widely used in accounting to examine data for anomalies that may indicate fraud. Accountancy data generally follows the four assumptions required for a valid conclusion on a Benford curve: general ledgers, income statements, and inventory listings can all be compared to the curve to determine genuineness.

This analysis may be admissible evidence of fraud in federal and state courts. The forensic accounting community generally accepts the methodology, which is referenced in the Fraud Examiners Manual. Forensic accountants, fraud examiners, accountants, and auditors use Benford's law to detect anomalies that require investigation. The combination of the method's widely accepted usage, academic reputation, and wide availability of experts makes the admissibility of Benford analyses likely.

Network traffic typically follows the four assumptions required for a conclusion on the Benford curve to be valid. Benford analysis's long-standing use in accounting and its suitability for information security's naturally generated data make the process viable for technical insider threat detection. Benford analysis is especially useful in detecting both highly likely and unlikely data points, so it serves as a dual measure of both normalcy and aberration.

**Current cybersecurity systems rely heavily on identifying anomalous behaviors**. Looking only for known signatures does not address the breadth of the threat landscape--**unknown signatures are equally important.** Anomaly detection is generally hard to establish because creating a baseline traffic profile and processing the large amount of traffic data are time-consuming processes.

Benford's law can help avoid the effort of baseline-derived anomaly detection. If the network traffic conforms to the assumptions of Benford's law, any traffic data deviating from the Benford curve can be considered an anomaly. The Benford curve itself serves as the baseline, which removes much of the manual work of computing one.
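One way to make "deviating from the Benford curve" concrete is a goodness-of-fit test. The sketch below is an assumption of this rewrite rather than the method used in the post: it applies a Pearson chi-square test to the leading digits, using the standard 0.05 critical value for 8 degrees of freedom. The sample data and all function names are hypothetical.

```python
import math
from collections import Counter

# Benford's expected share for each leading digit 1-9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Chi-square critical value for 8 degrees of freedom at the 0.05 level.
CHI2_CRITICAL = 15.507

def leading_digit(value: float) -> int:
    """First significant digit of a nonzero number, read from scientific notation."""
    return int(f"{abs(value):.15e}"[0])

def chi_square_vs_benford(values) -> float:
    """Pearson chi-square statistic of observed leading digits against Benford."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    n = len(digits)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in BENFORD.items()
    )

def deviates_from_benford(values) -> bool:
    """True when the statistic exceeds the critical value."""
    return chi_square_vs_benford(values) > CHI2_CRITICAL

# Hypothetical data: values spread evenly on a log scale conform closely
# to Benford, while values confined to 400-799 do not.
spread = [10 ** (k / 1000) for k in range(1000)]
narrow = [400 + k for k in range(400)]
print(deviates_from_benford(spread))  # False
print(deviates_from_benford(narrow))  # True
```

With very large samples a chi-square test tends to flag even tiny deviations, so the threshold would likely need tuning against real traffic before it is used for alerting.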

A small-scale example application of this technique can be demonstrated with spreadsheet macros.

To demonstrate the potential applications of **Benford's law to insider threat detection**, let's explore some scenarios inspired by those we capture in the CERT Insider Threat Incident Corpus.

*An employee creates fictitious invoice charge data to hide their illicit activity by randomly typing numbers on the horizontal number keys. Another employee notices irregularities in the Benford analysis of the invoice data, and the employee who created the fictitious data is caught.*

In this situation, the digits 4, 5, 6, and 7 occur as the leading digits more frequently because of the employee's hand placement on the number keys. Even fabricated data that seems random can be separated from genuine data.

*Figure 5: Data Generated by Typing on the Horizontal Number Keys*

*A disgruntled co-founder of a tech company argues with his partner and decides to leave the company, but not before downloading large trade-secret files. The co-founder has authorized access to the trade secrets and regularly views and works with the files. He deals with numerous uploads and downloads on a daily basis, so he doesn't think he'll get caught.*

Measures of network traffic generally follow a Benford curve. Though the co-founder typically deals with the trade secrets and has high network usage, the unexpected increase over his normal network activity shifts the distribution of leading digits in the company's network traffic, signaling an abnormality. An analytic that detects changes in the statistical distribution of network activity triggers an alert of suspicious activity. In this case, the co-founder does not get away with it.
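An analytic of this kind could be sketched as follows, assuming traffic is summarized as transfer sizes per time window. The mean absolute deviation (MAD) measure, the 0.015 alert threshold, the window names, and the transfer sizes are all illustrative assumptions, not details from the scenario or the CERT corpus.

```python
import math
from collections import Counter

# Benford's expected share for each leading digit 1-9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Assumed alert cutoff; tune it against your own traffic.
MAD_ALERT_THRESHOLD = 0.015

def leading_digit(value: float) -> int:
    """First significant digit of a nonzero number, read from scientific notation."""
    return int(f"{abs(value):.15e}"[0])

def benford_mad(values) -> float:
    """Mean absolute deviation between observed digit shares and Benford's shares."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts.get(d, 0) / n - p) for d, p in BENFORD.items()) / 9

def flag_windows(windows: dict) -> list:
    """Names of the windows whose digit distribution drifts past the threshold."""
    return [
        name for name, sizes in windows.items()
        if benford_mad(sizes) > MAD_ALERT_THRESHOLD
    ]

# Hypothetical transfer sizes in bytes. The normal day spans several orders
# of magnitude; the second day adds 200 large downloads of about 700 MB each.
normal_day = [10 ** (3 + 0.006 * k) for k in range(500)]
exfil_day = normal_day + [700_000_000 + 100_000 * i for i in range(200)]

print(flag_windows({"normal_day": normal_day, "exfil_day": exfil_day}))
# ['exfil_day']
```

The burst of similarly sized downloads inflates the share of one leading digit, which is what pushes the second window past the threshold even though the user's access itself is authorized.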

*An employee finds out he is going to be laid off and decides to launch a denial-of-service (DoS) attack on the company's network. The company's IT department has recently established baseline interval times and packet lengths. They are quickly able to identify the anomaly caused by the employee and stop the attack.*

Benford's law is especially useful in detecting DoS attacks because flooding a network with data breaks the naturalness of network traffic.

It is important to use the resources that we already have access to. Many accounting departments have longstanding experience with Benford analyses, so applying the Benford framework to an information security context should be simpler than creating new techniques for monitoring threshold activity. This control does not rely on labeled historical data. Instead, it leverages the data's natural conformity to the assumptions of Benford's law and tests that conformity against the Benford expectation.

Not all organizational data fits the Benford assumptions. For example, organizations that consistently facilitate transactions with high leading digits may find that the Benford method is of limited use. **In the future, we could compare the return on investment and efficacy of Benford analysis for anomaly detection with those of more conventional statistical methods used for insider threat, such as Bayes' theorem.**

