Natural processes have characteristic patterns that get disturbed by deliberate, motivated action. Numerical markers of “normality” can then flag anything unusual, in ways that those responsible may find difficult to conceal, or that reveal irregularities which ordinary detection would take more time or effort to uncover. Professor Shankar Venkatagiri, a mathematician and member of the Decision Sciences and Information Systems area at the Indian Institute of Management, Bangalore, speaking at the annual meeting of the Indian Railway Accounts Service at the Rail Wheel Factory in Bangalore, described features of numbers and the ways in which fraud detection agencies, as well as businesses, use patterns to detect threats and opportunities.
A little-known property of numbers that arise in natural processes is that their first digits are not uniformly distributed but tend to be low, such as one, two or three, rather than high, such as eight or nine. For example, the heights of mountains in feet, or of buildings in millimetres, typically range from a few hundred to many thousands. The first digits of actual values such as 12,335, 8,322 or 6,345 are one, eight and six respectively. Would there be a tendency for this first digit to fall preferentially in some range, rather than be uniformly distributed from one to nine?
One would normally expect each digit from one to nine to be equally likely as the first digit in long lists of numbers that cover many orders of magnitude (rather than staying within a limited range). Venkatagiri explained, however, that a counter-intuitive law says this is not so. Benford’s Law, he said, holds that one is the first digit as often as 30 per cent of the time, while nine appears in the first place only 4.6 per cent of the time. The percentage of times each digit appears first, and a graph of how that share falls from 30.1 per cent to 4.6 per cent, are shown in the picture. This rule, that the first digit is more often a low number, has been verified in a great many instances, such as the areas of lakes in a district, population sizes, birth or death rates, electricity bills and commodity prices. These are all numbers that arise “naturally”, without any design that affects the first digit.
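The percentages quoted follow from the standard statement of Benford’s Law, under which the probability that the first digit is d equals log10(1 + 1/d). The short Python sketch below reproduces them. The function name `benford_probability` is illustrative and was not part of the talk.

```python
import math


def benford_probability(d: int) -> float:
    """Probability under Benford's Law that the first digit is d (1 to 9)."""
    if not 1 <= d <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Print the expected share of each first digit, as a percentage.
for d in range(1, 10):
    print(f"{d}: {100 * benford_probability(d):.1f} per cent")
```

This prints 30.1 per cent for one, falling to 4.6 per cent for nine. The nine probabilities add up to exactly one because the logarithms telescope to log10(10).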
This would not be the case, say, for the heights in inches of 12-year-olds, which mostly lie between 50 and 60 inches, making five the most common first digit. By contrast, the area of a lake in square metres, or a population, could be anything from a few hundred to thousands or even hundreds of thousands. The tendency of the first digit to be low rather than high seems surprising at first, but it can be understood with a little analysis. The digit one, we can see, appears as the first digit first by itself, then in the numbers 10 to 19, then in 100 to 199, and so on. The digit two, similarly, appears first by itself, then in 20 to 29, then in 200 to 299, and so on. The same sequence holds for three. What we notice is that one reappears as the first digit within nine numbers of its first appearance, and then again after just the next 80 numbers (20 to 99). But two has to wait 18 numbers before its first repetition, and then 170 numbers (30 to 199) before its second. The wait keeps lengthening in this way for the digits three to nine: nine has to wait through the 800 numbers from 100 to 899 before it appears as the first digit for the third time. That is ten times the wait of only 80 numbers for one. At higher orders of magnitude these waits grow tenfold at each step, so the larger the numbers, the more marked the separation between appearances of the higher digits. This is why, in a collection of numbers that covers a wide range, the distribution of first digits follows Benford’s law.
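The counting argument above can be checked by brute force. The Python sketch below uses two measures for a digit d. The first is the distance from d to the first two-digit number starting with d, measured as in the text (10 − 1 = 9 for one). The second is the count of numbers skipped between the last two-digit number and the first three-digit number starting with d. The function names are hypothetical, chosen only for illustration.

```python
def leading_digit(n: int) -> int:
    """Return the most significant digit of a positive integer."""
    while n >= 10:
        n //= 10
    return n


def first_wait(d: int) -> int:
    """Distance from d itself to the first two-digit number starting with d
    (10 - 1 = 9 for d = 1, 20 - 2 = 18 for d = 2)."""
    first_two_digit = min(n for n in range(10, 100) if leading_digit(n) == d)
    return first_two_digit - d


def second_wait(d: int) -> int:
    """Count of numbers skipped between the last two-digit number starting
    with d and the first three-digit number starting with d
    (20..99, i.e. 80 numbers, for d = 1)."""
    last_two_digit = max(n for n in range(10, 100) if leading_digit(n) == d)
    first_three_digit = min(n for n in range(100, 1000) if leading_digit(n) == d)
    return first_three_digit - last_two_digit - 1


# One row per digit: the digit, its first wait, its second wait.
for d in range(1, 10):
    print(d, first_wait(d), second_wait(d))
```

The rows run from `1 9 80` to `9 81 800`. This confirms that the second wait for nine is ten times the 80-number wait for one.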
A direct application is to capture the numbers generated in a system and keep checking whether their first digits follow Benford’s law. One kind of bank fraud, for instance, involves the daily interest calculated on balances. The fraudster manipulates the system to add a small amount to the interest worked out on a thousand accounts, and transfers the total to a separate account that the fraudster can access. If the bank had a “Benford’s law check system” in place, it would regularly inspect the first digits of the numbers in its records, along with some of their other features. If all is well, the numbers follow Benford’s law. But if a systematic change is being made, it would show up in the pattern of first digits and alert the bank’s auditors. A similar application could be in data collected through surveys. Figures from honest surveys show features that do not appear in fictitious data, or even in data with sampling errors. Applying statistical checks to the numbers could then show where corrections are needed. Such checks could be vitally important in statistical quality control or in checks that ensure safety. Venkatagiri went on to describe other uses of capturing and analysing numbers, such as in maintaining law and order, public health, and scheduling material movement or public transport. One area of great use was advertising and marketing.
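The first-digit check described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the bank system from the talk. It compares observed first digits with Benford’s proportions using Pearson’s chi-square statistic, a common choice that the talk did not specify. The 5 per cent threshold, the function names and both data sets are hypothetical, and amounts are assumed to be positive whole numbers (for example, in paise).

```python
import math
from collections import Counter

# Assumed threshold: chi-square critical value for 8 degrees of freedom
# (nine digits minus one constraint) at the 5 per cent significance level.
CRITICAL_VALUE_5_PERCENT = 15.507


def leading_digit(n: int) -> int:
    """Return the most significant digit of a positive integer."""
    return int(str(n)[0])


def benford_chi_square(values: list[int]) -> float:
    """Pearson chi-square statistic comparing observed first digits
    with the proportions expected under Benford's Law."""
    counts = Counter(leading_digit(v) for v in values if v > 0)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one positive value")
    statistic = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic


def looks_suspicious(values: list[int]) -> bool:
    """Flag a batch whose first digits depart from Benford's Law."""
    return benford_chi_square(values) > CRITICAL_VALUE_5_PERCENT


# Hypothetical data: powers of two span about 300 orders of magnitude and
# are known to follow Benford's Law; 100..999 has evenly spread first digits,
# as invented figures often do.
natural = [2 ** k for k in range(1, 1001)]
invented = list(range(100, 1000))

print(looks_suspicious(natural))   # False
print(looks_suspicious(invented))  # True
```

The invented batch scores well over 300 against a threshold of about 15.5, while the powers of two stay far below it. As the talk suggests, a real system would also examine other features of the numbers. This sketch looks only at the first digit.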
Clicks on the pages of search engines like Google, or those made in the course of purchases on the Internet, were captured and used to send specifically selected advertisements to individual users, based on their browsing behaviour. Venkatagiri also described how Google could detect an epidemic before a state’s health administration came to know of it. Particularly in countries where medical help or dispensing was expensive, the occurrence of symptoms showed up first in what Internet users searched for, rather than in the records of their visits to doctors or hospitals. Google could hence use its data to alert governments to an apparent rise in the incidence of body pain and fever, for instance, and so set in motion a process of investigation and containment.