Name frequency data overrides

Name Data Object (NDO) overrides allow site-specific adjustments to the main internal database of name frequency data, which is used by many Global Name Management name analysis and scoring operations.

Low-level component access is provided in a new gnrndo.h header file, and NameWorks access is provided through a new [General] configuration entry (NDOOverrides=filename).

Details of low-level access appear in the gnrndo.h header file. The NDOOverrides=... configuration entry indicates the name of a file containing records that override NDO data.

A new NameWorks Configuration object method addNdoOverrideFile() has also been added.

NDO override data is supplied for individual countries associated with specific name phrases. For each associated country, relative frequency data is provided for surnames and given names, with given name data split into female, male, and unknown values.

NDO override files use a configuration format similar to other Global Name Management configuration data, where a section name provides the name phrase being overridden and individual entries within a section provide per-country statistics for the related name phrase.

A special Extends=true/false entry indicates whether the override data should replace any existing NDO data (Extends=false) or appear in addition to any existing data (Extends=true).

The Extends=... entry is not required; the default value is false (replace existing data).
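
The difference between the two Extends settings can be illustrated with a minimal sketch. It is not the product's implementation: the function name apply_override and the dictionary layout are hypothetical, and the sketch assumes that when Extends=true and both data sets contain the same country, the override entry wins (the text above does not specify this case).

```python
def apply_override(existing, override, extends=False):
    """Combine per-country data for one name phrase.

    existing, override: dict mapping a country code to a tuple of
    (surname, female, male, unknown) percentiles. Hypothetical layout.
    """
    if not extends:
        # Extends=false (the default): override data replaces existing data.
        return dict(override)
    # Extends=true: override data appears in addition to existing data.
    # Assumption: on a country-code clash the override entry is kept.
    merged = dict(existing)
    merged.update(override)
    return merged


# Hypothetical existing data, for illustration only.
existing = {"AU": (10, 1, 15, 4), "US": (20, 3, 25, 6)}
override = {"AU": (13, 2, 19, 5), "GB": (15, 1, 22, 3)}

replaced = apply_override(existing, override)                # AU and GB only
extended = apply_override(existing, override, extends=True)  # AU, US, and GB
```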

Frequency data is expressed as a percentile ranking, which gives a value relative to the overall collection of known name data. Thus a single country data entry for the name LEONARD in Australia might place its surname frequency in the 13th percentile, its female given name frequency in the 2nd percentile, its male given name frequency in the 19th percentile, and its unknown given name frequency in the 5th percentile.

Given the NDO statistics at the time of this writing, the sum of these given name frequencies would fall in the 19th percentile.

All percentile ranking values must be in the range [0..100]; otherwise, an exception is thrown when an attempt is made to add override data.
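
A site that generates override files may want to check the range before handing the file to the product. The following is a minimal sketch of such a pre-check. The function name validate_percentiles is hypothetical, and the use of ValueError is an assumption made for the sketch; it is not the exception type raised by Global Name Management.

```python
def validate_percentiles(values):
    """Return the values as a list if every one lies in [0..100].

    Raises ValueError for the first out-of-range value. This mirrors
    the documented rule only; it is not the product's own check.
    """
    checked = list(values)
    for value in checked:
        if not 0 <= value <= 100:
            raise ValueError(f"percentile {value} is outside the range [0..100]")
    return checked
```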

Each entry within a section contains an ISO-3166 two-letter country code (the same codes used in other areas of NameWorks) followed by a list of frequency percentiles for that specific country:


   Code=surname_percentile,female_percentile,male_percentile,unknown_percentile
   

Thus an entry for LEONARD with two country overrides could appear as:


   [LEONARD]
   AU=13,2,19,5
   GB=15,1,22,3
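
To make the entry layout concrete, here is a minimal sketch that reads text in this format with Python's standard configparser module. It is an illustration for checking or generating override files outside the product, not the parser that Global Name Management uses. The names CountryFrequencies and parse_overrides are hypothetical, and the sketch assumes percentiles are written as integers, as in the sample.

```python
import configparser
from dataclasses import dataclass


@dataclass
class CountryFrequencies:
    surname: int
    female: int
    male: int
    unknown: int


def parse_overrides(text):
    """Map each name phrase to (extends, {country code: CountryFrequencies})."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys such as "AU" and "Extends" as written
    parser.read_string(text)

    result = {}
    for name in parser.sections():
        extends = False  # documented default: replace existing data
        countries = {}
        for key, value in parser.items(name):
            if key == "Extends":
                extends = value.strip().lower() == "true"
                continue
            # Field order: surname, female, male, unknown.
            surname, female, male, unknown = (int(p) for p in value.split(","))
            countries[key] = CountryFrequencies(surname, female, male, unknown)
        result[name] = (extends, countries)
    return result


SAMPLE = """
[LEONARD]
AU=13,2,19,5
GB=15,1,22,3
"""

overrides = parse_overrides(SAMPLE)
```

Setting optionxform to str matters here because configparser lower-cases keys by default, which would turn the country code AU into au.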
   

NDO override data affects parsing, culture classification, country association, genderization, and special scoring of short names. NDO override data does not affect variant generation.