How Anonymization Works

    Data Anonymization

    This article describes how Own's Anonymize application works in a Salesforce data environment.

    We support automatic data anonymization as part of our Anonymize application. This capability lets users automatically anonymize fields that may contain private, sensitive, or identifiable information. The underlying anonymization library is based on Faker.

    We anonymize fields in the selected sObject(s) based on their Type, Compliance Category, Sensitivity Level, or Label, and use the anonymization function that matches each field’s value format. For example, email type fields are anonymized to values in email format.

    Retaining Distribution and Data Integrity

    To maximize data quality, a given original value is always anonymized to the same replacement value when it is encountered again on the same field type in the same Job. This preserves the original distribution of values and the integrity of the data. The following four factors must all be the same for this to work:

    1. The replacement value selected in the template
    2. The field name (API name)
    3. The original value
    4. The field type

    For example, if the Country field value was originally ‘Cuba’ and was anonymized to ‘Peru’, every record in the same Job where the Country field is ‘Cuba’ is anonymized to ‘Peru’.
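
    The consistency rule above can be pictured as a per-Job lookup table keyed on the four factors. The following is a minimal sketch of that idea, not the application's actual implementation: the class name, the method signature, and the fixed pool of countries are all hypothetical, and a real run would draw replacement values from a Faker-style generator rather than a hard-coded list.

```python
import random

# Hypothetical pool of replacement values (assumption for illustration only).
COUNTRIES = ["Peru", "Chile", "Kenya", "Norway", "Japan"]


class ReplacementCache:
    """Remembers the replacement chosen for each combination of the four factors."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._cache = {}

    def anonymize(self, template_choice, field_name, field_type, original):
        # The key combines the four factors: template replacement choice,
        # field API name, field type, and original value.
        key = (template_choice, field_name, field_type, original)
        if key not in self._cache:
            # First encounter in this job: pick a replacement and remember it.
            self._cache[key] = self._rng.choice(COUNTRIES)
        return self._cache[key]


# One cache instance stands in for one Job.
job = ReplacementCache(seed=0)
first = job.anonymize("country", "BillingCountry", "string", "Cuba")
again = job.anonymize("country", "BillingCountry", "string", "Cuba")
```

    In this sketch, `first` and `again` are always equal, because the second call finds the key already in the cache. Changing any one of the four factors (for example, a different field API name) produces a different key and therefore an independently chosen replacement.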

    Blank Values

    The following values are never anonymized:

    • Blank (empty string)
    • NA
    • [not provided]
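
    As an illustration, the skip rule can be expressed as a simple membership check that runs before any anonymization function. This is a sketch under stated assumptions: the function name is hypothetical, matching is assumed to be exact (the article does not say whether case or surrounding whitespace matter), and a null value is assumed to be treated like a blank string.

```python
# Values the article lists as never anonymized.
SKIPPED_VALUES = {"", "NA", "[not provided]"}


def should_anonymize(value):
    """Return False for values that are left untouched."""
    # Assumption: None (a null field) is handled like a blank string.
    if value is None:
        return False
    # Assumption: exact, case-sensitive comparison.
    return value not in SKIPPED_VALUES
```

    With this check, `should_anonymize("Cuba")` is true, while `should_anonymize("NA")` and `should_anonymize("")` are false.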

    Field Type Identification

    Because Salesforce’s schema does not identify every field that contains sensitive information, we use the following process to determine which fields to anonymize and which anonymization function to apply:

    Address fields: Street, City, State, Country, and Postal Code fields are identified by their Label.

    Names: The Account Name field is anonymized with a company name function. The Contact Name field is anonymized with a person name function. Other Name fields are anonymized as regular strings. Fields named FirstName and LastName are anonymized as first names and last names, respectively.

    Personally identifiable information: Email, Phone, and URL fields are identified by their corresponding field types.

    National ID numbers: Social Security Number, Social Insurance Number, and National Insurance Number fields are identified via the “MaskType” property. Fields with the ‘all’ MaskType are anonymized via the SIN anonymization function.

    Financial credit card fields: Identified via the ‘creditCard’ MaskType.

    Other: All encrypted string fields are also anonymized as string fields, according to their original length.
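
    The identification rules above amount to a small decision procedure over a field's description. The sketch below shows one way such a procedure could look. It is illustrative only: the function name, the returned anonymizer names, and the order in which the rules are checked are assumptions, as are the `ssn`, `sin`, and `nino` mask type values and the case-insensitive substring match on the Label.

```python
# Label fragments used to spot address parts (assumed substring match).
ADDRESS_LABELS = ("street", "city", "state", "country", "postal code")
# Assumed mask type values; the article only names 'all' and 'creditCard'.
NATIONAL_ID_MASKS = {"ssn": "ssn", "sin": "sin", "nino": "nino", "all": "sin"}
# Field types that map directly to an anonymizer.
TYPE_RULES = {"email": "email", "phone": "phone", "url": "url"}


def pick_anonymizer(sobject, name, label, field_type, mask_type=None):
    """Return a hypothetical anonymizer name for a field, or None."""
    if mask_type == "creditCard":
        return "credit_card"
    if mask_type in NATIONAL_ID_MASKS:
        return NATIONAL_ID_MASKS[mask_type]
    if field_type in TYPE_RULES:
        return TYPE_RULES[field_type]
    for part in ADDRESS_LABELS:
        if part in label.lower():
            return part.replace(" ", "_")
    if name == "Name":
        # Account -> company name, Contact -> person name, others -> plain string.
        return {"Account": "company", "Contact": "person_name"}.get(sobject, "string")
    if name in ("FirstName", "LastName"):
        return "first_name" if name == "FirstName" else "last_name"
    if field_type == "encryptedstring":
        # Remaining encrypted strings keep their original length.
        return "string_same_length"
    return None
```

    For example, `pick_anonymizer("Account", "Name", "Account Name", "string")` returns `"company"`, and a field of type `email` returns `"email"` regardless of its label.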

    Field History Tracking

    The Anonymize application anonymizes the records in the selected Salesforce sandbox. It does not modify Field History Tracking data in Salesforce, because History tables are read-only. When using our product to anonymize data in a sandbox, we recommend disabling Field History Tracking and re-enabling it afterward if required.

    You can turn off Field History Tracking from the object’s management settings. For more information, see the Salesforce article on disabling Field History Tracking in your sandbox here.

    Compliance Categorization and Data Sensitivity Level

    When using the Anonymize application, we take into account the Data Sensitivity Level and Compliance Categorization set on fields in Salesforce. We read the data sensitivity level from the field and determine whether the field needs to be anonymized.

    A field is marked as sensitive information based on the following field properties:

    Compliance Categorization: The compliance acts, definitions, or regulations that are related to the field’s data. When the field contains a value that is not public, we suggest an anonymization value.

    Data Sensitivity Level: The sensitivity of the data contained in this field. When the field contains a default value of Internal, Confidential, Restricted, or MissionCritical, we suggest an anonymization value.

    Once the Compliance Categorization and Data Sensitivity Level are recognized, we suggest anonymizing the fields. If the field is marked as private or sensitive information, a replacement value is recommended automatically. However, if we are unable to recognize the categorization, the anonymization is randomized.

    NOTE: For checkboxes, the cache mechanism is used. For the first record, a replacement checkbox state (checked or unchecked) is selected at random. All subsequent records with the same original state receive that same new state.
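
    The checkbox behavior in the note can be sketched as follows. This is one reading of the note, not the application's code: the function name is hypothetical, and each original state is assumed to get its own independent random draw.

```python
import random


def anonymize_checkboxes(values, rng=None):
    """Map each original checkbox state to one randomly chosen new state."""
    rng = rng or random.Random()
    mapping = {}  # original state -> replacement state, filled on first use
    result = []
    for value in values:
        if value not in mapping:
            # First record with this original state: pick checked or unchecked.
            mapping[value] = rng.choice([True, False])
        # Later records with the same original state reuse that choice.
        result.append(mapping[value])
    return result


states = anonymize_checkboxes([True, False, True, True], random.Random(1))
```

    Under this reading, all records that were originally checked end up with the same new state, and likewise for unchecked records. Because the two draws are independent in this sketch, both original states may map to the same new state.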

    Additional Information

    For more information about the Anonymize application, see here.

    For more information about the Seeding application, see here.

    For more information about Replicate, see here.

     

     
