Any data that can be used, by itself or in combination with other data, to uniquely identify a specific individual is considered Personally Identifiable Information (PII). Traditionally, the PII considered most sensitive has included Social Security numbers, mailing addresses, email addresses, and phone numbers, since each can be used to identify an individual. However, with the rapid rise in the volume of data collected from social media, mobile devices, websites, and various user-tracking technologies, a lot of secondary data that may not look like PII on its own can be combined with other collected information to identify an individual through statistical analysis, AI/ML, and similar techniques. This has pushed all regulated organizations that handle PII to address the following business requirements:
- Quickly capture and classify data at scale for its sensitivity.
- Encrypt or de-identify the sensitive data collected before it reaches external networks, to reduce its exposure to threat vectors.
- Apply a zero-trust approach and use least-privilege methods to selectively identify the data based on strict Role-Based Access Control (RBAC).
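As a rough illustration of the first requirement, classifying data for sensitivity can start with simple pattern matching. The sketch below is a minimal example, not a description of any product's classifier: the pattern set, the labels, and the function names are all assumptions made for illustration, and real classifiers need much broader coverage.

```python
import re

# Illustrative patterns only (assumed for this sketch); real-world
# classification needs many more data types and validation rules.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def classify(text: str) -> set[str]:
    """Return the label of every pattern found in the text."""
    return {label for label, pattern in PII_PATTERNS.items() if pattern.search(text)}


def classify_record(record: dict[str, str]) -> dict[str, set[str]]:
    """Map each field name to the PII labels detected in its value."""
    return {field: classify(value) for field, value in record.items()}
```

Calling `classify_record` on a record returns, per field, the set of labels that matched, which a later step could use to decide which fields to encrypt or tokenize.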
Common challenges with Identification and De-identification of PII data
Security practitioners around the world agree unequivocally that encrypting PII data is the best way to protect it. That's the easy part, however. The challenging part is capturing and encrypting PII data at scale, especially when the source of the data is a transient application such as a container or a serverless function such as AWS Lambda or Azure Functions. Additionally, here are some real-world needs and challenges:
- Applications generating/consuming PII data may not allow code changes. Think SaaS/PaaS.
- Data may go through multiple hops and may need to be quickly identified/de-identified at each hop depending on the business jurisdictions.
- When large amounts of data (TBs) need to be migrated from on-premises to the cloud, de-identification is required at a very high rate (~1M operations per second).
- Not all sensitive data is of a well-known data type, such as SSN, date of birth, or email address, that incumbent solutions can de-identify or tokenize easily. Some de-identification requirements are more complex: a de-identified date may need to fall within certain bounds, or the house number in a street address may need to stay below a certain value so that it does not break the business logic of a real estate application that checks the validity of house numbers in a given neighborhood.
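To see why the operation rate matters for bulk migration, a back-of-the-envelope calculation helps. The data volume (10 TB) and the average of one tokenization per 100 bytes below are assumptions chosen purely for illustration; only the ~1M operations per second figure comes from the text above.

```python
def migration_hours(total_bytes: float, bytes_per_operation: float, ops_per_sec: float) -> float:
    """Hours needed to de-identify a data set at a given operation rate."""
    operations = total_bytes / bytes_per_operation
    return operations / ops_per_sec / 3600


TB = 10**12

# Assumed workload: 10 TB, one tokenization per 100 bytes on average.
hours_at_1m = migration_hours(10 * TB, 100, 1_000_000)  # about 28 hours
hours_at_10k = migration_hours(10 * TB, 100, 10_000)    # about 2,778 hours
```

Under these assumptions, the same job that takes a little over a day at 1M operations per second would take roughly 116 days at 10,000 operations per second.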
Key traits of an effective and practical solution
A complete solution ideally should address not only the current business challenges as mentioned above but should also be easily extensible and flexible enough to address future and more complicated data protection needs.
Here are some of the key traits you should keep in mind while evaluating solutions to protect your PII data globally:
- Cloud native and multi-cloud deployment
- Hybrid Deployment
- Hardware root of trust
- Global Software as a Service (SaaS) offering
- Versatility and Extensibility
Solution that works and objectively solves all real-world use-cases
Fortanix Data Security Manager (DSM) brings a modern, scalable, lightweight, flexible, and cloud-friendly solution to help customers protect their PII data at the source, in transit, and at rest. Before we dive into each of the use cases, let's recap what DSM offers:
- Flexible consumption options
- a. Available in Cloud Marketplaces with granular core-based pricing to help you get the best ROI.
- b. Supports multicloud deployment, where you can run the nodes of a single cluster across different clouds.
- c. Supports hybrid cloud deployment, where you can run some nodes on-premises and some nodes in the cloud of your choice as part of the same cluster.
- d. Offered as a global SaaS with FIPS 140-2 Level 3 compliance.
- One stop shop for all data protection needs
- a. Offers a full suite of data protection services, such as Tokenization, Dynamic Data Masking, Key Management, Transparent Data Encryption, Application Encryption, Secrets Management, key management for legacy third-party HSMs, and multicloud key management.
- Cluster to cluster peering at group level
- a. DSM offers a unique architectural tenet in which you can selectively use keys from another DSM cluster or from a third-party HSM while using the control plane of your primary DSM cluster. This lets you use the same control plane and URL for all your applications. The approach is especially useful in cloud deployments: your primary cluster can be deployed in the cloud, while some highly sensitive cloud applications selectively perform cryptographic operations inside Fortanix HSMs outside the cloud to meet key residency or localization requirements.
Fortanix’s approach to data de-identification and identification
DSM uses the NIST-approved FF1 method for format-preserving encryption (FPE) to de-identify (tokenize) and identify (de-tokenize) PII data. The main advantage and differentiator of Fortanix's solution is how it fits into a customer's complex architecture and performs data identification and de-identification seamlessly:
- It can be deployed and consumed at the application host, so data is de-identified right at the source. This is achieved by caching the key in memory at the application host, which allows a very high rate of tokenization and de-tokenization operations per second.
- It can also sit as a proxy in front of the applications and transparently de-identify data without any code change in client applications. Additionally, it can automatically identify data without needing a key identifier to be passed, and it can integrate with user-defined RBAC.
- For highly sensitive data, if the business mandates that data identification and de-identification must happen inside hardware appliances, DSM offers a third approach in which data is streamed to the centrally deployed DSM cluster for identification and de-identification.
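To make the idea of format-preserving encryption concrete, here is a toy sketch. It is explicitly not FF1: the real FF1 algorithm is specified in NIST SP 800-38G and uses an AES-based round function, whereas this sketch swaps in HMAC-SHA256 inside a simple Feistel network over decimal digits. The function names and the round count are assumptions for illustration, and the code should not be used to protect real data. It only shows the property that matters here: a string of digits goes in, a string of digits of the same length comes out, and the operation is reversible with the key.

```python
import hashlib
import hmac

ROUNDS = 10  # must be even so the two halves end up back in their original positions


def _round_value(key: bytes, round_no: int, half: str, modulus: int) -> int:
    """Keyed pseudo-random integer in [0, modulus) derived from one half."""
    message = f"{round_no}:{half}".encode()
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _check(digits: str) -> None:
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a string of at least two ASCII digits")


def fpe_encrypt(key: bytes, digits: str) -> str:
    """Toy format-preserving encryption of a digit string (NOT FF1)."""
    _check(digits)
    split = len(digits) // 2
    left, right = digits[:split], digits[split:]
    for i in range(ROUNDS):
        width = len(left)
        modulus = 10 ** width
        mixed = (int(left) + _round_value(key, i, right, modulus)) % modulus
        left, right = right, str(mixed).zfill(width)
    return left + right


def fpe_decrypt(key: bytes, digits: str) -> str:
    """Inverse of fpe_encrypt: runs the rounds backwards."""
    _check(digits)
    split = len(digits) // 2
    left, right = digits[:split], digits[split:]
    for i in reversed(range(ROUNDS)):
        width = len(right)
        modulus = 10 ** width
        original = (int(right) - _round_value(key, i, left, modulus)) % modulus
        left, right = str(original).zfill(width), left
    return left + right
```

With any key, `fpe_decrypt(key, fpe_encrypt(key, "123456789"))` returns `"123456789"`, and the intermediate token is itself a nine-digit string, so it still fits a column or field that expects nine digits.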
Regardless of business requirements and architectural fit, with all of the methods above the encryption key is always stored securely at rest inside the central DSM cluster. This helps you meet local key residency requirements, because you control where the DSM cluster is deployed: in a specific cloud region, within your own data center, or in a specific region of our SaaS service. Let's look at each of these methods in more detail:
Data Security Accelerator (DSA)
Data Security Accelerator (DSA) is an ideal solution when you need a very high rate of data tokenization and de-tokenization with negligible latency. It is offered as a library (Java, JCE, PKCS#11) as well as a web service that can be inserted as a container into a mesh of microservices or serverless functions, such as AWS Lambda or Azure Functions, for highly scalable and auto-scalable data protection needs.
De-identifying the data at the source gives you a subpoena-proof approach to protecting your PII data. However, it requires you to change your application code to insert the tokenize and de-tokenize function calls to DSA.
We understand that changing the application code is sometimes not practical. To address that, we offer another state-of-the-art solution that lets you transparently encrypt your PII data as it hits the network.
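The in-memory key caching that makes host-side tokenization fast can be pictured roughly as follows. This is a generic sketch, not the DSA API: the class name, the `fetch` callable (standing in for whatever call retrieves key material from the central cluster), and the time-to-live policy are all assumptions for illustration.

```python
import time
from typing import Callable, Optional


class CachedKey:
    """Keeps key material in memory and refetches it after ttl_seconds."""

    def __init__(
        self,
        fetch: Callable[[], bytes],  # hypothetical call to the central key store
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._key: Optional[bytes] = None
        self._expires_at = 0.0

    def get(self) -> bytes:
        """Return the cached key, fetching it only when missing or expired."""
        now = self._clock()
        if self._key is None or now >= self._expires_at:
            self._key = self._fetch()  # the only step that would touch the network
            self._expires_at = now + self._ttl
        return self._key
```

Every tokenize or de-tokenize call on the host would then use `cache.get()` instead of a remote call, so only the first call in each time-to-live window pays the round trip to the cluster.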
Fortanix Transparent Encryption Proxy (TEP)
Transparent Encryption Proxy (TEP) bundles an NGINX proxy under the hood and transparently tokenizes and de-tokenizes data on the fly, based on a predefined data classification schema. To make the entire process easier and more scalable, data identification (de-tokenization) happens automatically using TEP's intelligent data tagging technology.
This solution also gives you a subpoena-proof approach to de-identifying your PII data almost at the source. It is ideal for use cases where data is read from or written to services that are outside of your control, such as public SaaS services.
Nevertheless, if you need very high throughput for data transformation, such as bulk or batch de-identification of terabytes or petabytes of data sitting at rest in data lakes, then Data Security Accelerator is the better fit, as it can be easily integrated with data lake or database-specific User Defined Functions (UDFs).
Finally, let's explore the use case where you can change the application code to call external APIs to tokenize and de-tokenize data, but compliance requirements prohibit you from exporting the key, which DSA needs in order to cache it. In such cases, Fortanix offers a versatile approach in which all operations on your data are performed inside the central DSM cluster.
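Conceptually, schema-driven transformation at a proxy looks something like the sketch below. The schema format (dotted field paths mapped to data type labels), the function name, and the `transform` callback are hypothetical and chosen only for illustration; TEP's actual schema format and tagging mechanism are not described here.

```python
import copy
from typing import Any, Callable

# Hypothetical classification schema: dotted field path -> data type label.
SCHEMA = {
    "customer.ssn": "ssn",
    "customer.email": "email",
}


def apply_schema(
    payload: dict[str, Any],
    schema: dict[str, str],
    transform: Callable[[str, str], str],
) -> dict[str, Any]:
    """Return a copy of payload with every schema-listed string field transformed."""
    result = copy.deepcopy(payload)
    for path, data_type in schema.items():
        *parents, leaf = path.split(".")
        node: Any = result
        for name in parents:
            node = node.get(name) if isinstance(node, dict) else None
            if node is None:
                break  # path not present in this payload
        if isinstance(node, dict) and isinstance(node.get(leaf), str):
            node[leaf] = transform(data_type, node[leaf])
    return result
```

The client application sends its payload unchanged; the proxy applies `apply_schema` with a tokenizing `transform` on the way out and a de-tokenizing one on the way back, which is why no client code change is needed.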
De-identifying and Identifying data inside DSM Cluster
This is a straightforward solution with three deployment options: deploy a horizontally scalable, FIPS 140-2 Level 3 compliant, hardware appliance-based solution in your private cloud or data centers; consume our globally deployed, FIPS 140-2 Level 3 compliant SaaS service; or deploy a horizontally and vertically scalable cloud-native cluster, either container-based or based on cloud VMs across multiple availability zones or regions.
Regardless of which of the three approaches above you use to meet your PII data protection needs, the definition of data types for tokenization, dynamic data masking policy management, SSO integration with your central AD, and key management are all handled centrally from a single control plane within your central DSM cluster.
DSM offers many commonly used data types out of the box. However, organizations often run into situations where they need to de-identify complex data types that do not fit any pre-existing data type. To meet such demands, DSM offers a highly customizable token type that can be shaped to handle almost any level of complexity.
A custom token can be built out of many parts, where each part can be restricted to numbers only, characters only, and so on. To cater to more complex needs, numeric components can also be restricted to a range between minimum and maximum integer values. This is just one example; there are many more complex use cases that a custom token can solve out of the box.
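The bounded numeric part is worth a small illustration. The sketch below shows one simple way to tokenize an integer so that the token always stays within the same minimum and maximum as the input, for example a house number between 1 and 200. It is an assumption-laden toy rather than how DSM implements custom tokens: it builds a keyed permutation of the whole range by sorting the values by their HMAC, which is only practical for small ranges, and all function names are made up for this example.

```python
import hashlib
import hmac


def keyed_permutation(key: bytes, low: int, high: int) -> list[int]:
    """Order the values low..high by their HMAC, giving a key-dependent shuffle."""
    def rank(value: int) -> bytes:
        return hmac.new(key, str(value).encode(), hashlib.sha256).digest()
    return sorted(range(low, high + 1), key=rank)


def tokenize_bounded(key: bytes, value: int, low: int, high: int) -> int:
    """Map value to a token that is guaranteed to lie within [low, high]."""
    if not low <= value <= high:
        raise ValueError("value outside the allowed range")
    return keyed_permutation(key, low, high)[value - low]


def detokenize_bounded(key: bytes, token: int, low: int, high: int) -> int:
    """Recover the original value by locating the token in the permutation."""
    if not low <= token <= high:
        raise ValueError("token outside the allowed range")
    return low + keyed_permutation(key, low, high).index(token)
```

Because the mapping is a permutation of the allowed range, every token is itself a valid house number for the downstream validity check, and two different inputs never collide on the same token.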
Key Takeaways
- Fortanix can effectively protect any type of PII data at scale: at the source, in transit, or at rest.
- Whether you need an auto-scalable cloud-native solution, a globally deployed SaaS solution, or a self-managed, air-gapped, HSM-grade on-premises solution, we have you covered.
- Fortanix is a data-first, multicloud security company and can deterministically help you protect every bit of your PII data across public and private clouds and SaaS services.
Originally published at https://fortanix.com.