While the need to protect the data contained in computer systems has existed for decades, the massive adoption of cloud computing and the proliferation of data protection regimes have made it more crucial and urgent to identify and control sensitive data.
Information security used to be mostly a technical matter: prevent and detect intrusions, unauthorized logins, and denial-of-service attacks through firewalls, intrusion detection systems (IDS), enforcement of strong passwords, and employee education about social engineering exploits. Nowadays, security covers a much broader scope.
Information stored in the cloud is de facto shared with another party, the Cloud Service Provider (CSP), and in many organizations this information sharing triggers release compliance rules that are specific to each industry – finance, healthcare, defense, etc. Some organizations are updating their internal policies to consider the CSP as an extension of their enterprise and IT infrastructure, but this is not the rule today.
Protecting personal information became an important concern for organizations when the European Union voided the “safe harbor” provisions that had allowed U.S.-based companies to self-certify that they correctly protected that information. This led to the General Data Protection Regulation (GDPR) and a host of similar, sometimes directly GDPR-inspired, regulations in other jurisdictions (PIPEDA in Canada, CCPA in California, etc.).
The two issues (cloud storage and personal data protection) compound each other when the custodians of personal data store it in the cloud. For example, when data breaches occur, we often see finger-pointing between the parties or delays in making the affected customers aware that their information has been compromised.
And despite all the publicity about personal data protection, there are other types of information that may need to be controlled. In a recent white paper from the Object Management Group’s Cloud Working Group[1], four “compliance domains” are listed:
- Personal data: personally identifiable information (PII), protected health information (PHI), financial records, etc.
- Data subject to export control: for example, the technical specifications of a product may be subject to government restrictions because of the country of destination or of origin.
- Data and documents that embody protected intellectual property (patentable inventions, trade secrets, copyrighted material).
- Classified information, especially pertaining to military- or dual-use products and systems.
Controlling such data means identifying it, knowing where it came from, where it is stored, who can access it, where it is accessed from, whom it may be sent to, etc. – in other words, establishing and practicing solid data governance. While this is necessary even in a private data center, the need is more acute in a cloud deployment.
One of the key steps to establish data governance is data discovery – scanning databases and file systems to identify the presence of information that is likely to require attention. For example, some data discovery products use rules based on regular expressions to identify the presence of phone numbers, social security numbers, etc., or lists of common family names to identify that a particular column of a database may contain names.
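As a rough illustration of the rule-based discovery just described, the Python sketch below flags a column when most of its values match a pattern or appear in a name list. Everything in it is an assumption made for illustration: the two U.S.-style patterns, the five-entry name list, the `classify_column` function, and the 80% threshold are hypothetical and far simpler than what a commercial discovery product would use.

```python
import re

# Illustrative patterns only (U.S. formats); real rule sets are much richer.
PATTERNS = {
    "us_phone": re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

# Hypothetical, tiny sample standing in for a list of common family names.
COMMON_FAMILY_NAMES = {"smith", "johnson", "garcia", "nguyen", "martin"}


def classify_column(values, threshold=0.8):
    """Return the labels whose rule matches at least `threshold` of the non-empty values."""
    cleaned = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not cleaned:
        return []

    labels = []
    # Regular-expression rules: count how many values fully match each pattern.
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in cleaned if pattern.match(v))
        if hits / len(cleaned) >= threshold:
            labels.append(label)

    # Dictionary rule: count how many values appear in the family-name list.
    name_hits = sum(1 for v in cleaned if v.lower() in COMMON_FAMILY_NAMES)
    if name_hits / len(cleaned) >= threshold:
        labels.append("family_name")

    return labels
```

Working on a sample of column values rather than single cells is a deliberate choice here: one stray match proves little, while a high match rate across a column suggests that the column deserves attention.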
To identify sensitive data, you need to know what you are looking for. Taxonomies of protected data play a crucial role in making data governance more systematic. Existing information security and cloud data governance standards generally stay at too high a level – principles, policies, roles, responsibilities – and neither provide specific lists nor focus on specific compliance requirements. The OMG paper, on the other hand, lists specific taxonomies related to the four compliance domains listed earlier:
- A list of personal data attributes, classified into various privacy levels
- Export control categories specified in the U.S. Export Administration Regulations (EAR) and the International Traffic in Arms Regulations (ITAR)
- A taxonomy of “governing agreements,” such as non-disclosure agreements, that define the use and protection of intellectual property
- Data taxonomies for defense information from NATO and from the U.S. Department of Defense
- A taxonomy of Controlled Unclassified Information (CUI) published by the U.S. National Archives and Records Administration (NARA).
Data custodians need to investigate not only whether these taxonomies of data types and attributes apply to them, but also what other countries’ regulations may apply, given the location of the data (an issue known as “data residency”) and the nationality or residency of the people and organizations involved.
Once sensitive data is identified, using a combination of these taxonomies and data discovery tools, one must implement various mechanisms to ensure it is properly governed. These may include:
- Adopting the eXtensible Access Control Markup Language (XACML) standard from OASIS
- Adopting the OMG’s Information Exchange Framework (IEF)
- Implementing security information and event management (SIEM) solutions for monitoring and alert management
- Leveraging the Open Cybersecurity Schema Framework (OCSF) and Intrusion Detection Message Exchange Format (IDMEF)
- Using distributed ledger technology (DLT), such as blockchain, to secure data, track transactions, and prevent tampering
- Adopting the data management best practices codified by DAMA International in their Data Management Body of Knowledge (DMBOK).
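To make the tamper-prevention idea behind the ledger item more concrete, here is a minimal Python sketch of a hash-chained access log. It is not a distributed ledger or a blockchain: it shows only the hash-linking step that makes later alteration detectable, and it leaves out distribution, consensus, and signatures. The names `append_entry` and `verify_chain` and the record layout are hypothetical.

```python
import hashlib
import json

# Placeholder "previous hash" for the first entry in the chain.
GENESIS_HASH = "0" * 64


def _digest(record, prev_hash):
    """Hash the record together with the previous entry's hash."""
    payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain, record):
    """Append a record, linking it to the hash of the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    entry = {"record": record, "prev": prev_hash, "hash": _digest(record, prev_hash)}
    chain.append(entry)
    return entry


def verify_chain(chain):
    """Recompute every hash; any edited record or broken link returns False."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["record"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True
```

For example, after appending two access records, `verify_chain` returns `True`; changing a field in the first record afterwards makes it return `False`, because the stored hash no longer matches the recomputed one.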
All of this will not fully resolve data security and control – many challenges will remain, for example related to retention and deletion of data. But it provides a foundation for a critical series of steps toward good data governance:
- Determine the business goals of the organization
- Determine which laws, regulations, contracts, etc., apply
- Decide how to label and track sensitive data
- Inventory the sensitive information (data discovery, supported by taxonomies)
- Find out what is missing in the current handling of information
- Create a remediation plan and a robust governance mechanism, with appropriate funding and staffing
- Procure and deploy the appropriate toolset to enforce the controls specified in the plan, with as much automation as possible.
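The labeling, inventory, and gap-finding steps above can be sketched in a few lines of Python. The four compliance domains come from the OMG paper cited earlier; everything else is an assumption made for illustration, including the `DataAsset` structure, the control names, and the `REQUIRED_CONTROLS` policy table. Actual requirements depend on the laws, regulations, and contracts that apply to the organization.

```python
from dataclasses import dataclass, field
from enum import Enum


class ComplianceDomain(Enum):
    PERSONAL_DATA = "personal data"
    EXPORT_CONTROL = "export control"
    INTELLECTUAL_PROPERTY = "intellectual property"
    CLASSIFIED = "classified information"


# Hypothetical policy table: the controls each domain is assumed to require.
REQUIRED_CONTROLS = {
    ComplianceDomain.PERSONAL_DATA: {"encryption_at_rest", "access_logging"},
    ComplianceDomain.EXPORT_CONTROL: {"access_logging", "residency_restriction"},
    ComplianceDomain.INTELLECTUAL_PROPERTY: {"access_logging"},
    ComplianceDomain.CLASSIFIED: {
        "encryption_at_rest",
        "access_logging",
        "residency_restriction",
    },
}


@dataclass
class DataAsset:
    """One inventory entry: where the data lives, its labels, and its current controls."""
    name: str
    location: str
    domains: set = field(default_factory=set)
    controls: set = field(default_factory=set)


def find_gaps(assets):
    """Map each asset name to the sorted list of required controls it lacks."""
    gaps = {}
    for asset in assets:
        required = set()
        for domain in asset.domains:
            required |= REQUIRED_CONTROLS[domain]
        missing = required - asset.controls
        if missing:
            gaps[asset.name] = sorted(missing)
    return gaps
```

The output of a check like `find_gaps` is only a starting point for the remediation plan: it lists what is missing per asset, while priorities, funding, and staffing remain decisions for the organization.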
[1] Object Management Group: Domain Taxonomies for Cloud Data Governance. December 2023. https://www.omg.org/cgi-bin/doc?mars/23-12-05