
Unstructured data discovery is really organizational knowledge management
The purpose of Unstructured Data Discovery
Published on 13 April 2021
Data discovery is generally used to create an inventory of all corporate data, structured and unstructured, identifying regulated data (e.g. data subject to CCPA or GDPR) and data that is business-sensitive and/or mission-critical. It is also the first step in establishing data-centric security, governance, policies and controls. For such controls to be effective, they need to be closely aligned with how business users share data. The goal of discovery must therefore be to map the discovered data to business processes, business-oriented functions and artefacts. This in turn ensures that discovery aligns with (and actively contributes to) knowledge management.
“Knowledge Management is the process of capturing, distributing, and effectively using knowledge.” (Davenport)
Knowledge Management is often based on building a catalogue or dictionary of information. Software platforms which support knowledge management are almost universally built around a hierarchical taxonomy. For the purpose of governance, a hierarchical business process / function catalogue is usually the best option.
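As an illustration only, a hierarchical business process / function catalogue can be sketched as a simple tree. The class name `CatalogueNode`, its methods and the sample entries below are hypothetical and are not taken from any particular product:

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueNode:
    """One entry in a hierarchical business process / function catalogue."""
    name: str
    children: dict[str, "CatalogueNode"] = field(default_factory=dict)

    def add(self, path: list[str]) -> "CatalogueNode":
        """Create (or reuse) the nodes along `path` and return the last one."""
        node = self
        for part in path:
            node = node.children.setdefault(part, CatalogueNode(part))
        return node

    def paths(self, prefix: tuple[str, ...] = ()) -> list[str]:
        """List every leaf entry as a 'Function / Process / Data type' string."""
        here = prefix + (self.name,)
        if not self.children:
            return [" / ".join(here)]
        result: list[str] = []
        for child in self.children.values():
            result.extend(child.paths(here))
        return result


# Hypothetical sample entries: function -> process -> data type
catalogue = CatalogueNode("Company")
catalogue.add(["Procurement", "Supplier management", "Suppliers' contracts"])
catalogue.add(["Procurement", "Supplier management", "Purchase orders"])
catalogue.add(["HR", "Recruitment", "Candidate CVs"])

leaf_paths = catalogue.paths()
for path in leaf_paths:
    print(path)
```

Each leaf is a data type reached through the business function and process that own it, which is what makes the hierarchy readable to business users as well as to security teams.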
Unstructured data discovery requires investment and technology as the volumes of data are often very large. You can read about how DocAuthority saves businesses time and money here.
To protect the corporate landscape against a potential breach, relying on existing data classifications or legacy permissions may not be the best method. Without a well-planned, centralized approach, managing the threat of a breach is a substantial challenge.
Big Picture
Without a unified view of data assets and their business affiliation and associated risk, no business-wide, holistic policy baseline can exist. Furthermore, it is hard to quantify (in financial, legal and brand terms) the scale or business impact of a breach. A unified view makes it much easier to justify what data should be protected and to what extent.
Inconsistency
Data classification, DLP, access management and data retention activities vary significantly among business units and departments. When policies (classification in particular) are defined on a per-file or per-folder basis, we rely on end users to make informed decisions. Unfortunately, end users cannot be relied upon to have consistent security knowledge or to prioritise security activities. The resulting variance is considerable. Hence, classification is inconsistent and, nearly always, incomplete.
Timescale
It is difficult to recover from large-scale breaches, because identifying what was compromised or lost in a large, cross-silo dataset takes significant time and resources.
Privacy
Data privacy is a challenge. It is hard to differentiate between the different types of documents that contain PII, and to identify their purpose and authorized use.
To see the big picture and reduce risk management complexity, a data catalogue is required. A catalogue is a centralized “Yellow Pages” for sensitive / mission critical and regulated information. The catalogue maps the data hierarchically within the organization in a way that is comprehensible to security, business, and management.
The data catalogue items describe the “What” and answer the question “What data do we actually handle here?”. Each catalogue item deals with the type of data, its essence. This makes it possible to subsequently define a specific policy for each data type. As an example, you might have a department (or more than one) that handles “suppliers’ contracts”. This type of data may not be homogeneous, as there might be several types of “suppliers’ contracts” across the company, and several related assets associated with those contracts. These may be spread over many physical locations across the company. Without a catalogue, it is difficult to apply exactly the intended policy to this specific data type everywhere it is stored.
A data catalogue enables policy baselining for data risk management, protection, and governance across the entire organization, enabling high-quality, consistent and systematic enforcement. Consistency and quality are enabled because policies are assigned to catalogue items that share the same business use. Departmental SMEs define policies based on the catalogue items (and their business usage), rather than relying on end users’ decisions.
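The idea of assigning policies to catalogue items rather than to individual files can be sketched as follows. This is a minimal sketch under stated assumptions: the `Policy` fields, the inheritance of a parent item's policy, the file paths and the policy values are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    classification: str   # hypothetical label, e.g. "Confidential"
    retention_years: int  # hypothetical retention period


@dataclass
class CatalogueItem:
    name: str
    parent: Optional["CatalogueItem"] = None
    policy: Optional[Policy] = None  # set by a departmental SME, not by end users

    def effective_policy(self) -> Optional[Policy]:
        """Walk up the hierarchy until an item with a policy is found.

        Inheriting the parent's policy is an assumption of this sketch.
        """
        item: Optional[CatalogueItem] = self
        while item is not None:
            if item.policy is not None:
                return item.policy
            item = item.parent
        return None


procurement = CatalogueItem("Procurement", policy=Policy("Internal", 3))
contracts = CatalogueItem("Suppliers' contracts", parent=procurement,
                          policy=Policy("Confidential", 7))
orders = CatalogueItem("Purchase orders", parent=procurement)

# Output of discovery: file location -> catalogue item (hypothetical paths).
file_index = {
    "//fileserver-london/legal/acme_msa.docx": contracts,
    "//sharepoint/procurement/2020/acme_renewal.pdf": contracts,
    "//fileserver-nyc/ops/po_1042.xlsx": orders,
}


def policy_for(path: str) -> Optional[Policy]:
    """Resolve a file's policy through its catalogue item, not its location."""
    item = file_index.get(path)
    return item.effective_policy() if item is not None else None
```

In this sketch, the two contract files live in different locations but resolve to the same policy because they map to the same catalogue item, while the purchase order falls back to the policy of its parent item.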
How to Build a Data Catalogue
Building a sensitive data catalogue involves five steps:
Summary
A data catalogue enables a consistent, systematic and unified approach to data-centric security and governance for mid-size and large organizations. If done correctly, it is a one-time effort that can sustain itself over time. It is best to manage an unstructured data catalogue separately from the structured data catalogue, as their structure and policies differ.