Information classification versus data classification – We were here before IT!
Because information varies in value, people learned very early to categorize it – to „separate the wheat from the chaff” – and to focus their efforts on protecting what mattered most. Whether it was Rome’s struggle against Hannibal, the medieval trade in amber imported from Polish territories, or the list and proportions of ingredients needed to produce gunpowder in ancient China – all this information was available only to a select few. People intuitively sensed whom to entrust it to and from whom to hide it. This „feeling” for the confidentiality of information was a major underlying cause of the concept of information classification as we know it today.
Information classification is the practice of applying labels to data that indicate who may process it and how (together with the information it contains).
Attributes – the popular CIA triad (though the ACID model is also worth knowing):
- Confidentiality – or „secrecy”: the higher this attribute, the stronger the measures needed to protect the information from unauthorized access.
- Integrity – or „immutability”: determines how strongly the content must be protected against unauthorized modification.
- Availability – describes who the information may be shared with and on what terms, and ensures that authorized users can reach it when they need it.
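The three attributes can be sketched as a simple label structure. This is an illustrative assumption only – the level names and the 1–3 rating scale below are made up for demonstration and are not part of any standard:

```python
from dataclasses import dataclass

# Illustrative sketch: the sensitivity levels and the 1-3 rating
# scale are assumptions for demonstration, not part of any standard.
LEVELS = ["public", "confidential", "sensitive", "protected"]

@dataclass
class ClassificationLabel:
    level: str            # one of LEVELS
    confidentiality: int  # 1 (low) .. 3 (high): protection from unauthorized access
    integrity: int        # 1 .. 3: protection against unauthorized modification
    availability: int     # 1 .. 3: how widely the information may be shared

    def rank(self) -> int:
        """Numeric rank of the sensitivity level, for comparisons."""
        return LEVELS.index(self.level)

payroll = ClassificationLabel("protected", confidentiality=3, integrity=3, availability=1)
press_release = ClassificationLabel("public", confidentiality=1, integrity=2, availability=3)
```

Ranking labels numerically makes it easy to compare two documents’ sensitivity, or to enforce a rule such as „everything above rank 1 stays off the public share”.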
Classification is a fundamental element of securing information in any organization. It is the process of identifying and assigning predetermined levels of sensitivity to different types of information.
Individual organizations define the types of information that fit each category in their own way.
Usually, as good practice dictates, they use a unified sensitivity rating system:
If your organization doesn’t classify its data properly, you can’t protect it properly.
Information classification requires knowledge of its location, content, volume and context. In the era of forced technological transformation, IT resources are of course the priority location. This is where IT security comes in: the process of ensuring compliance with the security policy for the computerized part of the information system. Information leaks, knowledge of vulnerabilities enables their exploitation, data exfiltration becomes the goal of attacks, and so on.
Each of the aforementioned scenarios results in financial, operational and reputational losses. If you want to learn more, a good starting point is the risk analysis prepared by your IT department.
Today, modern companies store a major share of their information in the form of data – they process it, store it, share it, lose it and obtain it. These data are distributed across many repositories, accessed immediately (and often with poor control) from various devices by various (not necessarily authorized) users:
- Databases – local or in the cloud
- Microsoft SharePoint platforms
- Cloud Storage services
- Files such as spreadsheets, PDFs, Word documents and e-mail
Let’s say you are a security analyst in a financial or public institution – an organization whose users create millions of files containing information every day. Some of this information is highly confidential, and if it falls into the wrong hands, you could lose anywhere from hundreds of thousands to millions in penalties, damages and lost sales opportunities. That does not change the fact that most of the data created each day could run in a TV news ticker without causing any incident.
The purpose of data classification is to capture the few percent of critical data hidden in the organizational „noise” and make them visible. However, this is not the only goal.
In its Market Guide for File Analysis Software (a category that includes data classification systems), Gartner lists four general areas where this software is useful:
Security
- Restrict access to information containing personal data (PII)
- Control the location of, and access to, intellectual property (IP)
- Reduce the attack surface around sensitive data
- Provide an additional rule-enforcement parameter for other tools, e.g. DLP
Governance / Compliance
- Help identify data subject to GDPR, HIPAA, CCPA, PCI, SOX and future regulations
- Apply metadata tags to protected data to allow for additional tracking and control
- Provide for quarantine, legal hold, archiving, and other regulatory actions
- Make it much easier to implement the „right to be forgotten” and data subject access requests (DSARs)
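The metadata-tagging idea above can be sketched minimally. Storing tags in a sidecar JSON file is an assumption made purely for illustration – real products typically write tags into document properties or a central index instead:

```python
import json
from pathlib import Path

def tag_file(path: str, tags: dict) -> Path:
    """Persist classification tags next to the original file so that
    archiving, legal-hold or DSAR tooling can find and act on them.
    Sidecar-file storage is an illustrative assumption."""
    sidecar = Path(str(path) + ".tags.json")
    sidecar.write_text(json.dumps(tags, indent=2))
    return sidecar

def read_tags(path: str) -> dict:
    """Read the tags previously written for a file."""
    return json.loads(Path(str(path) + ".tags.json").read_text())
```

For example, `tag_file("contract.docx", {"classification": "confidential", "regulation": "GDPR", "legal_hold": True})` leaves a machine-readable trail that downstream compliance tooling can query.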
Performance and optimization
- Provide effective access to content based on type, usage, etc.
- Locate outdated or redundant data
- Help optimize processes – e.g. identify heavily used data for migration to faster technologies or cloud infrastructure
- Tag metadata to optimize business operations
- Inform the organization of the location and use of data
Different organizations define the types of information that fit each category and choose which of the above areas they want to improve. Their schemes often share a common hierarchy of sensitivity: protected, sensitive, confidential and public. However, data classification is a much broader topic.
Data classification can be done based on content, context, or user choices:
- Content-based – involves scanning files and documents and classifying them based on what they contain or represent
- Context-based – classifies files based on metadata such as the application that created the file (e.g. MS Word), the person who created the document (AD account), or the location where the files were created or modified (e.g. a specific repository)
- User-based – relies on the user creating or editing a document/file to choose the classification label manually
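The three schemes can be combined in a toy classifier. Everything here – the PII patterns, the label names and the `hr-share` repository – is an assumption for illustration, not any product’s actual rule set:

```python
import re
from typing import Optional

# Deliberately simplistic example patterns for content-based detection.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def classify(content: str, metadata: dict, user_label: Optional[str] = None) -> str:
    # User-based: an explicit label chosen by the author wins.
    if user_label:
        return user_label
    # Content-based: scan the text for sensitive patterns.
    if any(p.search(content) for p in PII_PATTERNS.values()):
        return "sensitive"
    # Context-based: fall back to metadata, e.g. the source repository.
    if metadata.get("repository") == "hr-share":  # hypothetical repository name
        return "confidential"
    return "public"
```

Real products layer these schemes in exactly this order of precedence far more elaborately, but the principle – explicit user choice, then content scanning, then contextual defaults – is the same.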
The above classification schemes are used in File Analysis Software tools, which allow companies to take the first step toward protecting information. You cannot protect something that has not been precisely located and defined. And it has long been known that an inventory – especially a data inventory – is a prime object of auditors’ interest.