Keys for Securing Private Information in an EDMS

Electronic document management/imaging systems (EDMS) are subject to the same rules and regulations as other information systems. However, because an EDMS is document-centric, applying these rules and regulations to it has its own particular characteristics. Anyone setting out to implement information privacy rules in an EDMS will need to bear this in mind.

Norman Mooradian, Ph.D.


Moreover, success in translating information privacy rules into the domain of an EDMS will be essential to any information privacy initiative. Because an EDMS accounts for a great deal of the information captured and managed by many organizations, failure to address the peculiar challenges it raises could leave organizations vulnerable.

High-Level Privacy Requirements

Most organizations will capture and manage diverse kinds of personal information that are protected or regulated under distinct legislative regimes at the international, national, provincial, or state levels. Although it is impossible in a single article to describe how to implement the requirements from all of these regimes, a set of common, high-level requirements can be abstracted and used as the basis for any implementation.

There are advantages to starting design and implementation planning from a set of higher-level requirements. First, it will help identify a common basis for all compliance protocols, thereby providing a comprehensive approach to diverse compliance needs. Second, it will provide a basis for personal information not covered by a particular legislative regime. Finally, it will provide the basis for a unified platform for technical development and implementation. This unified platform will enable common tools to be created, used, reused, and adapted to different and changing needs. The platform will simplify future development, administrator training, and end-user training.

The following requirements are abstracted from a number of frameworks and legislative regimes. They capture the main elements of most significant privacy legislation, such as those found in the U.S. Privacy Act of 1974, the Safe Harbor Agreement, and the Organization for Economic Co-Operation and Development Guidelines for the Security of Information Systems and Networks, even though terminology in the legislation may differ.

  1. Consent: Obtain consent for collection and for additional uses or sharing of information.
  2. Notice: Provide notice for additional uses and sharing of information.
  3. Access: Restrict access to authorized individuals and parties.
  4. Use: Limit use to minimum needed for legitimate business activities.
  5. Retention: Retain information only as long as needed for legitimate business purposes.
  6. Security: Provide adequate security for information.
  7. Audit Trail: Maintain a record (history/audit trail) of uses of information.
  8. Review: Provide subjects access to their data to view and correct information.

From a technical perspective, requirements 3 through 5 constitute the starting point and core of any privacy-compliant EDMS.

Overview of an EDMS

From a privacy perspective, the fundamental difference between an EDMS and other business applications is that the EDMS has a dual information structure. While business applications capture and manage data or information in traditional databases, an EDMS captures and manages document files. An EDMS, however, uses databases to manage these document files. So, at a basic level, there can be two categories of personal information in an EDMS: that contained in the document file as its content or metadata and that contained in the database in data tables as index data or metadata.

An EDMS provides standard functional capabilities. These include the ability to store documents in repositories (typically called folders); retrieve documents through index (database) searches or full-text searches; browse or navigate to documents based on directory structures; and print, e-mail, or export documents singly or en masse. Also, systems can convert image files of text documents into readable text and move documents through workflows based on document attributes and events. Elements of an EDMS similar to other business applications include user accounts, user groups, user privileges assignable to user accounts or groups, security controls, reporting functions, automated processes, and audit trail creation.

The division of information into the categories of index data and documents means there will be two targets for privacy controls. The policies established for the different kinds of personal information will apply to both types of information, including an individual’s full name, Social Security number, date of birth, address, gender, race, marital status, medical information, and employee performance evaluations.

If using an EDMS to store employee files, expect that some or all of this information will be stored in the EDMS. Unlike in other systems, however, it will be captured and managed in two different ways. First, it will be captured as data in a database management system such as MS SQL Server or Oracle, where its primary use will be for identifying, classifying, and retrieving documents. Second, it will be stored in the documents themselves as part of their content. The majority of personal information, sensitive and non-sensitive, will be stored in the documents. Nevertheless, the primacy of documents should not lead to neglecting the personal information that is stored as index data.
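
The dual structure can be pictured with a short Python sketch. The class names, field names, and sample values below are illustrative assumptions, not the schema of any particular EDMS; the point is only that one indexed document gives two places where personal information can live.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentFile:
    """The document itself: content plus file-level metadata."""
    file_name: str
    content: str  # text or OCR output; may hold personal information
    metadata: dict = field(default_factory=dict)


@dataclass
class IndexRecord:
    """The database row used to identify, classify, and retrieve the document."""
    document_type: str
    fields: dict  # index values such as a name or Social Security number
    document: DocumentFile


def privacy_targets(record, personal_fields):
    """List the two places where personal information can live for one record."""
    return {
        "index_data": sorted(k for k in record.fields if k in personal_fields),
        "document": record.document.file_name,
    }


# Placeholder values only; the field names are hypothetical.
w2 = IndexRecord(
    document_type="W2 Form",
    fields={"Last Name": "Doe", "Social Security Number": "000-00-0000", "Tax Year": "2007"},
    document=DocumentFile("w2_2007.tif", content="(scanned image)"),
)
targets = privacy_targets(w2, {"Last Name", "Social Security Number"})
```

Both entries in the result need controls: the index fields in the database and the document file they point to.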

Index Data and Documents

Index data plays two roles with respect to privacy in an EDMS. First, as described above, it can contain personal information and can, therefore, be a target of privacy controls. Second, as information descriptive of documents and their content, it can itself be part of privacy controls. Following is a look at both of these functions.

Index Data

Index data can include sensitive personal information. This may not be obvious at first because index data is not the primary information in an EDMS and makes up a small fraction of the overall data. Nevertheless, it can contain small pieces of data that can be harmful if disclosed.

Consider the example of an index template used for personnel documents shown in Figure 1.

Figure 1: Index Template

The index data shows that it is associated with a W2 form. Clearly, the W2 form contains the bulk of the confidential information; in particular, it contains the person’s income information for the particular position. But notice that there is an index field for the Social Security number. It is there to facilitate retrieval, not to capture that piece of information as part of the record. Nevertheless, it is sensitive information, and as a result, controls will need to be in place to limit access to and sharing of this data.

The second point about index data is that it can be used as part of a privacy controls package. When designing index templates, consider what information or attributes are necessary to facilitate managing the documents under privacy rules. Make policies for each type of personal information the organization collects.

An index form, therefore, should include a field that contains the name of the information type as it is named in the policy. In the figure, there is a document type field, and the value is “W2 Form.” There should be a policy that explicitly refers to W2 forms. This will allow a connection between compliance rules and documents indexed under this template. The rules may be automated within the system or may be followed through manual processes. Most likely, it will entail both.

The important thing is that there is a descriptive connection between the document and the privacy policy. In the event that the document description used to retrieve the document does not correspond to the lexicon or taxonomy of the policies, add a field to the index for that value.

In the example figure, this would not normally be a problem. However, if the organization’s policy used a more general term such as “Tax Information,” the organization would need to add a field to hold that value. The field might be labeled “Privacy Status.” The form would then have two descriptive fields: one used by end users to retrieve the document (Document Type: W2), and one used to manage it in accord with privacy rules (Privacy Status: Tax Information). The decision to add a specific field for privacy status should be consistent across all document types, even if the two descriptive fields are redundant in some cases.
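
As a minimal sketch of that descriptive connection, the lookup below resolves a document's governing policy from its Privacy Status field. The policy table, its placeholder text, and the function name are assumptions made for illustration; a real system would hold its policies wherever the organization maintains them.

```python
# Hypothetical policy table, keyed by the term the privacy policy itself uses.
POLICIES = {
    "Tax Information": "Policy section on tax records (placeholder text)",
}


def policy_for(index_values):
    """Resolve the governing policy from the Privacy Status index field.

    Raises KeyError when the field is missing or names no known policy,
    so that unmanaged documents surface instead of passing silently.
    """
    status = index_values["Privacy Status"]
    return POLICIES[status]


# Two descriptive fields: one for retrieval, one for privacy management.
w2_index = {"Document Type": "W2", "Privacy Status": "Tax Information"}
governing_policy = policy_for(w2_index)
```

Failing loudly on a missing or unknown Privacy Status is a design choice in this sketch; it makes gaps between the index and the policy lexicon visible during indexing rather than during an audit.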

Other fields on the form are also relevant to privacy controls. The retention fields are used to implement a retention schedule. This, of course, is a fundamental part of records management and should cover all documents in the system, not just the ones containing privacy information. That being said, privacy requirements mandate that personal information not be kept longer than is necessary for serving business needs. In assessing the retention of documents containing personal information, apply greater scrutiny to such information and require that business needs have sufficient weight to warrant retention. Implement the retention requirements for the personal information in the EDMS so end users can easily identify documents and index data that have passed their expiration period.

Making retention information an explicit part of the indexing or metadata set supports a number of system functions that will make this monitoring process easier for end users. At the most basic level, it makes it possible for users to perform simple queries that return all documents that have passed their destruction or removal date.
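
A simple query of this kind might look like the following sketch, which assumes each index record carries a hypothetical "Destruction Date" field. The records and dates are made-up sample values.

```python
from datetime import date


def expired(index_records, today):
    """Return records whose destruction date is earlier than `today`."""
    return [r for r in index_records if r["Destruction Date"] < today]


# Sample index records; field names and dates are illustrative only.
records = [
    {"Document Type": "W2 Form", "Destruction Date": date(2007, 12, 31)},
    {"Document Type": "Performance Evaluation", "Destruction Date": date(2010, 6, 30)},
]
overdue = expired(records, today=date(2008, 4, 1))
```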

At a more advanced level, the EDMS may have automation tools that will identify these documents and take some action. The system could automatically move the expired documents to a holding place for review, send a notification to the relevant persons, and generate a report or list for review. This kind of functionality should be employed for all document types, but it is especially urgent where compliance is concerned and may merit some extra elements to reduce risk.
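
The automated steps described here can be sketched as a single sweep that sets expired documents aside for review and notifies the responsible person. The record layout, the function name, and the notification callback are assumptions; nothing is destroyed in this sketch, since the text calls for review before action.

```python
from datetime import date


def sweep_expired(records, today, notify):
    """Split records into (holding, active) and notify reviewers about the holding list."""
    holding = [r for r in records if r["destruction_date"] < today]
    active = [r for r in records if r["destruction_date"] >= today]
    if holding:
        names = ", ".join(r["name"] for r in holding)
        notify(f"Retention review needed for: {names}")
    return holding, active


# Sample data; in practice `notify` might send e-mail or create a workflow task.
messages = []
records = [
    {"name": "w2_2000.tif", "destruction_date": date(2008, 1, 1)},
    {"name": "evaluation_2007.doc", "destruction_date": date(2012, 1, 1)},
]
holding, active = sweep_expired(records, date(2008, 4, 1), messages.append)
```

The holding list doubles as the report for review, and the active list is what remains in normal circulation.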


Documents

Document files are the center of any EDMS. They will clearly be the main source of the personal information captured and managed by a system. The majority of privacy controls will therefore focus on them. To devise effective controls, it is important to understand the main differences between document files and the data managed in traditional business applications.

Structured vs. Unstructured – Documents can differ from each other along a spectrum of what information professionals label “structured” and “unstructured” data. Structured data is data that is stored in a database; it is described this way because it is stored in fixed tables and fields. Unstructured data does not reside in fixed locations. An example is text in a word processing document. While some documents can be described as structured and others as unstructured, some fall somewhere in the middle of the spectrum.

Forms are an example of documents that fall on the structured end of the spectrum. Forms are usually broken down into fields that are filled out by a customer, employee, or some other party. The document is conceived of as being made up of predefined pieces, with (in most cases) a predictable range of values for each of the pieces. In this respect, forms resemble database records in a table. In fact, using recognition technologies, such documents are often used to generate data for a database.

On the other end of the spectrum are documents that have no predefined fields and that do not in any other way set prior constraints on what kind of information will be in the document. Correspondence is a good example of a type of unstructured document.

Employee evaluations are an example of documents that fall somewhere in the middle of the spectrum. They may be organized into forms-like areas with boxes to be filled in, as well as sections for written descriptions and comments that might vary considerably.

The distinction between structured and unstructured documents has implications for privacy controls. When evaluating structured documents for privacy risks, identify which fields on the form can contain personal information and its degree of sensitivity. Knowing the range of values a particular field will have enables one to assess the privacy risks associated with that document type. Then, appropriately classify the document type with respect to its sensitivity level. Because fields on a form will likely differ in terms of their sensitivity, privacy controls should be based on the most sensitive field, not the least.
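
The rule that a form takes the level of its most sensitive field reduces to a maximum over field levels. In this sketch the three-level scale and the level assigned to each field are assumptions chosen for illustration.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Illustrative three-level scale; a real policy may define more levels."""
    NONE = 0
    LOW = 1
    HIGH = 2


def classify_form(field_levels):
    """A structured document type takes the level of its most sensitive field."""
    return max(field_levels.values(), default=Sensitivity.NONE)


# Hypothetical assessment of fields on a W2 form.
w2_fields = {
    "Employer Name": Sensitivity.NONE,
    "Employee Address": Sensitivity.LOW,
    "Social Security Number": Sensitivity.HIGH,
}
w2_level = classify_form(w2_fields)
```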

With unstructured documents, however, this method of risk assessment will not be possible. For any category of correspondence, it may be unclear what kind of personal information might be contained in a given document. To determine the maximum sensitivity level possible for such documents, review a sample and predict what other documents in this category may contain. Then make a judgment about the proper classification of the document type. One option is to take into account both the predicted frequency and the sensitivity of the personal information in question, weighing them against each other. The other is to base classification on sensitivity alone.
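
The two bases for classifying an unstructured document type can be compared in a small sketch. It assumes a reviewer has scored each sampled document on a numeric sensitivity scale; the `min_share` threshold is an invented parameter that stands in for whatever weighing of frequency against sensitivity the organization adopts.

```python
def classify_by_sensitivity(sample_levels):
    """Sensitivity alone: the highest level seen in the sample decides."""
    return max(sample_levels, default=0)


def classify_weighted(sample_levels, min_share=0.1):
    """Frequency and sensitivity: the highest level reached by at least
    `min_share` of the sampled documents decides."""
    if not sample_levels:
        return 0
    total = len(sample_levels)
    for level in sorted(set(sample_levels), reverse=True):
        share = sum(1 for s in sample_levels if s >= level) / total
        if share >= min_share:
            return level
    return 0


# Twenty sampled letters: nineteen with no personal data, one highly sensitive.
sample = [0] * 19 + [2]
strict_level = classify_by_sensitivity(sample)
weighted_level = classify_weighted(sample)
```

With this sample the two approaches disagree: sensitivity alone yields the highest level, while the weighted approach treats a single sensitive letter in twenty as too rare to set the classification. Which result is acceptable is a policy decision, not a technical one.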

File Formats – Documents can also differ in terms of their file format. There are many specific file formats, but a broad classification is the difference between an image file, which represents a document as pictorial graphic data, and a text file, which represents a document as computer-readable text. One implication of this difference has to do with the ability to edit or change documents.

Image files cannot be changed easily. That means they are more secure, but it also means that they are harder to correct. Consequently, to satisfy the privacy requirement to provide subjects access to their data to view or correct it, it may be necessary to replace the image file with a new one, whereas, in the case of a text file, a simple edit may suffice.

Another implication of this difference has to do with securing segments of the document. Generally, document files are managed as wholes. Access to the document means access to everything in it. In most systems, image files can be redacted, and security can be set on the redactions. This allows sensitive information in the document to be hidden during viewing and printing. An EDMS often does not provide this feature for text files, though it may.

Designing Security Access Restrictions

A security scheme must be designed to meet the core access restriction requirements, as identified above:

  • Restrict access to authorized individuals and parties
  • Limit use to minimum needed for legitimate business activities

Taken together, these require that end users have the minimum necessary access to perform the tasks associated with their jobs, and that they not be given access to the parts of a document beyond what is needed.

So, for example, a human resources staff member may have the responsibility of checking that W2 forms have been filed. Under these requirements, he or she should have access to W2 forms, but no other document types. Further, given that the content of W2 forms is sensitive, the staff member should be prevented from seeing the actual content of the document, if this is practicable and appropriate to the person’s level of authorization.

To meet these core access restriction requirements, security schemes must be sufficiently fine-grained to divide privileges in this task-specific way. Because each EDMS will provide different tools and methods for applying security schemes, the EDMS may have limitations on how fine-grained the scheme can be. Therefore, it’s best to work out a scheme as independently from a system’s features as possible. This will bring a number of benefits. It will:

  • Give a clearer idea of what the security requirements are
  • Provide a product-neutral set of requirements that can be used to transition to a new system
  • Provide a criterion against which to measure the system
  • Identify gaps that can be addressed by policies, procedures, and methods outside the system

A common way of conceptualizing security schemes, and one that is probably inadequate for meeting the minimum necessary requirement, is called the “organization chart method.” This method supposes that access requirements will correspond to the organization’s structure of divisions, departments, and workgroups. However, an organizational scheme is not fine-grained enough to meet the minimum necessary requirement in typical situations. Employees from different workgroups, departments, and divisions often need to share documents. They may not need the whole case file or dossier, but they need part of it to perform a particular business function. To compensate for this shortcoming, system administrators will often add hybrid groups to supplement the organization chart scheme. This, however, can become quite difficult to manage over time.

An alternative approach to creating a security scheme that is sufficiently fine-grained to meet the minimum necessary requirement is called the “taxonomy model.” This model is based on the document taxonomy created to organize and store the documents. As mentioned above, the taxonomy should reflect or correspond to the naming conventions used in privacy policies. If it does not, then one that does should be created, and it should be used for the security scheme.

Like organization charts, taxonomies are hierarchical. However, they describe the organization’s information assets and how they are related, providing a clear structure upon which access rules can be based. Further, because it is hierarchical, a taxonomy allows the organization to determine what levels of the taxonomy are needed to meet security needs. Below is an illustration:

  • Employee Records
    • Performance Evaluations
    • Medical Records
    • Insurance

Using such a scheme, user access can be conceptualized as relating to the bottom-level descriptors, e.g., “Employee Records-Insurance.” This scheme allows users to be assigned to this document type based on their need to access it, whatever work group they belong to. If a user needs access to another kind of employee record, he or she can be assigned to it. In documentation, the scheme would appear as follows:

  • Employee Records
    • Insurance
      • Group, Users

The documentation would explain, for example, that “Group” includes all employees who work in HR benefits and might include in-house attorneys or specific employees from finance who need to monitor insurance expenses.

The advantage of such a scheme is that users will have access to all that they need and no more in a way that is easy to understand and manage. When a user no longer needs access to the document type, he or she can simply be removed from that document descriptor (i.e., from the group or list of users) without adjusting any of the objects in the security scheme. Using the organization chart method, such changes are more awkward, as the user’s access needs can change even without a position change.
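
A minimal sketch of the taxonomy model follows, with access lists attached to bottom-level descriptors. The descriptor strings follow the illustration above, while the group names, user names, and function names are hypothetical.

```python
# Access lists keyed by bottom-level taxonomy descriptors (names are hypothetical).
ACCESS = {
    "Employee Records/Insurance": {"groups": {"HR Benefits"}, "users": {"finance.analyst"}},
    "Employee Records/Medical Records": {"groups": {"HR Benefits"}, "users": set()},
}

# Group membership is kept separately, as it would be in most systems.
MEMBERSHIP = {"benefits.clerk": {"HR Benefits"}, "finance.analyst": {"Finance"}}


def can_access(user, descriptor):
    """Grant access only if the user, or one of the user's groups, is listed."""
    entry = ACCESS.get(descriptor)
    if entry is None:
        return False  # unlisted document types are denied by default
    return user in entry["users"] or bool(MEMBERSHIP.get(user, set()) & entry["groups"])


def revoke(user, descriptor):
    """Remove a user from one descriptor without touching any other object."""
    ACCESS[descriptor]["users"].discard(user)
```

Here the finance analyst can open insurance records without joining an HR group, and `revoke("finance.analyst", "Employee Records/Insurance")` removes that access by editing a single list.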

Another advantage of the taxonomic method is that it is extensible. It allows document type divisions to be added to reflect security needs. Also, as access to a document type is granted in relation to a specific business purpose, it allows purposes or uses to be added to the taxonomic security scheme for even finer-grained restrictions. Hence, if there are multiple legitimating purposes for accessing employee insurance files, these can be added as shown below:

  • Employee Records
    • Insurance
      • Purpose/Use1
      • Purpose/Use2
        • Groups, Users
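
Extending the scheme with purposes might be sketched as follows: each grant names a document type and a purpose, and a user's groups determine which purposes apply. The purpose names and group names are invented for illustration.

```python
# Grants keyed by (taxonomy path..., purpose); all names are hypothetical.
GRANTS = {
    ("Employee Records", "Insurance", "Claims Processing"): {"HR Benefits"},
    ("Employee Records", "Insurance", "Expense Monitoring"): {"Finance Reviewers"},
}


def allowed_purposes(user_groups, doc_path):
    """Return the purposes for which any of the user's groups may open the document type."""
    return sorted(
        purpose
        for (*path, purpose), groups in GRANTS.items()
        if tuple(path) == doc_path and groups & user_groups
    )


finance_purposes = allowed_purposes({"Finance Reviewers"}, ("Employee Records", "Insurance"))
```

An empty result means the user has no legitimating purpose for that document type and should be denied access.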

Implementing Access Restrictions

Once the security scheme is created, it can be implemented using EDMS tools. Some systems will support fine-grained security while others will be somewhat limited. The tools may not appear to be designed with an organization’s particular scheme in mind. For example, the system may allow user groups to be created and added to the security profile of a document type, but its designers may have supposed that the groups would correspond to organizational groups such as departments and divisions. Nevertheless, groups can be named so they reflect the document type and in this way allow a taxonomic security scheme to be implemented.

In addition to allowing security to be placed on document types, either by repository location (folder/subfolder) or via the document indexing template, an EDMS may allow security access restrictions to be applied on a variety of document-related tools. Relevant tools or objects include functions such as document queries, document automation tools (e.g., workflow, automated processing), index data displays such as folder columns, and index fields. The more options it has, the better.

The minimum necessary rule and the dual data structure of an EDMS require a flexible and comprehensive set of security-related features. For example, as discussed above, index data for documents may include sensitive personal information. The ability to hide that data on the index template or in the document list can be quite important. Also, parts of a document may be sensitive, while others may not be. It will be important to have the ability to place redactions on a part of a page and secure the redaction so unauthorized users cannot disable it and see the content.
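
Hiding sensitive index data can be sketched as a filter applied before index values are displayed. The restricted-field table, the group names, and the mask string are assumptions; an actual EDMS would expose this through its own field-level security settings, if it supports them.

```python
# Index fields with restricted visibility: field name -> groups allowed to see it.
# Field and group names are hypothetical.
RESTRICTED_FIELDS = {"Social Security Number": {"HR Payroll"}}


def visible_index(index_values, user_groups, mask="***"):
    """Return index values for display, masking fields the user may not see."""
    shown = {}
    for name, value in index_values.items():
        allowed = RESTRICTED_FIELDS.get(name)
        if allowed is None or allowed & user_groups:
            shown[name] = value
        else:
            shown[name] = mask
    return shown


# Placeholder values only.
row = {"Document Type": "W2 Form", "Social Security Number": "000-00-0000"}
clerk_view = visible_index(row, {"HR Filing"})
payroll_view = visible_index(row, {"HR Payroll"})
```

The filing clerk can still confirm that a W2 form exists and retrieve it by document type, while the Social Security number stays masked in the document list.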

A Manageable Challenge

Implementing privacy controls in an EDMS poses special, but manageable, challenges. Because an EDMS, unlike traditional data systems, contains such a diversity of information, personal information can be exposed in a variety of places through a variety of methods. These multiple risks must be addressed through careful planning in the design of the system and in the use of its various security and access control tools. With a review of the system’s features, and perhaps a little imagination, it should be possible to make an EDMS privacy-compliant in relation to current requirements and privacy-ready in relation to future ones.

Norman Mooradian may be contacted at

 From March - April 2008