Business Matters:
Cleaning Up Your Information Wasteland
A very large enterprise, the U.S. Air Force (AF), realized several years ago that it had more than four petabytes of data saved on storage technology devices. Commissioning a study of the stored data that was conducted using industry standards and estimates, the AF learned that probably less than one-third of the data had been accessed in the previous six months.
In addition, the study estimated that the AF may have as much as 50% duplication in stored files across the enterprise – the result of a proliferation of Word documents, PowerPoint presentations, and other artifacts that were e-mailed to wide audiences, often as part of coordinating and synchronizing business processes.
Michael Corrigan and J. Timothy Sprehe, Ph.D.
In many cases, the owners of the data were no longer able to identify what it represented or justify why they had kept it for so long. The AF costs for operating and maintaining the storage technology to handle this volume of information were several hundred million dollars annually and growing. The conclusion drawn from the study was that the AF was creating giant “information landfills” that were expensive and wasteful.
Reflecting upon this phenomenon, the AF realized that the enterprise was failing to manage information to the end of its life cycle. In a typical case, a project’s information and lessons learned would be harvested, then managers would store the project data without deciding its final fate and move on to the next project. To some degree, this practice amounted to institutional abandonment of information management responsibilities, and it wasted scarce financial resources.
Although the AF had an enterprise records management program, this uncontrolled storage practice had been occurring independent of its scrutiny for decades – a widespread phenomenon in many large enterprises. Had records program personnel been asked, they would have determined whether this stored data constituted enterprise records, and the appropriate retention and disposition discipline would have been imposed. If the data had risen to the level of an official record, it would have been retained; if not, it would have been destroyed.
Information Asset Management
The experience of scrutinizing enterprise use of data storage technology was one trigger that led to AF Policy Directive 33-3, Information Management (AFPD 33-3), in March 2006 (www.e-publishing.af.mil/shared/media/epubs/AFPD33-3.pdf).
The directive states that the AF manages all information as assets that must be available to authorized personnel and applies the same management principles to all information assets regardless of source, owner, security classification, media, location, or other defining characteristics.
According to AFPD 33-3, “Conceptually, viewing information as an ‘asset to be managed’ incorporates and aligns into one discipline the multiple disciplines traditionally associated with IM [information management] (data management, records management (RM), multimedia management (MM), documents management, workflow management, and publications/forms management).”
Any piece of information within the enterprise – document, e-mail, database, and so forth – is now treated as an information asset (IA). AFPD 33-3 embodies the AF information management strategy for enterprise-wide implementation of information asset management (IAM).
IAM treats all information (e.g., data, documents, e-mails, records, spreadsheets, digital images, multimedia, databases) as IAs to be managed equally under the same general principles:
- Information is an asset so long as it has positive value to the enterprise. When information ceases to have positive value, it is a burden to the enterprise and should be disposed of or retired.
- Information is a time-related asset. Information has positive value to the enterprise for a definable period of time, ranging from the present moment to some future definable point or to the life of the enterprise (permanent records).
- The value of information depends on the ability of the enterprise to discover, access, understand, and consume the information.
- Each IA shall have assigned to it at the moment of creation a period for which it is to be retained and instructions for disposition of the information at the end of that period.
- IAs will be managed so as to minimize the effort required by personnel and applications to conduct information management activities and enterprise services. IAM explicitly entails a high degree of automation occurring in the background and transparent to the desktop user.
The implications of IAM are that each IA carries a metadata element that stipulates how long the IA is to be retained (the retention period) and instructions for the IA’s disposition at the end of the retention period. If the period is temporary, a metadata element indicates how the information is to be disposed of, usually by destruction. In other words, all IAs are to be managed with records management discipline.
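As a rough illustration of what retention and disposition metadata might look like on a single IA, consider the following minimal sketch. It is not the AF implementation; the class name, field names, and the choice to express the retention period in days are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class InformationAsset:
    """Hypothetical IA carrying retention and disposition metadata."""
    asset_id: str
    created: date
    retention_days: Optional[int]  # None stands for permanent retention
    disposition: str               # e.g. "destroy" or "retain"

    def retention_expires(self) -> Optional[date]:
        """Date the retention period ends, or None for permanent assets."""
        if self.retention_days is None:
            return None
        return self.created + timedelta(days=self.retention_days)

    def is_due_for_disposition(self, today: date) -> bool:
        """True once the retention period has run out."""
        expires = self.retention_expires()
        return expires is not None and today >= expires


def assets_due(assets, today):
    """Return the assets whose disposition instructions should now be applied."""
    return [a for a in assets if a.is_due_for_disposition(today)]
```

A background job could call `assets_due` on a schedule and apply each asset's disposition instruction, which is the kind of automatic removal from storage that the principles above call for.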
The AF examined existing commercial-off-the-shelf (COTS) software systems to ascertain whether they would support IAM. Enterprise content management systems (ECMS) are the closest fit to IAM, inasmuch as ECMS suites typically contain document management, records management, workflow management, web content management, case file management, and the like.
However, in many cases, ECMS COTS products do not directly support the desired IAM because, while ECMS suites contain the pieces of IAM, the pieces are not joined into an architected framework conceptually consistent with IAM.
AF created its own framework for IAM based on service orientation, in which COTS ECMS tools, augmented by minimal custom programming, can be used to realize the framework. The issue currently facing the AF is operationalizing this framework for IAM across a widely diverse set of geographically dispersed installations and two million people in the enterprise of the Department of the Air Force.
Information Assets and Records
The purpose of declaring that all IAs will have an associated retention and disposition is to ensure that the enterprise manages information according to lifecycle principles, from its creation to its final disposition. Data storage technology abuse will not occur when the enterprise manages its information according to the principles of IAM, because any IA entering storage will possess retention/disposition features that trigger its removal when the retention period has expired.
Determining when information rises to the level of a record conventionally entails making a judgment about its nature and its value to the enterprise. Bearing directly on this judgment is the legal and regulatory framework in which the enterprise exists. IAM operates on the premise that determining the retention period and disposition instructions for a given IA requires essentially the same reasoning required to determine whether something is a record.
The difference between an IA that is not a record and an IA that is a record resides in the AF records schedules approved by the National Archives and Records Administration (NARA). When an IA falls within an approved AF records schedule as determined by its metadata characteristics, the IA is a record and is managed as such. As defined by the international records management standard, ISO 15489-1:2001 Information and Documentation – Records Management – Part 1: General, metadata is the data describing context, content, and structure of records and their management through time. It is, literally, data about data.
User Involvement
AFPD 33-3 states that the AF will “minimize the effort required by personnel and applications to conduct information management activities and enterprise services.” This policy entails implementing IAM “with a high degree of automation consistent with DoD [Department of Defense] direction for creation of associated metadata ‘tagging’ for each asset.”
In practice, the objectives and principles mean that assignment of retention and disposition metadata to any IA occurs in the background, transparent to the desktop user and normally with no overt action by the user.
IAM relies on COTS software, minimal custom programming, and advanced systems to estimate whether an IA is a record and determine its disposition. These tools, operating as a web service at the server level and developed with the collaboration of the AF records officer, assign retention period and disposition instructions to every IA. They also determine whether the IA is an official record and modify the IA’s metadata accordingly.
Any IA reaching a server – for example, a document saved or e-mail received or sent – is automatically subject to the web services that assign metadata values. IAM eliminates the need to train users to recognize which documents or e-mails are records and to take action if they are.
The Metadata Environment
The most critical component of the AF IAM strategy is the creation and exploitation of metadata. AF has created a metadata environment (MDE) to generate, use, and manage the metadata that links the information user to the correct and authoritative IA sought. AF created a metadata specification based on the Department of Defense Discovery Metadata Specification (http://metadata.dod.mil/mdr/irs/DDMS/) with AF extensions to support information lifecycle management and more extensive information assurance metadata.
Figure 1 shows the components of the MDE. The MDE takes its basic structure from the AF Enterprise Architecture Data Reference Model and the AF Data Reference Model, which are the AF derivations of the Federal Enterprise Architecture and its reference models (www.whitehouse.gov/omb/e-gov/fea/).

MDE contains metadata registries, catalogs, and tools to enable effective operations. MDE treats all IAs the same, providing a consistent description of each asset through a uniform set of metadata and managing all metadata through a common set of services. MDE’s components are as follows:
- Metadata catalog. The MDE creates an entry in the metadata catalog for each instance of an IA and creates the values for the metadata tags for that instance. Discovery services then peruse the metadata catalog to find specific IAs. The metadata catalog entries link each IA to a service that delivers that asset. When a user conducts a query and locates the IA that best satisfies his or her query, the metadata catalog identifies the service that will deliver that IA.
- Metadata registry. The metadata registry holds the definitions for the various types of metadata. The MDE uses the metadata from the metadata registry to tag actual instances of IAs with actual metadata values to support discovery, lifecycle management, storage management, and categorization of the individual IAs. The metadata registry contains the following types of metadata:
  * Discovery metadata
  * Structural metadata
  * Semantic metadata, including taxonomies and vocabularies
  * Service metadata
  * Records metadata (such as DoD 5015.2-STD – Electronic Records Management Software Applications Design Criteria Standard at http://jitc.fhu.disa.mil/recmgt/)
  * Community of interest (COI) metadata. A COI is a collaborative group of people who must exchange information in pursuit of their shared missions, goals, interests, or business processes, and who therefore must have shared definitions for the information they exchange. COIs create metadata that is harmonized with that of other DoD-wide COIs through governance authority.
- Metadata service registry. The MDE includes the service registry that stores information about implemented services and service interfaces. The service registry provides access to a set of services to create, maintain, update, and manage the metadata in the MDE. The three major services are the:
  1. Discovery service or federated query
  2. Open source software based on the UIMA framework (http://incubator.apache.org/uima) called the Automated Metadata Population Service (AMPS)
  3. Asset registration service
For any file in shared space, AMPS operates within the MDE using a variety of tools to assign metadata values, such as key wording, automated metadata extraction software, and others. Numerous COTS products exist for automatically extracting metadata from IAs. The AMPS tools examine each IA and automatically extract the metadata to populate the metadata registry. The asset registration service is a registry of all IAs within the MDE.
AMPS deserves special notice as it includes application tools that go beyond records management. With AMPS extracting exhaustive metadata for every stored IA, all information management functions and operations can become more accurate, thorough, and rapid. For example, AMPS greatly improves search and retrieval for any information management purpose. Its adoption should render any component of an ECM suite more efficient and effective because AMPS operates at the infrastructure level. Hence, AMPS is a set of tools highly useful in its own right.
When a desktop user hits the “Save” button for a document or sends an e-mail, the document or e-mail becomes an IA stored in shared space and subject to the MDE. Operating in the background as a web service, the collection of services that make up the MDE automatically extracts the IA’s metadata and stores it in the metadata registry. A user searching for any IA is searching the metadata registry, a process that typically consumes only milliseconds. Only after the user has identified the correct IA from the metadata registry is full text produced.
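The save, tag, and search flow described above can be pictured with a small in-memory sketch. It assumes a simple exact-match lookup over metadata values; the class names, metadata keys, and service names are hypothetical and are not taken from the AF metadata environment or its specification.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """One entry per IA instance: its metadata plus the service that delivers it."""
    asset_id: str
    metadata: dict
    delivery_service: str  # hypothetical service name


class MetadataCatalog:
    """Toy stand-in for the store that discovery services query."""

    def __init__(self):
        self._entries = {}

    def register(self, asset_id, metadata, delivery_service):
        """Record the extracted metadata for a newly saved IA."""
        self._entries[asset_id] = CatalogEntry(
            asset_id, dict(metadata), delivery_service
        )

    def discover(self, **criteria):
        """Return entries whose metadata match every criterion exactly."""
        return [
            entry
            for entry in self._entries.values()
            if all(entry.metadata.get(k) == v for k, v in criteria.items())
        ]


catalog = MetadataCatalog()
catalog.register("doc-001", {"type": "document", "topic": "budget"}, "file-service")
catalog.register("msg-002", {"type": "email", "topic": "budget"}, "mail-service")

hits = catalog.discover(type="email", topic="budget")
```

The search touches only metadata. The full content would be fetched from the entry's `delivery_service` once the user has picked the asset, which mirrors the point that full text is produced only after the correct IA has been identified.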
Retention, Disposition, Records Status
According to NARA’s glossary of archival terms, a records schedule is: “A type of disposition agreement developed by a Federal agency and approved by NARA that describes Federal records, establishes a period for their retention by the agency, and provides mandatory instructions for what to do with them when they are no longer needed for current Government business.”
An IA’s retention period and disposition instructions differ from a records schedule in that they are not approved by NARA. They are, however, derived from AF records schedules within the MDE retention/disposition rules engine. The targeted end state for the MDE includes a rules engine that can be utilized by AMPS to support automatic generation of retention period and disposition instructions. The rules engine operates by comparing the IA’s metadata with detailed AF records schedules, selecting the schedule that most closely matches the IA’s metadata, and assigning to the IA the retention/disposition from that schedule. Figure 2 portrays the operation of the retention and disposition rules engine.

A records schedule is itself an IA existing within the MDE and may be represented by the set of metadata extracted from its description of records falling within the schedule, plus the schedule’s rules for retention and disposition. What particularly distinguishes a record IA is that its description, retention, and disposition have been approved by NARA.
For a given IA, the rules engine finds the appropriate retention/disposition by locating the best match among all AF records schedules’ metadata. The engine then assigns to the IA the retention/disposition from the best-matched schedule. In addition, if the IA’s metadata match a records schedule’s metadata within a given confidence level, the IA is designated as a record and so tagged in its metadata. The design of the rules engine permits associating an IA with more than one records schedule if matches are close enough.
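Because the rules engine was still under construction at the time of writing, the following is only a sketch of the matching idea, not its actual design. It assumes that an IA and each records schedule can be reduced to a set of metadata terms, and it uses Jaccard similarity as a stand-in for whatever matching method the engine adopts. The schedule identifiers, terms, retention values, and the 0.5 threshold are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordsSchedule:
    """Hypothetical records schedule reduced to metadata terms and rules."""
    schedule_id: str
    terms: frozenset
    retention_years: int
    disposition: str


def jaccard(a, b):
    """Share of terms in common: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_retention(ia_terms, schedules, record_threshold=0.5):
    """Pick the best-matching schedule and copy its retention/disposition.

    The IA is flagged as a record only if the match score reaches the
    (assumed) confidence threshold.
    """
    if not schedules:
        raise ValueError("at least one records schedule is required")
    ia_terms = frozenset(ia_terms)
    best = max(schedules, key=lambda s: jaccard(ia_terms, s.terms))
    score = jaccard(ia_terms, best.terms)
    return {
        "schedule_id": best.schedule_id,
        "retention_years": best.retention_years,
        "disposition": best.disposition,
        "score": score,
        "is_record": score >= record_threshold,
    }


schedules = [
    RecordsSchedule("contracts", frozenset({"contract", "vendor", "payment"}), 6, "destroy"),
    RecordsSchedule("training", frozenset({"training", "course", "roster"}), 3, "destroy"),
]

strong = assign_retention({"contract", "vendor", "invoice"}, schedules)
weak = assign_retention({"course", "lunch", "menu", "parking"}, schedules)
```

In this sketch every IA receives a retention period and disposition from its closest schedule, but only a sufficiently close match is tagged as a record. Associating an IA with more than one schedule, which the design permits, would mean returning every schedule above a threshold rather than the single best match.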
At this writing, the IAM retention/disposition rules engine is still in the process of being constructed, although its conceptual bases and design parameters are well understood.
An Integral Process
IAM, as conceived within the AF, automates all desktop user records management decision making. Records managers must still design records file plans and create records schedules, as well as oversee and provide quality assurance for the records management operations within the MDE. Staff must still be trained in the importance of records to their work and the enterprise.
IAM assigns metadata values to all digital IAs, whether documents, e-mails, or anything else, thereby rendering them discoverable within the MDE. To the extent that IAM is realized in practice, it eliminates the necessity to train desktop users to recognize what information constitutes a record and how to take appropriate action. In this sense, IAM also automates e-mail records management and solves the matter of e-discovery.
IAM is a generalized information management strategy intended to encompass the full range of enterprise information management functions, not just records management. Search and retrieval of information for any purpose, for example, is an integral aspect of IAM.
The IAM model is vendor neutral in regard to the ECMS in a given enterprise and indeed should improve the effectiveness of any ECMS. Because MDE operates as a web service at the server level, it is transportable and scalable in principle to any intranet environment regardless of the ECMS resident in that environment.
For an enterprise willing to invest in its metadata infrastructure, IAM resolves the vexing problems of managing e-mail, handling e-discovery, and motivating desktop users to carry out records management operations.
Michael Corrigan and J. Timothy Sprehe, Ph.D., can be contacted at jtsprehe@jtsprehe.com.
From May - June 2010