Future Watch: Strategies for Long-Term Preservation of Electronic Records
With the volume of electronic records exponentially growing and hardware and software constantly evolving, organizations face an ever-increasing challenge to maintain accessibility to those records that must be retained long-term. Strategies abound, but there is no one-size-fits-all solution; preservation requires a unique combination of tools, policies, procedures, and compromise.
Gordon E.J. Hoke, CRM
While most records are short-lived, my field research shows that up to 20% need retention for 10 years or more. Further, keeping records available for 25 years and beyond is common in insurance, utilities, finance, medicine/pharmaceuticals, and other industries. For physical records on paper or microform, this is not a major problem. For electronic records, long-term preservation presents a serious challenge without a widely accepted solution.
The fifth Generally Accepted Recordkeeping Principle® (GARP®) is the Principle of Availability, which states: “An organization shall maintain records in a manner that ensures timely, efficient, and accurate retrieval of needed information.”
A Personal Case in Point
My 7-year-old friend, Tina, just finished two years of radiation and chemotherapy for leukemia at the Mayo Clinic. Mayo started moving to digital records even before President Barack Obama signed into law the Health Care Reform Act of 2010, which included provisions to boost the use of electronic medical records to cut down on costly redundancy and waste. Tina has an excellent prognosis for a long, happy life. However, she will need her recent medical records in adulthood.
“Now we’ve learned there are long-term complications,” said Jennifer Wright, director of the Pediatric Cancer Late Effects Clinic at Salt Lake City’s Huntsman Cancer Institute, in a September 2011 Salt Lake Tribune article. “[Patients] still need close follow up by a specialist who is familiar with the treatment they received as children, as well as the risk factors they now face as adults.”
Obviously, Tina will not remember the drugs and dosages she received. The key to her adult health may be the availability of her 25-year-old digital medical records.
Sources define long-term differently, without consensus, but for this article, the term means 10 years or more. A noted security consultant (who prefers anonymity) uses the phrase “persistent records,” a useful, indefinite description. Within a 10-year span, most operating systems and application software change significantly. Security requirements change as well, and storage media may lose their integrity. In short, electronic records stored for 10 years or more may be in jeopardy.
Ray Kurzweil, eminent futurist and creator of the first optical character recognition program that could read any style of print, said, “… There is no set of hardware and software standards existing today, nor any likely to come along, that will provide any reasonable level of confidence that the stored information will still be accessible … decades from now.”
While there are many strategies for preserving the availability of digital records, there is no single solution, no best practice, and no established policies or procedures that meet widespread needs. The issues fall into four categories: storage media, hardware, software, and governance.
Storage Media Longevity
Many variables affect the lifespan of storage media, and no comprehensive, scientific evaluations advise the consumer. However, it is clear that each medium has an Achilles’ heel. For example, prudent care reduces, but does not prevent, deterioration of digital linear tape, such as:
- Increasingly brittle tape
- Failure of the adhesive that attaches the magnetic particles to the tape
- Exposure to magnetic fields
Optical disks use organic dyes that degrade over time, and few blank disks come with a date of manufacture. The conditions under which disks are shipped and stored are generally uncontrolled. It is hard to predict the longevity of a spindle of DVDs made from unspecified materials in western China, shipped over land to Shanghai, transported across the Pacific Ocean on a container ship, and sent by rail to a distributor in the United States.
Even solid state memory degrades over time due to cracking seals, unstable material, and environmental factors, including cosmic rays.
The lifespan of computers and their peripherals is shorter than that of persistent records. Only specialists retain disk drives for the 5.25-inch floppies of the 1990s or even the subsequent 3.5-inch disks. An Apple representative recently opined that its future computers would not support removable media; all storage and access to stored information would go through the cloud.
Similarly, most software is obsolete before the end of persistent records’ lifespan. Disk operating system (DOS), which was the most common operating system (OS) on personal computers in 1990, is rarely seen now. Microsoft stopped supporting its Windows 98 OS in 2006, and it makes no commitments to backward compatibility in future versions of Windows.
Generally, application software changes even faster. Updated versions of line-of-business programs debut regularly. The rapid churning of software developers and manufacturers means there are no guarantees of continuing support and compatibility. For mission-critical applications, the source code can be held in escrow, but putting that to work is expensive and sometimes impractical.
Governance may be the area of greatest vulnerability for persistent records. It is fraught with uncertainty because it requires action today based on predictions and assumptions about the future. Two basic governance issues cast doubt on the maintenance of persistent records.
First, each of the electronic options for long-term preservation, described below, has serious limitations. For example, a basic tenet of records management is provenance, the practice of organizing records in useful sequence, such as date of creation or entry into a records management program. Without special care, a digital record’s date (often entered as metadata) may be corrupted by something as beneficent as a virus scan. Similarly, when a record is transferred from an old medium to a newer one, the date of origin might be altered.
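The date problem described above can be illustrated with a short sketch. It assumes records are ordinary files and that the filesystem modification time stands in for the record's date; real records systems usually keep dates in their own metadata stores. The function name `migrate_with_dates` and the `.provenance.json` sidecar file are hypothetical, introduced only for this example.

```python
import json
import os
import shutil
from datetime import datetime, timezone


def migrate_with_dates(src: str, dst: str) -> dict:
    """Copy src to dst, carrying over and logging the modification time."""
    original_mtime = os.stat(src).st_mtime
    # shutil.copy2 copies the contents and then tries to preserve
    # timestamps; plain shutil.copy would leave dst stamped with the
    # time of the copy, silently changing the record's apparent date.
    shutil.copy2(src, dst)
    record = {
        "source": os.path.basename(src),
        "original_modified_utc": datetime.fromtimestamp(
            original_mtime, tz=timezone.utc
        ).isoformat(),
    }
    # Hypothetical sidecar file: the original date is written down so it
    # survives even if a later tool touches the copy's timestamp.
    with open(dst + ".provenance.json", "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)
    return record
```

Recording the date separately from the file is the more durable half of this sketch, since filesystem timestamps are exactly the kind of metadata that a scan or a transfer can overwrite.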
Second, there is no question that maintaining electronic record viability over the long haul requires effort and resources. Today, it is easy and irresponsible for a records manager to fail to make plans and investments for the future. In fact, current demands of most records programs fully consume the resources of their directors. Few have the assets, means, and foresight to make substantive provisions for the next 25 years and longer.
It seems presumptuous to assume that the next-generation records management staff will have the skills, time, technology, money, and motivation to maintain accessibility to persistent records from today, when current records managers have made no plans to help them.
“Technological obsolescence may not be the greatest problem! Organizational commitment, and the willingness to allocate sufficient resources, may be an even bigger problem,” noted authors David O. Stephens, CRM, and Roderick C. Wallace, CRM, in Electronic Records Retention: New Strategies for Data Life Cycle Management.
Given the challenges to preserving records’ availability over extended lifecycles, it is fortunate that several prudent strategies exist. Clearly, there is no single solution that is best for all situations. In noting the various technologies and techniques available, the best strategists will consider all and create a blueprint that best meets their needs.
Print to Paper
Tried and true, paper lasts hundreds (if not thousands) of years. Longevity depends upon stable materials and controlled storage. The techniques and risks are well known. Printing digital records is usually most appropriate for small quantities of records, as accessibility may suffer and costs will increase as volume rises. Printing in fonts easily recognized by scanner software facilitates a return to digital media upon records’ retrieval. A challenge in printing digital records is to retain associated metadata with the printout.
Establish a Computer Museum
Some organizations maintain their aging records by stopping the calendar, technologically speaking. IT directors sometimes assert that records within computer systems are viable, as long as the systems are active. They only see an issue when systems are retired. This is valid as long as the peripheral devices and supporting infrastructure still work with the aging system. When records’ retention period does not exceed the longevity of the system, this can be a usable strategy. It has limits, however. Recently, a septuagenarian who programs COBOL, which is one of the earliest computer programming languages, reported he continues to work for ever-higher pay because of the scarcity of his skill.
A related strategy creates a museum of computer hardware and software to accompany record archives. The idea is that when records are needed, all components can be re-activated to deliver old records. This potentially expensive strategy fails to address the instability of storage media, degradation of hardware, and the potential lack of human skills necessary to make old systems productive.
The history of digital storage is littered with obsolete devices and related software designed to preserve digital information. An array of drives, cartridges, juke boxes, silos, tape players, and the like are now useful mainly as boat anchors and avant-garde sculpture materials.
To counter this parade of obsolescence, storage programs committed to removable media practice regular, wholesale migration of records from old media to new.
Jessica Grosset, director of IT at the Mayo Clinic in Rochester, Minn., reports that patient data is stored on primary and backup media. “[The Clinic is] very progressive in keeping technology current over time which helps insure the information is safe and accessible today, and will be available for years to come. We have nine staff members replacing older technology with newer technology on an ongoing basis, and ensuring that our six petabytes of information are safely stored on current technology.”
Grosset’s staff specifies life expectancies of the disks they use and audits quality control of manufactured disks they purchase.
Anecdotal information indicates other large organizations also employ staff groups dedicated to systematic renewal of storage media. For organizations that can afford large-scale programs, migration may be a useful procedure. Issues of provenance and metadata maintenance require serious attention.
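One common safeguard during migration is a fixity check: compute a checksum of each record before it is copied and again on the new medium, and treat any mismatch as a failed migration. The following is a minimal sketch, assuming records are plain files; the function names are illustrative and do not describe any particular organization's procedure.

```python
import hashlib
import shutil


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_and_verify(src: str, dst: str) -> str:
    """Copy src to dst and confirm the copy is bit-for-bit identical."""
    before = sha256_of(src)
    shutil.copy2(src, dst)  # copy contents and, where possible, timestamps
    after = sha256_of(dst)
    if before != after:
        raise IOError(f"fixity check failed for {src}")
    # The digest can be stored with the record and rechecked at the next
    # migration to detect silent corruption of the storage medium.
    return before
```

A matching checksum shows only that the bits survived the move. It says nothing about whether future software will be able to interpret them.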
A bigger issue is readability. This can be ascertained only through the concerted efforts of a succession of records managers. Also, records staff need to work with IT colleagues to ensure old records that were encrypted can still be decrypted. Similarly, IT, the records staff, or both need to maintain passwords that allow access to protected records. Only well-defined policies and procedures will ensure password maintenance for 25 years or more.
Use an Archival Format
Many archivists’ favorite strategy for long-term preservation is converting digital images into non-proprietary, widely accepted formats. While this does nothing to overcome hardware obsolescence, it raises the likelihood that today’s records will be readable in the future if the storage media retain integrity.
For example, the Minnesota Historical Society stores images in TIFF format, which maps every bit without loss, even when compressed. TIFF files are relatively large, but they are complete. The format enjoys wide acceptance and stability, although it is neither an international standard approved by the International Organization for Standardization (ISO) nor an American National Standard approved by the American National Standards Institute. In fact, the format has not undergone a major update since 1992.
One concern about the TIFF format is that its copyright is privately held by Adobe Systems Incorporated, so its future is in private hands. That concern does not apply to another favored format from Adobe, PDF/A, a public version of the popular portable document format that has been approved as an ISO standard, ISO 19005-1:2005 Document management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1). PDF/A was well-designed for archiving, and like the other strategies described here, it is a useful tool in the records manager’s tool belt.
PDF/A limitations, however, make it less than a panacea for long-term preservation. For example, its large file size suggests its best use is with smaller quantities of records. PDF/A does not accept encryption, potentially making it inappropriate for personally identifiable information. Perhaps most worrisome, Adobe has made no promises of backward compatibility for future versions, leaving questions about long-term accessibility.
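A small part of auditing an archive of such files can be automated by checking that each file still begins with the byte signature of its expected format. The sketch below recognizes the standard leading bytes of TIFF and PDF files. The function name and the signature table are illustrative. A match shows only that the file header is intact. It does not confirm PDF/A conformance, which requires a dedicated validator.

```python
from typing import Optional

# Leading byte signatures for the two archival formats discussed above.
SIGNATURES = {
    b"II*\x00": "TIFF (little-endian)",
    b"MM\x00*": "TIFF (big-endian)",
    b"%PDF-": "PDF",
}


def identify_format(path: str) -> Optional[str]:
    """Return a format name if the file starts with a known signature."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    # Unknown or damaged header: flag the file for manual review.
    return None
```

Run periodically across an archive, a check like this can flag files whose headers have been damaged or whose extensions no longer match their contents.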
Microfilm, invented in 1839, is a relatively low-tech strategy appropriate for many persistent records. In climate-controlled environments, manufacturers tout a 500-year life expectancy for some types of film. In general storage conditions, 20 years of viability is reasonable.
Microforms, including microfiche, are not appropriate for audio and video, and color images raise the cost. They are affected by the same environmental factors as paper, but are less resistant to high heat. Retrieval of individual records can be slow, although proper indexing and computer-aided finders improve retrieval times.
Relatively few records and IT leaders understand that many microfilmed records, such as text documents (using optical character recognition) and spreadsheets (when printed as 2-D barcodes), can be returned to processable digital files, complete with metadata and embedded formulae, said Robert Breslawski, worldwide product manager at Eastman Park Micrographics.
When re-digitized, the records can enter a content management system for workflow, rapid retrieval, and more. In records management, however, immutability is a key principle, and standard microfilm’s stability is an advantage.
Consider Emerging Technologies
The door will never close on improvements in records preservation. For example, the current ultimate in size reduction and longevity is Rosetta HD from Norsam Technologies, which writes analog or digital records with an ion beam onto stable media, such as nickel. The engraving is 10 microns wide, which reduces character size more than 20 times from standard microfilm. The nickel is stable, unaffected by temperature, humidity, magnetic fields, and more.
More immediately applicable is a hybrid system widely used in the United Kingdom (UK), C-Cube Software, which offers a battery of technologies matched to the specific needs, values, and risk tolerance of each user. During a user conference, representatives from the UK’s National Health Service reported that dozens of facilities of Britain’s national health trust use C-Cube for long-term storage of medical records.
There are many obstacles to long-term preservation of digital information, and there are many strategies for meeting the needs. Today’s practitioners must assemble a compromise hybrid of policies, procedures, hardware, and software to best meet their organization’s needs. All solutions will leave some imperfections and risk, and none will be permanent. The needs and the tools with which to address those needs will always evolve.
To begin a long-term preservation project, follow the steps below:
Identify Acceptable Risk Level
This process will help define the needs of a long-term preservation solution. For example, Grosset explained that the Mayo Clinic has identified a “core” class of information that is not quite “vital,” but still receives high priority for long-term preservation because losing access to it would be highly risky. Other peripheral information may not justify extensive retention, and its loss would be within stated risk tolerance, said Grosset.
Assess Current Effectiveness
This evaluation answers the question: How well is the current solution meeting the organization’s needs? Consultants Lori Ashley and Charles Dollar recently created a maturity model for long-term preservation of digital records that measures the effectiveness of a program and compares it to an organization’s acceptable level of risk. (See www.SaingTheDigitaWorld.org/MaturityModel.html.) If maturity metrics reveal a chasm between what is in place and what is acceptable, there is a clear call to action.
Survey Existing Tools
During this process, the practitioner surveys existing tools, strategies, and tactics to cobble together a solution, with accompanying policies and procedures. This is art, as well as science, and it always involves compromises, especially in terms of budget, cost, and effectiveness. After implementation, another maturity assessment measures the effectiveness of the new preservation program.
An Ongoing Challenge
The Mayo Clinic’s approach to long-term preservation could be one to emulate.
“We work on it. We have plans, migration paths, and strategies for our infrastructure to be state of the art … the data is well protected and well supported,” explained Grosset. “We know exactly how old each piece of equipment is … asset management is a constant challenge. We have multiple migration teams. Passwords change every six months. There are multiple backups organized by Hierarchical Storage Management.”
This is a blend of realism and best practices. It doesn’t try to save everything forever, but persistent, important records are highly likely to be available for decades to come.
If my friend, Tina, needs her patient records from 2010, she is likely to find them. Long-term preservation of persistent, digital records will have served its purpose.
Gordon E.J. Hoke, CRM, can be contacted at email@example.com.
From May - June 2012