Ask the Expert - Archiving
About the Expert:
Jill Hearn is a principal product marketing manager responsible for EMC SourceOne e-mail management and associated applications within the information governance product portfolio. Prior to joining the marketing team, she spent seven years with EMC’s systems engineering group helping enterprise corporations solve their storage, e-discovery, and compliance challenges in the e-mail space.
Hearn can be contacted at jill.hearn@emc.com.
Questions:
Q: If my organization does “backups,” why do we need archiving?
There’s a common misconception that backups and archives are the same thing. Let’s take a minute to review the differences.
Backups are secondary copies of active production information used when a recovery copy is needed to get an end user back to work or, in the case of a disaster, to get the business back up and running. Since backups are focused on constantly changing business information, they are generally short-term and often overwritten – say, monthly – when full backups are taken. This makes backup a poor choice for retaining data for compliance reasons.
An archive, on the other hand, is not a copy of production data, but rather the primary version of a piece of data, often inactive or non-changing. When data stops changing or is no longer frequently used, it is best to move it to an archive, where it lives outside the backup window but can still be accessed.
Archives do not focus on "recovering" an application or business data, but allow for information retrieval – usually at the level of a file, e-mail, or other individual piece of content. They are typically used for long-term retention of information and, thus, are the best choice for managing data with regulatory compliance requirements.
Q: When you archive, aren’t you just saving everything?
Well, that depends on how, what, and when you archive. Let’s take e-mail archiving as an example. For many organizations, saving everything is a requirement because of corporate policies or government regulations. In this case, the organization may want to capture the “primary version” of every message by performing real-time capture, or journaling, of each and every e-mail. These messages are archived unaltered and unmodified, which makes the copy of the message in the user’s mailbox a “convenience” copy.
However, when considering archiving of file system or SharePoint data, the approach to archiving may be different. When a document is being worked on, modified, versioned, etc., there is no need to archive all of these working copies. A backup is exactly what is required while the work is in progress. Once the content is complete, no longer subject to change, and no longer actively being accessed, it’s time to archive that final copy for future reference, business reuse, intellectual property, compliance, and so forth. All those transitional copies of the content may be removed as the final copy is safely archived, indexed, and immediately available for that future request.
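To make the “no longer subject to change, no longer actively accessed” test concrete, here is a minimal Python sketch of how an archiving policy might decide that a document is ready to move. The 90-day thresholds, the `Document` type, and the `ready_to_archive` function are hypothetical illustrations, not settings or APIs of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical thresholds; a real policy would come from the organization's governance rules.
UNCHANGED_FOR = timedelta(days=90)
UNACCESSED_FOR = timedelta(days=90)


@dataclass
class Document:
    path: str
    last_modified: datetime
    last_accessed: datetime


def ready_to_archive(doc: Document, now: datetime) -> bool:
    """True once the document has stopped changing AND is no longer actively accessed."""
    stable = now - doc.last_modified >= UNCHANGED_FOR
    inactive = now - doc.last_accessed >= UNACCESSED_FOR
    return stable and inactive


now = datetime(2024, 6, 1)
final_plan = Document("plans/final.docx", datetime(2024, 1, 10), datetime(2024, 2, 1))
draft = Document("plans/draft-v7.docx", datetime(2024, 5, 28), datetime(2024, 5, 30))
print(ready_to_archive(final_plan, now))  # True
print(ready_to_archive(draft, now))       # False
```

Requiring both conditions keeps a finished document that people still open every day in production, which matches the goal of archiving only content that has gone quiet.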
That’s where information governance comes into play. It encompasses the people, practices, and technology used to proactively manage and take control of information: understanding what information is at which point in its lifecycle and applying the appropriate policies, including retention, disposition, and, where appropriate, long-term preservation. It also provides visibility into the information within the organization, so the organization can understand what information it has and where it is stored, and then act on it. Archiving fits squarely into a good information governance strategy.
Q: Is archiving only for IT? Who else should care about archiving?
While IT plays a central role in archiving and derives many benefits from implementing and maintaining an archival system, IT is not the only part of the organization with a vested interest in archiving. Here are three categories of people who should care about archiving:
Line-of-business user – The line-of-business user may hold many different titles in the organization, the most common being records manager, compliance officer, general counsel, and content manager. Regardless of title, these individuals are responsible for developing retention and disposition policies for content across the organization.
For instance, they may establish classification rules for e-mail messages and metadata, which define the criteria for what must be kept and for how long. They may have concerns about how any new archiving technology fits into the overall records management or enterprise content management (ECM) strategy of the company. Archiving is critical to helping the line-of-business user:
- Consistently apply retention policies based on lifecycle- or time-based triggers
- Manage content within the overall corporate retention strategy
- Facilitate a repeatable e-discovery process that lowers costs and delivers a quick return on investment
- Prove message authenticity and chain of custody for regulatory compliance and corporate policy mandates
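One common technique for supporting authenticity and chain-of-custody claims is to record a cryptographic digest of each message when it is captured and recheck it whenever the message is retrieved. The Python sketch below uses the standard `hashlib` module; the `CustodyLog` class is a hypothetical, in-memory stand-in, and a production archive would keep these records in durable, tamper-evident storage.

```python
import hashlib
from datetime import datetime, timezone


class CustodyLog:
    """Hypothetical in-memory custody record of message digests and events."""

    def __init__(self) -> None:
        self.digests: dict[str, str] = {}             # message ID -> SHA-256 taken at capture
        self.events: list[tuple[str, str, str]] = []  # (UTC timestamp, message ID, action)

    def _log(self, message_id: str, action: str) -> None:
        self.events.append((datetime.now(timezone.utc).isoformat(), message_id, action))

    def capture(self, message_id: str, raw_message: bytes) -> str:
        # Record the digest of the message exactly as it was captured.
        digest = hashlib.sha256(raw_message).hexdigest()
        self.digests[message_id] = digest
        self._log(message_id, "captured")
        return digest

    def verify(self, message_id: str, raw_message: bytes) -> bool:
        # Changing any byte of the message produces a different digest.
        ok = hashlib.sha256(raw_message).hexdigest() == self.digests[message_id]
        self._log(message_id, "verified" if ok else "verification failed")
        return ok


log = CustodyLog()
original = b"From: cfo@example.com\r\nSubject: Q3 forecast\r\n\r\nDraft numbers attached."
log.capture("msg-001", original)
print(log.verify("msg-001", original))                        # True
print(log.verify("msg-001", original.replace(b"Q3", b"Q4")))  # False
print([action for _, _, action in log.events])  # ['captured', 'verified', 'verification failed']
```

The event list doubles as a simple audit trail: every capture and verification is timestamped, which is the kind of record a chain-of-custody argument relies on.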
IT managers – IT managers apply the retention and disposition policies across the company’s IT infrastructure. IT is responsible for implementing tools to manage information according to corporate policies for e-mail compliance and governance. The IT manager has to maintain the service level agreement commitments to the business and is responsible for keeping storage costs under control. In addition, the IT manager may also have to respond to discovery requests. With archiving, the IT manager can:
- Proactively manage the growth of information and reduce the size of the production storage environment
- Get control of unmanaged data
- Improve operational efficiencies of production content-based systems
End user – End users need seamless access to all of their content regardless of its archive status. So, even if they don’t recognize that archiving is part of an organizational process, they indirectly care about it. End users access archived items, such as shortcut messages, from a variety of devices and locations, and they expect to search for and retrieve archived data intuitively with their designated archive search tool.
Q: What kinds of archiving policies are there?
Policies can vary, but every policy must be enforceable, repeatable, consistent, and defensible. Some policies address information growth, requiring that data be monitored and managed based on its business value. Policies will most likely also need to support regulatory, legal, and business requirements. Managing access to content is vitally important to any policy, as it improves business operations and allows information to be reused across the business. In fact, policies are foundational to a good information governance strategy.
Q: What kind of content can and should be archived?
Any content can be archived, from e-mail and files to SharePoint and unstructured data, if you have the right technologies in place. What should be archived depends on the business value of the information and the potential risk it poses. For example, a business plan may have significant value to the company even after the plan is implemented, so it should be archived, but an e-mail to your mother should not.
At the same time, keeping everything or randomly disposing of information means that you’re putting your organization at risk. If an audit or litigation occurs, it’s a lot simpler (and more cost-effective) to cull relevant content or to justify why something is deleted if a standardized process (based on business value) is in place. Putting the right policies in place (see the preceding question) can help you ensure that you’re archiving what you really need and disposing of what you don’t.
Q: What should you look for in an archiving solution?
An archiving solution needs to be able to scale to your business and meet your specific business needs, but here are a few guidelines:
- Complete archive – Look for solutions that can archive as many content types as possible within a single archive, or transparently across multiple archives. The more content you bring under managed archiving, the more you reduce costs and risks.
- Capture user interactions/legal hold – If you’re in a litigious or heavily regulated industry, you may need to capture user interactions with the information to show whether content was edited, deleted, and so on, in an appropriate manner. You may also need a solution that sends notifications when content is placed on hold because of an audit or litigation.
- Ingesting old content – Depending on your business needs, you may need to ensure that your archiving system can take in and manage old content.
- De-duplication – Particularly with e-mail, you may need an archiving system that de-duplicates messages so you’re not saving every copy of an e-mail sent to 50 people.
- Search capabilities – You will want the solution to seamlessly search across repositories (if there are multiple) and ensure that it can quickly and efficiently search large data sets.
- Scalability – You may be a relatively small organization today, but if you think about how much e-mail and other content you generate today, just imagine what it will be tomorrow. Scalability is the key whether your organization – or just your information – is growing.
- User experience – As mentioned in the answer to the third question, one of the key aspects of an archiving solution is that the user doesn’t really notice it exists. Being able to access archived information without impacting the user experience is a sign of a good solution.
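De-duplication is often implemented as single-instance storage: the archive hashes each item’s content, stores the content only the first time it sees that hash, and records a reference for every later copy. The following Python sketch assumes every recipient’s copy is byte-identical; in practice per-recipient headers can differ, so a real system has to decide which parts of a message to compare. `DedupStore` is a hypothetical name, not a product API.

```python
import hashlib


class DedupStore:
    """Hypothetical single-instance store: identical content is kept only once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}     # digest -> stored content (one copy)
        self.refs: dict[str, list[str]] = {}  # digest -> mailboxes that reference it

    def archive(self, mailbox: str, raw_message: bytes) -> str:
        digest = hashlib.sha256(raw_message).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = raw_message  # first copy: store the content
        self.refs.setdefault(digest, []).append(mailbox)  # every copy: add a reference
        return digest


store = DedupStore()
message = b"Subject: All-hands meeting\r\n\r\nFriday at 10am."
for i in range(50):
    digest = store.archive(f"user{i}@example.com", message)
print(len(store.blobs))         # 1
print(len(store.refs[digest]))  # 50
```

Fifty recipients cost one stored copy plus fifty small references, while each user can still find the message in their own view of the archive.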
Submitted questions:
Q: How has automation affected archival operations?
Automation of archiving has many advantages for the IT division of an organization, as well as for its business units. From the IT standpoint, automatically capturing all messages allows backups to become just that: backups. It’s no longer necessary to spend the time, effort, and cost of maintaining backups as a form of record. With automation, business units no longer have to decide what needs to be archived, which saves users time and increases their productivity. Because data is automatically captured and indexed into the archive, users can quickly find it with the search tool that may optionally be provided to them. Users also no longer need to spend time pruning their mailboxes, since mailbox quotas are no longer necessary.
That said, many organizations still require certain data to be classified manually and assigned specific retention periods. For this reason, many vendor products support both approaches. For the bulk of messages, automate the archive and apply a standard retention period. When users deem a message to be of additional value, they can simply drag and drop that message or attachment into a special folder whose retention policy matches the classification of the data (e.g., an accounting document that requires a seven-year retention).
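The combination of a standard retention period for automatically archived mail and longer periods for folders that users classify by hand can be sketched as a simple lookup. In this Python sketch the seven-year accounting period comes from the example above; the three-year default, the “Contracts” folder, and the function names are hypothetical placeholders.

```python
from datetime import date
from typing import Optional

# Hypothetical policy table. Accounting's seven years echoes the example in the text;
# the default and the "Contracts" entry are illustrative placeholders only.
DEFAULT_RETENTION_YEARS = 3
FOLDER_RETENTION_YEARS = {
    "Accounting": 7,
    "Contracts": 10,
}


def retention_years(folder: Optional[str]) -> int:
    """Automatically archived mail (no folder) gets the standard period;
    mail dragged into a classified folder gets that folder's period."""
    if folder is None:
        return DEFAULT_RETENTION_YEARS
    return FOLDER_RETENTION_YEARS.get(folder, DEFAULT_RETENTION_YEARS)


def disposition_date(received: date, folder: Optional[str]) -> date:
    """Date on which the message becomes eligible for disposition."""
    target_year = received.year + retention_years(folder)
    try:
        return received.replace(year=target_year)
    except ValueError:
        # Received on Feb 29 and the target year is not a leap year.
        return received.replace(year=target_year, day=28)


print(disposition_date(date(2024, 3, 15), None))          # 2027-03-15
print(disposition_date(date(2024, 3, 15), "Accounting"))  # 2031-03-15
print(disposition_date(date(2024, 2, 29), "Accounting"))  # 2031-02-28
```

Keeping the policy in one table means the bulk of mail needs no user decision at all, and the exceptions are just entries that records managers can review.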
I am speaking primarily about e-mail management, but the same principles apply to all forms of content (e.g., SharePoint and file system data). I hope this answers your question about automation. If you have additional questions, please feel free to respond.
— Florence