Michigan State University
Disaster Recovery Planning


Step by Step Guide

Step by Step Guide for Disaster Recovery Planning for Michigan State University Units

There is no one best way to write a Disaster Recovery Plan. The following step-by-step guide was created using best practice information, and is intended to help units create their plans as easily and efficiently as possible.

The forms and documentation provided in Steps 1 through 6 help each Unit organize the information needed to develop the Disaster Recovery Plan (DRP). The structure of the plan itself is detailed in Step 7.

These forms, and the planning approach inherent in them, should be modified with information specific to your Unit's daily activities. 

If the Unit currently has a Disaster Recovery Plan in place, there is no need to recreate it, but the plan should be reviewed to ensure the information is complete and current.

I. Information Gathering

Step One - Organize the Project 
This step would normally be performed by a college dean, chairperson or director, or the senior administrator of the unit, working with the coordinator/project leader identified in Task 1 below.

  1. Appoint coordinator/project leader, if the leader is not the dean or chairperson.
  2. Determine most appropriate plan organization for the unit (e.g., single plan at college level or individual plans at unit level)
  3. Identify and convene planning team and sub-teams as appropriate (for example, lead computer support personnel should be on the team if the plan will involve recovery of digital data and documents).
  4. At the college and/or unit level, set:
    1. scope - the area covered by the disaster recovery plan
    2. objectives - what is being worked toward and the course of action that the unit intends to follow
    3. assumptions - what is being taken for granted or accepted as true without proof
  5. Set project timetable
  6. Draft project plan, including assignment of task responsibilities
  7. Obtain Dean's approval of scope, assumptions and project plan, if the leader is not the dean or chairperson.

Sample forms that may be useful in organizing Step One

Plan Organization

Project Plan

Step Two - Conduct Business Impact Analysis
This step would normally be performed by the coordinator/project leader in conjunction with functional unit administrators (chairperson, assistant director, associate director, department chair or director).

In order to complete the business impact analysis, most units will perform the following steps:
  1. Identify functions, processes and systems
  2. Interview information systems support personnel 
  3. Interview business unit personnel 
  4. Analyze results to determine critical systems, applications and business processes
  5. Prepare impact analysis of interruption on critical systems  
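
One way to turn the interview results from the tasks above into a ranked list is sketched here. It is only an illustration: the 1 to 5 impact score, the outage limit in hours, the sample functions and the names `BusinessFunction` and `rank_critical` are assumptions and are not taken from the MSU forms.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str
    impact: int              # assumed scale: 1 (minor) to 5 (severe), from interviews
    max_outage_hours: float  # longest interruption the unit says it can tolerate


def rank_critical(functions):
    """Most critical first: highest impact, then shortest tolerable outage."""
    return sorted(functions, key=lambda f: (-f.impact, f.max_outage_hours))


# Hypothetical interview results for one unit.
functions = [
    BusinessFunction("Student payroll processing", impact=5, max_outage_hours=48),
    BusinessFunction("Advising records lookup", impact=3, max_outage_hours=72),
    BusinessFunction("Unit web site", impact=2, max_outage_hours=168),
    BusinessFunction("Research data storage", impact=5, max_outage_hours=24),
]

ranking = rank_critical(functions)
for position, f in enumerate(ranking, start=1):
    print(f"{position}. {f.name} (impact {f.impact}, max outage {f.max_outage_hours} h)")
```

Sorting on two keys keeps the ranking explainable: a unit can see that two functions share an impact score and that the tolerable outage broke the tie.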

Sample forms that may be useful in organizing Step Two

Business Impact Analysis 
Critical System Ranking Form

Step Three - Conduct Risk Assessment
The planning team will want to consult with technical and security personnel as appropriate to complete this step. The risk assessment will assist in determining the probability of a critical system becoming severely disrupted and documenting the acceptability of these risks to a unit.

For each critical system, application and process as identified in Step 2 using the Critical System Ranking Form:

  1. Review physical security (e.g., secure office, building access off hours)

  2. Review backup systems

  3. Review data security

  4. Review policies on personnel termination and transfer  

  5. Identify systems supporting mission critical functions

  6. Identify vulnerabilities (such as flood, tornado, physical attacks, etc.)

  7. Assess probability of system failure or disruption

  8. Prepare risk and security analysis  
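
The probability and acceptability judgments in the tasks above can be recorded in a simple scoring table. The sketch below uses a common likelihood-times-impact score; the 1 to 5 scales, the threshold of 6, the sample entries and the names `Risk` and `unacceptable` are illustrative assumptions, and a unit would substitute its own criteria.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    system: str
    threat: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


ACCEPTABLE_SCORE = 6  # hypothetical threshold chosen by the unit


def unacceptable(risks, threshold=ACCEPTABLE_SCORE):
    """Return risks scoring above the threshold, highest score first."""
    flagged = [r for r in risks if r.score > threshold]
    return sorted(flagged, key=lambda r: r.score, reverse=True)


# Hypothetical assessment entries.
risks = [
    Risk("Unit file server", "flood", likelihood=2, impact=5),
    Risk("Unit file server", "tornado", likelihood=1, impact=5),
    Risk("Payroll workstation", "physical attack", likelihood=3, impact=3),
    Risk("Unit web site", "tornado", likelihood=1, impact=2),
]

for r in unacceptable(risks):
    print(f"{r.system}: {r.threat} (score {r.score})")
```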

Sample forms available for organizing Step Three

Security Documentation

Vulnerability Assessment 

Step Four - Develop Strategic Outline for Recovery  
Tasks 1, 2, 3, and 4 below will be mainly applicable to units using or managing technology systems to process critical functions. The coordinator/project leader and the functional unit may wish to appoint the appropriate people (e.g., functional subject matter experts) to perform the subsequent tasks in Step 4.

  1. Assemble groups as appropriate for:
    1. Hardware and operating systems
    2. Communications
    3. Applications
    4. Facilities
    5. Other critical functions and business processes as identified in the Business Impact Analysis

  2. For each system/process above, quantify the following processing requirements:
    1. Light, normal and heavy processing days
    2. Transaction volumes
    3. Dollar volume (if any)
    4. Estimated processing time
    5. Allowable delay (days, hours, minutes, etc.)

  3. Detail all the steps in your workflow for each critical business function (e.g., for student payroll processing, each step that must be completed and the order in which to complete them).

  4. Identify systems and applications:
    1. Component name and technical ID (if any)
    2. Type (online, batch process, script)
    3. Frequency
    4. Run time
    5. Allowable delay (days, hours, minutes, etc.)

  5. Identify vital records (e.g., libraries, processing schedules, procedures, research, advising records, etc.):
    1. Name and description
    2. Type (e.g., backup, original, master, history, etc.)
    3. Where they are stored
    4. Source of item or record
    5. Whether the record can be easily replaced from another source (e.g., reference materials)
    6. Backup:
      1. Backup generation frequency
      2. Number of backup generations available onsite
      3. Number of backup generations available off-site
      4. Location of backups
      5. Media type
      6. Retention period
      7. Rotation cycle
      8. Who is authorized to retrieve the backups

  6. Identify the minimum requirements/replacement needs to perform the critical function if a severe disruption occurred:
    1. Type (e.g., server hardware, software, research materials, etc.)
    2. Item name and description
    3. Quantity required
    4. Location of inventory, alternative, or offsite storage
    5. Vendor/supplier

  7. Identify whether alternate methods of processing exist or could be developed, quantifying the impact on processing where possible. (Include manual processes.)

  8. Identify person(s) who support the system or application

  9. Identify primary person to contact if system or application cannot function as normal

  10. Identify secondary person to contact if system or application cannot function as normal

  11. Identify all vendors associated with the system or application

  12. Document unit strategy during recovery (conceptually, how will the unit function?)

  13. Quantify resources required for recovery, by time frame (e.g., 1 PC per day, 3 people per hour, etc.)

  14. Develop and document recovery strategy, including:
    1. Priorities for recovering system/function components
    2. Recovery schedule
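
The vital records details gathered in this step lend themselves to a small inventory that can be checked for gaps. The sketch below is a minimal example under stated assumptions: the field names, the sample records and the `backup_gaps` helper are hypothetical, and it checks only two things (no backup at all, or no off-site generation) for records that cannot be replaced from another source.

```python
from dataclasses import dataclass


@dataclass
class VitalRecord:
    name: str
    record_type: str             # e.g., backup, original, master, history
    storage_location: str
    replaceable_elsewhere: bool  # can it be easily replaced from another source?
    generations_onsite: int      # backup generations available onsite
    generations_offsite: int     # backup generations available off-site


def backup_gaps(records):
    """Return (record name, problem) pairs for records that look under-protected."""
    gaps = []
    for r in records:
        if r.replaceable_elsewhere:
            continue  # easily replaced records are not flagged in this sketch
        if r.generations_onsite + r.generations_offsite == 0:
            gaps.append((r.name, "no backup generations"))
        elif r.generations_offsite == 0:
            gaps.append((r.name, "no off-site generation"))
    return gaps


# Hypothetical inventory entries.
records = [
    VitalRecord("Advising records", "original", "Unit file server", False, 3, 2),
    VitalRecord("Research data", "original", "Lab workstation", False, 2, 0),
    VitalRecord("Processing schedules", "master", "Office binder", False, 0, 0),
    VitalRecord("Reference manuals", "original", "Office shelf", True, 0, 0),
]

for name, problem in backup_gaps(records):
    print(f"{name}: {problem}")
```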

Sample forms available for organizing Step Four

Group Assignments

Critical System Processing Requirements for Recovery 

Step Five - Review Onsite and Offsite Backup and Recovery Procedures
The planning team as identified in Step 1 Task 3 would normally perform this task.

  1. Review current records (operating systems, code, system instructions, documented processes, etc.) requiring protection

  2. Review current offsite storage facility or arrange for one

  3. Review backup and offsite storage policy or create one

  4. Present to unit leader for approval  

Sample forms available for organizing Step Five

Backup and Recovery Procedures

Step Six - Select Alternate Facility
The planning team as identified in Step 1 Task 3 would normally perform this task.

ALTERNATE SITE: A location, other than the normal facility, used to process data and/or conduct critical business functions in the event of a disaster.

  1. Determine resource requirements

  2. Assess platform uniqueness of unit systems (e.g., Macintosh, IBM compatible, Oracle database, Windows 3.1, etc.)

  3. Identify alternative facilities

  4. Review cost/benefit

  5. Evaluate and make recommendation

  6. Present to unit leader for approval

  7. Make selection

II. Plan Development and Testing

Step Seven - Develop Recovery Plan
This step would ordinarily be completed by the coordinator/project leader working with the planning team.

Sample forms available for organizing Step Seven

Sample Plan Outline

The steps for developing the Recovery Plan are listed below in outline form to demonstrate how a unit may choose to organize its Disaster Recovery Plan.

  1. Objective
    The objective may have been documented in the "Plan Organization" form from Step One (Information Gathering).
    1. Establish unit information

  2. Plan Assumptions

  3. Criteria for invoking the plan
    1. Document emergency response procedures to occur during and after an emergency (e.g., ensure evacuation of all individuals, call the fire department, and after the emergency check the building before allowing individuals to return)
    2. Document procedures for assessment and declaring a state of emergency
    3. Document notification procedures for alerting unit and university officials
    4. Document notification procedures for alerting vendors
    5. Document notification procedures for alerting unit staff and notifying them of alternate work procedures or locations

  4. Roles, Responsibilities and Authority
    1. Identify unit personnel
    2. Recovery team description and charge
    3. Recovery team staffing
    4. Transportation schedules for media and teams

  5. Procedures for operating in contingency mode
    1. Process descriptions
    2. Minimum processing requirements
    3. Determine categories for vital records
    4. Identify location of vital records
    5. Identify forms requirements
    6. Document critical forms
    7. Establish equipment descriptions
    8. Document equipment - in the recovery site
    9. Document equipment - in the unit
    10. Software descriptions
    11. Software used in recovery
    12. Software used in production
    13. Produce logical drawings of communication and data networks in the unit
    14. Produce logical drawings of communication and data networks during recovery
    15. Vendor list
    16. Review vendor restrictions
    17. Miscellaneous inventory
    18. Communication needs - production
    19. Communication needs - in the recovery site

  6. Resource plan for operating in contingency mode

  7. Criteria for returning to normal operating mode

  8. Procedures for returning to normal operating mode

  9. Procedures for recovering lost or damaged data

  10. Testing and Training
    1. Document testing dates
    2. Complete disaster/disruption scenarios
    3. Develop action plans for each scenario

    Sample Testing Diagram

  11. Plan Maintenance
    1. Document Maintenance Review Schedule (yearly, quarterly, etc.)
    2. Maintenance Review action plans
    3. Maintenance Review recovery teams
    4. Maintenance Review team activities
    5. Maintenance Review/revise tasks
    6. Maintenance Review/revise documentation

  12. Appendices for Inclusion
    1. Inventory and report forms
    2. Maintenance forms
    3. Hardware lists and serial numbers
    4. Software lists and license numbers
    5. Contact list for vendors
    6. Contact list for staff with home and work numbers
    7. Contact list for other interfacing departments
    8. Network schematic diagrams
    9. Equipment room floor grid diagrams
    10. Contract and maintenance agreements
    11. Special operating instructions for sensitive equipment
    12. Cellular telephone inventory and agreements

Step Eight - Test the Plan

  1. Develop test strategy

  2. Develop test plans

  3. Conduct tests

  4. Modify the plan as necessary

Samples

Test Plan Strategy

Test Plan Scenario

Test Results/Test Evaluation

III. Ongoing Maintenance

Step Nine - Maintain the Plan  
The Dean/Director/Unit Administrator will be responsible for overseeing this step.

  1. Review changes in the environment, technology, and procedures

  2. Develop maintenance triggers and procedures

  3. Submit changes for systems development procedures

  4. Modify unit change management procedures

  5. Produce plan updates and distribute
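
A maintenance trigger from the tasks above can be as simple as a rule that a review is due when a change has been logged or when a set interval has passed. The sketch below assumes a yearly interval and a plain list of logged changes; the `review_due` function, the interval and the dates are hypothetical examples, not MSU policy.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=365)  # assumed yearly review cycle


def review_due(last_review, today, changes_since_review):
    """A review is due if any change was logged or the interval has elapsed."""
    if changes_since_review:
        return True
    return today - last_review >= REVIEW_INTERVAL


last_review = date(2024, 1, 15)

# No changes and well inside the interval.
print(review_due(last_review, date(2024, 6, 1), []))
# A logged change to the environment triggers a review immediately.
print(review_due(last_review, date(2024, 6, 1), ["New file server installed"]))
# No changes, but more than a year has passed.
print(review_due(last_review, date(2025, 2, 1), []))
```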

Step Ten - Perform Periodic Audit

  1. Establish periodic review and update procedures