IT service disruption on Monday/Tuesday

15 Oct 2009

How the failure of the Storage Area Network system was handled

As most members of the University community will be aware, the University suffered a major disruption to its IT Services on Monday afternoon/evening when the Storage Area Network system, which holds many critical University systems (email, calendar, shared folders, finance, student system, HR/payroll, library catalogue, etc), failed.

The system, which is supported by an outside supplier, EMC Computer Systems Limited, is designed to be fully resilient and is similar to the systems used by banks and large companies, so a failure on this scale is highly unusual. A full description of how events unfolded and how the incident was handled by IT Services is given below:

  • Shortly after the system failed at 4.39pm on Monday, IT Services declared a "Major Incident" and a team of central and Faculty IT staff was formed to diagnose the problem, communicate with colleagues and draw up a list of services that needed to be restored as a matter of priority. Top of that list were the restoration of student services, staff email and the payroll system, which was due to run the monthly payroll the following day.
      
  • The IT Services team worked through the night with engineers on-site and specialist support in the United States to restore services. The Storage Area Network (SAN) system was recovered at around midnight on Monday and the lengthy process of restarting and validating services began. By 8.30am on Tuesday, the core student services, payroll and 50 percent of staff email were working once again. All other services except the calendar were restored by 4pm on Tuesday, with the calendar back online before midnight the same day. A small number of email mailbox issues are being fixed as they are reported to the IT Service Desk.
       
  • Throughout the period of serious disruption, IT Services briefed senior University managers and endeavoured to communicate with clients via the IT teams in Faculties and regular updates on the University website. They are also working closely with EMC Computer Systems Limited to identify the root cause of the problem (this is being worked on 24/7) and to put in place more robust contingency arrangements for the future.

IT Services would like to apologise for the disruption caused by this system failure. If you are still experiencing problems with your email mailbox, please contact:

  • IT Service Desk on 65544

The Director of IT Services would welcome feedback on how the incident was handled and comments on how communications could be improved, should a similar incident arise in the future. Please contact: