Reliability, Availability, And Serviceability Features - IBM Power PS700 Installation And User Manual


Light path diagnostics provides light-emitting diodes (LEDs) to help you diagnose problems. An LED on the blade server control panel is lit if an unusual condition or a problem occurs. If this happens, you can look at the LEDs on the system board to locate the source of the problem.

For more information, see the online information or the Problem Determination and Service Guide.

• Power throttling

If your BladeCenter unit supports power management, the power consumption of the blade server can be dynamically managed through the management module. For more information, see the online management-module documentation or the IBM support site at http://www.ibm.com/systems/support/.

Reliability, availability, and serviceability features

Three of the most important features in server design are reliability, availability, and serviceability (RAS). The reliability of the BladeCenter PS700 blade server starts with components, devices, and subsystems that are fault tolerant.

Reliability, availability, and serviceability protect the integrity of the data that is stored in the blade server, maintain the availability of the blade server when you need it, and enhance the ease with which you can diagnose and correct problems.

Component-level RAS features

The blade server has the following component-level RAS features:

• Alternate processor recovery
• Bit steering
• Chipkill memory for dual inline memory modules (DIMMs)
• Diagnostic support of Ethernet controllers
• Dual inline memory module (DIMM) failure isolation
  – DIMM pair identification through unrecoverable error (UE) checkpointing and message-related recovery actions
  – Single DIMM identification through recoverable component error (CE) checkpointing and garding
• Dynamic deallocation (runtime POWER7 garding of microprocessor and memory)
• L2 cache line delete
• Memory chip kill - Chipkill memory for DIMMs
• Memory Predictive Failure Analysis (PFA) alerts through scrubbing and error-checking and correction (ECC)
• Memory scrubbing
• Peripheral component interconnect (PCI) bus parity, ECRC, and surprise link down
• PFA thresholding of correctable hardware errors of the microprocessors and L2 cache
• Processor runtime diagnostics (PRD) that initiate the following actions to recover from errors:
  – Self-healing, such as redundant bit steering for memory
  – Deallocation at runtime of a failing resource, such as a processor core or a memory page
  – Identifying parts for service
  – Runtime error persistent deallocation, if necessary, for I-cache, D-cache, L2 cache, and L3 cache
  – Transparent microprocessor hardware error recovery (for example, for L2 cache errors)
• Single processor checkstop (including a partition checkstop)
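The PFA thresholding item in the list above can be pictured with a short sketch. This is an illustration only and does not describe how the blade server firmware works: the class name, the threshold, and the time window are all hypothetical. It counts correctable errors for one component within a sliding time window and reports when the count reaches the threshold, which is the general idea behind raising an alert before a part fails outright.

```python
from collections import deque


class CorrectableErrorMonitor:
    """Hypothetical sliding-window threshold on correctable errors.

    Illustrative only; not the PS700 firmware's actual algorithm.
    """

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold            # errors that trigger an alert
        self.window_seconds = window_seconds  # width of the sliding window
        self._timestamps = deque()            # times of recent errors

    def record_error(self, timestamp):
        """Record one correctable error; return True if the threshold is reached."""
        self._timestamps.append(timestamp)
        # Discard errors that are older than the sliding window.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold


# Assumed values: alert on 3 correctable errors within 60 seconds.
monitor = CorrectableErrorMonitor(threshold=3, window_seconds=60)
alerts = [monitor.record_error(t) for t in (0, 10, 20, 200)]
```

In this sketch the third error arrives within the window and trips the alert, while the error at 200 seconds stands alone because the earlier ones have aged out.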

Chapter 1. Product overview
