Posted by: Sasirekha R
Mainframe z/OS Components
The mainframe operating system z/OS is a share-everything runtime environment that provides resource sharing through its heritage of virtualization technology. z/OS gets work done by dividing it into pieces and giving portions of the job to various system components and subsystems that function interdependently.
The workload management (WLM) component of z/OS controls system resources, while the recovery termination manager (RTM) handles system recovery.
At any point in time, one component or another gets control of the processor, makes its contribution, and then passes control along to a user program or another component. Control typically gets passed when a job has to wait for information to be read in from, or written out to, a device such as a tape drive or printer.
As with memory for a personal computer, mainframe central storage is tightly coupled with the processor itself, whereas mainframe auxiliary storage is located on comparatively slower and cheaper external disk and tape drives.
Typical z/OS middleware (between the operating system and an end user or end-user applications) includes:
- Database systems
- Web servers
- Message queueing and brokering functions
- Transaction managers
- Java virtual machines
- Portal services
- XML processing functions
System Address Spaces and Master Scheduler
Many z/OS system functions run in their own address spaces. The master scheduler subsystem runs in the address space called *MASTER* and is used to establish communication between z/OS and its own address spaces. Master initialization routines initialize system services, such as the system log and communication task, and start the master scheduler address space.
Batch processing is the most fundamental function of z/OS. Many batch jobs run in parallel, and job control language (JCL) is used to control the operation of each job.
z/OS requires the use of various subsystems, such as a primary job entry subsystem or JES. An address space is created for every batch job that runs on z/OS. Batch job address spaces are started by JES.
Multiple initiators (each in its own address space) permit the parallel execution of batch jobs. Correct use of JCL parameters (especially the DISP parameter in DD statements) allows parallel, asynchronous execution of jobs that may need access to the same data sets. This is one technique for exploiting batch parallelism and improving availability.
There are address spaces for middleware products such as DB2, CICS, and IMS (referred to as “secondary subsystems”). Typically, an automation package starts all tasks in a controlled sequence. Then other subsystems are started. Subsystems are defined in a special file of system settings called a parameter library, or PARMLIB.
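To make the DISP point concrete, here is a minimal sketch in Python of how shared and exclusive requests for a data set interact. It is a toy model, not z/OS code: the `DataSetLock` class and the job names are hypothetical, and it assumes only that DISP=SHR asks for shared use while DISP=OLD, MOD, and NEW ask for exclusive control.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class DataSetLock:
    """Toy model of data set serialization: many sharers or one exclusive owner."""
    sharers: Set[str] = field(default_factory=set)
    exclusive_owner: Optional[str] = None

    def request(self, job: str, disp: str) -> bool:
        """Return True if the job can start now, False if it has to wait."""
        if disp == "SHR":
            # Shared requests succeed unless another job holds the data set exclusively.
            if self.exclusive_owner is None:
                self.sharers.add(job)
                return True
            return False
        # Assumption: OLD, MOD, and NEW all need exclusive control in this model.
        if self.exclusive_owner is None and not self.sharers:
            self.exclusive_owner = job
            return True
        return False

    def release(self, job: str) -> None:
        """Drop whatever hold the job has on the data set."""
        self.sharers.discard(job)
        if self.exclusive_owner == job:
            self.exclusive_owner = None


lock = DataSetLock()
results = [
    lock.request("REPORT1", "SHR"),  # first reader starts
    lock.request("REPORT2", "SHR"),  # second reader runs in parallel
    lock.request("UPDATE1", "OLD"),  # updater waits for the readers
]
lock.release("REPORT1")
lock.release("REPORT2")
results.append(lock.request("UPDATE1", "OLD"))  # updater can now start
print(results)
```

The two report jobs run side by side because both ask only for shared access, while the update job is held until they finish. Coding DISP=OLD on a job that only reads would serialize it needlessly, which is why the choice of DISP matters for batch parallelism.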
Workload Management (WLM)
WLM manages the processing of workloads in the system according to the business goals, such as response time. WLM also manages the use of system resources, such as processors and storage, to accomplish these goals.
WLM has three objectives:
1. To achieve the business goals that are defined by the installation, by automatically assigning sysplex resources to workloads based on their importance and goals (goal achievement).
2. To achieve optimal use of the system resources from the system point of view (throughput).
3. To achieve optimal use of system resources from the point of view of the individual address space (response and turnaround time).
Goal achievement is the first and most important task of WLM. Optimizing throughput and minimizing turnaround times – which come after that – are essentially contradictory objectives.
Optimizing throughput means keeping resources busy. Optimizing response and turnaround time, however, requires resources to be available when they are needed. Achieving the goal of an important address space might result in worsening the turnaround time of a less important address space. Thus, WLM must make decisions that represent trade-offs between conflicting objectives.
WLM is particularly well-suited to a sysplex environment. It keeps track of system utilization and workload goal achievement across all the systems in the Parallel Sysplex and data sharing environments.
A mainframe installation can influence almost all decisions made by WLM by establishing a set of policies that allow an installation to closely link system performance to its business needs. Workloads are assigned goals (for example, a target average response time) and an importance (that is, how important it is to the business that a workload meet its goals).
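As a rough illustration of how goals and importance could combine, the sketch below computes a performance index for each workload (measured response time divided by the goal, so a value above 1 means the goal is being missed) and picks the most important workload that is missing its goal. The `ServiceClass` type, the `pick_receiver` function, and the sample numbers are assumptions made for this example. Real WLM decision logic is far more involved.

```python
from dataclasses import dataclass


@dataclass
class ServiceClass:
    name: str
    importance: int        # 1 = most important, 5 = least important
    goal_seconds: float    # target average response time
    actual_seconds: float  # measured average response time

    @property
    def performance_index(self) -> float:
        # Above 1.0 means the goal is being missed; below 1.0 means it is beaten.
        return self.actual_seconds / self.goal_seconds


def pick_receiver(classes):
    """Return the most important class that is missing its goal, or None."""
    missing = [c for c in classes if c.performance_index > 1.0]
    if not missing:
        return None
    # Lower importance number wins; ties go to the worse performance index.
    return min(missing, key=lambda c: (c.importance, -c.performance_index))


workloads = [
    ServiceClass("ONLINE", importance=1, goal_seconds=0.5, actual_seconds=0.6),
    ServiceClass("BATCH", importance=4, goal_seconds=600.0, actual_seconds=1500.0),
    ServiceClass("TSO", importance=2, goal_seconds=1.0, actual_seconds=0.8),
]
receiver = pick_receiver(workloads)
print(receiver.name, round(receiver.performance_index, 2))
```

Note that BATCH is much further from its goal than ONLINE, yet ONLINE is chosen first because it carries the higher importance. That is the trade-off described above: helping an important workload can come at the expense of a less important one.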
I/O and data management
The input/output architecture is a major strength of the mainframe. It uses a special processor to schedule and prioritize I/O: the System Assist Processor (SAP). This processor is dedicated to driving the mainframe’s channel subsystem, which can handle 100,000 I/O operations per second and beyond. The channel subsystem can provide over 1000 high-speed buses, one per single server.
Data management activities can be done either manually or through the use of automated processes. When data management is automated, the system uses a policy or set of rules known as Automatic Class Selection (ACS) to determine object placement and to manage object backup, movement, space, and security. ACS applies to all data set types, including database and UNIX file structures.
Storage management policies reduce the need for users to make many detailed decisions that are not related to their business objectives.
Today’s z/OS provides a disk device geometry called Extended Address Volume (EAV) that, in its initial offering, supports over 223 gigabytes (262,668 cylinders) per disk volume. This helps larger customers that are constrained by the 4-digit device number limit to begin consolidating their disk farms.
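The 223 gigabyte figure can be checked with a little arithmetic. The sketch below assumes standard 3390 geometry (56,664 bytes per track and 15 tracks per cylinder), which is not stated in the text above; only the cylinder count comes from it.

```python
BYTES_PER_TRACK = 56_664   # assumption: 3390 track capacity
TRACKS_PER_CYLINDER = 15   # assumption: 3390 geometry
EAV_CYLINDERS = 262_668    # cylinder count quoted in the text

bytes_per_cylinder = BYTES_PER_TRACK * TRACKS_PER_CYLINDER
eav_bytes = EAV_CYLINDERS * bytes_per_cylinder

# Decimal gigabytes (1 GB = 10**9 bytes), matching how disk capacity is usually quoted.
print(f"{bytes_per_cylinder:,} bytes per cylinder")
print(f"{eav_bytes / 1e9:.1f} GB per volume")
```

Under those assumptions each cylinder holds 849,960 bytes, and the volume works out to roughly 223.3 GB, consistent with the "over 223 gigabytes" quoted above.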
Intelligent Resource Director (IRD)
Intelligent Resource Director can be viewed as Stage 2 of Parallel Sysplex. IRD gives the ability to move the resource to where the workload is. z/OS with WLM benefits from the ability to drive a processor at 100% utilization while still providing acceptable response times for critical applications.
IRD is not a product or component, but consists of three separate but mutually supportive functions.
1. WLM LPAR CPU Management – provides a means to raise the weight of an LPAR that is missing its service level goal, in order to move logical CPUs to that LPAR.
2. Dynamic Channel-path Management (DCM) – designed to dynamically adjust the channel configuration in response to shifting workload patterns.
3. Channel Subsystem I/O Priority Queueing (CSS IOPQ) – z/OS uses this function to dynamically manage the channel subsystem priority of I/O operations for given workloads, based on their performance goals.
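The idea behind the first function, shifting LPAR weight toward the partition that is missing its goal, can be sketched as follows. This is a simplified illustration rather than how WLM actually computes weights: the `shift_weight` function, the step size, and the minimum weight are all hypothetical.

```python
def shift_weight(weights, receiver, donor, step=5, minimum=10):
    """Move up to `step` weight units from donor to receiver.

    The total weight stays constant, and the donor never drops below `minimum`.
    """
    movable = max(0, min(step, weights[donor] - minimum))
    adjusted = dict(weights)  # leave the caller's dictionary untouched
    adjusted[donor] -= movable
    adjusted[receiver] += movable
    return adjusted


# PROD is missing its goal, so it receives weight taken from TEST.
weights = {"PROD": 60, "TEST": 40}
after = shift_weight(weights, receiver="PROD", donor="TEST")
print(after)
```

Keeping the total constant reflects the point that the resource is moved rather than created: what one LPAR gains, another gives up.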
Predictive Failure Analysis and Health Checker for z/OS
Predictive Failure Analysis (PFA) is designed to predict whether a soft failure (abnormal yet allowable behaviors that can slowly lead to the degradation of the operating system) will occur sometime in the future and to identify the cause while keeping the base operating system components stateless. PFA is intended to detect abnormal behavior early enough to allow you to correct the problem before it affects your business.
PFA uses remote checks from IBM Health Checker for z/OS to collect data about the installation. The objective of IBM Health Checker for z/OS is to identify potential problems before they impact z/OS availability or, in the worst cases, cause outages.
Next, PFA uses machine learning to analyze this historical data to identify abnormal behavior. It issues an exception message when a system trend might cause a problem – thereby improving availability by going beyond failure detection to predict problems before they occur. To help customers correct the problem, it identifies a list of potential issues.
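To give a feel for what "comparing current behavior with historical data" can mean, here is a deliberately simple sketch that flags a value lying well above a historical baseline. It uses a plain standard-deviation test as a stand-in and is not PFA's actual model. The `is_abnormal` function, the threshold, and the sample message rates are assumptions made for illustration.

```python
from statistics import mean, stdev


def is_abnormal(history, current, threshold=3.0):
    """Flag `current` if it lies more than `threshold` standard deviations above the mean."""
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any increase counts as abnormal.
        return current > baseline
    return (current - baseline) / spread > threshold


# Hypothetical messages-per-minute samples collected over time.
message_rates = [100, 104, 98, 101, 97, 102, 99, 103]
print(is_abnormal(message_rates, 105))  # within normal variation
print(is_abnormal(message_rates, 180))  # far above the baseline
```

A check along these lines would stay quiet for ordinary fluctuation and raise an exception only when the trend departs sharply from what the system has seen before.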