Operations, administration, and management

Operations, administration, and management or operations, administration, and maintenance (OA&M or OAM) are the processes, activities, tools, and standards involved with operating, administering, managing and maintaining any system. This commonly applies to telecommunication, computer networks, and computer hardware.

In particular, Ethernet operations, administration and maintenance (EOAM) is the protocol for installing, monitoring and troubleshooting Ethernet metropolitan area networks (MANs) and Ethernet WANs. The OAM features covered by this protocol are discovery, link monitoring, remote fault detection and remote loopback.

Standards

  • Fault management and performance monitoring (ITU-T Y.1731[1]) - Defines performance monitoring measurements such as frame loss ratio, frame delay and frame delay variation to assist with SLA assurance and capacity planning. For fault management the standard defines continuity checks, loopbacks, link trace, and alarm suppression (AIS, RDI) for effective fault detection, verification, isolation, and notification in carrier networks.
  • Connectivity fault management (IEEE 802.1ag[2]) - Defines standardized continuity checks, loopbacks and link trace for fault management capabilities in enterprise and carrier networks. This standard also defines eight maintenance domain levels (0 to 7), which partition the network into hierarchical administrative domains.
  • Link layer discovery (IEEE 802.1AB) - Defines discovery for all provider edges (PEs) supporting a common service instance and/or discovery for all edge devices and provider (P) routers common to a single network domain.
  • Ethernet in the First Mile (IEEE 802.3ah[3]) - Defines mechanisms for monitoring and troubleshooting Ethernet access links. Specifically, it defines tools for discovery, remote failure indication, remote and local loopbacks, and status and performance monitoring.
  • Ethernet protection switching (ITU-T G.8031) - Brings SONET APS / SDH MSP-like protection switching to Ethernet trunks.
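
The performance measurements named in the Y.1731 entry above can be illustrated with a short sketch. It is a simplified reading of the three metrics, not the measurement procedure of the standard itself: the `Probe` record, the function names and the sample timestamps are all hypothetical, frame delay is taken as receive time minus send time, and frame delay variation as the difference in delay between consecutive received frames.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Probe:
    """One hypothetical measurement frame (times in milliseconds)."""
    sent_ms: float
    received_ms: Optional[float]  # None means the frame never arrived


def frame_loss_ratio(probes: List[Probe]) -> float:
    """Fraction of transmitted frames that were not received."""
    lost = sum(1 for p in probes if p.received_ms is None)
    return lost / len(probes)


def frame_delays(probes: List[Probe]) -> List[float]:
    """Delay of each received frame: receive time minus send time."""
    return [p.received_ms - p.sent_ms for p in probes if p.received_ms is not None]


def frame_delay_variation(delays: List[float]) -> List[float]:
    """Absolute difference in delay between consecutive received frames."""
    return [abs(b - a) for a, b in zip(delays, delays[1:])]


# Made-up sample: four frames sent, the third one lost.
probes = [
    Probe(sent_ms=0.0, received_ms=2.0),
    Probe(sent_ms=10.0, received_ms=13.0),
    Probe(sent_ms=20.0, received_ms=None),
    Probe(sent_ms=30.0, received_ms=32.5),
]

loss = frame_loss_ratio(probes)             # 0.25
delays = frame_delays(probes)               # [2.0, 3.0, 2.5]
variation = frame_delay_variation(delays)   # [1.0, 0.5]
print(loss, delays, variation)
```

Figures of this kind are what an operator would compare against SLA thresholds or feed into capacity planning.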

OAMP

OAMP, traditionally OAM&P, stands for operations, administration, maintenance, and provisioning. The 'T' added in recent years (OAMPT) stands for troubleshooting, and reflects its use in network operations environments. The term is used to describe the collection of disciplines generally, as well as whatever specific software packages or functionality a given company uses to track these activities.

Though the term, and the concept, originated in the wired telephony world, the discipline (if not the term) has expanded to other spheres in which the same sorts of work are done, including cable television and many aspects of Internet services and network operations. 'Ethernet OAM' is another recent concept in which the terminology is used.

Operations encompass automatic monitoring of the environment, detecting and determining faults, and alerting administrators. Administration typically involves collecting performance statistics and accounting data for billing, capacity planning using usage data, and maintaining system reliability. It can also involve maintaining the service databases that are used to determine periodic billing. Maintenance involves upgrades, fixes, new feature enablement, backing up and restoring data, and monitoring media health. Its major task is diagnostics and troubleshooting. Provisioning is the setting up of user accounts, devices, and services.
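
As a rough illustration of the operations function described above (monitor, detect a fault, alert), the following sketch flags devices whose last heartbeat is older than a timeout. The device names, the timeout value, the alert text and both function names are hypothetical; a real deployment would rely on its own monitoring system and notification channels.

```python
from typing import Dict, List


def detect_faults(last_seen: Dict[str, float], now: float, timeout_s: float) -> List[str]:
    """Return, in sorted order, the devices whose last heartbeat is older than timeout_s."""
    return sorted(name for name, seen in last_seen.items() if now - seen > timeout_s)


def build_alerts(faulty: List[str]) -> List[str]:
    """Turn each faulty device into a message for the administrators."""
    return [f"ALERT: no heartbeat from {name}" for name in faulty]


# Made-up sample: timestamps (in seconds) of the last heartbeat from each device.
last_seen = {"edge-1": 100.0, "edge-2": 96.0, "core-1": 99.5}

faulty = detect_faults(last_seen, now=100.0, timeout_s=3.5)
alerts = build_alerts(faulty)
print(alerts)  # ['ALERT: no heartbeat from edge-2']
```

Keeping detection separate from alerting means the same fault list could also be used to open a ticket or start a troubleshooting procedure.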

Although both target the same set of markets, OAMP covers much more than the five specific areas targeted by FCAPS (see FCAPS for more details; that terminology has historically been more popular than OAMP outside telecom environments). In NOC environments, OAMP and OAMPT are increasingly used to describe the problem management life cycle, and with the advent of Carrier-Grade Ethernet, telco terminology is becoming embedded in areas that traditionally used IP terminology.

  • O - Operations
  • A - Administration
  • M - Maintenance
  • P - Provisioning
  • T - Troubleshooting

Procedures

Operation

These are the procedures used during normal network operations.

They are day-to-day organisational procedures: handover, escalation, major issue management, call-out, support procedures, and regular updates including emails and meetings. This group includes daily checklists, on-call and shift rotas, call response and ticket opening procedures, manufacturer documentation such as technical specifications and operator handbooks, and out-of-band (OOB) procedures.

Administration

These are support procedures that are necessary for day-to-day operations: common passwords, equipment and tools access, organisational forms and timesheets, meeting minutes and agendas, and customer service reports.

This covers not only 'network admin' but also 'network operations admin'.

Maintenance

Maintenance covers tasks that, if not done, will affect service or system operation but are not necessarily the result of a failure, as well as configuration and hardware changes made in response to system deterioration. These involve scheduling provider maintenance, standard network equipment configuration changes as a result of policy or design, routine equipment checks, hardware changes, and software/firmware upgrades. Maintenance tasks can also involve the removal of administrative privileges as a security policy.

Provisioning

Provisioning covers introducing a new service, creating new circuits, setting up new equipment, and installing new hardware. Provisioning processes will normally include 'how to' guides and checklists that need to be strictly adhered to and signed off. They can also involve integration and commissioning processes that require sign-off to other parts of the business life cycle.

Troubleshooting

Troubleshooting is carried out as a result of a fault or failure. It may result in maintenance procedures, or in emergency workarounds until such time as a maintenance procedure can be carried out. Troubleshooting procedures will involve knowledge databases, guides, and processes to cover the role of network operations engineers from initial diagnostics to advanced troubleshooting. This stage often involves problem simulation and is the traditional interface to design.

References

  • RFC 3429, 7276
  • Ethernet Operations, Administration, and Maintenance from Cisco
  • Operational Efficiency in ERP and CMMS with integration of AI
  • "EFM OAM Tutorial" presentation by Kevin Daines, IEEE