Tuesday, March 3, 2020

What Are the Specific Challenges of Machine Intelligence at the NOC?


Today, some of the main challenges of network operations center (NOC) management, described in the following diagram, are:

Troubleshooting billions of service alarms

Having NOC experts process around 20 million workflow management notifications

Managing millions of call center emails

Higher costs due to low utilization of workflow management

Incident management is an area where we already use rule-based expert systems. However, networks evolve continually, both technologically and in how they are deployed, which makes the hand-written rules in these systems very difficult to maintain. Automated, data-driven incident management that is domain-independent and does not depend on specific hand-written rules would significantly improve automation in NOCs. For example, a failure in one node can cause cascading failures in other nodes, resulting in a series of alarms. Machine learning techniques allow us to discover co-occurring patterns in a stream of alarms and other events, so we can quickly identify the root cause in most failure scenarios. This frees the NOC team to focus on more complex challenges.
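The cascading-failure idea can be illustrated with a minimal sketch, assuming alarms arrive as (timestamp, node, type) records: alarms separated by less than a quiet gap are grouped into one burst, and the earliest alarm in each burst is taken as the root-cause candidate. The `Alarm` class, the 30-second gap, and the earliest-alarm heuristic are illustrative assumptions, not the machine learning method this post describes; a learned model would replace the heuristic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    timestamp: float  # seconds since an arbitrary epoch
    node: str
    alarm_type: str


def group_bursts(alarms, gap_seconds=30.0):
    """Split alarms into bursts: a new burst starts whenever the time since
    the previous alarm exceeds gap_seconds (an assumed, tunable threshold)."""
    bursts, current, last_ts = [], [], None
    for alarm in sorted(alarms, key=lambda a: a.timestamp):
        if last_ts is not None and alarm.timestamp - last_ts > gap_seconds:
            bursts.append(current)
            current = []
        current.append(alarm)
        last_ts = alarm.timestamp
    if current:
        bursts.append(current)
    return bursts


def root_cause_candidate(burst):
    """Naive heuristic (assumption): the earliest alarm in a burst is the
    most likely origin of the cascade."""
    return min(burst, key=lambda a: a.timestamp)


stream = [
    Alarm(0.0, "router-a", "LINK_DOWN"),
    Alarm(2.0, "router-b", "BGP_PEER_DOWN"),
    Alarm(3.5, "switch-c", "PORT_FLAP"),
    Alarm(500.0, "router-d", "HIGH_CPU"),
]
for burst in group_bursts(stream):
    cause = root_cause_candidate(burst)
    print(len(burst), cause.node, cause.alarm_type)
# prints:
# 3 router-a LINK_DOWN
# 1 router-d HIGH_CPU
```

Timing alone cannot separate unrelated failures that happen to overlap, which is why learned correlations or topology information are needed in practice.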

What type of complexity does this imply?

Typical handling of NOC alarms involves mapping received alarms to incidents using enrichment, aggregation, deduplication, and correlation techniques. This is challenging because alarm information is heterogeneous: today's telecommunications networks combine solutions from multiple technologies and multiple vendors. This heterogeneity makes it difficult to create a harmonized view of the system and considerably increases the complexity of detecting and resolving faults.
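The normalization, enrichment, and deduplication steps can be sketched as follows, assuming two vendors whose alarm payloads use different field names. The vendor names, field mappings, severity codes, and inventory lookup are all hypothetical, and aggregation and correlation are left out to keep the example short.

```python
# Hypothetical per-vendor field names; real EMS/NMS payloads differ.
VENDOR_FIELD_MAP = {
    "vendor_x": {"node": "ne_name", "type": "alarm_code", "severity": "sev"},
    "vendor_y": {"node": "device", "type": "event", "severity": "level"},
}

# Hypothetical vendor severity codes mapped onto one shared scale.
SEVERITY_MAP = {"1": "critical", "2": "major", "CRIT": "critical", "MAJ": "major"}


def normalize(raw, vendor):
    """Translate a vendor-specific alarm dict into a common schema."""
    fields = VENDOR_FIELD_MAP[vendor]
    return {
        "vendor": vendor,
        "node": raw[fields["node"]].lower(),
        "type": raw[fields["type"]].upper(),
        "severity": SEVERITY_MAP.get(str(raw[fields["severity"]]), "unknown"),
    }


def enrich(alarm, inventory):
    """Attach the site from an inventory lookup (node -> site)."""
    return {**alarm, "site": inventory.get(alarm["node"], "unknown")}


def deduplicate(alarms):
    """Collapse repeats of the same (node, type) into one alarm with a count."""
    merged = {}
    for alarm in alarms:
        key = (alarm["node"], alarm["type"])
        if key in merged:
            merged[key]["count"] += 1
        else:
            merged[key] = {**alarm, "count": 1}
    return list(merged.values())


raw_events = [
    ({"ne_name": "RTR-A", "alarm_code": "link_down", "sev": 1}, "vendor_x"),
    ({"ne_name": "rtr-a", "alarm_code": "LINK_DOWN", "sev": "1"}, "vendor_x"),
    ({"device": "SW-B", "event": "port_flap", "level": "MAJ"}, "vendor_y"),
]
inventory = {"rtr-a": "site-1"}
alarms = [enrich(normalize(raw, vendor), inventory) for raw, vendor in raw_events]
for alarm in deduplicate(alarms):
    print(alarm["node"], alarm["type"], alarm["severity"], alarm["site"], alarm["count"])
# prints:
# rtr-a LINK_DOWN critical site-1 2
# sw-b PORT_FLAP major unknown 1
```

Every new vendor or alarm format adds entries to these mapping tables, which is a small-scale version of the maintenance problem discussed next.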

Can we afford to encode long-term domain knowledge?

Current NOC solutions handle alarms from different sources, such as network nodes, service management systems, and element/network management systems, using rules. The rules are written to convert domain-specific information into a network-wide view for the NOC, and they also encode the logic that processes and correlates alarms so they can be grouped appropriately.

Developing these rules is time-consuming and labor-intensive. Continuous changes in the network, with new types of network nodes and the resulting new types of alarms, also make developing and maintaining the rules more complicated. Moreover, the rules must be created and updated frequently; otherwise, the rule database will be incomplete or even inaccurate.
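The rule-based correlation described above can be pictured with a minimal sketch, assuming each rule states that a "symptom" alarm type arriving within a time window after a "cause" alarm type should be grouped under the cause. The `Rule` structure, alarm type names, and windows are hypothetical; the point is that an alarm type no rule mentions stays ungrouped until someone writes a rule for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    cause_type: str        # alarm type treated as the cause
    symptom_type: str      # alarm type treated as a consequence
    window_seconds: float  # symptom must follow the cause within this window


# Hand-written domain knowledge (hypothetical examples).
RULES = [
    Rule("LINK_DOWN", "BGP_PEER_DOWN", 60.0),
    Rule("LINK_DOWN", "PORT_FLAP", 30.0),
    Rule("POWER_FAIL", "LINK_DOWN", 120.0),
]


def apply_rules(alarms, rules=RULES):
    """Return {symptom_index: cause_index} for alarms explained by a rule.

    alarms: list of (timestamp, alarm_type) tuples.
    """
    cause_of = {}
    for i, (t_symptom, symptom_type) in enumerate(alarms):
        for j, (t_cause, cause_type) in enumerate(alarms):
            if i == j:
                continue
            if any(
                r.cause_type == cause_type
                and r.symptom_type == symptom_type
                and 0 <= t_symptom - t_cause <= r.window_seconds
                for r in rules
            ):
                cause_of[i] = j
                break  # first matching cause wins
    return cause_of


alarms = [
    (0.0, "LINK_DOWN"),
    (10.0, "BGP_PEER_DOWN"),
    (20.0, "PORT_FLAP"),
    (15.0, "NEW_VENDOR_ALARM"),  # no rule mentions this type
]
print(apply_rules(alarms))  # {1: 0, 2: 0}
```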

Does this mean that we have stopped developing domain-oriented rules?

This does not mean that traditional rule development is disappearing; rather, domain-independent, data-driven approaches will augment it. In addition, automatically detecting possible correlations between alarms can complement the rule-based approach when the rules are incomplete or when domain-specific knowledge has not yet been acquired.

The data-driven approach will help identify cross-domain correlations and generate insights directly from the data. Gradually, the system can evolve towards a fully automated solution.
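One way to picture rules and data working together is the minimal sketch below, assuming the expert rule base is a set of (cause, symptom) alarm-type pairs and a mining step has already produced confidence scores for other pairs. Rules always take precedence; learned pairs above an assumed threshold fill the gaps and are labeled so experts can review them. The function name, threshold, and labels are hypothetical.

```python
def merge_correlations(rule_pairs, learned, min_confidence=0.8):
    """Combine expert rules with data-driven correlation candidates.

    rule_pairs: set of (cause_type, symptom_type) written by experts.
    learned: dict mapping (cause_type, symptom_type) -> confidence in [0, 1].
    Returns a dict mapping each accepted pair to its source.
    """
    merged = {pair: "rule" for pair in rule_pairs}
    for pair, confidence in learned.items():
        # Learned pairs only fill gaps; they never override an expert rule.
        if pair not in merged and confidence >= min_confidence:
            merged[pair] = "learned (pending review)"
    return merged


rules = {("LINK_DOWN", "BGP_PEER_DOWN")}
learned = {
    ("LINK_DOWN", "BGP_PEER_DOWN"): 0.95,    # already covered by a rule
    ("NEW_VENDOR_ALARM", "PORT_FLAP"): 0.9,  # gap the rules do not cover
    ("HIGH_CPU", "PORT_FLAP"): 0.4,          # too weak to accept
}
for pair, source in merge_correlations(rules, learned).items():
    print(pair, source)
```

As reviewed pairs are promoted into the rule base, more of the correlation knowledge comes from data, which is one way to read the gradual move towards full automation.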

Data-driven NOC automation

Here we share a case study on automatic incident creation, root-cause identification, and self-healing scenarios that we are working on as part of our research.

We apply machine intelligence principles (data mining and data science) to discover behavior patterns in large historical datasets. These patterns are essentially correlations and co-occurrences between alarms. An interesting aspect of our approach is that we do not treat the data only as time series; we also examine how to handle the largely symbolic or categorical information collected from the network and how to identify latent behaviors in it.
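For the categorical side, a classic technique is association-rule mining. The sketch below is not necessarily the authors' method; it assumes historical alarms have already been cut into "transactions" (for example, the set of alarm types seen in the same time window) and scores every pair of alarm types by support and confidence. The thresholds and alarm type names are illustrative.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support=0.3, min_confidence=0.7):
    """Find rules A -> B between categorical alarm types.

    transactions: list of sets of alarm types observed together.
    support(A, B)      = share of transactions containing both A and B
    confidence(A -> B) = count(A and B) / count(A)
    Returns {(A, B): (support, confidence)} for rules passing both thresholds.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        item_counts.update(items)
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1

    rules = {}
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / item_counts[lhs]
            if confidence >= min_confidence:
                rules[(lhs, rhs)] = (round(support, 2), round(confidence, 2))
    return rules


history = [
    {"LINK_DOWN", "BGP_PEER_DOWN"},
    {"LINK_DOWN", "BGP_PEER_DOWN", "PORT_FLAP"},
    {"LINK_DOWN", "BGP_PEER_DOWN"},
    {"HIGH_CPU"},
    {"PORT_FLAP"},
]
for (lhs, rhs), (support, confidence) in sorted(mine_pair_rules(history).items()):
    print(f"{lhs} -> {rhs}: support={support}, confidence={confidence}")
# prints:
# BGP_PEER_DOWN -> LINK_DOWN: support=0.6, confidence=1.0
# LINK_DOWN -> BGP_PEER_DOWN: support=0.6, confidence=1.0
```

Both directions score equally here because co-occurrence alone is symmetric: high confidence signals correlation, not which alarm caused the other, so timing or topology is still needed for root-cause analysis.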

This approach helps domain experts learn evolving and previously unknown behavior models in multi-technology, multi-vendor environments. These correlation and grouping models enable automatic alarm grouping, which opens the way to automatic incident detection in the network, root-cause identification, and automated repair.

With this approach, we can group alarms and tickets intelligently with minimal manual effort. We can reduce or entirely avoid manual rule development, automatically identify both large alarm groups and groupings that the existing rules miss, and reduce the total number of incident tickets.
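The final step, turning correlations into fewer tickets, can be sketched with a union-find (disjoint-set) structure, assuming an earlier stage (rules or mined correlations) has already produced pairs of individual alarms judged to be related. Every connected group becomes one incident, so the ticket count drops from one per alarm to one per group. The alarm IDs and pairs are made up for illustration.

```python
def group_into_incidents(alarm_ids, related_pairs):
    """Group alarms into incidents: alarms connected through any chain of
    related pairs share one incident (connected components via union-find)."""
    parent = {a: a for a in alarm_ids}

    def find(x):
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in related_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    incidents = {}
    for a in alarm_ids:
        incidents.setdefault(find(a), []).append(a)
    return list(incidents.values())


alarm_ids = ["a1", "a2", "a3", "a4", "a5", "a6"]
related = [("a1", "a2"), ("a2", "a3"), ("a4", "a5")]
incidents = group_into_incidents(alarm_ids, related)
print(f"{len(alarm_ids)} alarms -> {len(incidents)} tickets: {incidents}")
# 6 alarms -> 3 tickets: [['a1', 'a2', 'a3'], ['a4', 'a5'], ['a6']]
```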
