Augmented Reality for Data Centers


With COVID-19 still taking a heavy toll on Silicon Valley, Data Center operators are more critical than ever. More SREs (Site Reliability Engineers) are being called back into central control rooms and call centers to closely monitor data centers and colocation facilities and proactively maximize the uptime of internet services. Thank goodness for enterprise and open source monitoring tools that trigger sub-routines to elastically spin services up and down without manual intervention. However, sometimes automation doesn't go our way, and even the most sophisticated monitoring systems are prone to error. To make matters worse, new IoT devices and edge systems are being deployed to improve the performance and user experience of cloud-native applications supporting telecommunications, e-commerce, mobile payments, cryptocurrency, oil & gas, social networks, media & entertainment, and more. Many reliability issues can be circumvented with Kubernetes and orchestration tools that manage auto scaling, but popular cloud-native apps with millions of users put a huge amount of stress on the infrastructure at each PoP (Point of Presence).

Mapping application workloads all the way down to real hardware is no easy task, especially with cloud-native applications. Even some of the most advanced ITOps teams do not have all the APIs they need to enrich their time-series databases for critical infrastructure. Even if they do, they may not have a direct way to manage the sub-systems that feed power utilization, temperature, air humidity, server health, and similar data into their infrastructure dashboards.

In this blog, we will explore how AR can give ITOps teams a different perspective to supplement their existing dashboards, alerting, and ticketing systems. The short clip below illustrates how a simple AR application can be used in data centers to monitor the power consumption of x86 servers.

With new microarchitectures on the horizon from both Intel and AMD, the maximum TDP of server CPUs is expected to rise to 300-400 W, according to this Reddit post from jhoosi. When those systems come online, ITOps teams will be monitoring them very closely to ensure their clusters stay within their power budgets. AR glasses aren't quite here yet, but as soon as they are available, DevOps teams may want to consider developing this type of solution for ISPs and CSPs. For Enterprise companies, this may be a very challenging application to develop for B2B clients with a multi-cloud strategy that spans public cloud resources, colocation services, and on-premises deployments from multiple OEM vendors, including legacy systems.
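To make the power-budget point concrete, here is a minimal Python sketch that sums per-server readings for a rack and reports the remaining headroom. The four-server rack and the 4,000 W budget are hypothetical. The example also counts each server's draw as just its two CPUs at full TDP, which understates real consumption, since memory, storage, fans, and power-supply losses all add to it.

```python
from typing import Dict


def rack_power_report(readings_w: Dict[str, float], budget_w: float) -> Dict[str, float]:
    """Sum per-server power readings (watts) and compare them with a rack budget."""
    total_w = sum(readings_w.values())
    return {
        "total_w": total_w,
        "headroom_w": budget_w - total_w,  # negative means the rack is over budget
        "utilization_pct": 100.0 * total_w / budget_w,
    }


# Hypothetical rack of four dual-socket servers. Counting only the CPUs at a
# 400 W TDP each gives 2 * 400 = 800 W per server (real draw is higher).
CPU_TDP_W = 400
SOCKETS_PER_SERVER = 2
cpu_only_w = float(CPU_TDP_W * SOCKETS_PER_SERVER)

readings = {f"server-{i}": cpu_only_w for i in range(1, 5)}
report = rack_power_report(readings, budget_w=4000.0)
print(report)  # {'total_w': 3200.0, 'headroom_w': 800.0, 'utilization_pct': 80.0}
```

In practice, the readings would come from each server's BMC rather than from TDP figures, and an alert could fire once utilization crosses a threshold the team chooses.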

OEM server vendors that follow DMTF standards for power schemas make it much easier to develop these types of applications. Below is a list of Redfish APIs that app developers could use in back-end systems to support an AR app for monitoring the power utilization of x86 servers. To understand this at a deeper level, let's examine how Redfish APIs may be used to manage and monitor the power of Supermicro servers.

Supermicro:

/redfish/v1/Chassis/1/Power

/redfish/v1/Chassis/1/Sensors

/redfish/v1/Chassis/1/Thermal

/redfish/v1/TelemetryService/

With a GET request, TelemetryService can be used to collect metrics and data logs for power consumption on Supermicro servers. Together with the Power resource, these endpoints expose status (State, Health) and power usage (Average, Minimum, and Maximum).
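As a starting point, the Python sketch below issues a GET request against a BMC's Redfish service using only the standard library. It then pulls status and consumption figures out of the Power resource listed above. The sketch makes a few assumptions:

- The BMC accepts HTTP Basic authentication.
- Responses follow the DMTF Power schema layout, where readings sit under PowerControl[].PowerMetrics (AverageConsumedWatts, MinConsumedWatts, MaxConsumedWatts).
- The host name and credentials shown are placeholders.

Individual vendors may populate only some of these fields.

```python
import base64
import json
import ssl
import urllib.request
from typing import Any, Dict, List


def redfish_get(bmc_host: str, path: str, user: str, password: str,
                verify_tls: bool = True) -> Dict[str, Any]:
    """Issue an HTTPS GET against a BMC's Redfish service and decode the JSON body."""
    url = f"https://{bmc_host}{path}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={
        "Authorization": f"Basic {token}",  # assumes Basic auth is enabled on the BMC
        "Accept": "application/json",
    })
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Many BMCs ship self-signed certificates; only disable checks in a lab.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return json.load(resp)


def summarize_power(power: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Extract status and consumption figures from a DMTF Power resource."""
    rows = []
    for ctl in power.get("PowerControl", []):
        metrics = ctl.get("PowerMetrics", {})
        status = ctl.get("Status", {})
        rows.append({
            "name": ctl.get("Name"),
            "state": status.get("State"),
            "health": status.get("Health"),
            "consumed_w": ctl.get("PowerConsumedWatts"),
            "avg_w": metrics.get("AverageConsumedWatts"),
            "min_w": metrics.get("MinConsumedWatts"),
            "max_w": metrics.get("MaxConsumedWatts"),
        })
    return rows


# Example usage (placeholder host and credentials):
# power = redfish_get("bmc.example.internal", "/redfish/v1/Chassis/1/Power",
#                     "bmc-user", "bmc-password")
# for row in summarize_power(power):
#     print(row)
```

Keeping the HTTP call separate from the parsing makes it easy to test the parser against saved responses from each vendor.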

To learn more about Redfish APIs, check out the Supermicro Redfish Reference Guide.

Within a multi-vendor environment, the Redfish API implementation may differ slightly among OEM server vendors. Issuing GET requests over HTTP is a convenient way to test and study the responses of x86 systems. Based on those responses, it should be fairly straightforward to prioritize data points with an ingest pipeline via Kibana or OpenSearch.
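One way to absorb those differences is to avoid hard-coding /redfish/v1/Chassis/1. Instead, the code can walk the Chassis collection and follow each member's links. The sketch below takes any fetch function (for example, a wrapper around an HTTP GET), so it can be exercised against recorded responses. It looks for a Power link and falls back to PowerSubsystem, because newer Redfish schema releases deprecate Power in favor of PowerSubsystem. Whether a given vendor exposes one, the other, or both is something to confirm against that vendor's reference guide.

```python
from typing import Any, Callable, Dict, List

# A fetch function takes a Redfish path and returns the decoded JSON body.
Fetch = Callable[[str], Dict[str, Any]]


def chassis_power_paths(fetch: Fetch) -> List[str]:
    """Discover each chassis' power resource instead of assuming /Chassis/1."""
    collection = fetch("/redfish/v1/Chassis")
    paths = []
    for member in collection.get("Members", []):
        chassis = fetch(member["@odata.id"])
        # Older schemas link a "Power" resource; newer ones link "PowerSubsystem".
        link = chassis.get("Power") or chassis.get("PowerSubsystem")
        if link and "@odata.id" in link:
            paths.append(link["@odata.id"])
    return paths
```

Passing a dictionary's `__getitem__` as the fetch function is a convenient way to test the discovery logic offline against saved JSON responses.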

As Elastic's documentation puts it, “Fleet automatically adds ingest pipelines for its integration. Fleet applies these pipelines using index templates that include pipeline index settings. Elasticsearch matches these templates to your Fleet data streams based on the stream’s naming scheme.” On Elastic.co, they show a basic index template to help create and test an ingest pipeline for an app with a specific dataset:

https://www.elastic.co/guide/en/elasticsearch/reference/current/common-log-format-example.html
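For illustration, the Python sketch below builds the two request bodies such a setup needs in Elasticsearch. The first is an ingest pipeline that normalizes hypothetical power fields (consumed_w, avg_w, min_w, max_w). The second is a composable index template that attaches the pipeline through the index.default_pipeline setting. The pipeline name, the metrics-redfish.power-* data stream pattern, and the field names are assumptions made for this example, not something Fleet or Supermicro defines.

```python
import json

PIPELINE_ID = "redfish-power"  # hypothetical pipeline name
POWER_FIELDS = ("consumed_w", "avg_w", "min_w", "max_w")  # hypothetical field names


def power_pipeline() -> dict:
    """Request body for PUT _ingest/pipeline/redfish-power."""
    processors = [
        # Vendors may report watts as integers or strings; store them as floats.
        {"convert": {"field": name, "type": "float", "ignore_missing": True}}
        for name in POWER_FIELDS
    ]
    # Data streams require @timestamp; fall back to ingest time if the
    # collector did not set one.
    processors.append({"set": {"field": "@timestamp",
                               "value": "{{_ingest.timestamp}}",
                               "override": False}})
    return {"description": "Normalize Redfish power readings", "processors": processors}


def power_index_template() -> dict:
    """Request body for PUT _index_template/redfish-power."""
    return {
        "index_patterns": ["metrics-redfish.power-*"],
        "data_stream": {},
        # Must be higher than overlapping built-in templates such as metrics-*-*.
        "priority": 200,
        "template": {"settings": {"index.default_pipeline": PIPELINE_ID}},
    }


if __name__ == "__main__":
    print(json.dumps(power_pipeline(), indent=2))
    print(json.dumps(power_index_template(), indent=2))
```

Each body would be sent with PUT _ingest/pipeline/redfish-power and PUT _index_template/redfish-power, respectively. Sending POST _ingest/pipeline/redfish-power/_simulate with a few sample documents is a low-risk way to check the processors before any data stream uses them.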

With these technologies in mind, DevOps teams can consider how AR users should access critical infrastructure resources. DevSecOps should also be taken into account, since only privileged users should be able to view this type of data. Additionally, mobility management systems should consider adopting biometric authentication and geo-location tracking for AR glasses to ensure employees and contractors can use the technology only at specific sites. Setting aside management oversight and governance, developers can at least start building mock-up applications to visualize data that is vital for ITOps.
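A back end serving AR clients could enforce these rules before returning any power data. The sketch below combines a role check, a biometric flag, and a geofence test that uses the haversine great-circle distance. The role name and geofence radius are hypothetical. The sketch also assumes that the mobility management system supplies a verified device location and biometric result.

```python
import math
from dataclasses import dataclass
from typing import Set, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


@dataclass
class Site:
    center: Tuple[float, float]  # (lat, lon) of the facility
    radius_m: float              # geofence radius around the facility


def may_view_power_data(roles: Set[str], device_pos: Tuple[float, float],
                        biometric_ok: bool, site: Site,
                        required_role: str = "dc-privileged") -> bool:
    """Allow access only for privileged, biometrically verified users on site."""
    if not biometric_ok or required_role not in roles:
        return False
    return haversine_m(device_pos, site.center) <= site.radius_m
```

Checking the cheap conditions (biometrics and role) first means the location test only runs for users who could be granted access at all.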

Augmented reality applications need to be much more thoughtful in order to supplement or enrich data coming through other mediums, e.g., monitors, laptops, tablets, and phones. SREs in a central control room may already have multiple dashboards sitting in front of them. How can an AR app give them useful information without getting in the way? Perhaps pinning specific monitoring systems to a persistent HUD (heads-up display) could be very useful. Field technicians could also use AR to pin down clusters that are drawing too much power or need to be serviced within a constrained maintenance window.

Data Center Technicians may need a totally different UI/UX that shows only the information relevant to their service ticket, such as tasks and timelines, along with an easy way to close tickets with proof of work. This is where QR codes and built-in RFID scanners may work seamlessly with an AR application. For example, if a power supply has failed, a technician can scan the barcodes of the new power supply and the repaired system to update the active parts in the asset management database. Companies with a sustainability program may require employees to scan failed power supplies and check them into an E-Waste database for proper disposal, closing the loop. With Augmented Reality, many of these tasks can be streamlined for Data Center operations.
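The scan-to-close workflow for a failed power supply could be modeled along these lines. In this minimal sketch, three scans (the repaired system, the failed power supply, and its replacement) update an in-memory asset record. The scans also log the failed unit for e-waste disposal and close the ticket, with the scans kept as proof of work. The class and field names are hypothetical, and a real deployment would write to the existing asset management and ticketing systems instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Ticket:
    ticket_id: str
    system_serial: str
    failed_part_serial: str
    closed: bool = False
    proof: List[str] = field(default_factory=list)  # scanned serials as proof of work


@dataclass
class AssetDB:
    # system serial -> serials of the parts currently active in that system
    active_parts: Dict[str, Set[str]] = field(default_factory=dict)
    ewaste_log: List[str] = field(default_factory=list)

    def replace_part(self, ticket: Ticket, scanned_system: str,
                     scanned_failed: str, scanned_new: str) -> None:
        """Swap a failed part for a new one using three scans, then close the ticket."""
        if scanned_system != ticket.system_serial:
            raise ValueError("scanned system does not match the ticket")
        if scanned_failed != ticket.failed_part_serial:
            raise ValueError("scanned part is not the failed part on the ticket")
        parts = self.active_parts.setdefault(scanned_system, set())
        parts.discard(scanned_failed)
        parts.add(scanned_new)
        # Check the failed unit into the e-waste log for proper disposal.
        self.ewaste_log.append(scanned_failed)
        ticket.proof.extend([scanned_system, scanned_failed, scanned_new])
        ticket.closed = True
```

Validating the scans against the ticket before changing any records keeps a mis-scan from silently corrupting the asset inventory.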

In future blog posts, we will dive deeper into how developers can use telemetry and log services to trigger notifications and webhooks for AR applications.