• 532 Durham Rd., Newtown PA USA
  • Info@visualsi.com
  • Office Hours: 8:00 AM – 5:30 PM (Eastern US)

Background

Computers are very good at keeping lists of things that need to be done as well as when they need to be done. Computers have numerous sensors that can detect whether the hardware is running normally or starting to fail. Computers can also be taught to gather information about the operating system (Windows), database systems, and important applications (such as your line of business systems) to make sure they are operating at peak efficiency.

For years, people have used these characteristics to build software to monitor environments, locally and remotely. When a problem is detected, the remote management software’s information is used to assign people to fix the issue.

That’s where we were several years ago when we were asked to provide managed services for some of our larger customers. We ran remote management software across their large environments, and it alerted us whenever one of them had an issue. One of the biggest issues we faced was that problems arrived so fast and so strongly (and usually unexpectedly) that by the time we could get to fixing the root cause, the problem had escalated past the point where a simple solution was possible.

Examples of When This Could or Does Happen

These are some examples where Dexterity would be or has been very effective:

Snow Days / Office Closed

An education customer in a snowy region of the country collects attendance data from hundreds of school districts every day (one record/message per student; information is collected whether or not the child is in school).

A while back, a stretch of very snowy weather closed schools across the entire state for five days in a row. When school finally resumed, all schools sent all 6 days’ worth of attendance data at the same time, amounting to several million records in less than an hour.

Peak Periods

Peak periods happen in any industry. In sales, a specific product or company may receive excellent press in the media or appear in a TV show or movie. Likewise, sales can be dramatically affected by changes in the law that require groups to purchase goods or services without advance notice. Sudden interest spikes demand. Increased web traffic, higher sales, and pressure to create and ship inventory can all lead to server crashes.

Natural Disasters

Natural Disasters happen regularly across the world, often without warning. Just as a country and its people are affected by natural disasters, so too are businesses and companies.

For businesses that ship products from one location to another, a disaster in one region of a country may interrupt the normal channels of product delivery for a few hours or even for months. A company may have to reroute product deliveries from one location to another. Sometimes this isn’t possible, so another location is required to take on the rerouted traffic. That location is then responsible for shipping its own usual volume of packages as well as the rerouted products, all in an attempt to avoid downtime caused by the disaster. Once the storm has passed and repairs are underway, demand for the products needed for those repairs will increase as the affected areas are brought back to working order.

As with the other examples, data traffic will increase for the locations responsible for shipping all these products, and the servers handling that demand can become overloaded, which may result in system crashes that cascade throughout the entire company.

Health-Related Emergencies

During disasters that cause a large influx of patients into hospitals, patient management systems can lag, causing errors in patient profiles. If systems aren’t performing at the level expected of them, the risk of HIPAA violations may increase during this time as well.

If a hospital is at maximum capacity, processes need to be in place for diverting patients to the next closest hospital that is equipped to care for them. For that to happen, these hospitals need to be in sync so they can smoothly transfer patients from one hospital to the other or balance the patient load. Their systems also need to handle the strain of doctors and nurses across an entire health network using the same system to enter patient data that must remain highly accurate.

If not, health standards at these hospitals could suffer. It could also cause problems for the staff, and their patients could end up paying for those problems with their lives.

A More Realistic View

Issues like these cause trouble for most or all of your servers at the same time, requiring that they all be addressed at the same time. If they are fixed one by one, several things can go wrong:

  • The first server to be fixed could automatically be handed all of the workload that the remaining servers can’t handle, causing it to return to a failed state within a short period of time.
  • A fix applied to one server could begin a chain reaction that would make other servers more difficult to fix.
  • By the time the first server is fixed, the other servers may be past the point where relatively simple fixes will be effective.
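
The first point can be illustrated with a toy model. This is only a sketch under simple assumptions that are ours and not a description of any real environment: every server has the same capacity, the workload is split evenly across whichever servers are currently running, and a server fails when its share exceeds its capacity. The function name and all numbers are hypothetical.

```python
def survives(workload, capacity, servers_restored):
    """Return True if the restored servers can carry the workload between them.

    Assumes the workload is shared evenly and every server has the same
    capacity (both are simplifying assumptions for illustration).
    """
    if servers_restored == 0:
        return False
    per_server = workload / servers_restored
    return per_server <= capacity


# Hypothetical numbers: four servers, 100 units of capacity each,
# and a spike of 360 units of work that has knocked all of them over.
SPIKE = 360
CAPACITY = 100

# Fixing servers one at a time: check whether each partial fix holds.
one_by_one = [survives(SPIKE, CAPACITY, n) for n in (1, 2, 3)]

# Fixing all four at the same moment.
all_at_once = survives(SPIKE, CAPACITY, 4)

print(one_by_one)   # each partial fix is overwhelmed again
print(all_at_once)  # the full set can absorb the spike
```

In this model a server restored on its own inherits the entire spike and fails again, while the same four servers restored together each carry a share they can handle.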

An Enterprise Immune System

Dexterity is designed to be like an immune system for an enterprise.

First, on a regular basis, it performs many of the regular maintenance tasks normally done by administrators or scheduled scripts, plus several others that are specifically designed to maintain good performance of enterprise line of business applications.

Second, at regular intervals, it samples the health of the system, looking for deviations from normal patterns of behavior. It checks log files, services, the amount of time the system has spent on certain activities since the previous check, the status of important system indicators, and more. If it notices a behavior pattern emerging that indicates a period of unusually high activity may be approaching, it accelerates the activities that ensure good application performance, to prevent problems such as running out of disk space, databases becoming less efficient, or tasks getting overloaded. If the pattern reverses (a false alarm, or a rise in activity that was only short-lived), the schedule of remediation returns to normal maintenance levels.
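
The general idea of speeding up maintenance when activity looks unusual, and slowing it down again when things settle, can be sketched in a few lines. This is not Dexterity’s actual implementation. The class name, the intervals, the window size, and the threshold rule (a sample counts as unusual when it exceeds the recent average by a set number of standard deviations) are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveMaintenanceScheduler:
    """Shortens the maintenance interval while activity looks unusually high.

    Hypothetical sketch: names and default values are illustrative only.
    """

    def __init__(self, normal_interval=3600, fast_interval=300,
                 window=12, threshold=3.0):
        self.normal_interval = normal_interval  # seconds between runs normally
        self.fast_interval = fast_interval      # seconds between runs when busy
        self.threshold = threshold              # deviations above average that count as unusual
        self.baseline = deque(maxlen=window)    # recent samples considered normal

    def observe(self, activity):
        """Record one health sample and return the interval to use next."""
        if len(self.baseline) < self.baseline.maxlen:
            # Still learning what normal looks like.
            self.baseline.append(activity)
            return self.normal_interval
        avg = mean(self.baseline)
        spread = pstdev(self.baseline) or 1.0  # avoid a zero-width band
        if activity > avg + self.threshold * spread:
            # Unusual activity: accelerate, and keep the spike out of the baseline.
            return self.fast_interval
        # Normal activity (or a false alarm that has passed): back to the usual pace.
        self.baseline.append(activity)
        return self.normal_interval


scheduler = AdaptiveMaintenanceScheduler(window=4)
samples = [100, 110, 90, 100, 105, 400, 420, 100]  # made-up activity readings
intervals = [scheduler.observe(s) for s in samples]
print(intervals)
```

With these made-up readings, the scheduler keeps the normal interval until the jump to 400, switches to the fast interval while activity stays high, and returns to the normal interval once the reading drops back to 100.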

All of this happens automatically, on every affected server, moments after the behavior changes. With a typical remote monitoring/staff resolution scenario, it could take an hour or more to locate the source of the problem and mobilize staff to correct it. Additionally, even once they know what the problem is, staff will always be at a disadvantage when the number of machines starting to fail exceeds the number of people working to resolve the issue.