Managed Virtual Zones
Managed Virtual Zones is a concept, supported by software, that uses the current SIF 1.1 and 1.2 UK specifications to support large SIF deployments in the United Kingdom. It frees these deployments from some of the clashes that arise between multi-zone agents and the way data is typically stored in UK school systems. The objectives of this architecture are:
- …to eliminate the need for SIF agents to manage more than a single zone. Having a central authority manage the multi-zone complexities reduces the overall complexity of the system and, in doing so, increases the reliability of the entire deployment.
- …to reduce the total number of connections, and by doing so, make the overall system easier to manage, simpler to implement and less expensive to maintain.
- …to handle certain transactions that could not be managed before because the information was not available or because a multi-zone agent had no way to know in which zone to make a request.
- …to provide a framework for handling uniform field-level error-checking from the point when a record is published to when it is distributed to other applications.
In general this is accomplished by using a single SIF agent, called a Virtual Zone Manager (or Zone Manager for short), to manage the complex parts of the multi-zone environment and merge the records into logical groupings called Managed Virtual Zones.
Background: Multi-Zone Agents
A multi-zone agent is a SIF agent that participates in more than one SIF zone. It may be a publisher or a subscriber; it may use SIF push mode or pull mode (Section 184.108.40.206 of the specification) and it may run over HTTP or HTTPS (220.127.116.11). The SIF specification itself does not limit how many zones an agent may join, but it does have one restriction that poses a challenge for handling the way learners are educated in the UK:
- For each zone, for each context, for each object there may be at most one provider.
The provider is the agent in the zone that answers requests from other agents when they ask for information without specifying which agent they want it from. An agent may subscribe in more than one zone and it may generate events in more than one zone. It may also be the provider in more than one zone; the only restriction is that there must not be more than one provider in the same zone.
Very often, applications are shared between zones either at the Local Authority (LA) or Regional Broadband Consortium (RBC) level. In a multi-zone environment, the application’s SIF agent needs to have logic to manage the many connections and keep the data from the different schools separate. If it needs to generate events or if it serves as the provider in one or more of the zones, it needs to make sure that it uses the correct information and the correct RefIds for that zone.
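The restriction can be illustrated with a small sketch. It is not taken from the SIF specification or from any ZIS product; the class, method, zone, context and agent names are all hypothetical, and the only rule it encodes is the one stated above: one provider per zone, context and object.

```python
class ProviderConflict(Exception):
    """Raised when a second agent tries to provide the same object in the same zone and context."""


class ProviderRegistry:
    """Hypothetical registry that enforces one provider per (zone, context, object)."""

    def __init__(self):
        # (zone, context, object_name) -> id of the single registered provider
        self._providers = {}

    def register(self, zone, context, object_name, agent):
        key = (zone, context, object_name)
        current = self._providers.get(key)
        if current is not None and current != agent:
            raise ProviderConflict(
                f"{current} already provides {object_name} in {zone}/{context}"
            )
        self._providers[key] = agent

    def provider_for(self, zone, context, object_name):
        # Returns None when no provider has registered for this combination.
        return self._providers.get((zone, context, object_name))


registry = ProviderRegistry()
# The same agent may be the provider in several zones.
registry.register("school-a", "default-context", "LearnerPersonal", "mis-agent")
registry.register("school-b", "default-context", "LearnerPersonal", "mis-agent")
```

Registering a different agent for LearnerPersonal in "school-a" would raise ProviderConflict, which is the situation a multi-zone deployment has to design around.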
The complexities increase when SIF objects cross-reference other SIF objects.
Consider this example: An application is hosted at an LA and needs to receive input from three schools, all of which reference a fourth school (a school for educating musically talented children). Each of the three schools’ MISs has a different RefId referring to the music school.
When the subscribing application is initializing, it would request SchoolInfo records from each of these MIS systems. The music school may look like three different schools: the URN field is not mandatory and may not be provided in all three records, the same is true of the EstablishmentId field, and the SchoolName could be spelled differently. Each subscribing application would need “matching rules” for what happens when similar objects are received from multiple providers. Having this complex logic duplicated in each agent leads to a less reliable total solution.
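To make the idea of “matching rules” concrete, here is a minimal sketch of one possible rule for SchoolInfo records. The rule order (URN first, then EstablishmentId, then a normalised SchoolName) is an assumption made for illustration, as are the sample records; real rules would be set by whoever governs the deployment.

```python
def normalise_name(name):
    # Lower-case, drop punctuation and collapse spaces so spelling variants compare equal.
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def same_school(a, b):
    """Return True if two SchoolInfo-like dicts appear to describe the same school."""
    # Optional identifiers decide the match only when both records carry them.
    for field in ("URN", "EstablishmentId"):
        if a.get(field) and b.get(field):
            return a[field] == b[field]
    # Otherwise fall back to comparing normalised names (never match two empty names).
    name_a = normalise_name(a.get("SchoolName", ""))
    name_b = normalise_name(b.get("SchoolName", ""))
    return bool(name_a) and name_a == name_b


# Hypothetical records for the music school as seen by three different MISs.
from_mis_1 = {"RefId": "A1", "URN": "100001", "SchoolName": "St. Cecilia's Music School"}
from_mis_2 = {"RefId": "B7", "SchoolName": "st cecilias  music school"}
from_mis_3 = {"RefId": "C3", "URN": "100002", "SchoolName": "St. Cecilia's Music School"}
```

Here the first two records match on name because only one of them carries a URN, while the first and third do not match because their URNs differ. Every subscribing agent in a multi-zone deployment would need its own copy of logic like this.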
Implementing Multi-Zone Agents
Conceptually, the multi-zone agent approach is very simple: create overlapping zones as needed and have the multi-zone agents appear in more than one zone.
In practice, this approach relies on the agent being able to manage the data complexities that arise when it receives and manages data from many unrelated sources. For example, the agent:
- …must be responsible for not inappropriately co-mingling data
- …must be able to differentiate between very similar but different objects of the same type being received from two sources
- …must handle the differences between the way different suppliers provide information (different choreographies)
- …must make sure all audits are adequately maintained
- …must provide good administration facilities (perhaps a scripting language) considering that any activity will need to be applied to many (thousands of) connections
Consider the following example: this diagram shows 14 schools (a typical RBC might serve 2,500 schools). The box labeled "RBC App" represents an application being shared at a Regional Broadband Consortium and the boxes labeled "School" represent SIF school zones. Each of these school zones would probably have data provided by an MIS (Management Information System).
Some of the duties a multi-zone SIF agent would require of its administrator:
- 2,500 agent connections will need to be set up to install this agent
- If the application needs to be shut down for maintenance, it will need to be put to sleep in all 2,500 zones
- Any SIF administration that is performed in one zone will need to be performed in all 2,500 zones
- Element-level filtering will need to be set up for all 2,500 connections
- If the connections are not local, each of the 2,500 connections will require an HTTPS certificate (on each end), and assuming that all 2,500 were not installed on the same day, they will end up expiring at different times of the year
Operationally, if the agent runs in SIF "pull" mode, then it will most likely have 2,500 pull threads running, one managing each session. If each of these is polling once a minute, 2,500 SIF messages are being sent every minute simply to ask if there is anything to do, and 2,500 more are sent back to reply "there is nothing to do".
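The polling arithmetic can be written down as a one-line function. This is only a sketch of the calculation described above; the function name and parameters are illustrative.

```python
def polling_messages_per_minute(zones, polls_per_minute=1):
    # Each poll is one request from the agent plus one reply from the ZIS.
    return zones * polls_per_minute * 2


# One pull-mode agent connected to 2,500 school zones on an otherwise idle network.
idle_traffic = polling_messages_per_minute(2500)
print(idle_traffic)  # 5000
```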
The real issue with the multi-zone architecture is that the number of connections and the amount of network traffic grows dramatically as more agents are added. The following diagram illustrates a more typical RBC scenario:
In this example, there are only four shared applications at the RBC level and two at the LA level and we are only showing 14 of the 2,500 schools. In a more realistic example, we might have 10 applications shared at the RBC level and 5 applications at each of 20 LAs. Doing some quick calculations:
- Administrators would need to set up: 10 x 2,500 (RBC applications) + (5 x 20 (LA applications) x 125 schools at each LA) connections = 37,500 agent connections to set up
- If an RBC agent needs maintenance it would need to be put to sleep in 2,500 zones (as before)
- If the RBC data center needed to do maintenance, 25,000 sleep requests would need to be sent
- If these were "Pull" mode agents and polled once per minute, 50,200 SIF messages would be sent across the network each minute just to see if there were any waiting messages
- Element-level filtering would need to be set up on 25,100 connections
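The connection count in the first bullet can be checked with a few lines of arithmetic. The sketch below only evaluates the expression as it is written there, using the figures given in the text (10 RBC applications, 2,500 schools, 20 LAs with 5 applications and 125 schools each); the variable names are illustrative.

```python
RBC_APPS = 10
SCHOOLS = 2500
LAS = 20
APPS_PER_LA = 5
SCHOOLS_PER_LA = SCHOOLS // LAS  # 125 schools at each LA

# Every RBC application connects to every school zone.
rbc_connections = RBC_APPS * SCHOOLS
# Every LA application connects to each school zone in its own LA.
la_connections = APPS_PER_LA * LAS * SCHOOLS_PER_LA
total_connections = rbc_connections + la_connections

print(rbc_connections, la_connections, total_connections)  # 25000 12500 37500
```

The 25,000 RBC connections are also the number of sleep requests needed when the whole RBC data center goes down for maintenance.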
Managed Virtual Zones
With Managed Virtual Zones (MVZ), each application agent is only responsible for the data in a single zone. This architecture uses a central piece of software (a SIF agent) that manages virtual zones at different levels (LA, RBC) that contain the SIF object information appropriate for that level and scope.
The Virtual Zone Manager subscribes to all objects in each of these zones and is capable of providing whatever objects are required in the zone.
For example, a virtual zone for a given LA will contain information for all learners who spend part of their day in that LA, but not for learners from other LAs. With this architecture, the complexities of the large infrastructure are centrally managed, making each connected agent much simpler because it has to do far less.
An MVZ-connected implementation might look more like:
In this example, each application has a single connection to a single zone. This zone will have the following characteristics:
- It will contain information for learners who attend any part of the day in one of the schools that are in that virtual zone. This will properly handle records for learners who attend more than one school, even if the MIS systems in those schools come from different suppliers. This works for 14–19 learners as well as special education learners and works at the school level as well as at the LA level. Furthermore, virtual zones can be created for any grouping of schools (partnership) if needs dictate.
- Virtual Zones may overlap. A school may participate in more than one virtual zone.
- Parent (contact) records from multiple MISs may optionally be matched, so that multiple records for the same person map into a single authoritative record.
- LAInfo and SchoolInfo records are mapped into a single set of authoritative records — each school becomes the authority for its own information.
- Each virtual zone can be assigned a set of element level filters (just like agents are allowed to do in the Zone Integration Server) – this type of filtering is called "Blanket filtering". This is useful for setting up default filtering for large groups of agents.
- If a shared application needs to go to sleep or perform maintenance, it sends the request over one connection.
- If a shared application operates in SIF "Pull" mode and polls once a minute, it sends one message and receives one response to and from its virtual zone.
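Blanket filtering, mentioned in the list above, can be pictured as a default set of blocked elements that applies to every agent in a virtual zone. The sketch below assumes that a filter is simply a set of element names to strip and that any agent-specific filter is applied in addition to the zone's blanket filter; both assumptions, and the element names used, are illustrative rather than taken from the specification.

```python
def apply_filters(record, blanket_filter, agent_filter=frozenset()):
    """Return a copy of the record without elements blocked for the zone or the agent."""
    blocked = set(blanket_filter) | set(agent_filter)
    return {name: value for name, value in record.items() if name not in blocked}


# Hypothetical learner record and filters.
learner = {
    "RefId": "L1",
    "FamilyName": "Smith",
    "PhoneNumber": "01234 567890",
    "MedicalNotes": "example",
}
zone_blanket_filter = {"MedicalNotes"}      # default for every agent in the virtual zone
library_agent_filter = {"PhoneNumber"}      # extra restriction for one agent

visible_to_library = apply_filters(learner, zone_blanket_filter, library_agent_filter)
visible_by_default = apply_filters(learner, zone_blanket_filter)
```

An agent with no filter of its own still receives records with the blanket-filtered elements removed, which is what makes the blanket filter useful as a default for large groups of agents.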
Returning to the example above, where we had 10 applications at the RBC level, let's do some comparisons:
Connections to Administer:
Add applications at each of the LAs and compare the number of connections required:
The number of connections corresponds to the amount of administration needed to maintain the system, and therefore to the cost of maintaining it. One more statistic is important, however: the difference in the amount of network traffic generated. Many of the schools in the UK will not have fixed IP numbers available (or other reasons may apply) and will require their agents to run in SIF Pull mode. This requires the agent to periodically send a message across the network to the ZIS asking “Do you have anything for me?”, to which the ZIS will either send back data or reply “No data.”
Assuming a completely quiet network, the following chart examines the amount of network traffic for the two models:
Even with only one shared application at the RBC level and one at each of the Local Authorities, almost a million messages per hour will be sent through the network solely for polling purposes.
There are many differences between the two models, but the conceptual differences between the two are very important:
- The complex parts of "the system" are being handled in one place (Managed Virtual Zones) instead of many (in every agent, written by every agent developer)
- Complex decisions are made in a consistent manner, in a single place
- All actions are logged in an audit trail and tools are made available so that the audits are simple to access
- Data privacy is handled in a consistent manner and provides a facility for fallback "blanket-level" filtering
- Requests are handled by the Virtual Zone Manager, offloading this responsibility from the MIS computers, easing their burden
Object Management — Overview
The Virtual Zone Manager collects data by:
- Requesting, when first installed, everything that can be requested from those agents that generate events (those that would normally be providers, such as MIS applications)
- Subscribing to events after the initial conversion
It then maintains an up-to-date copy of the data from each of these applications for every school that connects to it. It also maintains a master translation table, keeping track of authoritative records (those records where there may be more than one copy maintained and one copy has “better” data than the others).
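One way to picture the master translation table is as a mapping from each source's local RefId to the RefId of the authoritative record. The sketch below is an assumption about its shape, not a description of the actual software; the class, source names and RefIds are hypothetical.

```python
class TranslationTable:
    """Hypothetical map from (source, local RefId) to the authoritative RefId."""

    def __init__(self):
        self._to_authoritative = {}

    def link(self, source, local_ref_id, authoritative_ref_id):
        self._to_authoritative[(source, local_ref_id)] = authoritative_ref_id

    def resolve(self, source, local_ref_id):
        # Pairs that were never linked are returned unchanged.
        return self._to_authoritative.get((source, local_ref_id), local_ref_id)


table = TranslationTable()
# Three MISs each hold their own RefId for the same music school.
table.link("school-1-mis", "A1", "MUSIC-SCHOOL")
table.link("school-2-mis", "B7", "MUSIC-SCHOOL")
table.link("school-3-mis", "C3", "MUSIC-SCHOOL")
```

With such a table, cross-references arriving from any of the three schools can be rewritten to point at one authoritative SchoolInfo before records are redistributed.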
When the Zone Manager receives an event, it will most likely distribute it to more than one zone.
Using a complex example to stretch the point: A learner spends part of the day in a school that is in one LA and the remainder of the day in a school that is in another LA. When an event is generated because the learner’s phone number has changed (this would be stored in the LearnerPersonal object), the “Change” event for that LearnerPersonal would be forwarded to the following zones:
- The school zone for the school that the learner attends in the morning (the main school)
- The school zone for the school that the learner attends in the afternoon
- The LA zone for the school that the learner attends in the morning (the main school)
- The LA zone for the school that the learner attends in the afternoon
- The RBC zone to which both LA zones are connected
So, any subscribing applications connected in any of these zones would receive this event.
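The fan-out in this example can be sketched as a lookup over three small tables. The tables, zone labels and identifiers below are hypothetical; the sketch only shows how the five target zones follow from knowing which schools a learner attends, which LA each school belongs to, and which RBC each LA connects to.

```python
# Hypothetical topology for the example above.
SCHOOL_TO_LA = {"morning-school": "LA-1", "afternoon-school": "LA-2"}
LA_TO_RBC = {"LA-1": "RBC-1", "LA-2": "RBC-1"}
ATTENDS = {"learner-42": ["morning-school", "afternoon-school"]}


def zones_for_event(learner_id):
    """Return the zones that should receive an event about this learner, without duplicates."""
    zones = []

    def add(zone):
        if zone not in zones:
            zones.append(zone)

    schools = ATTENDS.get(learner_id, [])
    for school in schools:
        add(f"school:{school}")
    for school in schools:
        add(f"la:{SCHOOL_TO_LA[school]}")
    for school in schools:
        add(f"rbc:{LA_TO_RBC[SCHOOL_TO_LA[school]]}")
    return zones


targets = zones_for_event("learner-42")
```

For this learner the result contains two school zones, two LA zones and a single RBC zone, since both LAs connect to the same RBC.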
When an application sends a request for an object and does not specify an agent, the Zone Manager will respond (it will be the default provider in each zone). Complete learner (workforce, contact) records will be returned in the response stream to this request.
For example, if a LearnerAttendance request is made for a particular learner, records for that learner may have been received from more than one school. Records from all schools will be returned and the requesting application will not need to know beforehand where the records would come from.
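A minimal sketch of this behaviour follows, assuming the Zone Manager keeps the attendance records it receives indexed by learner. The class, the "LearnerRefId" and "Source" keys, and the sample records are illustrative and are not field names from the UK SIF data model.

```python
from collections import defaultdict


class AttendanceStore:
    """Hypothetical store of attendance records collected from many schools."""

    def __init__(self):
        self._by_learner = defaultdict(list)

    def on_event(self, school, record):
        # Remember which school published the record alongside its content.
        self._by_learner[record["LearnerRefId"]].append({**record, "Source": school})

    def request(self, learner_ref_id):
        # The requester does not name a school; records from every source are returned.
        return list(self._by_learner.get(learner_ref_id, []))


store = AttendanceStore()
store.on_event("morning-school", {"LearnerRefId": "L1", "Session": "AM", "Present": True})
store.on_event("afternoon-school", {"LearnerRefId": "L1", "Session": "PM", "Present": True})
store.on_event("morning-school", {"LearnerRefId": "L2", "Session": "AM", "Present": False})

records = store.request("L1")
```

The requesting application receives both of the learner's records in one response and never has to know that they originated in two different schools.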
Objects in the UK SIF Data Model (for the sake of Managed Virtual Zones) are divided into two categories:
- Foundational Objects – the SIF objects that represent people or things that may be represented in more than one application in the same way, but are always defined at the school level. These objects are:
- Attached Objects – these objects are “attached” to Foundational or other Attached Objects. These can originate at the school, local authority or RBC level. They include all of the other objects defined in the UK SIF specification.
Among the Foundational Objects, in any given virtual zone, more than one MIS may contain a record for the same object. For example, it is highly likely that many school systems will have a SchoolInfo record for a common school. The Zone Manager will have the responsibility for selecting one that becomes the authoritative record, or the most reliable choice that will be used for redistribution. For SchoolInfo, it should be the SchoolInfo that came from the school itself, since it would be the most reliable source for its own information.
For each of the Foundational Objects, there would be a distinct set of rules that determines which record becomes authoritative when two records are received for the same object from two sources. In the Managed Virtual Zone model, these rule sets take the form of business rules controlled by the governing body responsible for the operation of the software.
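As an illustration of such a rule, the SchoolInfo case described above (prefer the record published by the school itself) might be sketched as follows. The fallback to the first record received is an assumption added only to make the example complete, and the function and data names are hypothetical.

```python
def authoritative_school_info(school_id, candidates):
    """Pick the authoritative SchoolInfo from (source_school_id, record) pairs in arrival order."""
    for source, record in candidates:
        if source == school_id:
            # The school is the most reliable source for its own information.
            return record
    # Assumed fallback: use the earliest record until the school publishes its own.
    return candidates[0][1] if candidates else None


candidates = [
    ("school-1", {"SchoolName": "Music Schl"}),
    ("music-school", {"SchoolName": "St. Cecilia's Music School"}),
    ("school-2", {"SchoolName": "St Cecilias"}),
]
chosen = authoritative_school_info("music-school", candidates)
```

Other Foundational Objects would need their own rules, since the idea of a record's "own" source does not apply to every object in the same way.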