Scalability
Thoughts on SIF — Scalability
In the early days of systems development, most applications had their own proprietary data management systems, not because this was a good idea but because there were no sophisticated database systems to handle things such as preventing multiple users from updating the same data at the same time, scheduling preventive maintenance and managing security. Having all this complexity in many places was bad for several reasons, among them:
- Things like security and privacy weren’t implemented in a totally consistent manner. If each application company had to develop the code itself, even when all were working from a common international standard, the result would never be the same as having a single piece of software enforce all the rules
- The system as a whole ended up being much less reliable because there were many more places where the "complex logic" parts were implemented
- The individual pieces were much more expensive, both to license initially and to maintain, because they had to do much more than applications that didn’t need to perform these activities
The same principles could easily extend to networking or any of several other areas we often take for granted when we work with modern servers such as Windows Server. What we want to do here is examine a popular method for implementing SIF in large organizations, look at its weak spots and see how Managed Virtual Zones can eliminate these weaknesses. It is a radical departure and will require the reader to set aside what they have heard is “the only way SIF can be implemented.” This paper is intended for those who are familiar with SIF and will specifically use examples from the UK version of the specification. The opinions expressed herein are solely those of the author and may not reflect those of anyone outside of Visual Software, Inc.
Multi-Zone Agents
A multi-zone agent is a SIF agent that participates in more than one SIF zone. It may be a publisher or a subscriber; it may use push mode or pull mode and it may run over HTTP or HTTPS. The SIF specification itself doesn’t limit how many zones an agent may join. There is only one restriction:
- For each zone, for each context, for each object there may be at most one provider.
Now, the provider is the agent in the zone that answers requests from other agents when they ask for information without specifying which agent they want it from. So, an agent may subscribe in more than one zone and it may generate events in more than one zone. It may also be the provider in more than one zone; the only restriction is that there must not be more than one provider for the same object and context in the same zone.
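The provider restriction can be sketched as a small registry keyed on zone, context and object. This is only an illustration of the rule as stated above, not code from any SIF product; the class names, the zone and agent identifiers, and the context and object names used in the usage note are assumptions chosen for the example.

```python
class ProviderConflict(Exception):
    """Raised when a second agent tries to provide an already-provided object."""


class ProviderRegistry:
    """Hypothetical registry: at most one provider per (zone, context, object)."""

    def __init__(self):
        # (zone, context, object_name) -> agent_id
        self._providers = {}

    def provide(self, zone, context, object_name, agent_id):
        key = (zone, context, object_name)
        current = self._providers.get(key)
        if current is not None and current != agent_id:
            raise ProviderConflict(
                f"{object_name} in {zone}/{context} is already provided by {current}"
            )
        self._providers[key] = agent_id

    def provider_for(self, zone, context, object_name):
        # Returns None when nobody provides this object in this zone and context.
        return self._providers.get((zone, context, object_name))
```

With this sketch, the same agent can register as provider of an object in "school-a" and again in "school-b", but a different agent registering for the same object and context in "school-a" raises `ProviderConflict`.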
Very often, applications are shared between zones either at the Local Authority (LA) or Regional Broadband Consortium (RBC) level. In a multi-zone environment, the application needs to have logic to manage the multiple connections and keep the data straight between the different schools. If it needs to generate events, or if it serves as the provider in one or more of the zones, it must make sure that it gives the correct information in each zone, using the GUIDs that are appropriate for that zone. It gets even more complicated when certain SIF objects cross-reference other SIF objects. For example, suppose that in one of the school zones a given MIS refers to "Oak Street Nursery School" using a given SIF RefId. It is very doubtful that another MIS from another supplier, in use at another school, would use the same SIF RefId to represent "Oak Street Nursery School", and using the EstablishmentId won't help much because in the UK, EstablishmentIds aren't assigned for nursery schools.
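To make the RefId problem concrete, here is a minimal sketch of the kind of cross-reference table a multi-zone agent would have to maintain for itself. Everything in it is hypothetical: the class, the zone names, the short RefId strings and the "canonical key" are illustrative, and deciding that two records really describe the same establishment is the hard part that the sketch assumes has already been done.

```python
class RefIdCrossReference:
    """Hypothetical per-zone RefId map kept by a multi-zone agent."""

    def __init__(self):
        self._to_canonical = {}  # (zone, local_ref_id) -> canonical_key
        self._to_local = {}      # (zone, canonical_key) -> local_ref_id

    def register(self, zone, local_ref_id, canonical_key):
        # Record that this zone knows the entity under local_ref_id.
        self._to_canonical[(zone, local_ref_id)] = canonical_key
        self._to_local[(zone, canonical_key)] = local_ref_id

    def translate(self, from_zone, ref_id, to_zone):
        # Map a RefId seen in one zone to the RefId used in another zone.
        canonical = self._to_canonical.get((from_zone, ref_id))
        if canonical is None:
            return None  # entity unknown in the source zone
        return self._to_local.get((to_zone, canonical))


xref = RefIdCrossReference()
xref.register("school-a", "A1B2", "oak-street-nursery")
xref.register("school-b", "9F8E", "oak-street-nursery")
print(xref.translate("school-a", "A1B2", "school-b"))
```

Every multi-zone agent would need its own version of this table, populated and kept current for every zone it joins.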
The Multi-Zone Agent Approach
The multi-zone agent approach is very simple: create overlapping zones as needed and have the multi-zone agents appear in more than one zone. This approach relies on the agent being able to manage the data complexities that arise because it is receiving data from many unrelated sources. By doing this, the agent:
- …is now responsible for not inappropriately commingling data
- …must handle the differences between the way different suppliers provide information
- …must make sure all audits are adequately maintained
- …must provide good administration facilities (perhaps a scripting language) considering that any activity will need to be applied to many (thousands of) connections
To illustrate this and keep the diagram simple, only 14 schools are shown, whereas a typical RBC might have 2,500 schools. The box labeled "RBC App" represents an application being shared at a Regional Broadband Consortium and the boxes labeled "School" represent SIF school zones. Each of these school zones would probably have data provided by an MIS (Management Information System, the UK equivalent of a Student Information System). Now, consider some of the things you might need to do as an administrator of this shared application:
- 2,500 agent connections will need to be set up to install this agent.
- If the application needs to be shut down for maintenance, it will need to be put to sleep in all 2,500 zones.
- Any SIF administration that is performed in one zone will need to be performed in all 2,500 zones.
- If the application is a SIF "pull" mode agent, then it will probably have 2,500 pull threads running. If each of these is polling once a minute, 2,500 SIF messages are being sent every minute simply to ask if there is anything to do, and 2,500 more come back with "nothing to do". That is 5,000 messages per minute per application to do nothing.
- Element-level filtering will need to be set up for all 2,500 connections.
- If the connections are not local, each of the 2,500 connections will require an HTTPS certificate, and these will likely expire at different times of the year
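The polling figure in the list above is simple arithmetic, sketched here so the inputs can be varied. The function name and its parameters are illustrative assumptions, and it counts only idle traffic: one request plus one "nothing to do" reply per poll.

```python
def idle_poll_messages_per_minute(connections, polls_per_minute=1):
    """Messages exchanged per minute when every poll finds nothing to do."""
    messages_per_poll = 2  # one request, one "nothing to do" reply
    return connections * polls_per_minute * messages_per_poll


# One shared application connected to 2,500 school zones, polling once a minute.
print(idle_poll_messages_per_minute(2500))  # 5000
```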
This was with a single application. Now, add a few other applications at the RBC level and some at the LA level:
In this example, there are only four shared applications at the RBC level and two at the LA level and we are only showing 14 of the 2,500 schools. In a more realistic example, we might have 10 applications shared at the RBC level and 5 applications at each of 20 LAs. Doing some quick calculations:
- Administrators would need to set up 10 x 2,500 (RBC applications) + 5 x 20 x 125 (LA applications, with 125 schools at each LA) = 25,000 + 12,500 = 37,500 agent connections
- If an RBC agent needs maintenance it would need to be put to sleep in 2,500 zones (as before)
- If the RBC data center needed to do maintenance, 25,000 sleep requests would need to be sent
- If these were "Pull" mode agents and polled once per minute, 50,200 SIF messages would be sent across the network each minute just to see if there were any waiting messages
- Element-level filtering would need to be set up on 25,100 connections
- If the connections are not local, each of the 25,100 connections will require an HTTPS certificate, and these will likely expire at different times of the year
Managed Virtual Zones
With Managed Virtual Zones, each application agent is only responsible for the data in a single zone. This architecture uses a central piece of software (Envoy) that manages virtual zones at different levels (LA, RBC), each containing the SIF object information appropriate for that level and scope. For example, a virtual zone for a given LA will contain information for all learners who spend part of their day in that LA, but not for learners from other LAs. With this architecture, the complexities of the large infrastructure are centrally managed, making each connected agent much simpler because it has to do far less.
An Envoy-connected implementation might look more like:
In this example, each application has a single connection to a single zone. These zones will have the following features:
- They will contain information for learners who attend any part of the day there. This will properly handle records for learners who attend more than one school, even if the MIS systems in those schools come from different suppliers. This works for 14–19 learners as well as special education learners and works at the school level as well as at the LA level. Furthermore, virtual zones can be created for any grouping of schools (partnership) if needs dictate.
- Parent records from multiple MISs may optionally be matched, so that multiple records for the same person map into a single authoritative record
- LAInfo and SchoolInfo records are mapped into a single set of authoritative records — each school becomes the authority for its own information
- Virtual zones can be assigned sets of element level filters (just like agents in ZIServer) — this is called "Blanket filtering". This is useful for setting up default filtering for large groups of agents
- If a shared application needs to go to sleep or perform maintenance, it makes the request to its one connection
- If a shared application is a "Pull" mode application and polls once a minute, it sends one message and receives one response.
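The "blanket filtering" feature in the list above can be sketched as a zone-level default that applies to any agent without its own filter. This is an assumption about the precedence rule, not a description of how Envoy or ZIServer actually resolves filters, and the element names and agent identifiers are invented for illustration.

```python
def effective_filter(agent_id, zone_blanket_filter, agent_filters):
    """Use the agent's own filter if one exists, else the zone's blanket filter."""
    return agent_filters.get(agent_id, zone_blanket_filter)


def apply_filter(sif_object, blocked_elements):
    """Return a copy of the object without the blocked elements."""
    return {
        name: value
        for name, value in sif_object.items()
        if name not in blocked_elements
    }


# Hypothetical data: element names are illustrative, not taken from the SIF spec.
blanket = {"MedicalNotes", "HomeAddress"}
agent_filters = {"transport-app": {"MedicalNotes"}}
learner = {"Name": "A. Learner", "HomeAddress": "1 Oak St", "MedicalNotes": "none"}

for agent in ("transport-app", "library-app"):
    blocked = effective_filter(agent, blanket, agent_filters)
    print(agent, sorted(apply_filter(learner, blocked)))
```

The design point is that a new agent joining the zone is covered by the default immediately, with no per-connection setup.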
Take the same example as above, with 10 applications at the RBC level and 5 applications at each of 20 LAs, and compare the two approaches:
Characteristic | Multi Zone | Managed Virtual Zones |
---|---|---|
Agent connections to be set up | 10 x 2,500 (RBC applications) + 5 x 20 x 125 (LA applications, 125 schools at each LA) = 37,500 | 2,500 (1 for each school) + 10 (1 for each RBC app) + 100 (1 for each LA app) = 2,610 |
If a single RBC agent needs maintenance, how many connections would need to be put to sleep? | 2,500 (one for each connection) | 1 |
If the RBC data center needs to do maintenance, how many connections would need to be put to sleep? | 25,000 | 10 |
If agents were "Pull" mode and polled once per minute, how many overhead messages would travel through the network per hour? (this doesn't even count real data being transmitted through the network) | 25,000 x 2 (request + response) x 60 = 3,000,000 | 10 x 2 (request + response) x 60 = 1,200 |
Element-level filtering would need to be set up on how many connections for these applications? | 37,500 | 2,500 + 10 (RBC level) + 100 (LA level) = 2,610 |
If the schools are not on the same network and certificates are required for each application, how many would be required? | 25,100 | 2,500 (one for each school) |
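The connection and polling rows of the comparison can be reproduced with a few lines of arithmetic. This is a sketch under the assumptions stated in the example (2,500 schools, 10 RBC applications, 20 LAs with 5 applications and 125 schools each); the constant and function names are illustrative. With these inputs the multi-zone connection count comes to 37,500 and the Managed Virtual Zones count to 2,610.

```python
SCHOOLS = 2500
RBC_APPS = 10
LAS = 20
APPS_PER_LA = 5
SCHOOLS_PER_LA = SCHOOLS // LAS  # 125 schools at each LA


def multi_zone_connections():
    # Every RBC app joins every school zone; every LA app joins its LA's schools.
    return RBC_APPS * SCHOOLS + APPS_PER_LA * LAS * SCHOOLS_PER_LA


def managed_virtual_zone_connections():
    # One connection per school, per RBC app and per LA app.
    return SCHOOLS + RBC_APPS + APPS_PER_LA * LAS


def hourly_poll_overhead(connections):
    # One request and one reply per connection per minute, for 60 minutes.
    return connections * 2 * 60


print(multi_zone_connections())                   # 37500
print(managed_virtual_zone_connections())         # 2610
print(hourly_poll_overhead(RBC_APPS * SCHOOLS))   # 3000000 (RBC apps, multi-zone)
print(hourly_poll_overhead(RBC_APPS))             # 1200 (RBC apps, virtual zones)
```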
There are many differences between the two models, but the conceptual ones are the most important:
- The complex parts of "the system" are being handled in one place (Envoy) instead of many (in every agent, written by every agent developer)
- Complex decisions are handled consistently
- They are debugged once and are regression tested every time any change is made
- All decisions are made in a consistent manner: business logic is applied through a "business rules" engine, leaving the decision-making process to those who know the business best
- All actions are logged in an audit trail and tools are made available so that the audits are simple to access
- Data is handled a minimum number of times — When an MIS publishes an object, it is stored in Envoy. Events are forwarded to the appropriate virtual zones and then the data stays in Envoy until the data is requested. It is not passed from zone to zone or from web service to web service as in other architectures simply to make it available elsewhere. In a large environment, data cannot be re-handled simply for the sake of making it available at a second or third location.
- Data privacy is handled in a consistent manner and provides a facility for fallback "blanket-level" filtering.
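The "store once, forward events" idea from the list above can be sketched as follows. This is not Envoy's implementation; it is a minimal illustration in which the class, the zone names, the school identifiers and the scope table are all assumptions. The object is stored a single time, and only lightweight event notices are queued for the virtual zones whose scope includes the publishing school.

```python
class CentralStore:
    """Hypothetical central store: objects held once, events fanned out by scope."""

    def __init__(self, zone_scopes):
        # virtual zone name -> set of school ids that fall inside its scope
        self._zone_scopes = zone_scopes
        self._objects = {}  # ref_id -> object, stored exactly once
        self.pending_events = {zone: [] for zone in zone_scopes}

    def publish(self, school_id, ref_id, sif_object):
        self._objects[ref_id] = sif_object
        # Queue an event notice (just the RefId) for each zone in scope.
        for zone, schools in self._zone_scopes.items():
            if school_id in schools:
                self.pending_events[zone].append(ref_id)

    def request(self, ref_id):
        # Data leaves the store only when somebody actually asks for it.
        return self._objects.get(ref_id)


store = CentralStore({
    "la-north": {"school-1", "school-2"},
    "la-south": {"school-3"},
    "rbc": {"school-1", "school-2", "school-3"},
})
store.publish("school-1", "R-001", {"Name": "A. Learner"})
print(store.pending_events)
```

In this sketch the event reaches "la-north" and "rbc" but not "la-south", and the object itself is never copied between zones.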