Managing Shared Information — CSV, ESB, EAI
What's the Problem?
In most larger school organizations, several computer applications are used to support the process of educating students, in one way or another. The majority of these applications know who attends the school and who teaches there. Some of them know the students' schedules, while others know details such as the grades they received or health-related information such as immunizations or records of injuries. The point is that much of this information is common to many of them.
So, what is the "best" approach to managing the information that is shared by these applications?
- Some say it is best to have a single, centralized database containing all the information and one large application that does everything. (This is referred to as the "integrated approach".) This has its efficiencies, but the application must be:
  - able to do everything and do everything well
  - able to scale well, because all school users will be using it regularly
- Others say that it is best to use individual applications from those companies with expertise in the application subject areas (and find a way to resolve the data issues). This is referred to as the "best of breed" approach.
Most of the time, what ends up being adopted is either a "best of breed" approach or a mixture of the two: an organization starts off wanting an integrated system, but then realizes that some other systems are needed as well.
Regardless of what's in place, these organizations will have multiple systems that will, by necessity, contain the same data. The big question is: how does that data get into those systems and who keeps it up to date?
How can this problem be managed?
Before we introduce SIF, we'll speak to three other alternatives that have been used in the past and are still being used to address this issue: one by default (manual data entry), another because it was a simple technology that was available (exports and imports), and a third that is being experimented with in other industries (enterprise service buses). These are presented here only to give some perspective.
Manual data entry
The most obvious method (and what gets done by default) is to enter data manually into each system using the application's own user interface. Besides being the most obvious, this is also the most expensive and the most error-prone.
We once had a customer district follow some sample stacks of paperwork around their district to see how much time was spent entering data from those pieces of paper into various systems and how much of that data entry was redundant. With a student population of about 7,000, they estimated that they spent about two staff positions' worth of time in a given year simply doing the redundant part of the data entry.
Exports and Imports
This method of synchronizing data works by having the provider application of the data create an "export" file on a regular basis and store it in a pre-determined location for each type of data (student, teacher, parent, school, enrollment, etc.). Then, each of the "receiving" applications looks for files in that location, opens them and reads the information. If it sees any changes in the files, it refreshes its database with the new information.
An IT department can put an arrangement like this in place if all the applications have appropriate interfaces. This assumes that:
- there is an "authority" application for each type of data (the SIS would normally be the authority for student data, an HR application might be the authority for teacher data, etc.)
- those applications have the ability to create "extract" files (typically in CSV (Comma-Separated-Values) format) — the design (and maintenance of that design) of these files is often the responsibility of the IT staff member
- those applications can generate these extract files on a regular basis
- the other applications that need the data have matching interfaces that can "import" these extract files on a scheduled basis
- the applications that import the extract files do reasonable things when they encounter errors in the source data
- all the clocks on all these servers stay synchronized so that the "imports" always follow the "exports"
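The export and import cycle described above can be sketched roughly as follows. This is only an illustration: the column names, the function names and the use of a content digest to detect changes are assumptions made for the example, and in-memory text stands in for the file that would normally sit in the agreed location.

```python
import csv
import hashlib
import io

# Hypothetical column layout; real layouts are agreed per interface.
FIELDS = ["student_id", "last_name", "grade"]

def export_students(rows):
    """Authority application: write student rows as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def import_students(csv_text, database, last_digest):
    """Receiving application: refresh `database` only if the file changed.

    Returns the digest of the file so the caller can pass it in next time.
    """
    digest = hashlib.sha256(csv_text.encode("utf-8")).hexdigest()
    if digest == last_digest:
        return digest  # nothing changed since the previous import
    for row in csv.DictReader(io.StringIO(csv_text)):
        database[row["student_id"]] = row
    return digest

# The SIS exports, the library system imports.
sis_rows = [{"student_id": "1001", "last_name": "Ng", "grade": "7"}]
library_db = {}
digest = import_students(export_students(sis_rows), library_db, None)
```

Even in this small form, the receiving side only learns about a change the next time its import runs, which is the batch delay discussed later in this section.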
Many organizations use this, but it is costly and error-prone. Costly because whether the IT staff is managing the CSV import/export process or the software suppliers are, the end user ends up paying the cost. There are no standards for the layouts of these files and most application suppliers typically spend a considerable effort maintaining a library of CSV-interfaces.
These interfaces are error-prone because the CSV format doesn't have the capability to recover from errors like other transmission methods do. Typically, if there is an error in a field, the entire record is rejected — or perhaps even the remainder of the file.
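As a rough illustration of that behavior, the sketch below rejects a whole record when a single field fails to parse. The column names and sample data are invented for the example. Note that the rejection is only visible to the importing side; nothing in the file exchange carries it back to the application that produced the file.

```python
import csv
import io

def load_enrollments(csv_text):
    """Split rows into accepted and rejected; one bad field rejects the row."""
    accepted, rejected = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            record = {
                "student_id": int(row["student_id"]),
                "grade": int(row["grade"]),
            }
        except (TypeError, ValueError) as exc:
            # Remember which line failed and why, then move on.
            rejected.append((line_number, str(exc)))
            continue
        accepted.append(record)
    return accepted, rejected

# The second data row has a non-numeric grade and is dropped entirely.
sample = "student_id,grade\n1001,7\n1002,seven\n1003,8\n"
accepted, rejected = load_enrollments(sample)
```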
Enterprise Service Bus
There appear to be many definitions of this term (we will not include the airport car rental company shuttles in this discussion), but most of them center around the offerings of particular companies, such as Microsoft, Oracle or IBM. These products allow connections from different operating system types, support web services interfaces, and use XML and SOAP to transfer data.
Because of the way they are structured, they handle errors better than the older CSV import and export approach, and because of their design, information is transferred right away instead of in batches. This means that when a student enrolls, his or her information is immediately sent to all connected systems and is available in a few seconds, not in the next day or two.
The problem with ESB is that it is a technology — it defines the transportation mechanism through which data can pass. It is a very good and well thought-out one, but it doesn't do anything for making life simpler or more cost-effective for a school organization. Those who develop around it will still need to re-do much of the work that all of the companies participating in SIF have done for the last 10 years. This is why these projects tend to be enormously expensive.
Integration Hub
The biggest difference between SIF and these other ways of solving this problem is that SIF is an industry standard. It wasn't imposed by a government or invented by a single company, but was created by industry experts who, amazingly, agreed on a number of key concepts, including:
- What things needed to be shared between applications (these became SIF objects)
- What characteristics those things had (these became the elements and attributes in those objects)
- How those things would be passed between applications (this became the definition of the Zone Integration Server):
  - Automatically sending data as soon as it changes: the Publish/Subscribe model
  - Allowing applications to ask for data when they need it: the Request/Response model
- How those things would be protected:
  - Encryption, HTTPS, Certificates…
  - Object-level protection: determining who has the right to see which objects
  - Element-level protection: determining who has the right to see which data elements in those objects
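To make the two delivery models concrete, here is a minimal in-memory sketch. It is not the SIF protocol or any real Zone Integration Server API; the class name `ToyHub`, its methods and the sample data are all invented for illustration.

```python
from collections import defaultdict

class ToyHub:
    """In-memory stand-in for a message hub; not the SIF protocol itself."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # object type -> callbacks
        self._providers = {}                   # object type -> lookup function

    def subscribe(self, object_type, callback):
        self._subscribers[object_type].append(callback)

    def publish(self, object_type, data):
        """Publish/Subscribe: push a change to every subscriber at once."""
        for callback in self._subscribers[object_type]:
            callback(data)

    def provide(self, object_type, lookup):
        self._providers[object_type] = lookup

    def request(self, object_type, key):
        """Request/Response: ask the provider for data on demand."""
        return self._providers[object_type](key)

hub = ToyHub()
students = {"1001": {"name": "Ng"}}      # data held by the providing application
library_inbox = []                        # changes received by a subscriber

hub.subscribe("Student", library_inbox.append)
hub.provide("Student", students.get)

hub.publish("Student", {"id": "1002", "name": "Okafor"})  # pushed on change
answer = hub.request("Student", "1001")                   # pulled when needed
```

In the first model the sender decides when data moves; in the second the receiver does. A real hub would add the security and delivery guarantees that this toy omits.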
This didn't happen overnight; it has been an ongoing process since the late 1990s and has involved hundreds of companies and thousands of school organizations (people who use and support these applications every day). It has grown to support three regional specifications in the US, UK and Australia and is currently being used in tens of thousands of schools worldwide.
SIF is standards-based; it is fast and it is secure. It can economically meet the needs of a very small installation, yet scale to meet the needs of a large state with millions of students.
How Does This Work?
In a SIF environment, applications share data through web connections that send and receive XML messages in a standardized format. All of the communicating applications connect to a common hub, called a Zone Integration Server (ZIS), that is responsible for making sure that all messages are delivered properly. So, at the highest level, the collection of systems looks like the picture to the right.
The Zone Integration Server is software that runs either at the school organization or at a hosted location.
For all of these applications to be able to communicate properly, they must "use a common language" as follows:
- The XML messages must have the same format
- If there are encoded fields (such as "Telephone Number Type" or "Citizenship Status") in the data, all parties must agree to use a common set of values for these codes when the data is being transferred (they may use whatever they want in their own applications, but as the data is being transferred it must be in this standardized form).
- All parties must agree to the same method for exchanging these messages ("the protocol")
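The point about encoded fields can be sketched as a translation at each application's boundary. The code values below are made up for the example and are not taken from the SIF specification; the table and function names are likewise hypothetical.

```python
# Hypothetical shared code set; real values come from the agreed specification.
STANDARD_PHONE_TYPES = {"HOME", "WORK", "MOBILE"}

# Each application keeps its own internal codes and maps them at the boundary.
SIS_TO_STANDARD = {"H": "HOME", "W": "WORK", "C": "MOBILE"}
LIBRARY_FROM_STANDARD = {"HOME": 1, "WORK": 2, "MOBILE": 3}

def to_standard(local_code):
    """Sender side: translate an internal code before the data leaves."""
    standard = SIS_TO_STANDARD[local_code]
    if standard not in STANDARD_PHONE_TYPES:
        raise ValueError(f"unmapped code: {local_code!r}")
    return standard

def from_standard(standard_code):
    """Receiver side: translate the shared code into the local one."""
    return LIBRARY_FROM_STANDARD[standard_code]

# The SIS stores "C", the library system stores 3; on the wire it is "MOBILE".
on_the_wire = to_standard("C")
stored_locally = from_standard(on_the_wire)
```

With a shared set of values in the middle, each application needs one mapping of its own rather than one per partner application.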
The Message Broker
The ZIS is a piece of software that runs as a web site and routes messages between applications. It does not hold permanent copies of school information; the only time it keeps much of anything is when one of the receiving applications is offline or slow. In that case it temporarily holds messages until the receiving application drains its message queue. (Another exception is our ZIServer product, which stores a complete audit trail of all messages that pass through the ZIS; these are maintained only for auditing purposes.)
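The temporary holding behavior can be pictured with a small sketch like the one below. It is an assumption-laden illustration, not how any particular ZIS is implemented: the class `HoldingQueue` and its methods are invented, and a real server would persist its queues and authenticate each receiver.

```python
from collections import defaultdict, deque

class HoldingQueue:
    """Holds messages per receiver until that receiver drains them."""

    def __init__(self):
        self._queues = defaultdict(deque)  # receiver name -> pending messages

    def deliver(self, receiver, message):
        """Called when a message arrives for a receiver that may be offline."""
        self._queues[receiver].append(message)

    def drain(self, receiver):
        """Return and remove all pending messages, oldest first."""
        queue = self._queues[receiver]
        pending = list(queue)
        queue.clear()
        return pending

    def pending_count(self, receiver):
        return len(self._queues[receiver])

zis = HoldingQueue()
zis.deliver("library", "student 1001 enrolled")
zis.deliver("library", "student 1001 changed address")

# The library system comes back online and collects what it missed.
messages = zis.drain("library")
```

Once drained, nothing remains in the queue, which matches the idea that the hub is a router rather than a data store.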
A ZIS can either be hosted and shared between many organizations or can be installed locally at a school or school district.
Besides routing messages, it is also responsible for maintaining the infrastructure's security.
- Before an application can connect to a SIF infrastructure, the ZIS administrator must "create" a connection for it and authorize the application for certain types of operations. For example, an administrator may create a connection for a library application and authorize it to receive student and teacher demographic information, and student enrollment information, but not financial information.
- Before one location (perhaps a school) connects to another location (perhaps a hosting provider), typically digital certificates are exchanged, so that each party can be assured that the connection is safe.
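The authorization step, together with the object-level and element-level protection mentioned earlier, might look something like this in miniature. Every name here (`AccessList`, `create_connection`, `filter_for`, the object types and fields) is hypothetical and chosen only for the example.

```python
class AccessList:
    """Per-connection permissions, set up by an administrator."""

    def __init__(self):
        self._allowed = {}  # connection name -> {object type: allowed elements}

    def create_connection(self, name, permissions):
        self._allowed[name] = permissions

    def filter_for(self, name, object_type, record):
        """Return the permitted view of `record`, or None if not authorized."""
        permissions = self._allowed.get(name, {})
        if object_type not in permissions:
            return None  # object-level protection: type not authorized
        elements = permissions[object_type]
        # Element-level protection: drop fields the connection may not see.
        return {k: v for k, v in record.items() if k in elements}

acl = AccessList()
# The library may see student ids and names, and nothing else.
acl.create_connection("library", {"Student": {"id", "name"}})

student = {"id": "1001", "name": "Ng", "immunizations": "on file"}
visible = acl.filter_for("library", "Student", student)
denied = acl.filter_for("library", "Invoice", {"amount": 120})
```

An unknown connection gets an empty permission set, so the default in this sketch is to deny.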
Applications and Connectors
When SIF was first envisioned, it was assumed that most applications would be adapted to work with SIF and that they wouldn't be built from the ground-up knowing how to directly communicate with a Zone Integration Server. These adapters are called "SIF Agents".
When SIF was in its infancy, several companies set out to address the need to build SIF Agents. We at Visual Software went in one direction; everyone else went in the other. The others began with the assumption that these agents were going to be different enough that each of them should be built mostly from the ground up (with a little help from an ADK, a programmer's library).
We at Visual Software, however, had spent the previous eight years refining techniques and building technology allowing "new services to be attached to existing technologies without disrupting them." So, instead of starting fresh, we used these principles and this code base to create the first version of our configurable SIF agent, ZIAgent, which provides SIF services to existing applications without requiring programming and without requiring modifications to the target application.
In years since, we've continued to refine this model, making it more and more efficient, flexible, reliable and scalable. Moreover, we've made the agent do "the difficult stuff" by default; the things many leave out; the things that separate the good from the excellent agents. To learn more about our configurable agent, see ZIAgent.
How Software Suppliers See SIF
Almost universally, software suppliers would prefer to spend as much effort as possible focused on their own application, but instead find themselves getting sidetracked with multiple interfaces to other applications. They need to stay aware of when the other applications change, then change their interfaces to match the changing designs of the other applications. This is an expensive process that takes away from their ability to address issues within their own product.
If it isn't a waste of their time, then they're charging a significant price for these interfaces, because they take significant work to maintain.
What SIF allows them to do is to create a single interface that can be used to connect to many other applications. Unlike the Import/Export interfaces (which most of the existing ones are), the SIF interface can handle errors and properly recover from them, can protect private information at levels that meet or exceed government standards and can even simplify the entire process for their customers (if done properly).
How School Organizations See SIF
Schools need to solve a number of data-related problems:
- Student, teacher and contact information appears in many different applications
- If there isn't an automatic update feed between them, the data becomes very inaccurate
- If the time-to-update is long, there will be certain operational inefficiencies (for example, day-old attendance data is not useful in the cafeteria/canteen)
SIF is a good way to solve those problems because it has good error handling characteristics, it scales well and data is passed between applications as soon as the data event happens.
We have found in many installations that once the system has been set up, it doesn't need much care beyond normal backups.
Because it is a standard, a school organization can ask an application supplier to provide such an interface without that request being considered unusual or being seen by the vendor as something useful for only one customer.
Is it Perfect? (time to be honest…)
Well, like anything human-made, SIF has its issues. From our experience installing and supporting many SIF agents over the past ten years, we have found that if all participants use the specification the way it was intended to be used, then things work remarkably well and the users and suppliers are very pleased with the results.
Issues arise, however, whenever one party or another begins to compromise the specification. The certification process catches many of these errors, but it isn't perfect either.
A good rule for evaluation: if you ever hear anyone say that a particular agent only works when paired with another agent or with a specific Zone Integration Server, then even if it is SIF certified, don't consider it to be a SIF agent. It means that they have compromised the SIF specification and found a way to fool the certification test harness.