When building a software product, there are two approaches to testing it. A while back, I learned that the way we test at our company is a bit out of the ordinary, though I never realized just how different we were. Let me illustrate:
The typical approach to building a software product is to design it, write it, and then run it to see whether the results match what you expected when you designed it. In testing it, you use test data that someone else created to the best of his or her ability.
Our approach, on the other hand, has been to design the software, write it, and then run it in "single step mode" in the development environment, proving that it works correctly each step of the way. The test data is designed by those writing the code, who know every junction in it, specifically to find every possible type of error. It takes far longer to do this the first time, but in the long run we end up spending far less time than with the "typical" approach mentioned above.
We feel that the best way to test code is to prove that it works instead of testing to see if it doesn't work and then going back and patching it up where needed.
In the typical approach, the error is detected by looking at the results, noticing they do not meet expectations, then working backwards to find the source of the problem. This situation is illustrated in the following example where the source of the problem was only found by examining "post-process 6" results.
When we identify the problem in Process 3, we might be tempted to do whatever was needed to fix it there. While it might fix the issue found in this particular test, we may overlook the subtle problem in Process 2 that could emerge at some other time.
Using the "prove that it works" method, you never make either of these errors in the first place. While still imperfect, this method ensures that when development is complete the code is in sync with the programmer's understanding of the requirements. This method cannot completely eliminate human factors such as:
Despite these imperfections, this method consistently reduces code errors and gives the programmer a framework for verifying that his or her understanding of the specification/requirements is reflected in his or her code.
You might be reading this wondering how this could possibly relate to a SIF implementation. Take, for example, a typical US SIF implementation (we will NOT be recommending something like this):
In this example, you would discover problems with your SIF implementation if some teachers, parents or learners never made it into the VLE.
If there were a problem, you would never quite know where the problem originated without quite a bit of work (was there a problem with the MIS, with the MIS SIF agent, with the ZIS, with the VLE SIF agent or with the VLE?). Furthermore, you may have purchased your MIS from one supplier, your ZIS from another and your VLE from still another. You could also have purchased the SIF agents from someone other than the MIS or VLE suppliers.
What we will be suggesting in this paper is something analogous to our software design approach, where the implementation is built incrementally and proven to work properly at each juncture, so that at the end of the process there is a much lower risk of something like the above happening. What we are suggesting is to not use your subscribing applications to "debug" your SIF implementation – there is a better way.
Note: this document is being written by an American to a British audience, so American words (as in lawyer vs. barrister) will be used with British terminology (as in LearnerPersonal vs. StudentPersonal), so you might need to read it with a bottle of aspirin handy…
Hearing about SIF for the first time is exciting, especially if you've endured the headaches of export and import, rejected batches, resubmission, and all that goes with it. Before we get into connecting applications and exchanging data there are two commonly overlooked areas that should be considered: data quality and data privacy. Properly addressing these two issues before data starts moving is critical to any successful SIF implementation.
Data Quality is an issue that will likely catch many by surprise. In years past, maintaining "less than clean" data in a school's MIS had far less impact than it will when that system's data becomes the authority for most of the applications in the school, the local authority, the RBC and eventually as the basis for reporting to the DCSF.
Before SIF, if some words were typed into a "phone number" field instead of a real phone number (for example, the word "UNLISTED" was typed where the number should go), that might be acceptable because everyone who used the MIS knew what it meant. But when the MIS is used as the supplier of data for everyone who wants learner data, only the number itself will be acceptable in the phone number field.
Why then could SIF be implemented in the US as described earlier? Rejected messages were never an issue in the US because the US SIF schema (the mechanism used to programmatically check the correctness of data) validates very few fields, whereas the UK schema validates many. In the US, almost any value is passed through by the ZIS without complaint, while in the UK the ZIS will end up rejecting a much higher percentage of messages. Is one good and one bad? Neither; they're simply different. The important thing is that principles that work well in one environment may not work so well in the other.
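The contrast can be sketched in a few lines of code. This is purely illustrative (the function names and the phone-number pattern below are my own assumptions, not rules taken from either SIF schema), but it shows why a value like "UNLISTED" sails through a permissive check yet is rejected by a strict one:

```python
import re

# Assumed pattern for illustration only; the real UK schema rule differs.
UK_PHONE = re.compile(r"^\+?[0-9 ]{7,15}$")

def us_style_validate(phone: str) -> bool:
    """US-style: almost any non-empty string passes through."""
    return bool(phone.strip())

def uk_style_validate(phone: str) -> bool:
    """UK-style: the value must actually look like a phone number."""
    return bool(UK_PHONE.match(phone))

print(us_style_validate("UNLISTED"))       # True  - the ZIS passes it along
print(uk_style_validate("UNLISTED"))       # False - the message is rejected
print(uk_style_validate("020 7946 0958"))  # True
```

The same source record that moved happily between US applications for years would be bounced on day one of a UK implementation, which is exactly why data quality work belongs before the agents are switched on.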
Errors can be introduced in the source data; for the sake of the remainder of this document, I'll classify them as follows:
With SIF in place, the MIS is going to gain new authority – it will officially become the new "digital authority" for learner, contact and teacher information at the school. Before SIF, it was only responsible for its own data (unless it generated extract files that were sent elsewhere). This means that as soon as an error is entered in the MIS, it will propagate throughout the school, sometimes even overwriting correct information in other applications. I like to refer to this as "data pollution."
Unfortunately, these errors originate in many places, and often they aren't even errors but rather mismatches in capabilities, so it is advantageous to be prepared for them and to set expectations beforehand. Sometimes exercises like this may even lead to the conclusion that the project is not worth doing at all. For example, if the entire project depends on a subscribing application being able to do something based on information that your MIS doesn't even store (and never will), then perhaps the project involves more than simply buying a ZIS and a few SIF agents.
Data privacy is one area where there are many legal differences between the US and the UK that must be understood before starting a SIF implementation. This is important because the SIF specification was originally designed by people with a good understanding of US laws and customs and sometimes these designs bump up against UK requirements.
For example, in the US, the predominant authority for privacy of student data is the Family Educational Rights and Privacy Act (FERPA). FERPA doesn't really deal with the passing of information between applications in the same school or between schools in the same community – it mostly addresses when data is moved from a district (like a Local Authority) to the state.
In the UK, schools are required to operate under the requirements of the Data Protection Act (DPA) of 1998, the Education Act of 1996 and others which have specific guidelines that govern the transmission of what the DPA refers to as "Educational Records" (SCHEDULE 11, Section 68 (1)(6)), between departments in schools as well as between applications in the same school. For more information, see Data Protection Good Practice Note.
Given these differences, SIF was designed without "fine-tuned security" when connecting applications horizontally, because in the US it was never an issue. The MIS publishes the information to all who are listening – it is up to the subscriber to take what it wants and ignore what is not appropriate. It relies on trusting the subscribers' integrity not to do "bad things" with sensitive information passed to them that they didn't need.
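A toy sketch makes the trust assumption visible. Every name here is hypothetical and this is not the actual SIF wire protocol; the point is only that the subscriber receives the whole object and does its own discarding:

```python
# The MIS publishes the full learner record to every subscriber in the zone.
learner_event = {
    "LearnerPersonalRefId": "A1B2C3",
    "Name": "Jane Doe",
    "PhoneNumber": "020 7946 0958",
    "MedicalNotes": "asthma",          # sensitive, but sent anyway
}

class Subscriber:
    def __init__(self, name, wanted_fields):
        self.name = name
        self.wanted = wanted_fields

    def on_event(self, event):
        # The subscriber receives EVERYTHING and discards the rest itself;
        # the zone trusts it not to misuse fields it didn't need.
        return {k: v for k, v in event.items() if k in self.wanted}

vle = Subscriber("VLE", {"LearnerPersonalRefId", "Name"})
print(vle.on_event(learner_event))
# {'LearnerPersonalRefId': 'A1B2C3', 'Name': 'Jane Doe'}
```

Notice that the medical note still crossed the wire to the VLE; under the DPA that transmission itself can be the problem, regardless of what the subscriber later does with it.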
There are really two issues: the existing SIF objects that were designed for the UK using the US "way of thinking", and what to do going forward. I believe both have answers; the first took some development, and the second may take some changes in the way we design new objects in the future. I'll address the second one first…
This is strictly opinion, but I believe that people should consider data sensitivity when designing new objects. Designers should differentiate sensitive data from non-sensitive data by splitting it into separate objects. In a way, this was done with the LearnerSpecialNeeds object – most, if not all of the information in this object is sensitive, so SIF object level permissions should be sufficient in safeguarding access to this information.
For example, if the DPA had been considered in the original design of LearnerPersonal, it might have been split into two objects: one containing basic student information and a second containing information covered by the DPA, both containing the same LearnerPersonalRefId.
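To make the idea concrete, here is a hypothetical sketch of what such a split might look like. The element and object names are invented for illustration and are not part of any actual SIF specification; the point is simply that both halves carry the same RefId, so a consumer authorised for both can re-join them while object-level permissions protect the sensitive half:

```python
import xml.etree.ElementTree as ET

REF_ID = "8A2B3C4D"  # the shared LearnerPersonalRefId

# Basic, widely shareable information.
basic = ET.Element("LearnerPersonalBasic", RefId=REF_ID)
ET.SubElement(basic, "Name").text = "Jane Doe"
ET.SubElement(basic, "YearGroup").text = "7"

# DPA-covered information, published as a separate object.
sensitive = ET.Element("LearnerPersonalSensitive", RefId=REF_ID)
ET.SubElement(sensitive, "MedicalNotes").text = "asthma"

# A consumer with rights to both objects joins them on RefId.
print(ET.tostring(basic, encoding="unicode"))
print(ET.tostring(sensitive, encoding="unicode"))
```

With this shape, a VLE could be granted the basic object only, and no filtering machinery would be needed to keep the sensitive fields out of its feed.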
To handle the "mixed objects", we've added a feature called "PrivacyPlus™" to two of our products: ZIServer™ and Envoy™. PrivacyPlus essentially puts a "strainer" in front of the "fire hose" or, more technically, allows an administrator to set up a series of filters that determine exactly which information is allowed to go where.
For ZIServer (the Zone Integration Server), the PrivacyPlus feature can restrict data from being sent to a particular application.
When the subscribing agent is set up in the ZIS, the administrator can choose which elements in a message would be filtered out before sending a message to this agent. This is useful for restricting information between applications within a school.
In this example, the administrator selects which elements (or group of elements) will not be sent to a subscribing application. When ZIServer receives a message for distribution from the provider, it will strip out this information before passing the information on to this application. Since there can be several different "versions" of the same message being distributed, each different version will be audited separately.
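Conceptually, this kind of filtering can be sketched as follows. This is not Visual Software's actual PrivacyPlus implementation, just a minimal illustration of stripping a configured set of elements from a message before it is forwarded to a particular subscriber:

```python
import xml.etree.ElementTree as ET

def filter_message(xml_text, blocked):
    """Remove every element whose tag is in `blocked` before forwarding."""
    root = ET.fromstring(xml_text)
    # Materialise the element list first so removals don't upset iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in blocked:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")

message = """<LearnerPersonal RefId="A1B2">
  <Name>Jane Doe</Name>
  <PhoneNumber>020 7946 0958</PhoneNumber>
  <MedicalNotes>asthma</MedicalNotes>
</LearnerPersonal>"""

# This subscriber's filter strips phone and medical details.
print(filter_message(message, {"PhoneNumber", "MedicalNotes"}))
```

Each subscriber would have its own `blocked` set, which is why several different "versions" of the same message can be in flight at once, and why each version is audited separately.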
For Envoy, PrivacyPlus provides blanket-level filtering for implementing policies concerning what information can be transmitted from one type of organization to another. For example, if a policy states that only the basic learner (no medical or sensitive personal) information can be transferred from a school to a local authority, a set of filters (similar to those in ZIServer) can be placed on the entire Local Authority Zone in Envoy. This would cause all LearnerPersonal objects published for subscribing applications at the Local Authority level to be missing medical and personally sensitive information.
Becta strongly recommends, but does not require, the establishment of the following two roles within each school (to paraphrase their document):
These are daunting responsibilities in themselves, but not having the tools needed to carry them out, or to verify that things are "as they should be", is unsettling at best.
So, to address these, Visual Software is adding a set of auditing tools to our existing product set (availability expected March 2009). These include:
After spending a short time in the UK, we soon realized that a product line designed for the US market would only partially meet the needs of the UK market, and that the differences between the two went far deeper than post code formats or the way contacts are stored. We couldn't simply adopt a new database schema, update code sets and pretend that our products met the needs of the UK market, because they wouldn't have. Some of the differences between the two environments run as deep as the differences between the two countries' views on data privacy.
So, in our feeble attempt to buck the American tradition of forcing our way of doing things on everyone else (you are supposed to be at least smiling here), we took the path of redesigning our products to meet the needs and requirements of the UK market, from the enhanced data privacy requirements to the multi-level regional zoning requirements.
There are three core products in a typical Visual Software SIF implementation: ZIServer (the Zone Integration Server), Envoy (the Multi-Zone manager) and Veracity (the Data Quality manager). All three are designed to run on a Microsoft platform and can be installed at a small site using the freely downloadable SQL Server Express edition, or spanned across multiple large servers in load-balanced clusters with failover-clustered databases and everything else you would expect in a large city implementation.
Each of the Visual Software products includes components of our PrivacyPlus technology, developed to meet or exceed the UK's data privacy requirements. We've added it in more than one place to make administration more manageable. For example, putting privacy protection only in the ZIServer product would provide the protection required by the DPA and recommended by Becta, but with so many individual endpoints to manage there, it would become very difficult to administer if the ZIS were hosted. To make management more reasonable and to give administrators a fail-safe "blanket filter", we also added it to Envoy. This added protection allows a regional administrator to set default policies for all schools in the region as a starting place, then lets the ZIS administrator "take it from there". Of course, all of these are policy settings, at the choice of those who set up the software.
This section outlines a recommendation for the implementation of the foundational SIF layer (the part that gets things going). Implementing these parts successfully will ensure a much higher likelihood of success as other applications are added later on.
Compare this to a typical US installation:
Although the second approach sounds much simpler on the surface, it has a few problems:
This approach is like debugging an application only by looking at the output – you never quite know where a problem originated. You can guess by working backwards through the process, but in this case every step backwards might involve a different supplier. Consider the scenario from before:
…where each of these pieces was obtained from a different supplier. If the error was discovered in the VLE, the supplier of the VLE SIF agent could be the source of the problem. Or the problem could be in the ZIS, in the MIS SIF agent, or in the MIS. With the methodology we suggest, errors can be caught at each point in the process as soon as they happen, avoiding the "snowball at the bottom of the hill" syndrome.
All in all, we believe that addressing data quality and privacy issues before any data is automatically distributed to other applications will save time and avoid data security problems in the long run, making this the far lower-cost solution after everything is taken into consideration.