VSI — State Reporting Proposal
Please do not distribute a link to this article to anyone without express permission from Visual Software, Inc.
Currently, school districts spend an average of 10% of their administrative staff costs collecting, preparing, delivering, and managing student and staff information to meet the requirements of state and federal authorities. The most common (if not ubiquitous) technique used by the states for collecting this information is:
- The district sends collected data to a central location at the state;
- The state validates the data and tells the district where errors exist, giving the district a certain amount of time to correct these errors;
- The district makes corrections and repeats the submission cycle until all errors have been corrected.
The problem with this method: the errors that caused the rejection by the state are still in the source systems and will therefore recur and require correction after every collection, making this technique inefficient and costly at both the district and state levels.
Our tool set is based on a model we are implementing in the UK that changes the submission cycle from a “batch” process to a “real time” process, allowing districts to correct errors as they occur, so that when the submission is due, the data is released error-free. Furthermore, due to its SIF interface, this process occurs automatically.
All school districts in the US are required to maintain detailed and accurate student and staff information and periodically report a portion of this information to the state, which in turn summarizes, analyzes, and reports the information to the US Department of Education. The exact information and the frequency of reporting differ from state to state, but the requirement presents a significant challenge to districts, especially in light of tightening budgets and overworked staff.
With the introduction of the Schools Interoperability Framework (SIF) standard into this data collection process, some of this workload has been reduced, but we believe that by using SIF with an improved design and by implementing some of its newer features, much more of the pressure can be alleviated and much of the cost can be reduced, both at the state and district levels.
In this record collection process, there are two significant, high-cost problems that can be addressed through a properly packaged solution:
- correcting errors more than once
- requiring districts to make significant local upgrades when they are not necessary to achieve the desired result
Correcting errors more than once
The first item is a by-product of the design of many of the current SIF-based collection systems. These systems simply replace an existing ETL (Extract-Transform-Load or CSV upload) process with a SIF connector (for example, see: The WISE State Report Manager). These solutions include a data coordinator role whose job is to take a sample of the SIS data on the reporting date, submit it to the state, and then identify and correct the errors. The problem is that the errors are typically corrected only in the data coordinator's copy of the data and not in the original system by the source system operator.
Our tool set removes this role entirely by moving the error correction process back to the person who made the error, having it corrected shortly after it is made. By doing this, the person who has the best information about the object (student, teacher, course, bus route, etc.) makes the correction. The correct information is then distributed to other departments in the district through the SIF interface.
Several SLDS-related RFPs we have seen compel districts to perform SIS or other upgrades to ensure that they all use the same system, allowing the state to support a small number of adapters or SIF agents.
Other architectures require that all districts use a common Zone Integration Server, or they rely on unusual or non-standard agent behavior in order to work properly.
Both of these options can be very expensive for the state and/or the districts. A much better alternative is to let the districts use what is best for their needs (typically what they have already) and provide the state with a solution that will work within that framework, making very few, or in most cases, no demands for new district-level purchases.
The tool, its use, and the intended outcomes
The tool set we intend to package during Phases I and II is a real-time data collection system for student and staff information based on a system we are presently implementing in the United Kingdom for the East of England Broadband Network (E2BN) for identity and role management and vertical reporting. A limited version of this same framework can also be used to collect the more limited set of information needed to assign and maintain state-level student identifiers.
Real-time Data Collection
This system will provide school district source system users with continuous feedback showing them how close their operational data is to the state-defined quality standards, so that when the time comes for a state data collection the data from their SIS can be sampled and submitted without any further adjustments.
Currently implemented state data collection systems, even those implemented using the Schools Interoperability Framework (SIF), do not provide such a continuous feedback cycle. This results in:
- Errors not being corrected in the source systems: the people who correct the errors in the submission sample do not have the time or permission to edit the SIS records to correct the errors there. Additionally, so much time would have elapsed between the submission cutoff date (when the sample was taken) and the time the corrections would be made that nobody could be sure whether the change was still valid.
- Errors being corrected in the sample taken on the submission date and not in the source system, because the error feedback was only sent back for the sample and not for the live data.
- Districts wasting large amounts of effort and money needlessly re-correcting the same errors every time a new submission is due.
Using this information to maintain Student Identifiers
As with state-level records collection, a subset of this same data can be used to maintain a single set of state-level student identifiers. Since any subset of the information can be managed in real time (or near real time), this information can also be combined at the state level to assign and manage identifiers as required. This is described in more detail later in this proposal.
Working with In-place District Software
The solution we are proposing is designed to work with whatever SIS software is currently in place with very few (if any) exceptions.
After working with dozens of vendors over many years, we have found that a reasonable expectation is that they can all create Comma Separated Value (CSV) export files for school, student and teacher information on an automated basis and place that information in a designated directory (they need to do this to meet current state reporting requirements).
Our tool set will include options for districts that have:
- Complete SIF implementations – the state data collector will become an active member in the district’s zone.
- A SIS that has a SIF agent, in a district that has no SIF implementation – the state will host a SIF zone for that district and the SIS will connect to it.
- A SIS that can only produce extract files – the state will provide a “CSV SIF agent” (one of our components) that will provide SIF agent services for those types of applications.
When one organization requires another to install or use some new software it is better if the new user feels that it is worth the effort to learn a new system and that the newer system is an improvement over the old.
Giving regular feedback to source application users about errors that need to be corrected for state reporting would be a marked improvement. Knowing that this tool set would also reduce or eliminate the barrage of phone calls and emails about errors that usually accompanies a submission should substantially increase the potential for “buy-in”. We have seen first-hand the reactions of many users on first seeing this capability, and they have been overwhelmingly positive.
One of the concerns some have with a real-time data collection model is student privacy. This was one of the early-stage obstacles we faced while first implementing our project in the UK. The concern is that the state will gain access to private information while gathering real-time information and monitoring the quality of the data. SIF, in its original design, allowed a subscriber (such as a state) to receive all the information in an object if it received any of the information in that object, including personal information that is protected by law.
To address this concern and to make real-time collection possible, Visual Software introduced the concept of “element-level filtering” to SIF in 2008 and led the effort to have it adopted as part of the standard in 2009. Since then, element-level filtering has been adopted by all of the major SIF ZIS providers and is universally available today.
Element-level filtering allows the state to become an active member in a district’s zone, but only able to receive information for the particular data elements that it needs to see / is allowed to see by law.
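In production the ZIS applies element-level filtering. The minimal Python sketch below only illustrates the effect on a single SIF-style object. It assumes a hypothetical permission list of leaf-element paths (such as `Name/LastName`) and ignores XML namespaces and ZIS configuration details. Elements the state may not see are removed, and container elements survive only while they still hold a permitted child.

```python
import xml.etree.ElementTree as ET
from typing import Set

# Hypothetical permission list for the state subscriber: paths of leaf
# elements, relative to the object root, that it may receive.
STATE_PERMITTED = {
    "LocalId",
    "Name/FirstName",
    "Name/LastName",
    "Demographics/BirthDate",
}


def filter_object(xml_text: str, permitted: Set[str]) -> str:
    """Return the object with every element outside `permitted` removed.

    Attributes on the root (such as RefId) are kept; container elements are
    kept only while they still hold at least one permitted child.
    """
    root = ET.fromstring(xml_text)

    def prune(elem: ET.Element, path: str) -> bool:
        # Depth-first: prune the children before deciding about this element.
        for child in list(elem):
            child_path = f"{path}/{child.tag}" if path else child.tag
            if not prune(child, child_path):
                elem.remove(child)
        return path in permitted or len(elem) > 0

    prune(root, "")
    return ET.tostring(root, encoding="unicode")
```

Passing a full `StudentPersonal` object through `filter_object` would, for example, drop a middle name, medical details and the home address while keeping the ID, first and last name, and birth date.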
Through the work we have done with customers such as London (2,700 schools), the East of England (3,200 schools), the East Midlands (3,000+ schools) in the UK and Victoria, AU (1,600 schools, all in production), we have found that many of the facilities that are commonly installed individually at school districts in the US can easily be shared at state-wide or regional hosting centers at a much lower average cost, while still providing the same or better level of service.
Description of the Tool Set — Architecture Overview
The solution we are proposing is composed of a combination of configurable products currently made and sold by Visual Software, but at present not being used for state vertical reporting in the United States. These products include:
- ZIServer™: enterprise level SIF Zone Integration Server, capable of hosting thousands of schools in a single implementation (will be used as the central message handling hub);
- Veracity™: SIF-enabled data collection and data quality management tool (will be used for collecting and quality checking the data and for providing a user interface to the districts);
- Ipseity™: SIF-enabled identity and role management system (will be used for managing accounts for all district staff members that need access to any of the hosted products, so that single sign-on access can be provided and accounts can be maintained through the submitted data). This application also has a rules-based engine for comparing student, staff (and optionally contact) records and maintaining identifiers for them. It has its own language that can be used for creating these identifiers so that its users can have whatever flexibility they need in configuring ID formats.
- Envoy™: SIF-enabled Virtual Zone Management system (will be used for districts with no SIF implementation, so that school information can be consolidated at the district level in real time before it is transmitted to the state);
- Mimic™: SIF agent for applications that only have an ability to produce CSV files (will be used in those districts where the implemented SIS does not have a SIF agent available).
Each of these products is already developed, tested, and in production; some of them, however, are only offered for non-US versions of the SIF specification. The diagram to the right illustrates how the products would work together in a state deployment.
This diagram illustrates a part of a deployment with three districts (on the left) and a state collection center (on the right). The software that would be part of our solution has a highlighted background and the applications that would be hosted at the state are marked as "Hosted".
School District Type A
The first school district (the one pictured with "SIS A") has a SIF implementation and a locally installed SIF Zone Integration Server (ZIS). At present this describes more than 10 percent of the school districts in the US, a share that is growing rapidly. In this scenario, the data collector / quality monitor would become a subscriber in the district's SIF zone.
Veracity would subscribe to the SIF objects that contain information required by the state, and the district would be instructed how to implement element-level filters for the connection (all the major ZIS vendors have already implemented element-level filters). Veracity could also include tests to make sure that the element-level filters were set up properly.
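One way such a test could work is for the subscriber to inspect each object it receives and list any element outside the permitted set. The Python sketch below shows that idea under stated assumptions. The function name and the leaf-path convention are hypothetical and are not a description of Veracity's actual checks. A permitted path such as `Name/LastName` implicitly allows its ancestor `Name`.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List


def unexpected_elements(xml_text: str, permitted: Iterable[str]) -> List[str]:
    """List element paths in a received object that the state should not see."""
    allowed = set()
    for path in permitted:
        parts = path.split("/")
        # A permitted path implies its ancestors (e.g. "Name" for "Name/LastName").
        for i in range(1, len(parts) + 1):
            allowed.add("/".join(parts[:i]))
    found: List[str] = []

    def walk(elem: ET.Element, prefix: str) -> None:
        for child in elem:
            path = f"{prefix}/{child.tag}" if prefix else child.tag
            if path not in allowed:
                found.append(path)
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return found
```

An empty result means the district's filter let through nothing beyond what the state is entitled to. Any entries point to a misconfigured filter that the district can be asked to fix.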
School District Type B
The second school district (the one pictured with "SIS B") does not have a SIF implementation but has a Student Information System that has a SIF agent available for it (presently this describes about 50–60% of the school districts in the country). In this scenario, the state would host a ZIS, create a zone for that school district and the district's SIF agent would register in that zone as a publisher.
In this scenario, the designated site administrator would be allowed to administer not only Veracity access but also ZIServer administration access for the district's zone, as if it were a Zone Integration Server located at the district office. (ZIServer also has single sign-on features so that individual zones can be administered remotely.)
School District Type C
The third school district (the one represented with "SIS C") represents those that have custom-made or other Student Information Systems that do not have, and never will have, a SIF agent. We have found that such systems can already create CSV export files because of previous state reporting requirements.
Although student information is combined at the district level in a few of these districts, most will also have the added complication of separate student databases at each school. This is why we recommend Envoy for these school districts. Envoy is a SIF-enabled product that manages the consolidation of student, staff and parent records for multiple schools in a larger organization.
Note: Envoy is one of the applications that is presently only available in non-US editions and would need to be converted for the US as part of this project.
There are two audiences for our tool set: one at the district and one at the SEA.
Most of the people who use this tool set at the district level would be the users of source systems that generate the data for state reports. Additionally, users would include IT staff who control role assignments and data management staff who would oversee the administration of the data collection process.
When compared to the existing methods of data collection, more people will come in contact with the vertical reporting software, but each of them will only have a small fraction of the work to perform.
State or Regional Audience
At the location where reporting data is gathered from the districts, the users of this tool set would be those who manage the data and feed it into the region's or state's data warehouse.
State-level users will participate in developing business rules for checking data. In previous data collection projects in which we have used this software, we have found that once users understand its capabilities, the number of checks tends to grow with each new data submission cycle.
The District Daily User Experience
Application System Users
From the district user's perspective, he or she would have daily access to a portal site (hosted by the state) that shows how far his or her district's data is from being clean enough to submit directly to the state for the next submission. The example to the right is taken from our UK edition of Veracity, with some basic data rules we set up that correspond to their Every Child Remembered student census (the equivalent of a No Child Left Behind data collection).
The data is automatically fed from the Student Information System and other connected systems (such as HR or Café systems) through the SIF connectors to the SIF real-time data collector. As the data arrives, Veracity applies data quality rules against the incoming data and maintains a separate database of current exceptions.
When the user logs into the district portal, he or she sees any outstanding errors and can drill down to deeper levels of detail to find enough information to correct the error in the source system. When the error is corrected, the correction flows through the SIF connection to Veracity (and all other connected applications), and the error drops off the error list a few seconds later.
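The behavior described above amounts to re-running the rules whenever an event arrives and keeping only the exceptions that still hold. The Python sketch below illustrates this under stated assumptions. `ExceptionStore`, the event-handler names and the `birth_date_present` rule are hypothetical and are not Veracity's actual interfaces. When a corrected record arrives, its error is removed from the store, which corresponds to the error dropping off the user's list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# A rule inspects one record (field name -> value) and returns an error
# message, or None when the record passes.
Rule = Callable[[dict], Optional[str]]


@dataclass
class ExceptionStore:
    """Hypothetical store of the current exceptions for one district."""

    rules: Dict[str, Rule]
    # Outstanding errors keyed by (record id, rule name).
    current: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def on_add_or_change(self, record_id: str, record: dict) -> None:
        """Re-run every rule against a record when an Add or Change event arrives."""
        for name, rule in self.rules.items():
            message = rule(record)
            key = (record_id, name)
            if message is None:
                # Passing (or corrected) data drops off the error list.
                self.current.pop(key, None)
            else:
                self.current[key] = message

    def on_delete(self, record_id: str) -> None:
        """Forget all exceptions for a record that was deleted at the source."""
        for key in [k for k in self.current if k[0] == record_id]:
            del self.current[key]


# Example rule: the state requires a birth date on every student record.
def birth_date_present(record: dict) -> Optional[str]:
    return None if record.get("BirthDate") else "BirthDate is missing"


store = ExceptionStore({"birth_date_present": birth_date_present})
store.on_add_or_change("S1", {"LastName": "Lee"})
print(store.current)  # {('S1', 'birth_date_present'): 'BirthDate is missing'}
store.on_add_or_change("S1", {"LastName": "Lee", "BirthDate": "2005-01-02"})
print(store.current)  # {}
```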
Districts without SIF-enabled Student Information Systems
For places where the Student Information Systems do not have SIF interfaces, we developed a product called Mimic that works with the interface most of these systems do have: the ability to create CSV files on an automated basis. Mimic can be set up in a few hours using a wizard-like interface and runs on a minimally configured PC. It assumes that the SIS will generate, on a periodic basis, either:
- a list of new and updated records only (only adds and changes will be published), or
- a complete dump of the object (adds, changes and deletes will be published).
At a designated interval, Mimic will inspect the CSV file directory. When it sees a new file, it will compare it against the last one it saw there and then produce SIF events corresponding to the differences between the two. It will also respond to requests from the SIF zone.
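The comparison step, with both of the export modes listed above, could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Mimic's implementation:

- Records are keyed by a single column, such as a local student ID.
- Events are plain `("Add" | "Change" | "Delete", row)` tuples, not real SIF messages.
- For incremental files, the sketch compares each row against the accumulated known state rather than only the previous file.

Responding to requests from the zone is not shown.

```python
import csv
import io
from typing import Dict, List, Tuple


class CsvChangeDetector:
    """Hypothetical sketch: turn successive CSV exports into SIF-style events.

    key       -- column that uniquely identifies a record (e.g. a local ID)
    full_dump -- True if each file is a complete dump (adds, changes, deletes);
                 False if it lists only new and updated records (adds, changes).
    """

    def __init__(self, key: str, full_dump: bool) -> None:
        self.key = key
        self.full_dump = full_dump
        self.known: Dict[str, dict] = {}  # last known state of each record

    def process(self, csv_text: str) -> List[Tuple[str, dict]]:
        rows = {row[self.key]: row for row in csv.DictReader(io.StringIO(csv_text))}
        events: List[Tuple[str, dict]] = []
        for k, row in rows.items():
            if k not in self.known:
                events.append(("Add", row))
            elif row != self.known[k]:
                events.append(("Change", row))
        if self.full_dump:
            # Anything missing from a complete dump was deleted at the source.
            for k, row in self.known.items():
                if k not in rows:
                    events.append(("Delete", row))
            self.known = rows
        else:
            # Partial file: merge it in; deletions cannot be detected.
            self.known.update(rows)
        return events
```

The full-dump mode is the only one that can detect deletions, because only a complete file shows that a record has disappeared.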
Single sign-on and Data Scoping
When a district user logs in, all Visual Software tools will use the information received about an employee through the SIF interface to automatically scope the data he or she is allowed to access. All users are automatically scoped based on district; as part of Phase II, we will be adding another layer of scoping which will limit a user’s access by function as well.
Our current Veracity product can already alert users to outstanding errors. As part of Phase II, we are considering adding an RSS feed as well (to be included with the user scoping feature). This will allow school- or district-based staff to set up an alert feed so that they are continually reminded whenever there are unresolved issues with student data. Users will still need to log into the portal site to see the particular student details, but the RSS feed will keep them informed of the nature and quantity of the errors.
The District Collection Experience
On certain cutoff dates, data samples are taken from the district and prepared for delivery to the state or regional collection authority (the SEA). The SEA expects the district's data to be "as of" that date, so even if the district is using SIF as a collection method, a sample will need to be taken on that date. Typically, the SEA gives the district a period of time to resolve any issues with the data before the data is finally due.
Our architecture changes the cycle of activity as follows:
When the reporting cutoff date is reached, the district user will see a new tab appear on his or her screen. This is because at midnight that evening a snapshot was taken of the data sitting in the district's Veracity database and moved to a "reporting collection area". In this example, the screen shot illustrates a user's view (in the UK) just after a December 2010 snapshot.
If you look closely at the screen shot on the next page, you will see that the "Dec 2010" tab has been selected and that the "Summary — Distribution" option has been selected from the menu at the left of the page. This causes Veracity to show the distribution of errors that still remain in the dataset as of the data collection date (in a live district, this should be much closer to zero; this example uses test data for illustrative purposes). Had the "Summary — Distribution" option been selected with any of the other tabs, the graph would have shown the distribution of errors for that time period or for the current (live) data.
If any errors remain at this point, the user can edit the sample that was taken and those changes will be made independently of the source systems. This is not optimal, but it is required because the "as of" date of the sample must be maintained. The user may get to the errors either by drilling down through the list of errors or by using Veracity's "search" interface to enter any of the characteristics of the object (learner last name, for example) to find the record, then edit its contents through a web form interface. All interactions are recorded so that the state and district have a complete record of all changes that were made to records after the snapshot was taken.
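The two properties described above are that the snapshot is independent of the source data and that every later edit is logged. The Python sketch below illustrates both. `ReportingCollection`, its fields and the example user and date are hypothetical. The sketch only shows the idea of a deep-copied "as of" collection with an append-only audit log, not how Veracity stores either.

```python
import copy
from datetime import datetime, timezone
from typing import Any, Dict, List


class ReportingCollection:
    """Hypothetical 'as of' snapshot of a district's data, plus an edit log."""

    def __init__(self, live_records: Dict[str, dict], as_of: str) -> None:
        self.as_of = as_of
        # Deep copy so that later edits never touch the live (source) data,
        # and later live changes never alter the snapshot.
        self.records = copy.deepcopy(live_records)
        self.audit: List[Dict[str, Any]] = []

    def edit(self, record_id: str, field_name: str, value: Any, user: str) -> None:
        """Change one field in the snapshot and record who changed what."""
        record = self.records[record_id]
        self.audit.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "record": record_id,
            "field": field_name,
            "old": record.get(field_name),
            "new": value,
        })
        record[field_name] = value


live = {"S1": {"LastName": "Lee", "BirthDate": None}}
dec_2010 = ReportingCollection(live, as_of="2010-12-01")
dec_2010.edit("S1", "BirthDate", "2005-01-02", user="district.user")
```

After the edit, `live` is unchanged and the snapshot's audit log holds one entry with the old and new values. This is the trade-off the text describes: the sample is corrected, but the source system is not.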
The following is an example of the search/edit interface:
Creating and maintaining student ID numbers uses the same set of facilities that Ipseity uses for maintaining large communities of Active Directory or other directory account users. Once it can assume that the data has been made relatively free of errors through a process like the one described, it can use sets of business rules to match student (and/or teacher, or parent) records and maintain an independent set of IDs for them. There can be as many matching rules per object as needed, and in principle matching can be done on any SIF object or grouping of objects.
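Ordered matching rules can be sketched in Python to show the idea: try the strongest rule first, then fall back to looser ones. This is an illustration only and is not Ipseity's own rule language. The field names (`DistrictId`, `LocalId`, `FirstName`, `LastName`, `BirthDate`) and both rules are assumptions chosen for the example.

```python
from typing import Callable, List, Optional

MatchRule = Callable[[dict, dict], bool]


def _norm(value: Optional[str]) -> str:
    """Case- and whitespace-insensitive form of a name field."""
    return (value or "").strip().lower()


def same_local_id_and_district(a: dict, b: dict) -> bool:
    """Strong rule: the same district-assigned ID in the same district."""
    return bool(a.get("LocalId")) and (
        (a.get("DistrictId"), a.get("LocalId")) == (b.get("DistrictId"), b.get("LocalId"))
    )


def same_name_and_birth_date(a: dict, b: dict) -> bool:
    """Weaker rule: the same first name, last name and (non-empty) birth date."""
    return (
        bool(a.get("BirthDate"))
        and a.get("BirthDate") == b.get("BirthDate")
        and _norm(a.get("FirstName")) == _norm(b.get("FirstName"))
        and _norm(a.get("LastName")) == _norm(b.get("LastName"))
    )


# Rules are tried in order, strongest first.
RULES: List[MatchRule] = [same_local_id_and_district, same_name_and_birth_date]


def find_match(record: dict, existing: List[dict],
               rules: List[MatchRule] = RULES) -> Optional[dict]:
    """Return the first existing record matched by the earliest rule, or None."""
    for rule in rules:
        for candidate in existing:
            if rule(record, candidate):
                return candidate
    return None
```

If `find_match` returns a record, that record's existing ID would be reused; if it returns `None`, a new ID would be issued.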
The IDs themselves could be as simple as a semi-random number (most likely for student ID numbers) or could be composed from pieces of the original data. For example, an ID could be formed as:
- first try the first letter of the first name and 17 letters from the last name; if that has already been taken, then
- try the first letter of the first name, the first letter of the middle name and 16 letters from the last name; if that has already been taken, then
- try the first 2 letters of the first name and 16 letters from the last name; if that has already been taken, then
- try the first 2 letters of the first name, the first letter of the middle name and 15 letters from the last name; if that has already been taken, then
- go back and add numbers to the first option to make it unique.
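The fallback sequence in this list can be expressed as a short function, shown here as a Python sketch rather than in Ipseity's ID language. The sketch makes three assumptions:

- Names are lowercased.
- Apostrophes and other punctuation are not stripped.
- The numeric suffix starts at 1 and may push the ID past 18 characters.

```python
from typing import Set


def make_id(first: str, middle: str, last: str, taken: Set[str]) -> str:
    """Compose a readable ID using the fallback order listed above.

    Lowercasing and the plain numeric suffix are assumptions of this sketch.
    """
    fn, mn, ln = first.lower(), middle.lower(), last.lower()
    candidates = [
        fn[:1] + ln[:17],           # first initial + 17 letters of last name
        fn[:1] + mn[:1] + ln[:16],  # first initial + middle initial + 16
        fn[:2] + ln[:16],           # 2 letters of first name + 16
        fn[:2] + mn[:1] + ln[:15],  # 2 letters of first name + middle initial + 15
    ]
    for candidate in candidates:
        if candidate not in taken:
            return candidate
    # Every option is taken: go back to the first one and add a number.
    n = 1
    while f"{candidates[0]}{n}" in taken:
        n += 1
    return f"{candidates[0]}{n}"
```

For example, the first five people named Smith with first names starting "Ma" would receive `msmith`, `mjsmith`, `masmith`, `majsmith` and `msmith1`. This assumes the middle initials shown in the test below.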
The State (SEA) User Experience
At the SEA, the web interface is similar to the one the district user sees, except that the data now spans multiple districts and the data quality rules also span districts. The precise state user experience will vary, depending on state law and on whether the state will be allowed to see the districts’ live data before the reporting cutoff date.
Privacy Issues and Data Filtering
Logically (and physically), the SEA maintains a separate instance of Veracity connected to a statewide SEA SIF zone (physically hosted on the same ZIS server but logically separate). Each school district's hosted Veracity instance relays the events it receives into this zone, but the objects are filtered so that only the fields the SEA has permission to see are relayed.
The state piece of the architecture would look like the picture to the right.
Each district is represented by a zone (either at their district on their ZIS or on the shared ZIS at the SEA). The "District Veracity" copy contains the complete details of their student information and only school district personnel have access to view this information. When messages are sent to the "District Veracity" instances, filtered versions of them are relayed to the "Combined Veracity" copy.
Business Rules that Span District Boundaries
This SEA Veracity copy has a separate set of business rules that can check for things such as:
- A student that has more than one school designated as his or her "main school".
- A student who is marked as "graduated from high school" without having taken the required tests (this information may come from records in different districts).
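The first of these cross-district checks could be sketched in Python as follows. The sketch assumes that each filtered enrollment relayed to the SEA copy carries a state student ID, district ID, school ID and a "main school" flag. All four field names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def students_with_multiple_main_schools(enrollments: Iterable[dict]) -> List[str]:
    """Return state IDs of students marked with more than one "main school".

    Each enrollment is assumed to carry StateId, DistrictId, SchoolId and a
    boolean MainSchool flag, i.e. fields the SEA is permitted to receive.
    """
    main_schools: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for e in enrollments:
        if e.get("MainSchool"):
            main_schools[e["StateId"]].add((e["DistrictId"], e["SchoolId"]))
    return sorted(sid for sid, schools in main_schools.items() if len(schools) > 1)
```

Grouping by the state-level identifier is what lets the rule span districts. No single district's data could reveal this conflict.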
When Data is Available in Real-Time
If state law allows filtered live data to be made available to the state, several new possibilities could also arise:
- State Student Identifiers could be assigned when they are needed without human intervention.
- Student multiple enrollments could be checked in real time, alerting district staff (in both schools) immediately when children move out of the district without notification.
All activity, at either the LEA or SEA level, is logged by default. This includes all XML message traffic as well as all human interactions. We recognize that this generates a very large volume of information and that some of these audits are redundant. In the UK, each of these levels of auditing is required by law; in the US they are not, so we make the level of auditing a configuration option.
Districts and SEAs that use this tool set should expect to see a marked improvement in data accuracy with some improvements in efficiency in their first reporting cycle. By the second reporting cycle, significant improvements in efficiency should be noticed.
During the first reporting cycle, when an error occurs, it will be brought to the attention of the person operating the source application system shortly after the original data was entered (not weeks later). The correct information will be easier to locate and the error will be easier to correct. No call will need to be made (as when a data coordinator calls the source application operator) to find out the correct information, and the correction will be made through the SIS interface rather than by editing a CSV file in a text editor.
During the second reporting cycle, because these recurring errors were permanently fixed in the source system, only new errors that occur during the reporting cycle will need to be corrected.
Types of Errors
Format errors: These are the toughest type of error to handle when working with CSV files — they typically affect an entire column of data and cause hundreds or thousands of errors in the dataset. They are usually the result of an error in the software that generates the export file and not of data entry, although sometimes it is hard to tell the difference. With a CSV interface, typically the entire batch is discarded if any errors of this type occur and it typically requires substantial investigation by someone to find out what the problem is. With SIF/XML, the exact type of error is usually easily identified.
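With CSV input, one practical way to separate a column-wide format problem from individual bad values is to examine the failure rate for the column as a whole. The Python sketch below uses this heuristic. It is an assumption of the example and is not a description of how Veracity classifies errors. The expected `YYYY-MM-DD` shape and the 90% threshold are both hypothetical choices.

```python
import re
from typing import Dict, List

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")  # assumed expected shape (not a full date check)


def check_column(values: List[str], pattern: re.Pattern = ISO_DATE,
                 threshold: float = 0.9) -> Dict[str, object]:
    """Distinguish a column-wide format error from individual bad values.

    If at least `threshold` of the non-empty values fail, report one format
    error for the whole column (most likely an export problem); otherwise
    report the row numbers of the individual failures.
    """
    filled = [(row, v) for row, v in enumerate(values) if v]
    bad = [row for row, v in filled if not pattern.fullmatch(v)]
    if filled and len(bad) / len(filled) >= threshold:
        return {"format_error": True, "bad_rows": []}
    return {"format_error": False, "bad_rows": bad}
```

If every date in a column arrives as `01/02/2005`, the column is reported once as a format error, not as thousands of separate row errors.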
Validation errors: Ideally, a SIS includes validation checks for as many of these fields as possible directly in the application, so that errors are caught at the time of entry, but many do not for one reason or another. Sometimes the data collected by an SEA is unusual and not normally needed by a SIS to do its job (for example, the results of a blood test). In cases like this, the SIS stores these values in "user defined fields" that undergo little or no validation at the point of entry. As a result, when such a field is delivered to the SEA as part of a submission, its contents may not be very consistent.
Another example of a validation error that is difficult to catch is one where the data in one field is determined to be correct or incorrect based on a value somewhere else, perhaps in the same record but perhaps in another record for the same student. For example, seeing a student enrolled in "English as a Second Language" might not make sense if the same student's Primary Language is recorded as "English". The demographics record for that student was recorded at one time, but the enrollment in that course may have come at a much later time. Also, these might come from two different applications.
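The English as a Second Language example can be written as a rule that joins two kinds of records. The Python sketch below assumes hypothetical field names (`PrimaryLanguage` on the demographics record and `CourseTitle` on each enrollment) and compares plain text. A real SIF feed would more likely carry language and course codes. Because the demographics and the enrollment may come from different applications at different times, such a rule would need to be re-run whenever either side changes.

```python
from typing import List


def esl_conflicts(student: dict, enrollments: List[dict]) -> List[dict]:
    """Return ESL enrollments for a student whose primary language is English.

    Hypothetical field names: PrimaryLanguage on the demographics record and
    CourseTitle on each enrollment.
    """
    if (student.get("PrimaryLanguage") or "").strip().lower() != "english":
        return []
    return [
        e for e in enrollments
        if "english as a second language" in (e.get("CourseTitle") or "").lower()
    ]
```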
Matching and other cross-district errors: These errors extend beyond the boundary of the local, live data. They could involve comparing a student's record with those of students in other districts (duplicate enrollments) or with that student's records from previous collections (is this record consistent with previous submissions?).
NOTE: These comparisons can only be made using the small subset of fields in the "filtered" set that is relayed to the state copy of Veracity.
Theoretical and empirical Support
In the process of turning raw school data into information, three key players can be identified: Collectors, Analyzers and Users. Understandably, data collected in a consistent and timely manner is optimal and provides the greatest opportunity for complete and accurate data, reducing the amount of manipulation and cleansing required later in the process.
The typical approaches to improving data quality therefore center on standardizing the data collection process and improving the training of data stewards. However, while best practice protocols may exist to instruct data stewards, evidence indicates that organizations still have difficulty convincing them to comply with data collection procedures. Problems such as data security breaches and omitted information continue to haunt those seeking to make data-informed, near-real-time decisions. Consequently, while standardized forms, checklists, and procedures tied to training (and even threats) may be useful tools for encouraging short-term compliance, such efforts are unlikely to be sustained in the long term.
Data will be more accurate if it is corrected:
- …by the person who has the best information (the person who entered it into the source system, not the person preparing the data for submission to the SEA);
- …closer to the time when the error was created, not in a rush to get the data out the door before the cutoff deadline.
Having a separate data coordinator (typically someone in the IT department) who does not work with the data on a daily basis correct the errors will be less reliable than having them fixed by the people who know the data best and are closest to the correct information. In most cases, the data coordinator spends most of his or her time emailing or calling the people working with the source data to find the correct information.
Lastly, we have also found that when the people entering the source data realize that their input is going to be checked, and that errors will be flagged and sent back to them for correction, their attention to detail increases. Perhaps they did not previously realize the consequences of leaving something out or entering something incorrectly, but they also want to avoid the embarrassment or annoyance of handling the same record two or three times.
Similar technologies or typical practices, and potential commercial application
Similar Tools from Others
Current SIF vertical reporting implementations are based on the work of the SIF Vertical Reporting Task Force.
In the US SIF specification (section 6.17), there are four objects defined for Vertical Reporting batch-mode data collection. Our tool set will still stay within the limits of the SIF specification, but will instead use a simpler design that relies on what most Student Information Systems already implement, combined with the new Element-Level Filtering feature (which was recently added and which all the major ZIS products already support).
Most (if not all) of the SIS on the market have not directly implemented the SIF Vertical Reporting objects; instead they rely on having an independent piece of software available (a data collector) that implements these objects.
Some of the implementations allow the district users to request that the data be validated by the state before the reporting cutoff date, but the process still remains an “on-demand” batch process and the process is still performed by someone other than the user of the source data system.
Model A (Data sampled on demand using SIF)
In this scenario, there is a "data coordinator" who, when the state data extract is due, logs into a state-hosted portal and directs it to use the SIF agent to take a sample from the school district's data.
The portal screen shows the errors in the data, and the data coordinator begins the process of correcting them. This process typically involves calling, faxing or emailing the people in the schools who have the source information. Sometimes this information originated in the Student Information System; other times, it came from another system such as a special education, transportation or food services application.
Based on our conversations with several people in this position, this process typically involves many people and, in total, consumes a significant amount of time. Furthermore, because of the way these errors are handled, most of them never get corrected in the source systems: there is always such a rush to gather all the correct information that there is never time to stop and correct the errors at the source. So they remain, to be corrected again during the next cycle.
Model B (Data published continuously to the state using SIF)
In this scenario, special changes need to be made to the SIF agent for it to work (it must be modified to publish the same data to two places at the same time; this is allowable, but it is not normal SIF agent behavior).
Here, data updates the state's repository continuously (as events happen), so the data coordinator may ask for a report at any time (even before the report is due).
This is an improvement on Model A in that reports can be taken at any time, but problems still exist:
- The state needs to tell all the SIS vendors who operate in the state that they must change their applications if they want to sell their software in the state. We feel that this is unacceptable. It is reasonable to require vendors to store the information needed for the state's reporting requirements (all states do that), but requiring them to deviate from normal SIF agent behavior, especially when it is not necessary, is not.
- The errors are still being returned to a single person, the data coordinator, and not to the person who entered the data. When the data coordinator receives this error list, he or she will need to split it up, distribute it and then follow up to see that the errors were corrected.
- Some of these errors may involve information that is out of the privacy scope of the data coordinator. In some states, data may be collected about health-related information which may generate errors. It would be far better to have that error go directly back to the person who generated it (and had the privacy rights to see the source data) than to have a data coordinator get access to that confidential information.
Similar Tools by Visual Software
As mentioned earlier, Visual Software supports a similar tool set in the UK and in Australia, using the versions of the SIF specification and the privacy laws of those countries. We are presently implementing a similar solution on a gradual roll-out basis for the East of England (3,200 schools) that includes Identity Management (unique student identifier) and Census Reporting (Every Child Remembered).
In the scenario we propose, data automatically updates the state repository when changes are made in any of the source systems.
If any of the new data or changes violate the validation rules at the state level, errors will be sent directly back to the application staff who entered the data, not to a data coordinator.
NOTE: In 2006–2008, an earlier version of our SIF-en