State Reporting Proposal
Real-time SIF-Based State Level Data Collection Tool Set
Please do not distribute a link to this article to anyone without express permission from Visual Software, Inc.
Currently, school districts spend an average of 10% of their administrative staff costs collecting, preparing, delivering, and managing student and staff information to meet the requirements of state and federal authorities. The most common (if not ubiquitous) technique used by the states for collecting this information is:
- The district sends collected data to a central location at the state;
- The state validates the data and tells the district where errors exist, giving the district a certain amount of time to correct these errors;
- The district makes corrections and repeats the submission cycle until all errors have been corrected.
The problem with this method is that the errors that caused the rejection by the state are still in the source systems. They will therefore recur and require correction after every collection, making this technique inefficient and costly at both the district and state levels.
Our tool set is based on a model we are implementing in the UK that changes the submission cycle from a “batch” process to a “real time” process, allowing districts to correct errors as they occur, so that when the submission is due, the data is released error-free. Furthermore, due to its SIF interface, this process occurs automatically.
There is significant demand for products that collect information from school districts at the state level as evidenced by the NCES SLDS Grant program. As part of our normal business activities in the US, we work with districts helping them connect to these systems and submit their data.
In the UK, however, we are working with a large customer solving the same challenge, with the data being sent from the schools to the regional level in real time while continuously performing quality checks. This process increases efficiency and accuracy while significantly reducing costs for both the region and the district.
All school districts in the US are required to maintain detailed and accurate student and staff information and periodically report a portion of this information to the state, which in turn summarizes, analyzes, and reports the information to the US Department of Education. The exact information and the frequency of reporting differ from state to state, but the requirement presents a significant challenge to districts, especially in light of tightening budgets and overworked staff.
With the introduction of the Schools Interoperability Framework (SIF) standard into this data collection process, some of this workload has been reduced, but we believe that by using SIF with an improved design and by implementing some of its newer features, much more of the pressure can be alleviated and much of the cost can be reduced, both at the state and district levels.
In this record collection process there are two significant sources of high cost that can be addressed through a properly packaged solution:
- correcting errors more than once
- requiring districts to make significant local upgrades when they are not necessary
Correcting errors more than once
The first item is a by-product of the design of many of the current SIF-based collection systems. These systems simply replace an existing ETL (Extract-Transform-Load or CSV upload) process with a SIF connector (for example, see The WISE State Report Manager). These solutions include a data coordinator user role whose job is, on the reporting date, to take a sample of the SIS data, submit it to the state, then identify and correct the errors. The problem is that the errors are corrected only in the copy of the data by the data coordinator, and not in the original system by the source system operator.
Our tool set removes this role entirely by moving the error correction process back to the person who made the error, and having it corrected shortly after it is made. By doing this, the person who has the best information about the object (student, teacher, course, bus route, etc.) makes the correction. The correct information is then distributed to other departments in the district through the SIF interface.
Several SLDS-related RFPs we have seen compel districts to perform SIS or other upgrades to ensure that they all use the same system, allowing the state to support a small number of adapters or SIF agents.
Some other architectures require that all districts use a common Zone Integration Server or non-standard agent behavior in order to work properly.
Both of these options can be very expensive for the state and the districts. The best alternative is to let the districts use what is best for their needs (typically what they have already) and provide the state with something that will work within that framework, making very few, or in most cases, no demands for new district-level purchases.
The tool, its use, and the intended outcomes
The tool set we intend to package during Phases I and II is a real-time data collection system for student and staff information based on a system we are presently implementing in the United Kingdom for the East of England Broadband Network (E2BN) for identity and role management and vertical reporting.
Real-time Data Collection
This system will provide school district source system users with continuous feedback showing them how close their operational data is to the state-defined quality standards, so that when the time comes for a state data collection the data from their SIS can be sampled and submitted without any further adjustments.
Currently implemented state data collection systems, even those implemented using the Schools Interoperability Framework (SIF), do not provide such a continuous feedback cycle. This results in:
- Errors not being corrected in the source systems: the people who correct the errors in the submission sample do not have time or permission to edit the SIS records to correct the errors there. Additionally, so much time would have elapsed between the submission cutoff date (when the sample was taken) and the time the corrections would be made that nobody would be sure whether the change is still valid.
- Errors being corrected in the sample taken on the submission date and not in the source system, because the error feedback was only sent back for the sample and not for the live data.
- Districts wasting large amounts of effort and money needlessly re-correcting the same errors every time a new submission is due.
Working with In-place District Software
The solution we are proposing is designed to work with whatever SIS software is currently in place with very few (if any) exceptions.
After working with dozens of vendors over many years, we have found that a reasonable expectation is that they can all create Comma Separated Value (CSV) export files for school, student and teacher information on an automated basis and place that information in a designated directory (they need to do this to meet current state reporting requirements).
Our tool set will include options for districts that have:
- Complete SIF implementations – the state data collector will become an active member in the district’s zone.
- A SIS that has a SIF agent, in a district that has no SIF implementation – the state will host a SIF zone for that district and the SIS will connect to it.
- A SIS that can only produce extract files – the state will provide a “CSV SIF agent” (one of our components) that will provide SIF agent services for those types of applications.
When one organization requires another to install or use some new software it is better if the new user feels that it is worth the effort to learn a new system and that the newer system is an improvement over the old.
Giving regular feedback to source application users about errors that need to be corrected for state reporting would be a marked improvement. Understanding that this toolset would also reduce or eliminate the barrage of phone calls and emails about errors found that usually accompany a submittal should substantially increase “buy-in” potential. We have seen first-hand reactions of many users when they first saw this capability and overwhelmingly their reaction has been very positive.
One of the issues some have with a real-time data collection model is a concern for student privacy. This was one of the early-stage obstacles we faced while first implementing our project in the UK. The concern is that the state will gain access to private information while in the process of gathering real-time information and monitoring the quality of the data. SIF, in its original design, allowed a subscriber (such as a state) to receive all information in an object if it received any of the information in that object, including personal information that is protected by law.
To address this concern and to make real-time collection possible, Visual Software introduced the concept of “element-level filtering” to SIF in 2008 and led the effort to have it adopted as part of the standard in 2009. Since then, element-level filtering has been adopted by all of the major SIF ZIS providers and is universally available today.
Element-level filtering allows the state to become an active member in a district’s zone while receiving information only for the particular data elements that it needs to see and is allowed to see by law.
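The idea behind element-level filtering can be illustrated with a minimal Python sketch. The object layout, the element names (`RefId`, `GradeLevel`, `MedicalAlert`), the allow-list, and the `filter_elements` function are all assumptions made for illustration; they are not taken from the SIF specification or from any ZIS product, each of which has its own filter configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical allow-list: the only child elements the state may receive.
ALLOWED_ELEMENTS = {"RefId", "GradeLevel", "EnrollmentStatus"}

def filter_elements(xml_text, allowed):
    """Return the object with every top-level child element not in `allowed` removed."""
    root = ET.fromstring(xml_text)
    for child in list(root):  # copy the child list so removal is safe while iterating
        if child.tag not in allowed:
            root.remove(child)
    return ET.tostring(root, encoding="unicode")

# Illustrative object as a district might publish it (not a real SIF payload).
student = (
    "<StudentPersonal>"
    "<RefId>A1</RefId>"
    "<GradeLevel>07</GradeLevel>"
    "<MedicalAlert>Asthma</MedicalAlert>"
    "</StudentPersonal>"
)

# The state subscriber would only ever see this filtered version.
filtered = filter_elements(student, ALLOWED_ELEMENTS)
print(filtered)
```

In this sketch the protected element never leaves the district's zone, because the filter is applied before the message is delivered to the subscriber.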
Through the work we have done with customers such as London (2,700 schools), the East of England (3,200 schools), and the East Midlands (3,000+ schools) in the UK, and Victoria, AU (1,600 schools), all in production, we have found that many of the facilities that are commonly installed individually at school districts in the US can easily be shared at state-wide or regional hosting centers at a much lower average cost, while still providing the same or better level of service.
Description of the Tool Set — Architecture Overview
The solution we are proposing is composed of a combination of configurable products currently made and sold by Visual Software, but at present not being used for state vertical reporting in the United States. These products include:
- ZIServer™: enterprise level SIF Zone Integration Server, capable of hosting thousands of schools in a single implementation (will be used as the central message handling hub);
- Veracity™: SIF-enabled data collection and data quality management tool (will be used for collecting and quality checking the data and for providing a user interface to the districts);
- Ipseity™: SIF-enabled identity and role management system (will be used for managing accounts for all district staff members that need access to any of the hosted products, so that single sign-on access can be provided and accounts can be maintained through the submitted data);
- Envoy™: SIF-enabled Virtual Zone Management system (will be used for districts with no SIF implementation, so that school information can be consolidated at the district level in real time before it is transmitted to the state);
- Mimic™: SIF agent for applications that only have an ability to produce CSV files (will be used in those districts where the implemented SIS does not have a SIF agent available).
Each of these products is already developed, tested, and in production; some of them, however, are only offered for non-US versions of the SIF specification. The diagram to the right illustrates how the products would work together in a state deployment.
This diagram illustrates a part of a deployment with three districts (on the left) and a state collection center (on the right). The software that would be part of our solution has a highlighted background and the applications that would be hosted at the state are marked as "Hosted".
School District Type A
The first school district (the one pictured with "SIS A") has a SIF implementation and a locally installed SIF Zone Integration Server (ZIS). Presently this corresponds to more than 10 percent of the school districts in the US with very rapid growth. In this scenario, the data collector / quality monitor would become a subscriber in the school's SIF zone.
Veracity would subscribe to the SIF objects that have information required by the state and the district would be instructed how to implement its element-level filters for the connection (all the major ZIS vendors have already implemented element-level filters). Veracity could have tests to make sure that the element level filters were properly set up.
School District Type B
The second school district (the one pictured with "SIS B") does not have a SIF implementation but has a Student Information System that has a SIF agent available for it (presently this describes about 50–60% of the school districts in the country). In this scenario, the state would host a ZIS, create a zone for that school district and the district's SIF agent would register in that zone as a publisher.
In this scenario, the designated site administrator would be allowed to administer not only the Veracity access, but also the ZIServer administration access for the district's zone as if it was a Zone Integration Server located at their district office. (ZIServer is also programmed with single sign-on features so that individual zones can be administered remotely).
School District Type C
The third school district (the one represented with "SIS C") represents those that have custom-made or other Student Information Systems that do not have, and never will have, a SIF agent. We have found that such systems can already create CSV export files because of previous state reporting requirements.
Although the student information is combined at the district level in a few of these districts, most of them will also have the added complication of separate student databases at each school. This is why we recommend Envoy for these school districts. Envoy is a SIF-enabled product that manages the consolidation of student, staff and parent records for multiple schools in a larger organization.
Note: Envoy is one of the applications that is presently only available in non-US editions and would need to be converted for the US as part of this project.
There are two audiences for our tool set: one at the district and one at the SEA.
Most of the people who use this tool set at the district level would be the users of source systems that generate the data for state reports. Additionally, users would include IT staff who control role assignments and data management staff who would oversee the administration of the data collection process.
When compared to the existing methods of data collection, more people will come in contact with the vertical reporting software, but each of them will only have a small fraction of the work to perform.
State or Regional Audience
At the location where reporting data is gathered from the districts, the users of this tool set would be those who manage the data and feed it into the region's or state's data warehouse.
State-level users will participate in developing business rules for checking data. In previous data collection projects we have done using this tool set, we have found that once users understand the capabilities of the software, the number of checks tends to grow with each new data submission cycle.
The District Daily User Experience
Application System Users
From the district user's perspective, he or she would have daily access to a portal site (hosted by the state) that shows how far away his or her district's data is from being clean enough to submit directly to the state for the next submission. The example to the right is taken from our UK edition of Veracity with some basic data rules we set up that correspond to their student Every Child Remembered census (the equivalent of a No Child Left Behind data collection).
The data is automatically fed from the Student Information System and other connected systems (such as HR or Café systems) through the SIF connectors to the SIF real-time data collector. As the data arrives, Veracity applies data quality rules against the incoming data and maintains a separate database of current exceptions.
When the user logs into the district portal, he or she sees any outstanding errors and can drill down to deeper levels of detail to see enough information to correct the error in the source system. When the error is corrected, the correction will flow through the SIF connection to Veracity (and all other connected applications). The error will then drop off the error list a few seconds later.
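The feedback loop described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Veracity's implementation: the record fields, the two rules, and the `ExceptionStore` class are hypothetical names chosen for the example.

```python
# Hypothetical grade codes accepted by the state: "K" and "01" through "12".
VALID_GRADES = {"K"} | {"%02d" % n for n in range(1, 13)}

# Hypothetical rules: each check returns True when the record passes.
RULES = {
    "birth_date_present": lambda rec: bool(rec.get("birth_date")),
    "grade_level_valid": lambda rec: rec.get("grade_level") in VALID_GRADES,
}

class ExceptionStore:
    """Keeps the set of currently outstanding (record id, rule name) exceptions."""

    def __init__(self, rules):
        self.rules = rules
        self.exceptions = set()

    def on_event(self, record):
        """Re-check one incoming record and update its outstanding exceptions."""
        for name, passes in self.rules.items():
            key = (record["id"], name)
            if passes(record):
                self.exceptions.discard(key)  # a corrected error drops off the list
            else:
                self.exceptions.add(key)

store = ExceptionStore(RULES)

# The first event arrives with a missing birth date.
store.on_event({"id": "S1", "birth_date": "", "grade_level": "07"})
errors_before = sorted(store.exceptions)

# The operator fixes the record in the source system; the change event clears the error.
store.on_event({"id": "S1", "birth_date": "2011-04-02", "grade_level": "07"})
errors_after = sorted(store.exceptions)
```

Because every incoming event re-evaluates the rules for that record, the exception list always reflects the live data rather than a past sample.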
Districts without SIF-enabled Student Information Systems
For places where the Student Information Systems do not have SIF interfaces, we developed a product called Mimic that works with the interface that most of these systems do have: the ability to create CSV files on an automated basis. Mimic can be set up in a few hours using a wizard-like interface and runs on a minimally-configured PC. It assumes that the SIS will generate on a periodic basis:
- a list of new and updated records only (only adds and changes will be published), or
- a complete dump of the object (adds, changes and deletes will be published).
At a designated interval, Mimic will inspect the CSV file directory. When it sees a new file, it will compare it against the last one it saw there, then produce SIF events corresponding to the differences between the two. It will also respond to requests from the SIF zone.
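The comparison step can be sketched as follows in Python. This is a simplified illustration under stated assumptions, not Mimic's actual code: the column names, the `student_id` key, and the `load_rows` and `diff_to_events` helpers are hypothetical. The event actions Add, Change, and Delete correspond to the adds, changes, and deletes listed above.

```python
import csv
import io

def load_rows(csv_text, key="student_id"):
    """Index the rows of a CSV extract by their key column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}

def diff_to_events(previous, current, full_dump=True):
    """Compare two extracts and return a list of (action, key) events.

    Deletes are only inferred when the extract is a complete dump; an
    extract of new and updated records cannot reveal deletions.
    """
    events = []
    for key, row in current.items():
        if key not in previous:
            events.append(("Add", key))
        elif row != previous[key]:
            events.append(("Change", key))
    if full_dump:
        events.extend(("Delete", key) for key in previous if key not in current)
    return events

# Two consecutive complete dumps (illustrative data).
old = load_rows("student_id,last_name,grade\n1,Smith,07\n2,Jones,08\n")
new = load_rows("student_id,last_name,grade\n1,Smith,08\n3,Patel,07\n")

events = diff_to_events(old, new)
print(events)
```

With a complete dump, a record that disappears between files is treated as a delete; with an adds-and-changes extract, the same function would be called with `full_dump=False` so that absent records are left alone.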
Single sign-on and Data Scoping
When a district user logs in, all Visual Software tools will use the information received about an employee through the SIF interface to automatically scope the data he or she is allowed to access. All users are automatically scoped based on district; as part of Phase II, we will be adding another layer of scoping which will limit a user’s access by function as well.
Our current Veracity product has the functionality to alert users to the presence of outstanding errors. As part of Phase II, we are considering the addition of an RSS feed as well (to be included with the user scoping feature). This will allow school- or district-based staff to set up an alert feed so that they will be continually reminded whenever there are any unresolved issues with student data. Users will need to log in to the portal site to find out the particular student details, but the RSS feed will keep them informed as to the nature and quantity of the errors.
The District Collection Experience
On certain cutoff dates, data samples are taken from the district and are prepared to be given to the state (or regional collection authority) (the SEA). The SEA expects the district's data to be "as of" that date, so even if the district is using SIF as a collection method, some sample will need to be taken on that date. Typically, the SEA will give the district a period of time to resolve any issues with the data before the data is finally due.
Our architecture changes the cycle of activity as follows:
When the reporting cutoff date is reached, the district user will see a new tab appearing on his or her screen. This is because at midnight that evening a snapshot of the data that was sitting in the district's Veracity database was taken and moved to a "reporting collection area". In this example, the user interface in the screen shot illustrates a user's view (in the UK) just after a December 2010 snapshot.
If you look closely at the screen shot on the next page, you will see that the "Dec 2010" tab has been selected and that the "Summary — Distribution" option has been selected from the menu at the left of the page. Doing this causes Veracity to show the distribution of errors that still remain in the dataset as of the data collection date (in a live district, this should be much closer to zero; this example uses test data for illustrative purposes). If the "Summary — Distribution" option had been selected with any of the other tabs, this graph would have shown a distribution of errors for that time period or the current (live) data.
If any errors remain at this point, the user can edit the sample that was taken and those changes will be made independently of the source systems. This is not optimal, but it is required because the "as of" date of the sample must be maintained. The user may get to the errors either by drilling down through the list of errors or by using Veracity's "search" interface to enter any of the characteristics of the object (learner last name, for example) to find the record, then edit its contents through a web form interface. All interactions are recorded so that the state and district have a complete record of all changes that were made to records after the snapshot was taken.
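The relationship between the live data, the snapshot, and the change record can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the record layout and the `take_snapshot` and `edit_snapshot` helpers are hypothetical and do not describe Veracity's internal design.

```python
import copy
from datetime import datetime, timezone

def take_snapshot(live_records):
    """Deep-copy the live data so later live changes do not alter the sample."""
    return copy.deepcopy(live_records)

def edit_snapshot(snapshot, audit_log, record_id, field, new_value, user):
    """Apply one correction to the snapshot and record who changed what."""
    old_value = snapshot[record_id].get(field)
    snapshot[record_id][field] = new_value
    audit_log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "record": record_id,
        "field": field,
        "old": old_value,
        "new": new_value,
    })

# Live data at the cutoff date (illustrative), with one missing grade level.
live = {"S1": {"last_name": "Smith", "grade_level": ""}}

snapshot = take_snapshot(live)
audit_log = []

# A district user corrects the sample; the live record is not touched.
edit_snapshot(snapshot, audit_log, "S1", "grade_level", "07", user="jdoe")
```

Keeping the old and new values in each log entry is one way to give both the state and the district a complete trail of post-snapshot edits.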
The following is an example of the search/edit interface:
The State (SEA) User Experience
At the SEA, the web interface is similar to that which the district user sees, except that the data now spans multiple districts and the data quality rules also span districts. The precise state user experience will vary, depending on state law and on whether the state is allowed to see the districts’ live data before the reporting cut-off date.
Logically (and physically), the SEA maintains a separate instance of Veracity connected to a statewide SEA SIF zone (physically hosted on the same ZIS server but logically separate). Each district's hosted Veracity instance relays the events it receives into this zone, but the objects are filtered so that only the fields that the SEA has permission to see are relayed.
The state piece of the architecture would look like the picture to the right.
Each district is represented by a zone (either at their district on their ZIS or on the shared ZIS at the SEA). The "District Veracity" copy contains the complete details of their student information and only school district personnel have access to view this information. When messages are sent to the "District Veracity" instances, filtered versions of them are relayed to the "Combined Veracity" copy.
Business Rules that Span District Boundaries
This SEA Veracity copy has a separate set of business rules that can check for things such as:
- A student that has more than one school designated as his or her "main school".
- A student who is marked as "graduated from high school" without required tests being taken (this information may come from records in different districts).
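As an illustration of the first kind of check, the following Python sketch looks for students with more than one "main school" across the filtered records relayed from several districts. The field names (`state_student_id`, `district`, `school`, `is_main_school`) and the function name are assumptions made for the example, not Veracity's rule syntax.

```python
from collections import defaultdict

def students_with_multiple_main_schools(enrollments):
    """Return {student id: sorted list of (district, school)} for conflicting students."""
    main_schools = defaultdict(set)
    for enrollment in enrollments:
        if enrollment["is_main_school"]:
            main_schools[enrollment["state_student_id"]].add(
                (enrollment["district"], enrollment["school"])
            )
    return {
        student_id: sorted(schools)
        for student_id, schools in main_schools.items()
        if len(schools) > 1
    }

# Filtered enrollment records relayed from two districts (illustrative data).
enrollments = [
    {"state_student_id": "1001", "district": "D1", "school": "North HS", "is_main_school": True},
    {"state_student_id": "1001", "district": "D2", "school": "Lake HS", "is_main_school": True},
    {"state_student_id": "1002", "district": "D1", "school": "North HS", "is_main_school": True},
]

conflicts = students_with_multiple_main_schools(enrollments)
```

A check like this can only run at the SEA level, since neither district on its own holds both enrollment records.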
When Data is Available in Real-Time
If state law allows filtered live data to be made available to the state, several new possibilities could also arise:
- State Student Identifiers could be assigned when they are needed without human intervention.
- Student multiple enrollments could be checked in real time, alerting district staff (in both schools) immediately when children move out of the district without notification.
All activity, at either the LEA or SEA level, is logged by default. This includes all XML message traffic as well as all human interactions. We recognize that this generates a very large volume of information and that some of these audits are redundant. In the UK, each of these levels of auditing is required by law; in the US they are not, so we make the level of auditing a configuration option.
Over the past several years, Visual Software has developed and refined several systems for providing high quality online support for our customers around the world, including many with whom we share few business hours (the United Kingdom, Egypt and Australia). These include:
- An online help desk system where customers can report problems and track the progress of any open tickets;
- A wiki site with hundreds of detailed tutorials illustrating how to do the things our customers need to do on a daily basis;
- A software download portal where customers can go to get the latest releases and detailed notes about what has been changed from release to release;
- A documentation site that helps customers in planning implementations and gives them insight into new products we're developing.
Creating these facilities to be as good as they are (and as attractive as they are to our users) has become a key part of Visual Software’s “scalability” strategy. The more that our customers are drawn to this site when they need help (and find what they need there), the more of them we can support with a single internal staff member.
Districts and SEAs that use this tool set should expect to see a marked improvement in data accuracy with some improvements in efficiency in their first reporting cycle. By the second reporting cycle, significant improvements in efficiency should be noticed.
During the first reporting cycle, when an error occurs, it will be brought to the attention of the person operating the source application system shortly after the original data was entered (not weeks later). The correct information will be easier to locate and the error will be easier to correct. No call will need to be made (as when a data coordinator calls the source application operator) to find out the correct information and the correction will be made using the SIS interface and not by editing a CSV file in a text editor.
During the second reporting cycle, because these recurring errors were permanently fixed in the source system, only new errors that occur during the reporting cycle will need to be corrected.
Types of Errors
Format errors: These are the toughest type of error to handle when working with CSV files — they typically affect an entire column of data and cause hundreds or thousands of errors in the dataset. They are usually the result of an error in the software that generates the export file and not of data entry, although sometimes it is hard to tell the difference. With a CSV interface, typically the entire batch is discarded if any errors of this type occur and it typically requires substantial investigation by someone to find out what the problem is. With SIF/XML, the exact type of error is usually easily identified.
Validation errors: It is best for a SIS to include validation checks for as many of these fields as possible directly in the application so that the errors can be caught at the time of entry, but many do not for one reason or another. Sometimes, the data collected by an SEA is unusual and not normally needed by a SIS to do its normal job (for example, the results of a blood test). In cases like this, the SIS stores these values in "user defined fields" that undergo little or no validation at the point of entry. As a result, when the data is delivered to the SEA as part of a submission, that field's contents may not be very consistent.
Another example of a validation error that is difficult to catch is one where the data in one field is determined to be correct or incorrect based on a value somewhere else, perhaps in the same record but perhaps in another record for the same student. For example, seeing a student enrolled in "English as a Second Language" might not make sense if the same student's Primary Language is recorded as "English". The demographics record for that student was recorded at one time, but the enrollment in that course may have come at a much later time. Also, these might come from two different applications.
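A cross-record check of this kind can be sketched in Python as follows. The data layout, the course title match, and the `esl_language_conflicts` function are hypothetical and serve only to illustrate the English as a Second Language example above.

```python
ESL_COURSE = "English as a Second Language"

def esl_language_conflicts(demographics, enrollments):
    """Return the sorted IDs of students enrolled in ESL whose primary language is English."""
    flagged = {
        student_id
        for student_id, course in enrollments
        if course == ESL_COURSE
        and demographics.get(student_id, {}).get("primary_language") == "English"
    }
    return sorted(flagged)

# Demographics and enrollments may come from different applications (illustrative data).
demographics = {
    "S1": {"primary_language": "English"},
    "S2": {"primary_language": "Spanish"},
}
enrollments = [
    ("S1", ESL_COURSE),
    ("S2", ESL_COURSE),
    ("S1", "Algebra I"),
]

suspect_students = esl_language_conflicts(demographics, enrollments)
```

The rule flags the combination for review rather than deciding which of the two records is wrong, since either the language or the enrollment could be the incorrect value.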
Matching and other cross-district errors: These errors reach beyond the boundary of the local, live data. They could involve the comparison of a student's record with those of students in other districts (duplicate enrollments) or with that student's records from previous collections (is this record consistent with previous submissions?).
NOTE: These comparisons can only be made using the small subset of fields in the "filtered" set that is relayed to the state copy of Veracity.
Theoretical and Empirical Support
With regard to data quality and the process of turning raw school data into information, three key players can be identified: Collectors, Analyzers and Users. Understandably, data collected in a consistent and timely manner is optimal and will provide the greatest opportunity for