National Information System and Statistical Infrastructure Development Project Management Summary


3. Functional Requirements

The limitations, deficiencies, and problems identified during the analysis of current activities were restated, in summary form, as a set of requirements for which solutions were needed to meet the project objectives. Only the requirements directly related to the project were addressed in the subsequent analyses. The resulting solutions can be described in terms of how the most significantly affected functions will be conducted once they are implemented, that is, as new operational concepts.


3.1 Future Operational Concepts

Proposed operational concepts are discussed under a number of sub-headings in the following paragraphs.


a) Data Collection and Conversion

As is currently the case, censuses and surveys will continue to rely heavily on personal interview techniques for at least one more decade.

To support the construction and maintenance of sample frames and the selection of samples, the necessary universe files, registers, and ancillary services will be maintained on a Central Data Repository System (CDRS), which hosts the primary data holdings of SIS. For the foreseeable future, that repository will remain the current mainframe facility. These services will be supplied over a local area network (LAN) to advanced Sampling and Estimation software running on individual workstations. This software will support a variety of sampling and estimation methodologies and will allow samples to be maintained after they are drawn. A standard service interface will be defined for providing these services to the distributed Sampling and Estimation function. A database management utility will be selected to provide a standard format and utility service base for all files and databases maintained in the CDRS. Over time, all current and historical data files will be converted to this base.

For large-scale data acquisition operations, responses will be written or marked on questionnaires specifically designed for processing with image-based optical character and mark recognition (OCR/OMR) equipment. Questionnaires will be designed for use by enumerators with minimal training, and to minimize both recording effort (through clearly marked skip patterns) and recording errors (through maximum use of multiple-choice response alternatives). To reduce both response burden and data collection effort, sampling will be used wherever possible to collect information on topics formerly requested from all respondents, and statistical estimation techniques will be used to inflate the sample data to universe proportions.
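As an illustration of inflating sample data to universe proportions, the sketch below assumes the simplest case, a simple random sample without replacement, in which each sampled unit carries a design weight of N/n and the universe total is estimated as Y_hat = (N/n) * sum of y_i (the expansion estimator). The frame, sample size, responses, and function names are hypothetical; the methodologies actually adopted by SIS may involve stratified or multi-stage designs with unequal weights.

```python
import random

def draw_srs(frame_ids, n, seed=None):
    """Draw a simple random sample of n units, without replacement, from a frame."""
    rng = random.Random(seed)
    return rng.sample(frame_ids, n)

def expansion_estimate(sample_values, universe_size):
    """Estimate a universe total under SRS by weighting each sampled unit by N/n."""
    n = len(sample_values)
    if n == 0:
        raise ValueError("sample must not be empty")
    weight = universe_size / n  # each sampled unit represents N/n universe units
    return sum(weight * y for y in sample_values)

# Hypothetical frame of 10,000 households and a sample of 500
frame = list(range(1, 10_001))
sample = draw_srs(frame, 500, seed=42)
# Hypothetical responses (persons per household), one per sampled household
persons = [4, 3, 5, 2] * 125
print(expansion_estimate(persons, universe_size=len(frame)))  # → 35000.0
```

Here the weight is 10,000 / 500 = 20, so the 1,750 persons counted in the sample are inflated to an estimated 35,000 in the universe.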

Progress of data collection operations will be monitored through batch and questionnaire controls provided by an integrated Data Acquisition System (DAS) that spans all aspects of the operation. Completed questionnaires will be returned to the central offices, where they will be processed by operator-controlled, high-speed OCR/OMR readers. These readers will convert handwritten block characters and any preprinted information into computer-processable records, screen for errors, and fill in items that can be derived from the response context. High-error forms will be rejected and either processed manually (edited and corrected for re-processing) or dispatched for other follow-up. The clean first-stage records will be transmitted to the CDRS for any second-stage edit checks and subsequent correction and/or imputation. The end result will be a series of clean, finished (imputed, if necessary) micro-data files that collectively constitute the data holdings of the agency.
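The screening step can be pictured as a routing rule applied to each recognized form: fields the reader could not recognize, or that fail a first-stage edit check, count as errors, and forms whose error share exceeds a threshold are set aside for manual handling. The sketch below is illustrative only; the `Form` structure, the field names, and the 20 percent threshold are hypothetical assumptions, not DAS specifications.

```python
from dataclasses import dataclass, field

@dataclass
class Form:
    form_id: str
    fields: dict                                    # field name -> recognized value (None if unreadable)
    failed_edits: set = field(default_factory=set)  # fields that failed first-stage edit checks

def error_rate(form):
    """Share of a form's fields that were unreadable or failed an edit check."""
    if not form.fields:
        return 1.0                                  # an empty read is treated as all-error
    bad = {name for name, value in form.fields.items() if value is None}
    bad |= form.failed_edits.intersection(form.fields)
    return len(bad) / len(form.fields)

def route_batch(forms, max_error_rate=0.2):         # hypothetical threshold
    """Split a batch into clean first-stage records and high-error forms for manual handling."""
    clean, manual = [], []
    for form in forms:
        target = manual if error_rate(form) > max_error_rate else clean
        target.append(form)
    return clean, manual

# Hypothetical batch of three recognized forms
batch = [
    Form("B01-001", {"age": 34, "sex": 1, "relation": 2}),
    Form("B01-002", {"age": None, "sex": 1, "relation": None}),
    Form("B01-003", {"age": 150, "sex": 2, "relation": 1, "marital": 1, "persons": 4}, {"age"}),
]
clean, manual = route_batch(batch)
```

In this example the second form (two of three fields unreadable) is routed to manual processing, while the third (one failed edit out of five fields) stays within the threshold and would go on to the CDRS for second-stage editing.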

For small-scale personal interview activities, laptop computers will be used both to present the questions to the enumerator, who repeats them to the respondent, and to record the responses. The process will be interactive between enumerator and laptop, with in-process flagging of possible response errors for verification, automatic skipping, and limited automatic coding. Enumerator work assignments (sample segments) will be downloaded to the laptops through the control facilities of the Data Acquisition System (DAS), and the results will be uploaded to the DAS through a direct (or one-way switchable) data update capability provided by the DAS.
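The automatic skipping and in-process error flagging described above can be sketched as a small questionnaire engine in which each question carries a validity check and a skip rule naming the next question. The three questions, their codes, and the retry limit below are hypothetical and serve only to illustrate the mechanism; an actual CAPI package would define these in its own questionnaire specification.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Question:
    qid: str
    text: str
    is_valid: Callable[[object], bool]           # edit check applied as the answer is keyed
    next_qid: Callable[[object], Optional[str]]  # skip rule: answer -> next question or None

# Hypothetical three-question labour module, for illustration only
QUESTIONNAIRE = {
    "Q1": Question("Q1", "Age of respondent?",
                   lambda a: isinstance(a, int) and 0 <= a <= 120,
                   lambda a: "Q2" if a >= 15 else None),  # skip work questions for children
    "Q2": Question("Q2", "Worked last week? (1=yes, 2=no)",
                   lambda a: a in (1, 2),
                   lambda a: "Q3" if a == 1 else None),   # skip hours if not working
    "Q3": Question("Q3", "Hours worked last week?",
                   lambda a: isinstance(a, int) and 0 <= a <= 112,
                   lambda a: None),
}

def run_interview(answer_source, start="Q1", max_attempts=3):
    """Walk the questionnaire, following skip rules and flagging invalid entries.

    answer_source(question) returns the enumerator's entry for that question.
    Returns (responses, flagged), where flagged lists questions whose entry failed
    its check at least once before being corrected.
    """
    responses, flagged = {}, []
    qid = start
    while qid is not None:
        q = QUESTIONNAIRE[qid]
        answer = answer_source(q)
        attempts = 1
        while not q.is_valid(answer):
            if qid not in flagged:
                flagged.append(qid)        # flag for verification during the interview
            if attempts >= max_attempts:
                raise ValueError(f"no valid entry for {qid} after {attempts} attempts")
            answer = answer_source(q)      # enumerator re-asks and corrects the entry
            attempts += 1
        responses[qid] = answer
        qid = q.next_qid(answer)           # automatic skipping
    return responses, flagged

# Hypothetical keyed entries; 3 is an invalid code for Q2 and is corrected to 1
entries = iter([34, 3, 1, 40])
responses, flagged = run_interview(lambda q: next(entries))
```

For a respondent aged 10, the skip rule on Q1 ends the module immediately, so no labour questions are asked.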

Forms used to collect statistics on civil events such as marriages, births, and deaths, where these data cannot be obtained from automated administrative systems, will be redesigned for OCR/OMR processing and processed like census and survey data. Wherever possible, for these data and for data extracted from other administrative records, capabilities will be developed to:

A capability will also be available for conducting telephone interviews assisted by computer-based questionnaires. Calls assigned to each telephone enumerator will be placed automatically; the enumerator will conduct the interview and record the responses, correcting any errors flagged during the interview. This capability will be provided as part of the central DAS.

An interactive data entry capability will also be available to perform data conversion for any activities or forms that cannot be handled by the capabilities above. It will also support computer-assisted classification and other specialized coding tasks related to the preparation of production databases.


b) Data Management

An effective data management structure will be introduced. Data will be processed on distributed platforms that are monitored and coordinated centrally, and a client/server architecture will be adopted for all data production activities. The subject matter divisions will be responsible for all stages of data processing. They will be supported by IT personnel, who will receive specialized training in the new concepts and architecture.


c) Data Analysis and Production of Tables and Reports

The CDRS will provide a number of user-controlled utility functions for selecting and extracting files from the full collection of SIS information resources and for performing a variety of transformation and imputation functions. It will also format the extracted files either for processing with SAS on the mainframe facility or for downloading for use with PC SAS. In either case, the statistician, social scientist, economist, or other user will be able to perform an array of statistical functions and produce corresponding tables and graphs for incorporation into finished information products. PC SAS is one of a number of tools provided by the Network Support System (NSS). Others include tools for word processing, spreadsheet analysis, database query (SQL), charting and graphing, geographic thematic mapping, questionnaire design, electronic mail, and bulletin boards. As new electronic maps are created, data will be geo-coded so that they can be projected and analyzed at multiple levels of geographic detail. Tools will be made available either locally on PCs or indirectly through special-function network servers.


d) Electronic Mapping and Geographic Information System (GIS)

A new capability will be available for accepting, creating, and updating layered, geo-coordinate-based digital maps at resolutions sufficient for producing census maps. As list frames and registers are developed, they will be geo-coded so that data can be coded to the same map reference points. This capability will give analysts new analytical tools and provide a foundation for other information products that SIS will offer to the public. The ability to furnish interviewers with maps of the neighborhoods they are responsible for will also facilitate their field work.
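Coding a geo-coded register entry to a map reference point amounts to finding which area polygon contains its coordinates. The sketch below uses the standard ray-casting point-in-polygon test and assumes projected planar coordinates and simple polygons without holes; the enumeration area codes and boundaries are hypothetical. A production GIS would rely on its own spatial functions and indexing rather than a linear scan.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the simple polygon [(x0, y0), ...]."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:              # crossing lies to the right of the point
                inside = not inside
    return inside

def assign_area(x, y, areas):
    """Return the code of the first area whose polygon contains the point, else None."""
    for code, polygon in areas.items():
        if point_in_polygon(x, y, polygon):
            return code
    return None

# Hypothetical enumeration areas (square blocks in projected map units)
AREAS = {
    "EA-001": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "EA-002": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
```

With these boundaries, a household geo-coded at (5, 5) is assigned to EA-001, one at (15, 5) to EA-002, and a point outside both, such as (25, 5), is left unassigned for review.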


e) Information Dissemination

A Data Dissemination System (DDS) will provide public access to all public-use statistical data and other information products produced by SIS, as well as selected data from international and national statistical organizations. As new information products are developed, they will be added to the data assets designated for public use. A hierarchical, menu-based user interface will guide users through the available options to help them locate information of interest. A directory of other information assets available on-line will also be maintained, and, at some point, connections to other hosts will be supported. In addition to viewing and downloading files and documents and using other information functions, users will be able to request assistance in using the service and to request special data sets and/or analytical studies. Where confidentiality regulations permit, SIS may offer public access to some civil and commercial registry information. Access to the DDS will be provided over both the Internet and dedicated dial-up lines. A database management system, coupled with these access media, will be the engine of the DDS.

SIS will continue to offer information products in paper and other media, but these products will increasingly come to be derived from the electronic service (the DDS). In time, parallel avenues for information product development will be eliminated. A comparable sales price structure will be set for all types of products, based on marginal production, promotion, and delivery costs.


f) General and Project Management

A number of management methods and processes will be put into use in SIS, among them strategic planning and quality management, life cycle management, and project management. Most consist of principles and procedures, but they may be implemented with the help of some of the infrastructure support tools, such as project planning and control software. Other tools, such as electronic mail (E-mail) and electronic bulletin board systems (BBS), will be used to reinforce these methods and processes. In addition to the management information produced by the Data Acquisition System (DAS), other management information systems will be developed to cover the major production and dissemination activities.


g) Organization Development

Implementing the projected improvements will require instilling a new organizational culture within SIS. To succeed, these changes will depend on extensive training of four types:

The first type encompasses highly technical training for the information technology staff, as well as comparable training for the remainder of the professional and support staff.

The proposed plan assumes a combination of formal training within SIS and direct technical assistance from experienced consultants, who will provide in-process guidance and assistance in carrying out implementation activities during the early stages of the program.


3.2 Technical Concepts

The operational concepts imply the technical solutions, which were developed in more detail as a series of projects that provided the basis for estimating their implementation costs. The major projects are as follows:

Project 0       Data Inventory and Information Requirements (DIIR)
Project 1       Central Data Repository System (CDRS)
Project 2       Data Acquisition System (DAS)
Project 3       Data Acquisition System (DAS) Hardware and Proprietary Software for DAS Ancillary Functions
Project 4       Application Development for OCR/OMR and CAPI Services
Project 5       Network Support System (NSS)
Project 6       Data Dissemination System (DDS)
Project 7       Sampling and Estimation Systems (SES)
Project 8       Electronic Archival System (EAS)
Project 9       Automated Cartographic and GIS Center (AC & GIS)
Project 10       Management Advisory and Training Services (MATS)

An overview of the proposed technical approach is summarized below:

The technical concept can be thought of as a layered model in which a bottom physical layer is overlaid with a support/infrastructure layer, and finally with an application layer. The physical and support layers follow the technical strategies outlined in the Information Technology Strategy Document of SIS.

The proposed foundation is a client/server environment in which all computing nodes are interconnected by a network. Processing nodes that require technical separation from the network for security or confidentiality reasons are placed on individual LAN segments with their own LAN servers and are connected through secure one-way switch PCs. The proposed topology is based on Ethernet. However, given the existing Token Ring infrastructure, a cost-benefit analysis will be conducted to decide whether to continue investing in it or to replace it with a new Ethernet backbone.

At the infrastructure support level, TCP/IP and IPX/SPX are the transport protocols over the network. Data management uses the relational DBMS model and industry-standard SQL. Conventional LANs assume the de facto industry-standard Microsoft products throughout, including MS-DOS 6 and Windows NT, and the full suite of Microsoft Office productivity software. SAS is assumed to be the foundation software for data analysts and is offered both on the central system (CDRS) and on individual researchers' PCs. Generalized statistical products (some of which can be acquired from other national statistical organizations) provide application support and reusable software. This is particularly true of the Data Acquisition System (DAS), whose entire environment has been generalized, as in some other organizations where most applications have been developed in such an environment using portable software. SIS can therefore benefit from the experience of those organizations.

At the application level, all data-oriented communications are accomplished by transmitting SQL requests; that is, a request is a series of SQL statements, and a response is a series of SQL statements and tables. Older (legacy) systems inter-operate in this environment by having their final micro-data files transformed to the RDBMS format. Newer applications are developed without resorting to third-generation language compilers.
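To illustrate how a legacy fixed-width micro-data file might be transformed to relational form and then answered through an SQL request, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the production RDBMS. The record layout, column names, table name, and sample records are all hypothetical.

```python
import sqlite3

# Hypothetical fixed-width record layout: (column, start, end, Python type)
LAYOUT = [("region", 0, 2, str), ("household_id", 2, 8, int), ("persons", 8, 10, int)]
SQL_TYPES = {str: "TEXT", int: "INTEGER"}

def parse_record(line):
    """Slice one fixed-width legacy record into typed column values."""
    return tuple(typ(line[start:end].strip()) for _, start, end, typ in LAYOUT)

def load_legacy_file(conn, lines, table="households"):
    """Create a relational table matching LAYOUT and insert the parsed legacy records."""
    columns = ", ".join(f"{name} {SQL_TYPES[typ]}" for name, _, _, typ in LAYOUT)
    placeholders = ", ".join("?" for _ in LAYOUT)
    conn.execute(f"CREATE TABLE {table} ({columns})")  # table name is a trusted constant here
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})",
                     (parse_record(line) for line in lines if line.strip()))
    conn.commit()

conn = sqlite3.connect(":memory:")
load_legacy_file(conn, ["01000001 4", "01000002 3", "02000003 6"])  # hypothetical records

# The request is an SQL statement; the response is a result table
rows = conn.execute(
    "SELECT region, COUNT(*), SUM(persons) FROM households GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # → [('01', 2, 7), ('02', 1, 6)]
```

Once converted, the legacy file is queried exactly like data produced by newer applications, which is the point of requiring all data-oriented communication to pass through SQL.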