Data management for prospective research studies using SAS® software

Maintaining data quality and integrity is important for research studies involving prospective data collection. Data must be entered, erroneous or missing data must be identified and corrected if possible, and an audit trail must be created.

Using a large prospective study, the Missouri Lower Respiratory Infection (LRI) Project, as an example, we present an approach to data management that relies predominantly on SAS software. The Missouri LRI Project was a prospective cohort study of nursing home residents who developed an LRI. Enrollment, data collection, and follow-up took place over more than three years. Data were collected on twenty different forms. Forms were inspected visually and sent off-site for data entry. SAS software was used to read the entered data files, check for potential errors, apply corrections to data sets, and combine batches into analytic data sets. The data management procedures are described.
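The read, check, correct, and combine cycle described above can be illustrated with a minimal sketch. It is written in Python rather than SAS and is not the authors' code; the field names (`id`, `age`, `temp_c`), the plausibility limits, and the helper functions are all hypothetical and chosen only to show the shape of such a workflow, including an audit trail of every change.

```python
# Illustrative sketch only: field names, plausibility limits, and record layout
# are hypothetical and are not taken from the Missouri LRI Project.

VALID_RANGES = {"age": (0, 120), "temp_c": (30.0, 43.0)}  # assumed limits


def find_problems(batch):
    """Return (record_id, field, value, reason) for each missing or out-of-range value."""
    problems = []
    for rec in batch:
        for field, (low, high) in VALID_RANGES.items():
            value = rec.get(field)
            if value is None:
                problems.append((rec["id"], field, value, "missing"))
            elif not low <= value <= high:
                problems.append((rec["id"], field, value, "out of range"))
    return problems


def apply_corrections(batch, corrections, audit_log):
    """Apply verified corrections to copies of the records and log every change."""
    by_id = {rec["id"]: dict(rec) for rec in batch}  # copies; originals stay untouched
    for rec_id, field, new_value in corrections:
        old_value = by_id[rec_id].get(field)
        by_id[rec_id][field] = new_value
        audit_log.append(
            {"id": rec_id, "field": field, "old": old_value, "new": new_value}
        )
    return list(by_id.values())


def combine_batches(batches):
    """Concatenate cleaned batches, rejecting duplicate record ids."""
    combined, seen = [], set()
    for batch in batches:
        for rec in batch:
            if rec["id"] in seen:
                raise ValueError(f"duplicate record id: {rec['id']}")
            seen.add(rec["id"])
            combined.append(rec)
    return combined


# Two small hypothetical batches as they might come back from data entry.
batch1 = [
    {"id": 1, "age": 84, "temp_c": 38.2},
    {"id": 2, "age": 840, "temp_c": None},  # keying error and a missing value
]
batch2 = [{"id": 3, "age": 91, "temp_c": 37.0}]

audit_log = []
problems = find_problems(batch1)  # flags both fields of record 2
clean1 = apply_corrections(
    batch1, [(2, "age", 84), (2, "temp_c", 37.9)], audit_log
)
analytic = combine_batches([clean1, batch2])
```

In this sketch the corrections are supplied as a separate list rather than edited into the batch by hand, so the entered data stay unchanged and the audit log records the old and new value for every correction.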

Study data collection resulted in over 20,000 completed forms. Data management was successful, resulting in clean, internally consistent data sets for analysis. The amount of time required for data management was substantially underestimated.

Data management for prospective studies should be planned well in advance of data collection. An ongoing process, with data entered and checked as they become available, allows timely correction of errors and recovery of missing data.

Article originally published in ‘BMC Medical Research Methodology’, 8 (2008), n. 61. – ©2008 Kruse and Mehr. Open Access article distributed under the terms of the Creative Commons Attribution License (//

