Paradigm of Prediction: Predictive Analytics to Prevent Congestive Heart Failure


Predictive modeling has demonstrated success in electronic business intelligence and is now moving from theory into practice in health care. Congestive heart failure (CHF) was selected for this project because it is both predictable and preventable. The purpose was to develop a predictive model for information systems to identify patients at increased risk for CHF and to alert medical professionals to initiate early preventive measures. A simple model was developed on the assumptions that a combination of diagnoses would be prevalent as risk markers for CHF and that onset dates could be used to develop a predictive timeframe. Diagnoses, diagnosis dates, problems, and onset dates were retrieved for 272 patients with CHF, and the data fields were found to contain inconsistencies. Results suggested that the selected diagnoses were less prevalent in combination than they were independently, indicating that the model needs revision.

Nature of Project

According to McGonigle and Mastrian (2009), the paradigm of the health care system in the United States is shifting to electronic information systems to manage and provide patient care. An efficient information system is one that reliably gathers, processes, and organizes patient data to construct a comprehensive medical history. Data are stored in a database and multiple databases are maintained in a data warehouse as a repository of retrievable information (Dennis, Wixom, & Roth, 2009). A proficient system creates a robust infrastructure of compiled data that can be analyzed for research (Kahn, Batson, & Schilling, 2012). According to the World Health Organization (2005), an efficient information system is a vital investment to track, trend, and plan patient care to prevent chronic disease. Therefore, experts in the field of information technology are pioneering the use of analytic software that can predict a patient's risk for chronic disease based on computer-generated statistical analysis of patient data (Predixon, 2013).

With its combined analysis and trending techniques, predictive, analytic, intelligent software is useful in health care technology for discovering patterns in large amounts of data (Eckerson, 2006). The Centers for Disease Control and Prevention (CDC) welcome strategies for detecting and preventing chronic disease and recognize that improving cardiovascular health is a great challenge (United States Department of Health and Human Services [DHHS], 2011).

The goal of this project was to offer insight into the use of a predictive analytic model that could identify patients at risk for congestive heart failure (CHF). The project showed how extracted and analyzed data could provide evidence of combined comorbid diseases being present among patients with CHF. For an analysis to be predictive of chronic disease, however, an information system must contain correlational, evidence-based data, such as onset dates. This project also sought to make recommendations for strengthening the infrastructures of existing information systems to prepare for the use of predictive analytics to prevent chronic disease.

Problem Statement

According to DHHS (2011), there is a heavy emphasis on the prevention and detection of CHF because it is recognized as the leading cause of readmissions in health care. DHHS has also reported that more than 5 million Americans have been diagnosed with CHF, that 1.4 million of them are under the age of 60, and that over 500,000 new cases are confirmed each year, at a cost of over $300 billion. Overall, CHF presents a grave concern for our nation’s health and economy (Emory Healthcare, 2012). The National Conference of State Legislatures (2012) reported that chronic health problems such as heart disease are preventable and that resolving them could save the nation over $5 billion annually in health spending.

Heart disease events claim a life every 60 seconds (Emory Healthcare, 2012). Health care organizations are challenged with developing plans to improve patient compliance with core measures to prevent CHF readmissions (Maeda & LoSasso, 2011). Beginning in fiscal year 2012, preventing CHF readmissions was designated a priority among health care organizations (CMS, 2012). This approach, however, is reactive. In contrast, early recognition of the signs and symptoms leading to CHF, such as a combination of diabetes, high blood pressure, obesity, and high cholesterol, offers medical professionals the opportunity to initiate educational interventions that could keep patients from being admitted to a hospital for CHF. This approach is proactive.

According to Eckerson (2006), predictive analytics can help health care organizations use information proactively, rather than reactively, to improve compliance and reduce the risk of chronic disease.

Purpose Statement and Project Objectives

The purpose of the project was to develop a predictive model for information systems that would identify patients at increased risk for CHF and alert medical professionals. As health care information applications have developed, clinical decision support systems (CDSSs) have been designed to warn practitioners of potential contraindications. CDSSs are intended to improve care and reduce costs. These tools are embedded into electronic health record software. When a clinician is working with a patient’s electronic health record, the data are compared to current evidence-based medicine, which is embedded in the software. A patient’s medical history is a relevant driver in a decision-making approach to preventing chronic disease. Because a practitioner can be alerted to the existence of combined comorbid risk markers, a clinical decision can be prompted to begin early preventive measures.

The primary objective was to validate, through analysis of mined historical patient data, that hypertension, hypercholesterolemia, diabetes, and obesity are prevalent comorbidities contributing to a diagnosis of CHF in an acute care facility. In this study, the health records of patients between the ages of 55 and 70 who had these risk markers and who had been diagnosed with CHF were queried through a data mining program. Data mining constitutes a sophisticated computerized search of the data to discover patterns and correlations. An analysis of the retrieved data was performed to determine the prevalence of the combined risk markers and their significance as a contributor to CHF.
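The distinction between independent and combined prevalence can be illustrated with a minimal sketch. The patient records and the function name below are hypothetical and are not drawn from the project data; the sketch assumes only that each CHF patient's diagnoses are available as a set of labels.

```python
RISK_MARKERS = ("hypertension", "hypercholesterolemia", "diabetes", "obesity")

# Hypothetical diagnosis sets for four CHF patients (illustrative only).
patients = [
    {"hypertension", "diabetes"},
    {"hypertension", "hypercholesterolemia", "diabetes", "obesity"},
    {"hypertension"},
    {"obesity", "diabetes"},
]

def prevalence(patient_sets):
    """Return (independent prevalence per marker, prevalence of all four combined)."""
    total = len(patient_sets)
    independent = {
        marker: sum(marker in p for p in patient_sets) / total
        for marker in RISK_MARKERS
    }
    combined = sum(all(m in p for m in RISK_MARKERS) for p in patient_sets) / total
    return independent, combined

independent, combined = prevalence(patients)
print(independent)
print(combined)
```

In this toy sample each marker is individually more common than the four-marker combination, which is the kind of pattern the analysis was designed to detect.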

The use of predictive analytics depends upon the reliability and validity of the data. Diagnosis onset dates are key predictors for determining how much time elapses from the onset of comorbidities to a diagnosis of CHF. The outcome of this project recognizes the benefits of predictive analysis as the right tool for identifying patients who have an increased risk of acquiring CHF and thus for promoting care planning and education.

Assumptions and Limitations

The project question was presented as: Can predictive analytics through data mining the electronic health record and analyzing current and historical diagnoses contribute to the prevention of CHF? This project is based on the following assumptions:

  1. A predictive analytic model would be a cornerstone of a CHF prevention project.
  2. A combination of comorbidities is present among patients with CHF.
  3. A timeframe can be developed indicating the average length of time from the onset date of comorbidities until a patient acquires CHF.
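
The third assumption can be illustrated with a minimal sketch of the timeframe calculation. The dates and the function name below are hypothetical and are not project data; the sketch assumes that each patient has one reliable comorbidity onset date and one CHF diagnosis date, which is exactly the condition the project set out to test.

```python
from datetime import date
from statistics import mean

# Hypothetical pairs: (earliest comorbidity onset date, CHF diagnosis date).
records = [
    (date(2001, 3, 1), date(2011, 3, 1)),
    (date(2005, 6, 15), date(2012, 6, 15)),
    (date(1998, 1, 10), date(2013, 1, 10)),
]

def mean_years_to_chf(pairs):
    """Average interval, in years, from comorbidity onset to CHF diagnosis.

    Pairs whose onset date falls after the CHF date are skipped as invalid.
    Returns None when no valid pair remains.
    """
    intervals = [(chf - onset).days / 365.25 for onset, chf in pairs if chf >= onset]
    return mean(intervals) if intervals else None

average_years = mean_years_to_chf(records)
print(round(average_years, 1))
```

The calculation is only as trustworthy as the onset dates supplied to it, so unverifiable dates make the resulting average meaningless.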

The limitation of this project is that predictive analytic software is still evolving for use in health care. Little is known about the data requirements for the transition to its use. Therefore, I designed a simple model to analyze the internal validity of data from an existing health care system to determine readiness for the use of predictive analytics. The boundaries of this project are limited to mining historical encrypted health information data specific to diagnoses, including codes and dates. Mining data from the electronic health record can be hampered by misinformation and by a lack of provider participation in electronic charting.

Literature Review

The United States is currently moving toward data-driven health care decision making. The data in the electronic health record are becoming a vital necessity for research and evidence-based practice. Until recently, health care organizations did not have enough electronic data to begin using predictive analytics to help prevent chronic disease. Reducing the prevalence of CHF, a significant chronic disease warranting management by health care organizations and clinicians, has taken the spotlight in the literature.

CHF is considered the leading cause of death after stroke and cancer and there is an association between prevalence and prevention (Suh et al., 2011). The progression of comorbidities was seen as a priority for chronic disease management (Dungan, Binkley, Nagaraja, Schuster, & Osei, 2011; Lassen & Jespersen, 2011; Palano, Paneni, Sciarretta, Tocci, & Volpe, 2011; Sakatani et al., 2005). Hypertension and hypercholesterolemia were reported to be prevalent contributors to the development of cardiovascular diseases that can worsen prognosis for CHF (Palano et al., 2011; Sakatani et al., 2005). Obesity and diabetes were factors that increased CHF mortality (Lassen & Jespersen, 2011). Throughout the literature, hypertension, diabetes, obesity, and hypercholesterolemia were major risk factors for acquiring CHF. Therefore, these four diagnoses were chosen as key variables for this project. The literature identified progression of comorbidities as causal mechanisms of chronic disease (Dungan et al., 2011; Lassen & Jespersen, 2011; Palano et al., 2011). Hypertension, diabetes, obesity and hypercholesterolemia have been marked as mechanistic links contributing to CHF (National Heart Lung and Blood Institute, 2012; Mayo Clinic, 2012; American Heart Association, 2012; Centers for Disease Control and Prevention, 2012).

Conceptual Models and Theoretical Frameworks

Predictive analytics was derived from economic theory, in which commercial information was collected from computer systems to identify marketing prospects (Burns, 2011). According to Tremblay Consulting (2005), predictive modeling is not a new concept. In its more primitive forms, it has been used clinically to diagnose and treat. Predictive models using computer technology are driven by data and based on scientific knowledge through computer-based analysis and learning (Shmueli & Koppius, 2011). A basic model describing a data mining process of predictive analytics for prevention of CHF was developed (see Figure 1).

Figure 1. Predictive analytic model demonstrating CHF prevention.

Project Design

The main purpose of the project was to develop a predictive model for information systems that would identify patients at increased risk for CHF. The design used data analysis from a targeted population of subjects to determine the coexistence of four comorbid diseases as a combined predictor of CHF. Because the project was novel, a basic model describing a data mining process of predictive analytics for prevention of CHF was developed (see Figure 1). The process flow of the project included extracting and analyzing the data, presenting outcomes, and proposing interventions (see Figure 2).

Figure 2. Overall flow of the project.

Population and Sampling

According to CHF statistics from Emory Healthcare (2012), more than 1.4 million Americans under the age of 60 years are diagnosed with CHF; two percent of these people are between the ages of 40 and 59 years, and more than half of those who develop CHF die within 5 years of diagnosis. The sampling subjects for this project were patients between the ages of 55 and 70 years with CHF. The correlation between their medical histories of comorbidities and the onset dates was targeted to develop a timeline for acquiring CHF. Based on the mortality rate within 5 years of diagnosis, the suggested target population for early recognition and interventions is patients between 40 and 59 years of age without a diagnosis of CHF.

Data Collection

The data collection process was performed using a query and analysis software program. This program searched for patients admitted from November 1, 2012, through March 18, 2013, the busiest part of the community’s tourist season. The query included patients between the ages of 55 and 70 years with a diagnosis of CHF. A second query retrieved all of the diagnoses, problems, and dates from the medical record and claims databases (see Figure 3). The following data from each patient’s health record were retrieved: encounter number, diagnosis codes and descriptions, and problem descriptions with onset dates. The data were then exported to an Excel spreadsheet for analysis. The spreadsheet was used to evaluate the prevalence of the risk markers and to calculate timeframes for the onset of CHF.
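
The selection criteria of the first query can be sketched as a simple filter. The actual query and analysis program is not named in this report, so the sketch below is an illustration only; the encounter rows and field names are hypothetical, while the admission window and age range are those stated above.

```python
from datetime import date

# Admission window and age range taken from the data collection criteria.
WINDOW_START = date(2012, 11, 1)
WINDOW_END = date(2013, 3, 18)
MIN_AGE, MAX_AGE = 55, 70

# Hypothetical encounter rows; field names are assumptions.
encounters = [
    {"encounter": "E1", "age": 62, "admitted": date(2012, 12, 5), "diagnoses": {"CHF", "hypertension"}},
    {"encounter": "E2", "age": 48, "admitted": date(2013, 1, 9), "diagnoses": {"CHF"}},
    {"encounter": "E3", "age": 70, "admitted": date(2013, 3, 18), "diagnoses": {"CHF", "diabetes"}},
    {"encounter": "E4", "age": 66, "admitted": date(2013, 4, 2), "diagnoses": {"CHF"}},
    {"encounter": "E5", "age": 58, "admitted": date(2012, 11, 20), "diagnoses": {"obesity"}},
]

def in_sample(row):
    """True when the encounter meets the age, date, and CHF criteria."""
    return (
        MIN_AGE <= row["age"] <= MAX_AGE
        and WINDOW_START <= row["admitted"] <= WINDOW_END
        and "CHF" in row["diagnoses"]
    )

sample = [row["encounter"] for row in encounters if in_sample(row)]
print(sample)
```

Both boundaries of the age range and the admission window are treated as inclusive here, which is an assumption about how the original query handled them.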

Figure 3. Data collection process with the use of a query and analysis program.

Project Evaluation Plan

Program theory evaluation was used to evaluate this project. It was applied by collecting and analyzing patients’ historical data and examining the dates of the variable diagnoses, defined as hypertension, diabetes, hypercholesterolemia, and obesity. An evaluation model was developed to define the evaluation process flow corresponding with the predictive analytic model (see Figure 4). The first step was to identify the most prevalent disease risk factors. These risk factors must be valid and reliable according to their existence as searchable structured data within the electronic health record. The system would then apply a logic process to determine a correlation of risk factors and onset dates to identify patients at increased risk for acquiring the chronic disease. Medical professionals using the system would receive an electronic alert to prompt a clinical decision for interventions.
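
The logic process and alert described above can be sketched as a simple rule. This is an illustration under stated assumptions, not the logic of any particular clinical decision support system: the function name, the alert wording, and the default threshold of four risk markers are all hypothetical.

```python
RISK_MARKERS = {"hypertension", "diabetes", "hypercholesterolemia", "obesity"}

def chf_risk_alert(diagnoses, threshold=4):
    """Return an alert message for a patient without CHF, or None.

    `threshold` is the assumed minimum number of risk markers that
    triggers an alert; it is an illustrative parameter, not a validated cutoff.
    """
    if "CHF" in diagnoses:
        return None  # prevention alerts target patients not yet diagnosed
    present = RISK_MARKERS & set(diagnoses)
    if len(present) >= threshold:
        return "CHF risk alert: " + ", ".join(sorted(present))
    return None

print(chf_risk_alert({"hypertension", "diabetes", "hypercholesterolemia", "obesity"}))
print(chf_risk_alert({"hypertension", "obesity"}, threshold=2))
```

Exposing the threshold as a parameter allows the rule to be revised if analysis shows that fewer than four markers, or a different combination, better identifies patients at risk.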

Figure 4. Evaluation model of predictive analytics used for the prevention of chronic disease.

Project Strengths and Limitations

The strength of the project is that it advocates the need for identifying patients at risk for CHF. The hypothesis that people with multiple comorbidities are at high risk for CHF supports the development of early interventions. Irrespective of using a computerized system, clinicians can implement early interventions by being mindful of the patient’s medical history and staying attuned to the relevance of comorbid risk factors. In retrospect, the findings showed that hypertension was the single most prevalent risk factor coexisting with CHF.

Theoretically, high cholesterol predisposes patients to coronary artery disease. Therefore, combining the contributing factors differently could have changed the prevalence outcome. In addition, many variables affect the outcome of CHF, such as medications, diet, and exercise. Analyzing unstructured data areas within the medical record, such as progress notes and history and physicals, could have indicated additional diagnoses. Valid and accurate onset dates were not available for developing a theoretical timeframe for onset of CHF. Methodologically, retrieving data on a control group could have provided reliable observations for comparing the presence of the four risk factors among patients without a diagnosis of CHF. In addition, given the limited amount of research on predictive analytics in health care, no proven, defined models exist to guide an analysis of the source data prior to data mining.

Remediation of limitations regarding the relationships of selected variables could add value to the results of this project and contribute to future projects. It is recommended that healthcare organizations begin to utilize the large amounts of data within the health care records to achieve such goals.

Project Summary

The goal of this project was to present the concept of predictive analytics for use by clinicians in preventing chronic disease. The project was designed to predict the average amount of time it would take a patient with four comorbid diagnoses to acquire a CHF diagnosis. The selected diagnoses were present as contributors to CHF (see Figure 5), but in the combined state they had a low relationship (see Figure 6). Additional analysis, from a variance perspective, identified the numbers and percentages of CHF patients who had three, two, and one of the four variables. Table 1 presents the actual number of patients identified. Multiple assumptions can be made from the patterns of combined diagnoses (see Figure 7). The colored lines indicate the percentage of disease (noted in the column headings) in the overall sample population, as noted in Table 2. The blue line represents the 40 patients who had one of the four variables, the green line the 91 patients with two, the yellow line the 83 patients with three, and the red line the 37 patients with all four. Hypertension was the most common diagnosis among patients with one of the four selected variables; hypertension combined with high cholesterol was most prevalent among patients with two variables; diabetes combined with obesity was most prevalent among patients with three variables; and obesity was noted as most prevalent among patients with all four variables.
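
The patient counts reported above can be converted to shares of the 272-patient sample with simple arithmetic. The sketch below uses only the counts stated in this section; the variable names are illustrative, and the reading that the remaining patients had none of the four variables is an inference from the totals rather than a figure taken from Table 1 or Table 2.

```python
SAMPLE_SIZE = 272

# Number of CHF patients by how many of the four selected variables they had.
counts = {1: 40, 2: 91, 3: 83, 4: 37}

# Share of the whole sample in each group, as a percentage to one decimal.
percentages = {k: round(100 * n / SAMPLE_SIZE, 1) for k, n in counts.items()}

# Patients not accounted for by any of the four groups.
remaining = SAMPLE_SIZE - sum(counts.values())

print(percentages)
print(remaining)
```

Under this arithmetic, patients with all four variables form the smallest of the four groups, consistent with the low combined relationship noted above.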

Figure 5. Top 10 diagnoses found among the population sample.

Figure 6. Overall percentages of selected diagnoses among the sample population.

Table 1. Prevalence of Selected Diagnoses Among the Sample Population

Figure 7. Percentages of diagnosis occurrence from the selected variables present among sample of 272 patients with CHF.

Table 2. Presence of Disease for Multiple Combinations of the Four Selected Variables

Both common disease states and onset dates were needed to test the hypothesis of this project. A timeframe for onset of disease could not be established because the dates of diagnosis were inconsistent and unverifiable. Diagnosis dates in the electronic medical record are entered according to the dates of treatment during the patient’s hospital stay. When diagnosis dates were not entered into a structured data field, the coding department would research the medical record for the diagnosis and attach the current stay date as the diagnosis date for insurance claims data. A structured data field is a field, usually required, that can be searched with a query. In the medical record there is a structured data field labeled as onset date. This date is attached in the problem text box used by nursing to identify the patient’s problems for care planning purposes. All problems and onset dates were pulled from the query search and included in the Excel data retrieved for this project. The text terminology used by nursing in the problem data field was inconsistent and difficult to correlate because of the use of free text (see Table 3). There was also inconsistent use of onset dates for the selected variables. Nursing staff were interviewed to determine their knowledge and use of the onset date field. The interviews with ten nurses showed that most would enter the current day’s date in the field adjoining the current problem. The problem list offers options for viewing all, active, and inactive patient problems. Since the introduction of the system, this process has not been used consistently or accurately as intended by the vendor.
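
The onset-date problem described above suggests a simple screening check before any timeframe is calculated. The sketch below is an illustration, not a procedure used in this project; the rows, field names, and labels are hypothetical. It treats an onset date on or after the admission date as suspect because such a date may reflect the day of charting rather than the true start of a chronic condition.

```python
from datetime import date

# Hypothetical problem-list rows; field names and values are assumptions.
problems = [
    {"problem": "hypertension", "onset": date(2013, 1, 4), "admitted": date(2013, 1, 4)},
    {"problem": "diabetes", "onset": date(2006, 5, 20), "admitted": date(2013, 1, 4)},
    {"problem": "obesity", "onset": None, "admitted": date(2013, 1, 4)},
    {"problem": "hypercholesterolemia", "onset": date(2013, 2, 1), "admitted": date(2013, 1, 30)},
]

def classify_onset(row):
    """Label an onset date as 'missing', 'suspect', or 'usable'."""
    onset, admitted = row["onset"], row["admitted"]
    if onset is None:
        return "missing"
    # An onset on or after admission may be the charting date, not the true onset.
    if onset >= admitted:
        return "suspect"
    return "usable"

labels = [classify_onset(row) for row in problems]
print(labels)
```

A screen of this kind cannot confirm that an earlier date is correct; it only separates dates that are clearly unusable for a chronic-disease timeline from those that merit verification.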

The problem list was sorted and analyzed to determine a correlating onset. Use of the problem list is currently not standardized among the nursing staff. The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is available for use within the problem field. However, the use of free text in that field makes the aggregation of data difficult, if not impossible, for trending research. Of the more than 5,600 problem line items retrieved for the patients in the sample population, only one patient with four out of four comorbidities had corresponding onset dates. It is recommended that standards be developed for the support and use of SNOMED CT codes and that education and training emphasize consistent and accurate recording of data by clinicians. The organization has recognized this problem and is currently revising the process and planning training sessions with the nursing staff on SNOMED CT and problem identification.
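
The aggregation difficulty caused by free text can be illustrated with a minimal normalization sketch. The synonym lists below are hypothetical examples, not the terminology recorded in Table 3, and the output labels are plain words rather than SNOMED CT codes; a production system would map to the standard terminology instead.

```python
import re

# Hypothetical free-text variants for each selected diagnosis (illustrative only).
SYNONYMS = {
    "hypertension": ["hypertension", "htn", "high blood pressure"],
    "diabetes": ["diabetes", "dm", "diabetic"],
    "hypercholesterolemia": ["hypercholesterolemia", "high cholesterol"],
    "obesity": ["obesity", "obese"],
}

def normalize_problem(text):
    """Map a free-text problem entry to one canonical label, or None."""
    # Lowercase, replace punctuation with spaces, and pad so that
    # variants match only as whole words or whole phrases.
    cleaned = " " + re.sub(r"[^a-z0-9]+", " ", text.lower()).strip() + " "
    for label, variants in SYNONYMS.items():
        if any(" " + variant + " " in cleaned for variant in variants):
            return label
    return None

entries = ["HTN - uncontrolled", "Pt. has High Blood Pressure", "DM type 2", "Fall risk"]
print([normalize_problem(entry) for entry in entries])
```

A lookup of this kind must be maintained by hand and will miss unanticipated wording, which is why entering coded terms at the point of documentation is the more reliable approach.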

Table 3. Problem Terminology Entered by Nursing Staff as Found in the Data Fields of the Medical Record

Project Evaluation Report

The underlying assumption of this project was that reliable data existed within structured fields to support the presence of comorbidities and establish a timeframe for onset of CHF. The overall analysis revealed that the four selected comorbid risk factors were present among patients with CHF but were not the most prevalent diagnoses. Because valid onset dates were lacking, it could not be shown that the risk factors were present prior to the onset of CHF. The entry of supportive data into the electronic health record will need to be redefined and carried out accurately to advance predictive analytics in healthcare. Feedback on these results has been provided to executive stakeholders of the organization, and change is in development. The evaluation of this project indicates that any organization seeking to succeed with predictive analytics to prevent chronic disease must closely observe its computer inputs, processes, and outputs.

Summary of Findings

More emphasis should be placed on ensuring the presence and accuracy of onset dates in the medical record. The American Health Information Management Association (2013) supports the Electronic Health Record System Functional Model of Health Level Seven International and its guiding requirement that a system should be able to capture the onset date of a problem. The technology to capture this date exists, but it is not being used properly. As the United States moves forward with electronic technology in health care, these dates must be accurate and in place before data are electronically exchanged with other providers and regulatory agencies for continuity of care.


In conclusion, there is a strong belief that predictive analytics can play a vital role in the prevention of CHF. The data set was not sufficient to complete predictive inferential statistics or the complex logistics of a true predictive model. It is important for nursing to take the lead in shaping proactive responses to patient diagnoses that can lead to mortality from chronic disease. As the electronic health record emerges and technology advances, multidisciplinary teams need to address the future needs of the medical record and ensure, as an ongoing process, that documentation is accurate and valid.


References

American Health Information Management Association. (2013). Problem list guidance in the EHR. Retrieved from public/documents/ahima/bok1_049241.hcsp?dDocName=bok1_049241

American Heart Association. (2012). Understand your risk for heart failure. Retrieved from UnderstandYourRiskforHeartFailure/Understand-Your-Risk-for-Heart-Failure_UCM_002046_Article.jsp

Burns, E. (2011). Why predictive analytics are important and more. Retrieved from

Centers for Disease Control and Prevention. (2012). Division for heart disease and stroke prevention: Heart failure fact sheet. Retrieved from

Centers for Medicare and Medicaid Services. (2012). Readmissions reduction program. Retrieved from

Centers for Medicare and Medicaid Services. (2013). A record of progress on health information technology. Retrieved from

Dennis, A., Wixom, B. H., & Roth, R. M. (2009). Systems analysis & design (4th ed.). Hoboken, NJ: John Wiley & Sons, Inc.

Dungan, K., Binkley, P., Nagaraja, H., Schuster, D., & Osei, K. (2011). The effect of glycaemic control and glycaemic variability on mortality in patients hospitalized with congestive heart failure. Diabetes/Metabolism Research and Reviews, 27(1), 85-93. doi:10.1002/dmrr.1155

Eckerson, W. W. (2006). Predictive analytics: Extending the value of your data warehousing investment. Retrieved from

Emory Healthcare. (2012). Heart failure statistics. Retrieved from


Kahn, M. G., Batson, D., & Schilling, L. M. (2012). Data model considerations for clinical effectiveness researchers. Retrieved from

Lassen, C., & Jespersen, B. (2011). Management of diuretic treatment: A challenge in the obese patient. Scandinavian Journal of Urology and Nephrology, 45(3), 220-222. doi:10.3109/00365599.2011.552435

Maeda, J. L. K., & LoSasso, A. T. (2011). Effect of market competition on hospital performance for heart failure. American Journal of Managed Care, 17(12), 816-822. Retrieved from

Mayo Clinic. (2012). Heart failure: Causes. Retrieved from

McGonigle, D., & Mastrian, K. (2009). Nursing informatics and the foundation of knowledge. Sudbury, MA: Jones and Bartlett Publishers.

National Conference of State Legislatures. (2012, October). Chronic disease prevention and health promotion. Retrieved from

National Heart Lung and Blood Institute. (2012). What causes heart failure? Retrieved from

Palano, F., Paneni, F., Sciarretta, S., Tocci, G., & Volpe, M. (2011). The progression from hypertension to congestive heart failure. Recenti Progressi in Medicina, 102(12), 461-467. doi:10.1701/998.10857

Predixon. (2013). Predixon on healthcare. Retrieved from

Sakatani, T., Shirayama, T., Suzaki, Y., Yamamoto, T., Mani, H., Kawasaki, T., & ... Matsubara, H. (2005). The association between cholesterol and mortality in heart failure. Comparison between patients with and without coronary artery disease. International Heart Journal, 46(4), 619-629. doi:10.1536/ihj.46.619

Shmueli, G., & Koppius, O. R. (2011). Predictive analytics in information systems research. MIS Quarterly, 35(3), 553-572. doi:10.2139/ssrn.1606674

Suh, M., Chen, C., Woodbridge, J., Tu, M., Kim, J., Nahapetian, A., & ... Sarrafzadeh, M. (2011). A remote patient monitoring system for congestive heart failure. Journal of Medical Systems, 35(5), 1165-1179. doi:10.1007/s10916-011-9733-y

Tremblay Consulting. (2005). Predictive health: Policy for predictive modeling and long-term health conditions. Retrieved from files/Future%20health/predictive%20modelling%20healthcare.pdf

United States Department of Health and Human Services. (2011). A public health action plan to prevent heart disease and stroke. Retrieved from dhdsp/action_plan/pdfs/action_plan_full.pdf

World Health Organization. (2005). Preventing chronic diseases: A vital investment. Retrieved from

Author Bio

Deborah H Selman, DNP, RN - Nursing Professor for Edison State College and Wolford College.

Dr. Selman is Chairman-elect for the Distance Education Committee of the Health Information and Management Systems Society and serves as convener for the eVALUEation of the Health IT workgroup. She began her education in 1979 with computer programming. Deborah completed her MSN specializing in Nursing Informatics and was awarded her DNP through Walden University in 2013. Over the years Deborah has designed several educational programs for leadership, nursing informatics, and utilization of simulation in nursing practice. Dr. Selman’s passion is to lead the way for nursing to be proactive in its approach toward quality healthcare for populations in this computer-driven world of inter-operability and big data.