- What are healthcare datasets and why are they important?
- Examples of datasets in healthcare
- 10 best healthcare datasets for data mining
Wikipedia defines a data set as a collection of data. What does that mean exactly? A data set is a collection of related items of information that a computer can process as a unit. Typically, a single database table or a single statistical data matrix is a data set. It can consist of just a few items or millions of them; either way, grouping the items together is what makes them a set. This is particularly useful for data mining, a method of data analysis that searches for trends and patterns in data and can give a custom software solution a competitive advantage.
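As a rough illustration of a data set being processed as a unit, the sketch below treats a small table as a list of records and looks for the simplest possible pattern: the most frequent value of one field. The records, field names, and the `most_common_values` helper are invented for this example.

```python
from collections import Counter

# Hypothetical toy data set: each dict is one item (row) of the set.
encounters = [
    {"patient_id": 1, "age": 67, "diagnosis": "hypertension"},
    {"patient_id": 2, "age": 54, "diagnosis": "type 2 diabetes"},
    {"patient_id": 3, "age": 71, "diagnosis": "hypertension"},
    {"patient_id": 4, "age": 49, "diagnosis": "asthma"},
]


def most_common_values(rows, field, top_n=1):
    """Return the top_n most frequent values of `field` across all rows."""
    counts = Counter(row[field] for row in rows)
    return counts.most_common(top_n)


print(most_common_values(encounters, "diagnosis"))  # [('hypertension', 2)]
```

Real data mining relies on far richer methods, but the principle is the same: the whole set is scanned at once rather than one record at a time.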
What are healthcare datasets and why are they important?
Healthcare analytics is built on data, and on data sets in particular; it is what makes tools such as dashboards useful in healthcare systems. Because healthcare data sources are so diverse, data standardization is a key pillar of the efficient and meaningful use of information and of collaboration among healthcare professionals, care providers, insurers, and government agencies.
Data interchange in the US healthcare industry is strictly regulated at both the federal and state levels. The Health Insurance Portability and Accountability Act (HIPAA), enacted in 1996, is the core source of healthcare IT data standards.
The HIPAA Rules regulate the use and disclosure of protected health information (PHI) and establish national standards to protect individuals’ electronic PHI from data theft. The Health Information Technology for Economic and Clinical Health Act (HITECH Act), adopted in 2009, aims to “improve health care quality, safety, and efficiency through the promotion of health IT, including electronic health records (EHR) and private and secure electronic health information exchange”.
Both laws stress the importance of data interchange standards, including common encoding specifications, medical templates for structuring information, document architectures, and information models. And, of course, there are specific standards for data sets in healthcare. Keep in mind that the purpose of a healthcare data set is to identify the data elements to be collected for each patient and to provide uniform definitions for common terms.
Let’s take a closer look at some of these data sets.
Examples of datasets in healthcare
- The Uniform Hospital Discharge Data Set (UHDDS) was an initiative of the Department of Health, Education, and Welfare, the predecessor of today’s Department of Health and Human Services (HHS). It was first implemented in 1974 and has since undergone several revisions. Though this set is specific to hospitals that provide medical services for those covered by Medicare and Medicaid, insurance companies in general tend to gather information and medical records in a style similar to the UHDDS, as the value of having compatible data is clear. The UHDDS lists and defines a set of common, uniform data elements for every hospital inpatient, including the principal diagnosis and other diagnoses, the principal procedure, and other significant procedures.
- The Uniform Ambulatory Care Data Set (UACDS) covers ambulatory care. The set includes data elements such as the reason for the encounter, living arrangements, and marital status. The UACDS is a recommended set, not a mandatory one.
- The Minimum Data Set (MDS) for long-term care was published by the Department of Health & Human Services in 2013 and modified in 2016. It is a standardized primary screening and assessment tool of health status that forms the foundation of the comprehensive assessment for all residents of Medicare- and/or Medicaid-certified long-term care facilities. The set collects demographic and clinical data on nursing home residents and must be completed for every resident at admission and during reassessment periods. It is used to develop care plans and to document placement at the appropriate care level.
- Data Elements for Emergency Department Systems (DEEDS) is an initiative of the National Center for Injury Prevention and Control (NCIPC) at the Centers for Disease Control and Prevention (CDC). DEEDS is a data set that supports the uniform collection of data in hospital-based emergency departments and reduces incompatibilities in emergency care data. It provides uniform specifications for the data elements that emergency departments choose to retain, revise, or add to their ED record systems, so that clinical data definitions can be reused.
- The Outcomes and Assessment Information Set (OASIS) is a standardized data set designed to facilitate the rigorous and systematic measurement of patient home health care outcomes to assess the quality of home health services. It is also used as the basis of reimbursement. The set was designed to gather data about Medicare beneficiaries who are receiving services from a home health agency. It includes a set of core data items that are collected on all adult home health patients.
- The Healthcare Effectiveness Data and Information Set (HEDIS), formerly the Health Plan Employer Data and Information Set, is a set of standard performance measures designed to give health care purchasers and consumers the information they need to compare the performance of managed health care plans. It is used by the National Committee for Quality Assurance (NCQA) as part of the accrediting process for managed care organizations. More than 90% of America's health plans use this tool to measure performance on important dimensions of care and service. The set includes administrative data, claims, and health record review data.
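The idea of uniform data elements can be sketched in code. The minimal example below is loosely modeled on the UHDDS elements mentioned above: it defines one record type and reports which required elements are blank. The field names, the placeholder codes, and the rule about which elements are required are illustrative assumptions, not the official UHDDS specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DischargeRecord:
    """Hypothetical inpatient discharge record with UHDDS-style elements."""
    patient_id: str
    principal_diagnosis: str           # a diagnosis code or description
    principal_procedure: str = ""      # empty if no procedure was performed
    other_diagnoses: List[str] = field(default_factory=list)
    other_procedures: List[str] = field(default_factory=list)


def missing_elements(record: DischargeRecord) -> List[str]:
    """List required elements that are blank (assumed rule for this sketch)."""
    required = ("patient_id", "principal_diagnosis")
    return [name for name in required if not getattr(record, name).strip()]


complete = DischargeRecord("P-001", "example-dx-code",
                           principal_procedure="example-px-code")
incomplete = DischargeRecord("P-002", "")

print(missing_elements(complete))    # []
print(missing_elements(incomplete))  # ['principal_diagnosis']
```

Because every record shares the same element names and definitions, records from different facilities can be validated, compared, and aggregated with the same code.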
This list of datasets is, of course, not exhaustive but demonstrates the importance of a comprehensive approach to data collection and meaningful use for future data-driven healthcare. To catch up with other industries, healthcare organizations should adopt more long-term approaches to data collection and analysis. Moreover, with the rise of digital health systems, we have become more concerned about data security in healthcare.
Using big data for research and EMR analytics can benefit the whole healthcare system: the ability to derive value from healthcare information and make well-grounded decisions has the potential to improve data quality and patient care as well as reduce costs. It is a good idea to learn from the experience of other industries in coping with big data. It also makes sense to involve data scientists and IT companies to handle the collected data and get the most use out of a big data environment.
10 best healthcare datasets for data mining
There are a lot of data sources besides hospital data that can be useful for healthcare systems analytics. We have compiled a shortlist of the best healthcare data sets that can be used for statistical analysis. The list includes both free healthcare data sets and business data sets for healthcare providers.
- The Healthdata.gov site incorporates 125 years of US healthcare data. The data include claim-level Medicare data, epidemiology, and population statistics. Here you can find not only the data sets provided by agencies across the Federal Government but also tools and applications for handling and processing the data.
- The World Health Organization (WHO) provides data and analyses on global health priorities, including health and disease statistics. Each page is dedicated to a specific topic and provides information on the global situation and trend highlights. The information includes core indicators, database views, major publications, and links to relevant web pages on the topic.
- data.gov includes more than 197,000 data sets from across the Federal Government, covering health, public safety, and science & research, among other topics. The source provides “data, tools, and resources to conduct research, develop web and mobile applications, design data visualizations” for the purpose of improving the health and lives of all Americans.
- The Human Mortality Database (HMD) provides detailed mortality and population data for about 40 countries. The database is open to researchers, students, journalists, policy analysts, and others interested in the history of human longevity.
- Data and Tools of the National Center for Health Statistics include public-use data files and documentation, and restricted data. The site also provides data tools and data analysis aids, as well as data visualization for the general public, survey participants, researchers, and students.
- openFDA, launched by the U.S. Food and Drug Administration, allows developers to access public FDA data through open APIs, provides raw data downloads, and offers documentation and examples. The available data include reports on adverse drug events, such as side effects, product use errors, product quality problems, and therapeutic failures. You can also find data on drug product labeling.
- The Big Cities Health Inventory Data Platform by the Big Cities Health Coalition is an open data platform that allows users to access and analyze health data from 26 cities, for 34 health indicators, and across six demographic indicators. The latest version of the platform features over 17,000 data points across 28 large cities, allowing users to examine a number of pressing health issues impacting urban communities across the country.
- Medicare.gov offers a wealth of downloadable databases. These include 2018 Drug and Health Plan Data, 2018 Medigap Data, Hospital Compare, and much more.
- The US Census Bureau is another leading source of quality data about the people, businesses, and economy of the United States. It provides demographic data at the state, city, and even zip code level.
- The National Cancer Institute provides data sets on cancer incidence segmented by age, race, gender, year, and other factors. Additional data sets include Standard Population Data, U.S. Mortality Data, and U.S. Population Data. The platform also provides statistical software.
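For API-based sources such as openFDA, a query is essentially a URL with parameters. The sketch below builds a request for the drug adverse event endpoint and, optionally, downloads the matching reports. The helper names, the example `search` expression, and the `limit` value are assumptions made for illustration; consult the openFDA documentation for the searchable fields and the usage limits that apply.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OPENFDA_BASE = "https://api.fda.gov"


def build_openfda_url(endpoint, search, limit=5):
    """Build a query URL for an openFDA endpoint such as 'drug/event'."""
    query = urlencode({"search": search, "limit": limit})
    return f"{OPENFDA_BASE}/{endpoint}.json?{query}"


def fetch_results(url, timeout=10):
    """Download a query and return its 'results' list (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response).get("results", [])


# Example: adverse event reports that mention aspirin as a drug.
url = build_openfda_url(
    "drug/event", 'patient.drug.medicinalproduct:"aspirin"', limit=3
)
print(url)
# reports = fetch_results(url)  # uncomment to call the live API
```

Keeping URL construction separate from the network call makes the query easy to inspect and log before any data is downloaded.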
Data mining is very promising for the healthcare industry: it can identify the most useful data sources and show how to use them most efficiently without compromising patient safety. Your facility can use data mining and analytics to answer the questions you already have and to identify inefficiencies and best practices that can improve care and reduce costs for your healthcare system.
Take advantage of your data now to get the most value out of it. By involving IT professionals experienced in managing and leveraging healthcare data, you can deliver the solutions necessary to make informed decisions in the state-of-the-art healthcare business environment.