Unstructured Data in Healthcare

The future of healthcare records is electronic. The amount of data processed in the healthcare industry is growing at an exponential rate, driven by advances in technology. There are numerous sources of data, such as hospital IT systems containing clinical, financial, administrative, and pharmacy data, and of course electronic health records (EHR), which US laws and standards now push healthcare organizations to adopt.


The digital transformation of the healthcare industry is driven by big data: technologies for processing large data sets collected from multiple traditional and digital sources, both inside and outside a facility, for ongoing discovery and analysis. Big data encompasses both structured and unstructured data.


What Is Discrete Data in Healthcare?

Today healthcare has more data than ever before. This should allow healthcare professionals to make more informed decisions and to offer the best patient care. Yet doing meaningful work with the new data seems to take even more time than before. Why? Because much of the new data is unstructured rather than discrete, i.e. reportable and measurable data that is collected discretely and stored in a database table at the lowest level of granularity.
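
To make the distinction concrete, here is a minimal sketch contrasting discrete data with a narrative note. The `VitalSigns` schema, patient IDs, readings, and the 140 mmHg cutoff are all hypothetical, chosen only for illustration. Discrete fields can be aggregated directly, while the same kind of information in free text has to be extracted before it can be queried.

```python
from dataclasses import dataclass
from statistics import mean

# Discrete data: each value lives in its own typed field (hypothetical schema).
@dataclass
class VitalSigns:
    patient_id: str
    systolic_mmhg: int
    diastolic_mmhg: int

readings = [
    VitalSigns("P001", 142, 91),
    VitalSigns("P002", 118, 76),
    VitalSigns("P003", 135, 88),
]

# Unstructured data: a similar measurement buried in narrative text.
nursing_note = "Pt resting comfortably. BP 142/91 this AM, c/o mild headache."

def mean_systolic(rows):
    """Aggregate directly over the discrete field; no text parsing needed."""
    return mean(r.systolic_mmhg for r in rows)

def count_hypertensive(rows, threshold=140):
    """Count readings at or above a systolic threshold (illustrative cutoff)."""
    return sum(1 for r in rows if r.systolic_mmhg >= threshold)

if __name__ == "__main__":
    print(mean_systolic(readings))
    print(count_hypertensive(readings))
    # nursing_note cannot be queried like this until its values are extracted.
```

The structured list answers "how many readings were high?" in one line; answering the same question from `nursing_note` requires the extraction techniques discussed later in this article.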


Some studies by IT industry giants show that nearly 80% of the data contained in the clinical care documents in the U.S. (amounting to 1.2 billion documents annually) is unstructured. Unstructured data includes narrative text such as nursing notes, scanned documents, images, videos, etc. The problem with unstructured data is that this data cannot be easily organized and analyzed using standard, predefined structures.


Of course, the healthcare industry has already implemented initiatives to help with tracking and reporting, but the process is still in its infancy. The American Recovery and Reinvestment Act (ARRA), enacted in 2009, aimed in part at modernizing the US healthcare infrastructure. One of its measures is the Health Information Technology for Economic and Clinical Health (HITECH) Act, which supports the adoption of electronic health records (EHR) and the meaningful use of this technology.


According to the Centers for Disease Control and Prevention (CDC, one of the major operating components of the Department of Health and Human Services), the concept of meaningful use is based on five pillars of health outcome policy priorities:

1.    Improve quality, safety, efficiency, and reduce health disparities

2.    Engage patients and families in their health

3.    Improve care coordination

4.    Improve population and public health

5.    Ensure adequate privacy and security protection for personal health information


EHR systems can help create a clearer picture of a patient’s medical history. However, these systems primarily operate with structured data. As already mentioned, a large proportion of healthcare data is still unstructured, so it is necessary to find ways to blend structured and unstructured data. Doing so extracts maximum value from the data, facilitates its meaningful use, and yields insights into patient outcomes that can increase treatment effectiveness and patient satisfaction.
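
One simple way to see the value of blending is a gap check: compare the coded problem list in the EHR with conditions mentioned in clinical notes. The sketch below assumes the notes have already been converted into (code, negated) pairs, for example by an NLP step; the patient IDs, codes, and data structures are hypothetical and illustrative only.

```python
# Hypothetical structured problem list from the EHR (ICD-10-CM codes).
problem_list = {"P001": {"I10"}, "P002": set()}

# Hypothetical concepts already extracted from each patient's notes,
# as (code, negated) pairs.
note_findings = {
    "P001": [("I10", False), ("E11.9", False)],
    "P002": [("R07.9", True)],  # a negated mention, e.g. "denies chest pain"
}

def undocumented_conditions(problems, findings_by_patient):
    """Return codes affirmed in notes but absent from the coded problem list."""
    gaps = {}
    for patient_id, findings in findings_by_patient.items():
        # Keep only affirmed (non-negated) mentions from the narrative text.
        affirmed = {code for code, negated in findings if not negated}
        missing = affirmed - problems.get(patient_id, set())
        if missing:
            gaps[patient_id] = sorted(missing)
    return gaps

if __name__ == "__main__":
    print(undocumented_conditions(problem_list, note_findings))
```

Here patient `P001` has a condition documented only in free text, which neither data source would reveal on its own.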


Unstructured Data-Driven Problems

Healthcare professionals face many problems with unstructured data that must be solved to make healthcare delivery more effective while using less time and money. Let’s look a little deeper at the challenges unstructured data presents for healthcare and IT specialists.


Much of the data is collected and stored in multiple places and formats. Healthcare data comes from different source systems, healthcare facility departments, and even patients who use wearable devices (like monitors and blood pressure sensors) to track their vital signs. To access and use this data, a single, central system, such as an enterprise data warehouse (EDW), is necessary. Healthcare data is also stored in different formats (text, numeric, digital, images, videos, multimedia, and even paper), and this data must be integrated to be used effectively.
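
As a rough illustration of what integration involves, the sketch below maps records from two hypothetical source systems, a lab system and a wearable device feed, onto one common schema and groups them per patient in time order. All field names, IDs, and values are invented for the example, and it assumes lab timestamps are recorded in UTC and the wearable reports heart rate in beats per minute. A real EDW load would also handle validation, deduplication, and standard terminologies.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical rows from two source systems with different field names,
# ID casing, and timestamp formats.
lab_system_rows = [
    {"MRN": "P001", "test": "HbA1c", "value": "7.2", "unit": "%",
     "collected": "2023-03-14 08:30"},
]
wearable_rows = [
    {"patient": "p001", "metric": "heart_rate", "reading": 78, "ts": 1678784400},
]

def from_lab(row):
    """Map a lab-system row onto the common schema (assumes UTC timestamps)."""
    collected = datetime.strptime(row["collected"], "%Y-%m-%d %H:%M")
    return {
        "patient_id": row["MRN"].upper(),
        "source": "lab",
        "observation": row["test"],
        "value": float(row["value"]),
        "unit": row["unit"],
        "observed_at": collected.replace(tzinfo=timezone.utc),
    }

def from_wearable(row):
    """Map a wearable row (epoch seconds) onto the common schema."""
    return {
        "patient_id": row["patient"].upper(),
        "source": "wearable",
        "observation": row["metric"],
        "value": float(row["reading"]),
        "unit": "bpm",  # assumed unit for the hypothetical heart_rate metric
        "observed_at": datetime.fromtimestamp(row["ts"], tz=timezone.utc),
    }

def integrate(lab_rows, device_rows):
    """Build a per-patient, time-ordered view across both sources."""
    store = defaultdict(list)
    for row in lab_rows:
        record = from_lab(row)
        store[record["patient_id"]].append(record)
    for row in device_rows:
        record = from_wearable(row)
        store[record["patient_id"]].append(record)
    for observations in store.values():
        observations.sort(key=lambda r: r["observed_at"])
    return dict(store)
```

Normalizing the patient identifier and timestamps is what lets records from separate systems line up under a single patient.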


There is still no standardized data capture process. Though electronic medical records provide a platform for consistent data capture, the industry still documents clinical facts and findings in whatever ways are most convenient for care providers. As a result, much of the data is difficult to aggregate and analyze in a consistent and meaningful manner. Frameworks such as the Healthcare Analytics Adoption Model can help organizations provide better data for analytics.


Complex data makes it difficult to create a one-size-fits-all approach. To give a complete and clear picture of a patient’s history, clinical data from multiple sources is necessary. To manage the data from numerous systems and applications, a very sophisticated set of tools is needed.


Regulatory requirements are changing and may differ across the country. Regulatory requirements such as HIPAA, along with interoperability standards such as HL7, are still evolving. Here the advantages of structured data are clear: the more unified a facility’s medical data, the easier it is to comply with reporting requirements, both in terms of data consistency and privacy during data exchange.


So, it’s quite clear that understanding the patient’s medical history and the ability to view and analyze health records is an absolute must to make effective clinical decisions and to deliver the best possible care. But what are the ways to blend unstructured and structured big data into smart data to get a single, comprehensive picture?


Unstructured Data Solutions to Cope with the Challenges


A Healthcare Information and Management Systems Society (HIMSS) report says that before the era of big data, unstructured data could only be viewed and read by healthcare providers, but now it can be processed with new technologies, including NoSQL, natural language processing (NLP), and machine learning.


Today new data tools and cheaper storage are available, so it is the right time to adopt new strategies and technologies for handling structured and unstructured data in the healthcare industry. State-of-the-art technology facilitates data preparation, cleaning, normalization, and processing.


Cloud storage makes data storage more affordable for hospitals and other care providers. Large storage capacity allows the aggregation and integration of greater amounts of structured and unstructured data.


Technologies that address unstructured data include Optical Character Recognition (OCR), Natural Language Processing (NLP), machine learning, and deep learning, to name a few. These technologies can extract codes and meaning from unstructured data, which can then feed analytics and support clinical decisions.


Optical Character Recognition (OCR), the electronic conversion of images of handwritten or printed text into machine-encoded text, can ease the transition to electronic medical documents and help digitize old records that exist only on paper. Most EMR systems can scan documents directly into the patient file and allow TWAIN scanners to connect directly to the system. OCR solutions can read a scanned document and extract its data directly into the hospital EMR/EHR system. Good OCR solutions tend to be expensive, but the investment is reasonable if the facility handles a large volume of data entry.
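
After the OCR engine has turned a scanned form into text, that text still has to be mapped into discrete fields. The sketch below skips the OCR step itself and starts from recognized text; the form layout, field labels, and sample values are hypothetical, and simple regular expressions stand in for the more robust template matching a commercial product would use. Fields that cannot be found are flagged for manual review instead of being guessed.

```python
import re

# Hypothetical OCR output for a scanned intake form.
ocr_text = """
INTAKE FORM
Patient Name: Jane Doe
DOB: 04/12/1967
MRN: 00482913
Allergies: Penicillin
"""

# Assumed label patterns for this hypothetical form layout.
FIELD_PATTERNS = {
    "name": r"Patient Name:\s*(.+)",
    "dob": r"DOB:\s*(\d{2}/\d{2}/\d{4})",
    "mrn": r"MRN:\s*(\d+)",
    "allergies": r"Allergies:\s*(.+)",
}

def extract_fields(text):
    """Pull each labeled value out of OCR text; None when not found."""
    fields = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)  # '.' stops at end of line
        fields[key] = match.group(1).strip() if match else None
    return fields

def needs_review(fields):
    """List fields the parser could not fill, for a human to check."""
    return [key for key, value in fields.items() if value is None]

if __name__ == "__main__":
    parsed = extract_fields(ocr_text)
    print(parsed)
    print(needs_review(parsed))
```

Routing incomplete results to a reviewer keeps OCR errors from silently entering the patient record.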


As defined by healthitanalytics.com, Natural Language Processing (NLP) is a technology “using computer algorithms to identify key elements in everyday language and extract meaning from unstructured spoken or written input”. That is, NLP can make electronic health records more complete and accurate by translating free unstructured text (widely used in doctors’ notes) into discrete data fields, identifying the key concepts and their relationships in healthcare documentation. The extracted data is normalized to industry-standard vocabularies, which in healthcare are often called “ontologies”. The most widely used are ICD-9, ICD-10, and SNOMED. This technology can be a source of meaningful information for healthcare analytics.
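
The core idea, finding concepts in free text and normalizing them to standard codes, can be sketched with a tiny dictionary lookup. This is a deliberately simplified stand-in for real clinical NLP: the three-entry lexicon and its ICD-10-CM codes are illustrative only, and the negation check is a crude rule loosely modeled on the NegEx idea (a cue word earlier in the same sentence), not a production algorithm.

```python
import re

# Toy lexicon: surface phrase -> ICD-10-CM code (illustrative only; real
# systems use full terminologies with many synonyms and abbreviations).
LEXICON = {
    "type 2 diabetes": "E11.9",  # Type 2 diabetes mellitus without complications
    "hypertension": "I10",       # Essential (primary) hypertension
    "chest pain": "R07.9",       # Chest pain, unspecified
}

# Simplified negation cues, matched as whole words.
NEGATION_RE = re.compile(r"\b(?:no|denies|negative for)\b")

def extract_concepts(note):
    """Return lexicon matches per sentence, flagging negated mentions."""
    findings = []
    for sentence in re.split(r"[.;\n]", note.lower()):
        for phrase, code in LEXICON.items():
            match = re.search(r"\b" + re.escape(phrase) + r"\b", sentence)
            if match is None:
                continue
            # Treat the mention as negated if a cue appears before it
            # in the same sentence.
            negated = NEGATION_RE.search(sentence[: match.start()]) is not None
            findings.append({"concept": phrase, "code": code, "negated": negated})
    return findings

if __name__ == "__main__":
    note = "Patient has type 2 diabetes and hypertension. Denies chest pain."
    for finding in extract_concepts(note):
        print(finding)
```

The output is a list of discrete, coded findings that can be stored alongside structured EHR data; handling negation matters because “denies chest pain” should not be recorded as chest pain.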


Machine learning and deep learning can benefit the healthcare industry through data synchronization, analysis, and innovation. These technologies can support better and smarter disease identification and diagnosis, as well as clinical trials, radiology and radiotherapy, and electronic health records.


The goal of healthcare facilities now is to identify the most useful data sources and to use them most effectively. It is time to take advantage of modern technologies to blend structured and unstructured clinical data, make it more digestible for analytics, and thus provide lower-cost, better-quality care across the healthcare system. It is time to contact IT professionals and develop an agile approach for managing and leveraging healthcare data to keep pace with the modern healthcare business environment.