Modelling of cancer patient records: Challenges for big data analytics

Jing Lu

Research output: Chapter in Book/Report/Published conference proceeding > Conference contribution > peer-review


The healthcare industry has many established systems used for electronic patient records, hospital administration, resource management and the circulation of clinical results. There is a growing need to be able to share massive amounts of health data, perform complex analysis and visualise lifeline tracks of patients. One of the latest approaches is through the implementation of a digital strategy at local and/or national levels. Within the University Hospital Southampton (UHS) clinical data environment, the Southampton Breast Cancer Data System (SBCDS) has been developed as a "proof of concept" system. This has motivated a collaborative research project between UHS and Southampton Solent University with the following objectives: enhancement of the SBCDS user interface; expansion of its data mining capability; and exploitation of large-scale patient databases. Data from over 16,000 patients has been pre-processed and several data mining techniques have been implemented to discover frequent patterns from disease event sequence profiles. A conceptual architecture for Health (Big) Data Analytics has recently been proposed which gives a perspective on how emerging database technologies can be used to provide additional value from available health information such as electronic patient records. It concerns not only the integration of technologies for data warehousing, OLAP analysis and data mining, but also the integration of data from various sources and how to share and connect the resources. This opens up the opportunity for quantitative modelling and visualisation on a wider scale to further inform decision making by clinicians. The UHS-Lifelines system provides a conceptual model for the time-structured presentation of all key data for any patient or any chronic condition on a single computer screen. 
In particular, a Cancer-Lifetrak timeline has been developed to highlight the month of onset of key episodes of breast cancer progression, such as diagnosis, local recurrence and metastasis. It also permits measurement of time intervals between episodes and the correlation of these intervals with pathology and treatments. One challenge of the Lifetrak representation is the overloading of the graphical interface when many events are concentrated in a relatively short period. A practical graphical user interface approach is thus needed to handle this situation, so that the overriding story told by the data is not lost or corrupted. A sample dataset has been extracted for cluster analysis which includes the range of clinical data often associated with cancer patients, e.g. a mixture of pathology, radiology and clinical documentation events. The full set of records for all 16,000 breast cancer patients exceeds a million rows of event data and presents a bigger challenge for data management and modelling. This paper will showcase the potential for Big Data technologies to offer enhanced visualisation in this context.
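As a purely illustrative aid, the measurement of time intervals between episodes at month resolution could be sketched as follows. This is not the Cancer-Lifetrak implementation: the function names, the data layout and the example dates are all assumptions made for the sketch, and only the episode labels come from the abstract.

```python
from datetime import date

# Hypothetical episode timeline for one patient: (episode label, month of onset).
# The labels follow the episode types named in the abstract; the dates are invented.
timeline = [
    ("diagnosis", date(2005, 3, 1)),
    ("local recurrence", date(2007, 9, 1)),
    ("metastasis", date(2010, 1, 1)),
]


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def episode_intervals(episodes):
    """Return (from_label, to_label, months) for each consecutive pair of episodes."""
    ordered = sorted(episodes, key=lambda e: e[1])  # order by month of onset
    return [
        (a[0], b[0], months_between(a[1], b[1]))
        for a, b in zip(ordered, ordered[1:])
    ]


intervals = episode_intervals(timeline)
for src, dst, months in intervals:
    print(f"{src} -> {dst}: {months} months")
```

Intervals in this form could then be tabulated against pathology or treatment fields to explore the correlations mentioned above.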
Original language: English
Title of host publication: IMA Conference on Quantitative Modelling in Management of Health & Social Care
Publication status: Published - 1 Mar 2016
Externally published: Yes

