Knowledge Discovery Lab

Improving Precision Medicine through Lifelong Machine Learning

To date, lifelong/never-ending learning research has typically been limited either to narrowly focused tasks (e.g., facial recognition) or to larger tasks that humans already do well (e.g., reading). Deep phenotyping is a large, complex task that humans do not completely understand; as such, it represents a new type of problem for this class of learners. Furthermore, most phenotypes are developed using a time-consuming manual process, and most efforts to use machine learning to develop phenotypes still rely on manually labeled cases and chart reviews, with only very limited efforts seeking to eliminate manual processing. We hypothesize that lifelong machine learning can bring an unparalleled degree of autonomy to deep phenotyping.

Specifically, this project proposes a collaboration with researchers from the Medical University of South Carolina (MUSC) to augment their open-source deep phenotyping platform (Clinical3PO) with lifelong learning. MUSC researchers, led by Dr. Lewis Frey, developed Clinical3PO to address the volume and variety of clinical data associated with deep phenotyping. Dr. Frey has already begun collaborating with us toward this goal. This is a step toward our long-term goal of a semi-autonomous precision medicine support system that transforms longitudinal electronic health data into a continuously improving knowledge base that includes phenotypic information.

This material is based upon work supported by the Office of Research at Tennessee Tech University.

Pattern Learning and Anomaly Detection across Multiple Data Streams

Effectively handling disparate data sources in today’s highly connected “big data” world involves addressing three issues: speed, storage, and knowledge discovery. One potential way of addressing the first two issues is to represent the data as streams, whereby data is processed in near real-time and only relevant information is stored persistently. For the third issue of knowledge discovery, representing these data streams as graphs has proven effective for uncovering patterns and anomalies. However, graph-based approaches to knowledge discovery, including pattern learning and anomaly detection, have been problematic, particularly when analyzing large amounts of data or data that is represented as a stream. In addition, a better understanding of the patterns and anomalies associated with a person, place, or activity cannot be realized through a single graph stream. For instance, in social media, one can discover interesting patterns of behavior about an individual through a single account, but better insight into their overall behavior is realized by examining all of their social media actions simultaneously. Thus, while graphs are a logical choice for representing such data, many challenges remain in designing scalable graph mining algorithms that can operate in real-time on multiple, heterogeneous graph streams.

In this project, we will investigate a new framework capable of scalable knowledge discovery in multiple graph streams. Our objective is to develop an approach that fuses multiple graph streams into one, and to evaluate the performance of pattern learning and anomaly detection on the fused stream in terms of both scalability and accuracy. We will consider several techniques for data fusion, ranging from known entity resolution across streams to pattern-based fusion, where entities are resolved based on the similarity of their relationships across streams. The aim is not only to show that known patterns and anomalies in individual streams can still be discovered efficiently, but also that new patterns and anomalies consisting of information from multiple streams can be identified. We will use both artificial and real-world multi-stream graph data to evaluate our methods.
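
As a rough illustration of the two fusion styles mentioned above, the following Python sketch merges two small edge lists. It is not the project's framework: the function names, the Jaccard similarity measure, the 0.5 threshold, and the toy account names are all assumptions made for this example, and it works on static batches rather than live streams. Known entity resolution is modeled as a given alias table; pattern-based fusion merges a node from the second stream into the node from the first stream whose set of relationships is most similar.

```python
from collections import defaultdict


def neighborhood(edges, alias):
    """Map each node to its set of (relation, neighbor) pairs.

    Node names are canonicalized through `alias` first, so entities that
    are already resolved look identical in both streams.
    """
    nbrs = defaultdict(set)
    for src, rel, dst in edges:
        s, d = alias.get(src, src), alias.get(dst, dst)
        nbrs[s].add((rel, d))
        nbrs[d].add((rel + "^-1", s))  # record the inverse direction too
    return nbrs


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuse(stream_a, stream_b, known, threshold=0.5):
    """Fuse two edge lists into one set of canonical edges.

    `known` holds entity matches that are given up front (hypothetical
    input). Remaining nodes of stream_b are matched to stream_a nodes by
    relationship similarity; `threshold` is an arbitrary illustrative cutoff.
    """
    alias = dict(known)
    na = neighborhood(stream_a, alias)
    nb = neighborhood(stream_b, alias)
    only_a = sorted(set(na) - set(nb))
    only_b = sorted(set(nb) - set(na))
    for b_node in only_b:
        best, best_score = None, 0.0
        for a_node in only_a:
            score = jaccard(na[a_node], nb[b_node])
            if score > best_score:
                best, best_score = a_node, score
        if best is not None and best_score >= threshold:
            alias[b_node] = best  # pattern-based match
    fused = {
        (alias.get(s, s), rel, alias.get(d, d))
        for s, rel, d in list(stream_a) + list(stream_b)
    }
    return alias, sorted(fused)


# Toy data: two accounts that may belong to the same person.
stream_a = [
    ("alice_a", "follows", "bob"),
    ("alice_a", "posts_about", "hiking"),
    ("alice_a", "lives_in", "cookeville"),
]
stream_b = [
    ("asmith_b", "follows", "bob_b"),
    ("asmith_b", "posts_about", "hiking"),
    ("asmith_b", "likes", "jazz"),
]
alias, fused = fuse(stream_a, stream_b, known={"bob_b": "bob"})
for edge in fused:
    print(edge)
```

In this toy case the two accounts share two of four distinct relationships, so they are merged, and the fused graph attaches the "likes jazz" edge from the second stream to the same entity as the "lives_in" edge from the first. A streaming version would need to update neighborhoods and matches incrementally, which this sketch does not attempt.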

This material is based upon work supported by the TTU Office of Research.