CS548 Fall 2017: Sequence Mining by: Emma Clavet, Nan Hu, Shivangi


CS548 Fall 2017: Sequence Mining
by: Emma Clavet, Nan Hu, Shivangi Pandey, Tesia Shizume, Xiaojun Wang
Showcasing work by: Aileen P. Wright, Adam T. Wright, Allison B. McCoy, Dean F. Sittig on “The Use of Sequential Pattern Mining to Predict Next Prescribed Medications”

References
[1]. Wright, A., Wright, A., McCoy, A. and Sittig, D. (2015). The use of sequential pattern mining to predict next prescribed medications. Journal of Biomedical Informatics, 53, pp. 73-80. Figures on slides 7, 9, 13, 14, 15, 16, 17, 19 were taken from this paper.
[2]. Srikant R, Agrawal R. Mining sequential patterns: generalizations and performance improvements. Springer; 1996.
[3]. “ Almaden Institute - IBM, 25-Jul-2016. [Online]. Available: http://researcher.watson.ibm.com/researcher/view_group.php?id=4260. [Accessed: 29-Nov-2017].
[4]. Blue Cross Blue Shield of Texas. (2017). Special Enrollment. [online] Available at: https://www.bcbstx.com/ [Accessed 30 Nov. 2017].
[5]. Zaki MJ. SPADE: an efficient algorithm for mining frequent sequences. Mach Learn 2001;42(1–2):31–60. Figures on slides 10, 11, 12 were taken from this paper.
[6]. D. Nathan, J. Buse, M. Davidson, E. Ferrannini, R. Holman, R. Sherwin and B. Zinman. (2009). Medical Management of Hyperglycemia in Type 2 Diabetes: A Consensus Algorithm for the Initiation and Adjustment of Therapy. Diabetes Care, 32, pp. 193-203. Figure 2 on slide 15 was taken from this paper and modified to use only drug classes.

Introduction
What is Sequential Pattern Mining?
  A data mining technique used to identify patterns of ordered events [3] within a database
  Original applications were in the retail industry
How does Sequential Pattern Mining apply to this paper?
  This paper uses sequential pattern mining to infer relationships between medications for diabetes patients
  The transition from paper to electronic medical records has helped accumulate large amounts of clinical data

Background
Stepwise Pharmacological Therapy:
  Recommends a treatment algorithm (clinical decision support system) according to the progression of the disease
  Common for progressive conditions like diabetes
  Relies on an up-to-date knowledge base
Sequential Pattern Mining:
  Automated development of a knowledge base of temporal relationships between medications, which could be used to guide clinical decision support based around drug regimen changes

Hypothesis
“Sequential Pattern Mining is an effective technique to identify temporal relationships between medications and generate rules that predict which diabetes medication is prescribed next for a patient.”
Research explored two questions:
  1. Is sequential pattern mining useful for predicting changes to a patient’s diabetes medications?
  2. How useful is the patient’s history of prior medication changes for predicting the next medication?

Dataset
Description:
  Claims data from Blue Cross Blue Shield of Texas, for medications between 2008 and 2011
  Included patients on at least one diabetes medication
  Split (claims records from 2008-2011): 90% training set (n = 145,936), 10% test set (n = 16,011)
Limitations:
  Diabetes medication that a patient started before 2008 is not captured
  This dataset only spans 3 years, and the progression of Type II diabetes can take decades

Data Collection & Preprocessing
World Health Organization Anatomical Therapeutic Chemical (ATC) drug classes
E.g. drug classes (Biguanide, Sulfonylurea) for generic drugs (Metformin, Glipizide, Glyburide)
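To make the two levels concrete, here is a minimal sketch of how a generic-drug sequence could be turned into a drug-class sequence. The three generics and two classes are the ones named on the slide; the helper name `to_class_sequence` and the choice to collapse consecutive repeats of the same class are assumptions made for illustration, not the authors' preprocessing code.

```python
from itertools import groupby

# Illustrative mapping only: the generics and classes are those on the slide.
GENERIC_TO_CLASS = {
    "Metformin": "Biguanide",
    "Glipizide": "Sulfonylurea",
    "Glyburide": "Sulfonylurea",
}

def to_class_sequence(generic_sequence):
    """Map each generic drug to its class, then collapse consecutive repeats.

    Collapsing repeats is an assumption of this sketch: switching between two
    drugs of the same class is treated as no change at the class level.
    """
    classes = [GENERIC_TO_CLASS[generic] for generic in generic_sequence]
    return [drug_class for drug_class, _ in groupby(classes)]

class_sequence = to_class_sequence(["Metformin", "Glipizide", "Glyburide"])
print(class_sequence)  # ['Biguanide', 'Sulfonylurea']
```

Running the same mining step once on the generic sequences and once on the class sequences would give the two experiments described on the next slide.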

Process Two experiments: Generic Drug Level & Drug Class Level


SPADE (Sequential Pattern Discovery using Equivalence Classes) Algorithm
Transforms the horizontal database layout into vertical “id-lists”
Minimizes the number of database scans required
cSPADE incorporates constraints on sequences, such as the length or time span of a sequence
Used R package: ‘arulesSequences’
Support counting over the id-lists of the most frequent items (1-sequences): S(A) = 4, S(B) = 4, S(D) = 2, S(F) = 4
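The vertical layout and support counting can be sketched in a few lines of Python. The toy database below is hypothetical, chosen so that the counts come out as S(A) = 4, S(B) = 4, S(D) = 2, S(F) = 4; the names `HORIZONTAL_DB`, `to_vertical`, `support` and the threshold `MIN_SUPPORT = 2` are assumptions of this sketch, and it is not the `arulesSequences` implementation.

```python
from collections import defaultdict

# Hypothetical toy database, horizontal layout: sid -> list of (eid, items).
HORIZONTAL_DB = {
    1: [(10, {"C", "D"}), (15, {"A", "B", "C"}), (20, {"A", "B", "F"}), (25, {"A", "C", "D", "F"})],
    2: [(15, {"A", "B", "F"}), (20, {"E"})],
    3: [(10, {"A", "B", "F"})],
    4: [(10, {"D", "G", "H"}), (20, {"B", "F"}), (25, {"A", "G", "H"})],
}

def to_vertical(db):
    """Build one id-list per item: the sorted (sid, eid) pairs where it occurs."""
    id_lists = defaultdict(list)
    for sid, events in db.items():
        for eid, items in events:
            for item in items:
                id_lists[item].append((sid, eid))
    return {item: sorted(pairs) for item, pairs in id_lists.items()}

def support(id_list):
    """Support is the number of distinct sequences (sids) in the id-list."""
    return len({sid for sid, _ in id_list})

ID_LISTS = to_vertical(HORIZONTAL_DB)
MIN_SUPPORT = 2  # assumed threshold for this example
frequent_items = {
    item: support(pairs)
    for item, pairs in sorted(ID_LISTS.items())
    if support(pairs) >= MIN_SUPPORT
}
print(frequent_items)  # {'A': 4, 'B': 4, 'D': 2, 'F': 4}
```

Once the id-lists exist, support for an item is read directly from its own list, without rescanning the whole database.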

SPADE (Sequential Pattern Discovery using Equivalence Classes) Algorithm
Equivalence classes
  Two sequences are in the same class if they share a common n-length prefix
Rules of temporal id-list join
  Event atom with event atom: PA & PF give PAF
  Event atom with sequence atom: PF & P→A give PF→A
  Sequence atom with sequence atom: P→A & P→F give P→AF, P→A→F, P→F→A
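The join rules can be illustrated with two small functions over id-lists of (sid, eid) pairs. This is a simplified sketch, assuming an empty prefix P and hypothetical id-lists for atoms A and F; `event_join` and `sequence_join` are names invented here, not functions from any SPADE library.

```python
def event_join(left, right):
    """Id-list where both items occur in the same event (same sid and eid)."""
    return sorted(set(left) & set(right))

def sequence_join(first, then):
    """Id-list where `then` occurs strictly after some `first` in the same sid."""
    earliest = {}
    for sid, eid in first:
        earliest[sid] = min(eid, earliest.get(sid, eid))
    return sorted(
        (sid, eid) for sid, eid in set(then)
        if sid in earliest and eid > earliest[sid]
    )

def support(id_list):
    """Number of distinct sequences (sids) that contain the pattern."""
    return len({sid for sid, _ in id_list})

# Hypothetical id-lists for two atoms A and F (empty prefix P).
A = [(1, 15), (1, 20), (1, 25), (2, 15), (3, 10), (4, 25)]
F = [(1, 20), (1, 25), (2, 15), (3, 10), (4, 20)]

joined = {
    "AF": event_join(A, F),       # A and F in the same event
    "A->F": sequence_join(A, F),  # F after A
    "F->A": sequence_join(F, A),  # A after F
}
supports = {name: support(id_list) for name, id_list in joined.items()}
print(supports)  # {'AF': 3, 'A->F': 1, 'F->A': 2}
```

Each joined id-list keeps the (sid, eid) of the last item, so it can be joined again in the same way to grow longer candidate sequences.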

SPADE (Sequential Pattern Discovery using Equivalence Classes) Algorithm
Sequence atom with sequence atom: P→A & P→F give P→A→F, P→F→A, P→AF
Each equivalence class can be processed independently, which decreases the memory requirement

Evaluation of Results Fig. 2. Evaluation example. Mined patterns from training data are transformed into rules which are used to predict the next drug, given base stems derived from patient sequences in the test set.
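The evaluation idea in the caption can be sketched as follows: each mined pattern is read as a rule "stem, then next drug", and candidates for a given stem are ranked by pattern support. The patterns and support counts below are made up for illustration, and `predict_next`, `hit_within` and the `top_k` parameter are hypothetical names; the paper's exact ranking and matching procedure may differ.

```python
# Hypothetical mined patterns with made-up support counts (illustration only).
PATTERNS = {
    ("Biguanide", "Sulfonylurea"): 900,
    ("Biguanide", "Insulin"): 400,
    ("Biguanide", "Sulfonylurea", "Insulin"): 300,
    ("Biguanide", "Sulfonylurea", "DPP-4 inhibitor"): 120,
    ("Sulfonylurea", "Insulin"): 350,
}

def predict_next(stem, patterns, top_k=3):
    """Rank candidate next drugs for a stem by the support of stem + candidate."""
    stem = tuple(stem)
    candidates = {
        pattern[-1]: count
        for pattern, count in patterns.items()
        if len(pattern) == len(stem) + 1 and pattern[:-1] == stem
    }
    # Highest support first; ties broken alphabetically for determinism.
    ranked = sorted(candidates, key=lambda drug: (-candidates[drug], drug))
    return ranked[:top_k]

def hit_within(stem, actual_next, patterns, top_k=3):
    """True if the drug actually prescribed next is among the top_k guesses."""
    return actual_next in predict_next(stem, patterns, top_k)

predictions = predict_next(["Biguanide", "Sulfonylurea"], PATTERNS)
print(predictions)  # ['Insulin', 'DPP-4 inhibitor']
```

Applying `hit_within` to every base stem from the test set and averaging the results would give a "correct within k guesses" percentage of the kind reported in the evaluation slides.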

Evaluation of Results - Dataset statistics
The top patterns of length 2–6 items:
  87% (13/15) begin with a Biguanide
  47% (7/15) have a Sulfonylurea as the second item
  67% (10/15) end with Insulin
Consistent with the recommendations made by the American Diabetes Association

Visualization of top 2-item sequential patterns
Figure 2. Algorithm for the metabolic management of type 2 diabetes
Fig. 3. Digraph of diabetes medications. The most frequent 2-item sequences are shown. Differences in support are represented by edge thickness. For clarity, only the direction between nodes with the highest support is shown; reverse directions with lesser support are suppressed.

Evaluation at drug class level & generic drug level
Rules not reaching 5 attempts: Meglitinide, Biguanide, Sulfonylurea, DPP-4 inhibitor
Attempt 1: Meglitinide, Biguanide, Sulfonylurea, DPP-4 inhibitor
Attempt 2: Biguanide, Sulfonylurea, DPP-4 inhibitor
Attempt 3: Sulfonylurea, DPP-4 inhibitor

Evaluation at drug class level & generic drug level

Evaluation -- 10-fold Cross validation Under 10-fold cross validation at the drug class level, the percentage of patients with a correct prediction made within 3 attempts
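As a rough illustration of the 10-fold procedure, the sketch below splits patient identifiers into ten folds and yields a train/test pair for each. The function name `k_fold_splits`, the fixed seed and the 100 dummy patient ids are assumptions for this example; mining and scoring would be run on each pair and the per-fold percentages averaged.

```python
import random

def k_fold_splits(patient_ids, k=10, seed=0):
    """Yield (train, test) lists; each patient lands in exactly one test fold."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        yield train, test

# 100 dummy patient ids, split into 10 folds of 10 patients each.
splits = list(k_fold_splits(range(100), k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 90 10
```

Splitting by patient rather than by individual claim keeps all of a patient's medication sequence on one side of the split.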

Medication prediction use case
(Diagram: Physician, Patient, Pharmacy)

Conclusions
Sequential pattern mining is a useful data mining technique for identifying temporal relationships between medications.
The temporal relationships are useful for making predictions about which medication a prescriber is likely to choose next when treating a progressive disease, such as diabetes.
Future work: optimize the use of sequential pattern mining to detect temporal relationships among items in medical records and improve patient care.

Questions?
