Project Summary / Abstract: Bladder cancer is the sixth most common cancer in the United States, and non-muscle invasive bladder cancer (NMIBC) accounts for 75-80% of all cases. Tumor recurrence and progression are common among NMIBC patients: over 50% of patients have their tumors recur, most within the first year, and up to 45% of high-risk tumors progress to muscle-invasive disease within 5 years. Patients therefore undergo intensive clinical surveillance and treatment, which contributes to bladder cancer being the most expensive cancer to treat on a per-patient basis. Large population-based studies have been limited in their ability to study tumor recurrence and progression because these key outcomes are not typically captured in cancer registries or other discretely coded data. To overcome this limitation and facilitate future epidemiologic and outcomes studies on NMIBC, we propose to develop and validate automated algorithms that use natural language processing (NLP) to capture bladder cancer recurrence (Aim 1) and progression (Aim 2) from free-text pathology, urology, and imaging notes. We will externally validate the accuracy of the algorithms for extracting tumor characteristics using a national sample of 575 patients from the Veterans Affairs (VA) healthcare system (Aim 3). NLP is a powerful tool that works by segmenting notes into units of related text (e.g., sentences) and applying computational methods to determine meaning and extract data. We will use a novel, internally developed NLP tool that integrates the best components of several open-source NLP packages to efficiently develop, refine, and validate the proposed algorithms. Kaiser Permanente Southern California (KPSC) is an ideal study setting because of its large, diverse population, advanced electronic health record, high-quality cancer registry, and complete capture of care. The initial NLP algorithms will be created based on clinical input and chart reviews of a sample of medical records.
The algorithms will first be developed using diagnostic reports, leveraging validated cancer registry data on 6,000 patients; this is possible because the same clinical procedures are used for initial diagnosis as for recurrence and progression. The algorithms will then be applied to surveillance reports and iteratively refined based on false-positive and false-negative results relative to study chart reviews (n=100 for each iteration). The final algorithms will be compared to an expert reference standard provided by two urologic oncologists and a pathologist in a sample of 200 patients. Algorithm performance will be assessed by sensitivity, specificity, positive predictive value, and negative predictive value. The final algorithms will be applied to 4,000 patients aged >18 years who were newly diagnosed with NMIBC within KPSC from 2008 to 2017. The frequency of recurrence and progression will be described, and characteristics of patients with and without the outcomes will be compared. Successful completion of the study aims will produce novel, automated methods that will facilitate large epidemiologic and outcomes studies, whose results may improve care for NMIBC patients.
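
As an illustration of the four performance measures named above, the following is a minimal sketch of how algorithm output could be scored against a reference standard. It is not the study's actual tooling: the names `ConfusionCounts`, `tally`, and `performance` are hypothetical, and the labels in the example are made up for demonstration, not study data.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ConfusionCounts:
    """Counts of agreement between algorithm output and the reference standard."""
    tp: int = 0  # algorithm positive, reference positive
    fp: int = 0  # algorithm positive, reference negative
    fn: int = 0  # algorithm negative, reference positive
    tn: int = 0  # algorithm negative, reference negative


def tally(predicted: Sequence[bool], reference: Sequence[bool]) -> ConfusionCounts:
    """Cross-tabulate per-patient algorithm labels against reference labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    counts = ConfusionCounts()
    for pred, ref in zip(predicted, reference):
        if pred and ref:
            counts.tp += 1
        elif pred and not ref:
            counts.fp += 1
        elif ref:
            counts.fn += 1
        else:
            counts.tn += 1
    return counts


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def performance(c: ConfusionCounts) -> Dict[str, float]:
    """Compute sensitivity, specificity, PPV, and NPV from the counts."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true positives among reference positives
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negatives among reference negatives
        "ppv": _ratio(c.tp, c.tp + c.fp),          # true positives among algorithm positives
        "npv": _ratio(c.tn, c.tn + c.fn),          # true negatives among algorithm negatives
    }


if __name__ == "__main__":
    # Made-up labels for eight hypothetical patients (True = recurrence present).
    reference = [True, True, True, False, False, False, False, True]
    predicted = [True, True, False, False, False, True, False, True]
    for name, value in performance(tally(predicted, reference)).items():
        print(f"{name}: {value:.2f}")
```

The same cross-tabulation could be produced in each refinement iteration, where the false-positive and false-negative cases are the ones that would be reviewed to revise the algorithm.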