Event Archives

2018 Events

Tuesday, January 16, 2018, Chanelle J. Howe, Brown University School of Public Health

“Causal Mediation Analyses When Studying HIV Racial/Ethnic and Other Health Disparities”

Reducing HIV racial/ethnic and other health disparities in the United States is a high priority. Reductions in HIV racial/ethnic and other health disparities can potentially be achieved by intervening on important intermediates. Causal mediation analysis techniques can be used to identify important intermediates of HIV racial/ethnic and other health disparities as well as estimate the impact of intervening on such intermediates when certain conditions are met. Using racial disparities in HIV virologic suppression as an example, this talk will: (1) describe a conceptual framework for studying HIV racial/ethnic and other health disparities; (2) review causal mediation analysis techniques that can be used for the aforementioned identification and estimation; (3) discuss why studies of HIV racial/ethnic and other health disparities can be particularly vulnerable to selection bias and detail potential approaches that can be used to minimize such selection bias; and (4) emphasize the importance of “good data” when performing causal mediation analyses in this setting.

2017 Events

Tuesday, December 5, 2017, Ming (Daniel) Shao, University of Massachusetts Dartmouth

“Low-Rank Transfer Learning and Its Applications”

For knowledge-based machine learning algorithms, labels or tags are critical for training a discriminative model. However, labeling data is not an easy task, because the data are often too costly to obtain or too expensive to hand-label. For that reason, researchers use labeled yet relevant data from different databases to facilitate the learning process. This is exactly the subject of transfer learning, which studies how to transfer knowledge gained from an existing, well-established dataset (the source) to a new problem (the target). To this end, we propose a method that aligns the structure of the source and target data in a learned subspace by minimizing the reconstruction error, called low-rank transfer subspace learning (LTSL). The basic assumption is that if each datum in a specific neighborhood of the target domain can be reconstructed by the same neighborhood in the source domain, then the source and target data may have similar distributions. The benefits of this method are two-fold: (1) generality across subspace learning methods, and (2) robustness through the low-rank constraint. Extensive experiments on face recognition, kin relationship understanding, and object recognition demonstrate the effectiveness of our method. We will also discuss the potential of using low-rank modeling for other transfer learning problems, including clustering, dictionary learning, and zero-shot learning.

Tuesday, November 7, 2017, Yue Wu, Northeastern University

“Low-shot Learning and Large Scale Face Recognition”

Automatic face recognition in visual media is essential for many real-world applications, e.g., face verification, automatic photo library management, and biomedical analysis, along with many security applications. In this talk, I will first introduce the large-scale face recognition problem, which suffers from class and data explosion. Recognizing people at large scale naturally raises the low-shot learning problem, since many people have only a limited number of images available for training. This talk also covers how to solve the low-shot learning problem. Along this line, some interesting applications, e.g., image-based kinship recognition, are also illustrated.

Yue Wu is a second-year Ph.D. candidate in the Department of Electrical & Computer Engineering at Northeastern University, supervised by Professor Yun Raymond Fu. He received his B.S. degree in Electronic Information Engineering and his M.S. degree in Information and Communication Engineering from Beijing University of Posts and Telecommunications, China. He was a research intern at the France Telecom-Orange Lab in Beijing during 2013-2014. His research focuses on face recognition, object recognition, and deep learning, with the goal of making computers better understand faces and objects.

Tuesday, October 3, 2017, Brandon Marshall, PhD

"Responding to The Opioid Overdose Crisis:  Insights from Rhode Island"

In this seminar, participants will learn about the state of the opioid overdose epidemic nationally and in Rhode Island. Prof. Brandon Marshall will discuss the state’s strategic plan to reduce overdose deaths, and will highlight the role that epidemiologists play in responding to the nation’s top public health crisis.

Brandon Marshall, PhD is an Associate Professor of Epidemiology at the Brown University School of Public Health. His research interests focus on infectious disease epidemiology, substance use, and the social, environmental, and structural determinants of health of vulnerable populations. He has published more than 125 scientific publications, including articles in JAMA, BMJ, and The Lancet. He works closely with the Rhode Island Department of Health on the state’s overdose epidemic efforts and directs www.PreventOverdoseRI.org, a CDC-funded statewide online surveillance system. He also chairs the Rhode Island Overdose Data Working Group and serves as an expert advisor to the Governor’s Overdose Prevention and Intervention Task Force.

Tuesday, September 19, 2017, Zhigang Li, PhD, Geisel School of Medicine at Dartmouth 

"A Semiparametric Joint Model for Terminal Trend of Quality of Life and Survival in Palliative Care Research”

Dr. Li’s research interests include developing statistical modeling tools in the field of molecular epidemiology to analyze human microbiome and epigenetic changes in order to identify microbes and DNA methylation that mediate disease-leading causal pathways in children’s health research. He is also interested in developing joint modeling approaches to model longitudinal quality of life and survival data in palliative care research to answer important questions in this area. A broad range of modeling tools are involved in his research such as lasso-type regularization, SCAD regularization, mediation analysis, structural equation modeling, GEE, mixed models, Cox model, etc.

Tuesday, September 5, 2017, Jiang Gui, PhD, Geisel School of Medicine at Dartmouth

“Efficient Survival Multifactor Dimensionality Reduction Method for Detecting Gene-Gene Interaction”

The problem of identifying SNP-SNP interactions in case-control studies has been studied extensively, and a number of new techniques have been developed. Little progress has been made, however, in the analysis of SNP-SNP interactions in relation to censored survival data. We present an extension of the two-class multifactor dimensionality reduction (MDR) algorithm that enables detection and characterization of epistatic SNP-SNP interactions in the context of survival outcomes. The proposed Efficient Survival MDR (ES-MDR) method handles censored data by modifying MDR’s constructive induction algorithm to use the log-rank test.
We applied ES-MDR to genetic data on over 470,000 SNPs from the OncoArray Consortium, using onset age of lung cancer and case-control status (n = 27,312) as the survival outcome and adjusting for subjects’ smoking status. We first used PLINK to generate a pruned subset of SNPs in approximate linkage equilibrium with each other, and then ran ES-MDR to exhaustively search over all one-way and two-way interaction models. We identified chr17_41196821_INDEL_T_D in the BRCA1 gene and rs11692723_C in the LOC102467079 gene as the top SNP-SNP interaction associated with lung cancer onset age.
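The log-rank comparison that ES-MDR substitutes for MDR's case-control counts can be sketched in a few lines. The following is a minimal two-group log-rank statistic; the survival data and the genotype-derived grouping are hypothetical, not the OncoArray data:

```python
import numpy as np

def logrank_z(time, event, group):
    """Two-group log-rank statistic: observed-minus-expected events in
    group 1 at each distinct event time, standardized by the
    hypergeometric variance. Approximately N(0,1) under the null."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=int)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):          # distinct event times
        at_risk = time >= t
        n = at_risk.sum()                     # total subjects at risk
        n1 = (at_risk & (group == 1)).sum()   # at risk in group 1
        d = (event & (time == t)).sum()       # events at time t
        d1 = (event & (time == t) & (group == 1)).sum()
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e / np.sqrt(var)

# Hypothetical data: follow-up times, event indicators (1 = event,
# 0 = censored), and a genotype-derived high/low-risk grouping.
time  = [2, 3, 3, 4, 5, 5, 6, 7, 8, 9]
event = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
group = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
z = logrank_z(time, event, group)
```

In ES-MDR, a statistic like this would score each candidate genotype partition in place of MDR's usual case/count ratio.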

Jiang Gui received his PhD in statistics from the University of California, Davis. His research has involved the development of statistical and computational methods for relating high-dimensional microarray gene expression data to censored survival data. He is also interested in identifying gene-gene and gene-environment interactions using machine learning algorithms.

2016 Events

Tuesday, June 7, 2016, Richard Palumbo, PhD, Post-doctoral Researcher, Computational Behavioral Science Lab, Northeastern University

"Interpersonal Autonomic Physiology:  Methods for Quantifying Physiological Synchrony in Dyads"

Interpersonal autonomic physiology (IAP) is the study of dynamic interactions between people’s autonomic nervous systems. Findings from this line of research indicate that physiological activity (e.g., heart rates) between two or more people can become associated or interdependent, often referred to as physiological synchrony. Physiological synchrony has been found in both new and established relationships across a range of contexts, and correlates with psychosocial constructs including empathy, attachment, teamwork, and emotional contagion. Given these findings, interpersonal physiological interactions are theorized to be ubiquitous social processes underlying observable behavior. However, analysis of this type of multivariate, intensive longitudinal data can be complex: issues including nonstationarity, autocorrelation, and nonlinearity are problematic for many standard analyses. This presentation will review some of the main questions, potentials, and challenges of IAP research and analysis. Basic components of a selection of viable methods will be covered, including idiographic and nomothetic approaches.
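One simple index of physiological synchrony is a sliding-window correlation between two partners' physiological series. The sketch below uses simulated heart-rate data (a shared slow rhythm plus individual noise); real analyses must also contend with the nonstationarity and autocorrelation issues noted above:

```python
import numpy as np

def windowed_synchrony(x, y, win):
    """Pearson correlation between two physiological time series in
    sliding windows -- a simple index of momentary dyadic synchrony."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.array([np.corrcoef(x[i:i + win], y[i:i + win])[0, 1]
                     for i in range(len(x) - win + 1)])

# Simulated heart-rate series (beats per minute) for a dyad sharing a
# slow common rhythm, each with independent measurement noise.
rng = np.random.default_rng(1)
t = np.arange(200)
partner_a = 70 + 5 * np.sin(t / 15) + rng.normal(0, 1, t.size)
partner_b = 72 + 5 * np.sin(t / 15) + rng.normal(0, 1, t.size)
sync = windowed_synchrony(partner_a, partner_b, win=30)   # one r per window
```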

Tuesday, May 17, 2016, Professor Sridhar Mahadevan, Director, Autonomous Learning Laboratory, College of Information and Computer Sciences, University of Massachusetts, Amherst

“New Directions in the Autonomous Discovery of Representations”

The Autonomous Learning Lab (ALL) is one of the oldest machine learning laboratories in the US and has graduated 35 PhD students in its three-decade history, many of whom are now distinguished researchers in the field. In its first two decades, research in the ALL focused on the autonomous learning of behavior, developing the basic theoretical framework used widely to study behavior acquisition throughout science and engineering, from dopamine models of reward learning in the brain to Google DeepMind's recent demonstration of AlphaGo, which beat the human Go world champion. More recently, ALL research has focused on the autonomous discovery of representations from experience. In this talk, I'll survey a number of lab projects, including a new four-year NSF-funded effort to develop new architectures and algorithms for deep learning, as well as NASA- and NSF-funded efforts to analyze spectroscopic data from Curiosity, the Mars rover.

Tuesday, May 3, 2016, Yu Cao, PhD, Assistant Professor of Computer Science, The University of Massachusetts Lowell

“Deep Learning and Digital Health”

Recently, a new set of machine learning algorithms named “deep learning”, which aims at learning multiple levels of representation and abstraction that help infer knowledge from data such as images, videos, audio, and text, has been making astonishing gains in machine vision, speech recognition, natural language processing, etc. The impact of deep learning is far-reaching, with applications in medical, social, and commercial domains. In this talk, I will introduce our recent results in the field of deep learning with applications to digital health, such as biomedical sensor informatics for scalable behavioral activity profiling and medical imaging informatics for large-scale mining and classification.

Tuesday, April 19, 2016, W. Bruce Croft, PhD, Distinguished Professor and Interim Dean, College of Information and Computer Sciences, University of Massachusetts, Amherst

“Finding Answers instead of Documents in Web Search”

Moving “beyond the ten blue links” has become a popular topic in discussions about web search. The emphasis on finding answers to questions rather than lists of documents becomes even more important in situations where the interaction bandwidth is limited, such as searching using small screens or voice-based search. In this talk, I will review the history, approaches, techniques, and challenges involved in designing search engines that retrieve answers. In particular, I will focus on recent results using “deep learning” or neural net models compared to more traditional probabilistic models and discuss the relative merits of these two approaches.

Tuesday, April 5, 2016, Li Zhou, MD, PhD, Assistant Professor, Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Harvard Medical School; Senior Medical Informatician, Clinical Informatics, Partners eCare, Partners Healthcare

“Natural Language Processing and Speech Recognition - Applications to Improve the Quality and Safety of Healthcare”

A large amount of valuable clinical information is stored in free-text, such as admission notes, discharge summaries, and clinic visit notes. In this talk I will introduce the use of advanced natural language processing (NLP) technology to enhance health data analytics, decision support and the quality of clinical documents, and focus on the following topics: 1) the MTERMS NLP system for processing free-text EHR data; 2) Applications built upon MTERMS for improving the quality and safety of care, including “NotesLink” (a web application for medication management) and a risk stratification model for predicting hospital readmissions; and 3) the quality of dictated clinical notes generated by speech recognition software.

Tuesday, March 15, 2016, Haibo He, Ph.D., P.E., Robert Haas Endowed Chair Professor, University of Rhode Island

“Imbalanced Learning in Big Data”

In this talk, Dr. He will start with an overview of the nature and foundations of imbalanced learning, then focus on state-of-the-art methods and technologies for dealing with imbalanced data, followed by a systematic discussion of the assessment metrics used to evaluate learning performance in the imbalanced learning scenario. He will also present the latest methods that his group has developed and tested on various imbalanced data sets. Finally, as a relatively new challenge to the community, Dr. He will highlight the major opportunities and challenges, as well as potentially important research directions, for learning from imbalanced data in the big data era.

Tuesday, February 16, 2016, Anthony P. Nunes, Ph.D., Epidemiologist, Optum Life Sciences, Waltham, MA


Information captured in provider notes adds critical insights to the clinical patient narrative that are not otherwise found in structured data. Simple text searches of clinical notes are insufficient due to the complex language and grammar of medical free text, which varies by attributes of provider, patient, facility, type of service, and location within the note. For efficiency and validity purposes, there is a need to rely on natural language processing (NLP) approaches that utilize the full domain of available EHR data while also enabling targeted development and validation of specific concepts of interest. This talk will address concepts of NLP with applied examples in binge eating disorder and hypoglycemia, including presentation of an epidemiological study of the association between hypoglycemia severity and cardiovascular disease.

Tuesday, February 2, 2016, Susan C. Miller, PhD, Professor of Health Services, Policy & Practice, Brown University School of Public Health


Dr. Miller’s research aims to improve nursing home care, particularly near the end of life. She will briefly present findings from early studies examining the benefits of Medicare hospice care in nursing homes, discuss the barriers to timely hospice access, and then present current research evaluating the value of specialty palliative care consults in nursing homes. During this presentation Dr. Miller will share the challenges of conducting this palliative care research, including the difficulty of ascertaining nursing home consult recipients and the need to address potential endogeneity.

Tuesday, January 12, 2016, Rosa Rodriguez-Monguio, PhD, Associate Professor, Health Policy and Management, University of Massachusetts, Amherst

“Exploring the Association between Federal and State Policies Addressing Opioid Misuse, Abuse and Diversion and the Utilization of Opioid Prescription Drugs”

The US is experiencing an epidemic of misuse, abuse and diversion of opioid prescription drugs leading to unprecedented numbers of overdoses and deaths, and substantial healthcare and social costs. Regulation and policy initiatives intended to curb the opioid epidemic have been implemented at the Federal and state levels. In December 2010, the Commonwealth of Massachusetts launched the Massachusetts Online Prescription Monitoring Program (MA PMP) to support safe prescribing and dispensing and to address misuse, abuse and diversion of prescription opioid drugs included in the Drug Enforcement Administration (DEA) Schedules II-V. In July 2012, the Food and Drug Administration (FDA) introduced new safety measures recommending that the use of extended-release opioid formulations be reserved for patients for whom alternative treatment options (e.g., non-opioid analgesics or immediate-release opioids) are ineffective, not tolerated, or inadequate to provide sufficient management of pain. The effect of the MA PMP and the FDA policies on opioid prescription drug use has not been evaluated.


2015 Events

November 18, 2015, James Scanlan, Attorney at Law, Washington, DC

"The Mismeasure of Health Disparities in Massachusetts and Less Affluent Places"

Problems exist in the measurement of health and healthcare disparities arising from the failure to consider ways that standard measures of differences between outcome rates tend to be systematically affected by the prevalence of an outcome. Special attention will be given to Massachusetts in two respects. One involves the fact that relative demographic differences in adverse outcomes tend to be comparatively large, while relative differences in the corresponding favorable outcomes tend to be comparatively small, in geographic areas where adverse outcomes are comparatively uncommon. The second involves anomalies in the Massachusetts Medicaid pay-for-performance program arising from the use of a measure of healthcare disparities that is a function of absolute differences between rates.
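The first phenomenon can be reproduced with simple arithmetic (the rates below are illustrative, not actual Massachusetts figures):

```python
def relative_differences(rate_a, rate_b):
    """Relative differences between two groups' outcome rates, for the
    adverse outcome and for the corresponding favorable outcome."""
    adverse = rate_a / rate_b                # relative difference, adverse outcome
    favorable = (1 - rate_b) / (1 - rate_a)  # relative difference, favorable outcome
    return adverse, favorable

# Area where the adverse outcome is rare: group rates 2% vs 1%
low_adv, low_fav = relative_differences(0.02, 0.01)
# Area where it is common: group rates 40% vs 20%
high_adv, high_fav = relative_differences(0.40, 0.20)

# The relative difference in the adverse outcome is 2.0 in both areas,
# yet the relative difference in the favorable outcome is tiny where the
# adverse outcome is rare (~1.01) and large where it is common (~1.33).
```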

Tuesday, November 3, 2015, Steven D. Pizer, PhD, Associate Professor of Health Economics, Department of Pharmacy and Health Systems Sciences, Northeastern University

“Big Data, Causal Inference, and Instrumental Variables”

“I plan to discuss the potential and risk related to analysis of big data in health services and clinical research, conduct a brief tutorial on causal inference in observational studies using instrumental variables, and then present some of my own work that puts these methods to use. The specific application uses prescribing pattern instrumental variables to study the comparative effectiveness of alternative 2nd-line medications for type 2 diabetes.”
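The core logic of an instrumental variable analysis can be illustrated with a minimal simulation. Everything below is hypothetical: a binary instrument stands in for the prescribing-pattern instruments mentioned above, and the Wald ratio shown is the simplest form of two-stage least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

u = rng.normal(size=n)           # unobserved confounder
z = rng.integers(0, 2, size=n)   # instrument, e.g. a provider's prescribing preference

# Treatment choice depends on both the instrument and the confounder.
treat = (0.5 * z + 0.5 * u + rng.normal(size=n) > 0).astype(float)

# Outcome: the true causal effect of treatment is 2.0; u confounds.
y = 2.0 * treat + u + rng.normal(size=n)

# Naive regression of y on treat is biased upward by the confounder.
naive = np.cov(treat, y)[0, 1] / np.var(treat, ddof=1)

# Wald/2SLS estimator: reduced-form effect of z on y divided by the
# first-stage effect of z on treat; consistent despite the confounding.
beta_iv = (y[z == 1].mean() - y[z == 0].mean()) / (
    treat[z == 1].mean() - treat[z == 0].mean())
```

The instrument works here because z shifts treatment but, by construction, affects the outcome only through treatment and is independent of u, which are exactly the assumptions an applied IV analysis must defend.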

Tuesday, October 20, 2015, Finale Doshi-Velez, PhD, Assistant Professor in Computer Science, Harvard University

“Data-Driven Phenotype Trajectories in Autism Spectrum Disorders”

Autism Spectrum Disorder (ASD) is an extremely heterogeneous developmental disorder that affects nearly one in fifty children today.  Understanding the heterogeneity of ASD is critical to discovering distinct etiologies and guiding treatment.  The proliferation of electronic health records (EHRs) has made it possible for data-driven disease subtyping in a variety of disorders, such as chronic kidney disease and diabetes.  However, especially in developmental disorders, the presentation of the disease is inextricably linked to the development of the child: these stages do not necessarily mark disease progression but rather disease evolution.  In these cases, it makes sense to think about pathophenotypes as phenotype trajectories that evolve over time.  In this talk, I will describe several approaches for deriving phenotype trajectories from EHR, which resulted in the discovery of novel ASD phenotypes.

Tuesday, October 6, 2015, Norma Terrin, PhD, Professor, Tufts University School of Medicine

“Joint Models for Predicting Clinical Outcomes from Quality of Life Data”

Our objective was to test whether longitudinally measured health-related quality of life (HRQL) predicts transplant-related mortality in pediatric hematopoietic stem cell transplant (HSCT). A standard analysis (Cox model with a time-varying covariate) would not have been adequate because it ignores measurement error in the covariate, missing data in the covariate, correlation between measurement error and survival, and the endogeneity of the covariate. Instead we used a joint model, which analyzes both the longitudinal and time-to-event variables as outcomes. Specifically, we used a shared parameter model with other causes of mortality as a competing risk. The trajectories for each HRQL domain were modeled by random spline functions. The survival submodels were adjusted for baseline patient, family, and transplant characteristics and the longitudinal submodels were run with and without adjustment. We found that HRQL trajectories were predictive of transplant-related mortality in pediatric HSCT, even after adjusting the survival outcome for baseline characteristics. Unadjusted trajectories were better predictors than adjusted trajectories. 

Tuesday, September 15, 2015, Hoifung Poon, PhD, Researcher, Microsoft Research

“Machine Reading for Cancer Panomics”

Advances in sequencing technology have made available a plethora of panomics data for cancer research, yet the search for disease genes and drug targets remains a formidable challenge. Biological knowledge such as pathways can play an important role in this quest by constraining the search space and boosting the signal-to-noise ratio. The majority of knowledge resides in text such as journal articles, which has been undergoing its own explosive growth, making it mandatory to develop machine reading methods for automating knowledge extraction. In this talk, I will formulate the machine reading task for pathway extraction, review the state of the art and open challenges, and present our Literome project and our latest attack on the problem, based on grounded semantic parsing.

Tuesday, September 1, 2015, Roee Gutman, PhD, Department of Biostatistics, Brown University

“Robust Estimation of Causal Effects with Application to Erythropoiesis-stimulating Agents for End-stage Renal Disease”

This talk will focus on the proposal of an outcome-free three-stage procedure to estimate causal effects from non-randomized studies. First, we create subclasses that include observations from each group based on the covariates. Next, we independently estimate the response surface in each group using a flexible spline model. Lastly, multiple imputations of the missing potential outcomes are performed. A simulation analysis that resembles real-life situations and compares this procedure to other common methods was carried out. Relative to other methods, and in many of the experimental conditions examined, our proposed method produced a valid statistical procedure while providing a relatively precise point estimate and a relatively short interval estimate. We will demonstrate an extension of this procedure to estimate the effects of erythropoiesis-stimulating agents (ESAs) for end-stage renal disease patients undergoing hemodialysis.

May 19, 2015, Matthias Steinrücken, PhD, Assistant Professor of Biostatistics, Department of Biostatistics and Epidemiology, UMass Amherst

"Detecting Tracts of Local Ancestry in Genomic Sequence Data of Modern Humans"

The complex demographic history of modern humans has had a substantial impact on the genetic variation we observe today. Due to the process of chromosomal recombination the genomes of contemporary individuals can be mosaics comprised of different DNA segments originating from diverged subpopulations. This is of particular interest when studying variation related to genetic diseases. On the one hand, one has to account for neutral background variation resulting from the demographic history, but on the other hand, knowledge about the distribution of these ancestry segments can also be used to identify causal variants.

In this talk, I present a new method to detect tracts of local ancestry in genomic sequence data of modern humans, and demonstrate its accuracy and efficiency on simulated data. Explicitly modeling the underlying demographic history allows detection under very general scenarios. I will discuss extensions of the method and potential applications using the local ancestry information to foster the detection of functional genetic variation. The distribution of these tracts can also be used to infer features of the demographic history.

May 5, 2015, Yizhou Sun, PhD, Assistant Professor, College of Computer and Information Science, Northeastern University

"Mining Information Networks by Modeling Heterogenous Link Types"

Real-world physical and abstract data objects are interconnected, forming gigantic networks. By structuring these data objects and the interactions between them into multiple types, such networks become semi-structured heterogeneous information networks. Most real-world applications that handle big data, including interconnected social media and social networks; scientific, engineering, or medical information systems; online e-commerce systems; and most database systems, can be structured into heterogeneous information networks. Unlike homogeneous information networks, where objects and links are treated as being of a single type or as untyped nodes and links, heterogeneous information networks in our model are semi-structured and typed, following a network schema. We then propose different methodologies for mining heterogeneous information networks by carefully modeling links of different types. In this talk, I will introduce three recently developed techniques, (1) meta-path-based mining, (2) relation strength-aware mining, and (3) semantic-aware relation modeling, and their applications, such as similarity search, clustering, information diffusion, and voting prediction.

April 21, 2015, Scott Evans, PhD, Senior Research Scientist, Department of Biostatistics, Harvard School of Public Health

"Battling Superbugs with Statistical Thinking: Using Endpoints to Analyze Patients Rather than Patients to Analyze Endpoints"

Superbugs, “nightmare bacteria” that have become resistant to our most potent antibiotics, are one of our most serious health threats. In the United States, at least 2 million people annually acquire serious bacterial infections that are resistant to the antibiotics originally designed to treat those infections. Resistance undermines our ability to fight infectious diseases and presents increased risk to vulnerable patient populations, including those with HIV, cancer, or renal failure, as well as patients requiring surgery, neonatal care, and intensive care. In September 2014, President Obama issued an Executive Order outlining a national strategy for combating antibiotic-resistant bacteria.

April 7, 2015, Nicholas Reich, PhD, Assistant Professor, Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, Amherst

"Statistical Challenges in Real-Time Infectious Disease Forecasting"

Epidemics of communicable diseases place a huge burden on public health infrastructures across the world. Advanced warnings of increases in disease incidence can, in many cases, help public health authorities allocate resources more effectively and mitigate the impact of epidemics. However, scientists and public health officials face many obstacles in trying to create accurate real-time forecasts of infectious disease incidence. Challenges range from the logistical (what data do you need and when is it available), to the statistical (what are the best methods for training and validating a forecasting model), to the scientific (what are the best models of disease transmission). In collaboration with the Thai Ministry of Public Health, we have developed a real-time forecasting model for dengue hemorrhagic fever in the 79 provinces of Thailand. Dengue is a mosquito-borne virus that annually infects over 400 million people worldwide. In this talk we will present results from our ongoing real-time forecasting efforts in Thailand while discussing the frameworks we have developed to address the challenges of this project.

March 17, 2015, Michael McGeachie, PhD, Instructor, Channing Division of Network Medicine, Harvard Medical School

"Longitudinal Microbiome Prediction with Dynamic Bayesian Networks"

High-resolution DNA sequencing allows high-resolution quantitative assessments of microbiome bacteria populations, and emerging evidence suggests that differences or aberrations in the microbiome can lead to various diseases and chronic conditions. Dynamic Bayesian Networks (DBNs) have been used in other settings to successfully model time series data and obtain accurate predictions of future behavior, as well as to identify salient connections and relationships within the data. In this work, we show that a DBN model of the infant gut microbiota ecology captures explicit relationships observed previously, including a relationship between age and clostridia, and between clostridia, gammaproteobacteria, and bacilli. DBN models are further useful for identifying rare, dramatic, sudden shifts in microbiome populations (“abruptions”) observed in some infants, and for providing quantitative likelihood estimates for these events. We will further discuss the differences between iterative and sequential prediction of infant gut microbiome composition, and the DBN’s usefulness for predicting response to perturbations and unusual initial conditions.

March 3, 2015, Matthew Fox, DSc, MPH, Associate Professor, Center for Global Health & Development, Department of Epidemiology, Boston University

"Quantitative Bias Analysis:  The Case of the NO-SHOTS trial"

It is well understood that both systematic and random error can impact the results of epidemiologic research; however, while random error is nearly always quantified, systematic error rarely is. Systematic error is typically relegated to the discussion sections of manuscripts, despite the fact that simple methods to quantify the impact of sources of bias have existed for years. This talk will demonstrate simple methods of quantitative bias analysis for misclassification problems and use the example of the NO-SHOTS randomized trial to demonstrate how these methods can be effective at exploring both the magnitude and direction of bias.
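One such bias-analysis calculation back-corrects an observed 2x2 table for nondifferential outcome misclassification under assumed sensitivity and specificity. The counts and classification parameters below are illustrative only, not NO-SHOTS data:

```python
def corrected_cases(a_obs, n, se, sp):
    """Back-calculate the true number of cases from an observed count,
    given assumed classification sensitivity (se) and specificity (sp).
    Solves: observed = true * se + (n - true) * (1 - sp)."""
    return (a_obs - n * (1 - sp)) / (se + sp - 1)

# Observed counts: exposed and unexposed groups (hypothetical)
a1, n1 = 150, 1000    # exposed: observed cases, group size
a0, n0 = 100, 1000    # unexposed

se, sp = 0.85, 0.95   # assumed, nondifferential classification parameters
t1 = corrected_cases(a1, n1, se, sp)   # 125 true cases
t0 = corrected_cases(a0, n0, se, sp)   # 62.5 true cases

rr_observed = (a1 / n1) / (a0 / n0)    # 1.5
rr_corrected = (t1 / n1) / (t0 / n0)   # 2.0
```

As the example shows, nondifferential misclassification biased the observed risk ratio toward the null (1.5 observed versus 2.0 corrected); a full bias analysis would vary se and sp over plausible ranges.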

February 17, 2015, Xiangnan Kong, PhD, Assistant Professor, Computer Science Department, Worcester Polytechnic Institute

"Towards Taming Big Data Variety: From Social Networks to Brain Networks"

Over the past decade, we have been experiencing big data challenges in various research domains. The data nowadays involve an increasing number of data types that need to be handled differently from conventional data records, and an increasing number of data sources that need to be fused together. Taming data variety issues is essential to many research fields, such as biomedical research, social computing, neuroscience, and business intelligence. Data variety issues are difficult to solve because the data usually have complex structures, involve many different types of information, and come from multiple data sources. In this talk, I'll briefly introduce the big data landscape and present two projects that help us better understand how to solve data variety issues in different domains. The first project addresses the challenge of integrating multiple data sources in the context of social network research. Specifically, I will describe a network alignment method that exploits heterogeneous information to align user accounts across different social networks. The second project addresses the challenge of analyzing complex data types in the context of brain network research. I will model functional brain networks as uncertain graphs and describe a subgraph mining approach to extract important linkage patterns from these uncertain graphs. I'll also introduce future work in this direction and explain some possibilities for upcoming evolutions in big data research.

January 20, 2015, Hao Wu, PhD, Assistant Professor, Department of Psychology, Boston College

“A Nonparametric Bayesian Item Response Model for Monotonic Selection Effect” 

In various practical settings of educational and psychological measurement, individuals may be selected according to their ability levels before being measured. In such cases, understanding the selection process can shed light on unexpected issues in the administration of the measurement or on important features of the group being measured. Given this importance, we explore the potential selection process in this research. Specifically, we build a nonparametric Bayesian model to account for a monotonic selection effect in item response theory (IRT), where individuals with higher ability are more likely to be measured. Simulation results show that this model is able to identify and recover the selection effect in the population.

2014 Events

 December 2, 2014:  Lorenzo Trippa, PhD, Assistant Professor, Harvard School of Public Health

“Bayesian Nonparametric Cross-Study Validation Prediction Methods”

We consider comparisons of statistical learning algorithms using multiple datasets, via leave-one-in cross-study validation: each algorithm is trained on one dataset, and the resulting model is then validated on each remaining dataset. This poses two statistical challenges that need to be addressed simultaneously. The first is the assessment of study heterogeneity, with the aim of identifying subsets of studies within which algorithm comparisons can be reliably carried out. The second is the comparison of algorithms using the ensemble of datasets. We address both problems by integrating clustering and model comparison. We formulate a Bayesian model for the array of cross-study validation statistics, which defines clusters of studies with similar properties and provides the basis for meaningful algorithm comparison in the presence of study heterogeneity. We illustrate our approach through simulations involving studies with varying severity of systematic errors, and in the context of medical prognosis for patients diagnosed with cancer, using high-throughput measurements of the transcriptional activity of the tumor's genes.
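The array of cross-study validation statistics the model is built on can be sketched as a simple train-on-one, test-on-the-rest loop. Everything here is a toy stand-in: the "training" rule (a mean threshold), the accuracy measure, and the two miniature studies are invented for illustration.

```python
# Leave-one-in cross-study validation: train on each study alone and
# validate on every other study, producing a matrix of statistics.

def cross_study_matrix(studies, train, evaluate):
    """C[i][j] = statistic of the model trained on study i, tested on
    study j (diagonal left as None: no self-validation)."""
    models = [train(d) for d in studies]
    return [[evaluate(m, d) if i != j else None
             for j, d in enumerate(studies)]
            for i, m in enumerate(models)]

# Toy studies: (score, label) pairs.
studies = [
    [(0.2, 0), (0.4, 0), (0.8, 1), (0.9, 1)],
    [(0.1, 0), (0.3, 0), (0.7, 1), (0.6, 1)],
]

def train(d):                       # "model" = mean score as a threshold
    return sum(x for x, _ in d) / len(d)

def evaluate(t, d):                 # classification accuracy at threshold t
    return sum((x > t) == bool(y) for x, y in d) / len(d)

print(cross_study_matrix(studies, train, evaluate))
# [[None, 1.0], [1.0, None]]
```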

November 4, 2014:  Todd MacKenzie, PhD, Associate Professor, Dartmouth College

“Causal Hazard Ratio Estimation Using Instrumental Variables or Principal Strata”

Estimation of treatment effects is a primary goal of statistics in medicine. Estimates from observational studies are subject to selection bias, while estimates from non-observational (i.e. randomized) studies are subject to bias due to non-compliance. In observational studies confounding by unmeasured confounders cannot be overcome by regression adjustment, conditioning on propensity scores or inverse weighted propensities. The method of instrumental variables (IVs) can overcome bias due to unmeasured confounding. In the first part of this talk a method for using IVs to estimate hazard ratios is proposed and evaluated. In the second part of this talk the approach of principal strata for deriving treatment effects for randomized studies subject to all-or-nothing compliance is reviewed and an estimate of the complier hazard ratio is proposed and evaluated. 

October 21, 2014:  Wei Ding, PhD, Associate Professor, UMass Boston

“Data Mining with Big Data”

Big Data concerns large-volume, complex, growing data sets with multiple, autonomous sources. In this talk, I will give an overview of our recent machine learning and data mining results in feature selection, distance metric learning, and least squares-based optimization, with applications to NASA mission data analysis, extreme weather prediction, and physical activity analysis for childhood obesity.

October 7, 2014:  Tam Nguyen, PhD, Assistant Professor, Boston College, Connell School of Nursing

“Application of Item Response Theory in the Development of Patient Reported Outcome Measures: An Overview”

The growing emphasis on patient-centered care has accelerated the demand for high-quality data from patient-reported outcome measures (e.g., quality of life, depression, physical functioning).  Traditionally, the development and validation of these measures has been guided by Classical Test Theory.  However, Item Response Theory, an alternative measurement framework, offers promise for addressing practical measurement problems found in health-related research that have been difficult to solve through Classical methods.  This talk will introduce foundational concepts in Item Response Theory, as well as commonly used models and their assumptions.  Examples will be provided that illustrate typical applications of Item Response Theory, showing how it can be used to improve the development, refinement, and evaluation of patient-reported outcome measures.  Greater use of methods based on this framework can increase the accuracy and efficiency with which patient-reported outcomes are measured.
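Among the commonly used IRT models mentioned above is the two-parameter logistic (2PL) model, which can be sketched in a few lines. The item parameters below (discrimination a, difficulty b) are hypothetical values chosen for illustration.

```python
import math

# Two-parameter logistic (2PL) IRT model: probability that a respondent
# with latent ability theta endorses/answers an item correctly.

def p_correct(theta, a, b):
    """Item characteristic curve: a = discrimination, b = difficulty."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A person of average ability (theta = 0) facing an item of average
# difficulty (b = 0) succeeds with probability 0.5, regardless of a:
print(p_correct(0.0, 1.2, 0.0))  # 0.5

# Higher ability raises the probability; discrimination a controls how
# steeply the curve rises around the difficulty b:
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(p_correct(theta, 1.7, 0.0), 3))
```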

September 16, 2014:  Dr. Amresh Hanchate, Assistant Professor, Health Care Disparities Research Program, Boston University School of Medicine

“Did MA reform increase or decrease use of ED services?  An Application of Difference-in-Differences Analysis”

This presentation will focus on difference-in-differences regression models as an approach to estimating causal relationships. Commonly applied in the context of “natural experiments”, the approach will be examined through its application to evaluating the impact of Massachusetts health reform on the use of emergency department services. Two previous studies applying this approach (Miller 2012 & Smulowitz 2014) obtained contrasting results, one finding an increase in ED use and the other a decrease, following the Massachusetts insurance expansion (2006-2007).  I will report the findings of a comparative assessment of these contrasting results, based on side-by-side replication of the original analyses using similar data.
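The core difference-in-differences contrast can be sketched from four group means: the pre/post change in the treated group minus the pre/post change in a comparison group. The numbers below are made up for illustration, not taken from the Miller or Smulowitz analyses.

```python
# Difference-in-differences in its simplest form: the comparison group's
# change stands in for the counterfactual trend of the treated group.

def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """DiD = (treated change) - (control change)."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical mean ED visits per 1,000 residents:
effect = did_estimate(treat_pre=410.0, treat_post=398.0,
                      ctrl_pre=405.0, ctrl_post=401.0)
print(effect)  # -8.0: reform associated with 8 fewer visits per 1,000
```

In practice the same contrast is estimated with a regression of the outcome on group, period, and their interaction, which also accommodates covariates and standard errors.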


September 2, 2014:   Stavroula Chrysanthopoulou, PhD, Department of Biostatistics, Brown University School of Public Health

 “Statistical Methods in Microsimulation Modeling:  Calibration and Predictive Accuracy”

This presentation is concerned with the statistical properties of MicroSimulation Models (MSMs) used in Medical Decision Making. The MIcrosimulation Lung Cancer (MILC) model, a new, streamlined MSM describing the natural history of lung cancer, has been used as a tool for the implementation and comparison of complex statistical techniques for calibrating and assessing the predictive accuracy of continuous-time, dynamic MSMs. We present the main features of the MILC model along with the major findings and conclusions, as well as the challenges imposed by the implementation of the suggested statistical methods.

May 20, 2014:  Craig Wells, Ph.D., Department of Educational Policy, Research and Administration, UMASS Amherst

"Applications of Item Response Theory"

Item response theory (IRT) is a powerful, model-based technique for developing scales and assessments. Due to its attractive features, IRT is the statistical engine behind many types of assessments. The purpose of the presentation is to describe the fundamental concepts of IRT as well as its applications in a variety of contexts. The presentation will address the advantages of IRT over classical methods and describe popular IRT models and applications.

May 6, 2014:  Jeffrey Brown, Ph.D., Department of Population Medicine, Harvard Medical School

FDA's Mini-Sentinel Program to Evaluate the Safety of Marketed Medical Products

“The Sentinel Initiative began in 2008 as a multi-year effort to create a national electronic system for monitoring the safety of FDA-regulated medical products (e.g., drugs, biologics, vaccines, and devices). The Initiative is the FDA’s response to the Food and Drug Administration Amendments Act requirement that the FDA develop a system to obtain information from existing electronic health care data from multiple sources to assess the safety of approved medical products. The Mini-Sentinel pilot is part of the Sentinel Initiative. Mini-Sentinel uses a distributed data approach in which data partners retain control over data in their possession obtained as part of normal care and reimbursement activities. This approach allows Mini-Sentinel queries to be executed behind the firewalls of data partners, with only summary-level or minimum-necessary information returned for analysis. The Mini-Sentinel network allows FDA to initiate hundreds of queries a year across a network of 18 data partners and over 350 million person-years of electronic health data. These queries use privacy-preserving approaches that have greatly minimized the need to share protected health data. Mini-Sentinel analyses have been used to support several regulatory decisions.”

April 15, 2014:  Jessica Meyers Franklin, Ph.D., Department of Medicine, Division of Pharmacoepidemiology & Pharmacoeconomics, Harvard Medical School

"High-dimensional simulation for evaluating high-dimensional methods: Comparing high-dimensional propensity score versus lasso variable selection for confounding adjustment in a novel simulation framework"

“The high-dimensional propensity score (hdPS) algorithm has been shown to reduce bias in nonrandomized studies of treatments in administrative claims databases through empirical selection of confounders. Lasso regression provides an alternative confounder selection method and allows for direct modeling of the outcome in a high-dimensional covariate space through shrinkage of coefficient estimates. However, these methods have not previously been compared, due to limitations in ordinary simulation techniques. In this talk, I will discuss a novel "plasmode" simulation framework that is better suited to evaluating methods in the context of a high-dimensional covariate space, and I will present a study in progress that uses this framework to compare the performance of hdPS with that of a lasso outcome regression model for reduction of confounding bias.”

Tuesday, April 1, 2014: Presented by: Michael Ash, Ph.D., Chair, Department of Economics, Professor of Economics and Public Policy, University of Massachusetts Amherst

"Critical Replication for Learning and Research"

“Critical replication asks students to replicate a published quantitative empirical paper and to extend the original study either by applying the same model and methods to new data or by applying new models or methods to the same data. Replication helps students come rapidly up to speed as practitioners. It also benefits the discipline by checking published work for accuracy and robustness. Extension gives a practical introduction to internal or external validity and can yield publishable results for students. I will discuss critical replication of three published papers: Growth in a Time of Debt (Reinhart and Rogoff 2010); Mortality, inequality and race in American cities and states (Deaton and Lubotsky 2003); and Stock markets, banks, and growth (Levine and Zervos 1998).”


Tuesday, March 18, 2014: Presented by: Balgobin Nandram, Ph.D., Professor, Mathematical Sciences, Worcester Polytechnic Institute

"A Bayesian Test of Independence for Sparse Contingency Tables of BMD and BMI"

“Interest is focused on a test of independence in contingency tables of body mass index (BMI) and bone mineral density (BMD) for small places. Techniques of small area estimation are implemented to borrow strength across U.S. counties using a hierarchical Bayesian model. For each county a pooled Bayesian test of independence of BMD and BMI is obtained. We use the Bayes factor to perform the test, and computation is performed using Monte Carlo integration via random samples rather than Gibbs samples. We show that our pooled Bayesian test is preferred over many competitors.”

Key Words: Bayes factor, Contingency tables, Cressie-Read test, Gibbs sampler, Monte Carlo integration, NHANES III, Power, Sensitivity analysis, Small area estimation.
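The idea of computing a Bayes factor by Monte Carlo integration over random prior samples can be illustrated on a single 2x2 table. This is a generic sketch of that computation with made-up counts and flat Dirichlet priors; it is not the hierarchical small-area model described in the talk.

```python
import math
import random

# Monte Carlo Bayes factor for independence in a 2x2 table: marginal
# likelihoods under a saturated model and an independence model are
# approximated by averaging the multinomial likelihood over prior draws.

random.seed(1)
counts = [12, 4, 5, 11]  # cells [n11, n12, n21, n22] (hypothetical)

def dirichlet_uniform(k):
    """Flat Dirichlet draw via normalized Exp(1) variates."""
    g = [random.expovariate(1.0) for _ in range(k)]
    s = sum(g)
    return [x / s for x in g]

def log_multinomial(counts, probs):
    return sum(c * math.log(p) for c, p in zip(counts, probs))

def log_marginal(loglik_draws):
    """log of the average likelihood, via log-sum-exp for stability."""
    m = max(loglik_draws)
    return m + math.log(sum(math.exp(v - m) for v in loglik_draws)) \
             - math.log(len(loglik_draws))

draws = 20000
# Saturated model: cell probabilities from a flat Dirichlet on 4 cells.
ll_sat = [log_multinomial(counts, dirichlet_uniform(4)) for _ in range(draws)]
# Independence model: p_ij = row_i * col_j.
ll_ind = []
for _ in range(draws):
    r, c = dirichlet_uniform(2), dirichlet_uniform(2)
    ll_ind.append(log_multinomial(
        counts, [r[0] * c[0], r[0] * c[1], r[1] * c[0], r[1] * c[1]]))

log_bf = log_marginal(ll_sat) - log_marginal(ll_ind)
print("log Bayes factor (saturated vs. independence):", round(log_bf, 2))
```

A positive log Bayes factor favors the saturated (dependence) model; for sparse tables, this kind of averaging over random prior samples sidesteps Gibbs sampling, as the abstract notes.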

Tuesday, March 4, 2014: Presented by: Krista Gile, Ph.D., Assistant Professor, Department of Mathematics and Statistics, University of Massachusetts

"Inference and Diagnostics for Respondent-Driven Sampling Data"

“Respondent-Driven Sampling is a type of link-tracing network sampling used to study hard-to-reach populations. Beginning with a convenience sample, each person sampled is given 2-3 uniquely identified coupons to distribute to other members of the target population, making them eligible for enrollment in the study. This is effective at collecting large, diverse samples from many populations.

Unfortunately, sampling is affected by many features of the network and sampling process. In this talk, we present advances in sample diagnostics for these features, as well as advances in inference adjusting for such features.

This talk includes joint work with Mark S. Handcock, Lisa G. Johnston and Matthew J. Salganik.”

Tuesday, February 18, 2014: Presented by: John Griffith, Ph.D., Associate Dean for Research, Bouve College of Health Sciences, Northeastern University


"Translating Science to Health Care: the Use of Predictive Models in Decision Making"

“Clinical predictive models take information about a patient or subject and synthesize it into a composite score that can then assist with decision making concerning treatment for the individual patient. To be useful, these tools need to accurately categorize the risk of events for patients and their use needs to positively impact treatment decisions and patient outcomes. Statistical approaches can be used for internal validation of these models. However, clinical trials are often needed to show treatment effectiveness. The issues that arise with the development, testing, and implementation of such models will be discussed.”

John Griffith Presentation Slides

Tuesday, February 4, 2014: Presented by: Christopher Schmid, Ph.D., Professor of Biostatistics, Center for Evidence Based Medicine, Brown University School of Public Health


"N-of-1 Trials"

“N-of-1 trials are a promising tool to enhance clinical decision-making and patient outcomes. These trials are single-patient multiple-crossover studies for determining the relative comparative effectiveness of two or more treatments within each individual patient. Patient and clinician select treatments and outcomes of interest to them, carry out the trial, and then make a final treatment decision together based on results of the trial. This talk will discuss the advantages and challenges in conducting N-of-1 trials, along with some of the design and analytic considerations. A study to test the effectiveness of the N-of-1 trial as a clinical decision tool comparing patients randomized to N-of-1 vs. usual care is ongoing. The challenges of implementing the decision strategy in such a context will be discussed.”

Christopher Schmid slides

Tuesday, January 21, 2014: Presented by: David MacKinnon, Ph.D., Professor, Arizona State University, Author of "Introduction to Statistical Mediation Analysis"

"Mediation Analysis"

Learning Objective: Understanding and Running Mediation Analyses

Bring your laptop and run step-wise mediation analyses with the speaker using SAS and the free Mplus demo program (http://www.statmodel.com/demo.shtml).

Dr. David MacKinnon's Seminar Materials

MacKinnon PM2002

MacKinnon MBR2004

MacKinnon AR2007

MacKinnon Slides

2013 Archives


Tuesday, December 3, 2013: Presented by: Erin M. Conlon, Ph.D., Associate Professor, Department of Mathematics and Statistics, Lederle Graduate Research Center, University of Massachusetts, Amherst


"Bayesian Meta-Analysis Models for Gene Expression Studies"

Biologists often conduct multiple independent gene expression studies that all target the same biological system or pathway. Pooling information across studies can help identify true target genes more accurately. Here, we introduce a Bayesian hierarchical model that combines gene expression data across studies to identify differentially expressed genes. Each study has several sources of variation, e.g., replicate slides within repeated experiments. Our model produces the gene-specific posterior probability of differential expression, which is the basis for inference. We further extend the models to identify up- and down-regulated genes separately, and to include gene dependence information. We evaluate the models using both simulation data and biological data for the model organisms Bacillus subtilis and Geobacter sulfurreducens.

Tuesday, November 19, 2013: Presented by: Jing Qian, Ph.D., Assistant Professor of Biostatistics, Division of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, Amherst

"Statistical Methods for Analyzing Censored Medical Cost and Sojourn Time in Progressive Disease Process"

To conduct comprehensive evaluations in clinical studies of chronic diseases like cancer, features of the disease process, such as lifetime medical cost and sojourn time in a progressive disease process, are often assessed in addition to the overall survival time. However, statistical analysis of these features is challenged by dependent censoring and identifiability issues arising from incomplete follow-up data in clinical studies. In this talk, I will first present a semiparametric regression model for analyzing censored lifetime medical cost, which can be used to address cost differences between treatments in the motivating example of a lung cancer clinical trial. Next, I will discuss how to use a similar inference approach to estimate sojourn time in a progressive disease process, motivated by a colon cancer study where patients progress through cancer-free and cancer-recurrence states. Inference procedures and simulation studies will be described. The methods will be illustrated through a lung cancer and a colon cancer clinical trial.

Thursday, November 7, 2013: Presented by: Bei-Hung Chang, Sc.D., Associate Professor, Boston University School of Public Health, VA Boston Healthcare System


"Mind and Body Medicine Research: Study Design and Statistical Method Demonstrations"

The nature of mind/body practices, such as meditation and acupuncture, poses a challenge for evaluating the intervention effect. Blinding, randomization, control group selection, and placebo effects are among these challenges. This talk will present two studies that employed innovative study designs to overcome these challenges in investigating the health effects of acupuncture and the relaxation response/meditation. The use of statistical methods in the studies, including a 2-slope regression model and mixed effects regression models, will also be demonstrated.

Tuesday, October 15, 2013: Presented by: Laura Forsberg White, Ph.D., Associate Professor, Department of Biostatistics, Boston University School of Public Health

"Characterizing Infectious Disease Outbreaks: Traditional and Novel Approaches"

Infectious disease outbreaks continue to be a significant public health concern. Quantitative methods for characterizing an outbreak rapidly are of great interest in order to mount an appropriate and effective response. In this talk, I will review some traditional approaches to doing this and discuss more recent work. In particular, this talk will focus on methods for quantifying the spread of an illness through estimation of the reproductive number. We will also briefly discuss methods to determine the severity of an outbreak through estimation of the case fatality ratio and attack rate. Applications of this work to the 2009 Influenza A H1N1 outbreak will be discussed. We will also discuss methods to estimate heterogeneity in the reproductive number.

Laura Forsberg White, Ph.D. slides

Tuesday, October 1, 2013: Presented by Molin Wang, Ph.D., Assistant Professor, Department of Medicine, Harvard Medical School, Departments of Biostatistics and Epidemiology, Harvard School of Public Health

"Statistical Methods and SAS Macros for Disease Heterogeneity Analysis"

Epidemiologic research typically investigates the associations between exposures and the risk of a disease, in which the disease of interest is treated as a single outcome. However, many human diseases, including colon cancer, type II diabetes mellitus and myocardial infarction, comprise a range of heterogeneous molecular and pathologic processes, likely reflecting the influences of diverse exposures. Molecular Pathological Epidemiology, an approach that incorporates data on the molecular and pathologic features of a disease directly into epidemiologic studies, has been proposed to better identify causal factors and better understand how potential etiologic factors influence disease development. In this talk, I will present statistical methods for evaluating whether the effect of a potential risk factor varies by subtype of the disease, in cohort, case-control and case-case study designs. Efficiency of the tests will also be discussed. SAS macros will be presented to implement these methods. The macros test overall heterogeneity through the common effect test (i.e., the null hypothesis is that the effects of exposure on the different subtypes are all the same) as well as pair-wise differences in exposure effects. In adjusting for confounding, the effects are allowed to vary across the different subtypes or can be assumed to be the same. To illustrate the methods, we evaluate the effect of alcohol intake on LINE-1 methylation subtypes of colon cancer in the Health Professionals Follow-up Study, where 51,529 men have been followed since 1986, during which time 268 cases of colon cancer have occurred. Results are presented for all three possible study designs for comparison purposes. This is joint work with Aya Kuchiba and Donna Spiegelman.

Tuesday, September 17, 2013: Presented by Zheyang Wu, Ph.D., Assistant Professor, Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, MA.

"Genetic Effects and Statistical Power of Gene Hunting Using GWAS and Sequence Data"

Genome-wide association studies (GWAS) use high-density genotyping platforms to reveal single-nucleotide and copy-number variants across the whole genome for gene hunting. Although many significant genetic factors have been identified, the genes discovered so far account for a relatively small proportion of the genetic contribution to most complex traits, the so-called “missing heritability”. A key statistical research problem in championing the discovery of novel disease genes is to reveal the capacity of association-based detection strategies and to design optimal methods. We study this problem from the viewpoint of statistical signal detection for high-dimensional data, while considering three major features of the as-yet-unfound genetic factors: weak association effects, sparse signals among all genotyped variants, and complex correlations and gene-gene interactions. In this talk, I will discuss two relevant results. First, we address how gene-gene interaction and linkage disequilibrium among variants influence the capacity of model selection strategies for searching and testing genes. In particular, we developed a novel power calculation framework for model selection strategies to pick up proper signals of disease genes. Second, the requirement for signal strength in gene detection can be reduced when we target the detection of groups of signals rather than individual signals. Specifically, we established a theory of the detection boundary, which clarifies the limit of statistical analysis: genetic effects below the boundary are simply too rare and weak to be reliably detected by any statistical method. Meanwhile, we developed optimal tests that work for these minimally detectable signals. These results are also applicable in designing statistical association tests for detecting rare variants in exome or whole-genome sequence data analysis.

2009 ZhaoWuPlosGenetPowerModelSelectionGWAS



Wu Slides

Tuesday, September 3, 2013: Presented by Raji Balasubramanian, Sc.D., Assistant Professor of Biostatistics, Division of Biostatistics and Epidemiology, UMass Amherst


Variable importance in matched case-control studies in settings of high-dimensional data

In this talk, I’ll describe a method for assessing variable importance in matched case-control investigations and other highly stratified studies characterized by high-dimensional data (p >> n). The proposed method is motivated by a cardiovascular disease systems biology study involving matched cases and controls. In simulated and real datasets, we show that the proposed algorithm performs better than a conventional univariate method (conditional logistic regression) and a popular multivariable algorithm (Random Forests) that does not take the matching into account.

This is joint work with E. Andres Houseman (Oregon State University), Rebecca A. Betensky (Harvard School of Public Health) and Brent A. Coull (Harvard School of Public Health).

Powerpoint slides from presentation


Tuesday, May 21, 2013:  Presented by Alexander Turchin, MD, MS, Director of Informatics Research, Department of Endocrinology, Diabetes and Hypertension, Harvard Medical School

Using Electronic Medical Records Data for Clinical Research:  Experience and Practical Implications

Electronic medical record (EMR) systems represent a rich source of clinical data that can be utilized for research, quality assurance, and pay-for-performance, among other purposes. However, it is important to recognize that, like any other data source, EMR data have their own pitfalls that need to be approached in a rigorous fashion. In particular, a large fraction of the data in EMRs is “locked” in narrative documents and can therefore be especially challenging to extract. This presentation will discuss common flaws in EMR data, with a special focus on a systematic approach to using data from narrative electronic documents. The discussion will be illustrated with specific examples of clinical research using EMR data, including narrative text.

Learning Objectives:

1. To understand limitations and caveats of EMR data

2. To learn how to approach development of NLP algorithms

3. To learn how to evaluate NLP algorithms


Tuesday, May 7, 2013:  Presented by Tingjian Ge, PhD, Assistant Professor, Department of Computer Science, UMass Lowell

How Recent Data Management and Mining Research can Benefit Biomedical Sciences

Data management (a.k.a. databases, traditionally) and data mining have been active research topics in Computer Science since the 1960s, both in academia and in the research and development groups of companies (for example, IBM Research). In recent years we have seen a surge in this research due to the “big data” trend. Meanwhile, various areas of the biomedical sciences are producing increasingly large amounts of data due to the prevalence of automatic data-generating devices. It is natural to consider what some of the most recent results from data management and mining can do for state-of-the-art biomedical research and practice.

In this talk, I will discuss the potential applications of my research in data management and mining to various biomedical studies. They include: (1) complex event detection over correlated and noisy time series data, such as ECG monitoring signals and real-time dietary logs; (2) ranking and pooled analysis of noisy and conflicting data, such as microarray results and emergency medical responses in disaster scenes (e.g., terrorist attacks or earthquakes); and (3) association rule mining on mixed categorical and numerical data, such as the dietary logs, for food recommendation and weight control.


Tuesday, April 16, 2013:  Presented by Jeffrey Bailey, MD, PhD,

Computational Approaches for Analyzing Copy Number Variation and Standing Segmental Duplication

Segmental duplication represents a key route for the evolution of new genes within an organism, and regions of duplication are often copy-number variant, providing increased functional diversity. Detecting regions of duplication and copy number variation is still a challenge, even with high-throughput sequencing. The lecture will review the key methods for identifying duplicated sequence and copy-number-variant regions within genomic sequence, and provide an overview of our laboratory's ongoing work to detect, type, and correlate such regions with phenotype, particularly vis-à-vis malaria.

Tuesday, April 2, 2013:  Presented by Becky Briesacher, PhD

"Offsetting Effects of Medicare Part D on Health Outcomes and Hospitalization?"

This presentation will cover a Medicare Part D policy evaluation and the novel use of time-series and bootstrapping methods. My early results challenge the assumption of the US Congressional Budget Office that Medicare prescription drug costs are offset by medical service savings. I will also describe how we used Pre-Part D data to create simulated post-Part D outcomes. Confidence intervals were constructed using bootstrapping and the test for differences was based on the proportion of simulated values that exceeded/fell below the observed value.

BBriesacher 4/2/2013 Intermediate Level Policy Eval Paper

BBriesacher 4/2/2013 Advanced Methods paper
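The bootstrap machinery behind the confidence intervals described above can be sketched with a percentile bootstrap for a simple statistic. The data here are simulated for illustration; this is not the Part D analysis itself.

```python
import random
import statistics

# Percentile-bootstrap confidence interval: resample the data with
# replacement, recompute the statistic, and take empirical percentiles
# of the resampled distribution.

random.seed(42)
observed = [random.gauss(100, 15) for _ in range(200)]  # e.g., monthly costs

def bootstrap_ci(data, stat=statistics.mean, reps=2000, alpha=0.05):
    stats = sorted(stat(random.choices(data, k=len(data)))
                   for _ in range(reps))
    lo = stats[int((alpha / 2) * reps)]
    hi = stats[int((1 - alpha / 2) * reps) - 1]
    return lo, hi

lo, hi = bootstrap_ci(observed)
print(f"mean = {statistics.mean(observed):.1f}, 95% CI = ({lo:.1f}, {hi:.1f})")
```

The same resampling logic extends to a test like the one in the abstract: compute the proportion of simulated post-Part D values that exceed or fall below the observed value.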

Tuesday, March 5, 2013:  Presented by David Hoaglin, PhD, Professor, Biostatistics and Health Services Research


"Regressions Gone Wrong: Why Many Reports of Regression Analyses Mislead"

Regression methods play an important role in many analyses: multiple regression, logistic regression, survival models, longitudinal analysis. Surprisingly, many articles and books describe certain results of such analyses in ways that lead readers astray. The talk will examine reasons for these problems and suggest remedies.

Speed 2012 DHoaglin 3/5/13

Making Sense DHoaglin 3/5/13

February 19, 2013:  Presented by Wenjun Li, PhD, Associate Professor, Preventative and Behavioral Medicine

Use of Small Area Health Statistics to Inform and Evaluate Community Health Promotion Programs

This presentation discusses the application of small area estimation methods to identify priority communities for public health intervention programs, to tailor community-specific intervention strategies, and to evaluate their effectiveness at the community level.

2012 Events


December 4, 2012:  Presented by Thomas Houston, MD, MPH, Professor and Chief

Comparative Effectiveness Research (CER) Seminar Series -- Pragmatic Clinical Trials (PCT II) (following Bruce Barton's PCT 1 on Sept. 18)

Dr. Houston will describe a series of cluster-randomized trials in which the Internet and informatics were used to support interventions for providers and patients. He will also review the PRECIS tool, a way to characterize pragmatic trials, and the Stages of Implementation Completion (SIC), a time-and-milestone-based measure of success in implementation.

 November 20, 2012:  Presented by Jennifer Tjia, MD, MSCE, Associate Professor of Medicine

Pharmacoepidemiologic Approaches to Evaluate Outcomes of Medication Discontinuation

The self-controlled case series method, or case series method for short, can be used to study the association between an acute event and a transient exposure using data only on cases; no separate controls are needed. The method uses exposure histories that are retrospectively ascertained in cases to estimate the relative incidence: that is, the incidence of events within risk periods (windows of time during or after exposure when people are hypothesized to be at greater risk) relative to the incidence of events within control periods, which include all time before the case experienced the exposure and after the risk has returned to its baseline value. For many researchers, the main appeal of the self-controlled case series method is the implicit control of fixed confounders. We will discuss the application of this method in pharmacoepidemiologic outcomes studies, and explore whether this approach offers advantages over more conventional cohort studies when evaluating adverse drug withdrawal events following medication discontinuation. We will use examples from a linked Medicare Part D and Minimum Data Set database to facilitate discussion.
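The relative incidence contrast at the heart of the method can be sketched with toy numbers: events per unit of person-time in risk windows versus control time. Real self-controlled case series analyses fit a conditional Poisson model; this is only the arithmetic of the comparison, with hypothetical counts.

```python
# Self-controlled case series in miniature: each case's follow-up is
# split into "risk" windows (e.g., just after exposure) and "control"
# time; the relative incidence compares the event rates in the two.

def relative_incidence(events_risk, time_risk, events_control, time_control):
    rate_risk = events_risk / time_risk
    rate_control = events_control / time_control
    return rate_risk / rate_control

# 15 events in 50 person-months of risk-window time vs.
# 30 events in 400 person-months of control time (made-up numbers):
ri = relative_incidence(15, 50.0, 30, 400.0)
print(round(ri, 1))  # 4.0: events 4x as frequent during risk windows
```

Because every comparison is within-person, fixed characteristics of each case (the confounders the abstract mentions) cancel out of the ratio.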

November 6, 2012:  Presented by Molin Wang, PhD, Harvard University

Latency Analysis Under the Cox Model When the Effect May Change Over Time

We consider estimation and inference for latency in the Cox proportional hazards model framework, where time to event is the outcome. In many public health settings, it is of interest to assess whether exposure effects are subject to a latency period, in which the exposure-related risk of developing disease varies over time, perhaps affecting risk only during times near the occurrence of the outcome, or only during times preceding a lag of some duration. Identification of the latency period, if any, is an important aspect of assessing risks of environmental and occupational exposures. For example, in air pollution epidemiology, interest often lies not only in the effect of the m-year moving cumulative average air pollution level on the risk of all-cause mortality, but also in point and interval estimation of m itself. In this talk, we will focus on methods for point and interval estimation of the latency period under several models for the timing of exposure that have previously appeared in the epidemiologic literature. Computational methods will be discussed. The method will be illustrated in a study of the timing of the effects of constituents of air pollution on mortality in the Nurses' Health Study.
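As a small illustration of the exposure metric mentioned above, the m-year moving cumulative average can be computed as below; in practice each candidate m would be plugged into the Cox model and m itself estimated, for example by profile likelihood (this sketch and its numbers are ours, not the speaker's):

```python
def moving_average_exposure(exposure, m):
    # exposure[t] is the annual average exposure in year t; returns the
    # m-year moving average ending at each year t (defined once t >= m - 1)
    return [sum(exposure[t - m + 1 : t + 1]) / m
            for t in range(m - 1, len(exposure))]

# Hypothetical annual PM2.5 averages over six years:
pm25 = [12.0, 11.0, 13.0, 10.0, 9.0, 8.0]
ma3 = moving_average_exposure(pm25, 3)  # candidate latency m = 3 years
```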

October 16, 2012:  Presented by Sherry Pagoto, PhD, and Deepak Ganesan, PhD

mHealth-based Behavioral Sensing and Interventions

This presentation will review mHealth and sensing research and methodologies at the UMass Amherst and UMass Medical School campuses. We will discuss ongoing research in mobile and on-body sensing to obtain physiological data in the field, and to design a toolkit for processing such data to derive high-quality features, deal with data-quality issues (e.g., loose sensors, missing data), and leverage diverse sensor modalities to improve inference quality. To demonstrate the methodologies, we will discuss a recently funded pilot project in which mobile and sensing technology will be used to assess and predict physiological and environmental factors that affect eating behavior. Once eating behavior can be predicted accurately, interventions will be delivered via technology at the precise moments when individuals are most likely to overeat. The purpose of this research is to improve the impact of behavioral weight-loss interventions.

October 2, 2012:  Presented by Amy Rosen, PhD

Assessing the Validity of the Agency for Healthcare Research and Quality (AHRQ) Patient Safety Indicators (PSIs) in the VA

This presentation will review general patient safety concepts and ways in which patient safety events are identified. Background on the PSIs will be provided, and a recent multi-faceted validation study conducted in the VA to examine both the criterion and attributional validity of the indicators will be presented. Two questions will be specifically addressed: 1) Do the PSIs accurately identify true safety events? 2) Are PSI rates associated with structures/processes of care?


September 18, 2012:  Presented by Bruce Barton, PhD

Pragmatic Clinical Trials:  Different Strokes for Different Folks

Pragmatic clinical trials (PCTs) are relatively new on the clinical research scene and are being proposed routinely for NIH funding.  In a sense, PCTs are comparative effectiveness studies on steroids!  This presentation will discuss the concepts behind this new breed of clinical trial, how PCTs differ from the usual randomized clinical trial, and what to be careful of when developing one.  We will review two PCTs as case studies to look at different approaches to the study design.  The references are two of the more recent papers on PCT methodology and approaches.

PCT1 Reference    PCT2 Reference   PCT Slides

July 17, 2012:  Presented by Dianne Finkelstein, PhD; Mass General Hospital and Harvard School of Public Health, Boston, MA

Developing Biostatistics Resources at an Academic Health Center

Although biostatistics plays an important role in health-related research, biostatistics resources are often fragmented, ad hoc, or oversubscribed within Academic Health Centers (AHCs). Given the increasing complexity and quantity of health-related data, the emphasis on accelerating clinical and translational science, and the importance of reproducible research, there is a need for the thoughtful development of biostatistics resources within AHCs. I will report on a recent collaboration of CTSA biostatisticians who identified strategies for developing biostatistics resources in three areas: (1) recruiting and retaining biostatisticians; (2) using biostatistics resources efficiently; and (3) improving science through biostatistics collaborations. Ultimately, it was recommended that AHCs centralize biostatistics resources in a unit rather than disperse them across clinical departments, as the former offers distinct advantages to investigator collaborators, to biostatisticians, and ultimately to the success of the research and education missions of AHCs.

May 15, 2012: Presented by George Reed, PhD

Modeling Disease States Using Markov Models with Covariate Dependence and Time-Varying Intervals

We present an example of modeling transitions among multiple disease states in which measurements are not made at fixed, equal time intervals and the primary interest is in factors associated with the transition probabilities. Both first-order and higher-order Markov models are considered.

May 1, 2012: Presented by Becky Briesacher, PhD

"The Medicare Prescription Drug Program and Using Part D Data for Research"

In 2006, the Medicare program began offering coverage for prescription drugs, and as of June 2008, Part D data have been available to researchers. This presentation will briefly introduce the audience to the Medicare Part D program and Part D data for research purposes. The presentation will include personal reflections on becoming a drug policy researcher and excerpts from my own program evaluation research.

April 17, 2012: Presented by David Hoaglin, PhD

"Indirect Treatment Comparisons and Network Meta-Analysis: Relative Efficacy and a Basis for Comparative Effectiveness"

Evidence on the relative efficacy of two treatments may come from sets of trials that compared them directly (head to head), but often one must rely on indirect evidence, from trials that studied them separately with a common comparator (e.g., placebo) or from a connected network of treatments. The talk will review basic meta-analysis, discuss the steps and assumptions in network meta-analysis, and comment on applications to comparative effectiveness research.

  • ISPOR States Its Position on Network Meta-Analysis
  • Conducting Indirect-Treatment-Comparison and Network-Meta-Analysis Hoaglin 2011 ViH
  • Appendix: Examples of Bayesian Network Hoaglin Appen 2011
  • Jansen 2011 ViH
  • Luce 2010 Milbank
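The core indirect-comparison step can be sketched with the Bucher method: when trials give effect estimates for A vs. placebo and B vs. placebo on a log scale, the indirect A-vs-B estimate is their difference and its variance is the sum of the variances (the numbers below are hypothetical, not from the talk):

```python
import math

def indirect_comparison(d_ac, se_ac, d_bc, se_bc):
    # d_ac, d_bc: log-scale effects of A and B vs. a common comparator C
    d_ab = d_ac - d_bc
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    ci = (d_ab - 1.96 * se_ab, d_ab + 1.96 * se_ab)  # approximate 95% CI
    return d_ab, se_ab, ci

# Hypothetical log odds ratios vs. a common placebo comparator:
d_ab, se_ab, ci = indirect_comparison(-0.50, 0.15, -0.20, 0.20)
```

Note that the indirect estimate is always less precise than either direct estimate, which is one reason network meta-analysis combines direct and indirect evidence when both exist.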

March 20, 2012: Presented by Thomas English, PhD

"Using Allscripts Data at UMass for Clinical Research"

I will discuss work that I have done that has been enabled by EHRs. This should give an idea of how the current EHR at UMass could help your research.

February 28, 2012: Presented by Nancy Baxter, MD, PhD, FRCSC, FACRS

"Room for Improvement in Quality Improvement"

In most circumstances in clinical medicine, randomized clinical trials proving efficacy are required before widespread adoption of interventions. However, in the area of quality improvement, many strategies have been implemented with little supporting evidence. Why is this, and why worry? These topics will be explored in my presentation.

February 21, 2012: Presented by Stephen Baker, MScPH

"Sequentially Rejective Procedures for Multiple Comparisons in Genome Wide Association Studies (GWAS)"

The problem of inflated type I error due to multiple comparisons has been well known for many years; however, with the introduction of microarrays and other technologies it has become one of the central problems in data analysis in molecular biology. Sequential testing procedures have been popular but have limitations with these new technologies. I will discuss some popular methods and some new ones, and illustrate them with microarray data for associating gene expression with disease status.
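One of the popular sequentially rejective procedures is Holm's step-down method: compare the i-th smallest p-value to alpha/(m - i + 1) and stop at the first failure. A minimal sketch (a generic illustration with hypothetical p-values, not the speaker's code):

```python
def holm(pvalues, alpha=0.05):
    # Holm's step-down procedure: the rank-th smallest p-value (rank from 0)
    # is compared against alpha / (m - rank); testing stops at the first
    # p-value that fails, and all larger p-values are also retained.
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            rejected[i] = True
        else:
            break
    return rejected

pvals = [0.001, 0.013, 0.04, 0.2]
rejected = holm(pvals)
```

Holm controls the family-wise error rate under any dependence structure and is uniformly more powerful than the plain Bonferroni correction.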

February 7, 2012: Presented by Arlene Ash, PhD

Risk Adjustment Matters

What variables should be included, and how, in models designed to either detect differences in quality among providers with very different “case-mix” or to isolate the effect of some patient characteristic on outcome? What role does the purpose of the modeling effort play? What are the consequences of different modeling choices? What does “do no harm” mean for a statistical analyst?

January 17, 2012: Presented by Zhiping Weng, PhD

Computational Identification of Transposon Movement With Whole Genome Sequencing

Transposons evolve rapidly and can mobilize and trigger genetic instability. In Drosophila melanogaster, paternally inherited transposons can escape silencing and trigger a hybrid sterility syndrome termed hybrid dysgenesis. We developed computational methods to identify transposon movement in the host genome and uncover heritable changes in genome structure that appear to enhance transposon silencing during the recovery to hybrid dysgenesis.

2011 Events

December 20, 2011: Presented by Jacob Gagnon, PhD

Gene Set Analysis Applied to a Leukemia Data Set

Gene set analysis allows us to determine which groups of genes are differentially expressed when comparing two subtypes of a given disease. We propose a logistic kernel machine approach to determine the gene set differences between B-cell and T-cell Acute Lymphocytic Leukemia (ALL). Compared to previous work, our method has some key advantages: 1) our hypothesis testing is self-contained rather than being competitive, 2) we can model gene-gene interactions and complex pathway effects, and 3) we test for differential expression adjusting for clinical covariates. Results from simulation studies and from an application of our methods to an ALL dataset will be discussed.

December 14, 2011: Presented by Yunsheng Ma, MD, PhD

Determinants of Racial/Ethnic Disparities in Incidence of Clinical Diabetes in Postmenopausal Women in the United States: The Women’s Health Initiative 1993-2009

Although racial/ethnic disparities in diabetes risk have been identified, determinants of these differences have not been well-studied. Previous studies have considered dietary and lifestyle factors individually, but few studies have considered these factors in aggregate in order to estimate the proportion of diabetes that might be avoided by adopting a pattern of low-risk behaviors. Using data from the Women’s Health Initiative, we examined determinants of racial/ethnic differences in diabetes incidence.

  • This paper, “Diet, lifestyle, and the risk of type 2 diabetes mellitus in women", by Hu et al., presented ways to analyze diabetes risk factors in aggregate in order to estimate the proportion of diabetes that might be avoided by adopting a pattern of low-risk behaviors.

    Focus: Application | Data: Nurses’ Health Study | Methods: Cox proportional hazards models

November 9, 2011: Presented by Nanyin Zhang, PhD

In this presentation I will introduce the fundamental mechanisms of fMRI. I will also discuss potential applications of fMRI in understanding different mental disorders.

  • Article #1 (Functional Connectivity and Brain Networks in Schizophrenia), by Mary-Ellen Lynall et al., tested the hypothesis that schizophrenia is a disorder of connectivity between components of large-scale brain networks by measuring aspects of both functional connectivity and functional network topology derived from resting-state fMRI time series acquired at 72 cerebral regions over 17 min from 15 healthy volunteers (14 male, 1 female) and 12 people diagnosed with schizophrenia (10 male, 2 female).

    Focus: Application | Data: Real | Methods: Proof
  • Article #2 (Hyperactivity and hyperconnectivity of the default network in schizophrenia and in first-degree relatives of persons with schizophrenia), by Susan Whitfield-Gabrieli, examined the status of the neural network mediating the default mode of brain function in patients in the early phase of schizophrenia and in young first-degree relatives of persons with schizophrenia.

    Focus: Application | Data: Real | Methods: Proof

October 18, 2011: Presented by Bruce A. Barton, PhD

The Continuing Evolution of Randomized Clinical Trials: The Next Steps

Continuing the discussion initiated by Wenjun Li, PhD, at the April QHS/QMC Methods Workshop (“Role of Probability Sampling in Clinical and Population Health Research”), this workshop will discuss proposed designs for randomized clinical trials (RCTs) that offer partial answers to some of the problems with the current design of RCTs, as well as possible next evolutionary steps in RCT design to better address the primary issues of patient heterogeneity and generalizability of results.

September 20, 2011: Presented by Zi Zhang, MD, MPH

Using Address-Based Sampling (ABS) to Conduct Survey Research

The traditional random-digit-dial (RDD) approach for telephone surveys has become more problematic due to landline erosion and coverage bias, and a dual-frame method employing both landlines and cell phones is costly and complicated. We will discuss the use of the U.S. Postal Service Delivery Sequence File as an alternative sampling source in survey research. We will focus on sample coverage and response rate in reviewing this emerging approach.

July 19, 2011: Presented by Jennifer Tjia, MD, MSCE

Addressing the issue of channeling bias in observational drug studies

Channeling occurs when drug therapies with similar indications are preferentially prescribed to groups of patients with varying baseline prognoses. In this session, we will discuss the phenomenon of channeling using a specific example from the Worcester Heart Attack Study.

June 21, 2011: Presented by Mark Glickman, PhD

Multiple Testing: Is Slicing Significance Levels Producing Statistical Bologna?

Procedures for adjusting significance levels when performing many hypothesis tests are commonplace in health/medical studies. Such procedures, most notably the Bonferroni adjustment, control study-wide false positive rates, recognizing that the probability of at least one false positive result increases with the number of tests. In this talk we establish, in contrast to common wisdom, that significance level adjustments based on the number of tests performed are, in fact, unreasonable procedures, and lead to absurd conclusions if applied consistently. We argue that confusion may arise between performing a large number of tests and having a low (prior) probability that each null hypothesis is true, and that this confusion may lead to unwarranted multiplicity adjustment. Finally, we demonstrate that false discovery rate adjustments are a more principled approach to significance level adjustment in health and medical studies.
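As a concrete sketch of the false discovery rate approach, the standard Benjamini-Hochberg procedure finds the largest k such that the k-th smallest p-value is at most (k/m)q, then rejects the k smallest p-values (hypothetical p-values below; this is a generic illustration, not the speaker's code):

```python
def benjamini_hochberg(pvalues, q=0.05):
    # Benjamini-Hochberg step-up procedure controlling the FDR at level q.
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:   # compare p_(rank) to (rank/m) * q
            k_max = rank                 # remember the largest passing rank
    discoveries = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            discoveries[i] = True
    return discoveries

pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
discoveries = benjamini_hochberg(pvals)
```

Unlike Bonferroni-style corrections, the threshold here scales with the rank of each p-value rather than only with the total number of tests.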

April 19, 2011: Presented by Wenjun Li, PhD

Role of Probability Sampling in Clinical and Population Health Research

This workshop uses practical examples to illustrate the use of probability sampling in RCTs and population health studies. The approach is used to optimize generalizability, increase statistical power, and add value to the collected data by preserving the possibility of sub-group analysis.

March 15, 2011:

The Peters-Belson Approach to Study Health Disparities: Application to the National Health Interview Survey

Cancer screening rates vary substantially by race/ethnicity, and identifying the factors that contribute to disparities between minority groups and the white majority should aid in designing successful programs. The traditional approach for examining the role of race/ethnicity is to include a categorical variable indicating minority status in a regression-type model, whose coefficient estimates this effect. We applied the Peters-Belson (PB) approach, used in wage discrimination studies, to analyze disparities in cancer screening rates between different racial/ethnic groups in the 1998 National Health Interview Survey (NHIS), and to decompose the difference into a component due to differences in covariate values between the two groups and a residual difference. Regression models were estimated accounting for the complex sample design. Variances were estimated by the jackknife method, in which a single primary sampling unit was considered the deleted group, and compared to analytic variances derived from Taylor linearization. We found that among both men and women, most of the disparity in colorectal cancer screening and digital rectal exam rates between whites and blacks was explained by the covariates, but the same was not true for the disparity between whites and Hispanics.
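The decomposition itself can be sketched in a deliberately simplified form: one covariate, an unweighted linear model, and hypothetical data (the actual analysis used survey-weighted regression and jackknife variances):

```python
def ols(x, y):
    # Simple least-squares fit of y = a + b*x.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def peters_belson(x_maj, y_maj, x_min, y_min):
    # Fit the outcome model in the majority group, predict for the minority
    # group, and split the observed gap into an "explained" part (covariate
    # differences) and a residual part (different covariate-outcome relation).
    a, b = ols(x_maj, y_maj)
    pred = [a + b * xi for xi in x_min]
    gap = sum(y_maj) / len(y_maj) - sum(y_min) / len(y_min)
    explained = sum(y_maj) / len(y_maj) - sum(pred) / len(pred)
    return gap, explained, gap - explained

# Hypothetical screening outcomes (1 = screened) and an access covariate:
gap, explained, residual = peters_belson(
    [3, 4, 5, 6], [0, 1, 1, 1], [2, 3, 4, 5], [0, 0, 1, 0])
```

Here the total gap of 0.5 splits into 0.3 attributable to covariate differences and 0.2 left unexplained; a real analysis of screening rates would more plausibly use a logistic model.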

Dr. Rao also suggests a book for anyone who wants to analyze national surveys: Analysis of Health Surveys, by Korn EL and Graubard BI (Wiley, New York, NY, 1999).

February 15, 2011:

Multivariable Modeling Strategies: Uses and Abuses

This workshop will be hosted by George Reed, PhD and will discuss regression modeling strategies including predictor complexity and variable selection.  The workshop will examine the flaws and uses of methods like stepwise procedures, and discuss how modeling strategies should be tailored to particular problems.

Dr. Reed also recommends Chapter 4 of Frank Harrell's book, Regression Modeling Strategies.

“REGRESSION MODELING STRATEGIES: Chapter 4: ‘Multivariable Modeling Strategies’ by Frank E. Harrell, Jr. Copyright 2001 by Springer. Reprinted by permission of Springer via the Copyright Clearance Center’s Annual Academic Copyright License.”

2010 Events

November 16, 2010:

Bootstrapping: A Nonparametric Approach to Statistical Inference

This workshop will discuss analytic approaches to situations where the sampling distribution of a variable is not known and cannot be assumed to be normal.  Bootstrap resampling is a feasible alternative to conventional nonparametric statistics and can also be used to estimate the power of a comparison.
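The core idea can be sketched in a few lines: resample the observed data with replacement, recompute the statistic each time, and read a confidence interval off the percentiles of the resampled statistics (a generic illustration, not the workshop's materials):

```python
import random

def bootstrap_ci_median(data, n_boot=5000, alpha=0.05, seed=1):
    # Percentile bootstrap CI for the median; for even n this sketch uses
    # the upper median (element at index n // 2) to stay short.
    rng = random.Random(seed)
    stats = sorted(
        sorted(rng.choices(data, k=len(data)))[len(data) // 2]
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

data = [2.1, 3.4, 3.9, 4.4, 4.8, 5.6, 6.0, 7.2, 8.9, 12.5]
lo, hi = bootstrap_ci_median(data)
```

The same resampling loop works for any statistic (means, correlations, regression coefficients), which is what makes the approach attractive when no sampling distribution can be assumed.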

October 19, 2010:

Propensity Score Analyses, Part II

Last month's workshop spent a lot of time on the propensity score (PS) "basics" and ended with a rather hurried discussion of what variables do and don't belong in a PS model.  This month we will address a range of more advanced issues, including the previously promised discussion of why and when it may not be a good idea to include "all available" variables in a PS analysis, and the pros and cons of PS matching vs. weighting vs. covariate adjustment.
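Of the approaches mentioned, weighting is the easiest to sketch. Assuming propensity scores have already been estimated (e.g., by logistic regression), inverse-probability-of-treatment weighting estimates the average treatment effect as follows (toy numbers, our illustration):

```python
def iptw_ate(treated, outcome, ps):
    # Weight treated subjects by 1/ps and controls by 1/(1 - ps), then
    # compare weighted mean outcomes between the two groups.
    w = [1 / p if t else 1 / (1 - p) for t, p in zip(treated, ps)]
    wt = [wi for wi, t in zip(w, treated) if t]
    yt = [yi for yi, t in zip(outcome, treated) if t]
    wc = [wi for wi, t in zip(w, treated) if not t]
    yc = [yi for yi, t in zip(outcome, treated) if not t]
    mean_t = sum(wi * yi for wi, yi in zip(wt, yt)) / sum(wt)
    mean_c = sum(wi * yi for wi, yi in zip(wc, yc)) / sum(wc)
    return mean_t - mean_c

# Toy data: treatment flag, outcome, and estimated propensity score
treated = [1, 1, 1, 0, 0, 0]
outcome = [3.0, 2.0, 4.0, 1.0, 2.0, 1.5]
ps      = [0.8, 0.6, 0.7, 0.3, 0.4, 0.2]
ate = iptw_ate(treated, outcome, ps)
```

Note how subjects with propensity scores near 0 or 1 receive very large weights, which is one of the practical trade-offs between weighting, matching, and covariate adjustment discussed in the session.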

September 21, 2010:

Propensity Score Analyses, Part I

This meeting will discuss the separate roles of propensity scores and instrumental variables. Time permitting, we will explore implementation issues in constructing propensity score models.

  • Analyzing Observational Data:  Focus on Propensity Scores (Powerpoint presentation by Arlene Ash, PhD)
  • This draft article, Observational Studies in Cardiology, by Marcus et al., provides a fairly straightforward, non-technical "review of three statistical approaches for addressing selection bias: propensity score matching, instrumental variables, and sensitivity analyses." There are many other places where such issues are discussed.

    Focus: Application | Data: Real | Methods: Case Study
  • This paper, "Variable Selection for Propensity Score Models", by Brookhart et al., presented "the results of two simulation studies designed to help epidemiologists gain insight into the variable selection problem" in a propensity score analysis.

    Focus: Theory | Data: Simulated | Methods: Simulation