Associations between
Single Nucleotide Polymorphisms
& Unexplained Lung Cancer Risk
in the ARIC Study Dataset

MPH Epidemiology-Biostatistics Concentration
Research In Progress Meeting
11/30/2017

Martin Skarzynski
Capstone Mentor: Prof. Elizabeth Platz
Johns Hopkins School of Public Health

Atherosclerosis Risk in Communities Study (ARIC) Dataset

Cancer sites with the highest primary cancer incidence and mortality, 1987-2012, among 14,735 at-risk ARIC participants (8,028 females, 6,707 males)

Site                     Incidence (n)  Mortality (n)
Colon                              364            109
Lung and bronchus                  748            526
Hematopoietic/lymphatic            378            177
Melanoma                           130             14
Breast (female & male)             696            112
Prostate                           887             91
Kidney                             178             42
Bladder                            234             36
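To make the case for focusing on lung cancer explicit, the counts above can be ranked by deaths, with a crude mortality-to-incidence (M/I) ratio alongside. This is a small illustrative Python sketch: the dictionary simply transcribes the table, and the M/I ratio is an added summary, not a statistic reported in the ARIC analysis.

```python
# Counts transcribed from the ARIC table above (1987-2012).
# The M/I ratio is an illustrative addition, not an ARIC-reported statistic.
counts = {
    "Colon": (364, 109),
    "Lung and bronchus": (748, 526),
    "Hematopoietic/lymphatic": (378, 177),
    "Melanoma": (130, 14),
    "Breast (female & male)": (696, 112),
    "Prostate": (887, 91),
    "Kidney": (178, 42),
    "Bladder": (234, 36),
}


def mi_ratio(incidence: int, mortality: int) -> float:
    """Deaths per incident case: a rough proxy for how lethal a cancer site is."""
    return mortality / incidence


# Rank sites by number of deaths, highest first.
by_deaths = sorted(counts, key=lambda site: counts[site][1], reverse=True)

for site in by_deaths:
    inc, mort = counts[site]
    print(f"{site:<24} {inc:>5} {mort:>5} {mi_ratio(inc, mort):.2f}")
```

Lung and bronchus cancer tops the death ranking, while prostate cancer has the most incident cases.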

Can SNPs explain some of the variance in lung cancer risk that remains after accounting for known risk factors?
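One common way to frame this question, assuming a logistic model for lung cancer case status, is to fit a reduced model with only the known risk factors, add a SNP, and compare the two nested models with a likelihood ratio test. The Python sketch below runs on simulated data only: `pack_years`, `age_c`, and `snp` are hypothetical variables with made-up effect sizes, and `fit_logistic` and `lrt_one_df` are illustrative helpers, not part of the ARIC Bash & R scripts. With ARIC's cohort follow-up, a Cox proportional hazards model would be a natural alternative, and testing many SNPs would call for a multiple-testing correction.

```python
import math

import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson; return (beta, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        weights = p * (1.0 - p)
        grad = X.T @ (y - p)                    # score vector
        hess = X.T @ (X * weights[:, None])     # negative Hessian, X'WX
        beta = beta + np.linalg.solve(hess, grad)
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    loglik = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return beta, loglik


def lrt_one_df(loglik_reduced, loglik_full):
    """Likelihood ratio test for one added parameter (chi-square with 1 df)."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) survival function
    return stat, p_value


# Simulated data with hypothetical variables; NOT ARIC data.
rng = np.random.default_rng(2017)
n = 5000
pack_years = rng.gamma(shape=2.0, scale=10.0, size=n)   # hypothetical known risk factor
age_c = (rng.normal(60.0, 6.0, size=n) - 60.0) / 10.0   # hypothetical age, centered, per decade
snp = rng.binomial(2, 0.3, size=n)                      # hypothetical SNP, allele count 0/1/2
true_logit = -3.5 + 0.04 * pack_years + 0.5 * age_c + 0.3 * snp
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))  # simulated case status

# Reduced model: known risk factors only. Full model: add the SNP.
X_reduced = np.column_stack([np.ones(n), pack_years, age_c])
X_full = np.column_stack([X_reduced, snp])
_, ll_reduced = fit_logistic(X_reduced, y)
beta_full, ll_full = fit_logistic(X_full, y)

stat, p_value = lrt_one_df(ll_reduced, ll_full)
print(f"SNP log-odds ratio: {beta_full[-1]:.3f}; LRT = {stat:.2f}, p = {p_value:.3g}")
```

A small p-value suggests the SNP improves fit beyond the known risk factors; it does not by itself quantify how much variance the SNP explains.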

Everything I need is on the cluster:

  • ARIC epidemiologic & genomic data
  • Bash & R scripts to work with the data
  • Compute resources
In [1]:
candy_store = "Joint High Performance Computing Exchange (JHPCE)"
kid = "Martin Skarzynski"
kid in candy_store
Out[1]:
False

Working on getting access to the cluster and the dataset :)

Thanks for listening!

Slides: https://marskar.github.io/jupyter-notebook-slides
Code: https://github.com/marskar/jupyter-notebook-slides