This directory contains IPython notebooks that reproduce some of the phenotypes used to train and evaluate the models that generated the PRS in the UK Biobank PRS Release.*
Details on these phenotypes can be found in the Thompson et al. publication (https://www.medrxiv.org/content/10.1101/2022.06.16.22276246v1).
These notebooks can be run from within a DNANexus project to generate phenotypes that are resolved to your unique participant identifiers.
*Please note that these notebooks may not recapitulate the exact phenotypes used in the UKB PRS Release if the backing data changes (for example, if UKB updates data in a new release, or if samples are withdrawn).
Before attempting to run a pheconstructor file, please ensure that you have a DNANexus project prepared with the UKB data dispensed to a Spark dataset.
- Select the file corresponding to either a disease phenotype (`disease_pheconstructor.py`) or a quantitative trait phenotype (`qt_pheconstructor.py`) and convert this file to a Jupyter notebook using `jupytext`. This can be done via the command line with the command `jupytext --to ipynb [disease|qt]_pheconstructor.py`
- Upload the notebook to your DNANexus project with `dx upload [disease|qt]_pheconstructor.ipynb`
- Upload the YAML of valid traits to your DNANexus project with `dx upload [diseases|qt].yaml`
- Start a JupyterLab instance from the DNANexus web interface, ensuring that it is running on a Spark cluster
- Open the pheconstructor notebook and replace the `TRAIT_CODE` variable in the first Python cell of this notebook with the trait of interest
- Replace the `'<dataset_id>.dataset'` string in the first Python cell with the ID of your UKB dataset
- Run the notebook - this should output the phenotype data to a CSV file named `<TRAIT_CODE>_phenotype.csv`, which is compatible with the UKB-PRET tool. If the Spark queries in the notebook fail, try increasing the number of cores and/or processing power of your instance
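The conversion and upload steps above can be scripted. The following is a minimal sketch, not part of this repository: the helper names `build_commands` and `run_commands` are hypothetical, and it assumes the `jupytext` and `dx` command-line tools are installed and that you are already logged in to your DNANexus project. It only assembles the commands listed in the steps above and prints them unless `dry_run` is set to `False`.

```python
import shlex
import subprocess

# File names taken from the steps above. Note that the YAML for disease
# traits is named "diseases.yaml" (plural), while the script is singular.
FILES = {
    "disease": ("disease_pheconstructor", "diseases.yaml"),
    "qt": ("qt_pheconstructor", "qt.yaml"),
}


def build_commands(kind):
    """Return the conversion and upload commands for one phenotype kind.

    Hypothetical helper: `kind` is "disease" or "qt".
    """
    if kind not in FILES:
        raise ValueError(f"kind must be one of {sorted(FILES)}, got {kind!r}")
    stem, traits_yaml = FILES[kind]
    return [
        ["jupytext", "--to", "ipynb", f"{stem}.py"],  # .py -> .ipynb
        ["dx", "upload", f"{stem}.ipynb"],            # upload the notebook
        ["dx", "upload", traits_yaml],                # upload the valid traits
    ]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: shows the three commands without executing them.
    run_commands(build_commands("disease"))
```

Keeping the dry run as the default makes it easy to review the commands before anything is uploaded to the project.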
*For instructions on running your phenotype with `ukb-pret` in the DNANexus Research Analysis Platform, please refer to the project README.
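Before passing `<TRAIT_CODE>_phenotype.csv` to another tool, it can be worth confirming that the notebook wrote a non-empty file. The sketch below is an illustration only: `summarise_phenotype_csv` is a hypothetical helper, and it makes no assumption about the column names in the file. It simply returns the header row and the number of data rows.

```python
import csv
from pathlib import Path


def summarise_phenotype_csv(trait_code, directory="."):
    """Return (column_names, row_count) for <trait_code>_phenotype.csv.

    Hypothetical helper: it reads whatever header the file has and does
    not assume any particular columns.
    """
    path = Path(directory) / f"{trait_code}_phenotype.csv"
    with path.open(newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:
            raise ValueError(f"{path} is empty")
        # Count the remaining lines as data rows.
        row_count = sum(1 for _ in reader)
    return header, row_count
```

Calling `summarise_phenotype_csv(TRAIT_CODE)` in the directory containing the output shows at a glance whether any participants were written.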