IMPC Extraction, Transformation and Loading (ETL) process that generates the data supporting mousephenotype.org, along with other internal processes.
Download the latest release package from the releases page and decompress it. Then submit your job to your Spark 2 cluster using:
spark-submit --py-files impc_etl.zip,libs.zip main.py
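If you launch the job from a script rather than by hand, the command above can be assembled programmatically. The following is a minimal sketch and not part of the project: `build_submit_command` is a hypothetical helper, and the archive and entry-point names are simply the ones from the release package shown above.

```python
from typing import List, Sequence


def build_submit_command(py_files: Sequence[str], entry_point: str) -> List[str]:
    """Assemble the spark-submit argument list (hypothetical helper)."""
    if not py_files:
        raise ValueError("at least one --py-files archive is required")
    # spark-submit takes --py-files as a single comma-separated value.
    return ["spark-submit", "--py-files", ",".join(py_files), entry_point]


if __name__ == "__main__":
    command = build_submit_command(["impc_etl.zip", "libs.zip"], "main.py")
    print(" ".join(command))
```

Passing the resulting list to `subprocess.run(command, check=True)` avoids shell quoting problems and fails loudly if the submission returns a non-zero exit code.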
- Install Spark 2+ and remember to set the SPARK_HOME environment variable.
- Fork this repo and then clone your forked version:
  git clone https://github.com/USERNAME/impc-etl.git
  cd impc-etl
- Run make to create a venv in the ./.venv path and install the development dependencies in it:
  make devEnv
- Use your favorite IDE to make your awesome changes, and make sure the project points to the generated venv. To do that using PyCharm, go to the instructions here.
- Then update and run the unit tests:
  make test
- Run pylint to check that the code follows best practices:
  make lint
- Finally, commit and push your changes to your fork, then make a pull request to the original repo when you are ready to go. Another member of the team will review your changes, and once you have two +1s you will be ready to merge them into the base repo.
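Before running the tests, it can help to confirm that the SPARK_HOME step from the list above took effect. The sketch below is an illustration only: `resolve_spark_home` is a hypothetical helper, and it assumes a standard Spark distribution layout in which `bin/spark-submit` sits under the SPARK_HOME directory.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_spark_home(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the Spark installation directory or raise a descriptive error."""
    env = os.environ if env is None else env
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        raise EnvironmentError("SPARK_HOME is not set")
    path = Path(spark_home)
    # Assumption: a standard Spark distribution ships bin/spark-submit.
    if not (path / "bin" / "spark-submit").is_file():
        raise EnvironmentError(f"no bin/spark-submit under {path}")
    return path
```

Accepting the environment as a parameter keeps the check easy to exercise in a unit test without modifying the real process environment.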
In order to sync your forked local version with the base repo you need to add an upstream remote:
git remote add upstream https://github.com/mpi2/impc-etl.git
Please keep your version in sync with the base repo to avoid painful merges:
git fetch upstream
git checkout master
git merge upstream/master
git push origin master
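If you sync often, the same sequence can be scripted. This is a rough sketch under stated assumptions rather than a project tool: `sync_commands` and `run_sync` are hypothetical names, and the remote and branch defaults (`upstream`, `origin`, `master`) are taken from the commands above.

```python
import subprocess
from typing import List


def sync_commands(branch: str = "master",
                  upstream: str = "upstream",
                  origin: str = "origin") -> List[List[str]]:
    """List the git invocations that bring a fork up to date (hypothetical helper)."""
    return [
        ["git", "fetch", upstream],
        ["git", "checkout", branch],
        ["git", "merge", f"{upstream}/{branch}"],
        ["git", "push", origin, branch],
    ]


def run_sync(repo_dir: str = ".") -> None:
    """Run each step in order, stopping at the first failure."""
    for command in sync_commands():
        # check=True raises CalledProcessError, e.g. on a merge conflict.
        subprocess.run(command, cwd=repo_dir, check=True)
```

Stopping at the first failing step means a merge conflict is never followed by a push.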
To generate the HTML API documentation with pdoc:
pdoc --html --force --template-dir docs/templates -o docs impc_etl