This repository contains the `paftools` Python module, which provides functionality for recovery and data analysis of target capture data in the PAFTOL Project. Some of this functionality should also be useful beyond PAFTOL, e.g. for processing HybSeq data in general. Functionality can be accessed either via the `paftools` script or via the Python API.
The documentation provided here is intended to help beta-testers with getting started.
Description and examples of usage are provided in our usage and tutorial readme.
Building and installing the `paftol` module requires:
- Python 2.7.x (Python 3.x is currently not supported)
- Python 2.7.x development package (`libpython-all-dev`)
- BioPython 1.66 (newer versions are likely to work as well) (`python-biopython`, and also `python-biopython-sql`)
- setuptools (`python-setuptools`)
- epydoc (`python-epydoc`)
- GNU C compiler (`gcc`) and associated tools
- GNU make
The following bioinformatics applications and suites are required for full functionality of the module and the `paftools` script:
- Trimmomatic: to use Trimmomatic via paftools, a small shell script called `trimmomatic` is required, and it must be available from the command line (i.e. located in a directory on your `PATH`):
#!/bin/bash
# pass all arguments through unchanged to the Trimmomatic jar
java -jar <FULL_PATH_TO>/trimmomatic-0.39.jar "$@"
- blast
- spades
- samtools
- bwa
- exonerate
- mafft
- clustalo (aka clustal-omega)
- emboss
- embassy-phylip
- fastqc (currently exactly version 0.11.5 is required)
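Before running the module it can help to check which of these tools are actually reachable from the command line. The following is a minimal sketch, not part of paftools: the helper functions are hypothetical, and the executable names in `ASSUMED_EXECUTABLES` are assumptions about how the packages above are typically installed (for example `blastn` for blast and `spades.py` for spades), so adjust them to your system.

```python
from __future__ import print_function

import os


def find_executable(name, path=None):
    """Return the full path of `name` if it is found on the search path, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        # the candidate must be a regular file that the current user may execute
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_executables(names, path=None):
    """Return those entries of `names` that cannot be found on the search path."""
    return [name for name in names if find_executable(name, path) is None]


# Assumed executable names (not taken from the paftools sources).
ASSUMED_EXECUTABLES = [
    "trimmomatic", "blastn", "spades.py", "samtools", "bwa",
    "exonerate", "mafft", "clustalo", "fastqc",
]

if __name__ == "__main__":
    for name in missing_executables(ASSUMED_EXECUTABLES):
        print("not found on PATH: %s" % name)
```

The search is written by hand rather than with `shutil.which` so that the sketch also runs under Python 2.7, which the module itself requires.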
Additional prerequisites for PAFTOL internal use include:
- Python `mysql.connector`
These prerequisites should generally be provided on the cluster.
- Clone the repository
git clone https://github.com/RBGKew/pypaftol
- Install by running the command
make hinstall
This will install the package in `$HOME/lib/python`, which is the standard directory for installing Python modules for use in your account only. You'll need to ensure that your `PYTHONPATH` environment variable includes this directory; see the Tips section below.
- Check that the installation was successful by running
paftools -h
This should give you a help message listing the `paftools` subcommands currently available.
- If you would like an HTML version of the API documentation for the `paftools` package and its subpackages, run the command
make doc
At the time of writing this README, this installation process works on the cluster. Feedback is very welcome.
Additional information about installation is available [here](Advanced_Install.md).
`PYTHONPATH` is an environment variable which the Python interpreter uses to obtain a list of directories to search for modules when executing an `import` statement. By default, this variable won't include any directories in your home directory, so if you want to install modules in your personal space, you'll need to add the directory where you install them. This can be done with the following snippet of bash code:
if test -z "$PYTHONPATH" ; then
PYTHONPATH=${HOME}/lib/python
else
PYTHONPATH="${HOME}/lib/python:${PYTHONPATH}"
fi
export PYTHONPATH
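To confirm that the setting has taken effect, you can ask the interpreter whether the install directory appears on its module search path. This is a small sketch under the assumption that modules were installed in `$HOME/lib/python` as described above; the function `is_on_module_path` is a hypothetical helper and not part of paftools.

```python
from __future__ import print_function

import os
import sys


def is_on_module_path(directory, search_path=None):
    """Return True if `directory` is one of the entries of the module search path."""
    if search_path is None:
        search_path = sys.path
    # resolve "~" and symbolic links so equivalent spellings compare equal
    target = os.path.realpath(os.path.expanduser(directory))
    return any(os.path.realpath(entry) == target for entry in search_path if entry)


if __name__ == "__main__":
    install_dir = os.path.join("~", "lib", "python")
    if is_on_module_path(install_dir):
        print("%s is on the module search path" % install_dir)
    else:
        print("%s is missing; check your PYTHONPATH" % install_dir)
```

Run it in a fresh login shell, since changes to your shell startup files only apply to shells started afterwards.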
- identify the (mandatory) parameters and options required by `foo`
- write a function `addFooParser` that takes an argparse parser as an argument and adds the relevant parameters and options to it
- write a `runFoo` function that takes a single `argNamespace` parameter, containing the argument namespace generated by the parser, and uses the attributes in that namespace to execute the command
- finally, wire everything up by calling `addFooParser` in `paftoolsMain` and by calling `p.set_defaults(func=runFoo)` on the subparser
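The steps above can be sketched as follows. This is an illustration only: the `foo` subcommand, its `infile` and `--repeat` arguments, and the body of `paftoolsMain` are hypothetical stand-ins and do not reproduce the real paftools code. The dispatch relies on the standard argparse method `set_defaults`, which stores `runFoo` in the parsed namespace.

```python
import argparse


def addFooParser(p):
    """Add the parameters and options of the hypothetical 'foo' subcommand to parser p."""
    p.add_argument("-n", "--repeat", type=int, default=1, help="number of repetitions")
    p.add_argument("infile", help="input file name (mandatory)")


def runFoo(argNamespace):
    """Execute 'foo' using the attributes of the parsed argument namespace."""
    # placeholder behaviour: repeat the input file name the requested number of times
    return [argNamespace.infile] * argNamespace.repeat


def paftoolsMain(argv=None):
    """Simplified stand-in for the main function (assumed structure)."""
    parser = argparse.ArgumentParser(prog="paftools")
    subparsers = parser.add_subparsers(dest="subcommand")
    p = subparsers.add_parser("foo", help="example subcommand (hypothetical)")
    addFooParser(p)
    # store the run function in the namespace so it can be dispatched below
    p.set_defaults(func=runFoo)
    args = parser.parse_args(argv)
    if not hasattr(args, "func"):
        parser.error("a subcommand is required")
    return args.func(args)
```

With this wiring, calling `paftoolsMain(["foo", "-n", "2", "reads.fastq"])` parses the arguments and hands the resulting namespace to `runFoo`. Keeping the parser setup and the run function separate means each subcommand can be added without touching the others.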