Installation

This guide walks you through installing the Sanger DNA Damage Analysis Pipeline on your system.

Note: About This Tool

Before installing, please note that this pipeline is designed for sample prioritization and preliminary screening, not for definitive ancient DNA authentication. The tool helps identify promising samples for follow-up NGS analysis based on damage patterns, insert size, and haplogroup assessment.

📋 Prerequisites

System Requirements

  • Python: 3.8 or higher

  • Operating System: Linux, macOS, or Windows

  • Memory: Minimum 4 GB RAM (8 GB or more recommended for large datasets)

  • Storage: 1 GB free space for installation, plus space for your data
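
The Python and storage requirements above can be checked with a few lines of standard-library Python. This is a minimal sketch, not part of the pipeline: the function names `python_ok` and `disk_ok` are hypothetical, and the memory requirement is left out because the standard library has no portable way to query installed RAM.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)            # minimum version from the requirements list
MIN_FREE_BYTES = 1 * 1024**3   # 1 GB for the installation itself


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


def disk_ok(path=".", min_free=MIN_FREE_BYTES):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= min_free


if __name__ == "__main__":
    print(f"Python >= 3.8: {python_ok()}")
    print(f"1 GB free in current directory: {disk_ok()}")
```

Run it from the directory where you plan to install the pipeline, since the disk check applies to the filesystem that holds the given path.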

Required External Tools

The pipeline requires several external bioinformatics tools:

MAFFT (Multiple Sequence Alignment)

Ubuntu/Debian:

sudo apt update
sudo apt install mafft

macOS (using Homebrew):

brew install mafft

Windows:

Download the Windows package from the MAFFT website and follow its installation instructions.

Verify Installation:

mafft --version
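
The same check can be scripted, which is handy on Windows or inside a setup script. The sketch below is an illustration only: `find_tool` and `tool_version` are hypothetical helper names, and the code assumes nothing about MAFFT beyond the `--version` flag shown above.

```python
import shutil
import subprocess


def find_tool(name):
    """Return the full path of an executable on PATH, or None if absent."""
    return shutil.which(name)


def tool_version(name, flag="--version"):
    """Return the tool's version output, or None if the tool is missing."""
    path = find_tool(name)
    if path is None:
        return None
    result = subprocess.run([path, flag], capture_output=True, text=True)
    # Some tools print their version to stderr rather than stdout,
    # so fall back to stderr when stdout is empty.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    version = tool_version("mafft")
    if version is None:
        print("mafft was not found on PATH")
    else:
        print(f"mafft found: {version}")
```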

Bio Module Dependencies

The pipeline uses Biopython, which should be installed automatically, but you may need additional system packages:

Ubuntu/Debian:

sudo apt install python3-dev python3-pip

macOS:

# Usually pre-installed with Python

🚀 Installation Methods

Method 2: Direct Installation from Source

For users who want a cleaner installation without development files:

  1. Download and Extract

    # Download the latest release
    wget https://github.com/allyssonallan/sanger_adna_damage/archive/main.zip
    unzip main.zip
    cd sanger_adna_damage-main
    
  2. Install

    pip install -r requirements.txt
    pip install .
    

🔧 Configuration Setup

  1. Copy Default Configuration

    # Copy the default configuration to a working directory
    cp config/default_config.yaml my_config.yaml
    
  2. Edit Configuration (Optional)

    # Edit with your preferred editor
    nano my_config.yaml
    
  3. Verify Configuration

    python -m src.sanger_pipeline.cli.main status --config my_config.yaml
    

📁 Directory Structure Setup

Create your working directories:

# Create project structure
mkdir -p my_sanger_project/{input,output,config}

# Copy configuration
cp config/default_config.yaml my_sanger_project/config/

# Your directory structure should look like:
my_sanger_project/
├── input/          # Place your AB1 files here
├── output/         # Pipeline results will go here
└── config/         # Configuration files
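
The `mkdir -p ...{input,output,config}` form relies on shell brace expansion, which the Windows command prompt does not provide. A portable alternative can be sketched with `pathlib`; the function name `create_project` is hypothetical and the directory names are simply those shown above.

```python
from pathlib import Path


def create_project(root, subdirs=("input", "output", "config")):
    """Create `root` and its subdirectories; existing ones are left alone."""
    root = Path(root)
    for name in subdirs:
        # parents=True creates `root` itself on the first iteration.
        (root / name).mkdir(parents=True, exist_ok=True)
    return root
```

Calling `create_project("my_sanger_project")` produces the layout shown above. The configuration file still has to be copied into `config/` separately.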

🧪 Testing Installation

Test with Sample Data

# Create test directories
mkdir -p test_run/{input,output}

# If you have sample AB1 files, place them in test_run/input/
# Then run a test analysis:

python -m src.sanger_pipeline.cli.main run-pipeline \
    --input-dir ./test_run/input \
    --output-dir ./test_run/output \
    --config ./config/default_config.yaml
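
Before starting a test run, it can help to confirm that the input directory actually contains AB1 files. The following is a minimal sketch under stated assumptions: `find_ab1_files` is a hypothetical helper, it looks only at the top level of the directory, and it matches on the `.ab1` extension regardless of case. The pipeline's own file discovery may differ.

```python
from pathlib import Path


def find_ab1_files(input_dir):
    """Return sorted paths of top-level files ending in .ab1 (any case)."""
    input_dir = Path(input_dir)
    if not input_dir.is_dir():
        return []
    return sorted(
        p for p in input_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".ab1"
    )


if __name__ == "__main__":
    files = find_ab1_files("./test_run/input")
    print(f"Found {len(files)} AB1 file(s)")
    for path in files:
        print(f"  {path.name}")
```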

Run Unit Tests

# Run the test suite (if available)
python -m pytest tests/

Verify All Components

# Check external tools
mafft --version

# Check Python modules
python -c "import Bio; print(f'BioPython version: {Bio.__version__}')"

# Check pipeline modules
python -c "from src.sanger_pipeline.core.pipeline import SangerPipeline; print('Pipeline import successful')"

# Generate help to verify CLI
python -m src.sanger_pipeline.cli.main --help
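
The module checks above can also be collected into one report instead of separate one-liners. This is a sketch only: `module_available` and `report` are hypothetical names, and the module list simply reuses the import paths from the commands above. It should be run from the repository root so that the `src` package can be located.

```python
import importlib.util


def module_available(name):
    """Return True if the import system can locate `name`."""
    try:
        # For dotted names, find_spec imports the parent packages first.
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError.
        return False


def report(modules):
    """Map each module name to whether it can be located."""
    return {name: module_available(name) for name in modules}


if __name__ == "__main__":
    for name, ok in report(["Bio", "src.sanger_pipeline.core.pipeline"]).items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```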

🔍 Troubleshooting Installation

Common Issues

ModuleNotFoundError: No module named 'Bio'

pip install biopython

MAFFT command not found

Make sure MAFFT is installed and in your PATH:

which mafft  # Should show the path to mafft
echo $PATH   # Check if mafft directory is in PATH

Permission denied errors

Use a virtual environment or install with the --user flag:

pip install --user -r requirements.txt

Python version issues

Check your Python version:

python --version  # Should be 3.8+

Virtual Environment Issues

If you have problems with virtual environments:

# Remove existing environment
rm -rf venv

# Create new environment with explicit Python version
python3.8 -m venv venv  # or python3.9, python3.10, etc.

source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
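
If packages still seem to install into the wrong place, confirm that the interpreter you are running belongs to the activated environment. The sketch below compares `sys.prefix` with `sys.base_prefix`, which differ inside environments created with `python -m venv`. The function name `in_virtualenv` is hypothetical, and environments created by other tools may behave differently.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True if the interpreter is running inside a venv."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return prefix != base_prefix


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Inside a virtual environment: {in_virtualenv()}")
```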

💡 Development Setup

For developers who want to contribute:

  1. Fork the Repository on GitHub

  2. Clone Your Fork

    git clone https://github.com/YOUR_USERNAME/sanger_adna_damage.git
    cd sanger_adna_damage
    
  3. Set Up Development Environment

    # Create development environment
    python3 -m venv dev_env
    source dev_env/bin/activate
    
    # Install development dependencies
    pip install -r requirements.txt
    pip install -e .
    
    # Install development tools (if requirements-dev.txt exists)
    pip install -r requirements-dev.txt
    
  4. Set Up Pre-commit Hooks (if available)

    pre-commit install
    

🎯 Next Steps

Once installation is complete:

  1. Read the Quick Start Guide: Quick Start Guide

  2. Configure the Pipeline: Configuration

  3. Follow Tutorials: Tutorials

  4. Run Your First Analysis: How-To Guides

If you encounter any issues during installation, please check the Troubleshooting guide or open an issue on GitHub.