# CLI Reference
Complete reference for the Sanger DNA Damage Analysis Pipeline command-line interface.
## 🖥️ Overview
The pipeline provides a comprehensive command-line interface (CLI) for all operations. All commands are accessed through the main CLI module:
```bash
python -m src.sanger_pipeline.cli.main [COMMAND] [OPTIONS]
```
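If you drive the CLI from Python instead of the shell, a small helper can turn keyword arguments into the flag style used throughout this reference. This is a hypothetical convenience function, not part of the pipeline. It assumes every option is spelled `--kebab-case` and that boolean flags take no value.

```python
import sys

# Module path used throughout this reference.
CLI_MODULE = "src.sanger_pipeline.cli.main"


def build_command(subcommand, **options):
    """Build an argv list for the pipeline CLI (hypothetical helper).

    Keyword names become flags by replacing underscores with hyphens
    (input_dir -> --input-dir). True adds a bare flag, False or None
    omits the option, and any other value is passed as a string argument.
    """
    cmd = [sys.executable, "-m", CLI_MODULE, subcommand]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            cmd.append(flag)
        elif value is False or value is None:
            continue
        else:
            cmd.extend([flag, str(value)])
    return cmd
```

For example, `subprocess.run(build_command("status", output_dir="./results", json=True))` would run a status check with JSON output.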
## 📋 Available Commands
### run-pipeline
Run the complete analysis pipeline from AB1 files to final results.
Syntax:
```bash
python -m src.sanger_pipeline.cli.main run-pipeline [OPTIONS]
```
Options:
| Option | Type | Description |
|---|---|---|
| `--input-dir` | PATH | **Required.** Directory containing AB1 files |
| `--output-dir` | PATH | **Required.** Directory for pipeline outputs |
| `--config` | PATH | Configuration file path (default: `config/default_config.yaml`) |
| `--quality-threshold` | INTEGER | Override quality threshold from config |
| `--min-length` | INTEGER | Override minimum sequence length from config |
| | FLAG | Overwrite existing output directory |
| `--dry-run` | FLAG | Show what would be processed without running |
| `--verbose` | FLAG | Enable verbose logging |
| `--help` | FLAG | Show help message |
Examples:
```bash
# Basic usage
python -m src.sanger_pipeline.cli.main run-pipeline \
    --input-dir ./ab1_files \
    --output-dir ./results

# With custom configuration
python -m src.sanger_pipeline.cli.main run-pipeline \
    --input-dir ./ab1_files \
    --output-dir ./results \
    --config ./custom_config.yaml

# Override quality threshold
python -m src.sanger_pipeline.cli.main run-pipeline \
    --input-dir ./ab1_files \
    --output-dir ./results \
    --quality-threshold 25

# Dry run to check what will be processed
python -m src.sanger_pipeline.cli.main run-pipeline \
    --input-dir ./ab1_files \
    --output-dir ./results \
    --dry-run
```
Return Codes:
- `0`: Success
- `1`: General error
- `2`: Invalid arguments
- `3`: Input files not found
- `4`: Configuration error
### generate-report
Generate interactive HTML QC reports from pipeline outputs.
Syntax:
```bash
python -m src.sanger_pipeline.cli.main generate-report [OPTIONS]
```
Options:
| Option | Type | Description |
|---|---|---|
| `--output-dir` | PATH | **Required.** Directory containing pipeline outputs |
| `--report-dir` | PATH | Directory for report files (default: `<output-dir>/reports`) |
| `--open-browser` | FLAG | Open report in browser after generation |
| `--title` | TEXT | Custom report title |
| | PATH | Custom HTML template file |
| `--format` | CHOICE | Report format: `html` or `json` (default: `html`) |
| `--help` | FLAG | Show help message |
Examples:
```bash
# Generate report and open in browser
python -m src.sanger_pipeline.cli.main generate-report \
    --output-dir ./results \
    --open-browser

# Custom report location and title
python -m src.sanger_pipeline.cli.main generate-report \
    --output-dir ./results \
    --report-dir ./custom_reports \
    --title "Ancient DNA Analysis Report"

# JSON format for programmatic access
python -m src.sanger_pipeline.cli.main generate-report \
    --output-dir ./results \
    --format json
```
### analyze-damage
Perform ancient DNA damage analysis on sequences.
Syntax:
```bash
python -m src.sanger_pipeline.cli.main analyze-damage [OPTIONS]
```
Options:
| Option | Type | Description |
|---|---|---|
| `--input-dir` | PATH | **Required.** Directory containing FASTA sequences |
| `--output-dir` | PATH | **Required.** Directory for damage analysis results |
| | PATH | Configuration file path |
| `--threshold` | FLOAT | P-value threshold for significance (0.0-1.0) |
| `--iterations` | INTEGER | Bootstrap iterations (1000-100000) |
| | PATH | Reference sequence file |
| `--help` | FLAG | Show help message |
Examples:
```bash
# Basic damage analysis
python -m src.sanger_pipeline.cli.main analyze-damage \
    --input-dir ./results/final \
    --output-dir ./damage_results

# With custom parameters
python -m src.sanger_pipeline.cli.main analyze-damage \
    --input-dir ./results/final \
    --output-dir ./damage_results \
    --threshold 0.01 \
    --iterations 50000
```
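Because `--threshold` and `--iterations` have documented ranges, a wrapper script can reject bad values before starting a long bootstrap run. This is a sketch that assumes both ranges are inclusive. The function name is hypothetical, and this is not the pipeline's own validation code.

```python
def validate_damage_params(threshold, iterations):
    """Check analyze-damage values against the documented ranges.

    threshold:  p-value threshold, documented range 0.0-1.0 (assumed inclusive)
    iterations: bootstrap iterations, documented range 1000-100000 (assumed inclusive)
    Returns the values unchanged, or raises ValueError if one is out of range.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"--threshold must be in [0.0, 1.0], got {threshold}")
    if not isinstance(iterations, int) or not 1000 <= iterations <= 100000:
        raise ValueError(
            f"--iterations must be an integer in [1000, 100000], got {iterations}")
    return threshold, iterations
```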
### status
Check pipeline status and output summary.
Syntax:
```bash
python -m src.sanger_pipeline.cli.main status [OPTIONS]
```
Options:
| Option | Type | Description |
|---|---|---|
| `--output-dir` | PATH | Pipeline output directory to check |
| `--input-dir` | PATH | Original input directory |
| | PATH | Configuration file used |
| `--detailed` | FLAG | Show detailed per-file status |
| `--json` | FLAG | Output status in JSON format |
| `--help` | FLAG | Show help message |
Examples:
```bash
# Basic status check
python -m src.sanger_pipeline.cli.main status \
    --output-dir ./results

# Detailed status with original inputs
python -m src.sanger_pipeline.cli.main status \
    --output-dir ./results \
    --input-dir ./ab1_files \
    --detailed

# JSON output for scripts
python -m src.sanger_pipeline.cli.main status \
    --output-dir ./results \
    --json
```
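Scripts can parse the `--json` output directly. This reference does not document the JSON structure, so the sketch below only parses it and leaves key access to the caller. It also assumes that nothing except the JSON document is written to stdout (for example, that log messages go to stderr). The `get_status` name and its `runner` parameter are hypothetical; `runner` exists so the function can be tested without the pipeline installed.

```python
import json
import subprocess
import sys


def get_status(output_dir, runner=subprocess.run):
    """Run `status --json` for output_dir and return the parsed JSON.

    Raises RuntimeError if the command exits with a non-zero code.
    """
    cmd = [sys.executable, "-m", "src.sanger_pipeline.cli.main",
           "status", "--output-dir", str(output_dir), "--json"]
    result = runner(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"status exited with code {result.returncode}: {result.stderr}")
    return json.loads(result.stdout)
```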
### validate
Validate configuration files and check system requirements.
Syntax:
```bash
python -m src.sanger_pipeline.cli.main validate [OPTIONS]
```
Options:
| Option | Type | Description |
|---|---|---|
| `--config` | PATH | Configuration file to validate |
| `--check-deps` | FLAG | Check external dependencies (MAFFT, etc.) |
| `--check-input` | PATH | Validate input directory |
| `--help` | FLAG | Show help message |
Examples:
```bash
# Validate configuration
python -m src.sanger_pipeline.cli.main validate \
    --config ./my_config.yaml

# Check all dependencies
python -m src.sanger_pipeline.cli.main validate \
    --check-deps

# Validate input directory
python -m src.sanger_pipeline.cli.main validate \
    --check-input ./ab1_files
```
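Sometimes you want to inspect the inputs from Python before calling the CLI. The sketch below is a rough pre-flight check for that case. It is hypothetical and is not how `validate` is implemented. It assumes AB1 files are identified by an `.ab1` extension (in any case) and that MAFFT is invoked as `mafft` from `PATH`.

```python
import shutil
from pathlib import Path


def preflight(input_dir, tools=("mafft",)):
    """Collect AB1 files in input_dir and list external tools missing from PATH."""
    input_path = Path(input_dir)
    if not input_path.is_dir():
        raise FileNotFoundError(f"input directory not found: {input_path}")
    # Case-insensitive suffix match, so both sample.ab1 and SAMPLE.AB1 are found
    ab1_files = sorted(p for p in input_path.iterdir()
                       if p.is_file() and p.suffix.lower() == ".ab1")
    # shutil.which returns None when an executable cannot be found on PATH
    missing_tools = [tool for tool in tools if shutil.which(tool) is None]
    return {"ab1_files": ab1_files, "missing_tools": missing_tools}
```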
### convert
Convert AB1 files to FASTA format only.
Syntax:
```bash
python -m src.sanger_pipeline.cli.main convert [OPTIONS]
```
Options:
| Option | Type | Description |
|---|---|---|
| `--input-dir` | PATH | **Required.** Directory containing AB1 files |
| `--output-dir` | PATH | **Required.** Directory for FASTA outputs |
| `--quality-filter` | FLAG | Apply quality filtering during conversion |
| `--quality-threshold` | INTEGER | Quality threshold for filtering (default: 20) |
| `--help` | FLAG | Show help message |
Examples:
```bash
# Simple conversion
python -m src.sanger_pipeline.cli.main convert \
    --input-dir ./ab1_files \
    --output-dir ./fasta_files

# With quality filtering
python -m src.sanger_pipeline.cli.main convert \
    --input-dir ./ab1_files \
    --output-dir ./fasta_files \
    --quality-filter \
    --quality-threshold 25
```
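This reference does not describe how `--quality-filter` works internally. As an illustration only, a common approach for Sanger reads is to trim low-quality bases from both ends using per-base Phred scores, then drop reads that are too short after trimming. Both functions below are hypothetical, and the `min_length` default of 50 is an assumption. Per-base scores can be read from an AB1 file with Biopython (`SeqIO.read(path, "abi").letter_annotations["phred_quality"]`), but this sketch does not depend on it.

```python
def trim_low_quality_ends(seq, quals, threshold=20):
    """Trim bases from both ends while their Phred score is below threshold."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    start, end = 0, len(seq)
    while start < end and quals[start] < threshold:
        start += 1
    while end > start and quals[end - 1] < threshold:
        end -= 1
    return seq[start:end], list(quals[start:end])


def filter_read(seq, quals, threshold=20, min_length=50):
    """Return the trimmed sequence, or None if it is shorter than min_length."""
    trimmed, _ = trim_low_quality_ends(seq, quals, threshold)
    return trimmed if len(trimmed) >= min_length else None
```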
## 🔧 Global Options
These options work with all commands:
| Option | Type | Description |
|---|---|---|
| `--version` | FLAG | Show pipeline version |
| `--help` | FLAG | Show help for command |
| `--verbose` | FLAG | Enable verbose output |
| | FLAG | Suppress non-error output |
| `--log-file` | PATH | Write logs to file |
| | FLAG | Show configuration help |
Examples:
```bash
# Check version
python -m src.sanger_pipeline.cli.main --version

# Get help for any command
python -m src.sanger_pipeline.cli.main run-pipeline --help

# Verbose logging to file
python -m src.sanger_pipeline.cli.main run-pipeline \
    --verbose \
    --log-file ./pipeline.log \
    --input-dir ./input \
    --output-dir ./output
```
## 📝 Configuration via CLI
Many configuration parameters can be overridden on the command line:
Quality Control Overrides:
```bash
python -m src.sanger_pipeline.cli.main run-pipeline \
    --quality-threshold 25 \
    --min-length 75 \
    --input-dir ./input \
    --output-dir ./output
```
Damage Analysis Overrides:
```bash
python -m src.sanger_pipeline.cli.main analyze-damage \
    --threshold 0.01 \
    --iterations 50000 \
    --input-dir ./sequences \
    --output-dir ./damage
```
## 🔄 Chaining Commands
Commands can be run in sequence to build custom workflows:
```bash
# Step-by-step processing

# 1. Convert AB1 to FASTA
python -m src.sanger_pipeline.cli.main convert \
    --input-dir ./ab1_files \
    --output-dir ./fasta_files

# 2. Run the full pipeline (run-pipeline takes the AB1 files as input)
python -m src.sanger_pipeline.cli.main run-pipeline \
    --input-dir ./ab1_files \
    --output-dir ./results

# 3. Generate report
python -m src.sanger_pipeline.cli.main generate-report \
    --output-dir ./results \
    --open-browser

# 4. Check status
python -m src.sanger_pipeline.cli.main status \
    --output-dir ./results
```
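The same sequence can be scripted in Python so that later steps are skipped when an earlier one fails. The shell version above runs every step regardless of failures. This is a minimal sketch. The step list mirrors the commands above but drops `--open-browser` for unattended runs. The `runner` parameter exists only so the logic can be tested without the pipeline installed.

```python
import subprocess
import sys

CLI = [sys.executable, "-m", "src.sanger_pipeline.cli.main"]

# Same steps as the shell example, without --open-browser.
STEPS = [
    ["convert", "--input-dir", "./ab1_files", "--output-dir", "./fasta_files"],
    ["run-pipeline", "--input-dir", "./ab1_files", "--output-dir", "./results"],
    ["generate-report", "--output-dir", "./results"],
    ["status", "--output-dir", "./results"],
]


def run_chain(steps, runner=subprocess.run):
    """Run steps in order, stopping at the first non-zero exit code.

    Returns (index of the failed step, exit code), or (None, 0) if every
    step succeeds.
    """
    for index, args in enumerate(steps):
        result = runner(CLI + args)
        if result.returncode != 0:
            return index, result.returncode
    return None, 0
```

A script would typically call `failed, code = run_chain(STEPS)` and finish with `sys.exit(code)` so the caller sees the failing step's exit code.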
## 📊 Exit Codes
All commands use the following exit codes:
| Code | Meaning |
|---|---|
| `0` | Success - command completed without errors |
| `1` | General error - something went wrong during execution |
| `2` | Invalid arguments - check command syntax and options |
| `3` | Input error - files not found or invalid input data |
| `4` | Configuration error - invalid or missing configuration |
| `5` | Dependency error - external tools not found or not working |
| `6` | Output error - cannot write to output directory |
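When a script needs to log why a command failed, a lookup table keeps the messages in one place. In this sketch, codes 0 through 4 follow the numbered `run-pipeline` return codes. Codes 5 and 6 assume the remaining rows continue the same numbering, so confirm them against your installed version. `describe_exit_code` is a hypothetical helper.

```python
# Codes 0-4 come from the run-pipeline return-code list; 5 and 6 assume
# the exit-code table continues the same 0-based numbering.
EXIT_CODES = {
    0: "Success",
    1: "General error",
    2: "Invalid arguments",
    3: "Input error",
    4: "Configuration error",
    5: "Dependency error",
    6: "Output error",
}


def describe_exit_code(code):
    """Map a pipeline exit code to its documented meaning."""
    return EXIT_CODES.get(code, f"Unknown exit code {code}")
```

For example, after `result = subprocess.run(cmd)`, calling `print(describe_exit_code(result.returncode))` reports the outcome in words.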
Using exit codes in scripts:
```bash
#!/bin/bash
python -m src.sanger_pipeline.cli.main run-pipeline \
    --input-dir ./input \
    --output-dir ./output
status=$?  # capture immediately: $? is overwritten by the next command

if [ "$status" -eq 0 ]; then
    echo "Pipeline completed successfully"
    python -m src.sanger_pipeline.cli.main generate-report \
        --output-dir ./output \
        --open-browser
else
    echo "Pipeline failed with exit code $status"
    exit "$status"
fi
```
## 🌍 Environment Variables
The CLI respects several environment variables:
| Variable | Description |
|---|---|
| `SANGER_CONFIG` | Default configuration file path |
| `SANGER_OUTPUT_DIR` | Default output directory |
| | Default quality threshold |
| | Temporary directory for processing |
| | MAFFT installation directory |
Using environment variables:
```bash
# Set default configuration
export SANGER_CONFIG=/path/to/my/config.yaml

# Set default output location
export SANGER_OUTPUT_DIR=/data/sanger_results

# Run with environment defaults
python -m src.sanger_pipeline.cli.main run-pipeline \
    --input-dir ./input
```
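This reference does not spell out how environment variables interact with explicit options. A common convention, assumed here, is that an explicit command-line value wins, then the environment variable, then the built-in default. The hypothetical `resolve_option` helper below sketches that order. Environment values always arrive as strings, so numeric settings need a conversion through `cast`.

```python
import os


def resolve_option(cli_value, env_var, default, environ=None, cast=str):
    """Resolve a setting with the assumed precedence CLI > environment > default.

    environ defaults to os.environ; pass a dict to test without touching the
    real environment. cast converts the (string) environment value.
    """
    env = os.environ if environ is None else environ
    if cli_value is not None:
        return cli_value
    raw = env.get(env_var)
    if raw:  # unset or empty string falls through to the default
        return cast(raw)
    return default
```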
## 🔍 Debugging and Troubleshooting
Enable verbose output:
```bash
python -m src.sanger_pipeline.cli.main run-pipeline \
    --verbose \
    --input-dir ./input \
    --output-dir ./output
```
Save logs to file:
```bash
python -m src.sanger_pipeline.cli.main run-pipeline \
    --log-file ./debug.log \
    --input-dir ./input \
    --output-dir ./output
```
Dry run to check inputs:
```bash
python -m src.sanger_pipeline.cli.main run-pipeline \
    --dry-run \
    --input-dir ./input \
    --output-dir ./output
```
Validate before running:
```bash
# Check configuration, dependencies, and input directory
python -m src.sanger_pipeline.cli.main validate \
    --config ./my_config.yaml \
    --check-deps \
    --check-input ./input
```
## 📝 Scripting Examples
Bash script for automated processing:
```bash
#!/bin/bash
# automated_analysis.sh
# Usage: automated_analysis.sh INPUT_DIR OUTPUT_DIR [CONFIG_FILE]

if [ $# -lt 2 ]; then
    echo "Usage: $0 INPUT_DIR OUTPUT_DIR [CONFIG_FILE]" >&2
    exit 2
fi

INPUT_DIR="$1"
OUTPUT_DIR="$2"
CONFIG_FILE="${3:-config/default_config.yaml}"

# Validate inputs
if ! python -m src.sanger_pipeline.cli.main validate \
    --config "$CONFIG_FILE" \
    --check-input "$INPUT_DIR"; then
    echo "Validation failed" >&2
    exit 1
fi

# Run pipeline
python -m src.sanger_pipeline.cli.main run-pipeline \
    --input-dir "$INPUT_DIR" \
    --output-dir "$OUTPUT_DIR" \
    --config "$CONFIG_FILE" \
    --verbose
status=$?

# Generate report and show final status only if the pipeline succeeded
if [ "$status" -eq 0 ]; then
    python -m src.sanger_pipeline.cli.main generate-report \
        --output-dir "$OUTPUT_DIR" \
        --open-browser

    python -m src.sanger_pipeline.cli.main status \
        --output-dir "$OUTPUT_DIR" \
        --detailed
else
    echo "Pipeline failed with exit code $status" >&2
    exit "$status"
fi
```
Python script for batch processing:
```python
#!/usr/bin/env python3
# batch_process.py
import subprocess
import sys
from pathlib import Path


def run_pipeline(input_dir, output_dir, config_file):
    """Run the pipeline for one sample directory; return True on success."""
    cmd = [
        sys.executable, "-m", "src.sanger_pipeline.cli.main",
        "run-pipeline",
        "--input-dir", str(input_dir),
        "--output-dir", str(output_dir),
        "--config", str(config_file),
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        print(f"✓ Successfully processed {input_dir}")
        return True
    print(f"✗ Failed to process {input_dir} "
          f"(exit code {result.returncode}): {result.stderr}")
    return False


def main():
    # Process every subdirectory of ./samples into ./results/<sample name>
    base_dir = Path("./samples")
    output_base = Path("./results")
    config = Path("./config/default_config.yaml")

    failures = 0
    for sample_dir in sorted(base_dir.iterdir()):
        if sample_dir.is_dir():
            output_dir = output_base / sample_dir.name
            if not run_pipeline(sample_dir, output_dir, config):
                failures += 1

    # Exit non-zero if any sample failed, so callers can detect it
    sys.exit(1 if failures else 0)


if __name__ == "__main__":
    main()
```
## 🆕 Enhanced Quality Control Tools
Added in version 2.0.
The enhanced quality control tools provide advanced processing for ancient DNA samples.
### enhanced_hsd_converter.py
Applies comprehensive quality control to pipeline outputs.
Syntax:
```bash
python enhanced_hsd_converter.py
```
Description:
This tool automatically applies enhanced quality control to the most recent pipeline output:
1. Combines consensus sequences from the pipeline output directory
2. Cleans sequences using aDNA-specific algorithms
3. Converts to HSD format with quality filtering
4. Performs diversity analysis with comprehensive reporting
Output Files:
- `{output}_final_cleaned.fasta`: Cleaned consensus sequences
- `{output}_final_high_quality.hsd`: High-quality HSD file
- Console output with a diversity analysis report
Quality Control Features:
- **Artifact Removal**: Eliminates common aDNA artifacts
- **Quality Filtering**: 70% quality threshold by default
- **Diversity Analysis**: Comprehensive genetic diversity assessment
- **Sample Prioritization**: Identifies highest-quality samples
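The feature list does not define what the 70% quality threshold measures. Purely as an illustration, one simple interpretation is the fraction of non-gap positions in a cleaned consensus sequence that are unambiguous bases (A, C, G, T), with sequences scoring below 0.7 dropped. The functions below are hypothetical and may not match the metric that `enhanced_hsd_converter.py` actually uses.

```python
UNAMBIGUOUS = frozenset("ACGT")


def sequence_quality(seq):
    """Fraction of non-gap positions that are A, C, G or T (case-insensitive)."""
    bases = [b for b in seq.upper() if b != "-"]
    if not bases:
        return 0.0
    return sum(b in UNAMBIGUOUS for b in bases) / len(bases)


def keep_high_quality(sequences, threshold=0.7):
    """Keep only sequences (name -> seq) whose quality meets the threshold."""
    return {name: seq for name, seq in sequences.items()
            if sequence_quality(seq) >= threshold}
```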
### Manual Quality Control Tools
For advanced users who want to run quality control steps individually:
#### aDNA Sequence Cleaner
```python
from sanger_pipeline.utils.adna_sequence_cleaner import aDNASequenceCleaner

cleaner = aDNASequenceCleaner(min_length=50, min_quality=0.6)
cleaned_sequences = cleaner.clean_fasta_file("input.fasta", "cleaned.fasta")
```
#### Improved HSD Converter
```python
from sanger_pipeline.utils.improved_fasta_to_hsd_converter import ImprovedFastaToHSDConverter

converter = ImprovedFastaToHSDConverter(min_quality_threshold=0.7)
converter.convert_fasta_to_hsd("cleaned.fasta", "output.hsd")
```
#### HSD Diversity Analyzer
```python
from sanger_pipeline.utils.hsd_diversity_analyzer import HSDDiversityAnalyzer

analyzer = HSDDiversityAnalyzer()
samples = analyzer.parse_hsd_file("output.hsd")
diversity_report = analyzer.analyze_diversity(samples)
```
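The analyzer's internals are not documented here. For orientation, the sketch below parses HaploGrep-style HSD lines (tab-separated: sample ID, range, haplogroup, then variants) and computes Nei's haplotype diversity, `H = n/(n-1) * (1 - Σ p_i²)`, where `p_i` is the frequency of haplotype `i` among `n` samples. The sketch makes two simplifying assumptions. First, it treats samples with identical variant sets as the same haplotype. Second, it ignores differences in sequenced range. Whether `HSDDiversityAnalyzer` uses this measure is also an assumption, and `parse_hsd` and `haplotype_diversity` are hypothetical names.

```python
from collections import Counter


def parse_hsd(lines):
    """Map sample ID -> frozenset of variants from HSD-formatted lines."""
    samples = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and a "SampleId ..." header row
        if not line or line.startswith("#") or line.lower().startswith("sampleid"):
            continue
        fields = line.split("\t")
        # Columns after ID, range and haplogroup hold variants; also split on
        # spaces in case several variants share one column
        variants = frozenset(v for field in fields[3:] for v in field.split())
        samples[fields[0]] = variants
    return samples


def haplotype_diversity(samples):
    """Nei's unbiased haplotype diversity over the parsed samples."""
    n = len(samples)
    if n < 2:
        return 0.0
    counts = Counter(samples.values())
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
```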
This comprehensive CLI reference covers all available commands and options for the Sanger DNA Damage Analysis Pipeline, including the new enhanced quality control features. Use it as a quick reference while working with the pipeline, or for developing automated workflows and scripts.