Troubleshooting¶
Common issues, solutions, and debugging strategies for the Sanger DNA Damage Analysis Pipeline.
🚨 Quick Diagnostic Commands¶
Before diving into specific issues, run these diagnostic commands:
# Check pipeline installation
python -c "from src.sanger_pipeline.core.pipeline import SangerPipeline; print('✓ Pipeline imported successfully')"
# Check external dependencies
mafft --version
# Validate your configuration
python -m src.sanger_pipeline.cli.main validate --config your_config.yaml --check-deps
# Check input files
python -m src.sanger_pipeline.cli.main validate --check-input ./your_input_dir
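If the `validate` command itself cannot run (for example because the package does not import), the same dependency checks can be approximated from plain Python. This is a minimal sketch, not part of the pipeline: the function name `check_dependencies` is hypothetical, and it only tests whether `mafft` is on `PATH` and whether the `Bio` module can be located.

```python
import importlib.util
import shutil


def check_dependencies(executables=("mafft",), modules=("Bio",)):
    """Return a dict mapping each dependency name to True if it was found."""
    status = {}
    for exe in executables:
        # shutil.which searches PATH the same way the shell would.
        status[exe] = shutil.which(exe) is not None
    for mod in modules:
        # find_spec locates a top-level module without importing it.
        status[mod] = importlib.util.find_spec(mod) is not None
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{'OK     ' if found else 'MISSING'}  {name}")
```

A `MISSING` line points to the matching section below (MAFFT Not Found or BioPython Issues).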
🔧 Installation and Setup Issues¶
Pipeline Won’t Import¶
Error: ModuleNotFoundError: No module named 'src.sanger_pipeline'
Solutions:
Check Installation:
# Verify you're in the right directory
ls -la src/sanger_pipeline/
# Install in development mode
pip install -e .
Python Path Issues:
# Add to Python path temporarily
export PYTHONPATH="${PYTHONPATH}:$(pwd)"
# Or use absolute imports
python -m src.sanger_pipeline.cli.main --help
Virtual Environment:
# Make sure virtual environment is activated
which python
# Check if BioPython is installed
pip list | grep -i bio
MAFFT Not Found¶
Error: MAFFT executable not found
or Command 'mafft' not found
Solutions:
macOS:
# Install via Homebrew
brew install mafft
# Verify installation
mafft --version
Ubuntu/Debian:
# Update package list and install
sudo apt update
sudo apt install mafft
# Verify installation
which mafft
Manual Installation:
# Download and install manually
wget https://mafft.cbrc.jp/alignment/software/mafft-7.490-without-extensions-src.tgz
tar -xzf mafft-7.490-without-extensions-src.tgz
cd mafft-7.490-without-extensions/core/
make clean
make
sudo make install
Path Issues:
# Check if MAFFT is in PATH
echo $PATH
# Add MAFFT directory to PATH
export PATH="/usr/local/bin:$PATH"
# Make permanent in ~/.bashrc or ~/.zshrc
echo 'export PATH="/usr/local/bin:$PATH"' >> ~/.bashrc
BioPython Issues¶
Error: ImportError: No module named 'Bio'
Solutions:
# Install BioPython
pip install biopython
# If installation fails, try with conda
conda install -c conda-forge biopython
# Verify installation
python -c "import Bio; print(f'BioPython version: {Bio.__version__}')"
Common BioPython Installation Issues:
# If you get compiler errors on macOS
xcode-select --install
# If you get permission errors
pip install --user biopython
📁 File and Data Issues¶
No AB1 Files Found¶
Error: No AB1 files found in input directory
Diagnostic Steps:
# Check file extensions
ls -la input/ | grep -i ab1
# Check for hidden characters or wrong extensions
file input/*
Solutions:
Check File Extensions:
# Rename files with wrong extensions
for file in input/*.AB1; do
    mv "$file" "${file%.AB1}.ab1"
done
Check File Permissions:
# Fix permissions
chmod 644 input/*.ab1
Verify File Format:
# Check if files are actually AB1 format
python -c "
from Bio import SeqIO
try:
    record = SeqIO.read('input/sample.ab1', 'abi')
    print('✓ File is valid AB1 format')
except Exception as e:
    print(f'✗ File is not valid AB1 format: {e}')
"
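The extension check can also be done without renaming anything first. The sketch below sorts file names into those that already end in lowercase `.ab1` and those that differ only by case. It assumes, based on the rename step above, that the pipeline matches the lowercase extension exactly; `classify_ab1_names` is a hypothetical helper, not a pipeline function.

```python
import os
from pathlib import PurePath


def classify_ab1_names(names):
    """Sort file names into '.ab1' files, wrong-case variants, and the rest."""
    result = {"ok": [], "wrong_case": [], "other": []}
    for name in sorted(names):
        suffix = PurePath(name).suffix
        if suffix == ".ab1":
            result["ok"].append(name)
        elif suffix.lower() == ".ab1":
            # e.g. '.AB1' or '.Ab1': same format, different case
            result["wrong_case"].append(name)
        else:
            result["other"].append(name)
    return result


if __name__ == "__main__":
    # "input" is the example directory used throughout this guide.
    if os.path.isdir("input"):
        print(classify_ab1_names(os.listdir("input")))
```

Anything listed under `wrong_case` is a candidate for the rename loop; an empty `ok` list with a non-empty `other` list suggests the wrong directory was passed.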
Corrupted AB1 Files¶
Error: Error reading AB1 file
or Invalid trace file format
Diagnostic:
# Check file size and integrity
ls -lh input/*.ab1
# Test with BioPython
python -c "
from Bio import SeqIO
import sys
try:
    record = SeqIO.read(sys.argv[1], 'abi')
    print(f'Sequence length: {len(record.seq)}')
    print(f'Quality scores available: {\"phred_quality\" in record.letter_annotations}')
except Exception as e:
    print(f'Error: {e}')
" input/sample.ab1
Solutions:
Re-download Files: Get fresh copies from sequencing provider
Check Transfer Method: Ensure files weren’t corrupted during transfer
Alternative Conversion: Try manual conversion with other tools
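Before re-downloading, it can help to tell a truncated or mislabeled file from a parser problem. ABIF trace files begin with the four-byte signature `ABIF`, so a quick look at the first bytes catches empty files and files that are really something else (an HTML error page saved with an `.ab1` name, for instance). The helper names below are hypothetical, and a passing signature does not prove the rest of the file is intact.

```python
import os


def looks_like_abif(header: bytes) -> bool:
    """True if the leading bytes carry the 'ABIF' signature."""
    return header[:4] == b"ABIF"


def triage_ab1(path):
    """Return a short verdict for one file without parsing it."""
    if os.path.getsize(path) == 0:
        return "empty file"
    with open(path, "rb") as handle:
        header = handle.read(4)
    return "signature ok" if looks_like_abif(header) else "not an ABIF file"
```

For example, `triage_ab1("input/sample.ab1")` returning `"signature ok"` while BioPython still fails points to damage further into the file, which is when a fresh copy from the sequencing provider is the most likely fix.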
File Pairing Issues¶
Error: Cannot pair forward and reverse reads
Diagnostic:
# Check naming patterns
ls -la input/ | grep -E "_[FR]\.ab1|_[12]\.ab1|_(forward|reverse)\.ab1"
Solutions:
Fix Naming Convention:
# Rename to standard pattern
mv sample_forward.ab1 sample_F.ab1
mv sample_reverse.ab1 sample_R.ab1
Manual Pairing: Edit configuration to specify custom pairing rules
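To see which samples lack a partner before renaming anything, the file names can be grouped by sample. This sketch uses the suffixes from the grep pattern above (`_F`/`_R`, `_1`/`_2`, `_forward`/`_reverse`); which of these the pipeline itself accepts is an assumption, and `pair_reads` is a hypothetical helper.

```python
import re

# Suffixes taken from the diagnostic grep pattern; the pipeline's own
# pairing rules may be narrower.
_SUFFIX = re.compile(r"^(?P<base>.+)_(?P<tag>F|R|1|2|forward|reverse)\.ab1$")
_FORWARD_TAGS = {"F", "1", "forward"}


def pair_reads(names):
    """Group file names by sample; return (paired, unpaired) dicts."""
    samples = {}
    for name in names:
        match = _SUFFIX.match(name)
        if match is None:
            continue  # not a recognised read file name
        direction = "forward" if match["tag"] in _FORWARD_TAGS else "reverse"
        samples.setdefault(match["base"], {})[direction] = name
    paired = {b: d for b, d in samples.items() if len(d) == 2}
    unpaired = {b: d for b, d in samples.items() if len(d) == 1}
    return paired, unpaired
```

Samples that appear in `unpaired` are the ones to rename or to look for in the original delivery.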
⚙️ Processing and Analysis Issues¶
Quality Filtering Removes All Sequences¶
Error: No sequences passed quality filtering
Diagnostic:
# Check original sequence quality
python -c "
from Bio import SeqIO
record = SeqIO.read('input/sample_F.ab1', 'abi')
qualities = record.letter_annotations['phred_quality']
print(f'Average quality: {sum(qualities)/len(qualities):.1f}')
print(f'Min quality: {min(qualities)}')
print(f'Max quality: {max(qualities)}')
"
Solutions:
Lower Quality Threshold:
# Try with lower threshold
python -m src.sanger_pipeline.cli.main run-pipeline \
    --input-dir ./input \
    --output-dir ./output_lowq \
    --quality-threshold 10
Check Input Quality:
# Generate quality plots first
python -m src.sanger_pipeline.cli.main convert \
    --input-dir ./input \
    --output-dir ./converted \
    --quality-filter
Adjust Minimum Length:
# In configuration file
quality_threshold: 15
min_sequence_length: 25
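To choose a threshold rather than guess one, it helps to see how many bases of a read survive at each setting. The sketch below works on the `phred_quality` list from the diagnostic above. It assumes a read is kept when its longest run of bases at or above the threshold reaches the minimum length; the pipeline's actual filtering rule may differ, and `passing_summary` is a hypothetical helper.

```python
def passing_summary(qualities, threshold, min_length):
    """Summarise how a Phred threshold affects one read.

    Returns (fraction of bases >= threshold,
             longest run of passing bases,
             whether that run reaches min_length).
    """
    if not qualities:
        return 0.0, 0, False
    passing = [q >= threshold for q in qualities]
    longest = current = 0
    for ok in passing:
        # Extend the current run on a passing base, reset on a failing one.
        current = current + 1 if ok else 0
        longest = max(longest, current)
    fraction = sum(passing) / len(passing)
    return fraction, longest, longest >= min_length
```

Calling it for thresholds such as 10, 15, and 20 on the same read shows where the usable stretch collapses, which is a more informative basis for `quality_threshold` and `min_sequence_length` than the average quality alone.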
Alignment Failures¶
Error: MAFFT alignment failed
or Empty alignment result
Diagnostic:
# Test MAFFT manually
mafft --version
mafft --auto test_sequences.fasta
Solutions:
Check Sequence Compatibility:
# Ensure sequences are from same organism/region
head -20 output/filtered/sample*_filtered.fasta
Manual Alignment Test:
# Test alignment manually
cat output/filtered/sample_F_filtered.fasta output/filtered/sample_R_filtered.fasta > test_align.fasta
mafft test_align.fasta > aligned_test.fasta
Alternative Alignment Parameters:
Modify pipeline to use different MAFFT parameters for difficult sequences.
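As an illustration of what different parameters could look like, the sketch below builds MAFFT command lines for three modes and runs one outside the pipeline. The options themselves (`--auto`, `--adjustdirection`, `--localpair --maxiterate 1000`) are standard MAFFT flags; the mode names, `mafft_command`, and `run_mafft` are hypothetical, and how the pipeline invokes MAFFT internally is not shown here.

```python
import shutil
import subprocess

# Standard MAFFT options; the mode names are labels made up for this sketch.
MAFFT_MODES = {
    "auto": ["--auto"],
    # Lets MAFFT reverse-complement sequences when that aligns better.
    "adjust": ["--auto", "--adjustdirection"],
    # L-INS-i: slower, usually more accurate for small sequence sets.
    "linsi": ["--localpair", "--maxiterate", "1000"],
}


def mafft_command(fasta_path, mode="auto"):
    """Build the argument list for one MAFFT run."""
    return ["mafft", *MAFFT_MODES[mode], fasta_path]


def run_mafft(fasta_path, mode="auto"):
    """Run MAFFT and return (alignment_text, stderr_text)."""
    if shutil.which("mafft") is None:
        raise FileNotFoundError("mafft is not on PATH")
    proc = subprocess.run(
        mafft_command(fasta_path, mode),
        capture_output=True,
        text=True,
        check=False,
    )
    # MAFFT writes the alignment to stdout and progress messages to stderr.
    if proc.returncode != 0 or not proc.stdout.strip():
        raise RuntimeError(f"MAFFT failed or produced no output:\n{proc.stderr}")
    return proc.stdout, proc.stderr
```

If `run_mafft("test_align.fasta", "adjust")` succeeds where the default fails, read orientation is a likely cause; if only `"linsi"` succeeds, the sequences are probably divergent or overlap only partially.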
No HVS Regions Detected¶
Error: No HVS regions found in consensus sequence
Diagnostic:
# Check sequence content and length
python -c "
from Bio import SeqIO
for record in SeqIO.parse('output/consensus/sample_consensus.fasta', 'fasta'):
    print(f'Sequence ID: {record.id}')
    print(f'Length: {len(record.seq)}')
    print(f'First 100 bp: {record.seq[:100]}')
"
Solutions:
Check Sequence Type:
- Ensure sequences are mitochondrial DNA
- Verify they contain hypervariable regions
Adjust HVS Coordinates:
# Modify in configuration
hvs_regions:
  HVS1:
    start: 16000  # More permissive coordinates
    end: 16400
Manual Region Identification: Use BLAST or other tools to identify actual sequence content.
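Once the position of a read on the reference is known (from BLAST, for example), a simple interval check shows whether it can overlap a configured region at all. This sketch assumes regions are defined by 1-based inclusive reference coordinates, as in the `hvs_regions` configuration above; how the pipeline itself locates regions is not documented here, and `overlapping_regions` is a hypothetical helper.

```python
def overlapping_regions(covered_start, covered_end, regions):
    """Return {region_name: overlap_length} for regions the read intersects.

    All coordinates are 1-based and inclusive.
    """
    hits = {}
    for name, (start, end) in regions.items():
        overlap = min(covered_end, end) - max(covered_start, start) + 1
        if overlap > 0:
            hits[name] = overlap
    return hits


if __name__ == "__main__":
    # HVS1 window copied from the configuration example in this guide.
    config = {"HVS1": (16000, 16400)}
    print(overlapping_regions(16350, 16500, config))
```

An empty result means no coordinate adjustment within that window will help, and the sequence content itself should be checked.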
🧬 Ancient DNA Analysis Issues¶
Bootstrap Analysis Fails¶
Error: Bootstrap analysis failed
or Insufficient data for bootstrap
Diagnostic:
# Check sequence count and length
python -c "
from Bio import SeqIO
sequences = list(SeqIO.parse('output/final/sample_final.fasta', 'fasta'))
print(f'Number of sequences: {len(sequences)}')
for i, seq in enumerate(sequences):
    print(f'Sequence {i+1}: {len(seq.seq)} bp')
"
Solutions:
Reduce Bootstrap Iterations:
# In configuration
bootstrap_iterations: 1000  # Reduced from 10000
Check Minimum Requirements:
- Sequences should be longer than 50 bp
- Need sufficient sequence data for statistical analysis
Alternative Analysis:
# Run damage analysis with different parameters
python -m src.sanger_pipeline.cli.main analyze-damage \
    --input-dir ./output/final \
    --output-dir ./damage_simple \
    --iterations 1000
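The trade-off behind these settings can be seen in a generic percentile bootstrap. The sketch below is not the pipeline's damage model: it resamples a list of 0/1 observations (one per informative site, 1 where a substitution was seen) and returns the observed rate with a 95% interval. `bootstrap_rate` and its inputs are hypothetical.

```python
import random


def bootstrap_rate(events, iterations=1000, seed=0):
    """Percentile bootstrap for the mean of a 0/1 list.

    Returns (point_estimate, lower_95, upper_95).
    """
    if not events:
        raise ValueError("no informative sites to resample")
    rng = random.Random(seed)
    n = len(events)
    # Each iteration draws n sites with replacement and records the rate.
    means = sorted(
        sum(rng.choices(events, k=n)) / n for _ in range(iterations)
    )
    lower = means[int(0.025 * (iterations - 1))]
    upper = means[int(0.975 * (iterations - 1))]
    return sum(events) / n, lower, upper
```

Runtime grows linearly with `iterations`, which is why dropping from 10000 to 1000 helps. The width of the interval, on the other hand, is driven mostly by the number of sites, so very short sequences give wide, unstable intervals no matter how many iterations are run.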
Unrealistic Damage Scores¶
Issue: Damage scores don’t match expected values
Diagnostic:
# Examine damage analysis details
python -c "
import json
with open('output/damage_analysis/sample_damage_analysis.json') as f:
    data = json.load(f)
print('Damage score:', data['damage_score'])
print('C→T rate:', data['c_to_t_rate'])
print('G→A rate:', data['g_to_a_rate'])
print('Background rate:', data['background_rate'])
"
Solutions:
Check Reference Expectations:
- Modern DNA: damage score < 0.2
- Ancient DNA: damage score > 0.3
- Consider sample age and preservation
Validate with Controls:
- Include known modern samples
- Include known ancient samples
- Compare patterns across samples
Review Sample History:
- Check extraction methods
- Review storage conditions
- Consider contamination sources
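The reference expectations above leave a gap between 0.2 and 0.3. A small helper makes that explicit when screening many samples. The cutoffs are the ones stated in this guide; the function name and the "indeterminate" label are assumptions made for this sketch.

```python
def interpret_damage_score(score, modern_max=0.2, ancient_min=0.3):
    """Map a damage score onto the reference expectations in this guide."""
    if score < modern_max:
        return "consistent with modern DNA"
    if score > ancient_min:
        return "consistent with ancient DNA"
    # Scores from modern_max to ancient_min fall between the two ranges.
    return "indeterminate"
```

Applied to the `damage_score` field of each `*_damage_analysis.json` file, this quickly shows whether the known modern and known ancient controls land where expected. Samples labeled indeterminate are the ones where sample history and contamination sources deserve the closest look.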
📊 Report Generation Issues¶
Report Generation Fails¶
Error: Failed to generate QC report
Diagnostic:
# Check output directory structure
tree output/
# Test report generation with verbose output
python -m src.sanger_pipeline.cli.main generate-report \
--output-dir ./output \
--verbose
Solutions:
Check Dependencies:
# Ensure all required files exist
ls -la output/damage_analysis/
ls -la output/final/
Permissions:
# Check write permissions
touch output/reports/test_file.html
rm output/reports/test_file.html
Manual Report Generation:
# Test report generation in Python
from src.sanger_pipeline.utils.report_generator import ReportGenerator
generator = ReportGenerator()
report_path = generator.generate_report("./output")
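The directory checks can be scripted so they run before every report attempt. This sketch only looks for the two subdirectories listed under Check Dependencies (`damage_analysis` and `final`); whether the report generator needs anything else is not known here, and `missing_report_inputs` is a hypothetical helper.

```python
from pathlib import Path

# Directory names as used in this guide's examples.
REQUIRED_DIRS = ("damage_analysis", "final")


def missing_report_inputs(output_dir):
    """List required subdirectories that are absent or empty."""
    root = Path(output_dir)
    problems = []
    for name in REQUIRED_DIRS:
        sub = root / name
        if not sub.is_dir():
            problems.append(f"{name}: missing")
        elif not any(sub.iterdir()):
            problems.append(f"{name}: empty")
    return problems
```

An empty list from `missing_report_inputs("./output")` means the inputs are present, so the failure is more likely a permissions or template problem than an incomplete pipeline run.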
Browser Won’t Open Report¶
Issue: Report generates but browser doesn’t open
Solutions:
# Open report manually
open output/reports/qc_report_*.html # macOS
xdg-open output/reports/qc_report_*.html # Linux
# Or specify browser
firefox output/reports/qc_report_*.html
🐛 Performance and Memory Issues¶
Slow Processing¶
Issue: Pipeline takes very long to run
Diagnostic:
# Monitor resource usage
top -p "$(pgrep -d, python)" # Linux (comma-separated PIDs if several Python processes are running)
# or
open -a "Activity Monitor" # macOS
Solutions:
Reduce Bootstrap Iterations:
bootstrap_iterations: 1000 # Instead of 10000
Process Smaller Batches:
# Split large datasets
mkdir batch1 batch2
mv input/sample1*.ab1 batch1/
mv input/sample2*.ab1 batch2/
Increase Quality Threshold:
quality_threshold: 25 # Higher threshold = less data to process
Memory Issues¶
Error: MemoryError
or system becomes unresponsive
Solutions:
Process Individual Samples:
# Process one sample at a time
for sample in input/*_F.ab1; do
    base=$(basename "$sample" _F.ab1)
    mkdir -p "temp_$base"
    cp "input/${base}_F.ab1" "input/${base}_R.ab1" "temp_$base/"
    python -m src.sanger_pipeline.cli.main run-pipeline \
        --input-dir "temp_$base" \
        --output-dir "output_$base"
done
Adjust System Limits:
# Check the current virtual memory limit (Linux); "unlimited" means no cap is set
ulimit -v
# Set the limit for this shell in KB (4000000 KB is about 4 GB).
# A limit lower than the pipeline needs will itself cause MemoryError.
ulimit -v 4000000
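To judge whether per-sample processing or a different limit is needed, it helps to know how much memory a run actually peaks at. The sketch below reads the peak resident set size of the current process using the standard `resource` module (Unix only). `peak_memory_mb` is a hypothetical helper; it would be called at the end of a script that runs the pipeline in-process.

```python
import resource
import sys


def peak_memory_mb():
    """Peak resident memory of this process in megabytes (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    print(f"Peak memory so far: {peak_memory_mb():.1f} MB")
```

Comparing the peak for one sample with the memory available gives a rough upper bound on how many samples can be processed in one batch.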
🔍 Advanced Debugging¶
Enable Debug Logging¶
# Run with maximum verbosity
python -m src.sanger_pipeline.cli.main run-pipeline \
--input-dir ./input \
--output-dir ./output \
--verbose \
--log-file debug.log
# Examine log file
tail -f debug.log
Python Debugging¶
# Debug pipeline programmatically
import logging
logging.basicConfig(level=logging.DEBUG)
from src.sanger_pipeline.core.pipeline import SangerPipeline
pipeline = SangerPipeline(debug=True)
try:
    results = pipeline.run("./input", "./output")
except Exception as e:
    import traceback
    traceback.print_exc()
Isolate Problem Stage¶
# Test each stage individually
# 1. Test AB1 conversion only
python -m src.sanger_pipeline.cli.main convert \
--input-dir ./input \
--output-dir ./test_convert
# 2. Test damage analysis only
python -m src.sanger_pipeline.cli.main analyze-damage \
--input-dir ./existing_sequences \
--output-dir ./test_damage
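When the stages are driven from Python rather than from the CLI, a small context manager can record which stage failed and how long each one took. This is a generic sketch: the stage names are placeholders, and the calls placed inside each `with` block would be whatever pipeline functions are being tested.

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("pipeline.debug")


@contextmanager
def stage(name, timings):
    """Time one stage, log a traceback if it fails, then re-raise."""
    start = time.perf_counter()
    log.debug("starting %s", name)
    try:
        yield
    except Exception:
        # log.exception records the full traceback with the stage name.
        log.exception("stage %s failed", name)
        raise
    finally:
        timings[name] = time.perf_counter() - start


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    timings = {}
    with stage("convert", timings):
        pass  # call the conversion step here
    print(timings)
```

The `timings` dict also answers the Slow Processing question directly, since it shows which stage dominates the runtime.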
📞 Getting Help¶
Before Asking for Help¶
Check this troubleshooting guide for your specific issue
Run diagnostic commands to gather information
Try simple solutions first (restart, reinstall, etc.)
Prepare detailed information about your problem
Information to Include¶
When reporting issues, include:
Error messages (complete text, not screenshots)
Command that failed (exact command line)
Configuration file (if using custom config)
System information:
# Gather system info
python --version
pip list | grep -i bio
mafft --version
uname -a # Linux/macOS
Sample data characteristics:
- Number of AB1 files
- Approximate file sizes
- Expected sample types (modern/ancient)
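The system information can also be collected in one step and pasted straight into a report. This sketch uses only the standard library; `environment_report` is a hypothetical helper, and it reports where `mafft` was found rather than its version, since it does not run the executable.

```python
import platform
import shutil
import sys
from importlib import metadata


def environment_report(packages=("biopython",)):
    """Return a bullet list describing the local environment."""
    lines = [
        f"- OS: {platform.platform()}",
        f"- Python version: {sys.version.split()[0]}",
    ]
    for pkg in packages:
        try:
            version = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"- {pkg}: {version}")
    lines.append(f"- MAFFT path: {shutil.which('mafft') or 'not found'}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

The output uses the same bullet style as the Environment section of a bug report, so it can be pasted without editing.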
Where to Get Help¶
GitHub Issues: https://github.com/allyssonallan/sanger_adna_damage/issues
Documentation: Check all relevant documentation sections
Community Discussions: GitHub Discussions for general questions
Creating Good Bug Reports¶
## Bug Report Template
**Problem Description**: Brief description of what's wrong
**Expected Behavior**: What should happen
**Actual Behavior**: What actually happens
**Error Message**:
```
Paste exact error message here
```
**Command Used**:
```bash
python -m src.sanger_pipeline.cli.main run-pipeline --input-dir ./input --output-dir ./output
```
**Environment**:
- OS: macOS 12.0 / Ubuntu 20.04 / Windows 11
- Python version: 3.9.7
- Pipeline version: 1.0.0
- MAFFT version: 7.490
**Configuration** (if using custom config):
```yaml
quality_threshold: 20
# ... other relevant settings
```
**Sample Data**:
- Number of AB1 files: 4
- Sample types: Ancient DNA from bone
**Additional Context**: Any other relevant information
This guide covers the most common issues with the Sanger DNA Damage Analysis Pipeline. If your problem is not listed, open a GitHub issue using the template above.