Changelog¶

All notable changes to the Sanger DNA Damage Analysis Pipeline will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Version 2.0.0 - 2025-08-18¶

Enhanced Quality Control Release

Added¶

Enhanced Quality Control Pipeline - Comprehensive aDNA-specific quality control system
aDNA Sequence Cleaner - Advanced removal of ancient DNA artifacts and ambiguous nucleotides
Improved FASTA to HSD Converter - Enhanced conversion with configurable quality thresholds
HSD Diversity Analyzer - Comprehensive genetic diversity assessment and sample comparison
Quality Filtering - Advanced filtering with 70% default quality threshold
Sample Prioritization - Automated identification of highest-quality samples
Diversity Metrics - Detailed reports on variant counts, sample similarity, and genetic diversity
Quality Flags - Automatic detection of potential quality issues and low-quality samples
Enhanced Documentation - Complete documentation for new quality control features

Changed¶

Pipeline Output Structure - Enhanced output with quality-controlled files
HSD Conversion - Improved regional hybrid method now default (52.4 avg variants/sample)
Quality Assessment - More sophisticated quality metrics and reporting
Sample Processing - Better handling of ancient DNA sequences and artifacts
Documentation - Updated with enhanced pipeline workflows and best practices

Fixed¶

Quality Control - Improved artifact detection and removal for aDNA samples
Variant Calling - Better handling of ambiguous nucleotides and alignment artifacts
Sample Retention - Optimal balance between quality and sample retention (~60% retention)
Performance - Optimized processing for large sample sets

Unreleased¶

Added¶

Comprehensive Sphinx documentation with step-by-step guides
Advanced API documentation with autodoc integration
Interactive HTML QC reports with Bootstrap 5 and Chart.js
HVS region-aware processing with independent region handling
Statistical damage assessment with bootstrap validation
Command-line interface with comprehensive options
Configuration system with YAML-based parameter management
Ancient DNA damage analysis with position-based detection
Quality control with Phred score filtering and visualization
Batch processing capabilities for multiple samples
Error handling and recovery mechanisms
Performance monitoring and optimization tools

Changed¶

Updated terminology from “authentication” to “assessment” throughout codebase
Improved damage analysis with more robust statistical methods
Enhanced report generation with modern web technologies
Refactored pipeline architecture for better modularity
Optimized memory usage for large datasets
Improved error messages and user feedback

Fixed¶

Configuration parameter integration (damage_threshold usage)
File pairing logic for AB1 read detection
Bootstrap analysis convergence issues
Memory leaks in large dataset processing
Cross-platform compatibility issues

Removed¶

Legacy Quarto QMD reporting template
Deprecated authentication terminology
Unused configuration parameters

Security¶

Input validation for all file operations
Sanitization of user-provided paths and parameters
Protection against path traversal vulnerabilities

Version 1.0.0 - 2024-01-15¶

Added¶

Initial release of the Sanger DNA Damage Analysis Pipeline
Complete AB1 to consensus sequence workflow
Ancient DNA damage pattern detection and analysis
Quality control and filtering capabilities
HVS region processing for mitochondrial DNA
Basic command-line interface
Configuration management system
Bootstrap statistical validation
Report generation functionality
Documentation and usage guides

Features¶

AB1 Processing: Convert AB1 files to FASTA with quality filtering
Sequence Alignment: Align forward and reverse reads using MAFFT
Consensus Building: Generate consensus sequences for HVS regions
Damage Analysis: Detect C→T and G→A transitions characteristic of ancient DNA
Statistical Validation: Bootstrap analysis with configurable iterations
Quality Control: Comprehensive quality assessment and visualization
Report Generation: HTML reports with analysis summaries
Batch Processing: Process multiple samples efficiently
Modular Design: Extensible architecture for custom workflows

Supported Formats¶

Input: AB1 trace files from Sanger sequencing
Output: FASTA sequences, JSON analysis results, HTML reports
Configuration: YAML configuration files
Reports: Interactive HTML with embedded visualizations

Dependencies¶

Python 3.8+
BioPython for sequence processing
MAFFT for sequence alignment
Standard scientific Python stack (numpy, matplotlib)
Modern web technologies for reporting (Bootstrap, Chart.js)

Documentation¶

Complete installation guide
Quick start tutorial
Comprehensive API reference
Troubleshooting guide
Contributing guidelines
Step-by-step usage tutorials

Testing¶

Unit tests for core functionality
Integration tests for complete workflows
Performance benchmarks
Cross-platform compatibility testing

Known Issues¶

Large datasets may require significant memory
Bootstrap analysis can be time-consuming with high iteration counts
Some AB1 files from older sequencers may have compatibility issues

Future Plans¶

Performance optimizations for large-scale studies
Additional statistical methods for damage assessment
Integration with other ancient DNA analysis tools
Enhanced visualization and reporting options
Support for additional sequencing formats
Cloud computing integration
Machine learning-based quality assessment

Migration Notes¶

This is the initial release, so no migration is required. Future versions will include migration guides for any breaking changes.

Acknowledgments¶

BioPython community for sequence processing tools
MAFFT developers for alignment algorithms
Scientific community for feedback and testing
Contributors and early adopters

Support¶

GitHub Issues: Bug reports and feature requests
Documentation: Comprehensive guides and API reference
Community: GitHub Discussions for questions and help