Changelog

All notable changes to the Sanger DNA Damage Analysis Pipeline will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Version 2.0.0 - 2025-08-18

Enhanced Quality Control Release

Added

  • Enhanced Quality Control Pipeline - Quality control system designed specifically for ancient DNA (aDNA)

  • aDNA Sequence Cleaner - Advanced removal of ancient DNA artifacts and ambiguous nucleotides

  • Improved FASTA to HSD Converter - Enhanced conversion with configurable quality thresholds

  • HSD Diversity Analyzer - Comprehensive genetic diversity assessment and sample comparison

  • Quality Filtering - Sample filtering with a default quality threshold of 70%

  • Sample Prioritization - Automated identification of highest-quality samples

  • Diversity Metrics - Detailed reports on variant counts, sample similarity, and genetic diversity

  • Quality Flags - Automatic detection of potential quality issues and low-quality samples

  • Enhanced Documentation - Complete documentation for new quality control features
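
The quality filtering and sample prioritization entries above can be illustrated with a minimal sketch. It assumes that a sample's quality score is the percentage of unambiguous bases (A, C, G, T) in its sequence; the pipeline's actual metric may differ, and the function names `quality_score` and `filter_and_rank` are hypothetical, not part of the pipeline's API.

```python
from typing import Dict, List, Tuple

# Default threshold in percent, matching the 70% default stated in this release.
DEFAULT_QUALITY_THRESHOLD = 70.0


def quality_score(sequence: str) -> float:
    """Return the percentage of unambiguous bases (A, C, G, T) in a sequence.

    Assumption: this stands in for the pipeline's real quality metric,
    which may be computed differently.
    """
    bases = [b for b in sequence.upper() if b != "-"]  # ignore alignment gaps
    if not bases:
        return 0.0
    valid = sum(1 for b in bases if b in "ACGT")
    return 100.0 * valid / len(bases)


def filter_and_rank(
    samples: Dict[str, str],
    threshold: float = DEFAULT_QUALITY_THRESHOLD,
) -> List[Tuple[str, float]]:
    """Keep samples scoring at or above the threshold, highest score first."""
    scored = [(name, quality_score(seq)) for name, seq in samples.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    demo = {
        "sample_a": "ACGTACGTAC",  # no ambiguous bases
        "sample_b": "ACGNNNNNAC",  # half ambiguous, falls below the threshold
        "sample_c": "ACGTACGTNN",  # two ambiguous bases
    }
    for name, score in filter_and_rank(demo):
        print(f"{name}\t{score:.1f}")
```

Ranking the retained samples by score is one simple way to surface the highest-quality samples first; a real prioritization step could weigh additional criteria.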

Changed

  • Pipeline Output Structure - Enhanced output with quality-controlled files

  • HSD Conversion - The improved regional hybrid method is now the default (average of 52.4 variants per sample)

  • Quality Assessment - More sophisticated quality metrics and reporting

  • Sample Processing - Better handling of ancient DNA sequences and artifacts

  • Documentation - Updated with enhanced pipeline workflows and best practices

Fixed

  • Quality Control - Improved artifact detection and removal for aDNA samples

  • Variant Calling - Better handling of ambiguous nucleotides and alignment artifacts

  • Sample Retention - Quality filtering now balances stringency against sample retention (approximately 60% of samples retained)

  • Performance - Optimized processing for large sample sets

Unreleased

Added

  • Comprehensive Sphinx documentation with step-by-step guides

  • Advanced API documentation with autodoc integration

  • Interactive HTML QC reports with Bootstrap 5 and Chart.js

  • HVS region-aware processing with independent region handling

  • Statistical damage assessment with bootstrap validation

  • Command-line interface with comprehensive options

  • Configuration system with YAML-based parameter management

  • Ancient DNA damage analysis with position-based detection

  • Quality control with Phred score filtering and visualization

  • Batch processing capabilities for multiple samples

  • Error handling and recovery mechanisms

  • Performance monitoring and optimization tools

Changed

  • Updated terminology from “authentication” to “assessment” throughout codebase

  • Improved damage analysis with more robust statistical methods

  • Enhanced report generation with modern web technologies

  • Refactored pipeline architecture for better modularity

  • Optimized memory usage for large datasets

  • Improved error messages and user feedback

Fixed

  • Configuration parameter integration (damage_threshold usage)

  • File pairing logic for AB1 read detection

  • Bootstrap analysis convergence issues

  • Memory leaks in large dataset processing

  • Cross-platform compatibility issues

Removed

  • Legacy Quarto QMD reporting template

  • Deprecated authentication terminology

  • Unused configuration parameters

Security

  • Input validation for all file operations

  • Sanitization of user-provided paths and parameters

  • Protection against path traversal vulnerabilities
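
As an illustration of the path traversal protection listed above, the following sketch resolves a user-supplied path against a base directory and rejects anything that escapes it. The function name `resolve_inside` is hypothetical, and the pipeline's own validation may be implemented differently.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path relative to base_dir and refuse paths outside it.

    Hypothetical helper for illustration only.
    """
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, so the check below
    # also rejects absolute paths that point elsewhere.
    candidate = (base / user_path).resolve()
    # Containment check that works on Python 3.8
    # (Path.is_relative_to requires Python 3.9 or later).
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate


if __name__ == "__main__":
    print(resolve_inside("input", "run1/sample.ab1"))
    try:
        resolve_inside("input", "../../etc/passwd")
    except ValueError as err:
        print(f"rejected: {err}")
```

Resolving both paths before comparing them means that `..` components and symbolic links are collapsed first, so the comparison is made on the final location rather than on the raw string.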

Version 1.0.0 - 2024-01-15

Added

  • Initial release of the Sanger DNA Damage Analysis Pipeline

  • Complete AB1 to consensus sequence workflow

  • Ancient DNA damage pattern detection and analysis

  • Quality control and filtering capabilities

  • HVS region processing for mitochondrial DNA

  • Basic command-line interface

  • Configuration management system

  • Bootstrap statistical validation

  • Report generation functionality

  • Documentation and usage guides

Features

  • AB1 Processing: Convert AB1 files to FASTA with quality filtering

  • Sequence Alignment: Align forward and reverse reads using MAFFT

  • Consensus Building: Generate consensus sequences for HVS regions

  • Damage Analysis: Detect C→T and G→A transitions characteristic of ancient DNA

  • Statistical Validation: Bootstrap analysis with configurable iterations

  • Quality Control: Comprehensive quality assessment and visualization

  • Report Generation: HTML reports with analysis summaries

  • Batch Processing: Process multiple samples efficiently

  • Modular Design: Extensible architecture for custom workflows
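
The damage analysis and statistical validation features above can be sketched as follows. This is a simplified illustration, not the pipeline's implementation: it assumes the reference and sample sequences are already aligned to equal length, and it uses a plain percentile bootstrap over positions. The function names `damage_flags` and `bootstrap_rate` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def damage_flags(reference: str, sample: str) -> List[int]:
    """Flag each reference C or G position: 1 for C->T or G->A, else 0.

    Assumes both sequences are aligned to the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    flags = []
    for ref, obs in zip(reference.upper(), sample.upper()):
        if ref == "C":
            flags.append(1 if obs == "T" else 0)
        elif ref == "G":
            flags.append(1 if obs == "A" else 0)
    return flags


def bootstrap_rate(
    flags: Sequence[int],
    iterations: int = 1000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (observed rate, 2.5th percentile, 97.5th percentile).

    Positions are resampled with replacement; iterations is configurable.
    """
    if not flags:
        raise ValueError("no C or G reference positions to analyse")
    rng = random.Random(seed)  # seeded so the result is reproducible
    n = len(flags)
    observed = sum(flags) / n
    rates = sorted(sum(rng.choices(flags, k=n)) / n for _ in range(iterations))
    lower = rates[int(0.025 * (iterations - 1))]
    upper = rates[int(0.975 * (iterations - 1))]
    return observed, lower, upper


if __name__ == "__main__":
    flags = damage_flags("CCGGACGT", "TCGAACGT")
    observed, lower, upper = bootstrap_rate(flags, iterations=500)
    print(f"damage rate {observed:.2f} (95% interval {lower:.2f} to {upper:.2f})")
```

A higher iteration count narrows the sampling noise in the interval at the cost of run time.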

Supported Formats

  • Input: AB1 trace files from Sanger sequencing

  • Output: FASTA sequences, JSON analysis results, HTML reports

  • Configuration: YAML configuration files

  • Reports: Interactive HTML with embedded visualizations

Dependencies

  • Python 3.8+

  • BioPython for sequence processing

  • MAFFT for sequence alignment

  • Standard scientific Python stack (numpy, matplotlib)

  • Modern web technologies for reporting (Bootstrap, Chart.js)

Documentation

  • Complete installation guide

  • Quick start tutorial

  • Comprehensive API reference

  • Troubleshooting guide

  • Contributing guidelines

  • Step-by-step usage tutorials

Testing

  • Unit tests for core functionality

  • Integration tests for complete workflows

  • Performance benchmarks

  • Cross-platform compatibility testing

Known Issues

  • Large datasets may require significant memory

  • Bootstrap analysis can be time-consuming with high iteration counts

  • Some AB1 files from older sequencers may have compatibility issues

Future Plans

  • Performance optimizations for large-scale studies

  • Additional statistical methods for damage assessment

  • Integration with other ancient DNA analysis tools

  • Enhanced visualization and reporting options

  • Support for additional sequencing formats

  • Cloud computing integration

  • Machine learning-based quality assessment

Migration Notes

This is the initial release, so no migration is required. Future versions will include migration guides for any breaking changes.

Acknowledgments

  • BioPython community for sequence processing tools

  • MAFFT developers for alignment algorithms

  • Scientific community for feedback and testing

  • Contributors and early adopters

Support

  • GitHub Issues: Bug reports and feature requests

  • Documentation: Comprehensive guides and API reference

  • Community: GitHub Discussions for questions and help