Contributing¶

Thank you for your interest in contributing to the Sanger DNA Damage Analysis Pipeline! This guide will help you get started with contributing code, documentation, or other improvements.

🤝 Ways to Contribute¶

There are many ways to contribute to this project:

Bug Reports: Help us identify and fix issues
Feature Requests: Suggest new functionality or improvements
Code Contributions: Submit bug fixes, new features, or optimizations
Documentation: Improve existing docs or add new guides
Testing: Help test the pipeline with different data types
Community Support: Help other users in discussions and issues

📋 Getting Started¶

Prerequisites¶

Before contributing, ensure you have:

Python 3.8+ installed
Git version control system
Basic familiarity with the pipeline (complete the Quick Start Guide guide)
Development tools (IDE/editor of your choice)

Setting Up Development Environment¶

Fork the Repository

Click the “Fork” button on the GitHub repository page to create your own copy.

Clone Your Fork

git clone https://github.com/YOUR_USERNAME/sanger_adna_damage.git
cd sanger_adna_damage

Set Up Remote

# Add upstream remote
git remote add upstream https://github.com/allyssonallan/sanger_adna_damage.git

# Verify remotes
git remote -v

Create Development Environment

# Create virtual environment
python3 -m venv dev_env
source dev_env/bin/activate  # On Windows: dev_env\\Scripts\\activate

# Install development dependencies
pip install -r requirements.txt
pip install -e .

# Install development tools (if available)
pip install -r requirements-dev.txt

Verify Installation

# Test the pipeline
python -m src.sanger_pipeline.cli.main --help

# Run tests (if available)
pytest tests/

🐛 Bug Reports¶

Found a Bug?¶

Before creating a bug report:

Check existing issues to avoid duplicates
Try the latest version to see if it’s already fixed
Follow troubleshooting guide to rule out common issues

Creating a Bug Report¶

Use this template for bug reports:

## Bug Report

**Description**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Run command: `python -m src.sanger_pipeline.cli.main ...`
2. With input files: `...`
3. See error: `...`

**Expected Behavior**
What you expected to happen.

**Screenshots/Logs**
If applicable, add error messages or log outputs.

**Environment**
- OS: [e.g. macOS 12.0, Ubuntu 20.04]
- Python version: [e.g. 3.9.7]
- Pipeline version: [e.g. 1.0.0]
- MAFFT version: [e.g. 7.490]

**Additional Context**
Any other context about the problem.

💡 Feature Requests¶

Suggesting New Features¶

Feature requests are welcome! Before submitting:

Check existing requests to avoid duplicates
Consider the scope - does it fit the pipeline’s goals?
Think about implementation - is it technically feasible?

Feature Request Template¶

## Feature Request

**Is your feature request related to a problem?**
A clear description of what the problem is.

**Describe the solution you'd like**
A clear description of what you want to happen.

**Describe alternatives you've considered**
Alternative solutions or features you've considered.

**Use Cases**
Specific examples of how this feature would be used.

**Additional Context**
Any other context, mockups, or examples.

💻 Code Contributions¶

Development Workflow¶

Create a Branch

# Sync with upstream
git fetch upstream
git checkout main
git merge upstream/main

# Create feature branch
git checkout -b feature/my-new-feature

Make Changes
- Follow the coding standards (see below)
- Write tests for new functionality
- Update documentation as needed
- Commit changes with clear messages

Test Your Changes

# Run tests
pytest tests/

# Test with sample data
python -m src.sanger_pipeline.cli.main run-pipeline \
    --input-dir ./test_data \
    --output-dir ./test_output

Push and Create Pull Request

# Push to your fork
git push origin feature/my-new-feature

# Create pull request on GitHub

Coding Standards¶

Python Style: * Follow PEP 8 style guide * Use meaningful variable and function names * Add docstrings to all public functions and classes * Keep functions focused and concise

Example Function:

def calculate_damage_score(sequences: List[str], positions: int = 20) -> float:
    """Calculate ancient DNA damage score from sequences.

    Args:
        sequences: List of DNA sequences to analyze
        positions: Number of positions to analyze from each end

    Returns:
        Damage score between 0 and 1

    Raises:
        ValueError: If sequences list is empty or positions < 1
    """
    if not sequences:
        raise ValueError("Sequences list cannot be empty")

    if positions < 1:
        raise ValueError("Positions must be >= 1")

    # Implementation here
    return damage_score

Documentation Style: * Use Google-style docstrings * Include type hints for function parameters and returns * Document all parameters, return values, and exceptions

Testing Standards: * Write unit tests for all new functions * Include integration tests for new features * Test error conditions and edge cases * Aim for >80% code coverage

Example Test:

import pytest
from src.sanger_pipeline.utils.damage_analyzer import calculate_damage_score

def test_calculate_damage_score_valid_input():
    """Test damage score calculation with valid sequences."""
    sequences = ["ATCGATCG", "TTCGATCA", "ATCGATCG"]
    score = calculate_damage_score(sequences)
    assert 0.0 <= score <= 1.0

def test_calculate_damage_score_empty_input():
    """Test that empty sequences raise ValueError."""
    with pytest.raises(ValueError, match="Sequences list cannot be empty"):
        calculate_damage_score([])

Commit Message Guidelines¶

Write clear, descriptive commit messages:

Short (50 chars or less) summary of changes

More detailed explanatory text, if necessary. Wrap it to about 72
characters. The blank line separating the summary from the body is
critical (unless you omit the body entirely).

Further paragraphs come after blank lines.

- Bullet points are okay, too
- Use a hyphen or asterisk for the bullet

Examples:

Add bootstrap analysis for damage assessment

Fix AB1 conversion error with corrupted files

Update documentation for new HVS region feature

Improve performance of consensus building algorithm

📝 Documentation Contributions¶

Types of Documentation¶

User Guides: Help users accomplish specific tasks
Tutorials: Step-by-step learning experiences
API Documentation: Technical reference for developers
How-To Guides: Solutions to common problems

Documentation Standards¶

Use clear, concise language
Include code examples for technical content
Test all code examples to ensure they work
Use reStructuredText (.rst) format for Sphinx
Follow the established documentation structure

Building Documentation Locally¶

# Install documentation dependencies
pip install sphinx sphinx_rtd_theme

# Build documentation
cd docs/
make html

# View documentation
open _build/html/index.html  # macOS
xdg-open _build/html/index.html  # Linux

🧪 Testing Guidelines¶

Test Categories¶

Unit Tests: * Test individual functions in isolation * Mock external dependencies * Fast execution (<1 second per test)

Integration Tests: * Test component interactions * Use real dependencies where appropriate * Medium execution time (1-10 seconds per test)

End-to-End Tests: * Test complete workflows * Use real AB1 files (small test dataset) * Slower execution (10+ seconds per test)

Running Tests¶

# Run all tests
pytest

# Run specific test file
pytest tests/test_damage_analyzer.py

# Run with coverage
pytest --cov=src

# Run only fast tests
pytest -m "not slow"

Test Data¶

Use small, synthetic test files when possible
Include a few real AB1 files for integration testing
Document the source and characteristics of test data
Keep test data in the tests/data/ directory

🔄 Pull Request Process¶

Pull Request Checklist¶

Before submitting a pull request:

[ ] Code follows the project’s coding standards
[ ] All tests pass locally
[ ] New code has appropriate test coverage
[ ] Documentation is updated (if applicable)
[ ] Commit messages are clear and descriptive
[ ] Branch is up-to-date with main branch

Pull Request Template¶

## Description
Brief description of what this PR does.

## Type of Change
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Documentation update

## Testing
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] I have tested this with real AB1 files

## Checklist
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings

## Screenshots/Examples
If applicable, add examples of the changes.

Review Process¶

Automated Checks: CI/CD will run tests and checks
Code Review: Maintainers will review your code
Feedback: Address any requested changes
Approval: Once approved, your PR will be merged

🏷️ Release Process¶

Version Numbering¶

We follow Semantic Versioning (SemVer):

MAJOR: Breaking changes (e.g., 1.0.0 → 2.0.0)
MINOR: New features, backward compatible (e.g., 1.0.0 → 1.1.0)
PATCH: Bug fixes, backward compatible (e.g., 1.0.0 → 1.0.1)

Changelog¶

We maintain a changelog following the “Keep a Changelog” format:

# Changelog

## [Unreleased]
### Added
- New bootstrap analysis feature

### Changed
- Improved performance of consensus building

### Fixed
- Fixed AB1 conversion bug with certain file types

## [1.0.0] - 2024-01-15
### Added
- Initial release of the pipeline
- Complete AB1 to consensus workflow
- Ancient DNA damage analysis

🌟 Recognition¶

Contributors¶

We recognize all contributors in:

README.md contributors section
Documentation acknowledgments
Release notes
Git commit history

Types of Recognition¶

Code Contributors: Direct code contributions
Bug Reporters: High-quality bug reports
Documentation Contributors: Documentation improvements
Community Contributors: Helping others, discussions

📞 Communication¶

Where to Discuss¶

GitHub Issues: Bug reports, feature requests
GitHub Discussions: General questions, ideas, help
Pull Requests: Code review and discussion

Communication Guidelines¶

Be respectful and inclusive
Use clear, concise language
Provide context and examples
Search existing discussions before posting
Stay on topic

Code of Conduct¶

We are committed to providing a welcoming and inclusive environment. All contributors are expected to:

Use welcoming and inclusive language
Be respectful of differing viewpoints and experiences
Gracefully accept constructive criticism
Focus on what is best for the community
Show empathy towards other community members

🎯 Getting Started with Your First Contribution¶

Good First Issues¶

Look for issues labeled: * good first issue: Easy problems to get started * help wanted: Issues where we’d appreciate help * documentation: Documentation improvements needed

Simple Contribution Ideas¶

Fix Typos: Documentation or code comments
Add Examples: More usage examples in documentation
Improve Error Messages: Make error messages more helpful
Add Tests: Increase test coverage
Performance Improvements: Optimize slow operations

Steps for First Contribution¶

Find an Issue: Look through open issues for something interesting
Comment: Let maintainers know you’re working on it
Ask Questions: Don’t hesitate to ask for clarification
Start Small: Begin with a small, focused change
Learn from Feedback: Use code review as a learning opportunity

Thank you for contributing to the Sanger DNA Damage Analysis Pipeline! Your contributions help make this tool better for the entire research community.