# Risk Models Specification

This document outlines the requirements and specifications for implementing risk models in the Sentinel cancer risk assessment system.

## Overview

Risk models in Sentinel calculate cancer risk scores from structured user input data. All risk models must follow a consistent architecture, use the new `UserInput` structure, implement proper validation, and maintain comprehensive test coverage.

## Core Architecture

### Base Class

All risk models must inherit from `RiskModel` in `src/sentinel/risk_models/base.py`:

```python
from sentinel.risk_models.base import RiskModel


class YourRiskModel(RiskModel):
    def __init__(self):
        super().__init__("your_model_name")
```

### Required Methods

Every risk model must implement these abstract methods:

```python
def compute_score(self, user: UserInput) -> str:
    """Compute the risk score for a given user profile.

    Args:
        user: The user profile containing demographics, medical history, etc.

    Returns:
        str: Risk percentage as a string or an N/A message if inapplicable.

    Raises:
        ValueError: If required inputs are missing or invalid.
    """

def cancer_type(self) -> str:
    """Return the cancer type this model assesses."""
    return "breast"  # or "lung", "prostate", etc.

def description(self) -> str:
    """Return a detailed description of the model."""

def interpretation(self) -> str:
    """Return guidance on how to interpret the results."""

def references(self) -> list[str]:
    """Return list of reference citations."""
```

## UserInput Structure

### Required Imports

```python
from typing import Annotated

from pydantic import Field

from sentinel.risk_models.base import RiskModel
from sentinel.user_input import (
    # Import specific enums and models you need
    CancerType,
    ChronicCondition,
    Demographics,
    Ethnicity,
    FamilyMemberCancer,
    FamilyRelation,
    FamilySide,
    RelationshipDegree,
    Sex,
    SymptomEntry,
    UserInput,
    # ... other specific imports
)
```

### UserInput Hierarchy

The `UserInput` class follows a hierarchical structure:

```
UserInput
├── demographics: Demographics
│   ├── age_years: int
│   ├── sex: Sex (enum)
│   ├── ethnicity: Ethnicity | None
│   └── anthropometrics: Anthropometrics
│       ├── height_cm: float | None
│       └── weight_kg: float | None
├── lifestyle: Lifestyle
│   ├── smoking: SmokingHistory
│   └── alcohol: AlcoholConsumption
├── personal_medical_history: PersonalMedicalHistory
│   ├── chronic_conditions: list[ChronicCondition]
│   ├── previous_cancers: list[CancerType]
│   ├── genetic_mutations: list[GeneticMutation]
│   ├── tyrer_cuzick_polygenic_risk_score: float | None
│   └── # ... other fields
├── female_specific: FemaleSpecific | None
│   ├── menstrual: MenstrualHistory
│   ├── parity: ParityHistory
│   └── breast_health: BreastHealthHistory
├── symptoms: list[SymptomEntry]
└── family_history: list[FamilyMemberCancer]
```

## REQUIRED_INPUTS Specification

### Structure

Every risk model must define a `REQUIRED_INPUTS` class attribute. Each key is a dotted path into `UserInput`, and each value pairs a type (using Pydantic's `Annotated` types with `Field` constraints where needed) with a flag that marks the field as required or optional:

```python
REQUIRED_INPUTS: dict[str, tuple[type, bool]] = {
    "demographics.age_years": (Annotated[int, Field(ge=18, le=100)], True),
    "demographics.sex": (Sex, True),
    "demographics.ethnicity": (Ethnicity | None, False),
    "demographics.anthropometrics.height_cm": (Annotated[float, Field(gt=0)], False),
    "demographics.anthropometrics.weight_kg": (Annotated[float, Field(gt=0)], False),
    "female_specific.menstrual.age_at_menarche": (Annotated[int, Field(ge=8, le=25)], False),
    "personal_medical_history.tyrer_cuzick_polygenic_risk_score": (Annotated[float, Field(gt=0)], False),
    "family_history": (list, False),  # list[FamilyMemberCancer]
    "symptoms": (list, False),  # list[SymptomEntry]
}
```

### Field Constraints

Use appropriate `Field` constraints for validation:

- `ge=X`: Greater than or equal to X
- `le=X`: Less than or equal to X
- `gt=X`: Greater than X
- `lt=X`: Less than X

### Required vs Optional

- `True`: Field is required for the model
- `False`: Field is optional but validated if present

## Input Validation

### Validation in compute_score

Every `compute_score` method must start with input validation:

```python
def compute_score(self, user: UserInput) -> str:
    """Compute the risk score for a given user profile."""
    # Validate inputs first
    is_valid, errors = self.validate_inputs(user)
    if not is_valid:
        raise ValueError(f"Invalid inputs for {self.name}: {'; '.join(errors)}")

    # Continue with model-specific logic...
```

### Model-Specific Validation

Add additional validation as needed. Inputs that are valid but fall outside the model's scope return an "N/A" message instead of raising:

```python
# Check sex applicability
if user.demographics.sex != Sex.FEMALE:
    return "N/A: Model is only applicable to female patients."

# Check age range
if not (35 <= user.demographics.age_years <= 85):
    return "N/A: Age is outside the validated range."

# Check required data availability
if user.female_specific is None:
    return "N/A: Missing female-specific information required for model."
```

## Extending UserInput

### When to Extend

If a risk model requires fields or enums that don't exist in `UserInput`, **do not** use replacement values or hacks. Instead, propose extending `UserInput`:

1. **Missing Enums**: Add new values to existing enums (e.g., `ChronicCondition`, `SymptomType`)
2. **Missing Fields**: Add new fields to appropriate sections (e.g., `PersonalMedicalHistory`, `BreastHealthHistory`)
3. **Missing Models**: Create new Pydantic models if needed

### Extension Process

1. **Identify Missing Elements**: Document what's needed for the model
2. **Propose Extension**: Suggest specific additions to `UserInput`
3. **Implement Extension**: Add the new fields/enums to `src/sentinel/user_input.py`
4. **Update Tests**: Add tests for new fields in `tests/test_user_input.py`
5. **Update Model**: Use the new fields in your risk model
6. **Run Tests**: Ensure all tests pass

### Example Extensions

```python
# Adding new ChronicCondition enum values
class ChronicCondition(str, Enum):
    # ... existing values
    ENDOMETRIAL_POLYPS = "endometrial_polyps"
    ANAEMIA = "anaemia"


# Adding new fields to PersonalMedicalHistory
class PersonalMedicalHistory(StrictBaseModel):
    # ... existing fields
    tyrer_cuzick_polygenic_risk_score: float | None = Field(
        None,
        gt=0,
        description="Tyrer-Cuzick polygenic risk score as relative risk multiplier",
    )


# Adding new fields to BreastHealthHistory
class BreastHealthHistory(StrictBaseModel):
    # ... existing fields
    lobular_carcinoma_in_situ: bool | None = Field(
        None,
        description="History of lobular carcinoma in situ (LCIS) diagnosis",
    )
```

## Data Access Patterns

### Demographics

```python
age = user.demographics.age_years
sex = user.demographics.sex
ethnicity = user.demographics.ethnicity
height_cm = user.demographics.anthropometrics.height_cm
weight_kg = user.demographics.anthropometrics.weight_kg
```

### Female-Specific Data

```python
if user.female_specific is not None:
    fs = user.female_specific
    menarche_age = fs.menstrual.age_at_menarche
    menopause_age = fs.menstrual.age_at_menopause
    num_births = fs.parity.num_live_births
    first_birth_age = fs.parity.age_at_first_live_birth
    num_biopsies = fs.breast_health.num_biopsies
    atypical_hyperplasia = fs.breast_health.atypical_hyperplasia
    lcis = fs.breast_health.lobular_carcinoma_in_situ
```

### Medical History

```python
chronic_conditions = user.personal_medical_history.chronic_conditions
previous_cancers = user.personal_medical_history.previous_cancers
genetic_mutations = user.personal_medical_history.genetic_mutations
polygenic_score = user.personal_medical_history.tyrer_cuzick_polygenic_risk_score
```

### Family History

```python
for member in user.family_history:
    if member.cancer_type == CancerType.BREAST:
        relation = member.relation
        age_at_diagnosis = member.age_at_diagnosis
        degree = member.degree
        side = member.side
```

### Symptoms

```python
for symptom in user.symptoms:
    symptom_type = symptom.symptom_type
    severity = symptom.severity
    duration_days = symptom.duration_days
```

## Enum Usage

### Always Use Enums

Never compare against string literals. Always use the appropriate enums:

```python
# ✅ Correct
if user.demographics.sex == Sex.FEMALE: ...
if member.cancer_type == CancerType.BREAST: ...
if member.relation == FamilyRelation.MOTHER: ...
if member.degree == RelationshipDegree.FIRST: ...
if member.side == FamilySide.MATERNAL: ...

# ❌ Incorrect
if user.demographics.sex == "female": ...
if member.cancer_type == "breast": ...
if member.relation == "mother": ...
```

### Enum Mapping

When you need to map enums to model-specific codes:

```python
def _race_code_from_ethnicity(ethnicity: Ethnicity | None) -> int:
    """Map ethnicity enum to model-specific race code."""
    if ethnicity is None:
        return 1  # Default
    if ethnicity == Ethnicity.BLACK:
        return 2
    if ethnicity in {Ethnicity.ASIAN, Ethnicity.PACIFIC_ISLANDER}:
        return 3
    if ethnicity == Ethnicity.HISPANIC:
        return 6
    return 1  # Default to White
```

## Testing Requirements

### Test File Structure

Create comprehensive test files following this pattern:

```python
import pytest

from sentinel.user_input import (
    # Import all needed models and enums
    Anthropometrics,
    BreastHealthHistory,
    CancerType,
    Demographics,
    Ethnicity,
    FamilyMemberCancer,
    FamilyRelation,
    FamilySide,
    FemaleSpecific,
    Lifestyle,
    MenstrualHistory,
    ParityHistory,
    PersonalMedicalHistory,
    RelationshipDegree,
    Sex,
    SmokingHistory,
    SmokingStatus,
    UserInput,
)
from sentinel.risk_models import YourRiskModel

# Ground truth test cases
GROUND_TRUTH_CASES = [
    {
        "name": "test_case_name",
        "input": UserInput(
            demographics=Demographics(
                age_years=40,
                sex=Sex.FEMALE,
                ethnicity=Ethnicity.WHITE,
                anthropometrics=Anthropometrics(height_cm=165.0, weight_kg=65.0),
            ),
            lifestyle=Lifestyle(
                smoking=SmokingHistory(status=SmokingStatus.NEVER),
            ),
            personal_medical_history=PersonalMedicalHistory(),
            female_specific=FemaleSpecific(
                menstrual=MenstrualHistory(age_at_menarche=13),
                parity=ParityHistory(num_live_births=1, age_at_first_live_birth=25),
                breast_health=BreastHealthHistory(),
            ),
            family_history=[
                FamilyMemberCancer(
                    relation=FamilyRelation.MOTHER,
                    cancer_type=CancerType.BREAST,
                    age_at_diagnosis=55,
                    degree=RelationshipDegree.FIRST,
                    side=FamilySide.MATERNAL,
                )
            ],
        ),
        "expected": 1.5,  # Expected risk percentage
    },
    # ... more test cases
]


class TestYourRiskModel:
    """Test suite for YourRiskModel."""

    def setup_method(self):
        """Initialize model instance for testing."""
        self.model = YourRiskModel()

    @pytest.mark.parametrize("case", GROUND_TRUTH_CASES, ids=lambda x: x["name"])
    def test_ground_truth_validation(self, case):
        """Test against ground truth results."""
        user_input = case["input"]
        expected_risk = case["expected"]

        actual_risk_str = self.model.compute_score(user_input)

        if "N/A" in actual_risk_str:
            pytest.fail(f"Model returned N/A: {actual_risk_str}")

        actual_risk = float(actual_risk_str)
        assert actual_risk == pytest.approx(expected_risk, abs=0.01)

    def test_validation_errors(self):
        """Test that model raises ValueError for invalid inputs."""
        # Test invalid age
        user_input = UserInput(
            demographics=Demographics(
                age_years=30,  # Below the model's minimum (assumes REQUIRED_INPUTS sets ge above 30)
                sex=Sex.FEMALE,
                anthropometrics=Anthropometrics(height_cm=165.0, weight_kg=65.0),
            ),
            # ... rest of input
        )

        with pytest.raises(ValueError, match=r"Invalid inputs for.*:"):
            self.model.compute_score(user_input)

    def test_inapplicable_cases(self):
        """Test cases where model returns N/A."""
        # Test male patient
        user_input = UserInput(
            demographics=Demographics(
                age_years=50,
                sex=Sex.MALE,  # Wrong sex
                anthropometrics=Anthropometrics(height_cm=175.0, weight_kg=70.0),
            ),
            # ... rest of input
        )

        score = self.model.compute_score(user_input)
        assert "N/A" in score
```

### Test Coverage Requirements

- **Ground Truth Validation**: Test against known reference values
- **Input Validation**: Test that invalid inputs raise `ValueError`
- **Edge Cases**: Test boundary conditions and edge cases
- **Inapplicable Cases**: Test cases where model should return "N/A"
- **Enum Usage**: Test that all enums are used correctly
- **Family History**: Test various family relationship combinations
- **Error Handling**: Test error conditions and exception handling

## Code Quality Requirements

### Pre-commit Hooks

All code must pass these pre-commit hooks:

- **unimport**: Remove unused imports
- **ruff format**: Code formatting
- **ruff check**: Linting and style checks
- **pylint**: Code quality analysis
- **darglint**: Docstring validation
- **pydocstyle**: Docstring style checks
- **codespell**: Spell checking

### Code Style

- Use type hints throughout
- Write clear, concise docstrings
- Follow PEP 8 style guidelines
- Use meaningful variable names
- Add comments for complex logic
- Handle edge cases gracefully

### Error Handling

Input validation runs outside the `try` block so that the required `ValueError` reaches the caller; only failures in the risk calculation itself are reported as "N/A":

```python
def compute_score(self, user: UserInput) -> str:
    """Compute the risk score for a given user profile."""
    # Validate inputs (the ValueError must propagate to the caller)
    is_valid, errors = self.validate_inputs(user)
    if not is_valid:
        raise ValueError(f"Invalid inputs for {self.name}: {'; '.join(errors)}")

    # Model-specific validation
    if user.demographics.sex != Sex.FEMALE:
        return "N/A: Model is only applicable to female patients."

    try:
        # Calculate risk
        risk = self._calculate_risk(user)
    except Exception as e:
        # Report calculation failures as N/A instead of crashing
        return f"N/A: Error calculating risk - {e!s}"
    return f"{risk:.2f}"
```

## Migration Checklist

When adapting an existing risk model to the new structure:

- [ ] Update imports to use new `user_input` module
- [ ] Add `REQUIRED_INPUTS` with Pydantic validation
- [ ] Refactor `compute_score` to use new `UserInput` structure
- [ ] Replace string literals with enums
- [ ] Update parameter extraction logic
- [ ] Add input validation at start of `compute_score`
- [ ] Update all test cases to use new `UserInput` structure
- [ ] Run full test suite to ensure 100% pass rate
- [ ] Run pre-commit hooks to ensure code quality
- [ ] Document any `UserInput` extensions needed
- [ ] Update model documentation and references

## Examples

### Complete Risk Model Template

```python
"""Your cancer risk model implementation."""

from typing import Annotated

from pydantic import Field

from sentinel.risk_models.base import RiskModel
from sentinel.user_input import (
    CancerType,
    Ethnicity,
    RelationshipDegree,
    Sex,
    UserInput,
)


class YourRiskModel(RiskModel):
    """Compute cancer risk using the Your model."""

    def __init__(self):
        """Register the model under its name."""
        super().__init__("your_model")

    REQUIRED_INPUTS: dict[str, tuple[type, bool]] = {
        "demographics.age_years": (Annotated[int, Field(ge=18, le=100)], True),
        "demographics.sex": (Sex, True),
        "demographics.ethnicity": (Ethnicity | None, False),
        "family_history": (list, False),  # list[FamilyMemberCancer]
    }

    def compute_score(self, user: UserInput) -> str:
        """Compute the risk score for a given user profile."""
        # Validate inputs first
        is_valid, errors = self.validate_inputs(user)
        if not is_valid:
            raise ValueError(f"Invalid inputs for {self.name}: {'; '.join(errors)}")

        # Model-specific validation
        if user.demographics.sex != Sex.FEMALE:
            return "N/A: Model is only applicable to female patients."

        # Extract parameters
        age = user.demographics.age_years
        ethnicity = user.demographics.ethnicity

        # Count first-degree relatives with breast cancer
        family_count = sum(
            1
            for member in user.family_history
            if member.cancer_type == CancerType.BREAST
            and member.degree == RelationshipDegree.FIRST
        )

        # Calculate risk (example)
        risk = self._calculate_risk(age, family_count, ethnicity)
        return f"{risk:.2f}"

    def _calculate_risk(self, age: int, family_count: int, ethnicity: Ethnicity | None) -> float:
        """Calculate the actual risk value."""
        # Implementation here
        return 1.5  # Example

    def cancer_type(self) -> str:
        """Return the cancer type this model assesses."""
        return "breast"

    def description(self) -> str:
        """Return a detailed description of the model."""
        return "Your model description here."

    def interpretation(self) -> str:
        """Return guidance on how to interpret the results."""
        return "Interpretation guidance here."

    def references(self) -> list[str]:
        """Return list of reference citations."""
        return ["Your reference here."]
```

This specification ensures consistency, maintainability, and quality across all risk models in the Sentinel system.
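
The dotted keys used in `REQUIRED_INPUTS` (for example `demographics.age_years`) and the required/optional flag paired with each key can be illustrated with a small standalone sketch. This is not the `validate_inputs` implementation from `RiskModel`, which this document does not show. The names `resolve_path`, `check_inputs`, and `EXAMPLE_INPUTS` are hypothetical, plain `(min, max)` bounds stand in for the `Annotated`/`Field` types, and `SimpleNamespace` objects stand in for `UserInput`:

```python
from types import SimpleNamespace
from typing import Any


def resolve_path(root: Any, dotted_path: str) -> Any:
    """Follow a dotted path through nested attributes.

    Returns None when any segment along the path is absent or None.
    """
    current = root
    for part in dotted_path.split("."):
        if current is None:
            return None
        current = getattr(current, part, None)
    return current


# Hypothetical, simplified stand-in for REQUIRED_INPUTS:
# path -> ((inclusive min, inclusive max), required flag).
EXAMPLE_INPUTS: dict[str, tuple[tuple[float, float], bool]] = {
    "demographics.age_years": ((18, 100), True),
    "female_specific.menstrual.age_at_menarche": ((8, 25), False),
}


def check_inputs(user: Any, spec: dict) -> tuple[bool, list[str]]:
    """Return (is_valid, errors) in the same shape compute_score expects."""
    errors: list[str] = []
    for path, ((low, high), required) in spec.items():
        value = resolve_path(user, path)
        if value is None:
            if required:
                errors.append(f"{path}: required field is missing")
            continue  # Optional and absent: nothing to validate
        if not low <= value <= high:
            errors.append(f"{path}: {value} is outside [{low}, {high}]")
    return (not errors, errors)


# Stand-in user: age present, no female-specific section at all.
user = SimpleNamespace(
    demographics=SimpleNamespace(age_years=40),
    female_specific=None,
)
print(check_inputs(user, EXAMPLE_INPUTS))  # (True, [])
```

An optional field that is absent produces no error, while an optional field that is present is still range-checked. That matches the "optional but validated if present" rule described under Required vs Optional.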