ExposureGuard-PolicyNet


Most PHI classifiers answer one question: is this PHI or not? This model answers a different question: given everything we know about this patient's exposure history so far, what should we do right now?

Five possible answers. One output per event.


What it does

Takes a 17-dimensional exposure state vector and predicts the appropriate masking policy for the current event. The state captures cumulative risk, cross-modal linkage signals, pseudonym versioning, and modality context.

Policy            When it fires
raw               Risk low, single modality, early in stream
weak              Moderate risk, partial masking sufficient
pseudo            Risk above 0.65, pseudonymization required
redact            Threshold crossed via exposure accumulation
adaptive_rewrite  Threshold crossed via cross-modal linkage

The adaptive_rewrite vs redact distinction is the key contribution. Both fire when risk crosses the threshold, but the cause matters. Cross-modal linkage means the same patient has appeared across two or more modalities and the records have been linked. That scenario calls for a full synthetic rewrite downstream via SynthRewrite-T5, not just redaction. Exposure accumulation, where risk built up within a single modality stream, calls for redact.
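A minimal sketch of this routing distinction (a hypothetical helper, not the shipped API): once the threshold has been crossed, the presence or absence of cross-modal matches decides between the two policies.

```python
def policy_on_trigger(cross_modal_matches: list[str]) -> str:
    """Policy choice once the risk threshold has been crossed (triggered == 1)."""
    if cross_modal_matches:
        # Same patient linked across two or more modalities:
        # route to a full synthetic rewrite (SynthRewrite-T5 downstream).
        return "adaptive_rewrite"
    # Risk accumulated within a single modality stream.
    return "redact"
```

The learned model makes this call from the full 17-dimensional state rather than a hard rule, but the two trigger causes map to the two policies as above.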


Usage

from inference import predict

result = predict({
    "risk": 0.72,
    "risk_before": 0.43,
    "effective_units": 9,
    "units_factor": 0.362,
    "recency_factor": 0.81,
    "link_bonus": 0.20,
    "degree": 2,
    "confidence": 0.362,
    "pseudonym_version": 1,
    "triggered": True,
    "cross_modal_matches": ["text"],
    "modality": "asr",
})

print(result["policy"])      # adaptive_rewrite
print(result["confidence"])  # float
print(result["all_scores"])  # scores for all 5 policies

Loading directly from the Hugging Face Hub:

from huggingface_hub import hf_hub_download
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(17, 64),  nn.LayerNorm(64),  nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(64, 128), nn.LayerNorm(128), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 5),
        )
    def forward(self, x):
        return self.net(x)

weights = hf_hub_download("vkatg/exposureguard-policynet", "pytorch_model.bin")
model   = PolicyNet()
model.load_state_dict(torch.load(weights, map_location="cpu", weights_only=True))
model.eval()
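With the model loaded, a single event can be scored directly. The sketch below is self-contained (it random-initializes the same architecture for illustration; swap in the downloaded checkpoint for real predictions), and both the feature ordering and the output-class ordering are assumptions inferred from the tables in this card -- verify them against inference.py.

```python
import torch
import torch.nn as nn

POLICIES = ["raw", "weak", "pseudo", "redact", "adaptive_rewrite"]  # assumed output order

# Same architecture as the PolicyNet above; load the checkpoint for real use.
net = nn.Sequential(
    nn.Linear(17, 64),  nn.LayerNorm(64),  nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(64, 128), nn.LayerNorm(128), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 5),
)
net.eval()

# Feature vector for the usage example above; ordering follows the
# "Input features" table and is an assumption.
x = torch.tensor([[
    0.72, 0.43, 0.29,         # risk, risk_before, delta_risk
    9 / 50, 0.362, 0.81,      # eff_units_norm, units_factor, recency_factor
    0.20, 2 / 5, 0.362,       # link_bonus, degree_norm, confidence
    1 / 10, 1.0, 1 / 5,       # pseudo_ver_norm, triggered, cm_count_norm
    0.0, 0.0, 1.0, 0.0, 0.0,  # modality one-hot (slot for "asr" assumed)
]])

with torch.no_grad():
    scores = torch.softmax(net(x), dim=-1)

policy = POLICIES[int(scores.argmax())]
```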

Input features (17 dimensions)

Feature          Description
risk             Current cumulative risk score
risk_before      Risk before this event
delta_risk       risk minus risk_before
eff_units_norm   effective_units / 50
units_factor     1 - exp(-0.05 * effective_units)
recency_factor   0.5^(age_seconds / half_life)
link_bonus       0.0 / 0.20 / 0.30 for 1 / 2 / 3+ linked modalities
degree_norm      distinct modality count / 5
confidence       same as units_factor
pseudo_ver_norm  pseudonym version / 10
triggered        1.0 if threshold crossed this event
cm_count_norm    cross-modal match count / 5
mod_text ... mod_audio_proxy   modality one-hot (5 dims)
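The derived features follow directly from the formulas in the table. A short sketch, using the effective_units value from the usage example (half_life and age_seconds are assumed values, not stated in this card):

```python
import math

effective_units = 9
age_seconds = 120.0      # assumed
half_life = 600.0        # assumed
linked_modalities = 2

eff_units_norm = effective_units / 50
units_factor = 1 - math.exp(-0.05 * effective_units)   # ~0.362, matching the example
recency_factor = 0.5 ** (age_seconds / half_life)
link_bonus = {1: 0.0, 2: 0.20}.get(linked_modalities, 0.30)
```

Note that units_factor for effective_units = 9 comes out to roughly 0.362, consistent with both units_factor and confidence in the usage example above.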

Architecture

Linear(17->64) -> LayerNorm -> ReLU -> Dropout(0.2)
Linear(64->128) -> LayerNorm -> ReLU -> Dropout(0.2)
Linear(128->64) -> ReLU
Linear(64->5)

75 KB weights file. No external dependencies beyond PyTorch.
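The stated file size can be sanity-checked from the architecture: the MLP has 18,437 parameters, or about 72 KB at float32, with serialization overhead accounting for the rest.

```python
import torch.nn as nn

# Same layer stack as the architecture listing above.
net = nn.Sequential(
    nn.Linear(17, 64),  nn.LayerNorm(64),  nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(64, 128), nn.LayerNorm(128), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 5),
)
n_params = sum(p.numel() for p in net.parameters())  # 18,437
size_kb = n_params * 4 / 1024                        # float32 bytes -> ~72 KB
```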


Training

6,000 synthetic patient scenarios, balanced at 1,200 samples per policy class. 85/15 train/val split, AdamW optimizer, cosine LR schedule, 60 epochs. Validation accuracy: 99.89%. Per-class accuracy: raw 100%, weak 99.7%, pseudo 99.8%, redact 100%, adaptive_rewrite 100%. The few weak/pseudo misclassifications fall in the adjacent risk ranges (0.35-0.64 and 0.65-0.84), where boundary cases are genuinely ambiguous.
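The training recipe above can be sketched as follows. This is a hedged illustration of the described setup (AdamW, cosine schedule over 60 epochs); the learning rate, batch size, and data here are placeholder assumptions, not the actual training configuration.

```python
import torch
import torch.nn as nn

model = nn.Linear(17, 5)  # stand-in for PolicyNet
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)     # lr assumed
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=60)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(60):
    x = torch.randn(32, 17)         # placeholder batch of state vectors
    y = torch.randint(0, 5, (32,))  # placeholder policy labels
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    sched.step()  # cosine decay per epoch
```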


Where it fits

DCPG Risk Scorer
      |
  ExposureGuard-PolicyNet    <- this model
      |
  +---+-------------------+
  |                       |
adaptive_rewrite        redact / pseudo / weak / raw
  |
SynthRewrite-T5


Citation

@software{exposureguard_policynet,
  title  = {ExposureGuard-PolicyNet: Stateful Privacy Policy Selection for Streaming Multimodal Clinical Data},
  author = {Ganti, Venkata Krishna Azith Teja},
  doi    = {10.5281/zenodo.18865882},
  url    = {https://huggingface.co/vkatg/exposureguard-policynet},
  note   = {US Provisional Patent filed 2025-07-05}
}

Trained on fully synthetic data. Not validated for clinical use.


Evaluation results

accuracy on streaming-phi-deidentification-benchmark (self-reported): 0.999