# ExposureGuard-PolicyNet
Most PHI classifiers answer one question: is this PHI or not? This model answers a different question: given everything we know about this patient's exposure history so far, what should we do right now?
Five possible answers. One output per event.
## What it does
Takes a 17-dimensional exposure state vector and predicts the appropriate masking policy for the current event. The state captures cumulative risk, cross-modal linkage signals, pseudonym versioning, and modality context.
| Policy | When it fires |
|---|---|
| `raw` | Risk low, single modality, early in stream |
| `weak` | Moderate risk, partial masking sufficient |
| `pseudo` | Risk above 0.65, pseudonymization required |
| `redact` | Threshold crossed via exposure accumulation |
| `adaptive_rewrite` | Threshold crossed via cross-modal linkage |
The `adaptive_rewrite` vs. `redact` distinction is the key contribution. Both fire when risk crosses the threshold, but the cause matters. Cross-modal linkage means the same patient has appeared across two or more modalities and the records have been linked. That scenario calls for a full synthetic rewrite downstream via SynthRewrite-T5, not just redaction. Exposure accumulation, where risk built up within a single modality stream, calls for `redact`.
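The decision boundaries described above can be sketched as a plain-Python heuristic. This is an illustrative approximation only: the trained network learns the actual boundaries, and the 0.35/0.65 cut points are taken from the risk ranges quoted in the Training section.

```python
def sketch_policy(risk: float, triggered: bool, cross_modal_matches: list) -> str:
    """Hand-written approximation of the five policy regions.

    The trained PolicyNet replaces this logic; thresholds here mirror
    the weak (0.35-0.64) and pseudo (0.65-0.84) ranges from the card.
    """
    if triggered:
        # The cause of the threshold crossing determines the remedy:
        # linked records across modalities need a synthetic rewrite,
        # single-stream accumulation only needs redaction.
        return "adaptive_rewrite" if cross_modal_matches else "redact"
    if risk >= 0.65:
        return "pseudo"
    if risk >= 0.35:
        return "weak"
    return "raw"
```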
## Usage

```python
from inference import predict

result = predict({
    "risk": 0.72,
    "risk_before": 0.43,
    "effective_units": 9,
    "units_factor": 0.362,
    "recency_factor": 0.81,
    "link_bonus": 0.20,
    "degree": 2,
    "confidence": 0.362,
    "pseudonym_version": 1,
    "triggered": True,
    "cross_modal_matches": ["text"],
    "modality": "asr",
})

print(result["policy"])      # adaptive_rewrite
print(result["confidence"])  # float
print(result["all_scores"])  # scores for all 5 policies
```
Loading directly from HuggingFace:

```python
from huggingface_hub import hf_hub_download
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(17, 64), nn.LayerNorm(64), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(64, 128), nn.LayerNorm(128), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 5),
        )

    def forward(self, x):
        return self.net(x)

weights = hf_hub_download("vkatg/exposureguard-policynet", "pytorch_model.bin")
model = PolicyNet()
model.load_state_dict(torch.load(weights, map_location="cpu", weights_only=True))
model.eval()
```
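Once loaded, inference is a forward pass over a 17-dimensional state vector followed by a softmax over the 5 policy logits. A self-contained shape-check sketch (a freshly initialized copy of the same layer stack stands in for the downloaded checkpoint, and the input is a random placeholder, so the predicted policy here is arbitrary):

```python
import torch
import torch.nn as nn

POLICIES = ["raw", "weak", "pseudo", "redact", "adaptive_rewrite"]

def build_policynet() -> nn.Sequential:
    # Same stack as PolicyNet above; random init stands in for the
    # downloaded checkpoint in this sketch.
    return nn.Sequential(
        nn.Linear(17, 64), nn.LayerNorm(64), nn.ReLU(), nn.Dropout(0.2),
        nn.Linear(64, 128), nn.LayerNorm(128), nn.ReLU(), nn.Dropout(0.2),
        nn.Linear(128, 64), nn.ReLU(),
        nn.Linear(64, 5),
    )

model = build_policynet().eval()
x = torch.rand(1, 17)  # placeholder; real inputs follow the feature table below
with torch.no_grad():
    probs = torch.softmax(model(x), dim=-1)  # shape (1, 5)

policy = POLICIES[int(probs.argmax(dim=-1))]
```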
## Input features (17 dimensions)

| Feature | Description |
|---|---|
| `risk` | Current cumulative risk score |
| `risk_before` | Risk before this event |
| `delta_risk` | `risk` minus `risk_before` |
| `eff_units_norm` | `effective_units / 50` |
| `units_factor` | `1 - exp(-0.05 * effective_units)` |
| `recency_factor` | `0.5^(age_seconds / half_life)` |
| `link_bonus` | 0.0 / 0.20 / 0.30 for 1 / 2 / 3+ linked modalities |
| `degree_norm` | distinct modality count / 5 |
| `confidence` | Same as `units_factor` |
| `pseudo_ver_norm` | pseudonym version / 10 |
| `triggered` | 1.0 if threshold crossed this event |
| `cm_count_norm` | cross-modal match count / 5 |
| `mod_text` ... `mod_audio_proxy` | Modality one-hot (5 dims) |
## Architecture

```
Linear(17->64)  -> LayerNorm -> ReLU -> Dropout(0.2)
Linear(64->128) -> LayerNorm -> ReLU -> Dropout(0.2)
Linear(128->64) -> ReLU
Linear(64->5)
```

75KB weights file. No external dependencies beyond PyTorch.
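The 75KB figure is consistent with the layer sizes: a quick parameter count gives about 72 KB of raw float32 values, with checkpoint serialization overhead accounting for the rest.

```python
# Per-layer parameter counts for the stack above (weights + biases;
# each LayerNorm contributes a scale vector and a shift vector).
layers = [
    17 * 64 + 64,    # Linear(17, 64)
    64 + 64,         # LayerNorm(64)
    64 * 128 + 128,  # Linear(64, 128)
    128 + 128,       # LayerNorm(128)
    128 * 64 + 64,   # Linear(128, 64)
    64 * 5 + 5,      # Linear(64, 5)
]
total = sum(layers)
size_kb = total * 4 / 1024  # float32, 4 bytes per parameter
print(total, round(size_kb, 1))  # -> 18437 72.0
```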
## Training
6,000 synthetic patient scenarios, balanced at 1,200 samples per policy class. 85/15 train/val split, AdamW, cosine LR schedule, 60 epochs. Val accuracy: 99.89%. Per-class results: raw 100%, weak 99.7%, pseudo 99.8%, redact 100%, adaptive_rewrite 100%. The small number of weak/pseudo misclassifications comes from the adjacent risk ranges (0.35-0.64 and 0.65-0.84) where boundary cases are genuinely ambiguous.
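A minimal sketch of the described training setup. Only the optimizer (AdamW), the cosine schedule, the 60 epochs, and the 85/15 split come from the text; the stand-in linear model, random data, batch size, and learning rate are assumptions for illustration.

```python
import torch
from torch import nn, optim

# Stand-in for PolicyNet on the real 6,000 balanced scenario vectors;
# here: a tiny linear head on random 17-d inputs so the sketch runs fast.
model = nn.Linear(17, 5)
X = torch.rand(5100, 17)              # 85% of 6,000 as the train split
y = torch.randint(0, 5, (5100,))      # 5 policy classes

opt = optim.AdamW(model.parameters(), lr=1e-3)
sched = optim.lr_scheduler.CosineAnnealingLR(opt, T_max=60)  # cosine over 60 epochs
loss_fn = nn.CrossEntropyLoss()

for epoch in range(60):
    for i in range(0, len(X), 256):
        opt.zero_grad()
        loss = loss_fn(model(X[i:i + 256]), y[i:i + 256])
        loss.backward()
        opt.step()
    sched.step()  # anneal the LR once per epoch
```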
## Where it fits

```
DCPG Risk Scorer
        |
ExposureGuard-PolicyNet   <- this model
        |
    +---+-------------------+
    |                       |
adaptive_rewrite    redact / pseudo / weak / raw
    |
SynthRewrite-T5
```
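Downstream, the predicted policy routes each record to a handler. A hypothetical dispatch sketch: only the `adaptive_rewrite` hand-off to SynthRewrite-T5 comes from the pipeline above, and all handler bodies here are placeholder stubs.

```python
def synth_rewrite(text: str) -> str:
    # Stub standing in for the SynthRewrite-T5 rewriter.
    return f"<synthetic rewrite of {len(text)} chars>"

def route(policy: str, text: str) -> str:
    # Hypothetical dispatch; only adaptive_rewrite -> SynthRewrite-T5
    # is described by the pipeline diagram.
    if policy == "adaptive_rewrite":
        return synth_rewrite(text)
    if policy == "redact":
        return "[REDACTED]"
    return text  # raw / weak / pseudo handled by lighter maskers (stubbed)
```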
## Related
- phi-exposure-guard: full system
- dcpg-cross-modal-phi-risk-scorer: produces the risk score and trigger inputs
- exposureguard-dcpg-encoder: graph encoder upstream
- exposureguard-fedcrdt-distill: federated risk scoring
- exposureguard-synthrewrite-t5: downstream rewriter for adaptive_rewrite decisions
- exposureguard-dagplanner: remediation planner
- streaming-phi-deidentification-benchmark: benchmark dataset
- multimodal-phi-masking-benchmark: PHI masking dataset
## Citation

```bibtex
@software{exposureguard_policynet,
  title  = {ExposureGuard-PolicyNet: Stateful Privacy Policy Selection for Streaming Multimodal Clinical Data},
  author = {Ganti, Venkata Krishna Azith Teja},
  doi    = {10.5281/zenodo.18865882},
  url    = {https://huggingface.co/vkatg/exposureguard-policynet},
  note   = {US Provisional Patent filed 2025-07-05}
}
```
Trained on fully synthetic data. Not validated for clinical use.