---
language: en
license: mit
library_name: transformers
tags:
- text-classification
- phishing-detection
- security
- bert
- email-security
pipeline_tag: text-classification
base_model: bert-base-uncased
datasets:
- custom
metrics:
- accuracy
- f1
widget:
- text: "URGENT: Your account will be suspended! Click here immediately to verify your information."
  example_title: "Phishing Example"
- text: "Thank you for your purchase. Your order will be shipped within 2-3 business days."
  example_title: "Legitimate Example"
- text: "Verify your PayPal account now or it will be closed permanently!"
  example_title: "Phishing - Payment Scam"
- text: "Your meeting reminder: Team standup at 10 AM tomorrow."
  example_title: "Legitimate - Meeting Reminder"
model-index:
- name: SecuriSense Phishing Detector
  results:
  - task:
      type: text-classification
      name: Text Classification
    metrics:
    - type: accuracy
      value: 0.9954
      name: Accuracy
    - type: f1
      value: 0.9956
      name: F1 Score
---

# SecuriSense: Phishing Email Detection Model

## Model Description

SecuriSense is a `bert-base-uncased` model fine-tuned to detect phishing emails. It classifies email text as either legitimate or phishing, with a reported accuracy of **99.54%**.

**Developed by:** Alfred Dads D. Nodado, Joshua D. Famor, Hanna Keziah T. Sato  
**Institution:** Mapua Malayan College Mindanao  
**Base Model:** bert-base-uncased  
**Language:** English

## Intended Use

This model is designed to:
- Classify email text as legitimate (LABEL_0) or phishing (LABEL_1)
- Assist email security systems
- Support cybersecurity awareness education
- Integrate into email filtering applications

**Primary Use:** Phishing detection in email security systems  
**Out-of-scope:** Classification of non-email text, detection in languages other than English

## Training Data

The model was trained on a combined dataset of:
- Phishing Email Dataset: 18,650 samples from Kaggle
- University of Twente Validation Dataset: 1,000+ samples
- **Total:** 19,650+ labeled emails

The dataset includes both phishing attempts and legitimate emails with various characteristics:
- Urgency indicators
- Authority claims
- Financial requests
- Emotional manipulation patterns

## Performance

| Metric | Score |
|--------|-------|
| Accuracy | 99.54% |
| Precision | 99.73% |
| Recall | 99.40% |
| F1 Score | 99.56% |
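
As a quick sanity check, F1 is the harmonic mean of precision and recall, so the F1 row can be recomputed from the two rows above it. The sketch below uses only the values reported in the table; the function name `f1_from_precision_recall` is illustrative and not part of any library.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall taken from the table above, expressed as fractions
f1 = f1_from_precision_recall(0.9973, 0.9940)
print(f"F1: {f1:.2%}")  # F1: 99.56%
```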

## How to Use

### Quick Start with Pipeline
```python
from transformers import pipeline

# Load the model
classifier = pipeline(
    "text-classification",
    model="Auguzcht/securisense-phishing-detection"
)

# Classify an email
email_text = "URGENT: Your account will be suspended! Click here to verify."
result = classifier(email_text)

print(result)
# Example output (the exact label string and score depend on the model config and input):
# [{'label': 'Phishing', 'score': 0.9987}]
```

### Advanced Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "Auguzcht/securisense-phishing-detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input
text = "Thank you for your purchase. Order #12345 will ship soon."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

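# Get prediction (no gradients needed for inference)
with torch.no_grad():
    outputs = model(**inputs)
    # Convert logits to class probabilities; shape is (batch_size, num_labels)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    # Single input, so take the arg-max along the label dimension
    predicted_class = torch.argmax(predictions, dim=-1).item()
    confidence = predictions[0][predicted_class].item()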

# Map to label
label = model.config.id2label[predicted_class]
print(f"Label: {label}, Confidence: {confidence:.4f}")
```

### React/JavaScript Usage
```javascript
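// HF_API_TOKEN must be defined by the caller (for example, read from an
// environment variable on a server); keep it out of client-side bundles.
async function detectPhishing(emailText) {
  const response = await fetch(
    "https://api-inference.huggingface.co/models/Auguzcht/securisense-phishing-detection",
    {
      headers: {
        Authorization: `Bearer ${HF_API_TOKEN}`,
        "Content-Type": "application/json",
      },
      method: "POST",
      body: JSON.stringify({ inputs: emailText }),
    }
  );

  // Surface HTTP errors instead of parsing an error body as a prediction
  if (!response.ok) {
    throw new Error(`Inference request failed with status ${response.status}`);
  }

  return response.json();
}

// Usage (top-level await requires an ES module or an enclosing async function)
const email = "URGENT: Verify your account now!";
const prediction = await detectPhishing(email);
console.log(prediction);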
```

## Label Mapping

- **LABEL_0** / **"Legitimate"**: Safe, legitimate email
- **LABEL_1** / **"Phishing"**: Phishing attempt or malicious email
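
Because a prediction may carry either the generic `LABEL_0`/`LABEL_1` form or the human-readable name, depending on how the checkpoint's `id2label` is configured, downstream code may want to accept both. A minimal sketch, where `is_phishing` is a hypothetical helper and not part of the model or of `transformers`:

```python
# Both spellings from the mapping above, lower-cased for comparison
PHISHING_LABELS = {"label_1", "phishing"}
LEGITIMATE_LABELS = {"label_0", "legitimate"}

def is_phishing(label: str) -> bool:
    """Return True for a phishing label, False for a legitimate one."""
    key = label.strip().lower()
    if key in PHISHING_LABELS:
        return True
    if key in LEGITIMATE_LABELS:
        return False
    raise ValueError(f"Unexpected label: {label!r}")

# Works on a pipeline-style result such as [{'label': ..., 'score': ...}]
result = [{"label": "LABEL_1", "score": 0.9987}]
print(is_phishing(result[0]["label"]))  # True
```

Raising on an unknown label, rather than defaulting to "legitimate", avoids silently passing mail through if the label names ever change.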

## Limitations

- Trained primarily on English emails
- May miss novel phishing techniques that are absent from the training data
- Requires plain-text input (strip HTML before classification)
- Performance may vary on text with domain-specific jargon
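
Since HTML should be stripped before classification, one possible preprocessing step is sketched below using only Python's standard-library `html.parser`. The `html_to_text` helper and its class are illustrative assumptions, not the preprocessing the authors used; a dedicated HTML library may handle malformed email markup more robustly.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join text fragments and collapse runs of whitespace
        return " ".join(" ".join(self._chunks).split())

def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()

sample = "<html><body><p>Verify your <b>account</b> now!</p><style>p{color:red}</style></body></html>"
print(html_to_text(sample))  # Verify your account now!
```

The returned string can then be passed to the classifier in place of the raw HTML body. Note that fragments are joined with a space, so a word split by inline tags would come out as two tokens.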

## Ethical Considerations

- This model is a tool to assist in security, not a replacement for human judgment
- False negatives (missed phishing) can occur, so always maintain multiple security layers
- Use the model as part of a comprehensive email security strategy
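
One way to keep a human in the loop is to act automatically only on high-confidence predictions and route everything else to manual review. The sketch below assumes a pipeline-style prediction dict; the `triage` function, the 0.95 threshold, and the action names are hypothetical and would need tuning against real traffic.

```python
def triage(prediction: dict, threshold: float = 0.95) -> str:
    """Map a {'label', 'score'} prediction to a handling decision."""
    label = prediction["label"].lower()
    score = prediction["score"]
    if score < threshold:
        return "review"  # low confidence: leave the decision to a person
    if label in ("label_1", "phishing"):
        return "quarantine"
    return "deliver"

print(triage({"label": "Phishing", "score": 0.9987}))   # quarantine
print(triage({"label": "Legitimate", "score": 0.62}))   # review
```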

## Citation
```bibtex
@misc{securisense2025,
  title={SecuriSense: Phishing Detection ML Pipeline},
  author={Nodado, Alfred Dads D. and Famor, Joshua D. and Sato, Hanna Keziah T.},
  year={2025},
  institution={Mapua Malayan College Mindanao},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/Auguzcht/securisense-phishing-detection}}
}
```

## Contact

For questions or issues, please open an issue on the model repository or contact the authors through their institution.

## License

MIT License. See the LICENSE file for details.