language: en
license: mit
library_name: transformers
tags:
  - text-classification
  - phishing-detection
  - security
  - bert
  - email-security
pipeline_tag: text-classification
base_model: bert-base-uncased
datasets:
  - custom
metrics:
  - accuracy
  - f1
widget:
  - text: >-
      URGENT: Your account will be suspended! Click here immediately to verify
      your information.
    example_title: Phishing Example
  - text: >-
      Thank you for your purchase. Your order will be shipped within 2-3
      business days.
    example_title: Legitimate Example
  - text: Verify your PayPal account now or it will be closed permanently!
    example_title: Phishing - Payment Scam
  - text: 'Your meeting reminder: Team standup at 10 AM tomorrow.'
    example_title: Legitimate - Meeting Reminder
model-index:
  - name: SecuriSense Phishing Detector
    results:
      - task:
          type: text-classification
          name: Text Classification
        metrics:
          - type: accuracy
            value: 0.9953
            name: Accuracy
          - type: f1
            value: 0.995
            name: F1 Score

SecuriSense: Phishing Email Detection Model

Model Description

SecuriSense is a BERT-base model fine-tuned to detect phishing emails. It analyzes email text and classifies each message as either legitimate or phishing, with a reported accuracy of 99.54%.

Developed by: Alfred Dads D. Nodado, Joshua D. Famor, Hanna Keziah T. Sato
Institution: Mapua Malayan College Mindanao
Base Model: bert-base-uncased
Language: English

Intended Use

This model is designed to:

Classify email text as legitimate (LABEL_0) or phishing (LABEL_1)
Assist email security systems
Support cybersecurity awareness education
Integrate into email filtering applications

Primary Use: Phishing detection in email security systems
Out-of-scope: Non-email text classification, multilingual detection

Training Data

The model was trained on a combined dataset of:

Phishing Email Dataset: 18,650 samples from Kaggle
University of Twente Validation Dataset: 1,000+ samples
Total: 19,650+ labeled emails

The dataset includes both phishing attempts and legitimate emails with various characteristics:

Urgency indicators
Authority claims
Financial requests
Emotional manipulation patterns

Performance

Metric	Score
Accuracy	99.54%
Precision	99.73%
Recall	99.40%
F1 Score	99.56%
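
As a quick sanity check on the table above, F1 is the harmonic mean of precision and recall, so the reported F1 score can be recomputed from the other two rows. The sketch below uses only the figures from the table; the variable names are illustrative.

```python
# Reported precision and recall from the table above, expressed as fractions.
precision = 0.9973
recall = 0.9940

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 = {f1:.4f}")  # prints "F1 = 0.9956"
```

The result matches the reported 99.56% F1 score to two decimal places.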

How to Use

Quick Start with Pipeline

from transformers import pipeline

# Load the model
classifier = pipeline(
    "text-classification",
    model="Auguzcht/securisense-phishing-detection"
)

# Classify an email
email_text = "URGENT: Your account will be suspended! Click here to verify."
result = classifier(email_text)

print(result)
# Example output (the score varies by input; the label string comes from the model's id2label config):
# [{'label': 'Phishing', 'score': 0.9987}]

Advanced Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "Auguzcht/securisense-phishing-detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input
text = "Thank you for your purchase. Order #12345 will ship soon."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()
    confidence = predictions[0][predicted_class].item()

# Map to label
label = model.config.id2label[predicted_class]
print(f"Label: {label}, Confidence: {confidence:.4f}")

React/JavaScript Usage

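// HF_API_TOKEN must be supplied by the caller (e.g. loaded from an environment variable).
async function detectPhishing(emailText) {
  const response = await fetch(
    "https://api-inference.huggingface.co/models/Auguzcht/securisense-phishing-detection",
    {
      headers: {
        Authorization: `Bearer ${HF_API_TOKEN}`,
        "Content-Type": "application/json",
      },
      method: "POST",
      body: JSON.stringify({ inputs: emailText }),
    }
  );

  // Surface HTTP errors (e.g. an invalid token) instead of parsing them as predictions
  if (!response.ok) {
    throw new Error(`Inference request failed: ${response.status}`);
  }

  return response.json();
}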

// Usage (inside an async function, or an ES module that supports top-level await)
const email = "URGENT: Verify your account now!";
const prediction = await detectPhishing(email);
console.log(prediction);

Label Mapping

LABEL_0 / "Legitimate": Safe, legitimate email
LABEL_1 / "Phishing": Phishing attempt or malicious email
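
Depending on how the model's config is set up, a prediction may carry either the generic ID (LABEL_0, LABEL_1) or the readable name. The following is a minimal sketch of a helper that accepts both forms, based only on the mapping listed above. The names `LABEL_NAMES`, `readable_label`, and `is_phishing` are hypothetical and not part of the model repository.

```python
# Hypothetical lookup covering both label styles from the mapping above.
LABEL_NAMES = {
    "LABEL_0": "Legitimate",
    "LABEL_1": "Phishing",
    "Legitimate": "Legitimate",
    "Phishing": "Phishing",
}


def readable_label(prediction: dict) -> str:
    """Return 'Legitimate' or 'Phishing' for one pipeline result dict."""
    raw = prediction["label"]
    if raw not in LABEL_NAMES:
        # Fail loudly rather than silently treating an unknown label as safe.
        raise ValueError(f"Unexpected label: {raw!r}")
    return LABEL_NAMES[raw]


def is_phishing(prediction: dict) -> bool:
    """True when the prediction maps to the phishing class."""
    return readable_label(prediction) == "Phishing"


# Usage with a result shaped like the pipeline output shown earlier.
print(readable_label({"label": "LABEL_1", "score": 0.9987}))  # prints "Phishing"
```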

Limitations

Trained primarily on English emails
May not detect novel phishing techniques not present in training data
Requires plain-text input (strip HTML markup before classification)
Performance may vary on domain-specific jargon
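
Because the model expects plain text, HTML email bodies need to be reduced to their visible text first. The sketch below shows one possible way to do this with Python's standard-library `html.parser`. The `strip_html` helper and `_TextExtractor` class are hypothetical names, and this is not necessarily the preprocessing used during training.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Join chunks with spaces, then collapse runs of whitespace.
        # Note: a word split by inline tags (e.g. "acc<b>ount</b>") ends up
        # with a space inside it; this sketch accepts that trade-off.
        return " ".join(" ".join(self._chunks).split())


def strip_html(html_body: str) -> str:
    """Return the visible text of an HTML email body as a single line."""
    parser = _TextExtractor()
    parser.feed(html_body)
    parser.close()
    return parser.text()


print(strip_html("<p>Verify your <b>account</b> now!</p><script>x()</script>"))
# prints "Verify your account now!"
```

The returned string can then be passed to the classifier in place of the raw HTML.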

Ethical Considerations

This model is a tool to assist in security, not a replacement for human judgment
False negatives (missed phishing) can occur, so always maintain multiple security layers
Should be used as part of a comprehensive email security strategy
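
One way to keep a human in the loop is to act automatically only on high-confidence predictions and route uncertain ones to manual review. The sketch below illustrates the idea. The `triage` function and both threshold values are hypothetical and would need tuning against real traffic; they are not recommendations from the model authors.

```python
def triage(label: str, score: float,
           quarantine_at: float = 0.90, review_at: float = 0.60) -> str:
    """Map one prediction to 'quarantine', 'review', or 'deliver'.

    `score` is the probability of the predicted label, as returned by the
    text-classification pipeline. Thresholds are illustrative assumptions.
    """
    # Convert to the probability of the phishing class (binary classifier).
    if label in ("Phishing", "LABEL_1"):
        phishing_prob = score
    else:
        phishing_prob = 1.0 - score

    if phishing_prob >= quarantine_at:
        return "quarantine"  # confident phishing: hold the message
    if phishing_prob >= review_at:
        return "review"      # uncertain: send to a human analyst
    return "deliver"         # likely legitimate; other security layers still apply


print(triage("Phishing", 0.9987))  # prints "quarantine"
print(triage("Phishing", 0.70))    # prints "review"
```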

Citation

@misc{securisense2025,
  title={SecuriSense: Phishing Detection ML Pipeline},
  author={Nodado, Alfred Dads D. and Famor, Joshua D. and Sato, Hanna Keziah T.},
  year={2025},
  institution={Mapua Malayan College Mindanao},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/Auguzcht/securisense-phishing-detection}}
}

Contact

For questions or issues, please open an issue on the model repository or contact the authors through their institution.

License

MIT License - See LICENSE file for details