---
language: en
license: mit
library_name: transformers
tags:
- text-classification
- phishing-detection
- security
- bert
- email-security
pipeline_tag: text-classification
base_model: bert-base-uncased
datasets:
- custom
metrics:
- accuracy
- f1
widget:
- text: "URGENT: Your account will be suspended! Click here immediately to verify your information."
  example_title: "Phishing Example"
- text: "Thank you for your purchase. Your order will be shipped within 2-3 business days."
  example_title: "Legitimate Example"
- text: "Verify your PayPal account now or it will be closed permanently!"
  example_title: "Phishing - Payment Scam"
- text: "Your meeting reminder: Team standup at 10 AM tomorrow."
  example_title: "Legitimate - Meeting Reminder"
model-index:
- name: SecuriSense Phishing Detector
  results:
  - task:
      type: text-classification
      name: Text Classification
    metrics:
    - type: accuracy
      value: 0.9953
      name: Accuracy
    - type: f1
      value: 0.995
      name: F1 Score
---

# SecuriSense: Phishing Email Detection Model

## Model Description

SecuriSense is a fine-tuned BERT-base model specialized in detecting phishing emails with **99.54% accuracy**. The model analyzes email text and classifies each message as either legitimate or a phishing attempt.

**Developed by:** Alfred Dads D. Nodado, Joshua D. Famor, Hanna Keziah T. Sato

**Institution:** Mapua Malayan College Mindanao

**Base Model:** bert-base-uncased

**Language:** English

## Intended Use

This model is designed to:
- Classify email text as legitimate (LABEL_0) or phishing (LABEL_1)
- Assist in email security systems
- Support educational use for cybersecurity awareness
- Integrate into email filtering applications

**Primary Use:** Phishing detection in email security systems

**Out-of-scope:** Non-email text classification, multilingual detection

## Training Data

The model was trained on a combined dataset of:
- Phishing Email Dataset: 18,650 samples from Kaggle
- University of Twente Validation Dataset: 1,000+ samples
- **Total:** 19,650+ labeled emails

The dataset includes both phishing attempts and legitimate emails with various characteristics:
- Urgency indicators
- Authority claims
- Financial requests
- Emotional manipulation patterns

## Performance

| Metric | Score |
|--------|-------|
| Accuracy | 99.54% |
| Precision | 99.73% |
| Recall | 99.40% |
| F1 Score | 99.56% |

## How to Use

### Quick Start with Pipeline

```python
from transformers import pipeline

# Load the model
classifier = pipeline(
    "text-classification",
    model="Auguzcht/securisense-phishing-detection"
)

# Classify an email
email_text = "URGENT: Your account will be suspended! Click here to verify."
result = classifier(email_text)
print(result)
# Example output: [{'label': 'Phishing', 'score': 0.9987}]
```

### Advanced Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "Auguzcht/securisense-phishing-detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input (BERT accepts at most 512 tokens, so longer emails are truncated)
text = "Thank you for your purchase. Order #12345 will ship soon."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    # Take the argmax over the class dimension of the single input
    predicted_class = torch.argmax(predictions, dim=-1).item()
    confidence = predictions[0][predicted_class].item()

# Map to label
label = model.config.id2label[predicted_class]
print(f"Label: {label}, Confidence: {confidence:.4f}")
```

### React/JavaScript Usage

```javascript
// HF_API_TOKEN is assumed to be defined elsewhere (your Hugging Face access token).
async function detectPhishing(emailText) {
  const response = await fetch(
    "https://api-inference.huggingface.co/models/Auguzcht/securisense-phishing-detection",
    {
      headers: {
        Authorization: `Bearer ${HF_API_TOKEN}`,
        "Content-Type": "application/json",
      },
      method: "POST",
      body: JSON.stringify({ inputs: emailText }),
    }
  );
  // Surface HTTP errors instead of returning an error payload as a prediction
  if (!response.ok) {
    throw new Error(`Inference request failed with status ${response.status}`);
  }
  const result = await response.json();
  return result;
}

// Usage (top-level await requires an ES module or an enclosing async function)
const email = "URGENT: Verify your account now!";
const prediction = await detectPhishing(email);
console.log(prediction);
```

## Label Mapping

- **LABEL_0** / **"Legitimate"**: Safe, legitimate email
- **LABEL_1** / **"Phishing"**: Phishing attempt or malicious email

## Limitations

- Trained primarily on English emails
- May not detect novel phishing techniques not present in the training data
- Requires plain-text input (HTML should be stripped)
- Performance may vary on text with domain-specific jargon

## Ethical Considerations

- This model is a tool to assist in security, not a replacement for human judgment
- False negatives (missed phishing) can occur, so always maintain multiple security layers
- It should be used as part of a comprehensive email security strategy

## Citation

```bibtex
@misc{securisense2025,
  title={SecuriSense: Phishing Detection ML Pipeline},
  author={Nodado, Alfred Dads D. and Famor, Joshua D. and Sato, Hanna Keziah T.},
  year={2025},
  institution={Mapua Malayan College Mindanao},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/Auguzcht/securisense-phishing-detection}}
}
```

## Contact

For questions or issues, please open an issue on the model repository or contact the authors through their institution.

## License

MIT License. See the LICENSE file for details.
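
## Appendix: Stripping HTML Before Classification

The Limitations section notes that HTML should be stripped before text is passed to the model. The snippet below is a minimal sketch of one way to do that, assuming Python's standard-library `html.parser` is adequate for the email bodies in question. The names `_TextExtractor` and `strip_html` are hypothetical helpers for illustration and are not part of this repository or of the preprocessing used during training.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style element
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse runs of whitespace
        return " ".join(" ".join(self._chunks).split())


def strip_html(html_body: str) -> str:
    """Return the plain text of an HTML email body (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html_body)
    parser.close()  # flush any buffered text
    return parser.text()


html_email = "<p>Verify your <b>account</b> now!</p><script>alert(1)</script>"
print(strip_html(html_email))  # prints: Verify your account now!
```

The cleaned string can then be passed to the `classifier` pipeline from the Quick Start section. Chunks are joined with spaces so that text from adjacent block elements does not run together; the trade-off is that a word split by inline tags would gain a space.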