Real Estate Domain Translator (Phi-3)
This is a fine-tuned Phi-3-mini model specialized for translating natural language queries into domain-aware Snowflake SQL for real estate investment analysis.
Model Details
- Base Model: microsoft/Phi-3-mini-4k-instruct
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Domain: Real Estate Investment Analysis
- Training Data: 154 examples extracted from real estate query patterns
- Training Loss Improvement: 84% (1.62 → 0.26)
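The 84% figure above is consistent with a relative reduction between the two reported loss values. A minimal check, assuming the percentage is computed as (initial - final) / initial (the card does not state the formula, and the variable names are illustrative):

```python
# Loss values as reported in Model Details.
initial_loss = 1.62
final_loss = 0.26

# Assumed definition: relative reduction with respect to the initial loss.
relative_reduction = (initial_loss - final_loss) / initial_loss

print(f"Relative loss reduction: {relative_reduction:.1%}")
```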
Capabilities
The model understands real estate domain terminology including:
- Markets: Atlanta, Dallas, Houston, Charlotte, Orlando, etc.
- Disposition Categories: Project Skyline, DQ, DiMaggio Kicks, etc.
- Rental Metrics: CCMModelRent, CMOR1mFwd, TenantRent, ModelRent, IndexRent
- Property Attributes: SQFT, Year Built, Beds, Baths, UnitStatus, DailyStatus
- Valuation Metrics: CoreLogicAVM, PricePercentile50, XGBPrice, BayesPriceMean
- Financial Metrics: GrossPrice, NetPrice, RegularBPO, QuickSaleBPO
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
# Load the base model, then attach the LoRA adapter weights on top of it
base_model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True)
model = PeftModel.from_pretrained(base_model, "bgarvey1/real-estate-domain-translator-phi3")
tokenizer = AutoTokenizer.from_pretrained("bgarvey1/real-estate-domain-translator-phi3")
# Build the prompt in the "### Domain / Task / Input / Output" format shown here
prompt = "### Domain: Real Estate Analysis\n### Task: Translate to SQL\n### Input: Show me the top 5 markets by unit count\n### Output:\n"
# Tokenize and move the tensors to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens so the prompt is not echoed back in the result
result = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
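The prompt in the usage snippet follows a fixed "### Domain / Task / Input / Output" layout, so it can be convenient to build it from a question and to pull the SQL back out of the decoded text. The helpers below are a hypothetical sketch, not part of the published model: `build_prompt` and `extract_sql` are illustrative names, and cutting the output at the next `###` marker is an assumption about how the model may continue generating.

```python
# Prompt layout copied from the usage example; only the question varies.
PROMPT_TEMPLATE = (
    "### Domain: Real Estate Analysis\n"
    "### Task: Translate to SQL\n"
    "### Input: {question}\n"
    "### Output:\n"
)
OUTPUT_MARKER = "### Output:"


def build_prompt(question: str) -> str:
    """Wrap a natural language question in the prompt layout from the usage example."""
    return PROMPT_TEMPLATE.format(question=question.strip())


def extract_sql(decoded: str) -> str:
    """Return the text after the output marker, or the whole string if no marker is present.

    Assumption: if the model keeps generating another '###' section,
    everything from that marker onward is discarded.
    """
    _, found, tail = decoded.partition(OUTPUT_MARKER)
    candidate = tail if found else decoded
    return candidate.split("###", 1)[0].strip()


prompt = build_prompt("Show me the top 5 markets by unit count")
# Simulated decoded text standing in for a real model response.
decoded = prompt + 'SELECT "Market" FROM t LIMIT 5\n### Input: next question'
print(extract_sql(decoded))
```

Passing `build_prompt(...)` to the tokenizer and `extract_sql(...)` over the decoded string keeps the prompt format in one place, which helps if the format ever needs to change.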
Example Translations
Input: "Show me the top 5 markets by unit count"
Output: SELECT "Market", SUM("Count") as unit_count FROM FKH_ML_DATA.ML_DATA.STATIC_TABLE_CACHE GROUP BY "Market" ORDER BY unit_count DESC LIMIT 5
Input: "List Project Skyline properties that are listed"
Output: SELECT "DispoCategory", "DispoStatus" FROM FKH_ML_DATA.ML_DATA.DISPO_CACHE WHERE "DispoCategory" = 'Project Skyline' AND "DispoStatus" = 'Listed'
Training Details
- Training Steps: 186 steps across 3 epochs
- LoRA Configuration: r=16, alpha=32, dropout=0.1
- Target Modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
- Training Data: Real estate domain queries with business logic patterns
- Validation: Domain-specific evaluation on real estate terminology
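For reference, the LoRA settings listed above can be written down as keyword arguments. This is a sketch rather than the author's actual training script: the keys mirror the parameter names of peft's `LoraConfig`, the `task_type` value is an assumption, and the module names are taken verbatim from this card, so they should be checked against the loaded model (for example with `model.named_modules()`) before reuse.

```python
# LoRA hyperparameters as reported in Training Details.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "target_modules": [
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    # Assumption: causal language modeling, since the model generates SQL text.
    "task_type": "CAUSAL_LM",
}

# In standard LoRA the low-rank update is scaled by lora_alpha / r.
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
print(f"LoRA scaling factor: {scaling}")

# With peft installed, the same values could be passed as:
#   from peft import LoraConfig
#   config = LoraConfig(**lora_kwargs)
```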
This model serves as an intelligent translator between general language models and real estate domain-specific terminology and business logic.