LLDDWW Claude committed
Commit 63c2769 · 1 Parent(s): dcb7540

perf: switch to Gemma-2-2B for faster inference


- Replace MedGemma-4B with Gemma-2-2B (2x smaller, much faster)
- Reduce max_new_tokens from 1536 to 768
- Add timing logs to track OCR and analysis performance
- Target: <30s total processing time

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
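
For context, the model swap and the smaller generation budget described above boil down to the load-and-generate pattern below. This is a minimal, hedged sketch rather than the full app.py: the BitsAndBytesConfig/device_map arguments are assumptions inferred from the "8bit quantization" log message (the diff only shows the torch_dtype line of the from_pretrained() call), and the prompt text is a placeholder.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MED_MODEL_ID = "google/gemma-2-2b-it"

# Load the 2B instruction-tuned model; 8-bit quantization via bitsandbytes is an
# assumption here, since the visible hunk only shows torch_dtype=torch.bfloat16.
tokenizer = AutoTokenizer.from_pretrained(MED_MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MED_MODEL_ID,
    torch_dtype=torch.bfloat16,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Generate with the reduced token budget from this commit (1536 -> 768).
prompt = "The following is text extracted from a medication bag or prescription: ..."  # placeholder
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=768,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))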

Files changed (1)
  1. app.py +16 -6
app.py CHANGED
@@ -17,8 +17,8 @@ HF_TOKEN = os.getenv("HF_TOKEN")
 if HF_TOKEN:
     login(token=HF_TOKEN.strip())
 
-# Medication info analysis model ID (medical specialist model)
-MED_MODEL_ID = "google/medgemma-4b-it"
+# Medication info analysis model ID (lightweight model for fast inference)
+MED_MODEL_ID = "google/gemma-2-2b-it"
 
 # Global model variables (loaded only once)
 OCR_READER = None
@@ -35,7 +35,7 @@ def load_models():
         print("✅ EasyOCR loaded!")
 
     if MED_MODEL is None:
-        print("🔄 Loading MedGemma-4B for medical analysis (8bit quantization)...")
+        print("🔄 Loading Gemma-2-2B for medical analysis (8bit quantization)...")
         MED_MODEL = AutoModelForCausalLM.from_pretrained(
            MED_MODEL_ID,
            torch_dtype=torch.bfloat16,
@@ -69,10 +69,14 @@ def _extract_json_block(text: str) -> Optional[str]:
 @spaces.GPU(duration=120)
 def analyze_medication_image(image: Image.Image) -> Tuple[str, str]:
     """Extract text from the image via OCR, then analyze the medication information"""
+    import time
     try:
         # Step 1: OCR - extract text quickly with EasyOCR
+        start_time = time.time()
         img_array = np.array(image)
         ocr_results = OCR_READER.readtext(img_array)
+        ocr_time = time.time() - start_time
+        print(f"⏱️ OCR took {ocr_time:.2f}s")
 
         if not ocr_results:
             return "Could not find any text.", ""
@@ -82,6 +86,7 @@ def analyze_medication_image(image: Image.Image) -> Tuple[str, str]:
         ocr_text = "\n".join([text for _, text, _ in ocr_results])
 
         # Step 2: Medication info analysis - provide medical information with MedGemma
+        analysis_start = time.time()
 
         analysis_prompt = f"""The following is text extracted from a medication bag or prescription:
 
@@ -116,7 +121,7 @@ def analyze_medication_image(image: Image.Image) -> Tuple[str, str]:
         with torch.no_grad():
             outputs = MED_MODEL.generate(
                 **inputs,
-                max_new_tokens=1536,
+                max_new_tokens=768,
                 temperature=0.7,
                 top_p=0.9,
                 do_sample=True
@@ -124,6 +129,11 @@ def analyze_medication_image(image: Image.Image) -> Tuple[str, str]:
 
         analysis_text = MED_TOKENIZER.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
 
+        analysis_time = time.time() - analysis_start
+        total_time = time.time() - start_time
+        print(f"⏱️ Medical analysis took {analysis_time:.2f}s")
+        print(f"⏱️ Total processing time: {total_time:.2f}s")
+
         return ocr_text.strip(), analysis_text.strip()
 
     except Exception as e:
@@ -363,8 +373,8 @@ with gr.Blocks(theme=gr.themes.Soft(), css=CUSTOM_CSS) as demo:
     - This information is AI-generated, so it may not be accurate
 
     **🤖 Tech stack**
-    - EasyOCR (Korean + English, ultra-fast OCR - under 1 second!)
-    - Google MedGemma-4B-IT (8bit quantization, medical specialist model)
+    - EasyOCR (Korean + English, ultra-fast OCR)
+    - Google Gemma-2-2B-IT (8bit quantization, fast medical information analysis)
 
     **🔑 Setup**
     - Add `HF_TOKEN` under Settings → Repository secrets in Hugging Face Spaces
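
The timing logs added in this commit follow a simple wall-clock pattern that can be reproduced standalone. Below is a minimal sketch; run_ocr and run_analysis are hypothetical placeholders standing in for the real OCR_READER.readtext(...) and MED_MODEL.generate(...) calls.

import time

def run_ocr():           # placeholder for OCR_READER.readtext(img_array)
    time.sleep(0.5)

def run_analysis():      # placeholder for MED_MODEL.generate(**inputs, ...)
    time.sleep(2.0)

start_time = time.time()
run_ocr()
ocr_time = time.time() - start_time
print(f"⏱️ OCR took {ocr_time:.2f}s")

analysis_start = time.time()
run_analysis()
analysis_time = time.time() - analysis_start
total_time = time.time() - start_time
print(f"⏱️ Medical analysis took {analysis_time:.2f}s")
print(f"⏱️ Total processing time: {total_time:.2f}s")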