Parameters are contradictory use_cache=False

#66

by lemon0703 - opened Aug 28, 2025

Aug 28, 2025

No matter how I set the use_cache parameter, an error occurs:
(TypeError: Gemma3ForConditionalGeneration.__init__() got an unexpected keyword argument 'use_cache')
use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False.
Why?

BalakrishnaCh

Google org Sep 1, 2025

Hi @lemon0703 ,

Thanks for reaching out to us. The following are the only valid parameters of the from_pretrained method, with their default values, for the Gemma3ForConditionalGeneration class:

def from_pretrained(
    cls: type[SpecificPreTrainedModelType],
    pretrained_model_name_or_path: Optional[Union[str, os.PathLike]],
    *model_args,
    config: Optional[Union[PretrainedConfig, str, os.PathLike]] = None,
    cache_dir: Optional[Union[str, os.PathLike]] = None,
    ignore_mismatched_sizes: bool = False,
    force_download: bool = False,
    local_files_only: bool = False,
    token: Optional[Union[str, bool]] = None,
    revision: str = "main",
    use_safetensors: Optional[bool] = None,
    weights_only: bool = True,
    **kwargs,
)
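Since use_cache is not a named parameter in this signature, one possible workaround is to leave it out of the from_pretrained call and set it on the loaded model's config instead. The snippet below is a minimal sketch of that idea, not a confirmed fix. The helper name disable_kv_cache is hypothetical, and the check for a nested text_config is an assumption about how the Gemma 3 config is laid out. A stand-in object is used so the sketch runs without downloading any weights.

```python
from types import SimpleNamespace


def disable_kv_cache(model):
    """Hypothetical helper: set use_cache=False on the model's config.

    Assumption: multimodal configs may keep the text model's settings in a
    nested `text_config`, so that is updated too when it exists.
    """
    config = model.config
    config.use_cache = False
    text_config = getattr(config, "text_config", None)
    if text_config is not None:
        text_config.use_cache = False
    return model


# Stand-in for a loaded model, so the sketch runs without any download.
dummy = SimpleNamespace(
    config=SimpleNamespace(text_config=SimpleNamespace(use_cache=True))
)
disable_kv_cache(dummy)
print(dummy.config.use_cache, dummy.config.text_config.use_cache)  # False False

# With a real model the call order would be (model_id is a placeholder):
#   model = Gemma3ForConditionalGeneration.from_pretrained(model_id)
#   disable_kv_cache(model)
```

The second message in the original post ("use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False.") reads like a warning rather than an error, so it may simply stop appearing once use_cache is already False before training starts.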

Thanks.
