When using Whisper, the pipeline warns that generation_config default values have been modified, even for base models

I was finetuning Whisper, and later during inference I noticed a bunch of warnings that were not there when using the base model.

`generation_config` default values have been modified to match model-specific defaults: {'suppress_tokens': [1, 2, 7, 8, 9, 10, 14, 25, 26, 27, 28, 29, 31, 58, 59, 60, 61, 62, 63, 90, 91, 92, 93, 359, 503, 522, 542, 873, 893, 902, 918, 922, 931, 1350, 1853, 1982, 2460, 2627, 3246, 3253, 3268, 3536, 3846, 3961, 4183, 4667, 6585, 6647, 7273, 9061, 9383, 10428, 10929, 11938, 12033, 12331, 12562, 13793, 14157, 14635, 15265, 15618, 16553, 16604, 18362, 18956, 20075, 21675, 22520, 26130, 26161, 26435, 28279, 29464, 31650, 32302, 32470, 36865, 42863, 47425, 49870, 50254, 50258, 50358, 50359, 50360, 50361, 50362], 'begin_suppress_tokens': [220, 50257]}. If this is not desired, please set these values explicitly.
A custom logits processor of type <class 'transformers.generation.logits_process.SuppressTokensLogitsProcessor'> has been passed to `.generate()`, but it was also created in `.generate()`, given its parameterization. The custom <class 'transformers.generation.logits_process.SuppressTokensLogitsProcessor'> will take precedence. Please check the docstring of <class 'transformers.generation.logits_process.SuppressTokensLogitsProcessor'> to see related `.generate()` flags.
A custom logits processor of type <class 'transformers.generation.logits_process.SuppressTokensAtBeginLogitsProcessor'> has been passed to `.generate()`, but it was also created in `.generate()`, given its parameterization. The custom <class 'transformers.generation.logits_process.SuppressTokensAtBeginLogitsProcessor'> will take precedence. Please check the docstring of <class 'transformers.generation.logits_process.SuppressTokensAtBeginLogitsProcessor'> to see related `.generate()` flags.

Being new to finetuning, I got worried that I had messed up the generation_config.json of my finetune. However, I did not see any difference besides forced_decoder_ids (which I quickly learned is deprecated, so it's good that my finetune does not have it anymore) and "transformers_version": "4.57.6" in my finetune versus "4.31.0.dev0" in the base model.

If I replace my transformers_version with "4.31.0.dev0", the warnings disappear.

To verify that something fishy might be going on with suppress_tokens and begin_suppress_tokens, I created a simple script that loads the base model and saves its generation_config.json using the latest transformers (not v5 yet, though); then I load it back and try using the base model with the new generation_config.json.

from transformers import WhisperForConditionalGeneration, AutoProcessor

model_path = "openai/whisper-small"
cache_dir = "./models"

model = WhisperForConditionalGeneration.from_pretrained(model_path, cache_dir=cache_dir)
processor = AutoProcessor.from_pretrained(model_path, cache_dir=cache_dir)

# disable deprecated behavior
model.generation_config.forced_decoder_ids = None

save_directory = "./output"
model.generation_config.save_pretrained(save_directory)

Then I replace the base model's generation_config.json with the one from "./output" and run the following script:

from transformers import pipeline

# I know this path is not supposed to be this way, 
# but I just want to reuse the same download folder
# with the replaced generation_config.json.
model_path = "./models/models--openai--whisper-small/snapshots/973afd24965f72e36ca33b3055d56a652f456b4d"

pipe = pipeline(
    "automatic-speech-recognition",
    model=model_path
)

result = pipe(
    "./dumps/input/8467578323502011.wav",
    generate_kwargs={
        "language": "german",
        "task": "transcribe",
    },
)
transcription = result["text"]

print(transcription)

Yep, the only difference was still transformers_version: I got the same warnings with the new version, and they went away if I changed transformers_version back.

Why does it detect modifications to suppress_tokens and begin_suppress_tokens if I load the base model with the original values, as downloaded from the openai repository on Hugging Face?

I suspect it has something to do with this description from the docs:

suppress_tokens (list[int], optional) — A list containing the non-speech tokens that will be used by the logit processor in the generate function. NON_SPEECH_TOKENS and NON_SPEECH_TOKENS_MULTI each correspond to the english-only and the multilingual model.
begin_suppress_tokens (list[int], optional, defaults to [220,50256]) — A list containing tokens that will be suppressed at the beginning of the sampling process. Initialized as the token for " " (blank_token_id) and the eos_token_id

This one looks suspicious: why is it [220, 50256] if the original OpenAI models have 50257 here?

"begin_suppress_tokens": [
    220,
    50257
  ],

NON_SPEECH_TOKENS and NON_SPEECH_TOKENS_MULTI might also differ from what is in the base generation_config.json, because the file clearly cannot contain both of them.

Then I thought: if transformers is smart enough to detect the difference, it should be able to just use the internal values and not care about the values in the file at all. So I removed suppress_tokens and begin_suppress_tokens from generation_config.json. Great, now the only warning is:

`return_token_timestamps` is deprecated for WhisperFeatureExtractor and will be removed in Transformers v5. Use `return_attention_mask` instead, as the number of frames can be inferred from it.

Still, it's a bit sad that something in transformers can generate a generation_config.json that it later complains about. And the 50257 vs 50256 difference is puzzling. Which number is right? Could this difference bite me someday?


Seems mostly specific to Whisper’s Transformers implementation and doesn’t seem likely to cause much harm in the future…?


What is happening in your case

1) The “default values have been modified…” warning is about merging, not corruption

Newer transformers versions can merge a passed/loaded GenerationConfig with model-specific defaults, and warn when they have to "patch" values like suppress_tokens and begin_suppress_tokens. This shows up a lot with Whisper because those fields are Whisper-specific decoding defaults that are sometimes stored in or derived from the model config and then copied into the generation config. (GitHub)

A common trigger is: some generation parameters are present in the model config rather than the generation config, and Transformers warns when it moves/aligns them. That exact explanation is given in the Optimum Whisper issue that reproduces this warning. (GitHub)

Practical meaning: the model will still use the intended Whisper defaults; the warning is telling you that the library had to reconcile “global defaults” vs “model defaults.” (Hugging Face)
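
To convince yourself that nothing in the saved file was actually corrupted, a quick sanity check like the following works (a sketch, assuming your saved copy lives in "./output" as in your script):

from transformers import GenerationConfig

# Load the checkpoint's published generation config and the locally saved copy,
# then compare the suppression lists and the version stamp.
hub_cfg = GenerationConfig.from_pretrained("openai/whisper-small")
local_cfg = GenerationConfig.from_pretrained("./output")

print(hub_cfg.suppress_tokens == local_cfg.suppress_tokens)              # expected: True
print(hub_cfg.begin_suppress_tokens == local_cfg.begin_suppress_tokens)  # expected: True
print(hub_cfg.transformers_version, "->", local_cfg.transformers_version)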


2) The 50256 vs 50257 difference is model-family dependent

For multilingual Whisper checkpoints (like openai/whisper-small), the published generation_config.json uses:

  • eos_token_id: 50257
  • begin_suppress_tokens: [220, 50257] (Hugging Face)

For English-only checkpoints (like openai/whisper-small.en), the published generation_config.json uses 50256 as the EOS and correspondingly uses it in begin_suppress_tokens. (Hugging Face)

So both are “right” depending on whether you’re using multilingual vs English-only Whisper. If you hardcode the wrong one, you could suppress the wrong EOS at the beginning (rarely catastrophic, but it can change early decoding behavior).
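
A quick way to see this is to load both published generation configs from the Hub (a sketch; the printed values are the ones in the two generation_config.json files):

from transformers import GenerationConfig

multi = GenerationConfig.from_pretrained("openai/whisper-small")    # multilingual
en = GenerationConfig.from_pretrained("openai/whisper-small.en")    # English-only

print(multi.eos_token_id, multi.begin_suppress_tokens)  # expected: 50257 [220, 50257]
print(en.eos_token_id, en.begin_suppress_tokens)        # expected: 50256 [220, 50256]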


3) The “custom logits processor … also created in .generate()” warnings

These come from passing a SuppressTokensLogitsProcessor / SuppressTokensAtBeginLogitsProcessor explicitly while Whisper/generate() also creates them internally from suppress_tokens and begin_suppress_tokens. (Hugging Face)

This is commonly seen in Whisper contexts (base or fine-tuned) and is usually just telling you about precedence, not an error. (Hugging Face)


4) return_token_timestamps deprecation

That warning is separate: it is a known deprecation in the Whisper feature extractor path, with guidance to use return_attention_mask instead (v5 removal). (GitHub)


Why changing "transformers_version" makes the warnings disappear

Treat this as an implementation detail: Transformers uses metadata like transformers_version as part of its compatibility/migration handling for configs. Changing it can route you away from the code path that emits the warning, but it’s not a robust or recommended fix.

Best practice: don’t edit transformers_version in cache files to silence warnings. Instead, make your intended generation behavior explicit (next section).


Best practices (recommended)

A) Don’t modify the HF cache snapshot directory

Always load from a normal model folder (your fine-tune output dir) and keep config.json + generation_config.json together as saved by save_pretrained() / Trainer. This avoids surprising “mixed provenance” config states.
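
For example (a minimal sketch; the directory name is just a placeholder for your fine-tune output):

from transformers import pipeline

# Load directly from the fine-tune's output directory, where config.json and
# generation_config.json were written together by save_pretrained()/Trainer.
pipe = pipeline("automatic-speech-recognition", model="./whisper-small-finetuned")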

B) Keep Whisper’s suppression defaults unless you have a reason

suppress_tokens and begin_suppress_tokens exist specifically for Whisper decoding; removing them changes behavior. The processors are explicitly documented and “originally created for Whisper.” (Hugging Face)

C) If you want silence and stability: set values explicitly

The most robust approach is to pass the model’s own values explicitly rather than relying on merging heuristics:

gen = pipe.model.generation_config  # pipe is the ASR pipeline from earlier
result = pipe(audio, generate_kwargs={
    "language": "german",
    "task": "transcribe",
    "suppress_tokens": gen.suppress_tokens,
    "begin_suppress_tokens": gen.begin_suppress_tokens,
})

This directly addresses what the warning asks (“set these values explicitly”).

D) Don’t pass custom suppress-token logits processors unless you truly need to

If you pass custom SuppressTokens* processors, Whisper/generate() may also build them from config, causing the duplicate warnings. (Hugging Face)

E) Avoid mixing max_length and max_new_tokens

Many Whisper configs include max_length defaults (often 448). If you additionally set max_new_tokens, you’ll get the “both set” warning. This is normal generation behavior; pick one. (Hugging Face)
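
Continuing the sketch above with the same pipe and audio, choosing a single length control could look like this (128 is just an illustrative value):

result = pipe(audio, generate_kwargs={
    "language": "german",
    "task": "transcribe",
    "max_new_tokens": 128,   # either this ...
    # "max_length": 448,     # ... or this (Whisper's usual default), but not both
})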

F) For timestamps: plan the v5 change now

If you use token timestamps, switch to return_attention_mask per the deprecation warning. (GitHub)
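
If you call the feature extractor directly, the switch is small (a sketch with placeholder audio; the keyword is the one the deprecation message points to):

import numpy as np
from transformers import WhisperFeatureExtractor

fe = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")
audio = np.zeros(16000, dtype=np.float32)  # 1 s of silence as a placeholder

# Request the attention mask instead of the deprecated return_token_timestamps flag;
# the number of valid frames can be inferred from it.
features = fe(audio, sampling_rate=16000, return_attention_mask=True, return_tensors="pt")
print(list(features.keys()))  # expect 'input_features' and 'attention_mask'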


What “could bite you someday?”

  1. Hardcoding 50256/50257: could mismatch multilingual vs English-only Whisper (use the checkpoint’s own eos_token_id / begin_suppress_tokens). (Hugging Face)
  2. Deleting suppression lists: can change decoding behavior because those processors are Whisper-specific and intended. (Hugging Face)
  3. Relying on metadata hacks (transformers_version edits): may break across upgrades.

Minimal “do this” checklist for fine-tuned Whisper

  • Save and load from your fine-tune output directory using save_pretrained().
  • Use language + task (avoid legacy forced_decoder_ids where possible; it’s deprecated and shows up in deprecation warnings). (GitHub)
  • Pass suppress_tokens and begin_suppress_tokens explicitly if you want stability and fewer warnings. (Hugging Face)

Thank you for the explanations. Yeah, I guess there is no harm in general; the model (and the finetune) works as expected. It's just strange that round-tripping even the default base model through model.generation_config.save_pretrained() and pipeline() suddenly causes these warnings, as if save_pretrained is not aware of what pipeline expects. The "custom logits processor" warning seems to be just a side effect of this suppress_tokens consolidation: if there are no suppress_tokens in the config file, both warnings disappear.

So I guess it boils down to this question: why does model.generation_config.save_pretrained save a generation_config.json that later causes warnings in the pipeline? It would be reasonable to expect save_pretrained to clean up / normalize suppress_tokens to correspond to the actual model, taking the multilingual aspect into account if needed, considering that the transformers library has the hardcoded arrays in its Python code. Or am I missing something in my save_pretrained call, and it needs more arguments to save a proper generation_config.json that would not cause the warnings?


Now that you mention it, that is odd behavior for a library…

It seems the Whisper implementation in Transformers is in a transitional phase regarding which settings reside in config.json and generation_config.json…

Manually cleaning the JSON might cause issues with older Transformers implementations, so suppressing the warning might be the more stable approach…?


Why save_pretrained() → pipeline() can start warning on a base Whisper model

This behavior is largely a version/compatibility + “defaults merging” effect, not a sign that your saved generation_config.json is “wrong”.

1) save_pretrained() is a serializer, not a “pipeline-normalizer”

model.generation_config.save_pretrained(...) writes out the current GenerationConfig object (plus metadata like transformers_version). It does not try to “canonicalize” Whisper settings against internal hardcoded tables or against what pipeline() will do later.

At inference time, generate() has its own rules for filling in “unset” values:

  • Transformers explicitly documents that generation config fields that are still None are overridden during generation, and if you want different behavior you should set them explicitly. (Hugging Face)

So “what gets saved” and “what generate() decides to fill/override at runtime” are intentionally separate concerns.
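
A tiny sketch of what that means in practice: the round trip keeps the values you had and merely stamps the saving library's version, which matches what you observed.

import json

import transformers
from transformers import GenerationConfig

gen = GenerationConfig.from_pretrained("openai/whisper-small")
gen.save_pretrained("./output")

with open("./output/generation_config.json") as f:
    saved = json.load(f)

# The suppression lists come back untouched; only the version stamp is new.
print(saved["begin_suppress_tokens"])                             # expected: [220, 50257]
print(saved["transformers_version"] == transformers.__version__)  # expected: True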

2) Whisper is a special case because some generation params historically live in the model config

In several Whisper-related stacks, you can see Transformers logging:

  • “Moving the following attributes in the config to the generation config …”
  • then the exact warning you saw: “generation_config default values have been modified to match model-specific defaults …” (GitHub)

That shows what’s going on: depending on version, Transformers may treat some Whisper generation settings as coming from config.json and then “consolidate” them into generation_config at runtime, which can trigger “defaults modified” warnings.

3) The warnings are not unique to your finetune

The same pair of warnings (generation_config … modified… plus the two “custom logits processor … also created” lines) has been observed in Whisper usage outside your setup. (GitHub)


Why the “custom logits processor” warning disappears if you remove suppress_tokens keys

Whisper uses suppress_tokens and begin_suppress_tokens to construct two logits processors.

  • If those fields are present, generate() can create those processors automatically.
  • If something else (pipeline/wrapper code) also passes a processor list, Transformers detects duplication and warns (your “custom logits processor … but it was also created” messages). (GitHub)

When you delete suppress_tokens / begin_suppress_tokens, you prevent the auto-creation path, so the duplicate warning goes away. The tradeoff is: you’re also changing how suppression is applied unless the caller re-injects suppression another way.


“Shouldn’t save_pretrained normalize suppress_tokens using the hardcoded arrays?”

Not necessarily, because the model checkpoint is allowed to define its own suppress_tokens / begin_suppress_tokens (and multilingual vs English-only differs). The docs explicitly describe these fields as model-dependent (English-only vs multilingual have different non-speech token lists). (Hugging Face)

So “normalizing” to internal constants could be wrong for some checkpoints or future variants.


Best practice (stable behavior, minimal surprises)

1) Save the whole model directory, not just generation config

Prefer:

out_dir = "./whisper-small-finetuned"  # your fine-tune output directory (name is illustrative)
model.save_pretrained(out_dir)
processor.save_pretrained(out_dir)

This keeps config.json and generation_config.json consistent for the version that produced them.

2) Inference: pick one owner for suppression

To avoid the duplicate-logits-processor warnings, ensure suppression is applied exactly once:

  • Most common: don’t pass custom logits_processor; let Whisper/generate() create suppression processors from generation_config (see the sketch after this list).
  • If you do pass custom suppression processors, then also disable auto-creation by setting suppress_tokens=None / begin_suppress_tokens=None in the config you pass to generate() (otherwise you risk duplication).
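
A single-owner sketch for the common case, reusing the audio path and arguments from the original question:

# No custom logits_processor is passed, so generate() builds the
# SuppressTokens* processors exactly once, from the generation config.
result = pipe(
    "./dumps/input/8467578323502011.wav",
    generate_kwargs={
        "language": "german",
        "task": "transcribe",
        # If you do pass your own SuppressTokens* processors instead, also set
        # "suppress_tokens": None and "begin_suppress_tokens": None here so the
        # same suppression is not applied twice.
    },
)
print(result["text"])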

3) Make your intent explicit rather than relying on “defaults merging”

Transformers’ docs note that None fields can be overridden during generation; explicitly setting values avoids “defaults got patched” situations. (Hugging Face)


Are you missing an argument to save_pretrained()?

No. There isn’t a “save in pipeline-compatible canonical form” flag. The warnings come from runtime merging/migration logic and (sometimes) duplicate ownership of suppression processors, not from you omitting a parameter in save_pretrained(). (GitHub)

Yeah, for distributing finetunes it definitely makes sense to stick to the old-style configs, just to play it safe.
But for internal and personal use, when I know I'll be using newer transformers and don't want to see those pesky warnings, I think it's OK to remove all the legacy config settings that pipeline can fill in with the correct defaults from its internal constants.
