Remove the setting of _attn_implementation from llama_bidirectional_model

#3

Remove _attn_implementation from the LlamaBidirectionalModel constructor.
The transformers 4.48.0 release included a refactor of attention implementations (transformers/pull/35235 🚨All attention refactor🚨). In transformers 4.47.0 this line had no effect on the attention implementation, but from 4.48.0 it is picked up by the new attention initialization and switches the model to "eager" attention instead of "sdpa".
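
For reference, a minimal sketch of selecting the attention implementation at load time instead of hard-coding it in the constructor (the model id below is hypothetical, not the actual checkpoint name):

```python
# Sketch only: choose the attention backend when loading the model.
# The checkpoint id is hypothetical.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "org/llama-bidirectional-model",   # hypothetical checkpoint id
    attn_implementation="sdpa",        # "eager", "sdpa", or "flash_attention_2"
    trust_remote_code=True,
)
print(model.config._attn_implementation)  # should report "sdpa" on transformers >= 4.48.0
```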

@nvidia-oliver-holworthy does this model also support flash_attention, or does that support only apply to the 8b research model?

nvidia-oliver-holworthy changed pull request status to merged

@simonschoe Yes, this model also supports flash_attention_2. Note that this change was made to support a wider set of transformers versions (we had previously supported only transformers==4.47.1).
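
As a rough example (again with a hypothetical checkpoint id), flash_attention_2 can be requested at load time, provided the flash-attn package and a supported GPU are available:

```python
# Sketch only: enable FlashAttention-2 when loading the model.
# The checkpoint id is hypothetical.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "org/llama-bidirectional-model",      # hypothetical checkpoint id
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,           # flash-attn kernels need fp16/bf16
    trust_remote_code=True,
)
```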

The full fix to get this model working in more versions of transformers will be merged in #16
