Remove the setting of _attn_implementation from llama_bidirectional_model

#3

Remove _attn_implementation from the LlamaBidirectionalModel constructor.
The transformers 4.48.0 release included a refactor of attention implementations (transformers/pull/35235 🚨All attention refactor🚨). In transformers 4.47.0 this line had no effect on the attention implementation, but from 4.48.0 it is picked up by the new attention initialization and switches the model to "eager" attention instead of "sdpa".
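
For reference, a minimal sketch of selecting the attention implementation at load time instead of hard-coding it in the constructor (the model id below is hypothetical, not the actual checkpoint name):

```python
# Sketch only: choose the attention backend when loading the model.
# The checkpoint id is hypothetical.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "org/llama-bidirectional-model",   # hypothetical checkpoint id
    attn_implementation="sdpa",        # "eager", "sdpa", or "flash_attention_2"
    trust_remote_code=True,
)
print(model.config._attn_implementation)  # should report "sdpa" on transformers >= 4.48.0
```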

@nvidia-oliver-holworthy does this model also support flash_attention, or does that support only apply to the 8b research model?

nvidia-oliver-holworthy changed pull request status to merged

@simonschoe Yes, this model also supports flash_attention_2. Note that this change was made to support a wider set of transformers versions (we had previously supported only transformers==4.47.1).
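
As a rough example (again with a hypothetical checkpoint id), flash_attention_2 can be requested at load time, provided the flash-attn package and a supported GPU are available:

```python
# Sketch only: enable FlashAttention-2 when loading the model.
# The checkpoint id is hypothetical.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "org/llama-bidirectional-model",      # hypothetical checkpoint id
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,           # flash-attn kernels need fp16/bf16
    trust_remote_code=True,
)
```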

The full fix to get this model working in more versions of transformers will be merged in #16
