You’re right. It looks like the summarization pipeline was removed, and there is “no replacement” registered for that task. Surprising, but that is exactly what changed in v5.
What’s happening in your environment
1) pipeline(model="facebook/bart-large-cnn") is inferring the task as "summarization"
facebook/bart-large-cnn is explicitly tagged as a Summarization model on the Hub, and its model card even shows the classic pipeline("summarization", ...) snippet. (Hugging Face)
So when you call:
summarizer = pipeline(model="facebook/bart-large-cnn")
Transformers tries to infer the task from the model’s metadata → summarization → then checks whether "summarization" exists in the pipeline task registry.
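Note that passing the task explicitly does not help on v5: the explicit form goes through the same registry lookup, so (as a sketch of what you would observe, not an exact traceback) it raises the same error:

from transformers import pipeline

# Explicit task name hits the same (now missing) registry entry as the inferred one,
# so on v5 this raises: Unknown task summarization, available tasks are [...]
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")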
2) In Transformers v5, the old seq2seq “text2text” pipelines were removed
Transformers v5 removed Text2TextGenerationPipeline and the related SummarizationPipeline / TranslationPipeline. (GitHub)
That is exactly consistent with your error:
Unknown task summarization, available tasks are [...]
Your environment is “correct” (Transformers 5.1.0 imported from site-packages, torch installed), but the task name is not registered in v5, so the pipeline factory cannot build it.
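A quick sanity check to confirm what you are actually running (nothing version-specific here, just the installed versions and device availability):

import torch
import transformers

print(transformers.__version__)              # 5.x → "summarization" is not a registered pipeline task
print(torch.__version__, torch.cuda.is_available())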
Why this is so confusing (docs/course mismatch)
Even though v5 removed those pipeline classes, several official pages still show v4-style pipeline usage for summarization:
- The model card for facebook/bart-large-cnn shows pipeline("summarization", ...). (Hugging Face)
- The Summarization task guide says “the simplest way… is to use it in a pipeline()” and shows pipeline("summarization", ...), while also providing a manual generate() approach. (Hugging Face)
- The Pipeline tutorial page (v5.1.0 selector visible) still includes a summarization example. (Hugging Face)
So the course notebook (and some docs) are effectively written for the v4 pipeline interface, but you’re running v5 where that interface was removed.
Practical solutions / workarounds
Option A (best if you want the HF Learn notebook unchanged): pin Transformers to v4
This matches the course material and restores pipeline("summarization").
pip install -U "transformers<5"
This is the same approach some downstream tooling adopted temporarily because Transformers v5 removed those pipeline classes. (GitHub)
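Once you are back on a 4.x release, the classic call shown on the model card and in the course notebook works unchanged, e.g.:

from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
result = summarizer("America has changed dramatically during recent years...")
print(result[0]["summary_text"])   # v4 returns a list of {"summary_text": ...} dicts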
Option B (best if you want to stay on v5): use generate() (the supported path)
This is the underlying mechanism the old summarization pipeline used. The summarization task guide explicitly shows how to “manually replicate the results of the pipeline” using generate(). (Hugging Face)
Minimal v5-compatible example:
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).eval()

text = "America has changed dramatically during recent years..."

inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    max_length=1024,  # BART-large encoder limit
).to(device)

with torch.inference_mode():
    ids = model.generate(
        **inputs,
        max_new_tokens=60,
        num_beams=4,
        length_penalty=2.0,
        no_repeat_ngram_size=3,
        early_stopping=True,
    )

summary = tokenizer.decode(ids[0], skip_special_tokens=True)
print(summary)
Option C (if you want “pipeline-like” ergonomics on v5)
Create a small wrapper that returns [{ "summary_text": ... }] so your notebook output looks like the v4 pipeline output. (This is essentially Option B plus formatting; a complete wrapper script is included below, after the one-sentence summary.)
Summary in one sentence
Your code fails because Transformers v5 no longer registers the "summarization" pipeline task (seq2seq text2text pipelines were removed), but the course + some docs/model cards still show the old v4 pipeline usage, so v5 throws Unknown task summarization. (GitHub)
"""
facebook/bart-large-cnn summarization on 🤗 Transformers v5 (v4-pipeline-like wrapper)
Why this code exists:
- Transformers v5 removed the old SummarizationPipeline / Text2TextGenerationPipeline, so pipeline("summarization") no longer works.
Ref (migration guide): https://github.com/huggingface/transformers/blob/main/MIGRATION_GUIDE_V5.md#pipelines
- This wrapper mimics the v4 pipeline call style: summarizer(text, **kwargs) -> [{"summary_text": "..."}]
Model + its summarization defaults (num_beams, length_penalty, etc.):
- https://huggingface.co/facebook/bart-large-cnn
- Config shows task_specific_params["summarization"] and max_position_embeddings=1024:
https://huggingface.co/facebook/bart-large-cnn/blob/main/config.json
Deps (install in Colab / venv):
pip install -U "transformers[torch]"
# optional: accelerate  (not required here; helps with advanced device_map/offload patterns)
"""
from __future__ import annotations
import textwrap
from typing import Any, Dict, List, Union
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
def _pick_device_and_dtype() -> tuple[torch.device, torch.dtype]:
    """CPU/GPU safe; float32 if CPU; lower VRAM if GPU (fp16)."""
    if torch.cuda.is_available():
        return torch.device("cuda"), torch.float16  # T4-safe
    return torch.device("cpu"), torch.float32
def _load_bart_cnn(model_id: str) -> tuple[Any, Any, torch.device]:
    """
    Load model + tokenizer with low peak RAM where possible.
    Falls back to CPU if a GPU OOM occurs.
    """
    device, dtype = _pick_device_and_dtype()
    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
    # low_cpu_mem_usage reduces peak CPU RAM during load
    model = AutoModelForSeq2SeqLM.from_pretrained(
        model_id,
        torch_dtype=dtype,
        low_cpu_mem_usage=True,
    )
    try:
        model.to(device)
    except RuntimeError as e:
        # If the GPU runs out of memory, fall back to CPU float32 for reliability.
        if "CUDA out of memory" in str(e) or "cuda" in str(e).lower():
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
            device = torch.device("cpu")
            model = model.to(device, dtype=torch.float32)
        else:
            raise
    model.eval()
    return tokenizer, model, device
class V4StyleSummarizer:
    """
    Callable wrapper that behaves like the old v4 summarization pipeline:
        summarizer(text_or_texts, **gen_kwargs) -> [{"summary_text": "..."}]
    """

    def __init__(self, model_id: str = "facebook/bart-large-cnn") -> None:
        self.model_id = model_id
        self.tokenizer, self.model, self.device = _load_bart_cnn(model_id)

        # Pull the classic summarization defaults shipped with the checkpoint (if present).
        # See: https://huggingface.co/facebook/bart-large-cnn/blob/main/config.json
        self._defaults: Dict[str, Any] = {}
        tsp = getattr(self.model.config, "task_specific_params", None) or {}
        self._defaults.update(tsp.get("summarization", {}))

        # Tokenization defaults for BART-large(-cnn):
        # max_position_embeddings=1024 → keep encoder input <= 1024 tokens.
        self._max_input_tokens = int(getattr(self.model.config, "max_position_embeddings", 1024))

    def __call__(
        self,
        texts: Union[str, List[str]],
        *,
        # The v4 pipeline commonly accepted generation kwargs; keep the signature simple.
        max_new_tokens: int = 60,
        **gen_kwargs: Any,
    ) -> List[Dict[str, str]]:
        single_input = isinstance(texts, str)
        if single_input:
            texts = [texts]  # type: ignore[assignment]

        # Tokenize with truncation for safety on long inputs.
        batch = self.tokenizer(
            texts,  # type: ignore[arg-type]
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=self._max_input_tokens,
        ).to(self.device)

        # Start from the model’s checkpoint defaults, then allow user overrides.
        # Note: the v4 summarization pipeline used max_length/min_length a lot; here we prefer
        # max_new_tokens unless you explicitly pass max_length.
        merged = dict(self._defaults)
        merged.update(gen_kwargs)
        # Prefer explicit max_new_tokens for easier control over output length (especially across versions).
        merged.setdefault("max_new_tokens", max_new_tokens)

        # If on CPU, reduce beams unless the user explicitly requests otherwise (less RAM/compute).
        if self.device.type == "cpu":
            merged.setdefault("num_beams", 2)

        with torch.inference_mode():
            out_ids = self.model.generate(**batch, **merged)

        summaries = self.tokenizer.batch_decode(
            out_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
        )
        results = [{"summary_text": s} for s in summaries]
        return results if not single_input else [results[0]]
def main() -> None:
    summarizer = V4StyleSummarizer("facebook/bart-large-cnn")

    text = textwrap.dedent(
        """
        America has changed dramatically during recent years. Not only has the number of
        graduates in traditional engineering disciplines such as mechanical, civil,
        electrical, chemical, and aeronautical engineering declined, but in most of
        the premier American universities engineering curricula now concentrate on
        and encourage largely the study of engineering science. As a result, there
        are declining offerings in engineering subjects dealing with infrastructure,
        the environment, and related issues, and greater concentration on high
        technology subjects, largely supporting increasingly complex scientific
        developments. While the latter is important, it should not be at the expense
        of more traditional engineering.

        Rapidly developing economies such as China and India, as well as other
        industrial countries in Europe and Asia, continue to encourage and advance
        the teaching of engineering. Both China and India, respectively, graduate
        six and eight times as many traditional engineers as does the United States.
        Other industrial countries at minimum maintain their output, while America
        suffers an increasingly serious decline in the number of engineering graduates
        and a lack of well-educated engineers.
        """
    ).strip()

    # v4-like call style: returns [{"summary_text": "..."}]
    out = summarizer(text, max_new_tokens=60)
    print(out[0]["summary_text"])


if __name__ == "__main__":
    main()