Summarization task is not recognized in pipeline()

I have been trying to run the example notebook for summarization in the “Transformers, what can they do?” section of the first chapter, directly from the Google Colab link provided in the course. However, when I try to run the summarization example, I get the error “task not available.” The output shows a list of available tasks, but “summarization” is not included.

I would appreciate any guidance on this error. It is quite frustrating to find that the first notebook one tries to run does not work properly.

I have checked the version of the transformers package and it is the latest (5.1.0).

Thanks in advance.

Lorenzo Fernandez


> I have checked the version of the transformers package and it is the latest (5.1.0).

I suspect the major update from Transformers v4 to v5 is the primary cause.

If you don’t want to change your code, you can pin to v4 (pip install "transformers<5"). However, if a v5-compatible fix works for you, that is preferable:


What the error actually means

When you call:

from transformers import pipeline
summarizer = pipeline("summarization")

pipeline() looks up the string "summarization" in an internal registry of supported tasks. If it can’t find it, it raises “task not available” and prints the registry keys it did load.

So this specific failure does not mean “the model can’t summarize”; it means your runtime loaded a Transformers build where the summarization task alias is not registered, even though the Hugging Face course explicitly expects it to exist. (Hugging Face)
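
If you want to inspect that registry yourself, you can print it (get_supported_tasks() is exported from transformers.pipelines in current releases; if it has moved in your build, the error message prints the same list anyway):

from transformers.pipelines import get_supported_tasks

print(get_supported_tasks())                      # the same list the error message shows
print("summarization" in get_supported_tasks())   # False when the alias is missing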


Why "summarization" can be missing (most likely causes)

1) You’re not running the environment you think you are (Colab / kernel state)

In notebooks, it’s easy to get into a state where:

  • pip shows one version, but Python imported another copy already in memory
  • you upgraded/downgraded packages after importing transformers and didn’t restart the runtime

How to detect
Run this in the same cell right before you call pipeline():

import transformers
from transformers import pipeline

print("transformers version:", transformers.__version__)
print("transformers file:", transformers.__file__)
print("pipeline module:", pipeline.__module__)

If transformers.__file__ points somewhere unexpected (or you see anything like /content/transformers.py), you’re not using the library you think you are.

Fix
In Colab, after installing/upgrading, always Restart runtime (Runtime → Restart runtime), then re-run cells from the top.
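
If you prefer to restart programmatically, a common Colab-only trick (assumption: Colab specifically; in plain Jupyter this just kills your kernel) is:

import os

# Kill the kernel process; Colab then restarts the runtime automatically.
# Run the pip installs first, then this, then re-run your imports from the top.
os.kill(os.getpid(), 9)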


2) Version/dependency mismatch in Transformers v5 (PyTorch too old / conflicts)

Transformers v5 tightened requirements. The PyPI installation guidance states Transformers works with Python 3.9+ and PyTorch 2.4+ and recommends installing via pip install "transformers[torch]". (PyPI)

In managed environments (Colab, Databricks, etc.), you can end up with:

  • Transformers upgraded to v5.x
  • but PyTorch kept at an older pinned version (because other preinstalled packages constrain it)

This kind of mismatch can produce partial imports or missing components, so the pipeline task registry you see can differ from what the course assumes.

How to check

import torch, transformers
print(torch.__version__)
print(transformers.__version__)

Fix options

  • Option A (stay on v5): upgrade torch to meet the requirement (may require uninstalling conflicting torch/torchvision/torchaudio first in Colab).
  • Option B (most stable for courses): pin Transformers to a compatible v4 release that matches the course material better.
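
As a sketch, both options in a Colab cell (the version pins are illustrative; adjust to your environment):

# Option A: stay on v5 and bring torch up to the required floor
!pip install -U "torch>=2.4" "transformers[torch]"

# Option B: pin to the v4 line the course notebooks were written against
!pip install -U "transformers<5"

Either way, restart the runtime afterwards before re-importing.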

3) You’re using a changed/removed API (common in the v5 transition)

A related (but different) breakage in the v5 era: some classes like SummarizationPipeline were removed/relocated, which broke third-party libraries that imported them directly. (PyPI)
That’s not your exact symptom, but it’s a sign that v5 introduced pipeline-related breaking changes and some course notebooks / integrations may lag behind.


Quick workarounds that bypass "summarization" entirely

Even if "summarization" isn’t recognized as a task string, you can still create a working summarizer in two reliable ways.

Workaround A: Let the model define the task (recommended)

Transformers docs note you can omit the task if the model on the Hub already defines it. (Hugging Face)

from transformers import pipeline

# Choose a summarization checkpoint explicitly
summarizer = pipeline(model="facebook/bart-large-cnn")
print(summarizer("Long text here...", max_new_tokens=60))

This often works even when the task alias lookup is broken, because it skips the "summarization" registry key.

Workaround B: Use "text2text-generation" with a summarization model

Summarization is fundamentally a seq2seq text-to-text generation task. (Hugging Face)

from transformers import pipeline

summarizer = pipeline("text2text-generation", model="facebook/bart-large-cnn")
print(summarizer("Long text here...", max_new_tokens=60))

Output field names may differ (often generated_text instead of summary_text), but the result is still a summary.


A practical “fix-it” sequence for your exact Colab case

Run these steps in order.

Step 1 — Confirm what you imported

import transformers
print(transformers.__version__)
print(transformers.__file__)

If this doesn’t show what you expect, restart runtime and/or reinstall cleanly.

Step 2 — Check torch compatibility (important for Transformers v5)

import torch
print(torch.__version__)

If it is below 2.4, expect issues with v5 per the installation guidance. (PyPI)

Step 3 — Clean reinstall (then restart runtime)

In a fresh cell:

!pip uninstall -y transformers
!pip install -U "transformers[torch]"

Then restart runtime and test again.

Step 4 — If the notebook still fails, use Workaround A

This will usually get you unblocked immediately:

from transformers import pipeline
summarizer = pipeline(model="facebook/bart-large-cnn")
summarizer("...", max_new_tokens=60)

Similar cases / issues reported online (context)

“Unknown task summarization” (usually older Transformers or mismatched environment)

A classic StackOverflow thread shows pipeline("summarization") failing with a task-not-found style error; the resolution was effectively “you’re on a version that doesn’t include it / upgrade Transformers”. (Stack Overflow)
Your case is “v5 but missing alias”, but the debugging logic is the same: verify the imported package and kernel state.

v5 pipeline API breakage affecting downstream libraries

An MLflow issue documents that pipeline classes such as SummarizationPipeline were removed in Transformers 5.0, breaking code that imported them directly. (PyPI)
This supports the idea that the v5 transition is a realistic source of “things that used to work in notebooks suddenly don’t”.

Your exact report exists on the HF forums (Feb 2026)

The same symptom (Colab + HF course notebook + Transformers 5.1.0 + missing "summarization") has been posted to the HF forums. (Hugging Face Forums)


Good references (guides / docs / tutorials)

  • HF course page that expects pipeline("summarization") to work (the notebook you’re following) (Hugging Face)
  • Transformers pipeline tutorial (includes a summarization example using a specific checkpoint like google/pegasus-billsum) (Hugging Face)
  • Task guide: Summarization (background + practical training/inference patterns) (Hugging Face)
  • AWS tutorial using pipeline("summarization") in practice (deployment-oriented context) (Amazon Web Services, Inc.)
  • HF task page (Summarization) with inputs/outputs and model discovery (Hugging Face)

One more common pitfall after you fix the task: long inputs

Even once the pipeline runs, many summarization models have input length limits. For long documents you typically need:

  • chunking (split text, summarize chunks, then summarize summaries), or
  • a long-context summarization model (LED, LongT5, Pegasus-X, etc.)

HF forum threads discuss this “summarization on long text” issue and model suggestions. (Hugging Face Forums)
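
As a rough illustration of the chunk-then-summarize pattern, here is a minimal sketch (not from the course; it assumes you already have a working summarize(text) -> str function, e.g. built from Workaround A, plus a loaded tokenizer, and the 900-token chunk size is an arbitrary margin under BART’s 1024-token encoder limit):

def summarize_long(text, tokenizer, summarize, chunk_tokens=900):
    # Split the document into token-bounded chunks and decode each back to text.
    ids = tokenizer(text, truncation=False)["input_ids"]
    chunks = [
        tokenizer.decode(ids[i:i + chunk_tokens], skip_special_tokens=True)
        for i in range(0, len(ids), chunk_tokens)
    ]
    # Summarize each chunk, then summarize the concatenation of the chunk summaries.
    partial = [summarize(chunk) for chunk in chunks]
    return summarize(" ".join(partial))

Cutting on raw token boundaries can split sentences; for better quality, split on sentence or paragraph boundaries instead.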


Bottom line for your situation

Given that the HF course explicitly lists "summarization" as a pipeline task (Hugging Face) and the same failure is being reported with Transformers 5.1.0 in Colab (Hugging Face Forums), the most productive path is:

  1. verify import path + restart runtime,
  2. check torch >= 2.4 for v5, (PyPI)
  3. if still broken, bypass the alias with pipeline(model="facebook/bart-large-cnn") (Workaround A). (Hugging Face)

Thank you for your quick answer. I have tested your proposed solution, but the result is the same. The code I executed is the following:

!pip uninstall transformers
!pip install datasets evaluate transformers[torch]
import transformers
print(transformers.__version__)
print(transformers.__file__)
5.1.0
/usr/local/lib/python3.12/dist-packages/transformers/__init__.py
import torch
print(torch.__version__)
2.9.0+cu126
from transformers import pipeline
summarizer = pipeline(model="facebook/bart-large-cnn")
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
""", max_new_tokens=60)

and the output has been the same:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipython-input-1819463455.py in <cell line: 0>()
      1 from transformers import pipeline
----> 2 summarizer = pipeline(model="facebook/bart-large-cnn")
      3 summarizer(
      4     """
      5     America has changed dramatically during recent years. Not only has the number of

2 frames
/usr/local/lib/python3.12/dist-packages/transformers/pipelines/base.py in check_task(self, task)
   1354             raise KeyError(f"Invalid translation task {task}, use 'translation_XX_to_YY' format")
   1355 
-> 1356         raise KeyError(
   1357             f"Unknown task {task}, available tasks are {self.get_supported_tasks() + ['translation_XX_to_YY']}"
   1358         )

KeyError: "Unknown task summarization, available tasks are ['any-to-any', 'audio-classification', 'automatic-speech-recognition', 'depth-estimation', 'document-question-answering', 'feature-extraction', 'fill-mask', 'image-classification', 'image-feature-extraction', 'image-segmentation', 'image-text-to-text', 'image-to-image', 'keypoint-matching', 'mask-generation', 'ner', 'object-detection', 'question-answering', 'sentiment-analysis', 'table-question-answering', 'text-classification', 'text-generation', 'text-to-audio', 'text-to-speech', 'token-classification', 'video-classification', 'visual-question-answering', 'vqa', 'zero-shot-audio-classification', 'zero-shot-classification', 'zero-shot-image-classification', 'zero-shot-object-detection', 'translation_XX_to_YY']"

However, if I use transformers<5, everything runs without problems.

Of course, this error is not a major issue, but since it appears in the notebooks used for the tutorial, leaving it unfixed could be a source of frustration for newcomers.

Thank you for your time. Sincerely.

Lorenzo Fernandez


You’re right. It seems the summarization pipeline was deleted, and there’s “no replacement” :scream:.
I never thought something like this would happen…


What’s happening in your environment

1) pipeline(model="facebook/bart-large-cnn") is inferring the task as "summarization"

facebook/bart-large-cnn is explicitly tagged as a Summarization model on the Hub, and its model card even shows the classic pipeline("summarization", ...) snippet. (Hugging Face)

So when you call:

summarizer = pipeline(model="facebook/bart-large-cnn")

Transformers tries to infer the task from the model’s metadata → summarization → then checks whether "summarization" exists in the pipeline task registry.
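
You can see the metadata that drives this inference by querying the Hub directly (huggingface_hub is what pipeline() uses under the hood for this lookup):

from huggingface_hub import model_info

info = model_info("facebook/bart-large-cnn")
print(info.pipeline_tag)  # -> "summarization", which the v5 registry no longer contains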

2) In Transformers v5, the old seq2seq “text2text” pipelines were removed

Transformers v5 removed Text2TextGenerationPipeline and the related SummarizationPipeline / TranslationPipeline. (GitHub)

That is exactly consistent with your error:

Unknown task summarization, available tasks are [...]

Your environment is “correct” (Transformers 5.1.0 imported from site-packages, torch installed), but the task name is not registered in v5, so the pipeline factory cannot build it.


Why this is so confusing (docs/course mismatch)

Even though v5 removed those pipeline classes, several official pages still show v4-style pipeline usage for summarization:

  • The model card for facebook/bart-large-cnn shows pipeline("summarization", ...). (Hugging Face)
  • The Summarization task guide says “the simplest way… is to use it in a pipeline()” and shows pipeline("summarization", ...), while also providing a manual generate() approach. (Hugging Face)
  • The Pipeline tutorial page (v5.1.0 selector visible) still includes a summarization example. (Hugging Face)

So the course notebook (and some docs) are effectively written for the v4 pipeline interface, but you’re running v5 where that interface was removed.


Practical solutions / workarounds

Option A (best if you want the HF Learn notebook unchanged): pin Transformers to v4

This matches the course material and restores pipeline("summarization").

pip install -U "transformers<5"

This is the same approach some downstream tooling adopted temporarily because Transformers v5 removed those pipeline classes. (GitHub)

Option B (best if you want to stay on v5): use generate() (the supported path)

This is the underlying mechanism the old summarization pipeline used. The summarization task guide explicitly shows how to “manually replicate the results of the pipeline” using generate(). (Hugging Face)

Minimal v5-compatible example:

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "facebook/bart-large-cnn"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).eval()

text = "America has changed dramatically during recent years..."

inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    max_length=1024,  # BART-large encoder limit
).to(device)

with torch.inference_mode():
    ids = model.generate(
        **inputs,
        max_new_tokens=60,
        num_beams=4,
        length_penalty=2.0,
        no_repeat_ngram_size=3,
        early_stopping=True,
    )

summary = tokenizer.decode(ids[0], skip_special_tokens=True)
print(summary)

Option C (if you want “pipeline-like” ergonomics on v5)

Create a small wrapper that returns [{ "summary_text": ... }] so your notebook output looks like the v4 pipeline output. (This is essentially Option B plus formatting.)


Summary in one sentence

Your code fails because Transformers v5 no longer registers the "summarization" pipeline task (seq2seq text2text pipelines were removed), but the course + some docs/model cards still show the old v4 pipeline usage, so v5 throws Unknown task summarization. (GitHub)


"""
facebook/bart-large-cnn summarization on 🤗 Transformers v5 (v4-pipeline-like wrapper)

Why this code exists:
- Transformers v5 removed the old SummarizationPipeline / Text2TextGenerationPipeline, so pipeline("summarization") no longer works.
  Ref (migration guide): https://github.com/huggingface/transformers/blob/main/MIGRATION_GUIDE_V5.md#pipelines
- This wrapper mimics the v4 pipeline call style: summarizer(text, **kwargs) -> [{"summary_text": "..."}]

Model + its summarization defaults (num_beams, length_penalty, etc.):
- https://huggingface.co/facebook/bart-large-cnn
- Config shows task_specific_params["summarization"] and max_position_embeddings=1024:
  https://huggingface.co/facebook/bart-large-cnn/blob/main/config.json

Deps (install in Colab / venv):
  pip install -U "transformers[torch]"
  # optional: (not required here) accelerate  # helps advanced device_map/offload patterns
"""

from __future__ import annotations

import textwrap
from typing import Any, Dict, List, Union

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM


def _pick_device_and_dtype() -> tuple[torch.device, torch.dtype]:
    """CPU/GPU safe; float32 if CPU; lower VRAM if GPU (fp16)."""
    if torch.cuda.is_available():
        return torch.device("cuda"), torch.float16  # T4-safe
    return torch.device("cpu"), torch.float32


def _load_bart_cnn(model_id: str) -> tuple[Any, Any, torch.device]:
    """
    Load model+tokenizer with low peak RAM where possible.
    Falls back to CPU if GPU OOM occurs.
    """
    device, dtype = _pick_device_and_dtype()

    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

    # low_cpu_mem_usage reduces peak CPU RAM during load
    model = AutoModelForSeq2SeqLM.from_pretrained(
        model_id,
        dtype=dtype,  # v5 renamed torch_dtype to dtype (also accepted on 4.56+)
        low_cpu_mem_usage=True,
    )

    try:
        model.to(device)
    except RuntimeError as e:
        # If GPU runs out of memory, fall back to CPU float32 for reliability.
        if "CUDA out of memory" in str(e) or "cuda" in str(e).lower():
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
            device = torch.device("cpu")
            model = model.to(device, dtype=torch.float32)
        else:
            raise

    model.eval()
    return tokenizer, model, device


class V4StyleSummarizer:
    """
    Callable wrapper that behaves like the old v4 summarization pipeline:
      summarizer(text_or_texts, **gen_kwargs) -> [{"summary_text": "..."}]
    """

    def __init__(self, model_id: str = "facebook/bart-large-cnn") -> None:
        self.model_id = model_id
        self.tokenizer, self.model, self.device = _load_bart_cnn(model_id)

        # Pull the classic summarization defaults shipped with the checkpoint (if present).
        # See: https://huggingface.co/facebook/bart-large-cnn/blob/main/config.json
        self._defaults: Dict[str, Any] = {}
        tsp = getattr(self.model.config, "task_specific_params", None) or {}
        self._defaults.update(tsp.get("summarization", {}))

        # Tokenization defaults for BART-large(-cnn):
        # max_position_embeddings=1024 → keep encoder input <= 1024 tokens.
        self._max_input_tokens = int(getattr(self.model.config, "max_position_embeddings", 1024))

    def __call__(
        self,
        texts: Union[str, List[str]],
        *,
        # v4 pipeline commonly accepted generation kwargs; keep signature simple.
        max_new_tokens: int = 60,
        **gen_kwargs: Any,
    ) -> List[Dict[str, str]]:
        single_input = isinstance(texts, str)
        if single_input:
            texts = [texts]  # type: ignore[assignment]

        # Tokenize with truncation for safety on long inputs.
        batch = self.tokenizer(
            texts,  # type: ignore[arg-type]
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=self._max_input_tokens,
        ).to(self.device)

        # Start from the checkpoint’s summarization defaults, then apply user overrides.
        # Note: the v4 summarization pipeline leaned on max_length/min_length; here we
        # prefer max_new_tokens unless you explicitly pass max_length yourself.
        merged = dict(self._defaults)
        merged.update(gen_kwargs)

        # The checkpoint defaults include max_length; passing it alongside max_new_tokens
        # makes generate() warn and pick one, so drop the default unless the caller set it.
        if "max_length" not in gen_kwargs:
            merged.pop("max_length", None)
            merged.setdefault("max_new_tokens", max_new_tokens)

        # On CPU, reduce beams for less RAM/compute. The checkpoint defaults already set
        # num_beams, so setdefault would never fire; override unless the user passed it.
        if self.device.type == "cpu" and "num_beams" not in gen_kwargs:
            merged["num_beams"] = 2

        with torch.inference_mode():
            out_ids = self.model.generate(**batch, **merged)

        summaries = self.tokenizer.batch_decode(
            out_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
        )

        results = [{"summary_text": s} for s in summaries]
        return results if not single_input else [results[0]]


def main() -> None:
    summarizer = V4StyleSummarizer("facebook/bart-large-cnn")

    text = textwrap.dedent(
        """
        America has changed dramatically during recent years. Not only has the number of
        graduates in traditional engineering disciplines such as mechanical, civil,
        electrical, chemical, and aeronautical engineering declined, but in most of
        the premier American universities engineering curricula now concentrate on
        and encourage largely the study of engineering science. As a result, there
        are declining offerings in engineering subjects dealing with infrastructure,
        the environment, and related issues, and greater concentration on high
        technology subjects, largely supporting increasingly complex scientific
        developments. While the latter is important, it should not be at the expense
        of more traditional engineering.

        Rapidly developing economies such as China and India, as well as other
        industrial countries in Europe and Asia, continue to encourage and advance
        the teaching of engineering. Both China and India, respectively, graduate
        six and eight times as many traditional engineers as does the United States.
        Other industrial countries at minimum maintain their output, while America
        suffers an increasingly serious decline in the number of engineering graduates
        and a lack of well-educated engineers.
        """
    ).strip()

    # v4-like call style: returns [{"summary_text": "..."}]
    out = summarizer(text, max_new_tokens=60)
    print(out[0]["summary_text"])


if __name__ == "__main__":
    main()