Components

A Scrapy component is any class whose objects are built using build_from_crawler().

That includes the classes that you may assign to component settings such as DOWNLOADER_MIDDLEWARES, SPIDER_MIDDLEWARES, EXTENSIONS, ITEM_PIPELINES, SCHEDULER, and DUPEFILTER_CLASS, among others.

Third-party Scrapy components may also let you define additional Scrapy components, usually configurable through settings, to modify their behavior.

Initializing from the crawler

Any Scrapy component may optionally define the following class method:

classmethod from_crawler(cls, crawler: scrapy.crawler.Crawler, *args, **kwargs)

Return an instance of the component based on crawler.

args and kwargs are component-specific arguments that some components receive. However, most components do not get any arguments, and instead use settings.

If a component class defines this method, it is called to create any instance of that component.

The crawler object provides access to all Scrapy core components like settings and signals, allowing the component to access them and hook its functionality into Scrapy.
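
For example, a hypothetical extension could use from_crawler() to read a custom setting and connect a handler to the spider_closed signal (a minimal sketch; the class name, the setting, and the log message are illustrative, not part of Scrapy):

from scrapy import signals


class SpiderCloseLogger:
    def __init__(self, message):
        self.message = message

    @classmethod
    def from_crawler(cls, crawler):
        # Read a hypothetical custom setting from the crawler settings.
        message = crawler.settings.get("SPIDER_CLOSE_LOGGER_MESSAGE", "Spider closed")
        instance = cls(message)
        # Hook into Scrapy by connecting a handler to a core signal.
        crawler.signals.connect(instance.spider_closed, signal=signals.spider_closed)
        return instance

    def spider_closed(self, spider):
        spider.logger.info("%s: %s", self.message, spider.name)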

Settings

Components can be configured through settings.

Components can read any setting from the settings attribute of the Crawler object that they receive during initialization. That includes both built-in and custom settings.

For example:

class MyExtension:
    @classmethod
    def from_crawler(cls, crawler):
        settings = crawler.settings
        return cls(settings.getbool("LOG_ENABLED"))

    def __init__(self, log_is_enabled=False):
        if log_is_enabled:
            print("log is enabled!")

Components do not need to declare their custom settings programmatically. However, they should document them, so that users know they exist and how to use them.

It is a good practice to prefix custom settings with the name of the component, to avoid collisions with custom settings of other existing (or future) components. For example, an extension called WarcCaching could prefix its custom settings with WARC_CACHING_.

Another good practice, mainly for components meant for component priority dictionaries, is to provide a boolean setting called <PREFIX>_ENABLED (e.g. WARC_CACHING_ENABLED) to allow toggling that component on and off without changing the component priority dictionary setting. You can usually check the value of such a setting during initialization, and if False, raise NotConfigured.
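
For example, the WarcCaching extension mentioned above could check its toggle setting during initialization (a minimal sketch; the class and the WARC_CACHING_DIR setting are illustrative):

from scrapy.exceptions import NotConfigured


class WarcCaching:
    @classmethod
    def from_crawler(cls, crawler):
        if not crawler.settings.getbool("WARC_CACHING_ENABLED"):
            # Disable the component without changing the component priority
            # dictionary setting that enables it.
            raise NotConfigured("WARC_CACHING_ENABLED is False")
        return cls(crawler.settings.get("WARC_CACHING_DIR", ".warc-cache"))

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir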

When choosing a name for a custom setting, it is also a good idea to have a look at the names of built-in settings, to try to maintain consistency with them.

Enforcing requirements

Sometimes, your components may only be intended to work under certain conditions. For example, they may require a minimum version of Scrapy to work as intended, or they may require certain settings to have specific values.

In addition to describing those conditions in the documentation of your component, it is a good practice to raise an exception from the __init__ method of your component if those conditions are not met at run time.

In the case of downloader middlewares, extensions, item pipelines, and spider middlewares, you should raise NotConfigured, passing a description of the issue as a parameter to the exception so that it is printed in the logs, for the user to see. For other components, feel free to raise whatever other exception feels right to you; for example, RuntimeError would make sense for a Scrapy version mismatch, while ValueError may be better if the issue is the value of a setting.
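
For example, an item pipeline that depends on a custom setting could enforce both kinds of requirement during initialization (an illustrative sketch; the pipeline class and the EXPORT_BATCH_SIZE setting are made up):

from scrapy.exceptions import NotConfigured


class ExportPipeline:
    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings.getint("EXPORT_BATCH_SIZE", 0))

    def __init__(self, batch_size):
        if not batch_size:
            # Missing requirement: NotConfigured is logged and the component
            # is simply not enabled.
            raise NotConfigured("EXPORT_BATCH_SIZE must be set to a positive integer")
        if batch_size > 10_000:
            # Bad setting value: ValueError describes the problem better.
            raise ValueError(f"EXPORT_BATCH_SIZE is too large: {batch_size}")
        self.batch_size = batch_size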

If your requirement is a minimum Scrapy version, you may use scrapy.__version__ to enforce your requirement. For example:

from packaging.version import parse as parse_version

import scrapy


class MyComponent:
    def __init__(self):
        if parse_version(scrapy.__version__) < parse_version("2.7"):
            raise RuntimeError(
                f"{MyComponent.__qualname__} requires Scrapy 2.7 or "
                "later, which allows defining the process_spider_output "
                "method of spider middlewares as an asynchronous "
                "generator."
            )

API reference

The following function can be used to create an instance of a component class:

scrapy.utils.misc.build_from_crawler(objcls: type[T], crawler: Crawler, /, *args: Any, **kwargs: Any) → T

Construct a class instance using its from_crawler or from_settings constructor.

Added in version 2.12.

*args and **kwargs are forwarded to the constructor.

Raises TypeError if the resulting instance is None.
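
For example, a component that manages its own sub-components could use it during initialization (a sketch; StorageExtension and the STORAGE_EXTENSION_CLASS setting are illustrative):

from scrapy.utils.misc import build_from_crawler, load_object


class StorageExtension:
    @classmethod
    def from_crawler(cls, crawler):
        # Resolve the import path stored in a hypothetical custom setting and
        # build the resulting class through build_from_crawler(), so that its
        # own from_crawler() method, if any, is honored.
        storage_cls = load_object(crawler.settings.get("STORAGE_EXTENSION_CLASS"))
        storage = build_from_crawler(storage_cls, crawler)
        return cls(storage)

    def __init__(self, storage):
        self.storage = storage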

The following function can also be useful when implementing a component, e.g. to get the import path of the component class when reporting problems:

scrapy.utils.python.global_object_name(obj: Any) → str

Return the full import path of the given object.

>>> from scrapy import Request
>>> global_object_name(Request)
'scrapy.http.request.Request'
>>> global_object_name(Request.replace)
'scrapy.http.request.Request.replace'