asyncio
Added in version 2.0.
Scrapy has partial support for asyncio. After you install the asyncio reactor, you may use asyncio and asyncio-powered libraries in any coroutine.
Installing the asyncio reactor
To enable asyncio support, your TWISTED_REACTOR setting needs to be set to 'twisted.internet.asyncioreactor.AsyncioSelectorReactor', which is the default value.
If you are using CrawlerRunner, you also need to install the AsyncioSelectorReactor manually. You can do that using install_reactor():
    from scrapy.utils.reactor import install_reactor

    install_reactor("twisted.internet.asyncioreactor.AsyncioSelectorReactor")
Handling a pre-installed reactor
Importing twisted.internet.reactor, as well as some other Twisted imports, installs the default Twisted reactor as a side effect. Once a Twisted reactor is installed, it is not possible to switch to a different reactor at run time.
If you configure the asyncio Twisted reactor and, at run time, Scrapy complains that a different reactor is already installed, chances are you have some such imports in your code.
You can usually fix the issue by moving those offending module-level Twisted imports to the method or function definitions where they are used. For example, if you have something like:
    from twisted.internet import reactor

    def my_function():
        reactor.callLater(...)
Switch to something like:
    def my_function():
        from twisted.internet import reactor

        reactor.callLater(...)
Alternatively, you can try to manually install the asyncio reactor, with install_reactor(), before those imports happen.
Integrating Deferred code and asyncio code
Coroutine functions can await on Deferreds by wrapping them into asyncio.Future objects. Scrapy provides two helpers for this:
- scrapy.utils.defer.deferred_to_future(d: Deferred[_T]) -> Future[_T]
Added in version 2.6.0.
Return an asyncio.Future object that wraps d.
When using the asyncio reactor, you cannot await on Deferred objects from Scrapy callables defined as coroutines; you can only await on Future objects. Wrapping Deferred objects into Future objects allows you to wait on them:
    import scrapy
    from scrapy import Spider
    from scrapy.utils.defer import deferred_to_future

    class MySpider(Spider):
        ...

        async def parse(self, response):
            additional_request = scrapy.Request('https://example.org/price')
            deferred = self.crawler.engine.download(additional_request)
            additional_response = await deferred_to_future(deferred)
- scrapy.utils.defer.maybe_deferred_to_future(d: Deferred[_T]) -> Deferred[_T] | Future[_T]
Added in version 2.6.0.
Return d as an object that can be awaited from a Scrapy callable defined as a coroutine.
What you can await in Scrapy callables defined as coroutines depends on the value of TWISTED_REACTOR:
  - When using the asyncio reactor, you can only await on asyncio.Future objects.
  - When not using the asyncio reactor, you can only await on Deferred objects.
If you want to write code that uses Deferred objects but works with any reactor, use this function on all Deferred objects:
    import scrapy
    from scrapy import Spider
    from scrapy.utils.defer import maybe_deferred_to_future

    class MySpider(Spider):
        ...

        async def parse(self, response):
            additional_request = scrapy.Request('https://example.org/price')
            deferred = self.crawler.engine.download(additional_request)
            additional_response = await maybe_deferred_to_future(deferred)
Tip
If you don’t need to support reactors other than the default AsyncioSelectorReactor, you can use deferred_to_future(); otherwise, you should use maybe_deferred_to_future().
Tip
If you need to use these functions in code that must also work with Scrapy 2.0 to 2.5, which do not provide them (versions before 2.0 do not support asyncio), you can copy their implementation into your own code.
Coroutines and futures can be wrapped into Deferreds (for example, when a Scrapy API requires passing a Deferred to it) using the following helpers:
- scrapy.utils.defer.deferred_from_coro(o: _CT) -> Deferred
scrapy.utils.defer.deferred_from_coro(o: _T) -> _T
Converts a coroutine or other awaitable object into a Deferred, or returns the object as is if it is already a Deferred or isn’t awaitable.
- scrapy.utils.defer.deferred_f_from_coro_f(coro_f: Callable[_P, Coroutine[Any, Any, _T]]) -> Callable[_P, Deferred[_T]]
Converts a coroutine function into a function that returns a Deferred.
The coroutine function is called when the wrapper is called, with the wrapper’s arguments. This is useful for callback chains, as callback functions are called with the previous callback result.
Enforcing asyncio as a requirement
If you are writing a component that requires asyncio to work, use scrapy.utils.reactor.is_asyncio_reactor_installed() to enforce it as a requirement. For example:
    from scrapy.utils.reactor import is_asyncio_reactor_installed

    class MyComponent:
        def __init__(self):
            if not is_asyncio_reactor_installed():
                raise ValueError(
                    f"{MyComponent.__qualname__} requires the asyncio Twisted "
                    "reactor. Make sure you have it configured in the "
                    "TWISTED_REACTOR setting. See the asyncio documentation "
                    "of Scrapy for more information."
                )
- scrapy.utils.reactor.is_asyncio_reactor_installed() -> bool
Check whether the installed reactor is AsyncioSelectorReactor.
Raise a RuntimeError if no reactor is installed.
Changed in version 2.13: In earlier Scrapy versions this function silently installed the default reactor if there was no reactor installed. Now it raises an exception to prevent silent problems in this case.
Windows-specific notes
The Windows implementation of asyncio can use two event loop implementations, ProactorEventLoop (default) and SelectorEventLoop. However, only SelectorEventLoop works with Twisted.
Scrapy changes the event loop class to SelectorEventLoop automatically when you change the TWISTED_REACTOR setting or call install_reactor().
Note
Other libraries you use may require ProactorEventLoop, e.g. because it supports subprocesses (this is the case with playwright), so you cannot use them together with Scrapy on Windows (but you should be able to use them on WSL or native Linux).
Using custom asyncio loops
You can also use custom asyncio event loops with the asyncio reactor. Set the ASYNCIO_EVENT_LOOP setting to the import path of the desired event loop class to use it instead of the default asyncio event loop.
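For example, assuming the third-party uvloop package is installed (it is not a Scrapy dependency), a project's settings.py could select its event loop class like this. If you use CrawlerRunner instead, the same import path can be passed as the second argument of install_reactor().

```python
# settings.py (sketch; assumes uvloop is installed)
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
ASYNCIO_EVENT_LOOP = "uvloop.Loop"

# With CrawlerRunner, the equivalent manual installation would be:
# install_reactor(TWISTED_REACTOR, ASYNCIO_EVENT_LOOP)
```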
Switching to a non-asyncio reactor
If for some reason your code doesn’t work with the asyncio reactor, you can use a different reactor by setting the TWISTED_REACTOR setting to its import path (e.g. 'twisted.internet.epollreactor.EPollReactor') or to None, which will use the default reactor for your platform.
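As a settings.py sketch: the first assignment picks Twisted's epoll reactor explicitly (epoll is only available on Linux), and the commented line shows the alternative of letting Twisted choose the platform default.

```python
# settings.py (sketch)
# Explicit non-asyncio reactor; epoll is only available on Linux.
TWISTED_REACTOR = "twisted.internet.epollreactor.EPollReactor"

# Alternatively, use the default reactor for your platform:
# TWISTED_REACTOR = None
```

With either value, code that awaits asyncio APIs or calls deferred_to_future() will no longer work, while maybe_deferred_to_future() keeps working because it returns the Deferred unchanged.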