Queuing background tasks in Django
RQ and Django-RQ
RQ is a lightweight Python library for managing background tasks. It allows developers to offload time-consuming operations - such as sending emails, processing images, or performing heavy calculations - to background workers. These workers process the tasks asynchronously, improving the performance of your application by keeping the request-response cycle fast and responsive.
Django-RQ is an integration package that simplifies using RQ with Django. It provides tools for managing task queues and workers, plus a built-in admin interface for monitoring tasks.
The Dependency Issue in Task Scheduling
In Django, task scheduling often occurs during the request-response cycle. For example, a user action triggers a database transaction, and a background task is enqueued to handle additional processing.
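from django.contrib.auth.decorators import login_required
from django.db import transaction
from django.shortcuts import render

@transaction.atomic
@login_required
def schedule_long_running_task(request):
    instance = MyModel.objects.create(some_data=1)
    # the job is enqueued immediately, while the transaction is still open
    job = long_running_task.delay(mymodel_pk=instance.pk)
    # some additional processing that may be slow or may fail
    return render(request, "task_scheduled.html")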
However, this can cause several problems:
- Asynchronous Execution
  Since RQ tasks are processed independently of the request, there is no guarantee that the database state will be ready when the worker starts. In the example above, the instance row is not visible to other database connections until the transaction commits, so long_running_task can fail due to a missing record.
- Transaction Dependency
  If the RQ task starts executing before the database transaction is committed, the task may access incomplete or inconsistent data, leading to errors. Suppose instance is not created but only updated in the view. The change is not visible to the worker until the transaction commits, so long_running_task could operate on the previous state of the record (depending on the isolation level and lock mode).
- Computation when transaction fails
  If the transaction is rolled back after the task has already been enqueued, the worker still runs long_running_task. The heavy processing is then wasted on previous data, or on a record that was never saved.
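The visibility problem behind these three points can be reproduced without Django or RQ. The sketch below is only an illustration, assuming a throwaway SQLite file and a hypothetical mymodel table: one connection plays the web request, and a second connection (the worker_can_see helper, a name made up for this example) plays the worker.

```python
import os
import sqlite3
import tempfile


def worker_can_see(db_path, pk):
    """Play the worker: open a separate connection and look the row up."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT id FROM mymodel WHERE id = ?", (pk,)
        ).fetchall()
        return len(rows) == 1
    finally:
        conn.close()


def run_demo():
    with tempfile.TemporaryDirectory() as tmp_dir:
        db_path = os.path.join(tmp_dir, "demo.sqlite3")
        web = sqlite3.connect(db_path)  # plays the web request
        try:
            web.execute(
                "CREATE TABLE mymodel (id INTEGER PRIMARY KEY, some_data INTEGER)"
            )
            web.commit()

            # The INSERT opens a transaction that stays open until commit().
            pk = web.execute(
                "INSERT INTO mymodel (some_data) VALUES (1)"
            ).lastrowid
            seen_before_commit = worker_can_see(db_path, pk)
            web.commit()
            seen_after_commit = worker_can_see(db_path, pk)

            # Failed transaction: the row the "task" was given never exists.
            lost_pk = web.execute(
                "INSERT INTO mymodel (some_data) VALUES (2)"
            ).lastrowid
            web.rollback()
            seen_after_rollback = worker_can_see(db_path, lost_pk)
        finally:
            web.close()
    return seen_before_commit, seen_after_commit, seen_after_rollback
```

Calling run_demo() should return (False, True, False): the second connection finds the row only once the first one has committed, and never finds the row from the rolled-back transaction. Other databases and isolation levels differ in detail, but the default behaviour of hiding uncommitted rows from other connections is the same.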
Solving the Problem
To resolve this, you can leverage Django's transaction.on_commit,
which ensures that tasks are enqueued only after the database transaction has
successfully committed. This approach provides a clean and reliable solution,
avoiding the pitfalls of premature task execution while maintaining the
responsiveness of the request-response cycle.
The example could look like this:
@transaction.atomic
@login_required
def schedule_long_running_task(request):
    instance = MyModel.objects.create(some_data=1)

    def enqueue_task():
        # runs only after the transaction has committed successfully
        long_running_task.delay(mymodel_pk=instance.pk)

    transaction.on_commit(enqueue_task)
    # some additional processing that may be slow or may fail
    return render(request, "task_scheduled.html")
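This change ensures that long_running_task is enqueued only after the transaction has committed, so the worker sees the committed data. If the transaction is rolled back, the task is never enqueued.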
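To make the mechanism easier to picture, here is a toy model of commit hooks in plain Python. It is not Django's implementation: ToyConnection and its methods are hypothetical names invented for this sketch, and it ignores nested atomic blocks and savepoints. It only mirrors the observable behaviour: callbacks wait for a successful commit, are dropped on rollback, and run at once when no transaction is open.

```python
from contextlib import contextmanager


class ToyConnection:
    """Simplified stand-in for a database connection with commit hooks."""

    def __init__(self):
        self.in_transaction = False
        self._pending = []

    def on_commit(self, func):
        if self.in_transaction:
            self._pending.append(func)  # hold until the transaction commits
        else:
            func()  # no open transaction: run right away

    @contextmanager
    def atomic(self):
        self.in_transaction = True
        try:
            yield
        except Exception:
            self._pending.clear()  # rollback: pending callbacks are discarded
            raise
        else:
            callbacks, self._pending = self._pending, []
            self.in_transaction = False  # the commit has happened
            for callback in callbacks:
                callback()
        finally:
            self.in_transaction = False


def demo():
    conn = ToyConnection()
    enqueued = []

    with conn.atomic():
        conn.on_commit(lambda: enqueued.append("after commit"))
        inside_block = list(enqueued)  # nothing has been enqueued yet

    try:
        with conn.atomic():
            conn.on_commit(lambda: enqueued.append("never runs"))
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass

    conn.on_commit(lambda: enqueued.append("immediate"))
    return inside_block, enqueued
```

Here demo() returns ([], ["after commit", "immediate"]). The first callback fires only when its block exits cleanly, the second is thrown away together with the failed transaction, and the third runs immediately because it is registered outside any transaction.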
Final words
Outside request-response
transaction.on_commit can also be used outside the request-response
flow. This is useful when you need to separate your application logic from
the transport/presentation layer. That is why it is a better choice than the
request_finished signal.
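One way to keep that separation explicit is to pass the commit hook into the business logic instead of importing anything request-related. The sketch below is an assumption about how such a function might look, not a pattern taken from Django-RQ: create_and_schedule and its three parameters are hypothetical names. In a real project you would presumably pass transaction.on_commit and long_running_task.delay; here simple stand-ins are used so the example runs on its own.

```python
from functools import partial


def create_and_schedule(create_record, enqueue_task, on_commit):
    """Create a record and schedule its background job for after commit.

    All three arguments are injected callables, so this function knows
    nothing about HTTP requests, views, or templates.
    """
    pk = create_record()
    # partial() binds the primary key now; the call itself happens later.
    on_commit(partial(enqueue_task, mymodel_pk=pk))
    return pk


def demo():
    pending = []   # stand-in for callbacks held until the commit
    enqueued = []  # stand-in for the RQ queue

    pk = create_and_schedule(
        create_record=lambda: 42,
        enqueue_task=lambda **kwargs: enqueued.append(kwargs),
        on_commit=pending.append,
    )
    before_commit = list(enqueued)
    for callback in pending:  # simulate the commit
        callback()
    return pk, before_commit, enqueued
```

The same function can then be called from a view, a management command, or another task, since nothing in it depends on a request object.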
Gotchas
If transaction.on_commit is called outside a transaction,
the callback function is executed immediately.