Celery soft time limit is not triggered

I am facing an issue with a Celery task whose soft time limit is set to 10 seconds and whose hard time limit is set to 32 seconds:

import sys

from celery.exceptions import SoftTimeLimitExceeded
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

@app.task(bind=True, acks_late=False, time_limit=32, soft_time_limit=10)
def my_task(self, **kwargs):
    try:
        # Drop the cached reactor module so a fresh one is imported for this run
        if 'twisted.internet.reactor' in sys.modules:
            del sys.modules['twisted.internet.reactor']
        settings = get_project_settings()
        process = CrawlerProcess(settings)
        process.crawl(**kwargs)
        process.start()  # blocks until the crawl finishes
    except SoftTimeLimitExceeded as te:
        print('Time Exceeded...')

The code above runs, but when the crawl exceeds the soft limit, no SoftTimeLimitExceeded exception is raised in the task. The crawl continues until the hard limit is reached, at which point the following error is displayed:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/billiard/pool.py", line 684, in on_hard_timeout
    raise TimeLimitExceeded(job._timeout)
billiard.exceptions.TimeLimitExceeded: TimeLimitExceeded(32,)

I have tried catching this error within the task, without success. As a test, I replaced the process.start() call with time.sleep(50) to create a delay without starting any crawl:

import time

@app.task(bind=True, acks_late=False, time_limit=32, soft_time_limit=10)
def my_task(self, **kwargs):
    try:
        if 'twisted.internet.reactor' in sys.modules:
            del sys.modules['twisted.internet.reactor']
        settings = get_project_settings()
        process = CrawlerProcess(settings)
        process.crawl(**kwargs)
        time.sleep(50)  # stand-in for process.start(); no crawl is started
    except SoftTimeLimitExceeded as te:
        print('Time Exceeded...')

In this case, SoftTimeLimitExceeded is caught as expected. What could be the reason for the difference?

Versions

celery==5.2.7

Scrapy==2.6.1

Answer №1

I am experiencing the same issue on my end.

I suspect that SoftTimeLimitExceeded is being caught somewhere inside your script before it reaches your except block, so your handler never sees it.

You should review your script for broad exception handlers (for example, a bare except or except Exception) and either remove them or narrow their scope.

settings = get_project_settings()
process = CrawlerProcess(settings)
process.crawl(**kwargs)

This is only a suggestion. I am testing it on my end and will post updates here.
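To illustrate the suspicion above, here is a minimal sketch of how a broad handler deeper in the call stack can hide an exception from an outer except block. It does not use Celery or Scrapy: FakeSoftTimeLimit is a stand-in for celery.exceptions.SoftTimeLimitExceeded (assumed here to derive from Exception), and inner_broad, inner_narrow and run_task are hypothetical names invented for the example. Whether this is what actually happens inside process.start() has not been confirmed.

```python
class FakeSoftTimeLimit(Exception):
    """Stand-in for celery.exceptions.SoftTimeLimitExceeded (assumption)."""


def inner_broad():
    # Hypothetical inner code with a catch-all handler.
    try:
        raise FakeSoftTimeLimit()
    except Exception:
        # The exception stops here and never reaches the caller.
        return "swallowed by inner handler"


def inner_narrow():
    # Hypothetical inner code that only handles the error it expects.
    try:
        raise FakeSoftTimeLimit()
    except ValueError:
        return "handled ValueError"


def run_task(work):
    # Plays the role of the task body with its own except block.
    try:
        return work()
    except FakeSoftTimeLimit:
        return "soft limit handled by task"


print(run_task(inner_broad))   # swallowed by inner handler
print(run_task(inner_narrow))  # soft limit handled by task
```

With the broad handler, the outer except block never runs, which matches the symptom described in the question. Narrowing the inner handler lets the exception propagate to the task.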
