Retrieve search results from Bing using Python

I am currently working on a project to develop a Python-based chatbot that can retrieve search results from Bing. However, my efforts have been hindered by the outdated Python 2 code and reliance on Google API in most available resources online. The catch is, I am located in China where access to YouTube, Google, Azure, Microsoft Docs, and all related platforms is restricted. My ideal output for the search results would look like this:

This is the title
https://this-is-the-link.com

This is the second title
https://this-is-the-second-link.com

Below is the snippet of code I have tried so far:

import requests
import bs4
import re
import urllib.request
from bs4 import BeautifulSoup
page = urllib.request.urlopen("https://www.bing.com/search?q=programming")
soup = BeautifulSoup(page.read())
links = soup.findAll("a")
for link in links:
    print(link["href"])

However, the current output I receive is not what I expected:

/?FORM=Z9FD1
javascript:void(0);
javascript:void(0);
/rewards/dashboard
/rewards/dashboard
javascript:void(0);
/?scope=web&FORM=HDRSC1
/images/search?q=programming&FORM=HDRSC2
/videos/search?q=programming&FORM=HDRSC3
/maps?q=programming&FORM=HDRSC4
/news/search?q=programming&FORM=HDRSC6
/shop?q=programming&FORM=SHOPTB
http://go.microsoft.com/fwlink/?LinkId=521839
http://go.microsoft.com/fwlink/?LinkID=246338
https://go.microsoft.com/fwlink/?linkid=868922
http://go.microsoft.com/fwlink/?LinkID=286759
https://go.microsoft.com/fwlink/?LinkID=617297

If anyone has insights or suggestions on how to improve this (I'm using Python 3.6.9 on Ubuntu), I would greatly appreciate it.

Answer №1

It appears that the issue lies within the HTTP request headers, not the code you've written. By default, urllib uses Python-urllib/{version} as the value for the User-Agent header, making it easy for websites to detect automated requests. To remedy this, you can specify a custom user agent by passing a Request object as the first parameter of urlopen():

from urllib.parse import urlencode, urlunparse
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup

query = "programming"
url = urlunparse(("https", "www.bing.com", "/search", "", urlencode({"q": query}), ""))
custom_user_agent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0"
req = Request(url, headers={"User-Agent": custom_user_agent})
page = urlopen(req)
# Further code remains unchanged
soup = BeautifulSoup(page.read())
links = soup.findAll("a")
for link in links:
    print(link["href"])

P.S. Be sure to check out the comment left by @edd under your question.

Similar questions

If you have not found the answer to your question or you are interested in this topic, then look at other similar questions below or use the search

What's the best way to determine which of the two forms has been submitted in Django?

On my homepage, I have both a log_in and sign_up form. Initially, the log_in form is displayed by default, but when a user clicks on the Sign Up button, the sign_up form appears. These toggles switch depending on which button the user clicks. from django ...

Encountered CSRF validation error while working with a Python Django backend in conjunction with React frontend using Axios for making POST requests

I recently completed a tutorial at and now I'm attempting to add a POST functionality to it. Despite obtaining the csrf from cookies and including it in the "csrfmiddlewaretoken" variable alongside a test message in json format for the axios function ...

Learn how to retrieve values from a .json file in real-time and then perform comparisons with user input using Python

I have a JSON file structured like this: [ { "name": { "common": "Aruba", "official": "Aruba", "native": { "nld": { "official ...

Flask does not provide a direct boolean value for checkboxes

After struggling for a week, I am still lost on where to make changes in my code. I need the checkbox to return a boolean value in my Flask application. Below are snippets of the relevant code: mycode.py import os, sqlite3 from flask import Flask, flash ...

What is the best way to extract the singular PDF link from a webpage?

Currently, I am attempting to utilize Selenium in Java to access DOM elements. However, I have encountered an issue while testing the code: Exception in thread "main" org.openqa.selenium.StaleElementReferenceException: stale element reference: element is n ...

Button click event is not being triggered by Ajax rendering

I am facing an issue with my Django template that showcases scheduled classes for our training department. Each item in the list has a roster button which, when clicked, should display the class roster in a div. This functionality works perfectly. However, ...

Having trouble grouping data on website due to duplicate names

Currently in the process of scraping data from various websites and looping through it. Specifically focusing on scraping the head section. This is an example illustrating the structure of the data: const head = { rel_data: [ { rel: &quo ...

Is it possible to access a hidden JavaScript variable in Selenium?

Is there a way to extract the array "o" that contains data used for drawing a polygon? Simply using driver.execute("return o") or console.log doesn't seem to work. Any suggestions on how to achieve this? const zt = function(e, t, n, r) { c ...

Scraping JavaScript Content Webpages with VBA

I'm attempting to extract a table from the Drainage Services Department website. I've written the VBA code below, but it doesn't seem to be working. I suspect that the issue lies in the fact that this particular table is generated using Java ...

Executing a JavaScript code in a Python webdriver: A step-by-step guide

Using Selenium 2 Python webdriver: I encountered an issue where I needed to click on a hidden element due to a hover effect. In search of solutions to unhide and select the element, I came across the following examples: Example in Java: JavascriptExecut ...

Error message stating that there is no property 'collection' in Firestore when using Firebase v9 modular syntax in Firebase Firestore

Working on a React application that makes use of Firebase Firestore for handling database operations, I recently upgraded to Firebase version 9 and adopted the modular syntax for importing Firebase services. Nevertheless, when attempting to utilize the co ...

Invoke a Python function from JavaScript

As I ask this question, I acknowledge that it may have been asked many times before. If I missed the answers due to my ignorance, I apologize. I have a hosting plan that restricts me from installing Django, which provided a convenient way to set up a REST ...

Exploring the capabilities of arrays within Ajax

Below is the original code I wrote in JavaScript: var wt_val = []; for (i = 0; i<human_wt.length; i++){ var mult; mult = data_list[basket_list[button_port_name][i]].map(x => x*(wt[i]/100)); wt_val.push(mult); ...