Retrieving Base64 Images in Python Selenium: Step-by-Step Guide

Question

I am trying to fetch base64 captcha images with Python Selenium.

The problem is that I can only get the HTML as it is right before the images load.

Here is what I have tried:

# importing necessary packages

from selenium import webdriver
from selenium.webdriver import EdgeOptions
from selenium.webdriver.edge.service import Service
from webdriver_manager.microsoft import EdgeChromiumDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = EdgeOptions()
options.add_argument("--headless")
options.add_argument("--disable-gpu")
driver = webdriver.Edge(service=Service(EdgeChromiumDriverManager().install()), options=options)

# accessing the website
driver.get("https://boards.4channel.org/o/")
# opening a post
driver.execute_script("document.getElementsByClassName('mobilePostFormToggle mobile hidden button')[0].click()")
# clicking on this ID
driver.execute_script("document.getElementById('t-load').click()")

# Getting the initial html - before javascript
html1 = driver.page_source

# Retrieving the html after javascript execution
html2 = driver.execute_script("return document.documentElement.innerHTML;")

# Checking for base64 images
'Loading' in html1  # True
'Loading' in html2  # True

# Further verification for base64 images
'data:image/png;base64' in html1  # False
'data:image/png;base64' in html2  # False

The relevant HTML object seems to be:

<button id="t-load" type="button" data-board="o" data-tid="0" style="font-size: 11px; padding: 0px; width: 90px; box-sizing: border-box; margin: 0px 6px 0px 0px; vertical-align: middle; height: 18px;">Get Captcha</button>

python selenium selenium-webdriver

Answer №1

From the question and OP's comments, the real goal is to obtain the base64 captcha image shown on screen. That captcha is composed of two base64 images of different sizes, so retrieving, decoding, and merging them into an exact replica is fairly involved. The code below takes a simpler route:

import base64
import time as t

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

import undetected_chromedriver as uc


options = uc.ChromeOptions()
options.add_argument("--no-sandbox")
options.add_argument('--disable-notifications')
options.add_argument("--window-size=1280,720")
options.add_argument('--ignore-certificate-errors')
options.add_argument('--allow-running-insecure-content')
# options.add_argument('--headless')

browser = uc.Chrome(options=options)

wait = WebDriverWait(browser, 20)
url = 'https://boards.4channel.org/o/'
browser.get(url)

# open the post form, then request the captcha
wait.until(EC.element_to_be_clickable((By.PARTIAL_LINK_TEXT, 'Start a New Thread'))).click()
t.sleep(1)
wait.until(EC.element_to_be_clickable((By.XPATH, '//button[@id="t-load"]'))).click()

# screenshot the captcha element as rendered (background plus foreground)
captcha_img_background = wait.until(EC.element_to_be_clickable((By.XPATH, '//div[@id="t-bg"]')))
captcha_img_background.screenshot('full_captcha_image.png')
print('got captcha!')

# the background image sits in the element's style attribute as a base64 data URI
style = captcha_img_background.get_attribute('style')
b64img_background = style.split('url("data:image/png;base64,')[1].split('");')[0]
bgimgdata = base64.b64decode(b64img_background)
with open('bg_image.png', 'wb') as f:
    f.write(bgimgdata)
print('also saved the base64 image as bg_image.png')

This snippet captures the complete captcha exactly as it appears on screen, background and foreground together, which can be useful for tasks such as building ML training data.

UPDATE: The code now also shows how to decode and save a base64 image, in this case the background image, which may require horizontal scrolling.
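The decoding step can also be tried on its own, without a browser. The following is a minimal sketch, not the answer's exact method: it swaps the chained split() calls for a regular expression, and the names decode_background_image and sample_style are made up for illustration. It assumes the style attribute holds a url(...) data URI like the one in the code above.

```python
import base64
import re

# Matches url("data:image/<type>;base64,<payload>") with optional quotes.
_DATA_URI = re.compile(
    r'url\(\s*["\']?data:image/[\w.+-]+;base64,([A-Za-z0-9+/=]+)["\']?\s*\)'
)


def decode_background_image(style: str) -> bytes:
    """Return the decoded bytes of the first base64 image in a CSS style string."""
    match = _DATA_URI.search(style)
    if match is None:
        raise ValueError("no base64 data URI found in style attribute")
    return base64.b64decode(match.group(1))


# Stand-in style string (hypothetical); with Selenium the string would come
# from the element's get_attribute('style').
sample_style = 'width: 300px; background-image: url("data:image/png;base64,aGVsbG8=");'
print(decode_background_image(sample_style))  # b'hello'
```

Raising an error when no data URI is present makes the "captcha not loaded yet" case visible, where the split() version would fail with an IndexError.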

I had to switch to undetected chromedriver because neither Firefox nor Chrome rendered the captcha images successfully.

For more information on undetected chromedriver, refer to the documentation here: https://github.com/ultrafunkamsterdam/undetected-chromedriver

To explore Selenium further, check out their official documentation: https://www.selenium.dev/documentation/

Answer №2

Hey there, @John Stud! I ran your code in my environment without headless mode to see what was going on. Here's what I found:

Clicking the hidden mobilePostFormToggle button did not trigger the "[Start a New Thread]" link, so I changed the code to click the link directly.

I located the link with the XPath "//div[@id='togglePostFormLink']/a[text()='Start a New Thread']" and clicked it with driver.execute_script.
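A rough sketch of that click is below. The helper name js_click is made up for illustration, and the code assumes a Selenium 4 driver object is passed in; it only uses find_element and execute_script.

```python
LINK_XPATH = "//div[@id='togglePostFormLink']/a[text()='Start a New Thread']"


def js_click(driver, xpath: str) -> None:
    """Find an element by XPath and click it through JavaScript."""
    # "xpath" is the string value behind Selenium's By.XPATH.
    element = driver.find_element("xpath", xpath)
    # The element is handed to the script as arguments[0].
    driver.execute_script("arguments[0].click();", element)
```

With the driver from the question this would be called as js_click(driver, LINK_XPATH).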

You set a 60-second wait before clicking the captcha button, but in my opinion 2-3 seconds should be enough. After clicking the captcha button, however, the image may take longer to load. I ended up setting a 40-second wait in my browser because of an error that kept the image from loading properly.
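Instead of a fixed sleep, an explicit wait can poll until the image data is actually present. This is only a sketch under assumptions: the function name style_contains_data_uri is hypothetical, and the element id "t-bg" is taken from the other answer, not verified here. The returned callable has the shape WebDriverWait.until expects (it receives the driver and returns something truthy when done).

```python
def style_contains_data_uri(locator):
    """Build a wait condition: truthy once the element's style holds a data URI."""

    def condition(driver):
        element = driver.find_element(*locator)
        # get_attribute may return None when the attribute is missing.
        style = element.get_attribute("style") or ""
        return element if "data:image" in style else False

    return condition


# Possible usage with an existing driver (the 40 s timeout mirrors the wait above):
#   from selenium.webdriver.support.ui import WebDriverWait
#   captcha = WebDriverWait(driver, 40).until(style_contains_data_uri(("id", "t-bg")))
```

The wait returns as soon as the condition is met, so a generous timeout does not slow down the fast case.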

If you need further assistance with this issue, let me know!

https://i.stack.imgur.com/1to55.png
