Utilizing Selenium to extract engagement data, such as likes and comments, from a photo on Facebook

I want to extract the data described in the title. I have already worked out how to log in and retrieve the photos from any profile I search for, but I am stuck when it comes to reading the comments or likes on a selected photo. Chromedriver clicks the photo and displays it, yet I cannot extract this information. Ideally, while the photo is displayed, I would like to capture only the number of likes (highlighted in blue) and the comments visible in the right-hand panel.

I have searched extensively but have not found any helpful tutorials or posts to address my problem.

Below is a snippet of the code showing how I am currently fetching photos:

time.sleep(4)
images = []
for category in ['photos_of', 'photos_all']:
    driver.get("https://www.facebook.com/userBasedOnURL/" + category + "/")
    time.sleep(3)
    n_scrolls = 3
    for _ in range(1, n_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(3)
        anchors = driver.find_elements_by_tag_name('a')
        anchors = [a.get_attribute('href') for a in anchors]
        anchors = [a for a in anchors if str(a).startswith("https://www.facebook.com/photo")]
        
        for link in anchors:
            driver.get(link)
            time.sleep(3)
            image = driver.find_elements_by_tag_name("img")
            images.append(image[0].get_attribute("src"))
images

Thank you in advance for your help, and apologies for any language errors.

Answer №1

In order to locate the number of likes beneath a photo within a photo viewer, you can use the following XPath expression:

//div[@aria-label="Photo Viewer"]//span[@class='pcp91wgn']

Alternatively, in CSS selector format:

div[aria-label="Photo Viewer"] span[class="pcp91wgn"]

The comments section can be found using this CSS Selector:

div[aria-label="Photo Viewer"] div[class="ecm0bbzt e5nlhep0 a8c37x1j"]

or with XPath syntax:

//div[@aria-label="Photo Viewer"]//div[@class="ecm0bbzt e5nlhep0 a8c37x1j"]
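To see how these locators narrow the search, here is a minimal sketch that runs the same expressions against a small, made-up HTML fragment using Python's standard library instead of a browser. The markup, the like count and the comment texts are assumptions for illustration only. ElementTree supports just a subset of XPath and needs a relative path, so the expressions start with `.//` rather than `//`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup. The real page is much larger
# and its generated class names change over time.
SAMPLE = """
<body>
  <div aria-label="Photo Viewer">
    <span class="pcp91wgn">42</span>
    <div class="ecm0bbzt e5nlhep0 a8c37x1j">Nice shot!</div>
    <div class="ecm0bbzt e5nlhep0 a8c37x1j">Where was this taken?</div>
  </div>
  <div aria-label="Feed">
    <span class="pcp91wgn">7</span>
  </div>
</body>
"""

root = ET.fromstring(SAMPLE.strip())

# Same expressions as above, made relative for ElementTree.
likes_xpath = './/div[@aria-label="Photo Viewer"]//span[@class="pcp91wgn"]'
comments_xpath = './/div[@aria-label="Photo Viewer"]//div[@class="ecm0bbzt e5nlhep0 a8c37x1j"]'

like_texts = [el.text.strip() for el in root.findall(likes_xpath)]
comment_texts = [el.text.strip() for el in root.findall(comments_xpath)]

print(like_texts)     # ['42']
print(comment_texts)  # ['Nice shot!', 'Where was this taken?']
```

The span with the same class inside the "Feed" container is ignored because both expressions first anchor on the `aria-label="Photo Viewer"` container. Note that `@class="..."` is an exact string match, so the locator stops working as soon as the class list changes.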

Therefore, you can go through the list of links like so:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

photoview_comment_xpath = '//div[@aria-label="Photo Viewer"]//div[@class="ecm0bbzt e5nlhep0 a8c37x1j"]'
photoview_likes_amount_css = 'div[aria-label="Photo Viewer"] span[class="pcp91wgn"]'

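wait = WebDriverWait(driver, 10)  # the 10-second timeout is an arbitrary choice

likes = []
comments = []
# `links` is the list of photo URLs collected earlier (`anchors` in the question)
for link in links:
    driver.get(link)
    # Wait until the like counter is visible, then read its text
    like = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, photoview_likes_amount_css))).text.strip()
    likes.append(like)
    # Collect the text of every comment currently rendered in the viewer
    comments_elements = driver.find_elements(By.XPATH, photoview_comment_xpath)
    for comment in comments_elements:
        comments.append(comment.text.strip())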
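Two small helpers may be useful around this loop. This is only a sketch under stated assumptions: `unique_photo_links` and `parse_count` are hypothetical names, the example URLs are invented, and the count formats handled (`42`, `1,234`, `1.2K`, `3M`) are an assumption about how the counter is rendered, which can differ by locale. The first helper builds `links` from the collected `href` values, dropping `None` entries and the duplicates that repeated scrolling produces. The second turns the like text into an integer.

```python
from decimal import Decimal

PHOTO_PREFIX = "https://www.facebook.com/photo"


def unique_photo_links(hrefs):
    """Keep only photo URLs, dropping None values and duplicates (order preserved)."""
    photo_links = [h for h in hrefs if h and h.startswith(PHOTO_PREFIX)]
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(photo_links))


def parse_count(text):
    """Convert a counter such as '42', '1,234', '1.2K' or '3M' to an int.

    The set of formats is an assumption, not something the page guarantees.
    """
    cleaned = text.strip().replace(",", "").upper()
    multipliers = {"K": 1_000, "M": 1_000_000}
    if cleaned and cleaned[-1] in multipliers:
        # Decimal avoids float rounding surprises for values like '1.2K'
        return int(Decimal(cleaned[:-1]) * multipliers[cleaned[-1]])
    return int(cleaned)


# Invented example data
hrefs = [
    None,
    "https://www.facebook.com/photo/?fbid=1",
    "https://www.facebook.com/home",
    "https://www.facebook.com/photo/?fbid=1",
    "https://www.facebook.com/photo/?fbid=2",
]
links = unique_photo_links(hrefs)
print(links)
print(parse_count("1.2K"))
```

With Selenium, `hrefs` would come from `[a.get_attribute('href') for a in anchors]`, and `parse_count` would be applied to each entry of `likes` after the loop finishes.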

You are unlikely to find a tutorial specific to every website. The key is to learn how to identify reliable, unique locators for web elements using XPath and CSS selectors.
