What is the best way to locate the XPath expression for this specific website?

I am looking to extract data from this particular website.

The site hosts a large collection of resumes, and my goal is to gather the skills listed in each one. Here is a screenshot of the page for reference:

https://i.stack.imgur.com/lBoym.png

Answer №1

You don't need Selenium for this: requests and BeautifulSoup are enough. Here is the code:

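import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = 'https://www.livecareer.com/'

# Fetch the search-results page for "software engineer" resumes
r = requests.get(BASE_URL + 'resume-search/search?jt=software%20engineer').text

soup = BeautifulSoup(r, 'html.parser')

# The results are listed in this <ul>; the first <li> is skipped
ul = soup.find('ul', class_='resume-list list-unstyled')

li_items = ul.find_all('li')[1:]

links = []

for li in li_items:
    # urljoin avoids the double slash that plain string concatenation produces
    if li.a:
        links.append(urljoin(BASE_URL, li.a['href']))

skills = []

for link in links:
    r = requests.get(link).text
    soup = BeautifulSoup(r, 'html.parser')
    # The skills/summary text lives in <div class="field singlecolumn">
    div = soup.find('div', class_='field singlecolumn')
    # Append an empty string if the div is missing so links and skills stay aligned
    skills.append(div.text if div else '')

print(skills)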

Output:

['agile, AutoCAD, C++, CAD, Oral, data entry, database, Engineer in Training, EIT, Engineering analysis, XML, functional, GUI, HTML, JavaScript, Team leadership, Lockheed Martin, macros, Manufacturing processes, MATLAB, mechanical, meetings, Excel, Organizational skills, presentations, Process improvement, program management, programming, Project planning, Python, research, scrum, Six Sigma, Software development, Solidworks, SQL, switches, telemetry, video, Web design, website, Written communication', "Senior Outreach at Senior Center\xa0Planned and organized a joint celebration of the Chinese New Year with the collaboration of the Westborough Public Schools. \xa0Promoted cultural awareness and broke the language barrier of different races of backgrounds.Volunteer at ChurchIdentified problems and implemented a process to eliminate a data-entry camp registration process by 100% by building new online Registration Forms for registration and the student's cultural classes arrangement.Designed posters, flyers and presentation slides with graphics and photos for the different organizational events with Microsoft Word and PowerPointDeveloped a structural documentation on publishing an annual report with detailed steps and instructions on the process that are easy to follow and quickly learn by others. \xa0Implemented a 30th Anniversary Special Edition project in a commercial quality of work with excellent time management skills to meet the deadline.Cayenne SoftwareExperienced team spirit in effort of reducing the workload of bugs fixes on the software product.\u200bAllmerica Financial CompanyProvided a sole support to the Hanover 1099 system with strong commitment and responsibility. Fined tuned the system resulting in cost savings for the Allmerica Financial Company. Winner of the Gold Crown Customer Recognition award.", 'Motivated Software Engineer seeking employment as part of a dynamic software development team. 
Fluent in C,C++,JAVA and python.', 'Developed peer-to-peer secure file transfer system in JAVA.This involved the application of symmetric\r\n     and asymmetric key cryptography algorithms, and JAVA concepts like multi-threading, socket\r\n     programming, etc.Implemented a system to query XML in JAVA.The query language was a subset of XPath\r\n    Modeled a project "Personal Health Management System" using UML and implemented it in Visual C#.The code was tested using NUnit.Object oriented software development process was used for this\r\n     project\r\n    Developed a \'license plate game\' in C on LINUX o/s using client/server architecture.This required the\r\n     application of distributed programming concepts like Sockets, RPC, multi-threading, etc.RESEARCH PAPER:\r\n    XMorph: A Shape-Polymorphic, Domain-Specific XML Data Transformation Language,\r\n     International Conference on Data Engineering (ICDE 2010), IEEE CS, Los Angeles, USA, March 2010.', 'Performance evaluation of In-Kernel System Call Implemented and evaluated In-kernel system call using dynamic loadable kernel module on x_86_64 architecture.Re-Development free approach to migrate Java applications to cloud at College of Engineering, Pune Implemented file access sub-system of a WebJDK which leverages the File-System API provided by HTML5.This allows the use of standard Java APIs for accessing client files.2016 2013.', 'Accomplished Computer Technician with a rapidly increasing range of industry experience looking to bring strong instincts and a proven record of procedural compliance, process management and strong operational skills to a rapidly growing company. 
', 'Seeking a fulltime position as a Developer / Systems Admin / DBA for a company needing a hard working, \r\ntaskoriented person with an indepth understanding of software development and database tuning.', '3 Years of experience in Information Technology with emphasis on Design, Development and End to End Implementation of Consulting based solutions with expertise on working with Object Orient Analysis and Design using Java/J2EE Technologies viz. JSP/Servlets/EJB,JDBC, Web services , Web sockets, Spring Frameworks, Spring-boot, Angular, JQuery, XML/XSLT, JSON, Integration Developer Service Component Architecture, Service Data Objects, Rational Application Developer, Test Driven Development using JUnit, Jenkins, GIT, Cloud Foundry,  Eclipse/Intelij IDE, UNIX, Gradle Scripts, DB2/Oracle/MySQL Databases.', '.NET 3.5, .NET, ASP .NET 3.5, ASP.NET 2.0, ASP.NET 3.5, AJAX, ASM, Banking, Basic, Business Objects, c, CSS, CSS 2, customer satisfaction, data analysis, Database, delivery, EBusiness, editor, Electronics, HP, HTML 4, HTML, IDE, IIS 7.0, ITIL, JavaScript, C#, C# 3.0, Windows, windows applications, 2000, 3.1, Windows 98, Enterprise, Oct, Operating systems, Oracle 9, Oracle database, PL/SQL, personnel, programming, recording, reporting, sales, Servers, Service Level Agreement, SLA, Visual SourceSafe, Visual  SourceSafe, SQL, SQL Server, technical support, TOAD, UNIX, vi, Microsoft Visual Studio, Visual studio, Windows server', 'Represent Stanford  Ballroom Dance team in various competitions in the Bay area.\r\n*Represented University of Maryland in Ballroom dance competitions in UMD, UPenn, MIT, Columbia University & Ohio \r\n*Have a keen interest in photography, especially of dancers in motion.']
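Because the live site's markup may have changed since this answer was written, it can help to check the parsing logic offline first. Below is a minimal sketch that runs the same find/find_all calls against small hardcoded HTML strings. The markup is hypothetical, modeled only on the class names used above, and the helper names extract_links and extract_skills are illustrative.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the class names used in the answer;
# the real site's HTML may differ.
SEARCH_HTML = """
<ul class="resume-list list-unstyled">
  <li>Header row</li>
  <li><a href="/resume-search/r/software-engineer-1">Resume 1</a></li>
  <li><a href="/resume-search/r/software-engineer-2">Resume 2</a></li>
</ul>
"""

RESUME_HTML = '<div class="field singlecolumn">Python, SQL, Git</div>'


def extract_links(html, base='https://www.livecareer.com/'):
    """Return absolute resume URLs from a search-results page."""
    soup = BeautifulSoup(html, 'html.parser')
    ul = soup.find('ul', class_='resume-list list-unstyled')
    # Skip the first <li> and ignore items that have no link
    return [urljoin(base, li.a['href']) for li in ul.find_all('li')[1:] if li.a]


def extract_skills(html):
    """Return the text of the skills div, or '' if the page has none."""
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', class_='field singlecolumn')
    return div.text if div else ''


print(extract_links(SEARCH_HTML))
print(extract_skills(RESUME_HTML))
```

Once these return what you expect, you can feed them the text returned by requests.get(...) instead of the hardcoded strings.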

You can also put the results in a pandas DataFrame for readability by adding these lines to your code:

import pandas as pd

dictionary = {'Links': links,
              'Skills': skills}

df = pd.DataFrame(dictionary)

print(df)

Output:

                                                                                                           
                                               Links                                             Skills
0  https://www.livecareer.com//resume-search/r/so...  agile, AutoCAD, C++, CAD, Oral, data entry, da...
1  https://www.livecareer.com//resume-search/r/so...  Senior Outreach at Senior Center Planned and o...
2  https://www.livecareer.com//resume-search/r/so...  Motivated Software Engineer seeking employment...
3  https://www.livecareer.com//resume-search/r/so...  Developed peer-to-peer secure file transfer sy...
4  https://www.livecareer.com//resume-search/r/so...  Performance evaluation of In-Kernel System Cal...
5  https://www.livecareer.com//resume-search/r/so...  Accomplished Computer Technician with a rapidl...
6  https://www.livecareer.com//resume-search/r/so...  Seeking a fulltime position as a Developer / S...
7  https://www.livecareer.com//resume-search/r/so...  3 Years of experience in Information Technolog...
8  https://www.livecareer.com//resume-search/r/so...  .NET 3.5, .NET, ASP .NET 3.5, ASP.NET 2.0, ASP...
9  https://www.livecareer.com//resume-search/r/so...  Represent Stanford  Ballroom Dance team in var...
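As the output shows, some resumes list skills as comma-separated items while others use free text. If you want individual skills, one simple option is to split on commas and count how many resumes mention each one. This sketch assumes comma-separated input: free-text entries come back as a single long item and are not parsed. The sample strings are shortened versions of entries in the output above; with the scraped data you would use the real skills list.

```python
from collections import Counter


def split_skills(text):
    """Split a comma-separated skills string into cleaned, lower-cased items.

    Assumes commas separate the skills; free-text entries are returned
    as one long item rather than parsed.
    """
    return [part.strip().lower() for part in text.split(',') if part.strip()]


# Shortened samples taken from entries in the output above
skills = [
    'agile, AutoCAD, C++, Python, SQL',
    '.NET, JavaScript, SQL, UNIX',
]

# Count how many resumes mention each skill (repeats within one resume count once)
counts = Counter()
for entry in skills:
    counts.update(set(split_skills(entry)))

print(counts.most_common(1))
```

Using a set per resume matters for entries such as the last one in the output, which lists "Visual SourceSafe" twice.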

I hope this information proves useful!

Answer №2

Just a thought to consider...

In Chrome, follow these steps:

  • Right-click the element you want to target
  • Select "Inspect"
  • With the Elements panel focused, press Ctrl+F to open its search box, which accepts plain text, CSS selectors and XPath expressions

Then type your own XPath expression into that box and check that it matches exactly one element, the one you want.

For example:

//a[contains(text(), 'my text')] 
//div[@id='myDivID']
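To try the two example expressions outside the browser, you can evaluate them in Python with lxml (an assumption: it is not part of the original answer and needs to be installed separately). lxml implements XPath 1.0, the same version browsers support. The page fragment below is hypothetical.

```python
from lxml import html

# Hypothetical page fragment, used only to try the two example expressions
PAGE = """<html><body>
  <div id="myDivID"><a href="/a">my text here</a></div>
  <a href="/b">other link</a>
</body></html>"""

tree = html.fromstring(PAGE)

# <a> elements whose first text node contains 'my text'
matching_links = tree.xpath("//a[contains(text(), 'my text')]")

# The <div> whose id attribute is exactly 'myDivID'
matching_divs = tree.xpath("//div[@id='myDivID']")

print([a.get('href') for a in matching_links])
print([d.get('id') for d in matching_divs])
```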

Write your XPath by hand rather than using the "Copy XPath" option. It can produce long positional paths like the one below, which break as soon as the page structure changes:

//*[@id="wrapper"]/div[2]/div[2]/div[1]/aside[1]/div/div/div[2]/div/div[1]/a
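To see why positional paths are fragile, here is a small sketch with two hypothetical versions of a page, the second with one extra div inserted before the content. It uses Python's built-in xml.etree.ElementTree, which supports only a subset of XPath and needs well-formed markup, but that subset covers both an id-based path and a positional one.

```python
import xml.etree.ElementTree as ET

# Two hypothetical versions of a page. Version 2 inserts a banner <div>
# before the others, shifting the positions of the sibling <div>s.
V1 = """<body><div id="wrapper">
  <div>nav</div>
  <div><a id="target" href="/resume">Resume</a></div>
</div></body>"""

V2 = """<body><div id="wrapper">
  <div>banner</div>
  <div>nav</div>
  <div><a id="target" href="/resume">Resume</a></div>
</div></body>"""

# Positional path in the style of "Copy XPath" vs a hand-written id-based path
POSITIONAL = ".//*[@id='wrapper']/div[2]/a"
BY_ID = ".//a[@id='target']"


def found(markup, path):
    """Return True if the ElementTree path matches at least one element."""
    return ET.fromstring(markup).find(path) is not None


for name, markup in (("v1", V1), ("v2", V2)):
    print(name, "positional:", found(markup, POSITIONAL), "by id:", found(markup, BY_ID))
```

On version 1 both paths match; on version 2 only the id-based path still finds the link.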

If you're new to writing XPath, this tutorial covers the basics: https://www.w3schools.com/xml/xpath_intro.asp

One complication on this page is that the text sits directly inside a div rather than in a child element such as a span, so you need to select the div's text nodes rather than an element. You could try the following XPath:

//div[@class='field singlecolumn']/text()
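Note that /text() returns only the text nodes that are direct children of the div, so text inside nested elements is skipped. A minimal sketch with lxml (an assumed choice of library) on hypothetical markup shows the difference between that and XPath's string() function, which concatenates all descendant text:

```python
from lxml import html

# Hypothetical markup: text sits directly in the div, plus one nested <b> element
PAGE = '<html><body><div class="field singlecolumn">Python, SQL<b>Git</b>, Docker</div></body></html>'

tree = html.fromstring(PAGE)

# /text() returns only the div's own text nodes, so "Git" (inside <b>) is skipped
direct = tree.xpath("//div[@class='field singlecolumn']/text()")

# string(...) returns the text of the div and all of its descendants, concatenated
full = tree.xpath("string(//div[@class='field singlecolumn'])")

print(direct)
print(full)
```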

Answer №3

If you're looking for the XPath of an HTML element, you can use your browser's Developer Tools. The steps below are for Chrome, but other browsers work similarly:

  1. Right-click the item on the page whose XPath you want.
  2. Select "Inspect". This opens DevTools with the corresponding element highlighted.
  3. If the highlighted element isn't the one you want, hover over elements in the HTML tree; each one is highlighted on the page as you hover, so you can find the one that matches.
  4. Right-click that HTML element in the Elements panel.
  5. Choose "Copy -> Copy XPath".

One issue you may face when scraping these pages is that your target element may not sit in the same place on every page. User-generated documents such as resumes have varying layouts, so the XPath differs from page to page, and you need a more robust approach (for example with jQuery, Selenium or Cypress) that searches by text content or navigates between parent and child elements.
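XPath itself can also search by text and move between related elements, which helps when layouts vary. Here is a sketch using lxml (used here instead of the tools named above) on two hypothetical resume layouts: it finds a heading whose text is "Skills" and takes the first div that follows it, wherever that heading sits. The heading and div structure is an assumption; real resumes may mark up their sections differently.

```python
from lxml import html

# Two hypothetical resume layouts: the "Skills" section is in a different
# position on each, so one positional XPath would not fit both.
RESUME_A = """<html><body>
  <h2>Summary</h2><div>Engineer</div>
  <h2>Skills</h2><div>Python, SQL</div>
</body></html>"""

RESUME_B = """<html><body>
  <h2>Skills</h2><div>C++, Go</div>
  <h2>Experience</h2><div>Acme Corp</div>
</body></html>"""

# Find the heading by its text, then step to the first <div> after it
SKILLS_XPATH = "//h2[normalize-space()='Skills']/following-sibling::div[1]"


def skills_text(markup):
    """Return the skills section text, or None if no 'Skills' heading exists."""
    matches = html.fromstring(markup).xpath(SKILLS_XPATH)
    return matches[0].text_content().strip() if matches else None


print(skills_text(RESUME_A))  # Python, SQL
print(skills_text(RESUME_B))  # C++, Go
```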
