I am looking to extract data from this particular website.
With a collection of numerous resumes at my disposal, my objective is to gather the skill sets mentioned in each one. Here is the webpage link for reference:
If you want to extract information from a website without using selenium, you can do that easily with BeautifulSoup. Here is the code snippet:
import requests
from bs4 import BeautifulSoup

# Fetch the search results page and parse it
r = requests.get('https://www.livecareer.com/resume-search/search?jt=software%20engineer').text
soup = BeautifulSoup(r, 'html.parser')

# Collect the link to each resume, skipping the first <li> in the list
ul = soup.find('ul', class_='resume-list list-unstyled')
li_items = ul.find_all('li')[1:]
links = []
for li in li_items:
    links.append('https://www.livecareer.com/' + li.a['href'])

# Visit each resume and keep the text of its first 'field singlecolumn' div
skills = []
for link in links:
    r = requests.get(link).text
    soup = BeautifulSoup(r, 'html.parser')
    div = soup.find('div', class_='field singlecolumn')
    skills.append(div.text)

print(skills)
Output:
['agile, AutoCAD, C++, CAD, Oral, data entry, database, Engineer in Training, EIT, Engineering analysis, XML, functional, GUI, HTML, JavaScript, Team leadership, Lockheed Martin, macros, Manufacturing processes, MATLAB, mechanical, meetings, Excel, Organizational skills, presentations, Process improvement, program management, programming, Project planning, Python, research, scrum, Six Sigma, Software development, Solidworks, SQL, switches, telemetry, video, Web design, website, Written communication', "Senior Outreach at Senior Center\xa0Planned and organized a joint celebration of the Chinese New Year with the collaboration of the Westborough Public Schools. \xa0Promoted cultural awareness and broke the language barrier of different races of backgrounds.Volunteer at ChurchIdentified problems and implemented a process to eliminate a data-entry camp registration process by 100% by building new online Registration Forms for registration and the student's cultural classes arrangement.Designed posters, flyers and presentation slides with graphics and photos for the different organizational events with Microsoft Word and PowerPointDeveloped a structural documentation on publishing an annual report with detailed steps and instructions on the process that are easy to follow and quickly learn by others. \xa0Implemented a 30th Anniversary Special Edition project in a commercial quality of work with excellent time management skills to meet the deadline.Cayenne SoftwareExperienced team spirit in effort of reducing the workload of bugs fixes on the software product.\u200bAllmerica Financial CompanyProvided a sole support to the Hanover 1099 system with strong commitment and responsibility. Fined tuned the system resulting in cost savings for the Allmerica Financial Company. Winner of the Gold Crown Customer Recognition award.", 'Motivated Software Engineer seeking employment as part of a dynamic software development team. 
Fluent in C,C++,JAVA and python.', 'Developed peer-to-peer secure file transfer system in JAVA.This involved the application of symmetric\r\n and asymmetric key cryptography algorithms, and JAVA concepts like multi-threading, socket\r\n programming, etc.Implemented a system to query XML in JAVA.The query language was a subset of XPath\r\n Modeled a project "Personal Health Management System" using UML and implemented it in Visual C#.The code was tested using NUnit.Object oriented software development process was used for this\r\n project\r\n Developed a \'license plate game\' in C on LINUX o/s using client/server architecture.This required the\r\n application of distributed programming concepts like Sockets, RPC, multi-threading, etc.RESEARCH PAPER:\r\n XMorph: A Shape-Polymorphic, Domain-Specific XML Data Transformation Language,\r\n International Conference on Data Engineering (ICDE 2010), IEEE CS, Los Angeles, USA, March 2010.', 'Performance evaluation of In-Kernel System Call Implemented and evaluated In-kernel system call using dynamic loadable kernel module on x_86_64 architecture.Re-Development free approach to migrate Java applications to cloud at College of Engineering, Pune Implemented file access sub-system of a WebJDK which leverages the File-System API provided by HTML5.This allows the use of standard Java APIs for accessing client files.2016 2013.', 'Accomplished Computer Technician with a rapidly increasing range of industry experience looking to bring strong instincts and a proven record of procedural compliance, process management and strong operational skills to a rapidly growing company. 
', 'Seeking a fulltime position as a Developer / Systems Admin / DBA for a company needing a hard working, \r\ntaskoriented person with an indepth understanding of software development and database tuning.', '3 Years of experience in Information Technology with emphasis on Design, Development and End to End Implementation of Consulting based solutions with expertise on working with Object Orient Analysis and Design using Java/J2EE Technologies viz. JSP/Servlets/EJB,JDBC, Web services , Web sockets, Spring Frameworks, Spring-boot, Angular, JQuery, XML/XSLT, JSON, Integration Developer Service Component Architecture, Service Data Objects, Rational Application Developer, Test Driven Development using JUnit, Jenkins, GIT, Cloud Foundry, Eclipse/Intelij IDE, UNIX, Gradle Scripts, DB2/Oracle/MySQL Databases.', '.NET 3.5, .NET, ASP .NET 3.5, ASP.NET 2.0, ASP.NET 3.5, AJAX, ASM, Banking, Basic, Business Objects, c, CSS, CSS 2, customer satisfaction, data analysis, Database, delivery, EBusiness, editor, Electronics, HP, HTML 4, HTML, IDE, IIS 7.0, ITIL, JavaScript, C#, C# 3.0, Windows, windows applications, 2000, 3.1, Windows 98, Enterprise, Oct, Operating systems, Oracle 9, Oracle database, PL/SQL, personnel, programming, recording, reporting, sales, Servers, Service Level Agreement, SLA, Visual SourceSafe, Visual SourceSafe, SQL, SQL Server, technical support, TOAD, UNIX, vi, Microsoft Visual Studio, Visual studio, Windows server', 'Represent Stanford Ballroom Dance team in various competitions in the Bay area.\r\n*Represented University of Maryland in Ballroom dance competitions in UMD, UPenn, MIT, Columbia University & Ohio \r\n*Have a keen interest in photography, especially of dancers in motion.']
You can also organize the data in a DataFrame for better readability by adding these lines to your code:
import pandas as pd

dictionary = {'Links': links,
              'Skills': skills}
df = pd.DataFrame(dictionary)
print(df)
Output:
                                               Links                                             Skills
0  https://www.livecareer.com//resume-search/r/so...  agile, AutoCAD, C++, CAD, Oral, data entry, da...
1  https://www.livecareer.com//resume-search/r/so...  Senior Outreach at Senior Center Planned and o...
2  https://www.livecareer.com//resume-search/r/so...  Motivated Software Engineer seeking employment...
3  https://www.livecareer.com//resume-search/r/so...  Developed peer-to-peer secure file transfer sy...
4  https://www.livecareer.com//resume-search/r/so...  Performance evaluation of In-Kernel System Cal...
5  https://www.livecareer.com//resume-search/r/so...  Accomplished Computer Technician with a rapidl...
6  https://www.livecareer.com//resume-search/r/so...  Seeking a fulltime position as a Developer / S...
7  https://www.livecareer.com//resume-search/r/so...  3 Years of experience in Information Technolog...
8  https://www.livecareer.com//resume-search/r/so...  .NET 3.5, .NET, ASP .NET 3.5, ASP.NET 2.0, ASP...
9  https://www.livecareer.com//resume-search/r/so...  Represent Stanford Ballroom Dance team in var...
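Two details in the output above may be worth tidying. The links contain a doubled slash (`.com//resume-search`) because each `href` already starts with `/`, and only some of the scraped entries are actual comma-separated skill lists. Here is a minimal sketch using only the standard library. The helper names `absolute_link` and `split_skills` and the example path are hypothetical, not part of the original code:

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.livecareer.com/'


def absolute_link(href, base=BASE_URL):
    """Join a relative href to the base URL without doubling the slash."""
    return urljoin(base, href)


def split_skills(text):
    """Split a comma-separated skills string into a list of trimmed skills."""
    # Normalize the non-breaking and zero-width spaces seen in the scraped text.
    cleaned = text.replace('\xa0', ' ').replace('\u200b', '')
    return [part.strip() for part in cleaned.split(',') if part.strip()]


# Hypothetical relative path, for illustration only.
print(absolute_link('/resume-search/r/example'))
print(split_skills('agile, AutoCAD, C++, CAD'))
```

Entries that are prose rather than a skills list (such as the summaries above) would still need separate handling, since splitting them on commas gives meaningless fragments.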
I hope this information proves useful!
Just a thought to consider...
When using Chrome, simply follow these steps:
Now, craft your own XPath expression that will uniquely identify the desired page object.
For example:
//a[contains(text(), 'my text')]
//div[@id='myDivID']
It's important to write your XPath by hand and avoid the "Copy XPath" option, as it can generate overly long positional paths like the one below, which break as soon as the page layout changes:
//*[@id="wrapper"]/div[2]/div[2]/div[1]/aside[1]/div/div/div[2]/div/div[1]/a
If you're unfamiliar with writing XPath, refer to this resource for guidance: https://www.w3schools.com/xml/xpath_intro.asp
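To illustrate why a short attribute-based expression tends to outlive a copied positional path, here is a small sketch. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath (for example, it has no `contains()`), so full expressions like the ones above would need a browser or a complete XPath engine. The markup, the `skills-link` id and the helper `find_text` are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for two versions of a page.
page_v1 = '<html><body><div id="wrapper"><div><a id="skills-link">Skills</a></div></div></body></html>'
# Same link, but the layout gained an extra <section> wrapper.
page_v2 = '<html><body><div id="wrapper"><section><div><a id="skills-link">Skills</a></div></section></div></body></html>'

positional_path = './body/div/div/a'        # mimics a copied, position-based path
attribute_path = ".//a[@id='skills-link']"  # hand-written, anchored on an attribute


def find_text(markup, path):
    """Return the text of the first element matching path, or None if nothing matches."""
    node = ET.fromstring(markup).find(path)
    return None if node is None else node.text


for name, page in (('v1', page_v1), ('v2', page_v2)):
    print(name, find_text(page, positional_path), find_text(page, attribute_path))
```

In this sketch the positional path only matches the first layout, while the attribute-based path matches both.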
Note that the text is currently inside a div tag when it should ideally be enclosed in a span. You could try the following XPath:
//div[@class='field singlecolumn']/text()
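A rough way to try this kind of class-based lookup offline is sketched below. It assumes a well-formed snippet and uses `xml.etree.ElementTree`, which does not support the `/text()` step, so the text is read from the matched elements in Python instead. Unlike `/text()`, `itertext()` also picks up text from nested tags. The sample markup and the helper `field_texts` are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet; only the class name comes from the scraped page.
snippet = """
<section>
  <div class="field singlecolumn">agile, AutoCAD, C++</div>
  <div class="field">not a match</div>
</section>
""".strip()


def field_texts(markup):
    """Return the text of every div whose class attribute is exactly 'field singlecolumn'."""
    root = ET.fromstring(markup)
    divs = root.findall(".//div[@class='field singlecolumn']")
    # Gather all text inside each matched div and trim surrounding whitespace.
    return [''.join(div.itertext()).strip() for div in divs]


print(field_texts(snippet))
```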
If you're looking to find the XPath of an HTML element, you can use your browser's Developer Tools. The guide below is specific to Chrome, but similar steps apply to other browsers as well:
An issue you may face when scraping these pages is that your target could potentially move around between visits. User-generated documents often have varying layouts, causing the XPath to differ and requiring a more advanced approach (such as jQuery, Selenium, Cypress) to search based on text content or navigate between parent/child elements.
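As a rough illustration of searching by text content and then moving between sibling elements, the sketch below looks for a heading by its text and returns whatever follows it, wherever the section sits in the document. It uses only the standard library rather than the tools named above, and the markup and the helper `text_after_heading` are assumptions made for the example:

```python
import xml.etree.ElementTree as ET

# Hypothetical resume markup; real pages will differ.
resume = """
<article>
  <section><h2>Summary</h2><div>Motivated software engineer</div></section>
  <section><h2>Skills</h2><div>Python, SQL, JavaScript</div></section>
</article>
""".strip()


def text_after_heading(markup, heading):
    """Return the text of the element that follows the heading with the given text."""
    root = ET.fromstring(markup)
    # ElementTree keeps no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for el in root.iter():
        if not el.text or el.text.strip().lower() != heading.lower():
            continue
        parent = parent_of.get(el)
        if parent is None:  # the root element has no siblings
            continue
        siblings = list(parent)
        position = siblings.index(el)
        if position + 1 < len(siblings):
            return ''.join(siblings[position + 1].itertext()).strip()
    return None


print(text_after_heading(resume, 'Skills'))
```

Because the lookup is keyed on the heading text rather than on position, it keeps working if the sections are reordered, at the cost of depending on the heading wording.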