Extracting information from dynamically generated tables using Python 2.7, Beautiful Soup, and Selenium

I need help scraping a JavaScript-generated table and saving specific fields to a CSV file. I am limited to Python 2.7, Beautiful Soup, and/or Selenium. I tried the code from question 14529849, but all I get back is an empty list. The website I am targeting is:

and its corresponding source can be found at:

As an example, one of the data records has the following structure:

<tr>
<td class="flagmay"><a href="javascript:dataWin('STAGE','119901','Colorado River at Winchell')" class="tablink">Colorado River at Winchell</a></td>
<td align="left" class="flagmay">Jan 12 2016 5:55PM</td>
<td align="right" class="flagmay">2.48</td>
<td align="right" class="flagmay">4.7</td>
</tr>

The desired csv output format should resemble:

Station | StationID | Time | Stage | Flow

Colorado River at Winchell | 119901 | Jan 12 2016 5:55PM | 2.48 | 4.7

If anyone could provide some guidance or tips on how to achieve this task, it would be greatly appreciated. Thank you for your help.

Answer №1

Give this a shot. The snippet below uses the pandas, requests, and BeautifulSoup4 libraries. I tested it with Python 2.7.11 and 3.5.1.

import requests
import pandas
from bs4 import BeautifulSoup

url = 'http://hydromet.lcra.org/repstage.asp'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
tables = soup.find_all('table')

# Convert the second HTML table to a pandas DataFrame, skipping the header row
# so that the columns can be named explicitly later
df = pandas.read_html(str(tables[1]), skiprows=[0], flavor="bs4")[0]

# Walk the station links and map each station name to the ID embedded in its href
a_links = soup.find_all('a', attrs={'class': 'tablink'})
stnid_dict = {}
for a_link in a_links:
    # href looks like: javascript:dataWin('STAGE','119901','Colorado River at Winchell')
    cid = a_link['href'].split("dataWin('STAGE','")[1].split("','")[0]
    stnid_dict[a_link.text] = cid

# Name the four scraped columns, then add the station ID column
# by looking up each station name in stnid_dict
df.columns = ['Station', 'Time', 'Stage', 'Flow']
df['StationID'] = df['Station'].apply(lambda name: stnid_dict[name])

# Define custom order of columns for the CSV file and skip row numbers
df.to_csv('station.csv', columns=['Station', 'StationID', 'Time', 'Stage', 'Flow'], index=False)

This code writes a CSV file named station.csv to the current working directory (the script's own folder, if you run it from there).
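If you would rather not depend on chained split() calls, or you want the pipe-separated layout shown in the question, here is a minimal standard-library sketch. The helper names parse_datawin and write_pipe_delimited are made up for illustration, and the regular expression assumes station names never contain an apostrophe. It is written for Python 3; on Python 2.7 the csv module expects a byte stream, so you would swap io.StringIO for a file opened in 'wb' mode.

```python
import csv
import io
import re

# Matches the three quoted arguments of the dataWin(...) call in the href.
# Assumption: none of the arguments contains an apostrophe.
DATAWIN_RE = re.compile(r"dataWin\('([^']*)','([^']*)','([^']*)'\)")


def parse_datawin(href):
    """Return (kind, station_id, station_name) from a dataWin href, or None."""
    match = DATAWIN_RE.search(href)
    return match.groups() if match else None


def write_pipe_delimited(rows, out):
    """Write a header plus the given rows to `out`, separated by '|'."""
    writer = csv.writer(out, delimiter='|', lineterminator='\n')
    writer.writerow(['Station', 'StationID', 'Time', 'Stage', 'Flow'])
    writer.writerows(rows)


# href and cell values copied from the sample row in the question
href = "javascript:dataWin('STAGE','119901','Colorado River at Winchell')"
_, station_id, station = parse_datawin(href)

buffer = io.StringIO()
write_pipe_delimited(
    [[station, station_id, 'Jan 12 2016 5:55PM', '2.48', '4.7']],
    buffer,
)
print(buffer.getvalue())
```

With pandas, passing sep='|' to df.to_csv should give the same separator without the extra helper.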
