What is the process for extracting the download button URL and parsing a CSV file in Python?

In my Python Google Colab project, I am attempting to access a CSV file from the following link:

After scrolling down slightly on the page, there is a download button visible. My goal is to extract the link using Selenium or BeautifulSoup in order to read the CSV file. The code snippet I am working with looks like this:

# Installing necessary packages
!pip install selenium
!apt-get update # Update Ubuntu for proper apt installation
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin

# Import required libraries
import pandas as pd
from selenium import webdriver
import sys

# Using Selenium to fetch and read the CSV file
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome('chromedriver',chrome_options=chrome_options)
driver.get('https://www.macrotrends.net/stocks/charts/AAPL/apple/stock-price-history')# Enter the URL of the desired page here
btn = driver.find_element_by_tag_name('button')
btn.click()
df = pd.read_csv('##.csv')

Everything seems to be functioning properly up to the btn.click() step, but I encounter an error afterwards because I'm not able to locate the download button's link or the file name. Can anyone provide guidance on how to do this successfully? Any help would be greatly appreciated.

Answer №1

You don't need Selenium for this. The data behind the download button is already embedded in the page's <script> tags, so you can fetch the page with requests and parse it out directly.

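import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

ticker = 'AAPL'
url = 'https://www.macrotrends.net/assets/php/stock_price_history.php?t={}'.format(ticker)

response = requests.get(url)
response.raise_for_status()  # Stop early on an HTTP error
soup = BeautifulSoup(response.text, 'html.parser')

# The daily price history is assigned to a JavaScript variable inside a <script> tag
jsonData = None
scripts = soup.find_all('script', {'type': 'text/javascript'})
for script in scripts:
    if 'var dataDaily' in str(script):
        # Keep only the text between the first '[' and the closing '];'
        jsonStr = '[' + str(script).split('[', 1)[-1].split('];')[0] + ']'
        jsonData = json.loads(jsonStr)
        break

if jsonData is None:
    raise ValueError('var dataDaily was not found in the page source')

# Rename the single-letter keys to readable column names and save as CSV
df = pd.DataFrame(jsonData)
df = df.rename(columns={'o':'open','h':'high','l':'low','c':'close','d':'date','v':'volume'})
df.to_csv('MacroTrends_Data_Download_{}.csv'.format(ticker), index=False)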

Result:

print(df)
             date      open      high  ...   volume     ma50    ma200
0      1980-12-12    0.1012    0.1016  ...  469.034      NaN      NaN
1      1980-12-15    0.0964    0.0964  ...  175.885      NaN      NaN
2      1980-12-16    0.0893    0.0893  ...  105.728      NaN      NaN
3      1980-12-17    0.0910    0.0915  ...   86.442      NaN      NaN
4      1980-12-18    0.0937    0.0941  ...   73.450      NaN      NaN
          ...       ...       ...  ...      ...      ...      ...
10135  2021-02-25  124.6800  126.4585  ...  148.200  131.845  112.241
10136  2021-02-26  122.5900  124.8500  ...  164.560  131.838  112.460
10137  2021-03-01  123.7500  127.9300  ...  116.308  131.840  112.716
10138  2021-03-02  128.4100  128.7200  ...  102.261  131.790  112.957
10139  2021-03-03  124.8100  125.7100  ...  111.514  131.661  113.184

[10140 rows x 8 columns]
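
If you want to try the parsing step without hitting the site, here is a minimal offline sketch of the same idea. The SAMPLE_HTML string and the extract_js_array helper are hypothetical, made up for illustration to mimic the shape of the real page (two invented rows, only six of the columns). It uses a regular expression instead of split, then round-trips the frame through CSV text to show the pd.read_csv side of the question.

```python
import io
import json
import re

import pandas as pd

# Hypothetical stand-in for the page source; the real page is much larger
# and its exact layout is an assumption here.
SAMPLE_HTML = """
<html><body>
<script type="text/javascript">
var dataDaily = [{"d":"1980-12-12","o":0.1012,"h":0.1016,"l":0.1012,"c":0.1012,"v":469.034},
                 {"d":"1980-12-15","o":0.0964,"h":0.0964,"l":0.0959,"c":0.0959,"v":175.885}];
</script>
</body></html>
"""


def extract_js_array(html, var_name):
    """Return the JSON array assigned to `var_name` in inline JavaScript.

    Hypothetical helper: it assumes the array literal is valid JSON and
    ends at the first ']' that is followed by ';'.
    """
    pattern = r"var\s+" + re.escape(var_name) + r"\s*=\s*(\[.*?\])\s*;"
    match = re.search(pattern, html, flags=re.DOTALL)
    if match is None:
        raise ValueError("variable {!r} not found".format(var_name))
    return json.loads(match.group(1))


records = extract_js_array(SAMPLE_HTML, "dataDaily")
df = pd.DataFrame(records).rename(
    columns={"d": "date", "o": "open", "h": "high",
             "l": "low", "c": "close", "v": "volume"}
)

# Write to CSV text in memory, then read it back the way you would read a file.
csv_text = df.to_csv(index=False)
df_back = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
```

To read a file written by df.to_csv instead, pass its path to pd.read_csv in place of the StringIO object. The regex has the same limitation as the split version: it stops at the first "];" it sees, so it would need adjusting if the array ever contained that sequence inside a string.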
