Having trouble downloading a tar.gz file using Python code due to a UnicodeDecodeError?

I want to download the Java JDK. The following wget command works correctly when I run it in the shell:

wget -P /data/ --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie"

However, when attempting the same command using Python, an error arises. Below is my Python code:

from resource_management import *
import os
import params
cmd = 'wget -P ' + params.java_tarball_path + ' --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-x64.tar.gz'
print cmd
Execute(cmd, user=params.monarch_user, timeout=300)

The error message is:

  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 198, in _call
    err_msg = Logger.filter_text(("Execution of '%s' returned %d. %s") % (command_alias, code, out))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1228: ordinal not in range(128)

I also printed the command that Python executes, and it looks correct to me:

wget -P /data/ --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie"

Is there a way to perform the download from Python using the Execute resource from resource_management?

Answer №1

My suggestion would be to use urllib2 or requests instead of resorting to Execute.

import urllib2
opener = urllib2.build_opener()
opener.addheaders.append(('Cookie', 'oraclelicense=accept-securebackup-cookie'))
f = opener.open('http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-x64.tar.gz')
with open('jdk-7u79-linux-x64.tar.gz', 'wb') as save:  # binary mode: the tarball is not text
    save.write(f.read())
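
Note that urllib2 exists only in Python 2. On Python 3, a rough equivalent can be built on urllib.request. The sketch below is only an illustration: the function name download and its parameters are my own, not part of any library, and the cookie value is the one from the question.

```python
import os
import shutil
import urllib.parse
import urllib.request


def download(url, dest_dir, cookie=None):
    """Stream `url` into `dest_dir` and return the path of the saved file.

    `download` is a hypothetical helper written for this example.
    """
    request = urllib.request.Request(url)
    if cookie:
        # Same header that wget sends via --header "Cookie: ..."
        request.add_header('Cookie', cookie)
    # Name the local file after the last component of the URL path
    filename = os.path.basename(urllib.parse.urlsplit(url).path)
    dest_path = os.path.join(dest_dir, filename)
    with urllib.request.urlopen(request) as response, open(dest_path, 'wb') as out:
        # copyfileobj copies in chunks, so the tarball is never held in memory whole
        shutil.copyfileobj(response, out)
    return dest_path
```

It could then be called as download(url, '/data/', cookie='oraclelicense=accept-securebackup-cookie'), with url set to the JDK address from the question.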

Answer №2

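The error message is quite clear: a UnicodeDecodeError is raised while the message passed to Logger.filter_text is being formatted. This most likely happens because the format operation mixes byte strings with unicode strings, so Python 2 implicitly decodes the byte string (probably out, given the non-ASCII byte 0xe2) using the ascii codec. Here's a demonstration: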

>>> "%s %s" % ("é", "é")   # works
'\xc3\xa9 \xc3\xa9'
>>> "%s %s" % ("é", u"é")  # doesn't work
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

To resolve this issue, consider patching resource_management/core/shell.py so that out is explicitly decoded to unicode before it is formatted:

Logger.filter_text(("Execution of '%s' returned %d. %s") % (command_alias, code, out.decode("utf-8")))
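
A slightly more defensive variant of the same idea is sketched below. The helpers to_text and format_failure are hypothetical names introduced for illustration, not functions from resource_management. The sketch assumes the command output is UTF-8 and replaces malformed bytes instead of raising.

```python
def to_text(value, encoding='utf-8'):
    """Return `value` as text, decoding byte strings explicitly.

    Hypothetical helper; assumes the bytes are UTF-8 unless told otherwise.
    """
    if isinstance(value, bytes):
        # 'replace' keeps message formatting from failing on malformed bytes
        return value.decode(encoding, 'replace')
    return value


def format_failure(command_alias, code, out):
    """Build the failure message from text-only arguments (illustrative only)."""
    return u"Execution of '%s' returned %d. %s" % (
        to_text(command_alias), code, to_text(out))
```

Because every argument is converted to text before the % operator runs, no implicit ascii decoding takes place.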

Answer №3

There seems to be an issue in the resource_management module where it mixes bytestrings and Unicode text. To get around this problem, you can manually download the tarball:

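#!/usr/bin/env python3
import os
import requests

url = 'http://example.com/tarball.tar.gz'
# stream=True defers the body so it can be written chunk by chunk
response = requests.get(url, stream=True)
response.raise_for_status()  # fail early on HTTP errors
with open(os.path.join('/data', url.split('/')[-1]), 'wb') as output_file:
    for chunk in response.iter_content(chunk_size=64 * 1024):
        output_file.write(chunk)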

This code downloads the file without loading it entirely into memory, making it suitable for large files. However, it doesn't verify the Content-Length header, so a premature interruption could result in a partial download.
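
If you want to catch truncated downloads, one option is to compare the number of bytes written with the Content-Length header when the server sends one. The sketch below uses the standard library's urllib.request rather than requests so that it has no third-party dependency. The function name download_verified and its parameters are hypothetical, chosen for this example.

```python
import os
import urllib.request


def download_verified(url, dest_path, chunk_size=64 * 1024):
    """Stream `url` to `dest_path` and compare the byte count with Content-Length.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    written = 0
    with urllib.request.urlopen(url) as response, open(dest_path, 'wb') as out:
        # None when the server does not send the header
        expected = response.headers.get('Content-Length')
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    if expected is not None and int(expected) != written:
        os.remove(dest_path)  # do not leave a truncated tarball behind
        raise IOError('expected %s bytes, received %d' % (expected, written))
    return written
```

When the header is missing the check is skipped, so this only guards against truncation for servers that report a length.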
