Converting intricate JSON documents into .csv format

I have this JSON file containing a wealth of information about football players and I am trying to convert it into a .csv format. However, as a beginner in this field, I am facing some challenges!

You can access the raw data file at: https://raw.githubusercontent.com/llimllib/fantasypl_stats/8ba3e796fc3e73c43921da44d4344c08ce1d7031/data/players.1440000356.json

Previously, I used the code below, run as a Python script from the command prompt, to extract data from the JSON file into a .csv file:

import csv
import json

json_data = open("file.json")
data = json.load(json_data)

f = csv.writer(open("fix_hists.csv","wb+"))

arr = []

for i in data:
    fh = data[i]["fixture_history"]
    array = fh["all"]
    for j in array:

        try:
            j.insert(0,str(data[i]["first_name"]))
        except:
            j.insert(0,'error')

        try:
            j.insert(1,data[i]["web_name"])
        except:
            j.insert(1,'error')

        try:
            f.writerow(j)
        except:
            f.writerow(['error','error'])

json_data.close()

Unfortunately, running this code now results in an error message:

Traceback (most recent call last):
  File "fix_hist.py", line 12, in <module>
    fh = data[i]["fixture_history"]
TypeError: list indices must be integers, not str

Is there a way to resolve this issue or perhaps another method to extract specific data such as 'Fixture History', 'First Name', and 'Web Name' into a .csv file?

Answer №1

For handling JSON files, my recommendation would be to utilize pandas.

Pandas offers a convenient function specifically designed for parsing JSON files called pd.read_json().

If you want more information on this function, you can check out the documentation here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html

Using this method will allow you to easily read the JSON file into a dataframe.
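As a rough sketch of how that could look for this question: the snippet below reads a tiny stand-in for the players file with pd.read_json() and flattens each player's fixture history into CSV rows. The sample data and its layout (a list of player objects, each with "fixture_history" -> "all" holding one list per fixture) are assumptions inferred from the question's code, not taken from the real file.

```python
import io

import pandas as pd

# Tiny stand-in for players.json. The layout is an assumption: a list of
# player objects whose "fixture_history" -> "all" holds one list per fixture.
sample_json = io.StringIO(
    '[{"first_name": "Ann", "web_name": "Lee",'
    ' "fixture_history": {"all": [["18 Aug", 1, 3], ["25 Aug", 2, 6]]}},'
    ' {"first_name": "Bob", "web_name": "Ray",'
    ' "fixture_history": {"all": [["18 Aug", 1, 0]]}}]'
)

# read_json accepts a path or a file-like object; this gives one row per player,
# with the nested "fixture_history" object kept as a dict in its column.
players = pd.read_json(sample_json)

# Flatten: one output row per fixture, prefixed with the player's names.
rows = [
    [player["first_name"], player["web_name"], *fixture]
    for _, player in players.iterrows()
    for fixture in player["fixture_history"]["all"]
]

fixtures = pd.DataFrame(rows)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = fixtures.to_csv(index=False, header=False)
print(csv_text)
```

With the real data, you would pass the file name (for example "players.json") to pd.read_json() and a file name such as "fix_hists.csv" to to_csv().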

Answer №2

There's a mistake in the indentation of the for loop on line 11. With your code adjusted as shown below, it should run without errors:

import csv
import json

json_data = open("players.json")
data = json.load(json_data)

f = csv.writer(open("fix_hists.csv","wb+"))

arr = []

for i in data:
    fh = data[i]["fixture_history"]
    array = fh["all"]
    for j in array:

        try:
            j.insert(0,str(data[i]["first_name"]))
        except:
            j.insert(0,'error')

        try:
            j.insert(1,data[i]["web_name"])
        except:
            j.insert(1,'error')

        try:
            f.writerow(j)
        except:
            f.writerow(['error','error'])

json_data.close()

Make sure the JSON file is named players.json to match line 4, and that the JSON file and this Python script are in the same directory. You can run the script in an IDE such as PyCharm, or navigate to the directory in a terminal/command prompt window and run python fileName.py. This generates a CSV file named fix_hists.csv in that directory.
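If the TypeError from the question still appears, it may be because the top level of the JSON is a list rather than a dictionary: the message "list indices must be integers, not str" means data is a list, so for i in data yields each player object itself and data[i] fails. A minimal sketch of that variant follows, assuming Python 3 and assuming each player is an object with "first_name", "web_name" and "fixture_history" -> "all" (a list of lists). The helper name fixture_rows and the sample data are made up for illustration.

```python
import csv
import io
import json


def fixture_rows(players):
    """Yield one CSV row per fixture: first name, web name, then the fixture fields."""
    for player in players:  # each item is already the player dict, not a key
        history = player.get("fixture_history", {}).get("all", [])
        for fixture in history:
            yield [player.get("first_name", ""), player.get("web_name", "")] + list(fixture)


# Hypothetical sample standing in for json.load() on the real players file.
sample = json.loads(
    '[{"first_name": "Ann", "web_name": "Lee",'
    ' "fixture_history": {"all": [["18 Aug", 1, 3], ["25 Aug", 2, 6]]}},'
    ' {"first_name": "Bob", "web_name": "Ray",'
    ' "fixture_history": {"all": [["18 Aug", 1, 0]]}}]'
)

# Write to an in-memory buffer here; use a real file for actual output.
buffer = io.StringIO()
csv.writer(buffer).writerows(fixture_rows(sample))
csv_text = buffer.getvalue()
print(csv_text)
```

To write an actual file in Python 3, open it in text mode with open("fix_hists.csv", "w", newline="") instead of "wb+", and load the players with json.load() on the opened JSON file.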
