Questions tagged [bigdata]

Big data concerns the management of extremely large datasets. Questions under this tag typically cover infrastructure, algorithms, statistics, and data organization.

Tips for importing an Excel file into Databricks using PySpark

I am having trouble importing an Excel file into PySpark on Azure Databricks so that I can convert it to a PySpark DataFrame. The attempt fails with errors. import pandas data = pandas.read_exc ...

Spark: read.json throws a java.io.IOException due to an excessive number

Reading a large 6 GB single-line JSON file fails with the following error: Job aborted due to stage failure: Task 5 in stage 0.0 failed 1 times, most recent failure: Lost task 5.0 in stage 0.0 (TID 5, localhost): java.io.IOException: Too ...
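Spark's JSON reader by default expects one JSON document per line, so a multi-gigabyte single-line file hands it one enormous record. One possible workaround is to rewrite the file as newline-delimited JSON before loading it. The sketch below is not taken from the question: it assumes the file holds a single top-level JSON array, and the names `iter_json_array` and `chunk_size` are made up for illustration. It reads the input in small chunks with the standard library's `json.JSONDecoder.raw_decode`, so the whole file never has to sit in memory.

```python
import io
import json


def iter_json_array(stream, chunk_size=65536):
    """Yield the elements of a top-level JSON array, reading the text
    stream in chunks. Assumption: the input is one JSON array.
    Separator handling is lenient, not a strict validator."""
    decoder = json.JSONDecoder()
    buf = ""
    started = False
    eof = False
    while True:
        if not eof:
            chunk = stream.read(chunk_size)
            if chunk:
                buf += chunk
            else:
                eof = True
        buf = buf.lstrip()
        if not started:
            if not buf:
                if eof:
                    return  # empty input: nothing to yield
                continue
            if buf[0] != "[":
                raise ValueError("expected a top-level JSON array")
            buf = buf[1:]
            started = True
        # Drain every complete element currently in the buffer.
        while True:
            buf = buf.lstrip()
            if buf.startswith(","):
                buf = buf[1:].lstrip()
            if buf.startswith("]"):
                return
            if not buf:
                break
            try:
                obj, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # element is incomplete; read more input
            if end == len(buf) and not eof:
                break  # value may continue in the next chunk (e.g. a number)
            yield obj
            buf = buf[end:]
        if eof:
            raise ValueError("unexpected end of input inside the array")


# Usage on a small in-memory stand-in for the real file:
source = io.StringIO('[{"id": 1}, {"id": 2}]')
lines = [json.dumps(record) for record in iter_json_array(source, chunk_size=8)]
print("\n".join(lines))
```

Writing each yielded record on its own line produces a file the default line-oriented reader can split across tasks. A very large single element would still be re-parsed on every chunk, so this is only a starting point.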

Utilizing Highcharts/Highstock for handling large volumes of data efficiently

My data grows daily (currently over 200k MySQL rows in one week), and chart loading has become quite slow. Async loading seems to be the solution. I tried to implement it but ran into some issues. Currentl ...
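The server side of async loading usually comes down to returning only the points inside the range the chart is currently showing, thinned to a size the browser can draw quickly. The sketch below illustrates that idea in Python with plain lists. It is not the asker's code: the function name `points_in_range`, the `max_points` parameter, and the choice of simple bucket averaging are assumptions made for illustration, and the real data would come from a MySQL query instead.

```python
from bisect import bisect_left, bisect_right


def points_in_range(timestamps, values, start, end, max_points=500):
    """Return at most max_points (timestamp, value) pairs with
    start <= timestamp <= end. Assumes timestamps are sorted ascending.
    Dense ranges are thinned by averaging fixed-size buckets."""
    lo = bisect_left(timestamps, start)
    hi = bisect_right(timestamps, end)
    n = hi - lo
    if n <= max_points:
        return list(zip(timestamps[lo:hi], values[lo:hi]))
    bucket = -(-n // max_points)  # ceiling division
    out = []
    for i in range(lo, hi, bucket):
        j = min(i + bucket, hi)
        chunk = values[i:j]
        # Label each bucket with its first timestamp and mean value.
        out.append((timestamps[i], sum(chunk) / len(chunk)))
    return out


# Usage: 1000 synthetic points, request a window of 800 of them.
ts = list(range(1000))
vals = [float(t) for t in ts]
window = points_in_range(ts, vals, 100, 899, max_points=80)
print(len(window))
```

A chart would call an endpoint wrapping this function each time the visible range changes, so a zoomed-out view gets coarse buckets and a zoomed-in view gets the raw rows.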

Leveraging PyTables for creating an index on a massive 500 gigabyte HDF5 file

Is it possible to transfer a large 500GB-800GB indexed table into HDF5 and then query for specific records based on keys? In an HDF5 file, data access is based on integer "row" numbers, which means that an external 'key to row number map' would ...
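The external 'key to row number map' mentioned above can be sketched in a few lines. This is a toy illustration, not an HDF5 or PyTables implementation: a Python list of dicts stands in for the on-disk table, and the names `build_key_index` and `lookup` are hypothetical. The point is only that a key resolves to integer row numbers first, and the rows are then fetched by position.

```python
def build_key_index(rows, key_field):
    """Map each key value to the list of integer row numbers holding it.
    `rows` is any sequence addressable by row number."""
    index = {}
    for row_number, row in enumerate(rows):
        index.setdefault(row[key_field], []).append(row_number)
    return index


def lookup(rows, index, key):
    """Fetch rows by key: resolve the key to row numbers, then read by position."""
    return [rows[n] for n in index.get(key, [])]


# Stand-in for an on-disk table addressed by integer row number.
table = [
    {"key": "a17", "value": 1.5},
    {"key": "b02", "value": 2.0},
    {"key": "a17", "value": 3.25},
]
key_index = build_key_index(table, "key")
print(lookup(table, key_index, "a17"))
```

For a table in the hundreds of gigabytes, a map like this may itself be too large for memory, so it would likely need to be stored on disk or replaced by an index kept alongside the data.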