Questions tagged [aws-glue]

AWS Glue is a fully managed ETL (extract, transform, and load) service from Amazon Web Services. It can categorize, clean, enrich, and move data between various data stores. AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python code, and a scheduler that handles dependency resolution, job monitoring, and retries. Because AWS Glue is serverless, there is no infrastructure to manage.

Create a new field in a DynamicFrame using AWS Glue and set its value based on the value

I'm new to AWS Glue and PySpark and am having trouble with a transformation task. I have two DynamicFrames; one contains values in a specific column that need to be added as a new column in the other DynamicFrame. The va ...

How to place a numeric value in the source_mappings.json file in an AWS SDLF pipeline

Using the Serverless Data Lake Framework (SDLF), files can be ingested into an Amazon S3 data lake. Certain configurations are required to move a file through the various stages within the S3 repository. The initial step involves transferri ...

Convert JSON to Parquet, or not convert at all?

When we convert a JSON file to Parquet, the result is many small Parquet files. How can we prevent this? What is the most effective and efficient way to handle this transformation? Bel ...