Trying to combine three columns in CSV and then updating the original CSV file

Here is some sample data:

name1|name2|name3|name4|combined
test|data|here|and
test|information|343|AND
",3|record|343|and

My coding solution:

import csv
import StringIO

storedoutput = StringIO.StringIO()
fields = ('name1', 'name2', 'name3', 'name4', 'combined')
with open('file.csv', 'rb') as input_csv:
    reader = csv.DictReader(input_csv, fields, delimiter='|')
    for counter, row in enumerate(reader):
        counter += 1
        #print row
        if counter != 1:
            for field in fields:
                if field == "combined":
                    row['combined'] = ("%s%s%s" % (row["name1"], row["name3"], row["name4"]))
                    print row
                    storedoutput.writelines(','.join(map(str, row)) + '\n')

contents = storedoutput.getvalue()
storedoutput.close()

print "".join(contents)

with open('file.csv', 'rb') as input_csv:
    input_csv = input_csv.read().strip()

output_csv = []
output_csv.append(contents.strip())

if "".join(output_csv) != input_csv:
    with open('file.csv', 'wb') as new_csv:
        new_csv.write("".join(output_csv))

The expected output:

name1|name2|name3|name4|combined
test|data|here|and|testhereand
test|information|343|AND|test343AND
",3|record|343|and|",3343and

When this code runs, the first print statement displays the rows as intended in the output CSV. However, the second print statement repeats the title row multiple times, equal to the number of rows.

We welcome any feedback, corrections, or functional code examples from you.

Answer №1

I believe we have the opportunity to simplify this process significantly. Handling the stray " character proved to be a bit of a challenge, as Python requires some effort to disregard it.

import csv

with open('file.csv', 'rb') as input_csv, open("new_file.csv", "wb") as output_csv:
    reader = csv.DictReader(input_csv, delimiter='|', quoting=csv.QUOTE_NONE)
    writer = csv.DictWriter(output_csv, reader.fieldnames, delimiter="|",quoting=csv.QUOTE_NONE, quotechar=None)

    merge_cols = "title1", "title3", "title4"

    writer.writeheader()

    for row in reader:
        row["merge"] = ''.join(row[col] for col in merge_cols)
        writer.writerow(row)

resulting in

$ cat new_file.csv 
title1|title2|title3|title4|merge
test|data|here|and|testhereand
test|data|343|AND|test343AND
",3|data|343|and|",3343and

Please note that despite your request to update the original file, I declined. Why? Making changes directly to the source file can lead to data loss and corruption during manipulation.

How am I so certain? Because that was my initial misstep when running your code for the first time, but now I've learned from it. ;^)

Answer №2

The quotation mark at the end of the previous sentence appears to be causing some issues with the csv.DictReader(). The following solution seems to work well:

fresh_lines = []
with open('data.csv', 'rb') as file:
    # skip over the first line
    fresh_lines.append(file.next().strip())
    for row in file:
        # remove any extra spaces and split up the fields
        row = row.strip().split('|')
        # extract the specific field information needed
        info1, info3, info4 = row[0], row[2], row[3]
        # combine the extracted data into a single string and add it back to the rest
        row.append(''.join([info1, info3, info4]))
        # store the updated row for later use
        fresh_lines.append('|'.join(row))

with open('data.csv', 'w') as file:
    # concatenate all lines into one long string and write it to the new file
    file.write('\n'.join(fresh_lines))

Answer №3

import csv
import StringIO

stored_output = StringIO.StringIO()

with open('data.csv', 'rb') as input_csv:
    reader = csv.DictReader(input_csv, delimiter='|', quoting=csv.QUOTE_NONE)
    writer = csv.DictWriter(stored_output, reader.fieldnames, delimiter="|",quoting=csv.QUOTE_NONE, quotechar=None)

    merge_columns = "name", "age", "city"

    writer.writeheader()

    for row in reader:
        row["merge"] = ''.join(row[col] for col in merge_columns)
        writer.writerow(row)

    new_data = stored_output.getvalue()
    stored_output.close()
    print new_data

with open('data.csv', 'rb') as input_csv:
    input_contents = input_csv.read().strip()

if input_contents != new_data.strip():
    with open('data.csv', 'wb') as updated_csv:
        updated_csv.write("".join(new_data))

Similar questions

If you have not found the answer to your question or you are interested in this topic, then look at other similar questions below or use the search

Verifying the presence of a file within a specified list of directories

I am currently working on a project that involves checking if a specific file exists within any of the directories listed. If the file is found, the code should return True; otherwise, it should return False. I have encountered some difficulties in the pro ...

Having Trouble Converting Index to Time Series Index in Pandas

Struggling with converting index to time series in pandas as shown by the dataframe: df['Book Value Per Share * IDR'] This is the current output : 2010-12 NaN 2011-12 326.22 2012-12 484.66 2013-12 596.52 20 ...

Is there a way to ensure that when tapping on a button in tkinter, it becomes disabled and displays the message "Booked"?

Hey there, I'm currently working on creating a restaurant management system using tkinter. Right now, my focus is on designing the table booking feature. I have set up a button that allows customers to book a table, and I want it to change color to gr ...

What are the feature importances obtained after finding the most optimal TPOT pipeline?

After going through several pages, I am struggling to understand how to make this work. I am currently utilizing TPOTRegressor() in order to obtain an optimal pipeline. My goal is to then visualize the .feature_importances_ of the resulting pipeline: bes ...

What could be the reason for the malfunction of my while loop?

As a beginner in the world of programming, I am facing some challenges with implementing a basic while loop. My goal is to create a program that will display all multiples of 3 within the range of 0 to 100. Here is my current attempt: counter = 1 numbers ...

Improving List Comprehension Efficiency

Recently, I created a Python script that involves two custom classes - a 'Library' class (Lib) containing a list of objects based on a 'Cas' class. The specifics of these classes are not provided here, but what you need to know is that ...

Python JSON deep assertion techniques

Currently I am making an API call and the response coming from the server is in JSON format. Here's how the response looks: { "status": 0, "not_passed": 1, "why": [ { "code": 229, "reason": "some reason", } ] } I have tw ...

What is the best way to interact with an element in a lengthy dropdown list using Selenium?

Whenever I attempt to click on an element, like a list of countries from a dropdown menu, I face the issue where I can only successfully click on the first few countries using xpath. When trying to click on the last country in the list, it appears that t ...

Similar items with diverse Xpath configurations

My current challenge involves scraping all the comments from a website. The issue lies in the fact that some comments have varying XPath structures. For instance: Item 1: //*[@id="__next"]/div[1]/main/div[3]/div[4]/div/div[2]/div[3]/div[2]/div[3 ...

Other Jupyter Notebooks will not begin on the subsequent open port

When utilizing Jupyter Notebook on Windows 10 and opening multiple notebooks, each one would launch on the next available port (the first on port 8888, the second on 8889, etc.). However, after installing Anaconda on Windows Subsystem for Linux (WSL), I en ...

What is the best way to utilize list comprehension to extract strings and generate a new column in Python?

I am working with a data frame called "df" that includes a column labeled "Name" Name t_gh_m t_mr_h t_gh_u t_mr_h t_z_z My goal is to create a new column named "group" that will output ["gh", "mr"] if they exist in the "Name" column, otherwise it should ...

Having trouble installing Chromium via snap on WSL

After executing the command /usr/bin/chromium-browser, I received a prompt to install Chromium using snap. When I proceeded with the installation, another error occurred. The output of the command snap version is as follows: snap 2.51.1+20.04ubuntu2 ...

Error encountered while attempting to install 'web3[tester]' package in Python, with a warning message appearing in the command line - D9002

Version: web3==5.13.0 Python: 3.9.0 OS: win 10 pip freeze output - attrs==20.3.0 backcall==0.2.0 base58==2.0.1 bitarray==1.2.2 blake2b-py==0.1.3 cached-property==1.5.2 certifi==2020.11.8 chardet==3.0.4 colorama==0.4.4 cytoolz==0.11.0 decorator==4.4.2 eth- ...

Choosing various segments from a 3-dimensional numpy array

Currently, I am working with a 3D numpy array that has dimensions of 50x50x4. I also have the coordinates of various points on the 50x50 plane. My task is to extract an 11x11x4 region centered around each point, ensuring that the region wraps around if it ...

Utilizing Python in GIS: The process of transforming geometric lines represented as LineStrings into a full-fledged Complete Graph network with the assistance of Networkx

TL;DR Help needed: Converting geodataframe of LineStrings to a Complete Graph with NetworkX I am working with a geodataframe containing linestrings that represent a road network (purple color). There are two points, A and B (red and blue), and I want to f ...

Optimized group by operation on numpy record array

Within my dataset of product purchase logs, there are a total of 6 columns: purchase_date, user_address, user_id, product_id, brand_id, retailer_id. Each column houses integers except for user_address, which contains strings. The task at hand is to determ ...

Utilizing SolidWorks PDM API in Python to Retrieve Files

I am currently working on creating a python script that can automatically fetch the latest versions of all Excel files located in the "Documents" folder within my SolidWorks EPDM vault. Below is the code I have written: import pythoncom import win32com.cli ...

Sending a block of XML to a web server using Python

I am looking for a way to send a request URL and XML request body in order to receive an XML response code. Is there any specific code snippet that can achieve this task? Although I currently have a Java code snippet that accomplishes this: import java.n ...

I need to remove the "./" prefix from all items in a Python list

Currently, I'm using Python to navigate through all subdirectories within a specific folder. In order to execute the OS commands, I require a list of all subfolders. However, the list is displaying with "./" preceding the folder names and I am aiming ...

How can you calculate the total of every row and column in a grid?

Struggling to display the total values for each row and column, as well as the sum of all elements. The current code generates a 3x3 table with random values, but I'm stuck on calculating the totals. import tkinter import random ROWS = 3 COLS = 3 c ...