Python code to transform a dictionary into binary format

I have a dictionary that maps customer IDs to the IDs of the movies each customer has watched. Even if a customer watches the same movie multiple times, I want it to count as a single entry. To achieve this, I need to convert the dictionary into a binary (0/1) table with one row per customer ID and one column per movie ID. A cell should show 1 if the customer has watched the movie and 0 otherwise.

d = {'121212121': [111, 222, 333, 333, 444, 444], '212121212': [222, 555, 555, 666], '212123322': [555, 666, 666, 666, 777]}

Desired output :

customer ID 111 222 333 444 555 666 777
121212121   1   1   1   1   0   0   0
212121212   0   1   0   0   1   1   0
212123322   0   0   0   0   1   1   1

I previously tried scikit-learn's CountVectorizer:

Code :

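import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer()
movies = cv.fit_transform(cust['movies_list'])  # sparse matrix of per-movie counts
# vocabulary_ is a dict of term -> column index; its key order need not match the column order
cols = cv.vocabulary_
movies_ = pd.DataFrame(movies.toarray(), columns=cols, index=cust['customer_id'])
movies_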

Output :

customer ID 111 222 333 444 555 666 777
212121212   1   1   2   2   0   0   0
121212121   0   1   0   0   2   1   0
121323231   0   0   0   0   1   3   1

The issue was that the customer IDs did not align correctly, and instead of binary values, I received counts of how many times the movie was watched by each customer.

Answer №1

An easy fix is to cap the counts at 1 with clip(upper=1). Older pandas versions spelled this clip_upper(1), but that method has since been removed.

movies_.clip(upper=1)

           111  222  333  444  555  666  777
121212121    1    1    1    1    0    0    0
212121212    0    1    0    0    1    1    0
212123322    0    0    0    0    1    1    1
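
As a minimal sketch of the capping step, assuming a recent pandas where the cap is spelled DataFrame.clip(upper=1) (the older clip_upper method is no longer available there). The `counts` frame below is hand-built for illustration and is not the asker's real data:

```python
import pandas as pd

# Hypothetical watch counts (rows: customers, columns: movie IDs).
counts = pd.DataFrame(
    [[1, 1, 2, 2, 0, 0, 0],
     [0, 1, 0, 0, 2, 1, 0],
     [0, 0, 0, 0, 1, 3, 1]],
    index=["121212121", "212121212", "212123322"],
    columns=["111", "222", "333", "444", "555", "666", "777"],
)

# Cap every count at 1 so that any positive count becomes a plain "watched" flag.
binary = counts.clip(upper=1)
print(binary)
```

Zeros are left untouched, so only the repeat viewings are collapsed.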

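Here's another approach that starts from d. Apply pd.get_dummies, sum per customer, then cap the result with clip(upper=1).

import pandas as pd

# Build one column per customer; shorter lists are padded with NaN.
df = pd.concat(
    [pd.Series(v, name=k).astype(str) for k, v in d.items()],  # `d` is your dictionary
    axis=1,
)
# One-hot encode every entry, add the rows up per customer, then cap at 1.
pd.get_dummies(df.stack()).groupby(level=1).sum().clip(upper=1)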

           111  222  333  444  555  666  777
121212121    1    1    1    1    0    0    0
212121212    0    1    0    0    1    1    0
212123322    0    0    0    0    1    1    1
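
If the duplicates are dropped up front, no capping is needed at all. The following is only a sketch of that idea, assuming `d` maps each customer ID to a list of movie IDs (the question's dictionary written with explicit lists). The names `watched`, `movie_ids` and `matrix` are illustrative and do not come from the original post:

```python
import pandas as pd

# Assumed shape of the question's dictionary: customer ID -> list of movie IDs.
d = {
    "121212121": [111, 222, 333, 333, 444, 444],
    "212121212": [222, 555, 555, 666],
    "212123322": [555, 666, 666, 666, 777],
}

# Converting each list to a set collapses repeat viewings into one entry.
watched = {customer: set(movies) for customer, movies in d.items()}

# The sorted union of all movie IDs gives a stable column order.
movie_ids = sorted(set().union(*watched.values()))

# One row per customer, one column per movie: 1 if watched, else 0.
matrix = pd.DataFrame(
    [[int(m in movies) for m in movie_ids] for movies in watched.values()],
    index=list(watched.keys()),
    columns=movie_ids,
)
matrix.index.name = "customer ID"
print(matrix)
```

Because the rows are built directly from the dictionary keys, each customer ID stays attached to its own row.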
