ValueError when calling enc.transform on a scikit-learn OneHotEncoder with sparse_output=False

My dataset, named temp, is time-series data with four columns: Date, Minutes, Issues, and Reason no.

In this dataset:

temp['REASON NO'].value_counts()

produces the following output:

R13    158
R14    123
R4     101
R7      81
R2      40
R3      35
R5      31
R8      11
R15      9
R12      3
R6       2
R10      2
R9       1

I ran the code successfully:

reason_no = enc.fit_transform(temp['REASON NO'].values.reshape(-1, 1))

After building the model, I wanted to predict the values for next week's Minutes, Issues, and Reason no.

I attempted this code:

seq_length=7
last_week = df.iloc[-seq_length:, :]
last_reason_no = enc.transform(last_week['REASON NO'].values.reshape(-1, 1))
last_issue = enc.transform(last_week['Issue'].values.reshape(-1, 1))
last_minutes = scaler.transform(last_week['Minutes'].values.reshape(-1, 1))
last_X = np.hstack([last_reason_no, last_issue, last_minutes])
next_X = last_X.reshape(1, last_X.shape[0], last_X.shape[1])
for i in range(7):
    pred = model.predict(next_X)
    pred_minutes = scaler.inverse_transform(pred[:, 2].reshape(-1, 1))[0][0]
    pred_issue = enc.inverse_transform([np.argmax(pred[:, 1])])[0]
    pred_reason_no = enc.inverse_transform([np.argmax(pred[:, 0])])[0]
    print(f'Date: {last_week.iloc[-1, 0]}')
    print(f'Predicted Reason Number: {pred_reason_no}')
    print(f'Predicted Issue: {pred_issue}')
    print(f'Predicted Minutes: {pred_minutes}')

However, running this code resulted in an error:

ValueError                                Traceback (most recent call last)
in <cell line: 1>()
----> 1 last_reason_no = enc.transform(last_week['REASON NO'].values.reshape(-1, 1))

2 frames

/usr/local/lib/python3.10/dist-packages/sklearn/preprocessing/_encoders.py in _transform(self, X, handle_unknown, force_all_finite, warn_on_unknown)
    172                         " during transform".format(diff, i)
    173                     )
--> 174                 raise ValueError(msg)
    175             else:
    176                 if warn_on_unknown:

ValueError: Found unknown categories ['R5', 'R4'] in column 0 during transform.

I am seeking help to understand the cause of this error and how to resolve it.

Answer №1

A OneHotEncoder cannot encode categories during transform that it did not see during fit. With the default handle_unknown='error', it raises a ValueError instead, as this minimal example shows:

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# A scenario like splitting data into training and test sets
X_train = pd.DataFrame({'REASON NO': ['R13', 'R14', 'R7']})
X_test = pd.DataFrame({'REASON NO': ['R4', 'R7', 'R5']})

enc = OneHotEncoder()

Result:

>>> enc.fit_transform(X_train).toarray()
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

>>> enc.transform(X_test)
...
ValueError: Found unknown categories ['R5', 'R4'] in column 0 during transform
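
Two common ways around this are sketched below, using the same toy X_train and X_test as above. The first tells the encoder to ignore unseen categories, which encodes them as all-zero rows. The second declares the full set of categories up front, so every value gets its own column even if it is missing from the data used for fitting. The names enc_ignore, enc_full and all_reasons are illustrative, and which option suits the original model depends on how it was trained, so treat this as a sketch rather than a drop-in fix.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Same toy data as in the example above
X_train = pd.DataFrame({'REASON NO': ['R13', 'R14', 'R7']})
X_test = pd.DataFrame({'REASON NO': ['R4', 'R7', 'R5']})

# Option 1: ignore categories not seen during fit.
# Unknown values ('R4', 'R5') become rows of zeros instead of raising.
enc_ignore = OneHotEncoder(handle_unknown='ignore')
enc_ignore.fit(X_train)
encoded_ignore = enc_ignore.transform(X_test).toarray()

# Option 2: declare every possible category explicitly.
# Here the list is built from both frames purely for illustration;
# in practice it would come from domain knowledge or the full dataset.
all_reasons = sorted(set(X_train['REASON NO']) | set(X_test['REASON NO']))
enc_full = OneHotEncoder(categories=[all_reasons])
enc_full.fit(X_train)
encoded_full = enc_full.transform(X_test).toarray()

print(encoded_ignore)
print(enc_full.categories_)
print(encoded_full)
```

Note also that the code in the question reuses the same enc for both 'REASON NO' and 'Issue'. If that encoder was refit on a different column or on a subset of the rows after the first fit_transform call (an assumption, since the question does not show it), the categories it knows would no longer match 'REASON NO'. Keeping one fitted encoder per column avoids that.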
