Questions tagged [data-science]

Questions about implementing data science techniques. Data science is the process of extracting useful information or findings from data, regardless of its format. It may involve predictive analytics and usually requires significant effort to organize and prepare the data. For broader questions about data science, consider asking in an online community dedicated to the field.

Comparing the column names of two Pandas DataFrames

Is there a way to compare the column names of two separate Pandas data frames? Specifically, I am interested in comparing the columns between my train and test data frames. There are some columns missing in the test data frame that I need to identify. ...
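
One possible way to approach this, shown as a minimal sketch: `Index.difference` on `DataFrame.columns` lists the labels that appear in one frame but not the other. The `train` and `test` frames and their column names below are hypothetical placeholders, not the asker's data.

```python
import pandas as pd

# Hypothetical frames standing in for the asker's train/test data.
train = pd.DataFrame({"age": [31, 45], "income": [52000, 61000], "target": [0, 1]})
test = pd.DataFrame({"age": [29], "income": [48000]})

# Index.difference returns the labels present in the left Index
# but absent from the right one (sorted by default).
missing_in_test = train.columns.difference(test.columns)
extra_in_test = test.columns.difference(train.columns)

print(list(missing_in_test))
print(list(extra_in_test))
```

If only a yes/no answer is needed, comparing `set(train.columns)` with `set(test.columns)` would also work.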

Having difficulty with implementing make_scorer in scikit-learn

Currently, I am working on implementing a classification algorithm using a dataset related to medicinal research. My main focus is to achieve good recall in disease recognition. In order to do so, I had the idea of creating a scorer like the following: re ...

I'm having trouble understanding the Python pipeline syntax. Can anyone provide an explanation?

I'm having some trouble understanding the function of each step in this particular pipeline. Could someone provide a detailed explanation of how this pipeline is functioning? I have a general idea, but more clarity would be greatly appreciated. Wha ...

Matplotlib pcolormesh displaying incorrectly

I've been attempting to replicate the steps outlined in this tutorial: The goal is to create a heat map similar to this example: https://i.stack.imgur.com/qW8Yo.gif However, instead of the desired output, I'm getting this result: https://i.stack.imgur. ...

python creating a copy of a pandas dataframe with assigned missing values

I'm currently attempting to establish the mean value for a group of products within my dataset. My goal is to iterate through each category and fill in any missing data as needed. df.loc[df.iCode == 160610,'oPrice'].fillna(value=df[df.iCode == 160610].oPr ...
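
A likely cause, offered as an assumption since the excerpt is cut off: `fillna` returns a new object, so calling it on a `.loc` selection without assigning the result leaves `df` unchanged. The sketch below fills each missing `oPrice` with the mean of its `iCode` group and assigns the result back; the sample values are made up for illustration.

```python
import pandas as pd

# Made-up sample data; only the column names iCode and oPrice come from the question.
df = pd.DataFrame({
    "iCode": [160610, 160610, 160610, 170000, 170000],
    "oPrice": [10.0, float("nan"), 20.0, 5.0, float("nan")],
})

# transform("mean") yields one value per row: the mean of that row's iCode group
# (NaN entries are skipped when the mean is computed).
group_mean = df.groupby("iCode")["oPrice"].transform("mean")

# fillna returns a new Series, so the result has to be assigned back to the column.
df["oPrice"] = df["oPrice"].fillna(group_mean)

print(df)
```

This handles every category in one pass, so no explicit loop over `iCode` values is needed.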

Substitute values in a dataframe using specific index positions from a separate list

I am currently working with a dataframe that contains a column for dates. My goal is to replace the values in this column based on a specific list of indexes. For example, I have a list called wrong_dates_indexes which contains the indexes where the date i ...
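
As a rough sketch of one way to do this: `.loc` accepts a list of index labels, so the rows named in `wrong_dates_indexes` can be overwritten in a single assignment. The column name `date`, the sample dates, and the replacement value `pd.NaT` are assumptions, since the excerpt does not say what the new values should be.

```python
import pandas as pd

# Hypothetical frame; the column name "date" is an assumption.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"])
})

# Index labels of the rows whose date is considered wrong (name taken from the question).
wrong_dates_indexes = [1, 3]

# .loc selects by index label; here the flagged dates are marked as missing.
df.loc[wrong_dates_indexes, "date"] = pd.NaT

print(df)
```

If the list holds integer positions rather than index labels, `df.iloc[wrong_dates_indexes, df.columns.get_loc("date")]` would be the positional equivalent.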

Creating a MySQL database from a CSV file: A step-by-step guide

I am looking to streamline the database creation process by utilizing a CSV file that contains 160 columns and 15 rows of data. Manually assigning names for each column is proving to be quite challenging due to the large number of columns. I have managed t ...

Generating a stratified K-Fold split for training, testing, and validation datasets

I am attempting to utilize StratifiedKFold in order to create train/test/val splits for a non-sklearn machine learning workflow. The goal is to split the DataFrame and maintain that division. My approach involves using .values as I am working with pandas ...

Can anyone share the step-by-step process for setting up a dictionary that automatically populates with the necessary sample IDs?

Recently, I've dedicated time to developing code that streamlines a lab process through automation. In essence, the code is designed to extract experiment data, compile it into a file, and transmit it to the website host for storage. However, I' ...

What could be the reason for my web scraping yielding HTML code but failing to retrieve any textual content?

Hello, I am a new coder and I am currently working on extracting earnings per share data from the following website: My initial attempt was to retrieve the "March" data only, for which I used the code below: from bs4 import BeautifulSoup from requests im ...

Steps for organizing directories in a CSV document

I need help with a script that lists directory names, file names, mp4 files, and empty directories in a CSV file. However, the output is being printed multiple times. Any assistance would be greatly appreciated. Thank you! import csv import os import sys ...

What is the best way to change multiple arrays of strings within a JSON file into CSV format using Python?

How can Python be used to convert a nested JSON with multiple arrays into a CSV tabular structure? View the complete JSON file here CODE: import json import csv f = open('cost_drilldown_data.json') data = json.load(f) s=csv.writer(open(' ...

Tips for conducting time series predictions on limited historical data (specifically only 8 years worth) across various locations

Is there a way to predict the production of fields for the years 2018 and 2019 at various locations using a small amount of data? By utilizing historical data, it is possible to forecast the production of fields at each Harvesting Site identified by index ...

Aggregate and group data by distinct rows in order to calculate sums based on unique values

My dataset is structured as follows:

store      itemId  numberOfItemsSold
Berlin     1       78
Amsterdam  3       12
Berlin     2       31
Amsterdam  1       12
Berlin     1       90

I am seeking to generate a dataset or dic ...
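
The desired output is cut off in the excerpt, so the following is only a sketch under the assumption that the goal is the total `numberOfItemsSold` per (`store`, `itemId`) pair, collected in a dictionary. The rows are copied from the question; the variable names are illustrative.

```python
from collections import defaultdict

# Rows from the question: (store, itemId, numberOfItemsSold).
rows = [
    ("Berlin", 1, 78),
    ("Amsterdam", 3, 12),
    ("Berlin", 2, 31),
    ("Amsterdam", 1, 12),
    ("Berlin", 1, 90),
]

# Accumulate the sold count for each distinct (store, itemId) pair.
totals = defaultdict(int)
for store, item_id, sold in rows:
    totals[(store, item_id)] += sold

totals = dict(totals)
print(totals)
```

With the data already in a DataFrame, `df.groupby(["store", "itemId"])["numberOfItemsSold"].sum()` should give the same totals.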

Clustering JSON documents with the power of Machine Learning

Looking to conduct document clustering using JSON String data containing various key-value pairs of String and Number types. The goal is to cluster documents based on similar types of keys and values. For example, consider the following JSON Document: {" ...

What is the process for transforming a collection of linked pairs of identifiers into a grouping of identifiers?

I have a dataset with pairs (and sometimes triples) of IDs that act as links in a chain.

+------+-----+
| from | to  |
+------+-----+
| id1  | id2 |
| id2  | id3 |
| id4  | id5 |
+------+-----+

I am looking to organize these links into clusters or famili ...
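
Grouping chained links like these amounts to finding connected components. Below is a minimal union-find sketch, assuming each link is a tuple of two or more IDs; the function name `group_linked_ids` is hypothetical, and this is one possible technique rather than the asker's own method.

```python
from collections import defaultdict


def group_linked_ids(links):
    """Group IDs into clusters, where each link ties all of its IDs together."""
    parent = {}

    def find(x):
        # Register unseen IDs as their own root, then walk up to the root,
        # shortening the path along the way (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for link in links:
        first, *rest = link
        find(first)  # make sure single-ID links are registered too
        for other in rest:
            union(first, other)

    # Collect every ID under its root, then return sorted groups.
    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return sorted(sorted(members) for members in groups.values())


links = [("id1", "id2"), ("id2", "id3"), ("id4", "id5")]
groups = group_linked_ids(links)
print(groups)
```

Because every ID in a link is merged with the first one, triples such as `("id6", "id7", "id8")` are handled the same way as pairs.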

How to differentiate specific points in a Plotly Express Scatterplot using various colors

At the moment, I have a scatterplot showcasing different directors based on their production budget and profit. I am looking to pick out specific directors by highlighting their points with unique colors and creating a legend identifying each one. For ins ...

Instructions on changing files from .pck (Python Pickle object) to .jpeg extension

Hey there, I have a collection of knee bone MRI files that are currently in .pck format. Can anyone provide assistance with converting them to either .jpeg or .png formats? Thanks! ...