Questions tagged [text-processing]

Automating the generation or editing of digital text.

What is the best way to use the 'yield' keyword in Scala?

I am switching to Scala for the text processing in my PhD research code. Coming from Python, I have found the 'yield' statement very useful for writing complex iterators over large, inc ...

How do I remove the trailing commas from array elements in an Angular project?

I am retrieving the list of actors for each movie, but in the database I created, every actor's name ends with a comma. When I output the array, the names appear with two commas next to each other. I need help figur ...

Python: parsing comments in a CSS file

I want to extract the first comment block from a CSS file. Specifically, I am looking for a comment that follows this pattern: /* author : name uri : link etc */ while ignoring every other comment in the file, such as: /* ...
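A minimal Python sketch of one approach, assuming the header is the first /* ... */ comment in the file and holds one "key : value" field per line (the excerpt does not show the full layout). A non-greedy regular expression with re.DOTALL stops at the first closing */, so later comments are never reached; parse_header_comment is a hypothetical name.

```python
import re

# Non-greedy match so the pattern stops at the first "*/";
# DOTALL lets "." span the newlines inside a multi-line comment.
COMMENT_RE = re.compile(r"/\*(.*?)\*/", re.DOTALL)


def parse_header_comment(css):
    """Return the 'key : value' pairs found in the first comment of css."""
    match = COMMENT_RE.search(css)
    if match is None:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        # Split on the first colon only, so values such as URLs keep theirs.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


css = """/*
 author : Jane Doe
 uri : http://example.com
*/
body { color: red; }
/* layout */
"""
print(parse_header_comment(css))
```

A regex like this does not understand CSS strings, so a "/*" inside a quoted value would confuse it; a real CSS parser avoids that problem.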

Choosing a suitable data structure for large volumes of text data

I am experimenting with text classification using scikit-learn's TfidfVectorizer and a nearest-neighbor algorithm. I need to compute similarity scores between two datasets of 18,000 entries each, and I am struggling with decidin ...
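One common approach, sketched in Python under the assumption that "similarity" means cosine similarity between TF-IDF vectors: fit a single TfidfVectorizer on both datasets so they share one vocabulary, keep the resulting scipy.sparse matrices as they are, and ask NearestNeighbors for the top k matches per entry instead of materializing every pairwise score. nearest_matches and k are illustrative names, not taken from the question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors


def nearest_matches(docs_a, docs_b, k=1):
    """For each document in docs_a, find its k most similar documents in docs_b."""
    # One shared vocabulary so both matrices have the same columns.
    vectorizer = TfidfVectorizer()
    vectorizer.fit(list(docs_a) + list(docs_b))
    X_a = vectorizer.transform(docs_a)  # sparse CSR matrix, one row per document
    X_b = vectorizer.transform(docs_b)

    # Brute-force search works directly on sparse input with the cosine metric.
    nn = NearestNeighbors(n_neighbors=k, metric="cosine", algorithm="brute")
    nn.fit(X_b)
    distances, indices = nn.kneighbors(X_a)

    # Cosine distance is 1 - cosine similarity.
    return 1.0 - distances, indices


sims, idx = nearest_matches(
    ["the cat sat", "dogs bark loudly"],
    ["loud dogs bark", "a cat sat down"],
)
print(idx[:, 0], sims[:, 0])
```

The sparse matrices matter at this size: a dense 18,000 x 18,000 float64 similarity matrix alone would need about 2.6 GB, whereas the top-k results are only 18,000 x k.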

Splitting a JSON array by key with lodash/underscore

I want to extract the distinct values in every column of a JSON array. This is what I want to convert: var items = [ {"name": "Type 1","id": 13}, {"name": "Type 2","id": 14}, {"name": "Type 3","id": 14}, {"name": "Type 3","id": 13}, {"name": ...
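The question targets lodash/underscore, so the following is not lodash code. It is a Python illustration of the same transformation: collecting the distinct values of each key across an array of objects, in first-seen order. It assumes every value is hashable (true for JSON strings and numbers), uses only the four items visible in the excerpt, and distinct_by_key is a hypothetical name.

```python
def distinct_by_key(rows):
    """Map each key to the list of its distinct values, in first-seen order."""
    seen = {}
    for row in rows:
        for key, value in row.items():
            # Dict keys act as an insertion-ordered set of values.
            seen.setdefault(key, {})[value] = None
    return {key: list(values) for key, values in seen.items()}


items = [
    {"name": "Type 1", "id": 13},
    {"name": "Type 2", "id": 14},
    {"name": "Type 3", "id": 14},
    {"name": "Type 3", "id": 13},
]
print(distinct_by_key(items))
# {'name': ['Type 1', 'Type 2', 'Type 3'], 'id': [13, 14]}
```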

Python script for deleting empty lines from txt/srt files in a directory and its subfolders

I have a collection of subtitle files in the following format: 1 00:00:01,000 --> 00:00:02,008 some sample text 2 00:00:02,008 --> 00:00:05,006 some sample text some sample text 3 00:00:05,006 --> 00:00:08,008 some sample text some s ...
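A minimal Python sketch, assuming the files are UTF-8 and that "empty" also covers whitespace-only lines. pathlib.Path.rglob walks the directory and all of its subfolders; strip_empty_lines and the suffixes parameter are hypothetical names.

```python
from pathlib import Path


def strip_empty_lines(root, suffixes=(".txt", ".srt")):
    """Remove blank or whitespace-only lines from matching files under root.

    Returns the list of files that were rewritten.
    """
    changed = []
    # sorted() materializes the listing before any file is rewritten.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        kept = [line for line in lines if line.strip()]
        if len(kept) != len(lines):
            # Keep a single trailing newline unless nothing is left.
            path.write_text("\n".join(kept) + ("\n" if kept else ""), encoding="utf-8")
            changed.append(path)
    return changed
```

In the SRT format a blank line is what separates one subtitle block from the next, so some players or parsers may not read the files correctly once those lines are gone; running the script on a copy first is safer.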