Questions tagged [nltk]

Python's Natural Language Toolkit (NLTK) is a suite of libraries and corpora for natural language processing and computational linguistics.

Using Python to analyze large JSON files with Markov chains

When working in Python, I use markovify to build Markov models from large bodies of text in order to generate random sentences. Additionally, I use nltk to ensure the Markov models respect sentence structure. Due to the time-consuming pr ...
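With markovify, the usual pattern is `model = markovify.Text(corpus)` followed by `model.make_sentence()`. As a minimal sketch of what such a model does under the hood (pure stdlib, standing in for markovify, which may not be installed), each word maps to the list of words observed after it, and generation is a random walk over that map:

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain, start, length=8, seed=0):
    """Walk the chain from `start`, picking successors at random."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        successors = chain.get(out[-1])
        if not successors:
            break
        out.append(rng.choice(successors))
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ran"
chain = build_chain(corpus)
print(sorted(chain["the"]))  # successors observed after "the"
```

A real markovify model adds state sizes larger than one and sentence-boundary handling, which is where the nltk post-processing from the question comes in.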

Develop a script using NLTK to prompt for a word and determine if it appears more often as a Noun or a Verb in the Brown corpus

import nltk
from nltk.corpus import brown

input_word = input("Please type a word:")
tagged_words = brown.tagged_words()
for current_word in tagged_words:
    if

This is how my code begins, but unfortunately I am stuck here. ...
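One way to finish the loop is to count noun and verb tags separately and compare. In the sketch below, a small hand-made list stands in for `brown.tagged_words()` (which requires `nltk.download('brown')`); the assumption is the Brown convention that noun tags start with `NN` and verb tags with `VB`:

```python
# In practice: tagged_words = nltk.corpus.brown.tagged_words()
# (after nltk.download('brown')); a small sample stands in here.
sample_tagged_words = [
    ("run", "NN"), ("run", "VB"), ("run", "VBD"),
    ("dog", "NN"), ("runs", "VBZ"),
]

def noun_verb_counts(word, tagged_words):
    """Return (noun_count, verb_count) for `word`, case-insensitively.

    Brown-style noun tags start with 'NN' and verb tags with 'VB'."""
    nouns = verbs = 0
    for token, tag in tagged_words:
        if token.lower() == word.lower():
            if tag.startswith("NN"):
                nouns += 1
            elif tag.startswith("VB"):
                verbs += 1
    return nouns, verbs

nouns, verbs = noun_verb_counts("run", sample_tagged_words)
print("Noun" if nouns > verbs else "Verb")
```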

NLTK - A dive into stopwords and hashing on a list

Trying to simplify this problem as much as possible because I know how frustrating long and complex issues can be. I have a collection of tweets stored in a variable called 'all_tweets'. Some tweets are in the 'text' category while others are in 'extended ...
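A common shape for this problem is: pull the right text field out of each tweet, tokenize, then drop stopwords. The field names `text` and `extended_tweet`/`full_text` below follow the question's description of the Twitter payload, and a tiny hardcoded stopword set stands in for `nltk.corpus.stopwords.words('english')` (which requires `nltk.download('stopwords')`):

```python
# Tiny stand-in for nltk.corpus.stopwords.words('english').
STOPWORDS = {"the", "a", "is", "of", "and"}

def tweet_text(tweet):
    """Extended tweets carry their full text under 'extended_tweet';
    plain tweets use 'text' (field names as in the question)."""
    if "extended_tweet" in tweet:
        return tweet["extended_tweet"]["full_text"]
    return tweet["text"]

def content_words(tweet):
    """Lowercased tokens of the tweet with stopwords removed."""
    return [w for w in tweet_text(tweet).lower().split()
            if w not in STOPWORDS]

all_tweets = [
    {"text": "The cat is on the mat"},
    {"extended_tweet": {"full_text": "A long tweet about NLTK and Python"}},
]
print([content_words(t) for t in all_tweets])
```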

Utilizing NLTK for date-based tokenization

I have collected the dataset below:

        Date   D
0  01/18/2020  shares recipes ... - news updates · breaking news emails · lives to remem...
1  01/18/2020  both sides of the pineapple slices with olive oil. ... some of my other su ...

Encountering the "expected string or bytes-like object" error while parsing sentences from a text file for tokenization

Initially, I believed my code to open a file, read its contents, and tokenize it into sentences was straightforward.

import nltk
text = open('1865-Lincoln.txt', 'r')
tokens = nltk.sent_tokenize(text)
print(tokens)

However, I consistently encounter the ex ...
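The likely cause of "expected string or bytes-like object" is that `nltk.sent_tokenize` expects a string, while the code passes it the file object itself. Reading the file first fixes it. In the runnable sketch below, a regex splitter stands in for `nltk.sent_tokenize` (which needs the punkt data downloaded), and the file is a temporary one created for the demo:

```python
import re
import tempfile

# The fix for the question's code:
#
#     with open('1865-Lincoln.txt', 'r') as f:
#         text = f.read()              # file object -> string
#     tokens = nltk.sent_tokenize(text)
#
# Regex splitter standing in for nltk.sent_tokenize in this sketch:
def naive_sent_tokenize(text):
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

with tempfile.NamedTemporaryFile("w+", suffix=".txt", delete=False) as f:
    f.write("Fourscore and seven years ago. We are met on a battlefield.")
    path = f.name

with open(path, "r") as f:
    text = f.read()                    # the crucial step

print(naive_sent_tokenize(text))
```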

Utilize NLTK in Python to tokenize the word "don't" as "dont"

Whenever I utilize the following:

nltk.word_tokenize("cannot")

The output I receive is:

["can", "not"]

What I am aiming for is:

["cannot"] ...
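One approach is to re-merge the split pair after tokenizing; `nltk.tokenize.MWETokenizer` can do this (e.g. with the multi-word expression `("can", "not")` and an empty separator). A stdlib stand-in for the merge step, assuming the tokenizer output shown in the question:

```python
def merge_pairs(tokens, pairs):
    """Merge adjacent token pairs listed in `pairs` back into one token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in pairs:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = ["I", "can", "not", "do", "that"]   # as word_tokenize would return
print(merge_pairs(tokens, {("can", "not")}))
```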

I just finished creating my own classifier using nltk. Now, how do I go about loading it into textblob?

The default classifier in textblob is not very effective as it is trained on movie reviews. To improve its accuracy, I compiled a large dataset of examples specific to my context (57,000 stories categorized as positive or negative) and used nltk for traini ...
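A common way to reuse an nltk-trained classifier across runs is to pickle it after training and unpickle it later; textblob can then be pointed at a custom classifier (textblob's `classifier=` argument is an assumption stated here only in comments, not exercised). The runnable part below demonstrates the pickle round-trip with a dummy object standing in for the real trained classifier:

```python
import pickle

# After training with nltk, persist the classifier once:
#     pickle.dump(classifier, open('classifier.pickle', 'wb'))
# and reload it in later runs instead of retraining on 57,000 stories.
# A dummy object stands in for nltk.NaiveBayesClassifier here:
class DummyClassifier:
    def __init__(self, default_label):
        self.default_label = default_label

    def classify(self, features):
        return self.default_label

trained = DummyClassifier("pos")
blob = pickle.dumps(trained)        # save
restored = pickle.loads(blob)       # load in a later run

print(restored.classify({"great": True}))
```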

Calculating the edit distance between two fields in a pandas dataframe

I am working with a pandas DataFrame that has two columns of strings. My goal is to add a third column which will calculate the Edit Distance between the values in the first two columns.

from nltk.metrics import edit_distance
df['edit'] = edit_distanc ...
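The snag is that `edit_distance` takes a single pair of strings, not two whole columns, so it has to be applied row by row (e.g. with `DataFrame.apply(..., axis=1)`). In the sketch below, a pure-Python Levenshtein function stands in for `nltk.metrics.edit_distance`, which computes the same quantity, so the example runs without nltk:

```python
import pandas as pd

def levenshtein(a, b):
    """Dynamic-programming edit distance (stand-in for
    nltk.metrics.edit_distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

df = pd.DataFrame({"left": ["kitten", "flaw"], "right": ["sitting", "lawn"]})
# edit distance works on one pair of strings, so apply it row by row:
df["edit"] = df.apply(lambda row: levenshtein(row["left"], row["right"]),
                      axis=1)
print(df)
```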

What's the best way to notify Python that I require decimal numbers?

I am currently working on a program that aims to calculate the number of types, tokens, and determine the type-to-token ratio. However, I am facing a challenge in conveying to Python that the answer to ttr is not an integer.

from nltk.corpus import inau ...
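In Python 3, `/` already performs true division and returns a float; truncation only happens with `//` (in Python 2, wrapping one operand in `float()` was the usual fix). A minimal sketch of the type-token ratio, with a small token list standing in for the inaugural corpus:

```python
# Type-token ratio: distinct word forms (types) / total words (tokens).
# In Python 3, / returns a float; only // truncates to an integer.
# A small token list stands in for nltk.corpus.inaugural here.
tokens = ["we", "the", "people", "of", "the", "united", "states", "we"]

types = len(set(tokens))      # distinct words
total = len(tokens)           # all words
ttr = types / total           # true division -> float
print(f"{types} types / {total} tokens = TTR {ttr:.2f}")
```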

What is the best way to create a Phrases model using a vast collection of articles (such as those from Wikipedia)?

To enhance the results in topic detection and text similarity, I am eager to create a comprehensive gensim dictionary for the French language. My strategy involves leveraging a Wikipedia dump with the following approach: Extracting each article from frwi ...
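gensim's `Phrases` accepts any iterable of token lists, so a full Wikipedia dump can be streamed article by article rather than held in memory (roughly `Phrases(article_stream(), min_count=..., threshold=...)`, shown only in comments below). The runnable sketch uses a generator plus a bigram counter as a stdlib stand-in for gensim's collocation scoring, with made-up articles:

```python
from collections import Counter

# With gensim, the streaming pattern would be:
#
#     from gensim.models.phrases import Phrases
#     phrases = Phrases(article_stream(), min_count=5, threshold=10.0)
#
# A generator keeps memory constant regardless of dump size.
def article_stream():
    """Yield one tokenized article at a time (lazy, constant memory)."""
    articles = [
        "la tour eiffel est a paris",
        "la tour eiffel attire des visiteurs",
    ]
    for article in articles:
        yield article.lower().split()

def count_bigrams(stream):
    """Stdlib stand-in for the co-occurrence counting Phrases performs."""
    counts = Counter()
    for tokens in stream:
        counts.update(zip(tokens, tokens[1:]))
    return counts

bigrams = count_bigrams(article_stream())
print(bigrams[("tour", "eiffel")])  # seen in both articles
```

The key design choice is that the corpus is never materialized as a list: each extracted frwiki article is tokenized and yielded, so the same iterator pattern scales from two toy sentences to the whole dump.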