Gensim's Word2Vec raises ValueError: missing section header before line #0

Hello everyone! I am diving into the world of Gensim Word2Vec and could use some guidance. My current task involves using Word2Vec to create word vectors for raw HTML files. To kick things off, I convert these HTML files into text files.

Question Number One:

Training the Word2Vec model appears to run without problems. However, when I try to test the accuracy of the model using

model.accuracy(file_name)

the following error is raised:

Traceback (most recent call last):
  File "build_w2v.py", line 82, in <module>
    main()
  File "build_w2v.py", line 77, in main
    gen_w2v_model()
  File "build_w2v.py", line 71, in gen_w2v_model
    accuracy = model.accuracy(target)
  File "/home/k/shankai/app/anaconda2/lib/python2.7/site-packages/gensim/models/word2vec.py", line 1330, in accuracy
    return self.wv.accuracy(questions, restrict_vocab, most_similar, case_insensitive)
  File "/home/k/shankai/app/anaconda2/lib/python2.7/site-packages/gensim/models/keyedvectors.py", line 679, in accuracy
    raise ValueError("missing section header before line #%i in %s" % (line_no, questions))
ValueError: missing section header before line #0

Here is a snippet from the sample file:

(sample content that needs help with)

Upon inspecting the file, it appears to start with multiple unnecessary spaces or newline characters. When viewed in Vim, it presents as shown in this image.

What could possibly be causing this issue?

Question Number Two:

Additionally, I am working on text classification for biomedical papers which are provided to me in raw HTML format, either in Japanese or English. After converting them to ASCII and cleaning out stop words, I still encounter various remnants of HTML code within the files.

In an attempt to tidy up the files by restricting characters to [a-zA-Z0-9], I noticed that certain medical terms like [4protein...] are not getting cleaned properly.

Any suggestions on how to effectively clean up these files?

Answer №1

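The accuracy() method expects a set of analogies in the format of the questions-words.txt file included in the word2vec.c distribution: section header lines beginning with ": ", each followed by lines of four words. It is not meant to be given your own text file, such as the converted HTML corpus. A file whose first line is not a section header produces exactly this "missing section header" error.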
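To illustrate the expected format, here is a minimal sketch of a checker that roughly mimics the section-header test seen in the traceback. It is not gensim's own code; the function name check_analogy_file and the sample lines are assumptions made for illustration. Line numbers are zero-based, as in the error message.

```python
def check_analogy_file(lines):
    """Return (line_no, problem) tuples for a questions-words.txt style file.

    Hypothetical helper, not part of gensim: it only approximates the
    format check that accuracy() performs.
    """
    problems = []
    in_section = False
    for line_no, line in enumerate(lines):
        if line.startswith(": "):
            # A section header such as ": capital-common-countries"
            in_section = True
            continue
        if not in_section:
            # This is the situation that raises the ValueError in the question
            problems.append((line_no, "missing section header"))
            continue
        if len(line.split()) != 4:
            # Analogy lines must contain exactly four words: a b c expected
            problems.append((line_no, "expected 4 words"))
    return problems


# A well-formed analogy file: one header, then four-word lines
good = [": capital-common-countries", "Athens Greece Baghdad Iraq"]
# Ordinary corpus text, like the converted HTML files in the question
bad = ["   ", "some plain text extracted from an html page"]

print(check_analogy_file(good))
print(check_analogy_file(bad))
```

The second call reports a missing section header at line 0, which matches the "before line #0" part of the error: the very first line of the file passed to accuracy() was not a header.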

Answer №2

It seems that you are retrieving this document through a software library. Because of this, the file is not in its original text format. Make sure to obtain the authentic raw text file!
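On the second question, one possible approach (a sketch only, assuming Python 3 and only the standard library) is to extract the visible text with an HTML parser before applying any character filter, so that tags, scripts, and entities are handled by the parser rather than by a regular expression. The names TextExtractor and html_to_text are made up for this example.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Text chunks are joined with a space, so words split by inline tags
    # (for example <b>4</b>protein) end up separated by a space.
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><p>IL-6 &amp; TNF</p><script>var x=1;</script></body></html>"
)
print(html_to_text(sample))  # prints: IL-6 & TNF
```

Stop-word removal and any restriction to [a-zA-Z0-9] can then be applied to the extracted text, where no markup is left to leak through.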
