Figure reference parsing using regular expressions

I'm developing a regular expression to extract references to figures from a text. It should match the following forms, where the list may contain any number of figures, not just three:

Fig* 1, 2, and 3

Fig* 1-3

Fig* 1 and 2

Fig* 1

Fig* 1 to 4

My attempt at crafting a regex pattern looks like this:

(Fig[a-zA-Z.]*)\s(\d(,|\s)* )+|\d\s|and\s\d|\s\d-\d|\s\d)*

Ideally each number would be captured separately, although I can post-process the match to clean it up and parse the numbers. However, this pattern fails to extract "1 to 4", and it doesn't seem well optimized. Any suggestions?

Answer №1

Check out this solution:

(Fig.*) ((\d( through | or |-)\d)|\d)|(\d,\d and \d)

Answer №2

Here is a helpful pattern for you:

(Fig(?:ures?|s\.)) (\d+(?:(?:-|, | (?:and|to) )\d+)*)

To make it more flexible, replace the literal spaces with \h+ or \h* (\h matches horizontal whitespace in PCRE and Perl).
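As a rough illustration, the pattern above can be tried out in Python. This is only a sketch under a few assumptions: Python's built-in re module does not support \h, so [ \t] stands in for it, and the names FIG_REF and find_figure_refs are made up for this example.

```python
import re

# Pattern from the answer above, with each literal space replaced by
# [ \t]+ or [ \t]* (Python's re module has no \h escape).
FIG_REF = re.compile(
    r"(Fig(?:ures?|s\.))[ \t]+"
    r"(\d+(?:(?:[ \t]*-[ \t]*|,[ \t]*|[ \t]+(?:and|to)[ \t]+)\d+)*)"
)


def find_figure_refs(text):
    """Return (label, number_list) pairs for every figure reference in text."""
    return [(m.group(1), m.group(2)) for m in FIG_REF.finditer(text)]


sample = "See Figs. 1, 2 and 3, then compare Figure 4-6 with Figures 7 to 9."
for label, numbers in find_figure_refs(sample):
    print(label, "->", numbers)
```

Group 2 still holds the whole number list as one string, so the individual numbers have to be pulled out in a second step.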

Answer №3

Update: my initial regex attempt did not produce the desired result.

To fix this, here are two alternative solutions, both verified to work:

1. Multi-line mode with the \G anchor - Each match yields one number (or one pair of numbers), continuing from where the previous match ended. The output is concise, well formatted, and easy to load into an array.

# '/(^Fig[a-zA-Z.]*\h+|(?!^)\G)(?(?<=\d)\h*,\h*)(\d+)(?|\h*(-)\h*(\d+)|\h+(and)\h+(\d+)|\h+(to)\h+(\d+))?/'
( # (1 start)
    ^ Fig [a-zA-Z.]* \h+ # Fig's
  | # or,
    (?! ^ ) # Not at the start of a line
    \G # Continue at the end of the last match
) # (1 end)

(?(?<= \d ) # Conditional, if previous digit
    \h* , \h* # Require a comma
) # End conditional

(\d+) # (2), Digit
(?| # Branch reset (optionally, one of the (-|and|to) \d forms)
    \h*
    ( - ) # (3), '-'
    \h*
    ( \d+ ) # (4), Digit
  | \h+
    ( and ) # (3), 'and'
    \h+
    ( \d+ ) # (4), Digit
  | \h+
    ( to ) # (3), 'to'
    \h+
    ( \d+ ) # (4), Digit
)?

Perl test scenario

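use strict;
use warnings;

local $/ = undef;   # slurp mode: read all of DATA in one go
my $str = <DATA>;

# /m lets ^ match at every line start; /g iterates, and \G resumes at the end of the previous match
while ($str =~ /(^Fig[a-zA-Z.]*\h+|(?!^)\G)(?(?<=\d)\h*,\h*)(\d+)(?|\h*(-)\h*(\d+)|\h+(and)\h+(\d+)|\h+(to)\h+(\d+))?/mg)
{
    # Groups 3 and 4 are undefined when no '-', 'and' or 'to' form follows the number
    my ($fig, $num, $op, $num2) = ($1, $2, $3 // '', $4 // '');
    # Continuation matches have an empty group 1; the extra tab keeps the columns aligned
    print length($fig)
        ? "'$fig'\t'$num'\t'$op'\t'$num2'\n"
        : "'$fig'\t\t'$num'\t'$op'\t'$num2'\n";
}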
__DATA__
Figs. 1, 2, 3 and 4
Figures 1, 2
Figs. 1 and 2
Figure 1-3
Figure 1 to 3
Figure 1

Result >>

'Figs. ' '1' '' ''
'' '2' '' ''
'' '3' 'and' '4'
'Figures ' '1' '' ''
'' '2' '' ''
'Figs. ' '1' 'and' '2'
'Figure ' '1' '-' '3'
'Figure ' '1' 'to' '3'
'Figure ' '1' '' ''

2. Multi-line mode, whole-line match - This pattern matches an entire line, with group 1 containing the 'Fig' label and group 2 containing the complete list of numbers.

# '/^(Fig[a-zA-Z.]*\h+)((?(?<=\d)\h*,\h*|\d+(?:\h*-\h*\d+|\h+and\h+\d+|\h+to\h+\d+)?)+)\h*$/'

^
(Fig [a-zA-Z.]* \h+ ) # (1), Fig's
( # (2 start), All the num's
    (?(?<= \d ) # Conditional, if previous digit
        \h* , \h* # Require a comma 
      | # or
        \d+ # Require a digit
        (?: # (and optionally, one of the \d (-|and|to) \d forms)
            \h* - \h* \d+
          | \h+ and \h+ \d+
          | \h+ to \h+ \d+
        )?
    )+
) # (2 end)
\h*
$
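The question asks for each number to be captured separately. Once a pattern has isolated the whole number list (group 2 above), a second pass can split it up. The sketch below is one possible way to do that in Python; the names ITEM and expand_numbers are hypothetical, and treating "-" and "to" as inclusive ranges is an assumption about what the references mean.

```python
import re

# One item of the list: a number, optionally followed by "-N" or "to N".
# Items joined by commas or "and" are simply found one after another.
ITEM = re.compile(r"(\d+)(?:[ \t]*(?:-|to)[ \t]*(\d+))?")


def expand_numbers(number_list):
    """Expand a string such as '1, 2, 5-7 and 9' into a list of integers."""
    numbers = []
    for m in ITEM.finditer(number_list):
        start = int(m.group(1))
        # Without a range part, the item is a single number.
        end = int(m.group(2)) if m.group(2) else start
        # Assumption: ranges are inclusive; a reversed range such as "5-3"
        # contributes nothing.
        numbers.extend(range(start, end + 1))
    return numbers


for text in ["1, 2, 3 and 4", "1-3", "1 to 4", "1 and 2", "1"]:
    print(repr(text), "->", expand_numbers(text))
```

Splitting the work into "find the reference" and "parse the number list" keeps both expressions short, at the cost of a second pass over each match.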
