Natural Language Processing

Rule-Based Matching in spaCy

Rule-based matching is a very useful feature in spaCy. It lets you extract information from a document by describing the tokens you are looking for with a pattern or a combination of patterns.
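To make "a pattern or a combination of patterns" concrete, here is a minimal sketch. It assumes spaCy v3 and uses a blank English pipeline (tokenizer only), so no model download is needed. The label "COUNTRY" and the sample sentence are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline provides only the tokenizer, which is enough
# for patterns that test the token text.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Each dict describes one token; a list of dicts describes a token sequence.
single_token = [{"LOWER": "america"}]
two_tokens = [{"LOWER": "united"}, {"LOWER": "states"}]

# Several patterns can be registered under one label (hypothetical name).
matcher.add("COUNTRY", [single_token, two_tokens])

# Made-up sample sentence.
doc = nlp("The United States is often simply called America.")

# Each match is a (match_id, start, end) tuple; sort by start position
# so the results follow the order of the text.
matches = sorted(matcher(doc), key=lambda m: m[1])
found = [doc[start:end].text for _, start, end in matches]
print(found)
```

The first pattern matches a single token and the second matches two consecutive tokens, so both "America" and "United States" are picked up in one pass.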

I will use an Obama speech from http://obamaspeeches.com/ (saved locally as obama.txt) as an illustration. I would like to count the number of times Obama said “America” in this speech. You can use the rule-based Matcher in spaCy to parse the text and count the matches as follows:

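import spacy
from spacy.matcher import Matcher

# Load the small English pipeline
nlp = spacy.load("en_core_web_sm")

# Match every token whose text is exactly "America" (case-sensitive)
matcher = Matcher(nlp.vocab)
pattern = [{"TEXT": "America"}]
matcher.add("Obama", [pattern])

# Read the speech; the file is assumed to be UTF-8 encoded
with open("obama.txt", encoding="utf-8") as f:
    text = f.read()

doc = nlp(text)
matches = matcher(doc)  # list of (match_id, start, end) tuples
count = len(matches)
print("No of times Obama used America is", count)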
Output:
No of times Obama used America is 10
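Note that the "TEXT" attribute compares the exact token text, so "AMERICA" or "Americans" would not be counted. As a rough sketch of the difference, the example below registers a case-sensitive and a case-insensitive pattern side by side. It assumes spaCy v3, and the labels and the sample text are made up for illustration (the text is not taken from the speech).

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # tokenizer only, no model download needed
matcher = Matcher(nlp.vocab)

# Hypothetical labels: one exact-text pattern, one lowercase-based pattern.
matcher.add("AMERICA_EXACT", [[{"TEXT": "America"}]])
matcher.add("AMERICA_ANYCASE", [[{"LOWER": "america"}]])

# Made-up sample text.
doc = nlp("America is strong. AMERICA is hopeful. Americans love America.")

# Tally matches per label; match_id is a hash that the vocab
# can turn back into the label string.
counts = {}
for match_id, start, end in matcher(doc):
    label = nlp.vocab.strings[match_id]
    counts[label] = counts.get(label, 0) + 1
print(counts)
```

Here the "LOWER" pattern also counts "AMERICA", while neither pattern counts "Americans", because a pattern dict is compared against one whole token at a time.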

May 23, 2021