This is the course website for the 2017/2018 academic year. See the current iteration.

Natural Language Processing

Task 05: Named Entity Recognition

Deadline

30th Apr 2018

Submission

Please submit your solution (feature file and report) by email to the lecturer. Use "NLP: Task 05" as the subject.

Description

In this assignment, you will work with the NameTag named entity tagger to train a model that recognizes the "PERSON" entity category. The tool is ready to be used, but your job is to pick relevant features and use appropriate training parameters.

Files

The assignment consists of the following files (you should not have to modify any):

File                     Description
data/train               Training data.
data/test                Testing data.
compute_performance.py   A Python script that computes the performance of the model.

Goal

Your task is to train a model that recognizes person names in text using the NameTag tool. You will have to:

  1. compile and build the project (see INSTALL),
  2. create a feature template file with suitable features and train a NER model using appropriate parameters (see the user's manual),
  3. compute the performance of your model, and
  4. write a report summarizing your work.

The data is in IOB encoding, which is also the expected output format; that is, the model should learn to label full names, including multi-word entities.
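To illustrate what IOB encoding means for multi-word names, here is a minimal Python sketch that groups a sequence of IOB tags into entity spans. The tag names B-PER and I-PER and the helper name iob_to_spans are assumptions made for this illustration; check the files in data/ for the exact labels used in this assignment. Writing such code is not part of the task.

```python
def iob_to_spans(tags):
    """Group IOB tags into (start, end, label) spans; end is exclusive.

    Assumes tags look like "O", "B-PER", "I-PER" (hypothetical label names).
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open span ends on "O", on any "B-" tag, or on an "I-" tag
        # whose label differs from the label of the open span.
        ends_span = (
            tag == "O"
            or tag.startswith("B-")
            or (tag.startswith("I-") and tag[2:] != label)
        )
        if ends_span and label is not None:
            spans.append((start, i, label))
            start, label = None, None
        # A new span starts on "B-", or on an "I-" tag with no open span.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


# Tokens: "Yesterday John Smith met Mary"
example_tags = ["O", "B-PER", "I-PER", "O", "B-PER"]
print(iob_to_spans(example_tags))
```

In this example, "John Smith" becomes a single span covering tokens 1 and 2, which is what classifying a full name means, and "Mary" becomes a separate one-token span.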

You do not have to write any code in this task.

Report

The report should be a short PDF document (1-3 pages) where you shall include:

Use the compute_performance.py script to compute the precision, recall, and F1 score of your model. The script relies on the run_ner executable and is invoked as compute_performance.py MODEL TESTSET NAMETAG_RUNNER.

Example:

$ python compute_performance.py english.model data/test ./run_ner
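For reference, the three reported metrics can be sketched as follows. This is not the code of compute_performance.py; it is a minimal illustration that assumes entity-level scoring, where a predicted entity counts as correct only if its boundaries and label exactly match a gold entity. The function name and the span representation (start, end, label) are hypothetical.

```python
def precision_recall_f1(gold_spans, predicted_spans):
    """Compute precision, recall, and F1 over sets of entity spans.

    Each span is a (start, end, label) tuple; a prediction is a true
    positive only if the identical tuple appears in the gold set.
    """
    gold = set(gold_spans)
    predicted = set(predicted_spans)
    true_positives = len(gold & predicted)

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Two gold entities; the model finds one of them and adds one wrong span.
gold = [(1, 3, "PER"), (4, 5, "PER")]
predicted = [(1, 3, "PER"), (7, 8, "PER")]
print(precision_recall_f1(gold, predicted))  # (0.5, 0.5, 0.5)
```

F1 is the harmonic mean of precision and recall, so a model cannot reach a high score by favoring one of them at the expense of the other.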

Grading

Your model should achieve an F1 score of at least 0.86. The grading of the assignment is divided as follows:

Download files