
This is the course website for the 2017/2018 academic year. See the current iteration.

Natural Language Processing

Task 05: Named Entity Recognition

Deadline

30th Apr 2018

Submission

Please email your solution (the feature template file and the report) to the lecturer. Use "NLP: Task 05" as the subject.

Description

In this assignment, you will use the NameTag named entity tagger to train a model that recognizes the "PERSON" entity category. The tool is ready to use; your job is to choose relevant features and appropriate training parameters.

Files

The assignment consists of the following files (you should not have to modify any):

File	Description
data/train	Training data.
data/test	Testing data.
compute_performance.py	A Python script that computes the performance of the model.

Goal

Your task is to train a model to recognize person names in text using the tool NameTag. You will have to:

compile and build the project (see INSTALL),
create a feature template file with suitable features and train a NER model using appropriate parameters (see the user's manual),
compute the performance of your model, and
write a report summarizing your work.

The data is in IOB encoding, which is also the expected output format. In other words, the model should learn to tag full names, including multi-word entities, rather than isolated tokens.
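To illustrate what IOB encoding means for multi-word names, here is a minimal Python sketch that groups per-token tags into entity spans. It is only an illustration and is not part of NameTag or of the assignment files. The tag names B-PERSON and I-PERSON, the example sentence, and the function iob_to_spans are assumptions; check the training data for the labels it actually uses.

```python
def iob_to_spans(tags):
    """Group IOB tags into (start, end, label) spans, with end exclusive.

    A B- tag always opens a new entity. An I- tag continues the open entity
    if the label matches; otherwise it opens a new one, so data that marks
    the first token of a name with I- is handled as well.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            # Outside any entity: close the open span, if there is one.
            if start is not None:
                spans.append((start, i, label))
            start, label = None, None
            continue
        prefix, _, tag_label = tag.partition("-")
        if prefix == "B" or start is None or tag_label != label:
            # A new entity begins here; close the previous one first.
            if start is not None:
                spans.append((start, i, label))
            start, label = i, tag_label
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


# Hypothetical sentence with assumed tag names.
tokens = ["Yesterday", "John", "Smith", "met", "Mary", "."]
tags = ["O", "B-PERSON", "I-PERSON", "O", "B-PERSON", "O"]

for start, end, label in iob_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
# PERSON John Smith
# PERSON Mary
```

"John Smith" comes out as a single two-token entity, which is what the model is expected to produce for full names.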

You do not have to write any code in this task.

Report

The report should be a short PDF document (1-3 pages) that includes:

chosen features of your model (the feature template file) along with your own description of what the feature codes mean and why they are helpful
all parameters used to train your model (the command line) along with your own description of what they are and how they affect the performance
the evaluation of your model on the test set

Use the compute_performance.py script to compute the precision, recall, and F1 score of your model. The script relies on the run_ner executable and is invoked as compute_performance MODEL TESTSET NAMETAG_RUNNER, where NAMETAG_RUNNER is the path to run_ner.

Example:

$ python compute_performance.py english.model data/test ./run_ner
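For reference when interpreting the reported numbers, the following Python sketch shows how precision, recall, and F1 are commonly computed for named entities. It assumes exact-match scoring on whole entity spans; compute_performance.py may score differently (for example per token), so treat this as an illustration of the metrics, not as the script's implementation. The function name and the example spans are hypothetical.

```python
def precision_recall_f1(gold_spans, predicted_spans):
    """Exact-match precision, recall and F1 over entity spans.

    Each span is a hashable tuple such as (start, end, label). A predicted
    span counts as correct only if an identical span is in the gold set.
    """
    gold, predicted = set(gold_spans), set(predicted_spans)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 3 gold entities, 4 predictions, 2 exact matches.
gold = [(1, 3, "PERSON"), (4, 5, "PERSON"), (7, 9, "PERSON")]
predicted = [(1, 3, "PERSON"), (4, 5, "PERSON"), (7, 8, "PERSON"), (10, 11, "PERSON")]

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# precision=0.500 recall=0.667 f1=0.571
```

Under this kind of scoring, a prediction that covers only part of a name, such as (7, 8) against the gold (7, 9), counts as both a false positive and a missed entity, so boundary errors on multi-word names lower precision and recall at once.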

Grading

Your model should achieve an F1 score of at least 0.86. The grading of the assignment is divided as follows:

10%: compilation of NameTag (if this is the only part of your solution, run the Czech model available on the project website on some input and describe your experience in the report; otherwise you may omit this documentation)
20%: suitable features in the feature template and training parameters
50%: completeness and quality of report
20%: F1 score of at least 0.86 on the test set