30th Apr 2018
Please submit the solution (feature file and report) by email to the lecturer's email address. Use "NLP: Task 05" as the subject.
In this assignment, you will work with the NameTag named entity tagger to train a model that recognizes the "PERSON" entity category. The tool is ready to be used, but your job is to pick relevant features and use appropriate training parameters.
The assignment consists of the following files (you should not need to modify any of them):
File | Description |
---|---|
data/train | Training data. |
data/test | Testing data. |
compute_performance.py | A Python script that computes the performance of the model. |
Your task is to train a model that recognizes person names in text using the NameTag tool. You will have to:
The data is in IOB encoding, which is also the expected output format. The model should therefore learn to tag full names as multi-word entities, not just individual tokens.
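To illustrate how IOB tags encode multi-word entities, here is a minimal Python sketch that groups a tag sequence into entity spans. It is for illustration only and is not part of the assignment files. The function name `iob_to_spans`, the example sentence, and the exact tag strings (`B-PERSON`, `I-PERSON`, `O`) are assumptions; check data/train for the labels actually used.

```python
def iob_to_spans(tags):
    """Group IOB tags into (start, end, label) spans; end is exclusive."""
    spans = []
    start, label = None, None  # currently open entity, if any
    for i, tag in enumerate(tags):
        if tag == "O":
            continues = False
        else:
            prefix, _, tag_label = tag.partition("-")
            # An I- tag continues the open entity only if the labels match.
            continues = prefix == "I" and label == tag_label
        if not continues and label is not None:
            spans.append((start, i, label))  # close the open entity
            start, label = None, None
        if tag != "O" and not continues:
            start, label = i, tag.partition("-")[2]  # open a new entity
    if label is not None:
        spans.append((start, len(tags), label))  # entity runs to the end
    return spans


# Hypothetical example sentence; the tag strings are assumed, not taken from the data.
tokens = ["Barack", "Obama", "met", "Angela", "Merkel", "."]
tags = ["B-PERSON", "I-PERSON", "O", "B-PERSON", "I-PERSON", "O"]
for start, end, label in iob_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
# prints:
# PERSON Barack Obama
# PERSON Angela Merkel
```

Each name is recovered as a single two-token entity, which is what "classifying full names" means in practice: a `B-` tag opens an entity and the following `I-` tags extend it.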
You do not have to write any code in this task.
The report should be a short PDF document (1-3 pages) in which you include:
Use the compute_performance.py script to compute the precision, recall, and F1 score of your model. The script relies on the run_ner executable and is invoked as python compute_performance.py MODEL TESTSET NAMETAG_RUNNER, where NAMETAG_RUNNER is the path to run_ner.
Example:
$ python compute_performance.py english.model data/test ./run_ner
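For reference, the three reported metrics are related as in the sketch below. This is not the code of compute_performance.py; it assumes that scoring is done over exact-match entity spans, which the actual script may or may not do, and the function name `precision_recall_f1` and the example spans are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 from two collections of entity spans.

    Assumption: a predicted span counts as correct only if it matches
    a gold span exactly (same start, same end).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up spans: three gold entities, two predictions, one exact match.
gold = [(0, 2), (3, 5), (7, 8)]
predicted = [(0, 2), (3, 4)]
p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# prints: precision=0.50 recall=0.33 f1=0.40
```

Because F1 is a harmonic mean, it stays low if either precision or recall is low, so a model that tags very few names with high accuracy will still score poorly.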
Your model should achieve an F1 score of at least 0.86. The grading of the assignment is divided as follows: