Natural Language Processing
Projects
Overview
To turn in a successful project, you are required to:
- pick an interesting NLP / computational linguistics problem of your choice
- pick a solution and implement it
- evaluate your solution using an appropriate corpus
- write a report describing your solution
You may work individually or in a team of up to four people.
The expected size of the project is about the size of a programming assignment plus the report.
Deadlines
- 21st May: project definition
- 3 days before the end of the term period (29th May or 26th June): project submission
Report
The report is a 5-15 page document in the form of a scientific paper. It is the most important factor in your grade. The document should contain:
- title page with the name of the author(s)
- abstract (1-2 paragraphs) summarizing the project aim, methods, and results
- introduction: description of the task, the source of the corpus (dataset), and a corpus description (train/test set sizes, whether it is pure text or has attributes, description of the attributes)
- description of the methods used and why you chose them
- description of technical implementation details: which programming language and libraries were used and why, which problems you ran into and how you solved them
- experimental evaluation: how you measured the performance of your solution, which parameter values were used in your model and how they compare with other values, comparisons with existing solutions on the same or similar data, and tables and plots with their interpretation and findings
- conclusion: whether the project was successful and why, and (hypothetical) future work
The report does not have to be overly complex or long. Try to describe each of the points concisely; length alone will not be considered during grading.
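As an illustration of what the experimental evaluation part might report for a classification-style task (for example, genre annotation), the following minimal sketch computes accuracy and per-label precision, recall, and F1 in plain Python. The function names, the label names, and the tiny gold/predicted lists are hypothetical and chosen only for this example. Other tasks may need different metrics.

```python
def accuracy(gold, predicted):
    """Fraction of items whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall, and F1 with `positive` treated as the target label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def macro_f1(gold, predicted):
    """Unweighted mean of per-label F1 over all labels seen in either list."""
    labels = sorted(set(gold) | set(predicted))
    scores = [precision_recall_f1(gold, predicted, label)[2] for label in labels]
    return sum(scores) / len(scores)


# Hypothetical test-set labels, for illustration only.
gold = ["news", "fiction", "news", "poetry", "fiction", "news"]
predicted = ["news", "news", "news", "poetry", "fiction", "fiction"]

print(f"accuracy: {accuracy(gold, predicted):.3f}")
for label in sorted(set(gold)):
    p, r, f = precision_recall_f1(gold, predicted, label)
    print(f"{label}: P={p:.3f} R={r:.3f} F1={f:.3f}")
print(f"macro F1: {macro_f1(gold, predicted):.3f}")
```

Reporting per-label scores next to a single aggregate such as macro F1 makes it easier to see which classes the model struggles with, which in turn supports the interpretation of tables and plots.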
Grading
- appropriate choice of problem: 20%
- appropriate choice of methods used: 30%
- quality of dataset, implementation, and report: 50%
Submission
Please submit the solution (implementation, report, and data, or a way to acquire the data) by email to the lecturer's email address. Use "NLP: Project" as the subject.
Notes
- the corpus may be in any natural language
- the report may be written in English or Slovak
- you may use existing libraries (for ML / NLP algorithms etc.) in your implementation, but you have to describe which parts you used, which parameters you set, and what role they play in the solution
- if unsure about what problem to pick, contact the lecturer
- interesting datasets can be found at https://github.com/niderhoff/nlp-datasets
Example Problems
- automated annotation of documents (e.g. genre, domain, date of writing, age of author)
- named entity recognition / relation extraction in interesting domains, such as the movie industry
- POS tagging / parsing for the Slovak language
- using information extraction to build knowledge graphs
- using word sense similarity to remove redundancy in knowledge graphs