Goal: get simple, unambiguous, structured information by analyzing unstructured text
IE systems extract clear, factual information
WikiData link
Professor John Skvoretz, U. of South Carolina, Columbia, will present a seminar entitled "Embedded commitment", on Thursday, May 4th from 4-5:30 in PH 223D.
Place | PH 223D |
Title | Embedded commitment |
Starting time | 4 pm |
Speaker | Professor John Skvoretz |
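The slot table above can be approximated with a few hand-written patterns. The following is a minimal sketch, not a general seminar-announcement extractor: the `SLOT_PATTERNS` regular expressions and the `extract_slots` helper are hypothetical and tuned to this one sentence. It recovers the raw starting time `4`; turning that into `4 pm` would need a separate normalization step that is not shown.

```python
import re

ANNOUNCEMENT = (
    'Professor John Skvoretz, U. of South Carolina, Columbia, will present '
    'a seminar entitled "Embedded commitment", on Thursday, May 4th '
    'from 4-5:30 in PH 223D.'
)

# Hypothetical patterns, one per slot; each captures the slot value in group 1.
SLOT_PATTERNS = {
    "Speaker": re.compile(r"^(Professor(?: [A-Z][a-z]+)+)"),
    "Title": re.compile(r'entitled "([^"]+)"'),
    "Starting time": re.compile(r"\bfrom (\d{1,2}(?::\d{2})?)-"),
    "Place": re.compile(r"\bin ([A-Z]+ \d+[A-Z]?)\b"),
}


def extract_slots(text):
    """Return a dict of slot name -> first matching value found in text."""
    slots = {}
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            slots[name] = match.group(1)
    return slots


if __name__ == "__main__":
    for slot, value in extract_slots(ANNOUNCEMENT).items():
        print(f"{slot} | {value} |")
```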
Goal: find and classify names in text
The decision by the independent MP Andrew Wilkie to withdraw his support for the minority Labor government sounded dramatic but it should not further threaten its stability. When, after the 2010 election, Wilkie, Rob Oakeshott, Tony Windsor and the Greens agreed to support Labor, they gave just two guarantees: confidence and supply.
Person, Organization, Date
Foreign | ORG | |
Ministry | ORG | |
spokesman | O | |
Shen | PER | Per entity, not per token |
Guofang | PER | |
told | O | |
Reuters | ORG |
Training
Testing
Foreign | B-ORG |
Ministry | I-ORG |
spokesman | O |
Shen | B-PER |
Guofang | I-PER |
told | O |
Reuters | B-ORG |
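The B-/I- prefixes make entity boundaries recoverable from per-token labels. A minimal sketch of the conversion in both directions, assuming entities are given as token spans; the function names `to_bio` and `from_bio` are illustrative, not from any particular toolkit:

```python
def to_bio(tokens, entities):
    """Encode (start, end_exclusive, label) token spans as BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def from_bio(tokens, tags):
    """Decode BIO tags back into (entity text, label) pairs."""
    entities = []
    current = None  # (start index, label) of the entity being read, if any
    # A trailing "O" sentinel closes an entity that ends at the last token.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = (
            current is not None
            and tag.startswith("I-")
            and tag[2:] == current[1]
        )
        if current is not None and not continues:
            start, label = current
            entities.append((" ".join(tokens[start:i]), label))
            current = None
        if tag.startswith("B-"):
            current = (i, tag[2:])
    return entities


if __name__ == "__main__":
    tokens = "Foreign Ministry spokesman Shen Guofang told Reuters".split()
    tags = to_bio(tokens, [(0, 2, "ORG"), (3, 5, "PER"), (6, 7, "ORG")])
    print(tags)
    print(from_bio(tokens, tags))
```

In this sketch an `I-` tag with no open entity is silently ignored; a real decoder would have to decide whether to repair or reject such sequences.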
Beginning, Inside, Last, Other (Outside), Unit
simple representations that encode attributes: length, capitalization, numerals, Greek letters, internal punctuation, etc.
Varicella-zoster | Xx-xxx |
mRNA | xXXX |
CPA1 | XXXd |
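A word-shape feature of this kind can be sketched as a character-class mapping. The `word_shape` function below is a hypothetical, simplified variant: it maps uppercase letters to `X`, lowercase to `x`, digits to `d`, and keeps punctuation as-is. It reproduces the short shapes in the table exactly. For long words, the table's shape appears to shorten runs by some rule that is not stated here, so the optional `collapse` flag only shows the crudest version (every run reduced to one character) and yields a shorter shape for "Varicella-zoster" than the one listed.

```python
import re


def word_shape(word, collapse=False):
    """Map each character to a class symbol; optionally collapse repeats."""
    symbols = []
    for ch in word:
        if ch.isupper():
            symbols.append("X")
        elif ch.islower():
            symbols.append("x")
        elif ch.isdigit():
            symbols.append("d")
        else:
            symbols.append(ch)  # keep punctuation such as "-" unchanged
    shape = "".join(symbols)
    if collapse:
        # Reduce every run of the same symbol to a single symbol.
        shape = re.sub(r"(.)\1+", r"\1", shape)
    return shape


if __name__ == "__main__":
    print(word_shape("mRNA"))                             # xXXX
    print(word_shape("CPA1"))                             # XXXd
    print(word_shape("Varicella-zoster", collapse=True))  # Xx-x
```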
Goal: get simple, unambiguous, structured information out of text
Information extraction in triples
Ľ. Štúr Institute of Linguistics of the Slovak Academy of Sciences is a primary institution in the Slovak Republic that focuses on basic research of standard and non-standard variants of the Slovak language. The institute was established in 1943 and named the Institute of Linguistics of the Slovak Academy of Sciences and Arts (Slovak shortening SAVU), ... The research focuses also on the theoretical questions of general linguistics, language culture, professional terminology and onomastics.
Ľ. Štúr Institute… | PART-OF | Slovak Academy of Sciences |
Ľ. Štúr Institute… | LOC-IN | Slovak Republic |
Ľ. Štúr Institute… | FOUNDED-IN | 1943 |
Ľ. Štúr Institute… | EQ | Institute of Linguistics of… |
Slovak Academy of Sciences and Arts | ABBR | SAVU |
Ľ. Štúr Institute… | RSRCH-IN | general linguistics |
Ľ. Štúr Institute… | RSRCH-IN | language culture |
Ľ. Štúr Institute… | RSRCH-IN | professional terminology |
Ľ. Štúr Institute… | RSRCH-IN | onomastics |
UMLS: Unified Medical Language System
Injury | disrupts | Physiological Function |
Bodily Location | location-of | Biologic Function |
Anatomical Structure | part-of | Organism |
Pharmacologic Substance | causes | Pathologic Function |
Pharmacologic Substance | treats | Pathologic Function |
Doppler echocardiography can be used to diagnose left anterior descending artery stenosis in patients with type 2 diabetes.
Echocardiography, Doppler | diagnoses | Acquired stenosis |
Goal: get simple structured information out of text
Why?
How?
Formal conceptualization of entities and relations between them
PREFIX ex: <http://example.com/exampleOntology#>
SELECT ?capital ?country
WHERE {
  ?x ex:cityname ?capital ;
     ex:isCapitalOf ?y .
  ?y ex:countryname ?country ;
     ex:isInContinent ex:Africa .
}
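To make the meaning of the query concrete, the same graph pattern can be evaluated by hand over a small set of triples. This is a sketch in plain Python, not a SPARQL engine: the `TRIPLES` data and the node names `ex:city1`, `ex:country1`, and so on are made up for illustration, and only the four predicates from the query are taken from it.

```python
# Illustrative data only; the node identifiers are hypothetical.
TRIPLES = {
    ("ex:city1", "ex:cityname", "Nairobi"),
    ("ex:city1", "ex:isCapitalOf", "ex:country1"),
    ("ex:country1", "ex:countryname", "Kenya"),
    ("ex:country1", "ex:isInContinent", "ex:Africa"),
    ("ex:city2", "ex:cityname", "Bratislava"),
    ("ex:city2", "ex:isCapitalOf", "ex:country2"),
    ("ex:country2", "ex:countryname", "Slovakia"),
    ("ex:country2", "ex:isInContinent", "ex:Europe"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in TRIPLES."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def african_capitals():
    """Bind ?x and ?y as in the query and return (?capital, ?country) rows."""
    rows = []
    for x, predicate, y in TRIPLES:
        if predicate != "ex:isCapitalOf":
            continue
        if "ex:Africa" not in objects(y, "ex:isInContinent"):
            continue
        for capital in objects(x, "ex:cityname"):
            for country in objects(y, "ex:countryname"):
                rows.append((capital, country))
    return sorted(rows)


if __name__ == "__main__":
    print(african_capitals())
```

Each `?x ... ; ... .` group in the query corresponds to several triples sharing one subject, which is why the sketch looks up two predicates per bound node.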
Linked data
Agar is a substance prepared from a mixture of red algae, such as Gelidium, for laboratory or industrial use.
Y such as X ((, X)* (, and|or) X) |
such Y as X |
X or other Y |
X and other Y |
Y including X |
Y, especially X |
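The first pattern in the list can be tried on the agar sentence with a regular expression. This is a rough sketch under loud assumptions: real systems match noun phrases found by a chunker, whereas here the hypernym Y is assumed to be one or two lowercase words and each hyponym X a single capitalized word. `SUCH_AS` and `hearst_such_as` are illustrative names.

```python
import re

# Assumption: Y = one or two lowercase words, X = capitalized single words.
SUCH_AS = re.compile(
    r"\b((?:[a-z]+ )?[a-z]+),? such as "
    r"([A-Z][a-z]+(?:, [A-Z][a-z]+)*(?:,? (?:and|or) [A-Z][a-z]+)?)"
)


def hearst_such_as(sentence):
    """Return (hyponym, hypernym) pairs matched by 'Y such as X, X and X'."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        # Split the X list on ", ", " and ", " or ", ", and ", ", or ".
        for hyponym in re.split(r",? (?:and|or) |, ", match.group(2)):
            pairs.append((hyponym, hypernym))
    return pairs


if __name__ == "__main__":
    sentence = (
        "Agar is a substance prepared from a mixture of red algae, "
        "such as Gelidium, for laboratory or industrial use."
    )
    print(hearst_such_as(sentence))
```

The capitalization assumption is what stops the match before ", for laboratory"; without a proper noun-phrase detector, the list part of the pattern would otherwise swallow the following words.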
Richer relations using named entities
Problem
American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said.
For each relation, do
Start with 5 seed pairs
Isaac Asimov | The Robots of Dawn |
David Brin | Startide Rising |
James Gleick | Chaos: Making a New Science |
Charles Dickens | Great Expectations |
William Shakespeare | The Comedy of Errors |
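Starting from seed pairs like these, a bootstrapping loop alternates between learning contexts from known pairs and applying those contexts to find new pairs. The sketch below is heavily simplified: the `CORPUS` snippets are hypothetical, each snippet is assumed to consist of a title, a connecting string, and an author and nothing else, and only the connecting string is used as the pattern. Systems of this kind typically also use the text before and after the pair.

```python
SEEDS = {
    ("Isaac Asimov", "The Robots of Dawn"),
    ("David Brin", "Startide Rising"),
    ("James Gleick", "Chaos: Making a New Science"),
    ("Charles Dickens", "Great Expectations"),
    ("William Shakespeare", "The Comedy of Errors"),
}

# Hypothetical text snippets standing in for web pages.
CORPUS = [
    "The Robots of Dawn, written by Isaac Asimov",
    "Great Expectations, written by Charles Dickens",
    "Oliver Twist, written by Charles Dickens",
    "Hamlet, written by William Shakespeare",
    "Startide Rising is on sale this week",
]


def learn_middles(pairs, corpus):
    """Collect the text between title and author for every known pair."""
    middles = set()
    for author, title in pairs:
        for line in corpus:
            if line.startswith(title) and line.endswith(author):
                middle = line[len(title):len(line) - len(author)]
                if middle.strip():
                    middles.add(middle)
    return middles


def apply_middles(middles, corpus):
    """Use each learned middle string to split lines into (author, title)."""
    found = set()
    for middle in middles:
        for line in corpus:
            title, separator, author = line.partition(middle)
            if separator and title and author:
                found.add((author, title))
    return found


def bootstrap(seeds, corpus, rounds=2):
    """Alternate pattern learning and pattern application."""
    pairs = set(seeds)
    for _ in range(rounds):
        pairs |= apply_middles(learn_middles(pairs, corpus), corpus)
    return pairs


if __name__ == "__main__":
    for author, title in sorted(bootstrap(SEEDS, CORPUS) - SEEDS):
        print(f"{author} | {title} |")
```

Nothing in this sketch guards against a bad pattern pulling in wrong pairs, which then generate more bad patterns; that drift is the usual reason patterns get scored before they are trusted.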
Organization | Headquarters |
---|---|
Microsoft | Redmond |
Exxon | Irving |
IBM | Armonk |
.69 ORGANIZATION {'s, in, headquarters} LOCATION
.75 LOCATION {in, based} ORGANIZATION
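The numbers in front of the two patterns read as confidence scores. One way such a score could be computed, assumed here rather than taken from the text, is the fraction of a pattern's matches that agree with the known table: a match counts as positive if it gives the known headquarters for an organization, negative if it contradicts it, and is ignored if the organization is unknown. The `matches` list in the example is hypothetical.

```python
# Known (organization -> headquarters) pairs from the table above.
KNOWN = {"Microsoft": "Redmond", "Exxon": "Irving", "IBM": "Armonk"}


def pattern_confidence(matches, known=KNOWN):
    """Assumed score: positive / (positive + negative) over known orgs."""
    positive = 0
    negative = 0
    for organization, location in matches:
        if organization not in known:
            continue  # nothing to compare against, so no evidence either way
        if known[organization] == location:
            positive += 1
        else:
            negative += 1
    total = positive + negative
    return positive / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical output of one pattern over some corpus.
    matches = [
        ("Microsoft", "Redmond"),
        ("IBM", "Armonk"),
        ("Exxon", "Houston"),
        ("Intel", "Santa Clara"),
    ]
    print(round(pattern_confidence(matches), 2))
```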
Extract relations from the web; no training data, no list of relations
Tesla | invented | coil transformer |
A lot of work to do…
…but definitely worth it