Gamma: The Learned Classifier
In machine learning and statistics, classification is the problem of identifying a semantic class for an observation, on the basis of a training set of data containing observations (or instances) whose class membership is known.
A learned classifier:
Gamma is a classifier that, given a document, will give us the class that document belongs to (with an associated degree of probability, to boot!). Our job is to build the function gamma, which takes a document and returns a class.
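As a minimal sketch of what such a function could look like, the code below builds gamma from a tiny labeled training set. The excerpt does not say which learning method is used; multinomial naive Bayes with add-one smoothing is an assumption made here for illustration, and the name train_gamma and the four training documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_gamma(training_set):
    """Build gamma from (document, class) pairs.

    Assumption: multinomial naive Bayes with add-one smoothing.
    The source text does not name a learning method.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for document, label in training_set:
        class_counts[label] += 1
        for word in document.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    total_docs = sum(class_counts.values())

    def gamma(document):
        """Return (most likely class, probability of that class)."""
        words = [w for w in document.lower().split() if w in vocabulary]
        log_scores = {}
        for label, doc_count in class_counts.items():
            total_words = sum(word_counts[label].values())
            score = math.log(doc_count / total_docs)  # class prior
            for word in words:
                # Add-one smoothing keeps unseen (word, class) pairs non-zero.
                numerator = word_counts[label][word] + 1
                score += math.log(numerator / (total_words + len(vocabulary)))
            log_scores[label] = score
        # Normalize the log scores into probabilities that sum to 1.
        peak = max(log_scores.values())
        weights = {lbl: math.exp(s - peak) for lbl, s in log_scores.items()}
        best = max(weights, key=weights.get)
        return best, weights[best] / sum(weights.values())

    return gamma


# Hypothetical training set, for illustration only.
training_set = [
    ("the striker scored a goal", "sports"),
    ("the team won the match", "sports"),
    ("the senate passed the bill", "politics"),
    ("the president signed the law", "politics"),
]
gamma = train_gamma(training_set)
label, probability = gamma("the team scored")
print(label, round(probability, 3))
```

Returning the probability alongside the class mirrors the description above: gamma takes a document and returns a class, with a degree of confidence attached.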
What do we mean by unstructured data?
We limit this to ASCII text. It could also mean scanned documents, images, etc. There is no data about this data, that is, no metadata. The only information we have about this data is contained in the data itself. There are no rows, columns or annotations.
Unstructured Data (or unstructured information) refers to information that does not have a pre-defined data model, does not fit well into relational tables, or both.
Unstructured information might have some structure... [More]
We don't know whether the information we find on the Web is accurate or not. The Dublin Core model describes a resource for the purpose of discovery. The W3C PROV model describes entities and processes involved in producing and delivering that resource. This article introduces the mapping between both models.
Rationale for Mapping DC Terms to PROV:
This mapping gives insight into the different characteristics of both data models (in particular it explains PROV from a Dublin Core point of view).
This mapping can be used to extract... [More]
The Mechanics and Value of an Ontology Model
An Ontology is a "specification of a conceptualization" (Tom Gruber). I still don’t understand what this means. This is a difficult definition, and it has done little to further the understanding of Ontologies and how they can help in the enterprise. A far better definition of an Ontology is “a description of things that exist and how they relate to each other” (Chris Welty). Ontologies and Natural Language Processing (NLP) can often be seen as two sides of the same coin.
Chain Rule (Probability)
Given any of these word sequences, what is the probability of the next word?
Premature optimization is the root of all ____ -Donald Knuth
A house divided against itself ____ ____ -Abraham Lincoln
The quick brown fox jumped over the ____ ____ _____ -Wm. Shakespeare
A friend to all is ____ ____ ____ ____ -Aristotle
If you were able to complete these word sequences, it was likely from prior knowledge and exposure to the complete sequence.
Not all word sequences are this obvious. But for... [More]
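The chain rule decomposes the probability of a whole word sequence into a product of next-word probabilities, each conditioned on the words that came before. The sketch below estimates those probabilities from counts. It conditions each word only on the single previous word (a bigram approximation), which is an assumption made here to keep the example small. The three-sentence corpus and the function names are hypothetical.

```python
from collections import Counter

START = "<s>"  # hypothetical sentence-start marker


def train_bigrams(sentences):
    """Count how often each word follows each preceding word."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = [START] + sentence.lower().split()
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts


def next_word_probability(prev, word, context_counts, bigram_counts):
    """Estimate P(word | prev) as count(prev, word) / count(prev)."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]


def sequence_probability(sentence, context_counts, bigram_counts):
    """Chain rule: multiply the conditional probability of each word."""
    tokens = [START] + sentence.lower().split()
    probability = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        probability *= next_word_probability(
            prev, cur, context_counts, bigram_counts
        )
    return probability


# Hypothetical toy corpus, for illustration only.
corpus = ["the dog barked", "the dog slept", "the mayor spoke"]
context_counts, bigram_counts = train_bigrams(corpus)
print(next_word_probability("the", "dog", context_counts, bigram_counts))
print(sequence_probability("the dog barked", context_counts, bigram_counts))
```

A sequence containing a word pair never seen in the corpus gets probability zero under this estimate, which is why practical models add smoothing.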
Spatial Relations through Prepositions
What is a preposition?
Prepositions are function words and characteristically express spatial relations. Like any function word, prepositions are important for the structure they bring to a sentence.
A function word has little semantic content of its own and chiefly indicates a grammatical relationship. The extraction of semantics (that is, meaning) from English text is chiefly served by examination of content words. Function words serve an important role in determining the structure of the... [More]
Linking verbs are about characteristics. Linking verbs do not express actions. Instead, they connect the subject of the verb to additional information about the subject.
In contrast to intransitives, linking verbs cannot end sentences, nor can they be followed immediately by adverbs. They must be followed by either nouns or adjectives, one or the other; those nouns and adjectives may be single words or multiple-word phrases. In addition, linking verbs constitute a small class of probably no more than a few dozen... [More]
Two-Place Transitive Verbs (VC)
The second two-place transitive verb construction is similar in that the action of the verb is divided equally in two places. Constructions with VC verbs are often more subjective in nature. The ability to express the context of the action (through reification) is a must in this case.
Given the business scenarios expressed in the Vg section, the loss of one of the triples would have meant the loss of important context. But it would not have impacted the objective truth of what remained.
Triple Extraction from Verbs
Verbs are basic to sentences. Verbs determine the other constituents (of the sentence), and define the relationship between these constituents. A verb can tell you that a certain noun phrase functions as a subject, or that another noun phrase functions as an object, or a predicate noun, or perhaps an object complement.
There are six types of verbs that define core sentences. These verbs are differentiated by two criteria:
The constituent that follows immediately to its right.
The relationship... [More]
Two-Place Transitive Verbs (Vg)
When we looked at transitive verbs, we saw that action was passed to a single object. We might refer to this as a “one-place” transitive verb, as in, all the action passes to one place. In a two-place transitive verb, we might reasonably infer that the action passes to two places. Vg transitive verbs are followed immediately by two noun phrases. Each of these noun phrases is an equal recipient of the action performed by the subject.
The school board NP:Subj gave Vg a raise NP:DObj to the teachers... [More]
The sentence may be a truncated three words or include a lengthy description, but the action never moves beyond the subject. To use the old riddle, there is no one else in the sentence to hear if the lone tree falling in the forest makes a sound.
An intransitive verb can be used to end the sentence.
The mayor spoke.
The president resigned.
The dog barked.
Intransitive verbs may be followed by an adverb (something that adds to a verb):
The mayor spoke convincingly.
A transitive verb demonstrates an action that passes from one entity to another. We might say that the first entity (the subject) commits the action on the second entity (the object). The action is what links the subject to the object.
This connection between a subject and an object is considered a triple. The action passes over from the subject to the object. Transitive verbs lend themselves well to triple extraction from unstructured text.
The performer of the action is not always the subject of the verb.... [More]
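The subject-action-object triple described above can be illustrated with a deliberately naive sketch. It assumes a small hand-written list of transitive verbs and a simple active-voice sentence. Both the verb list and the function name extract_triple are hypothetical, and a real extractor would need a parser to handle passives and other cases where the performer is not the subject.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical verb list; a real system would rely on a lexicon or a parser.
TRANSITIVE_VERBS = {"chased", "signed", "built"}


def extract_triple(sentence, verbs=TRANSITIVE_VERBS) -> Optional[Triple]:
    """Split a simple active-voice sentence around its transitive verb.

    Everything before the verb is treated as the subject and everything
    after it as the object. Returns None when no known transitive verb
    has words on both sides of it.
    """
    tokens = sentence.rstrip(".!? ").split()
    for index, token in enumerate(tokens):
        if token.lower() in verbs and 0 < index < len(tokens) - 1:
            subject = " ".join(tokens[:index])
            obj = " ".join(tokens[index + 1:])
            return Triple(subject, token.lower(), obj)
    return None


print(extract_triple("The dog chased the cat."))
print(extract_triple("The dog barked."))
```

The second call returns None because "barked" is not in the transitive list: with no object to receive the action, there is no triple to extract.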