I am currently posting at another site (title)
Gamma: The Learned Classifier
In machine learning and statistics, classification is the problem of identifying a semantic class for an observation, on the basis of a training set of data containing observations (or instances) whose class membership is known.
a learned classifier:
Gamma is a classifier that, given a document, will give us the class that document belongs to (with an associated degree of probability to boot!). Our job is to build the function gamma, which takes a document and returns a class.
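To make the idea of gamma concrete, here is a minimal sketch of one way such a function could be built. It assumes a multinomial Naive Bayes model with add-one smoothing, which the post does not name; the training sentences, the class labels, and the helper name `train` are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def gamma(document, model):
    """Return (most likely class, probability of that class) for a document."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in class_counts:
        # Start from the log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in document.lower().split():
            if word not in vocab:
                continue  # skip words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        log_scores[label] = score
    # Turn the log scores into probabilities that sum to 1.
    max_score = max(log_scores.values())
    exp_scores = {c: math.exp(s - max_score) for c, s in log_scores.items()}
    norm = sum(exp_scores.values())
    best = max(exp_scores, key=exp_scores.get)
    return best, exp_scores[best] / norm


# Toy training set (invented): observations whose class membership is known.
training_set = [
    ("the team won the match", "sports"),
    ("a late goal won the game", "sports"),
    ("the bank raised interest rates", "finance"),
    ("markets fell as rates rose", "finance"),
]
model = train(training_set)
label, probability = gamma("the team scored a goal", model)
```

Here `gamma` returns both the class and a probability, matching the description above; any other probabilistic classifier could be swapped in behind the same signature.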
What do we mean by unstructured data?
We limit this to ASCII text, although it could also mean scanned documents, images, etc. There is no data about this data – no metadata. The only information we have about this data is contained in the data itself. There are no rows, columns or annotations.
Unstructured Data (or unstructured information) refers to information that either does not have a pre-defined data model or does not fit well into relational tables.
Unstructured information might have some structure... [More]
What's the point?
I am having a hard time here. I get triples. I get that I want to work with a collection of triples. But what are the main, important differences between a Model, DataSource, Dataset, Graph, and DataSetGraph? What are their lifecycles? Which are designed to be kept alive for a long time, and which should be transient? What triples get persisted, and when? What stays in memory? And how are triples shared (if at all) across these five things? I'll take an RTFM answer, if there's a simple summary that... [More]
The process of segmenting running text into words and sentences.
Electronic text is a linear sequence of symbols (characters or words or phrases). Naturally, before any real text processing is to be done, text needs to be segmented into linguistic units such as words, punctuation, numbers, alpha-numerics, etc. This process is called tokenization.
In English, words are often separated from each other by blanks (white space), but not all white space is equal. Both “Los Angeles” and “rock 'n' roll” are individual thoughts... [More]
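As a rough illustration of the point about white space, the sketch below tokenizes with a regular expression but first protects a list of multi-word units. The list `MULTIWORD_UNITS` and the function name `tokenize` are assumptions made for this example; a real tokenizer would need a much larger lexicon or a statistical model.

```python
import re

# Hypothetical list of multi-word units that should stay together.
MULTIWORD_UNITS = ["Los Angeles", "rock 'n' roll"]


def tokenize(text, multiword_units=MULTIWORD_UNITS):
    """Split text into words, numbers, and punctuation, keeping known units whole."""
    # Try longer units first so a shorter pattern cannot split them.
    units = sorted(multiword_units, key=len, reverse=True)
    alternatives = [re.escape(unit) for unit in units]
    alternatives.append(r"\w+(?:'\w+)?")  # words, numbers, alphanumerics, simple contractions
    alternatives.append(r"[^\w\s]")       # any single punctuation mark
    pattern = re.compile("|".join(alternatives))
    return pattern.findall(text)


tokens = tokenize("I heard rock 'n' roll in Los Angeles, didn't you?")
# tokens: ['I', 'heard', "rock 'n' roll", 'in', 'Los Angeles', ',', "didn't", 'you', '?']
```

Splitting on white space alone would have produced five separate tokens for the two multi-word units.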
Jena Schema Generator
There is a useful script that comes bundled with Jena.
On Windows, this is
On Linux this is
The Schema Generator can be used to generate a Java class file with all the Ontology classes and properties defined within the Ontology model. A complete HOWTO can be found on the Jena site.
Usage is simple:
schemagen -i MyOntology.owl -o MyJavaFile.java
All the Ontology Class types are defined as Jena Resource object instances, and all the Ontology Predicate... [More]
We don't know whether the information we find on the Web is accurate or not. The Dublin Core model describes a resource for the purpose of discovery. The W3C PROV model describes entities and processes involved in producing and delivering that resource. This article introduces the mapping between both models.
Rationale for Mapping DC Terms to PROV:
This mapping gives insight into the different characteristics of both data models (in particular it explains PROV from a Dublin Core point of view).
This mapping can be used to extract... [More]
Risk management is designed to reduce or eliminate the risk of certain kinds of events happening or having an impact on the business. Risk management is a growth area for companies. In many cases, it is no longer a best practice but a regulatory requirement. Programs are being expanded around how to manage risk. Many systems are involved, as they need to be, along with many owners in different places.
Top 5 Types of Risk:
CEOs worldwide believe regulatory and... [More]
The Mechanics and Value of an Ontology Model
An Ontology is a "specification of a conceptualization" (Tom Gruber). I still don’t understand what this means. This is a difficult definition, and has done little to further the understanding of Ontologies, and how they can help in the enterprise. A far better definition of an Ontology is “a description of things that exist and how they relate to each other” (Chris Welty). Ontologies and Natural Language Processing (NLP) can often be seen as two sides of the same coin.
Chain Rule (Probability)
Given any of these word sequences, what is the probability of the next word?
Premature optimization is the root of all ____ -Donald Knuth
A house divided against itself ____ ____ -Abraham Lincoln
The quick brown fox jumped over the ____ ____ _____ -Wm. Shakespeare
A friend to all is ____ ____ ____ ____ -Aristotle
If you were able to complete these word sequences, it was likely from prior knowledge and exposure to the complete sequence.
Not all word sequences are this obvious. But for... [More]
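The chain rule writes the probability of a whole sequence as a product of next-word probabilities: P(w1 ... wn) = P(w1) * P(w2 | w1) * ... * P(wn | w1 ... wn-1). The sketch below estimates a next-word probability from counts. It assumes a bigram approximation (each word depends only on the word before it), which is a simplification chosen for this example and not something the excerpt states; the three-sentence corpus is invented.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each preceding word."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following


def next_word_probabilities(prev_word, following):
    """Maximum-likelihood estimate of P(next | prev) from the bigram counts."""
    counts = following.get(prev_word.lower(), Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # the word was never seen with a successor
    return {word: count / total for word, count in counts.items()}


# Toy corpus (invented) standing in for prior exposure to complete sequences.
corpus = [
    "premature optimization is the root of all evil",
    "money is the root of all evil",
    "all is well",
]
following = train_bigrams(corpus)
probabilities = next_word_probabilities("all", following)
# "all" is followed by "evil" twice and "is" once in this corpus.
```

With more text the counts approximate the prior knowledge a reader draws on when completing a familiar sequence.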
Spatial Relations through Prepositions
What is a preposition?
Prepositions are function words and characteristically express spatial relations. Like any function word, prepositions are important for the structure they bring to a sentence.
A function word has little semantic content of its own and chiefly indicates a grammatical relationship. The extraction of semantics (that is, meaning) from English text is chiefly served by examination of content words. Function words serve an important role in determining the structure of the... [More]
Linking verbs are about characteristics. Linking verbs do not express actions. Instead, they connect the subject of the verb to additional information about the subject.
In contrast to intransitives, linking verbs cannot end sentences, nor can they be followed immediately by adverbs. These must be followed by either nouns or adjectives, one or the other; those nouns and adjectives may be single words or multiple-word phrases. In addition, linking verbs constitute a small class of probably no more than a few dozen... [More]
Two-Place Transitive Verbs (VC)
The second two-place transitive verb construction is similar in that the action of the verb is divided equally in two places. Constructions with VC verbs are often more subjective in nature. The ability to express the context of the action (through reification) is a must in this case.
Given the business scenarios expressed in the Vg section, the loss of one of the triples would have meant the loss of important context. But it would not have impacted the objective truth of what remained.
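Reification, mentioned above as the way to express the context of an action, can be sketched with plain tuples. The example uses the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object); the sentence "The committee elected Maria chairperson", the `ex:` names, and the helper `reify` are all hypothetical and are not taken from the post.

```python
def reify(statement_id, triple):
    """Describe a triple as a resource using the RDF reification vocabulary."""
    subject, predicate, obj = triple
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]


# Hypothetical sentence: "The committee elected Maria chairperson."
base = ("ex:committee", "ex:elected", "ex:maria")

graph = [base]
graph += reify("ex:stmt1", base)
# The context attaches to the statement itself, not to either participant.
graph.append(("ex:stmt1", "ex:inRole", "ex:chairperson"))
```

Because the role is attached to `ex:stmt1`, dropping it would leave the base triple intact but lose the context in which the action took place.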
Triple Extraction from Verbs
Verbs are basic to sentences. Verbs determine the other constituents (of the sentence), and define the relationship between these constituents. A verb can tell you that a certain noun phrase functions as a subject, or that another noun phrase functions as an object, or a predicate noun, or perhaps an object complement.
There are six types of verbs that define core sentences. These verbs are differentiated by two criteria:
The constituent that follows immediately to its right.
The relationship... [More]
Two-Place Transitive Verbs (Vg)
When we looked at transitive verbs, we saw that action was passed to a single object. We might refer to this as a “one-place” transitive verb, as in all the action passes to one place. In a two-place transitive verb, we might reasonably infer that the action passes to two places. Vg transitive verbs are followed immediately by two noun phrases. Each of these noun phrases is an equal recipient of the action performed by the subject.
The school board NP:Subj gave Vg a raise NP:DObj to the teachers... [More]