A great deal, if not all, of Natural Language Processing these days is done using some kind of Finite State Machine (FSM). The idea is genuinely simple: as the name implies, the system has a fixed number of possible states, and its current state is determined by the states that came before it. In that sense the machine can be said to record information about the past, i.e., its state reflects the input it has seen from system start to the present moment.
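To make that concrete, here is a deliberately tiny FSM sketch in Python (my own toy illustration, not anything from the literature): a transition table maps (state, input symbol) to the next state, and the final state is a summary of everything the machine has seen.

```python
# A minimal finite state machine: a transition table mapping
# (current state, input symbol) -> next state. This toy example
# tracks whether a part-of-speech stream has formed a simple clause.
TRANSITIONS = {
    ("start", "NOUN"): "subject_seen",
    ("subject_seen", "VERB"): "verb_seen",
    ("verb_seen", "NOUN"): "complete",
}

def run_fsm(tokens, state="start"):
    """Advance through the table; pairs not in the table keep the state."""
    for tok in tokens:
        state = TRANSITIONS.get((state, tok), state)
    return state

print(run_fsm(["NOUN", "VERB", "NOUN"]))  # -> complete
```

Note that at every step the machine consults only its current state and the incoming symbol; the entire history is compressed into that one state label.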
Noam Chomsky claimed in Syntactic Structures (1957) that a human "grammar" (now referred to as CHL, for Computation of Human Language; Chomsky 1995) could not be an FSM. This view is, of course, quite challenging to the thousands of men and women who work daily to write NLP software on top of an underlying FSM. The crux of the argument is this:
Principle: For a computational device to be Markovian, it can only make reference to the current state that the device is in, when deciding what the next state of the device can be; it cannot, for example, make reference to alternative states, earlier states, future states, or, as a consequence of its being a formal system, factors outside of the computational device. (cf. Mark Baltin, NYU http://linguistics.as.nyu.edu/docs/IO/2637/KASELL-paper.pdf)
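The restriction in that principle is easy to show in code. In the sketch below (my illustration, not Baltin's), the first function is Markovian: its decision depends only on the current state and the incoming symbol. The second is exactly what the principle forbids: it inspects earlier states in the history before deciding.

```python
def markovian_next(state, symbol):
    """Allowed for an FSM: next state is a function of (state, symbol) only."""
    table = {("S0", "a"): "S1", ("S1", "b"): "S0"}
    return table.get((state, symbol), state)

def non_markovian_next(history, symbol):
    """Forbidden for an FSM: the decision reaches back into earlier states."""
    # On "undo", jump to the state we occupied two steps ago --
    # a reference to an earlier state, which no FSM may make.
    if symbol == "undo" and len(history) >= 2:
        return history[-2]
    return markovian_next(history[-1], symbol)
```

The non-Markovian version needs the whole `history` list as an argument; a true FSM has no such argument to give.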
There are many very technical, empirical arguments against the FSM as the underlying mechanism for CHL, but let's not worry about those here (of course, anyone who wishes to engage in a discussion of these matters should know that a great deal of thought has gone into them; it is better to be informed of this than not). If we look at the principle with a simple eye, it says that for us to create a novel sentence in our language (which we do all the time), our brain cannot "look at" anything beyond the state it is currently in. That seems a bit limiting. It is easy to come up with a million sentences that draw on universal ideas, on meanings derived from the lexicon, and even on what happens next.
I have had a revelation lately: it is not wise to debate this point in my field. But the reason it is not wise is NOT that it is contentious and will cause people to dislike me (even though, hey, it was Chomsky's idea, not mine). There is a better reason: it doesn't matter. The principles behind Chomsky's argument are valid for the performance of NLP software, regardless of what machine these systems use!
Principle: The issue of how much information is available to the grammar, viewed as a computational device that computes structures, is called the issue of computational complexity. The computational powers of various grammars, and the capacity of recognition devices to characterize as licit or not the structures that they generate, has been the province of mathematical linguistics (cf. Mark Baltin, NYU http://linguistics.as.nyu.edu/docs/IO/2637/KASELL-paper.pdf)
In the midst of all that fine verbiage is the simple notion that the computational power of your NLP system is a direct function of the amount of information you can provide to it, and, likewise, the performance of your NLP system is directly limited by the amount of information you provide to it.
The revelation, then, is as follows: if we are going to build non-Markovian grammars, because we need references to things beyond the immediately preceding state, then we will have to find out how to do this with extreme efficiency, since that extra information increases the computational complexity (and therefore reduces the performance) exponentially, in the mathematical sense of the word. I think we can all agree on that.
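To put a rough number on "exponentially": if an FSM must remember the last k tokens exactly, it needs one state per possible k-gram, so over a vocabulary of V symbols the state count grows as V to the power k. The figures below are my own back-of-the-envelope arithmetic, with an assumed vocabulary size, just to show the shape of the growth.

```python
V = 10_000  # assumed vocabulary size, for illustration only
for k in range(1, 4):
    # an FSM remembering the last k tokens needs one state per k-gram
    print(f"k={k}: {V ** k:,} states")
```

Even two tokens of exact memory already puts the state count in the hundreds of millions, which is why efficiency is not optional here.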
But we cannot avoid the deficiency of a system that allows no external reference: it will simply never come close to human language. Even FSM advocates must agree on that.
But it doesn't mean we're in trouble. It just means we will have to be clever. It may be possible to leave the "grammar" itself, i.e. the decision mechanism, as an FSM, requiring the lowest computational power, but to combine it with other FSMs, also efficient, that derive other states relevant to the grammar's. The end result, then, could be a federation of FSM outputs, itself consumed by yet another FSM.
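One way to picture that federation (a sketch of the idea under my own assumptions, not an established architecture) is to run several cheap FSMs in parallel over the same token stream and feed the tuple of their current states, as a single composite symbol, into the "grammar" FSM. Each component stays strictly Markovian, yet the grammar's input now carries more information than any one machine's state.

```python
def step(table, state, symbol):
    """One Markovian transition: next state from (state, symbol) only."""
    return table.get((state, symbol), state)

# Two independent component FSMs watching the same token stream.
CASE = {("no_subj", "NOUN"): "subj"}    # has a subject appeared yet?
TENSE = {("no_verb", "VERB"): "verb"}   # has a verb appeared yet?

# The "grammar" FSM consumes the *pair* of component states as its symbol.
GRAMMAR = {("incomplete", ("subj", "verb")): "clause"}

def federated_run(tokens):
    case, tense, grammar = "no_subj", "no_verb", "incomplete"
    for tok in tokens:
        case = step(CASE, case, tok)
        tense = step(TENSE, tense, tok)
        # the combined state (case, tense) is just another input symbol here
        grammar = step(GRAMMAR, grammar, (case, tense))
    return grammar

print(federated_run(["NOUN", "VERB"]))  # -> clause
```

Nothing in `federated_run` ever looks at an earlier state; the extra information reaches the grammar only through the composite symbol, so every machine in the federation keeps its low Markovian cost.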