For the last couple of years I’ve been wrapping my brain around the question of knowledge representation and the decision-making process within the… uh, brain. I’m not fully there yet, but I am close to the conclusion that the basic principles can be modelled using simple probability theory, applied in a hierarchical manner. The deeper the hierarchy, the more intelligent the being, in general.
Conversely, looking at all the research around modelling neurons directly in software to emulate brain-like behavior: maybe all that goo in our cranial cavity is actually nature’s way of building something that can do mathematical operations, albeit massively in parallel? Do we really need to model the brain dendrite by dendrite, axon by axon, synapse by synapse to get the same basic functionality in software?
Recently I have been researching and writing code for sentence segmentation, a rather well-beaten topic in the research community, particularly for segmenting Chinese sentences into words (Chinese is usually written without any spaces between the words/characters). Since I only wanted to create a generic algorithm (and I don’t understand Chinese), Ijusttakearegularenglishsentenceandremoveallthespacesbetweenthewordsasmytestinput. (I just take a regular English sentence and remove all the spaces between the words as my test input.)
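Producing that test input is a one-liner; a minimal sketch (the function name is mine, not from any library):

```python
def make_test_input(sentence):
    """Strip the spaces from an English sentence to mimic unsegmented text."""
    return sentence.replace(" ", "").lower()

print(make_test_input("I saw him"))  # -> isawhim
```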
The algorithm then figures out where exactly to put the spaces such that the output will make sense… or rather, the “most” sense. For example, consider a fragment of an input, “isawhim”. This could be “i saw him”, or it could be “is a whim”. If you knew absolutely nothing else about the context in which this was said, which would you choose? Probably the first one. But if you knew that this fragment was preceded by an “it”, as in “itisawhim”, then it’s obvious that the second choice is the better one, because “it i saw him” doesn’t “make sense”. Consider the case of “comealong”. If you knew it was followed by “now”, then it would be “come along”, but if it was followed by “way”, it would be “come a long”. Usually it’s not just the immediately preceding or following words that provide the evidence, but words much further back and ahead.
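The “isawhim” vs. “itisawhim” effect can be sketched with a tiny bigram model scored by Viterbi-style dynamic programming. This is one standard way to make context flip the best split — not necessarily the exact method described here — and all the probabilities below are invented purely for illustration:

```python
import math

# Toy conditional probabilities P(word | previous word); "<s>" marks the
# start of the fragment. These numbers are made up for the demo.
BIGRAM = {
    ("<s>", "i"): 0.2, ("i", "saw"): 0.1, ("saw", "him"): 0.2,
    ("<s>", "is"): 0.05, ("is", "a"): 0.2, ("a", "whim"): 0.05,
    ("<s>", "it"): 0.2, ("it", "is"): 0.3, ("it", "i"): 0.01,
}
WORDS = {"i", "saw", "him", "is", "a", "whim", "it"}
FLOOR = 1e-6  # probability assigned to unseen word pairs

def logp(prev, w):
    if w not in WORDS:
        return math.log(FLOOR) * len(w)  # punish non-words heavily
    return math.log(BIGRAM.get((prev, w), FLOOR))

def segment(text):
    """Best segmentation by total log-probability, tracking the last word."""
    # states[i] maps "last word ending at position i" -> (score, word list)
    states = {0: {"<s>": (0.0, [])}}
    for i in range(1, len(text) + 1):
        states[i] = {}
        for j in range(max(0, i - 8), i):  # words up to 8 characters
            w = text[j:i]
            for prev, (score, words) in states[j].items():
                s = score + logp(prev, w)
                if w not in states[i] or s > states[i][w][0]:
                    states[i][w] = (s, words + [w])
    return max(states[len(text)].values())[1]

print(segment("isawhim"))    # -> ['i', 'saw', 'him']
print(segment("itisawhim"))  # -> ['it', 'is', 'a', 'whim']
```

With a plain unigram model both fragments would get the same split for “i saw him”, so the preceding “it” can only change the answer once word-to-word context enters the score — which is the whole point of the example above.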
In layman’s terms, the algorithm works by maintaining a running measure of “goodness” across a streaming input of continuous characters without spaces. It doesn’t need to know a start or an end. At every point it evaluates the thousands of possible valid combinations that could be formed over a large number of characters, and eliminates those which would cause goodness to drop in the future. The goal is not to move in the direction of short-term maximum goodness, but rather long-term lack of badness. In other words, it will do somewhat bad things to get a good future, short of doing something suicidal.
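One concrete way to realize this streaming idea (a sketch of the general technique, not the author’s code) is a beam search: feed characters in one at a time, keep several live hypotheses with their running goodness, drop any hypothesis whose unfinished suffix can no longer grow into a word, and prune the rest to the best few. The vocabulary and probabilities here are again invented:

```python
import math

# Toy vocabulary with made-up unigram probabilities.
VOCAB = {"come": 0.05, "a": 0.1, "long": 0.02, "along": 0.03,
         "way": 0.04, "now": 0.04}

def is_prefix(s):
    """Can this suffix still grow into some vocabulary word?"""
    return any(w.startswith(s) for w in VOCAB)

def stream_segment(chars, beam_width=8):
    """Segment a character stream with a beam of running hypotheses.

    Each hypothesis is (goodness, completed words, unfinished suffix).
    """
    beam = [(0.0, [], "")]
    for ch in chars:
        grown = []
        for score, words, suffix in beam:
            suffix += ch
            # Keep the hypothesis alive if the suffix may still become a word.
            if is_prefix(suffix):
                grown.append((score, words, suffix))
            # Branch: the suffix just completed a word.
            if suffix in VOCAB:
                grown.append((score + math.log(VOCAB[suffix]),
                              words + [suffix], ""))
        # Prune hypotheses whose running goodness has fallen behind.
        grown.sort(key=lambda h: h[0], reverse=True)
        beam = grown[:beam_width]
    finished = [h for h in beam if h[2] == ""] or beam
    return max(finished, key=lambda h: h[0])[1]

print(stream_segment("comealongnow"))  # -> ['come', 'along', 'now']
```

The prefix check is what lets the beam discard doomed hypotheses early rather than chase short-term goodness, and the beam width bounds how many of the “thousands of possible valid combinations” are kept alive at any moment.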
The “long term” is the key here. In time-based terms, some activity happens at the millisecond level (like interpreting the sounds in sentences we hear as words), some at the level of fractions of a second (applying meaning to the words as we interpret the sentence), and some at the level of seconds (the meaning of a sentence, the emotion within a sentence). The important thing to note is that activity at higher levels affects past activity at lower levels. For example, someone starts saying a sentence in a nice tone, “Jack is a…”, and the words “great”, “nice”, etc. start flashing in our minds; but then the tone changes, and it goes “…complete jerk”, with sarcasm. At this point the nice image of Jack dissolves, and the sentence is reinterpreted in a different way.
Taking this up to higher levels, it’s not difficult to see how the task of going from point A to point B allows us to tolerate the nastiness of a bad journey, because the pleasure of arriving is much greater than the problems encountered along the way. Going up another level, point B might not actually be a particularly enjoyable city, but we accept it because it might lead to a future career advancement.