from The Rational Edge: Desquilbet applies UML diagrams to analyze the structure of a popular constructed language.


J. Desquilbet, Staff, IBM, Software Group

Jérôme Desquilbet joined Rational France eight years ago after working as a developer for Apex and Ada, using the Booch method. Now, as a software engineering consultant, he enjoys helping clients implement RUP and other IBM Rational tools in a variety of software development environments. Linguistics is one of his special interests.



05 February 2004

coi. rodo .i mi'e jexOm.
Translation: "Hi everybody, my name is Jérôme."

Lojban (pronounced "LOZH-bahn") 1 is a constructed language that Dr. James Cooke Brown began developing in 1955, and development work has continued since then, led by hundreds of workers and supporters. Lojban is intended for human communication and perhaps human-machine communication in the future. Its goals are to be culturally neutral, base its grammar on logical principles and consistent rules, use phonetic spellings to create unambiguous sounds and words, and be easy to learn. Lojban has 1,350 root words that can be combined easily to form millions of different words.

Lojban differs in structure from other languages in major ways and was developed as a test vehicle for scientists studying relationships among language, thought, and culture. This article explains how I used UML diagrams to help myself understand Lojban.

Constructing languages

Designing a new language for human communication or studying one is a fascinating activity. In addition to the well-known Esperanto, 2 there are thousands of artificial languages (also known as constructed languages, or "conlangs"). Some are "art languages," used only by their inventors or a special community. 3 Tolkien's Elvish languages 4 and Klingon, 5 spoken by the aliens of Star Trek, fall into the art language category.

Lojban (Lojban means "Logical Language" in Lojban) began as an experiment to test the Sapir-Whorf Hypothesis, or SWH. 6 (Lojban was called Loglan at that time, in 1955). The SWH has "strong" and "weak" variants. The strong variant holds that language actually shapes the way we think and determines what we can think about. Taken to its most negative extreme, this implies that the limits of the language we speak are the limits of the world we inhabit. The weaker formulation posits that the language spoken by a linguistic community has an influence on the community's culture (i.e., on what the community does and thinks). I discuss the SWH more thoroughly in the Appendix.

Today, the Logical Language Group working on Lojban has departed from this original objective and has grown Lojban as an "engineering language," a category of constructed languages. Lojban is a beautifully designed language and does not look like any spoken or written natural language. My explanations in this article may make it seem more complex than it really is, because they are very brief and not very progressive. Of course, an actual Lojban text or dialogue would use many grammatical features that I won't have space to describe here. If you are interested in Lojban, read chapter 2, "A Quick Tour of Lojban Grammar, with Diagrams," in The Complete Lojban Language (see References) to get a better feel for the language. You can also check out the documents and lessons at the www.lojban.org Web site. The Lojban community is really friendly to beginners; feel free to ask questions on the mailing lists.


Learning Lojban grammatical concepts

Lojban grammar is rather unusual and can best be explained using Lojban terms. (Note: These terms are always written as invariable nouns; in Lojban plurals are not denoted with a suffix such as "s.")

A standard Lojban sentence, which is called a bridi, expresses an idea or assertion. In English, all of the following sentences, although built from different grammatical entities, also express assertions, which can be paraphrased into relationships:

1. I am your father [to be + noun] = to-be-father-of with {father => I, child => you}
2. You are big [to be + adjective] = to-be-big with {who-is-big => you}
3. I go to Paris [active verb] = to-go with {goer => I, destination => Paris}
4. I give you this [active verb] = to-give with {donor => I, gift => this, beneficiary => you}
5. That is green [to-be + adjective] = to-be-green with {what is green => that}
6. You are a cat [to-be + article + noun] = to-be-a-cat with {what/who is a cat => you}

Lojban pronunciation
Letter: sounds like
u: /oo/ as in "look"
o: /o/ as in "show"
c: /sh/ as in "show"
g: /g/ as in "god"
s: /ss/ as in "sun" (it never sounds like /z/)
j: /j/ as in the French "bonjour," or /s/ as in "pleasure"
': /h/ as in "hello"
x: /kh/ as in the Arabic "Khaled," or /ch/ as in the Scottish "loch," or the German "Bach"

We can translate these English sentences into Lojban bridi:

  1. mi patfu do
  2. do barda
  3. mi klama la paris.
  4. mi dunda ti do
  5. ta crino
  6. do mlatu

Note: Lojban words such as patfu, barda, klama, and so forth were built algorithmically, using today's six most widely spoken languages: Chinese, Hindi, English, Russian, Spanish, and Arabic.

In Lojban, a place structure (programmers would say signature) has been defined for each relationship. The places (programmers would say arguments) in the bridi have a default order, and each place indicates the role of a word or group of words. For example, the real definition for klama (see sentence #3 above) shows five places (or arguments):

"{a goer} comes/goes to destination {a destination} from origin {an origin} via route {a route} using means/vehicle {a vehicle}"

Hence mi klama la paris. la london. ("I go to Paris from London") means something different than mi klama la london. la paris. because the order of the arguments indicates the meaning they have in the relationship.
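The signature analogy can be pictured with a short Python sketch. The PLACE_STRUCTURES table and the bind_places helper are names invented for this illustration, and the role labels are shortened from the klama definition quoted above; this is one possible way to show positional binding, not part of any Lojban tool.

```python
# Hypothetical table: each selbri maps to its ordered place labels
# (labels shortened from the definition of klama quoted in the text).
PLACE_STRUCTURES = {
    "klama": ("goer", "destination", "origin", "route", "vehicle"),
}


def bind_places(selbri, sumti):
    """Pair each sumti with the place it occupies, in default order.

    `sumti` lists x1, x2, ... in order; trailing places may be left out.
    """
    places = PLACE_STRUCTURES[selbri]
    if len(sumti) > len(places):
        raise ValueError(f"{selbri} has only {len(places)} places")
    return dict(zip(places, sumti))


# Same words, different order, different roles.
print(bind_places("klama", ["mi", "la paris.", "la london."]))
print(bind_places("klama", ["mi", "la london.", "la paris."]))
```

In the first call Paris lands in the destination place and London in the origin place; in the second call the two are swapped, which mirrors the difference between the two bridi above.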

A place in the bridi is called a sumti. The centerpiece of the bridi, called the selbri, expresses the relationship itself. So, typically, a bridi will have the form shown in Figure 1.

sumti selbri sumti sumti ...

Figure 1: Lojban bridi structure
Lojban grammar glossary
Word: Definition
bridi: predicate
sumti: place or argument
selbri: predicate relation
cmavo: structure word
gadri: article
cmene: proper name
brivla: predicate word
gismu: root word
valsi: word
lujvo: compound predicate word
tanru: phrase compound

Lojban word categories

If we look back at the six Lojban bridi in the list above, we see different kinds of words:

  • mi, do, la, ti, and ta belong to the category of small grammatical words called cmavo.
  • Among these, la is an article (gadri) announcing the name paris. (A name is called a cmene in Lojban.)
  • mi, do, ti, and ta are sumti cmavo, something like pronouns.
  • patfu, barda, klama, dunda, crino, and mlatu are all brivla—that is, words that express a relationship and carry meaning; these particular brivla are gismu, or root words.

Confused? The UML diagram in Figure 2 helped me make sense of all these elements.

Figure 2: Categories of Lojban words
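For readers who think in code rather than diagrams, the categories listed above can be approximated as a small class hierarchy. This is a sketch based only on the bullet list and the glossary: the class names, the LEXICON table, and the choice to place cmene directly under valsi are assumptions made for illustration, and the actual figure may be organized differently.

```python
class Valsi:
    """Any Lojban word."""


class Cmavo(Valsi):
    """Small grammatical (structure) word."""


class Gadri(Cmavo):
    """Article, such as la."""


class SumtiCmavo(Cmavo):
    """Pronoun-like cmavo, such as mi or do."""


class Cmene(Valsi):
    """Name, such as paris."""


class Brivla(Valsi):
    """Word that expresses a relationship."""


class Gismu(Brivla):
    """Root word."""


# Hypothetical mini-lexicon covering the six example bridi.
LEXICON = {
    "mi": SumtiCmavo, "do": SumtiCmavo, "ti": SumtiCmavo, "ta": SumtiCmavo,
    "la": Gadri,
    "paris.": Cmene,
    "patfu": Gismu, "barda": Gismu, "klama": Gismu,
    "dunda": Gismu, "crino": Gismu, "mlatu": Gismu,
}


def categories(word):
    """Return the word's category chain, most specific first."""
    cls = LEXICON[word]
    return [c.__name__ for c in cls.__mro__ if issubclass(c, Valsi)]


for word in "mi klama la paris.".split():
    print(word, categories(word))
```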

More Lojban word categories

As you can see, Lojban does not have categories such as noun, verb, adjective, and adverb. Instead, it has relationships, expressed in bridi, with one or more words that constitute the selbri at the center.

In the following bridi

  • do mamta mi ("you are a mother of me"--i.e., "you are my mother")

and

  • do patfu mi ("you are a father of me"--i.e., "you are my father")

mamta and patfu play the role of the selbri. They are different brivla. A brivla is a content word, which can be:

  • a gismu, built into the language.
  • a lujvo, derived from a combination of gismu.
  • a fu'ivla, borrowed from other languages and adapted to Lojban.

Again, we can use the UML diagram in Figure 3 to understand the category brivla.

Figure 3: Kinds of Lojban brivla


We have already used some gismu, which are formally defined like this:

  • patfu: x1 is a father of x2.
  • barda: x1 is big/large in property/dimension(s) x2 as compared with standard/norm x3.
  • klama: x1 comes/goes to destination x2 from origin x3 via route x4 using means/vehicle x5.
  • dunda: x1 (donor) gives/donates gift/present x2 to recipient/beneficiary x3 (without payment/exchange).
  • crino: x1 is green/verdant (color adjective).
  • mlatu: x1 is a cat/ (puss/pussy/kitten) of species/breed (feline animal) x2.

Here, x1, x2, and so on represent the arguments (sumti) that are accepted in the predicate (bridi) when these gismu play the role of a selbri. The arguments are optional, but if present, their order in the bridi helps us interpret the sentence.
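The optional arguments can be illustrated by treating each definition as a template. In the sketch below, the DEFINITIONS table, the gloss function, and the "(unspecified)" filler are all invented for this example; the definitions are abridged from the list above.

```python
# Definitions abridged from the gismu list, with x1..xn as template slots.
DEFINITIONS = {
    "patfu": "{x1} is a father of {x2}",
    "klama": "{x1} comes/goes to destination {x2} from origin {x3} "
             "via route {x4} using means/vehicle {x5}",
}


def gloss(selbri, *sumti, filler="(unspecified)"):
    """Fill the definition with the given sumti, in order.

    Places that receive no sumti are shown with a filler, since
    arguments are optional.
    """
    template = DEFINITIONS[selbri]
    count = template.count("{x")
    values = {f"x{i}": filler for i in range(1, count + 1)}
    for i, item in enumerate(sumti, start=1):
        values[f"x{i}"] = item
    return template.format(**values)


print(gloss("patfu", "mi", "do"))
print(gloss("klama", "mi", "la paris."))
```

The second call leaves x3, x4, and x5 unfilled, which corresponds to the bridi mi klama la paris.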

Lojban tanru

A selbri can also be a tanru, which is a metaphor, built with a set of brivla. Examples are:

  • mi sutra bajra (I am a quick runner / I run quickly / I quickly run).
  • do barda nanla (you are a big boy).
  • mi dunda patfu (I am the father-who-gives).

wherein:

  • sutra: x1 is fast/swift/quick/hasty/rapid at doing/being/bringing about x2 (event/state).
  • bajra: x1 runs on surface x2 using limbs x3 with gait x4.
  • nanla: x1 is a boy/lad (young male person) of age x2 (immature) by standard x3.

Note that the meaning of a tanru may be fuzzy.

In a tanru, the left part is called the seltau; it is a modifier for the rightmost brivla in the tanru, which is called the tertau. A tanru has the place structure of its tertau.

A tanru may be more complex, with more than two brivla. Complex tanru have a semantical "left-grouping rule" that can be overridden using the cmavo bo, which acts as a top-priority operator. For example, with the following additional vocabulary...

  • cmalu: x1 is small in property/dimension(s) x2 (ka) as compared with standard/norm x3.
  • nixli: x1 is a girl (young female person) of a general age x2 (immature) by standard x3.
  • ckule: x1 is a school/institute/academy at x2 teaching subject(s) x3 to audience x4 operated by x5.

...you can build the following complex tanru, each used as the selbri of an example bridi. All three can be rendered as "this is a small girl school," but their meanings are clearer than in the English equivalent:

  • ta cmalu nixli ckule ("left-grouping rule" semantics) "This is a small girl school."
  • ta cmalu bo nixli ckule (carries the same meaning as above):
    "This is a small-girl school—in other words, a school for small girls."
  • ta cmalu nixli bo ckule (carries a different meaning):
    "This is a small girl-school -- in other words, a small school for girls."

You can model a tanru with a variant of the UML Composite Pattern, as shown in Figure 4.

Figure 4: Lojban tanru basic structure
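The Composite idea and the grouping rules can be combined in a short Python sketch. Everything named here (Brivla, Tanru, ARITY, parse_tanru) is hypothetical and written for this illustration. The sketch assumes well-formed input, and it assumes that a chain of several bo groups from the right, a detail the discussion above does not cover.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Brivla:
    """Leaf of the composite: a single predicate word."""
    word: str
    arity: int  # number of places (x1..xn) in its definition

    def place_count(self):
        return self.arity

    def render(self):
        return self.word


@dataclass(frozen=True)
class Tanru:
    """Composite: a modifier (seltau) applied to a head (tertau)."""
    seltau: Union["Brivla", "Tanru"]
    tertau: Union["Brivla", "Tanru"]

    def place_count(self):
        # A tanru has the place structure of its tertau.
        return self.tertau.place_count()

    def render(self):
        return f"({self.seltau.render()} {self.tertau.render()})"


# Place counts taken from the definitions quoted in the text.
ARITY = {"cmalu": 3, "nixli": 3, "ckule": 5}


def parse_tanru(text):
    """Group a tanru: bo binds first, then everything groups to the left."""
    units = []          # each unit is a list of brivla joined by bo
    join_next = False
    for word in text.split():
        if word == "bo":
            join_next = True
        elif join_next:
            units[-1].append(Brivla(word, ARITY[word]))
            join_next = False
        else:
            units.append([Brivla(word, ARITY[word])])

    def fold_right(chain):
        # Assumption: repeated bo groups from the right.
        node = chain[-1]
        for left in reversed(chain[:-1]):
            node = Tanru(left, node)
        return node

    nodes = [fold_right(unit) for unit in units]
    tree = nodes[0]
    for node in nodes[1:]:
        tree = Tanru(tree, node)  # left-grouping rule
    return tree


for text in ("cmalu nixli ckule", "cmalu bo nixli ckule", "cmalu nixli bo ckule"):
    print(text, "->", parse_tanru(text).render())
```

The first two inputs produce the same tree, ((cmalu nixli) ckule), while the third produces (cmalu (nixli ckule)). In all three cases the tertau is or ends in ckule, so the tanru keeps the five places of ckule.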

Do you remember the lujvo, which is a kind of brivla? I said a lujvo is derived from a combination of gismu. The Lojban vocabulary is founded on a list of 1350 gismu, and building lujvo is the only way to extend this vocabulary. A lujvo is built by contracting a tanru and fixing its meaning (via the usage context).

Let's consider:

  • gerku: x1 is a dog/canine of species/breed x2.
  • zdani: x1 is a nest/house/lair/den for x2.

The following tanru

  • gerku zdani

means "a house that has something to do with some dog or dogs." It could mean any of the following:

  • houses occupied by dogs
  • houses shaped to look like dogs
  • dogs which are also houses (e.g., houses for fleas)
  • houses named after dogs

If you want it to mean "doghouse," you must make the tanru into a lujvo. That is, you have to combine (affix) two of the rafsi associated with the gismu in the basic dictionary (I will not describe the exact rules here).

  • the rafsi for gerku is ger
  • the rafsi for zdani is zda

To specify "doghouse," we can now build a new word from gerku zdani, and set its meaning and structure:

  • gerzda

    for which:
    x1 = x1 of zdani = nest
    x2 = x2 of zdani = inhabitant = x1 of gerku = dog

gerku zdani is now the veljvo of gerzda.
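The gerzda example can be written down as a small data structure. This sketch only concatenates the two rafsi given above; the real lujvo-building rules are more involved and, as noted, are not described in this article. The names RAFSI, make_lujvo, and GERZDA are invented for the illustration.

```python
# rafsi quoted in the text for the two gismu of the example.
RAFSI = {"gerku": "ger", "zdani": "zda"}


def make_lujvo(veljvo):
    """Naively join the rafsi of each gismu in the source tanru (veljvo).

    Simplification: the actual rules for choosing and gluing rafsi
    are not modeled here.
    """
    return "".join(RAFSI[gismu] for gismu in veljvo.split())


# The lujvo fixes its own place structure in terms of its source gismu.
GERZDA = {
    "word": make_lujvo("gerku zdani"),
    "veljvo": "gerku zdani",
    "places": {
        "x1": [("zdani", "x1")],                   # the nest
        "x2": [("zdani", "x2"), ("gerku", "x1")],  # the inhabitant, a dog
    },
}

print(GERZDA["word"])
```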

We can depict the relationship between a lujvo and a tanru (which has something to do with the rafsi of the participant gismu) as shown in the UML diagram in Figure 5.

Figure 5: A more complete tanru model

Description sumti

Now let's see how to turn a selbri position into a "description sumti." All the positions x1, x2, and so on in the previous examples were filled by pronouns (sumti cmavo), except in one example: "la paris," which has an article (or gadri): la. This article turns the cmene "paris" into a description sumti. There are other gadri to use with a gismu. Suppose I would like to say "My mother gives the green cat to the big girl." I would need something to fill the places of "give" (x1 -- the donor), "what" (x2 -- the gift), and "to whom" (x3 -- the beneficiary). The cmavo "le" automatically assumes the first position in the bridi if it is followed by a unique brivla or tanru. Combined with "se" it takes the second position, with "te" the third position, and so on. For example:

  • le dunda (the donor)
  • le se dunda (the gift)
  • le te dunda (the beneficiary)
  • le mlatu (the cat)
  • le se mlatu (the type of cat)
  • le crino mlatu (the cat that has something to do with green-ness)

So:

  • le mi mamta cu dunda le crino mlatu le barda nixli
    My mother gives the green cat to the big girl.
  • le crino mlatu cu se dunda
    The green cat is given (to someone by somebody).
    The green cat is a gift.
  • le barda nixli cu te dunda le crino mlatu
    The big girl is given the green cat.
    Somebody gives the green cat to the big girl.
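The effect of se and te in the examples above can be sketched as a reordering of places. The sketch reads se and te as swapping the first place with the second or third, which is consistent with the examples shown; the names PLACES, CONVERSIONS, converted_places, and describe are hypothetical, and only the two conversions named in the text are included.

```python
# Role labels shortened from the definition of dunda.
PLACES = {"dunda": ("donor", "gift", "beneficiary")}

# Which place each conversion cmavo exchanges with x1.
CONVERSIONS = {"se": 2, "te": 3}


def converted_places(selbri, conversion=None):
    """Return the place order after applying an optional conversion."""
    places = list(PLACES[selbri])
    if conversion is not None:
        n = CONVERSIONS[conversion] - 1
        places[0], places[n] = places[n], places[0]
    return tuple(places)


def describe(selbri, conversion=None):
    """Role picked out by 'le [conversion] selbri': the new first place."""
    return converted_places(selbri, conversion)[0]


print(describe("dunda"))        # le dunda
print(describe("dunda", "se"))  # le se dunda
print(describe("dunda", "te"))  # le te dunda

# le barda nixli cu te dunda le crino mlatu
sumti = ["le barda nixli", "le crino mlatu"]
print(dict(zip(converted_places("dunda", "te"), sumti)))
```

With te, the big girl fills the beneficiary place and the green cat stays in the gift place, matching the translation "The big girl is given the green cat."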

Note: "cu" is a cmavo used to introduce the selbri. If cu were not included in the first example above, "mamta dunda" would have to be interpreted as a tanru meaning something like "a giver which has something to do with a mother," or a "motherly giver." So, you need something to separate the end of the first sumti from the beginning of the selbri: "cu" plays this role. It is optional when the first sumti is simple, like a sumti cmavo, but is mandatory when the first sumti is more complex.

Basically, descriptors are used to turn a selbri into a sumti. If you study Lojban, you'll see how "events" are used to turn a whole bridi into a selbri.

In fact, the sentences above are object representations (instances) of the class diagram shown in Figure 6 (an enhancement of Figure 1). Note that the selbri and sumti classes have been turned into interfaces.

Figure 6: Lojban grammatical concepts


Why create UML diagrams?

Why did I create all the UML diagrams I show in this article? First, because I'm a visual learner. When I am learning, it enhances my understanding if I represent concepts and their relationships through a visual medium. And second, I created the diagrams because modeling in UML has become a reflexive activity for me as a software engineer (read more about this in the Appendix). It is my default method for analyzing and understanding the structure of a complex system -- such as a language.

Of course, my diagrams represent only a map of the concepts; they still have a lot of white spaces. But they are the beginning of a domain model, and I can continue detailing this model as I learn more concepts. The model I show in this article consists mostly of class diagrams, but of course I can also do more sophisticated modeling. This will enhance my understanding, but I also recognize that modeling has its limitations: Learning Lojban will still be challenging for me, and these diagrams may not cover all the territory. The same would be true if I were to build a related application -- perhaps a dedicated, structured editor for writing and automatically correcting Lojban texts, or a translator, or a computer-aided tutorial, for example. I might have to build entirely different models, using only small parts of my original domain model, depending on the nature of the application and the way I analyzed its use cases.

For learning Lojban or building an application based on Lojban, as for any other kind of project, a domain model is a valuable and essential artifact. However, by definition, such a model doesn't define the project.


ki'e .i co'o

As a parting gift, I'd like to leave you with a survival kit, packed with a few Lojban words and bridi, just in case you get lost in Lojbanistan during your next holidays:

  • coi (hello)
  • mi na jimpe (I don't understand)
  • mi xagji (I am hungry)
  • ma do cmene (what's your name?)
  • mi prami do (I love you [use this carefully])
  • ki'e (thank you)
  • co'o (bye)
  • ko ko kurji (take care of you) 7

Appendix: Software engineering languages and the Sapir-Whorf effect

Today, the "weak" version of the Sapir-Whorf Hypothesis -- a given language influences the culture that uses it -- is widely accepted. But I think we are still debating the question raised by the "strong" version of the Hypothesis: Does our language shape (or limit, or extend) the way we think? If language is a tool to cut our perceived reality into slices, do different languages end up with different slices -- more precise in some domains and less precise in others?

People who are fluent in several languages would almost certainly answer "yes." In certain situations, a multilingual speaker finds that the ideas he/she wants to express are easier to formulate in one language than in the others he/she knows. For example, in the previous sentence I had to use "he/she" to be inclusive. However, some languages have a third-person pronoun that doesn't distinguish between genders (e.g., in Lojban, sumti cmavo do not indicate gender or number). Personally, I believe that a language reflects both the historical background of a culture and an elaboration process that never ends. So knowing only a single language can limit our possibilities for communicating an idea, but as we bring new ideas into the culture, the language expands to accept and reflect those ideas.

Of course, as Lojban's developers discovered, it is very difficult to invent a new, culturally neutral language, teach it to people from different cultures, and then wait to see whether it produces a Whorfian effect. In a sense, that is what the promoters of UML are trying to do within the software world. However, UML's inventors and developers understand that software engineers may share a common "culture" based on common knowledge, problems, and solutions, but we do not all approach modeling the same way.

For example, to engage with some colleagues I met at a conference, I suggested modeling a simple fire alarm system for a house. One person, who was used to building software for controlling an aircraft engine unit, treated everything as a control loop, with outputs giving feedback to modify what to do with the inputs. Another person treated everything as function chains with filters. Each used a different representation and a different visual "language," which influenced the way they acted and thought.

So by extension, does the engineering language we apply limit the solution space we can explore and determine the solutions we can imagine? In the words of Eric Steven Raymond, a free software movement theorist, "Software designs are sometimes restricted in avoidable ways by mental habits a developer has picked up from a particular language or environment (perhaps a now-obsolete one) and never discarded." 8 Hence the well-known joke:

"Good FORTRAN programmers can program in FORTRAN with any programming language."

The UML is an attempt to break through the mental habits and restrictions of language and environment with visual representations that transcend them. In software engineering, the UML is today's language of choice for analysis and design; students now learn it at school. But those of us who have been in the field for some time began internalizing it as we first discovered OO programming, then OO design, and then OO analysis. And as we began creating visual representations of our design and analysis results, we also experienced a paradigm shift in how we thought about and implemented engineering practices. "Practicing" the UML gave us a positive, reflexive approach to solving problems.

Does it also prevent us from thinking about solution paths we might take to solve a problem? If so, then these exclusions should drive UML enhancements. In our "engineering culture," as opposed to a "real culture," we have more freedom to change our languages—and we can use this freedom to evolve the UML.


Acknowledgments

Catherine Southwood really helped improve the English in this article, and made many useful suggestions. Many thanks to her! .i ki'e doi. katrin. And Marlene Ellin did a great job in editing the last version of this article for The Rational Edge. Many thanks, too! .ije ki'e doi. marlen.


References

Books

John Cowan, The Complete Lojban Language. A Logical Language Group Publication, 1997. See http://www.lojban.org/publications/cll.html

Nick Nicholas and John Cowan, What is Lojban? .i la lojban. mo. A Logical Language Group Publication, 2003. See http://www.lojban.org/publications/level0.html

Robin Turner and Nick Nicholas, Lojban for beginners. http://www.opoudjis.net/lojbanbrochure/lessons/book1.html

Online articles

Eric Steven Raymond, "Tolkien's Tengwar: A romantic orthography for Lojban," at http://catb.org/~esr/tengwar/lojban-tengwar.html

"What is Lojban? (and the SWH)," at http://www.lojban.org

Lojban, UML, and the SWH can be found in the Wikipedia, at http://www.wikipedia.org

Other Web sites and articles

http://www.catb.org/~esr/writings/cathedral-bazaar/ Eric Steven Raymond's seminal essay about the open-source hacker culture.

http://www.uea.org The World Esperanto Association.

"Wanted: A World Language," by Edward Sapir, 1931: http://www.langmaker.com/sapir.htm


Notes

1 See the Lojban official Website: http://www.lojban.org

2 See The World Esperanto Association: http://www.uea.org

3 See http://www.langmaker.com about Model Languages & The Art of Language Making (Conlang)

4 See http://www.elvish.org The Elvish Linguistic Fellowship.

5 See http://www.kli.org The Klingon Language Institute.

6 Lojban and the SWH, discussions: http://www.lojban.org/files/why-lojban/swh.txt and a presentation of the SWH and compilation of links: http://www.usingenglish.com/speaking-out/linguistic-whorfare.html

7 ko ko kurji is the same as ko kurji ko (only the sumti order counts in a bridi, not their absolute place). ko is the imperative form of the Lojban word do ("you").

From the Lojban FAQ: "'ko kurji do' commands 'Take care of you(rself)' but 'ko kurji ko' commands both that 'You take care of yourself,' and 'Allow yourself to be taken care of by you,' with a resulting double emphasis that indicates an especial priority or responsibility for self-focus."


8 See Eric Steven Raymond's Jargon file extract: http://catb.org/~esr/jargon/html/W/Whorfian-mind-lock.html and Jeff Prothero's original thought: http://www.lojban.org/files/papers/4thtense

ArticleTitle=Using UML to understand Lojban