Speaking the Java language without an accent

Native fluency for nonnative programmers

Elliotte Rusty Harold explores the native idioms, dialects, and accents of the Java™ language and community. By following this article's guidance, C/C++ and other nonnative programmers can blend right in with native Java speakers.

Elliotte Rusty Harold, Software Engineer, Cafe au Lait

Elliotte Rusty Harold first learned the Java language in 1995. Prior to that, his first language was Fortran and his second was Applesoft Basic. C was probably his third language, and fourth may have been Microphone II. Fifth was Pascal, though he never did much with that. Sixth was probably IDL (Interactive Data Language). Seventh was perhaps Perl? Java was probably his eighth language, and the one he's taken farther than any other. However, since then he's continued to learn new languages, including PHP, AppleScript (or did that come before Java?), XSLT, XQuery, C++, and most recently Haskell.



12 January 2010


Learning a new programming language is easier than learning a new spoken language. But, in both endeavors, it takes extra effort to learn to speak the new language without an accent. It isn't that hard to learn the Java language when you already know C or C++; it's similar to learning Danish when you already speak Swedish. The languages are different, but mutually comprehensible. However, if you aren't careful, your accent will reveal you as a nonnative speaker every time.

C++ programmers often put certain inflections on Java code that unmistakably mark them as converts rather than native speakers. Their code still works, but it sounds wrong to a native ear. The natives may look down on the nonnative speakers as a result. When moving from C or C++ (or Basic or Fortran or Scheme or anything else) to the Java language, you need to eradicate certain idioms and correct some pronunciations so that you speak in a fluent manner.

In this article, I explore a number of Java programming details that often get overlooked precisely because semantically they don't matter all that much, if at all. They are purely issues of style and convention. A few of them have plausible justifications. Some of them lack even that. But all of them are real phenomena in Java code as it's written today.

What language is this?

Let's begin with a bit of code to convert temperatures in Fahrenheit to temperatures in Celsius, shown in Listing 1:

Listing 1. A bit of C code?
float F, C;
float min_tmp, max_tmp, x;

min_tmp = 0;  
max_tmp = 300;
x  = 20;

F = min_tmp;
while (F <= max_tmp) { 
   C = 5 * (F-32) / 9;
   printf("%f\t%f\n", F, C);
   F = F + x;
}

What language is used in Listing 1? Pretty obviously it's C — but wait a minute. Take a look at the full program in Listing 2:

Listing 2. A Java program
class Test  {

    public static void main(String argv[]) {  
        float F, C;
        float min_tmp, max_tmp, x;
      
        min_tmp = 0;  
        max_tmp = 300;
        x  = 20;
      
        F = min_tmp;
        while (F <= max_tmp) { 
          C = 5 * (F-32) / 9;
          printf("%f\t%f\n", F, C);
          F = F + x;
        } 
    }

    private static void printf(String format, Object... args) {
        System.out.printf(format, args);
    }
    
}

Believe it or not, Listings 1 and 2 are both written in the Java language. They're just Java code written in a C idiom (in fairness, Listing 1 could also be real C code). It's very funny-looking Java code, though. A number of idioms here mark it as the work of someone who thinks in C and is merely translating into the Java language:

  • Variables are floats instead of doubles.
  • All variables are declared at the top of the method.
  • Initialization follows declaration.
  • A while loop is used instead of a for loop.
  • printf is used instead of println.
  • The argument to the main() method is named argv.
  • The array brackets come after the argument name instead of after the type.

None of these idioms is wrong in the sense of producing code that won't compile or that yields an incorrect answer. Individually, none of these points is significant. However, taken together they add up to some very strange code that's as hard for a Java programmer to read as Geordie is for an American to understand. The fewer such C idioms you use, the clearer your code will be. With that in mind I'll analyze some of the most frequent ways that C programmers reveal themselves and show how they can make their code more pleasing to a Java eye.
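For comparison, here is a sketch of how a native speaker might write the same temperature table as Listing 2. The class name TemperatureTable and the constant names are illustrative choices, not part of the original listings; the formula and the 0 to 300 range in steps of 20 are taken from Listing 2.

```java
// Sketch: Listing 2 rewritten in idiomatic Java. Class and constant names
// are illustrative; the range, step, and formula come from the original.
class TemperatureTable {

    private static final double LOWEST_FAHRENHEIT = 0;
    private static final double HIGHEST_FAHRENHEIT = 300;
    private static final double STEP = 20;

    public static void main(String[] args) {
        // doubles, not floats; the loop variable is declared, initialized,
        // and scoped in one place by a for loop instead of a while loop.
        for (double fahrenheit = LOWEST_FAHRENHEIT;
                fahrenheit <= HIGHEST_FAHRENHEIT;
                fahrenheit += STEP) {
            double celsius = 5 * (fahrenheit - 32) / 9;
            // println with concatenation instead of printf
            System.out.println(fahrenheit + "\t" + celsius);
        }
    }
}
```

Every item in the list above is addressed: doubles, declarations at the point of use, combined declaration and initialization, a for loop, println, and String[] args.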


Naming conventions

Depending on whether you're coming from C and C++ or C#, you may have internalized different naming conventions. In C#, for example, method and property names conventionally begin with uppercase letters, as in Console.WriteLine(); in Java, method and field names begin with lowercase letters, as in System.out.println(). I can't justify one convention or the other for any rational reason, but I do know that mixing up naming conventions makes code feel horribly wrong. It also leads to bugs. When you know that every name in all caps is a constant, you treat it differently. I've found many bugs in programs simply by looking for places where the naming conventions did not match the declared type.

args, not argv

This point is one of the most trivial, but such are the minutiae over which style wars are fought. In Java argot, the argument of the main() method is named args, not argv:

public static void main(String[] args)

This is, at best, only an incredibly minor improvement over the name argv. It's marginally more obvious as an abbreviation of arguments. Of course, abbreviations are usually forbidden in idiomatic Java code (see Don't abbreviate). The only reason we use args as the name of the argument to the main() method is the same reason C programmers use argv: it's what the authors of the founding book used. Kernighan and Ritchie, who wrote the first C book, used argv; Arnold and Gosling, in The Java Programming Language, use args. There's no other reason than that. Still, all native-speaking Java programmers prefer args, and if you want to speak without an accent, you will too.

The basic rules for names in Java programming are quite simple and worth memorizing:

  • Class and interface names begin with a capital letter, as in Frame.
  • Method, field, and local variable names begin with a lowercase letter, as in read().
  • Class, method, and field names all use camel casing, as in InputStream and readFully().
  • Constants (static final fields and occasionally final local variables) are written in all uppercase with underscores separating the words, as in MAX_CONNECTIONS.
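Here is a small sketch that applies each of these rules in one place. Apart from MAX_CONNECTIONS, which comes from the list above, every name here (Pool, ConnectionPool, and their members) is hypothetical and invented for illustration; none of it comes from a real library.

```java
// All names are illustrative; they don't come from any real library.

// Interface and class names start with a capital letter.
interface Pool {
    boolean hasRoomForAnotherConnection();
}

class ConnectionPool implements Pool {

    // Constant: static final, all uppercase, words separated by underscores.
    static final int MAX_CONNECTIONS = 10;

    // Field: lowercase first letter, camel case, no abbreviations.
    private int openConnections = 0;

    // Methods: lowercase first letter, camel case.
    @Override
    public boolean hasRoomForAnotherConnection() {
        return openConnections < MAX_CONNECTIONS;
    }

    public void recordOpenedConnection() {
        // Local variable: lowercase first letter, camel case.
        int updatedCount = openConnections + 1;
        openConnections = updatedCount;
    }
}
```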

Don't abbreviate

Names like sprintf and nmtkns are relics of a time when supercomputers had 32 KB of memory. Compilers saved memory by limiting identifiers to 8 characters or fewer. However, this really hasn't been an issue for more than 30 years. Today, there's no excuse for not fully spelling out variable and method names. Nothing marks a program as the product of a converted C hacker more obviously than unintelligible, vowelless variable names, as in Listing 3:

Listing 3. Abbrvtd nms r hrd 2 rd
for (int i = 0; i < nr; i++) {
    for (int j = 0; j < nc; j++) {
        t[i][j] = s[i][j];
    }
}

Unabbreviated names in camel case are far more legible, as you can see in Listing 4:

Listing 4. Unabbreviated names are easy to read
for (int row = 0; row < numRows; row++) {
    for (int column = 0; column < numColumns; column++) {
        target[row][column] = source[row][column];
    }
}

Code is read more often than it's written, and the Java language is optimized for reading. C programmers have an almost macho attraction to obfuscated code; Java programmers don't. The Java language prioritizes legibility over conciseness.

A few abbreviations are so common that you can use them without guilt:

  • max for maximum
  • min for minimum
  • in for InputStream
  • out for OutputStream
  • e or ex for an exception in a catch clause (though not anywhere else)
  • num for number, though only when used as a prefix, as in numTokens or numHits
  • tmp for a temporary variable used very locally — for instance, when swapping two values

Other than these, and perhaps a few others, you should fully spell out all words used in names.


Variable declaration, initialization, and (re)use

Early versions of C required all variables to be declared at the start of a block, which in practice usually meant the top of the function. This enabled certain optimizations in the compiler that let it run in environments that were quite penurious with RAM. Thus, C functions tend to begin with several lines of variable declarations:

int i, j, k;
double x, y, z;
float cf[], gh[], jk[];

However, this style has a number of negative effects. It separates the declaration of the variable from its use, making the code a little harder to follow. Furthermore, it makes it far more likely that one local variable will be reused for several different things, possibly unintentionally. This can introduce unexpected bugs when a variable holds a leftover value that one piece of the code wasn't expecting. Combine this with C's penchant for short, cryptic variable names, and you have a recipe for disaster.

In the Java language (and more recent versions of C), variables can be declared at or near the point of first use. Do this when you write Java code. It makes your code safer, less bug prone, and easier to read.

On a related note, Java code usually initializes each variable when and where it is declared. C programmers sometimes write code like this:

int i;
i = 7;

Java programmers almost never write code like that, even though it's syntactically correct. They write it like so:

int i = 7;

This helps avoid bugs that result from unintentional use of uninitialized variables. The only common exception is when a single variable needs to be scoped in both a try and a catch or finally block. This most often arises when the code deals with input streams and output streams that need to be closed in the finally block, as shown in Listing 5:

Listing 5. Exception handling can make it hard to scope variables properly
InputStream in = null;
try {
  in = new FileInputStream("data.txt");
  // read from InputStream
}
finally {
  if (in != null) {
    in.close();
  }
}

However, that's almost the only time this happens.
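Since Java 7, which appeared after this article was written, the try-with-resources statement removes even this exception in most cases. A variable declared in the resource specification is initialized where it is declared, is scoped to the try block, and is closed automatically however the block exits. A minimal sketch, in which countBytes is a hypothetical method standing in for the "read from InputStream" comment in Listing 5:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class ResourceScoping {

    // Hypothetical helper standing in for "read from InputStream" in Listing 5.
    static int countBytes(String fileName) throws IOException {
        // 'in' is declared and initialized in one statement, scoped to the
        // try block, and closed automatically on every path out of it.
        // No null placeholder and no finally block are needed.
        try (InputStream in = new FileInputStream(fileName)) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        }
    }
}
```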

A final knock-on effect of this style is that Java programmers usually declare only one variable per line. For example, they initialize three variables like so:

int i = 3;
int j = 8;
int k = 9;

They tend not to write code like this:

int i=3, j=8, k=9;

This statement is syntactically correct, but full-time Java programmers usually don't do that, except in one special case I'll get to below.

An old-style C programmer might even write the code in four lines:

int i, j, k;
i = 3;
j = 8;
k = 9;

Thus, the usual Java style is actually a little more concise at only three lines because it combines declaration and initialization.

Push variables inside loops

One special case that shows up frequently is declaring variables outside of loops. For example, consider the simple for loop in Listing 6, which calculates the first 20 terms of the Fibonacci sequence:

Listing 6. C programmers like to declare variables outside of loops
int high = 1;
int low = 0;
int tmp;
int i;
for (i = 0; i < 20; i++) {
  System.out.println(high);
  tmp = high;
  high = high + low;
  low = tmp;
}

All four variables are declared outside the loop and therefore have excessive scope even though they're only used inside the loop. This is bug-prone because variables can get reused outside of the scope in which they were intended to be used. This is especially true for variables with common names such as i and tmp. Values from one use can linger on and interfere with later code in unexpected ways.

The first improvement (which is also supported by modern versions of C) is to move the declaration of the i loop variable inside the loop, as shown in Listing 7:

Listing 7. Move loop variables into the loop
int high = 1;
int low = 0;
int tmp;
for (int i = 0; i < 20; i++) {
  System.out.println(high);
  tmp = high;
  high = high + low;
  low = tmp;
}

However, don't stop there. Experienced Java programmers will also move the tmp variable inside the loop, as in Listing 8:

Listing 8. Declare temporary variables inside of loops
int high = 1;
int low = 0;
for (int i = 0; i < 20; i++) {
  System.out.println(high);
  int tmp = high;
  high = high + low;
  low = tmp;
}

Undergraduates with an excessive fetish for speed sometimes object that this will slow down code by doing more work than is necessary inside the loop. However, at run time a declaration does absolutely no work at all. There's no performance penalty whatsoever on the Java platform for moving a declaration inside a loop.

Many programmers, including many experienced Java programmers, will stop here. However, there is a little-used technique that moves all the variables inside the loop. You can actually declare more than one variable in the initialization phase of a for loop simply by separating them with commas, as shown in Listing 9:

Listing 9. All variables inside the loop
for (int i = 0, high = 1, low = 0; i < 20; i++) {
  System.out.println(high);
  int tmp = high;
  high = high + low;
  low = tmp;
}

This has now moved beyond merely idiomatic fluent code into truly expert code. This ability to limit the scope of local variables tightly is a big reason why you see a lot more for loops and many fewer while loops in Java code than in C code.

Don't recycle variables

A corollary of the above is that Java programmers rarely reuse local variables for different values and objects. For example, Listing 10 sets up a few buttons with associated action listeners:

Listing 10. Recycling local variables
Button b = new Button("Play");
b.addActionListener(new PlayAction());
b = new Button("Pause");
b.addActionListener(new PauseAction());
b = new Button("Rewind");
b.addActionListener(new RewindAction());
b = new Button("FastForward");
b.addActionListener(new FastForwardAction());
b = new Button("Stop");
b.addActionListener(new StopAction());

Experienced Java programmers rewrite this with five different local variables, as shown in Listing 11:

Listing 11. Non-recycled variables
Button play = new Button("Play");
play.addActionListener(new PlayAction());
Button pause = new Button("Pause");
pause.addActionListener(new PauseAction());
Button rewind = new Button("Rewind");
rewind.addActionListener(new RewindAction());
Button fastForward = new Button("FastForward");
fastForward.addActionListener(new FastForwardAction());
Button stop = new Button("Stop");
stop.addActionListener(new StopAction());

Reusing one local variable for several logically different values or objects is bug-prone. Essentially, local variables (though not always the objects they point to) are free from both memory and time concerns. Don't be afraid to use as many different local variables as you need.

Trust the garbage collector to manage memory

Programmers coming from the C++ world often worry excessively about memory consumption and memory leaks. There are two common symptoms of this. One is setting variables to null when you're done with them. The other is overriding finalize() and treating it as a sort of pseudo-destructor, even though there's no guarantee when, or even whether, finalize() will run. Neither is usually necessary. Although there are times when you really do need to clear references manually in Java code so that the garbage collector can reclaim the objects they point to, they are few and far between. Most of the time you can simply rely on the garbage collector to do something sensible and reasonably fast. As with most optimizations, the best rule of thumb is: don't do them unless, and until, you can prove they're necessary.
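One of those rare cases, discussed in Joshua Bloch's Effective Java, is a class that manages its own storage. In the array-backed stack sketched below (a simplified illustration, not a library class; real code would use java.util.ArrayDeque), a popped slot would otherwise keep its object reachable for as long as the stack lives, so pop() clears it. Note that this nulls out an array element the garbage collector can't know is dead, not a local variable that is about to go out of scope anyway.

```java
import java.util.Arrays;
import java.util.EmptyStackException;

// Simplified illustration of a class that manages its own storage.
class SimpleStack<E> {

    private Object[] elements = new Object[16];
    private int size = 0;

    void push(E element) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, 2 * size);
        }
        elements[size] = element;
        size++;
    }

    @SuppressWarnings("unchecked")
    E pop() {
        if (size == 0) {
            throw new EmptyStackException();
        }
        size--;
        E result = (E) elements[size];
        // The array still references the popped object, and the garbage
        // collector can't tell this slot is no longer in use. Clear it.
        elements[size] = null;
        return result;
    }

    boolean isEmpty() {
        return size == 0;
    }

    // Exposed only so the example can check that a slot was cleared.
    boolean slotIsCleared(int index) {
        return elements[index] == null;
    }
}
```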

Use the preferred primitive datatypes

The Java language has eight primitive datatypes, but only six of them are in common use. In Java code, floats are far less common than in C code. You almost never see float variables or literals in Java code; doubles are much preferred. Floats show up mainly in the manipulation of large multidimensional arrays of limited-precision floating-point numbers, where the storage space would be significant. Otherwise, just make everything a double.
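Two details help explain the preference. An unsuffixed literal such as 1.5 is already a double, so float ratio = 1.5; doesn't even compile without an f suffix. And a float has only a 24-bit significand, roughly 7 decimal digits, so it can't hold every int exactly. A small sketch of the precision point; the class, method, and constant names are illustrative:

```java
class PrecisionDemo {

    // 2^24 + 1: the smallest positive int that a float can't represent exactly.
    static final int JUST_PAST_FLOAT_PRECISION = 16_777_217;

    static long roundTripThroughFloat(int value) {
        float narrow = value;   // a legal widening conversion that can lose precision
        return (long) narrow;
    }

    static long roundTripThroughDouble(int value) {
        double wide = value;    // a double's 53-bit significand holds any int exactly
        return (long) wide;
    }

    public static void main(String[] args) {
        System.out.println(roundTripThroughFloat(JUST_PAST_FLOAT_PRECISION));   // prints 16777216
        System.out.println(roundTripThroughDouble(JUST_PAST_FLOAT_PRECISION));  // prints 16777217
    }
}
```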

Even less common than floats are shorts. Rarely have I seen a short variable in Java code. The only time it ever shows up (and this is extremely rare, I warn you) is when reading externally defined data formats that happen to include a 16-bit signed integer type. In that situation, most programmers just read the value into an int.
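For instance, java.io.DataInputStream.readShort() reads a big-endian 16-bit signed value, and its result widens straight into an int with the sign intact. A minimal sketch with a made-up two-byte input; the class and method names are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class SixteenBitReader {

    // Reads one big-endian 16-bit signed value and keeps it in an int.
    // No short variable is declared: readShort()'s result widens to int.
    // (Closing a stream over a byte array has no effect, so it isn't closed.)
    static int readSigned16(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int value = in.readShort();
        return value;
    }
}
```

With the bytes 0xFF 0xFE, readSigned16 returns -2.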


Scoping privacy

Have you ever seen an equals() method like the example in Listing 12?

Listing 12. An equals() method written by a C++ programmer
public class Foo {

  private double x;

  public double getX() {
    return this.x;
  }

  public boolean equals(Object o) {
    if (o instanceof Foo) {
      Foo f = (Foo) o;
      return this.x == f.getX();
    }
    return false;
  }

}

This method is technically correct, but I can virtually guarantee you that this class was written by an unreconstructed C++ programmer. The giveaway is the use of the private field x and the public getter method getX() in the same method and indeed in the same line. The habit rests on the belief that privacy is scoped to the object rather than to the class: that one object can't see another's private members, even when both belong to the same class, and so must go through accessor methods. That belief is mistaken even in C++, and in the Java language privacy is unambiguously scoped to the class rather than the object. Two objects each of type Foo can directly access each other's private fields.

Some subtle (and more often than not irrelevant) considerations suggest that you might prefer direct field access to getter access or vice versa within Java code. Field access may occasionally be marginally faster. Getter access may sometimes return a slightly different value from direct field access, especially when a subclass overrides the getter. However, in the Java language, there is never any excuse for using both direct field access and getter access for the same field of the same class in the same line.
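Here is a sketch of the same class as a native speaker might write it, reading the other object's private field directly. Two additions go beyond Listing 12 and are assumptions of this sketch: a constructor, so the class can be exercised, and a hashCode() override, which the Object.hashCode() contract calls for whenever equals() is overridden. Comparing with Double.compare() rather than == is a deliberate design choice that keeps equals() consistent with Double.hashCode() for the edge cases NaN and -0.0.

```java
class Foo {

    private final double x;

    Foo(double x) {  // constructor added for this example
        this.x = x;
    }

    public double getX() {
        return this.x;
    }

    @Override
    public boolean equals(Object o) {
        if (o instanceof Foo) {
            Foo other = (Foo) o;
            // Privacy is scoped to the class, so other.x is accessible here.
            return Double.compare(this.x, other.x) == 0;
        }
        return false;
    }

    @Override
    public int hashCode() {
        return Double.hashCode(x);  // agrees with Double.compare() equality
    }
}
```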


Punctuation and syntax idioms

Here are some Java idioms that diverge from their C counterparts, in some cases to take advantage of certain Java language features.

Place array brackets on the type

The Java language declares arrays much as one does in C:

int k[];
double temperatures[];
String names[];

However, the Java language also offers an alternate syntax in which the array brackets are placed after the type instead of after the variable name:

int[] k;
double[] temperatures;
String[] names;

Most Java programmers have adopted the second style. This says that k has type array of ints, temperatures has type array of doubles, and names has type array of Strings.
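The type-first style also avoids a genuine trap when a single declaration introduces several variables: with the brackets on the type, every variable declared is an array, but with the brackets on a name, only that name is. A short sketch (the class and method names are illustrative), which is one more reason to declare one variable per line:

```java
class ArrayDeclarations {

    static String describe() {
        // Brackets on the type: both a and b have type int[].
        int[] a = new int[3], b = new int[5];
        // C-style brackets on the name: c is an int[], but d is a plain int.
        int c[] = new int[2], d = 7;
        return a.length + " " + b.length + " " + c.length + " " + d;
    }
}
```

Here describe() returns "3 5 2 7".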

Also, as with other local variables, Java programmers tend to initialize the array at the point at which it is declared:

int[] k = new int[10];
double[] temperatures = new double[75];
String[] names = new String[32];

Use s == null, not null == s

Careful C programmers have learned to put literals on the left-hand side of comparison operators. For example:

if (7 == x) doSomething();

The goal here is to avoid accidentally using the single-equals assignment operator instead of the double-equals comparison operator:

if (7 = x) doSomething();

Putting the literal on the left-hand side makes this a compile-time error. This technique is sound programming practice in C. It helps prevent real bugs, because with the literal on the right-hand side, the mistyped if (x = 7) compiles, assigns 7 to x, and always evaluates as true.

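However, the Java language, unlike C, has separate int and boolean types, and the condition of an if statement must be a boolean. An assignment expression has the type of the variable being assigned (here int), whereas a comparison has type boolean. Consequently, if (x = 7) is already a compile-time error, so there's no reason to use the unnatural form if (7 == x) for comparison statements, and fluent Java programmers don't. The same holds for the null test: if s is a String, if (s = null) doesn't compile either, so write s == null. (The one loophole is a boolean variable: if (done = true) compiles and always succeeds. The fluent fix there is not to compare against a literal at all; just write if (done).)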

Concatenate strings instead of formatting them

For many years the Java language did not have a printf() method. PrintStream.printf() was finally added in Java 5, and it has some occasional uses. In particular, format strings are a convenient domain-specific language for the rare cases when you want to format numbers to particular widths or with a certain number of places after the decimal point. However, C programmers tend to overuse printf() in their Java code. It should generally not be used merely as a replacement for simple string concatenation. For example:

System.out.println("There were " + numErrors + " errors reported.");

is preferable to:

System.out.printf("There were %d errors reported.\n", numErrors);

The variant that uses string concatenation is easier to read, especially for simple cases, and less bug-prone because there's no danger of a mismatch between the placeholders in the format string and the number or type of the variable arguments.
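A placeholder mismatch isn't caught by the compiler; it surfaces only when the line runs. This sketch contrasts concatenation with a format string whose %d placeholder receives a double, and also shows the kind of case where a format string does earn its keep. The class and method names are illustrative; Locale.ROOT is passed so that the decimal separator doesn't depend on the default locale.

```java
import java.util.Locale;

class FormattingDemo {

    static String concatenated(double average) {
        // Concatenation works with any type; there is nothing to keep in sync.
        return "Average: " + average;
    }

    static String mismatched(double average) {
        // Compiles fine, but %d requires an integral argument, so this
        // throws IllegalFormatConversionException at run time.
        return String.format("Average: %d", average);
    }

    static String twoDecimalPlaces(double average) {
        // A legitimate use: a fixed number of digits after the decimal point.
        return String.format(Locale.ROOT, "Average: %.2f", average);
    }
}
```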

Prefer postincrement to preincrement

There are places where the difference between i++ and ++i is significant. Java programmers have a special name for these places. They are called "bugs."

You should never write code that depends on the difference between preincrement and postincrement (and that goes for C, too). It's simply too hard to follow and too error-prone. If you find yourself writing code where the difference does matter, then reorganize the code into separate statements so that it no longer matters.
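A classic example is the statement count = count++, which looks like an increment but leaves count unchanged: the postincrement expression evaluates to the old value, the variable is incremented, and then the assignment stores the old value back. Reorganized so that the difference no longer matters, the increment becomes a statement of its own. The class and method names below are illustrative:

```java
class IncrementDemo {

    static int buggyCount(int times) {
        int count = 0;
        for (int i = 0; i < times; i++) {
            count = count++;  // bug: assigns the old value back every time
        }
        return count;
    }

    static int fixedCount(int times) {
        int count = 0;
        for (int i = 0; i < times; i++) {
            count++;          // a standalone increment; pre versus post is moot
        }
        return count;
    }
}
```

buggyCount(10) returns 0; fixedCount(10) returns 10.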

Where the difference between preincrement and postincrement is insignificant — for instance, as the increment step of a for loop — Java programmers prefer postincrement to preincrement about 4 to 1. i++ is much more common than ++i. I can't justify that, but that's the way it is. If you write ++i, anyone who reads your code is going to waste time wondering why you wrote it like that. Consequently, you should always use postincrement unless you have a particular reason to use preincrement (and you should never have a reason to use preincrement).


Error handling

Error handling is one of the most confused issues in Java programming, and one that really separates the master stylists of the language from the grunters and mumblers. Indeed, it could easily be the basis of an article all on its own. In brief, use exceptions properly and never return error codes.

The first mistake that nonnative speakers make is to return a value indicating an error, instead of throwing an exception. Indeed, you even see this in some of the Java language's own APIs that date back to the first days of Java 1.0, before all the programmers at Sun had fully internalized the new language. For example, consider the delete() method in java.io.File:

public boolean delete()

This method returns true if the file or directory is successfully deleted. Otherwise it returns false. What it should do is return nothing on successful completion, and throw an exception if the file exists yet can't be deleted for some reason:

public void delete() throws IOException

When methods return error values, every method call is surrounded by error-handling code. This makes it hard to follow and understand the method's normal flow of execution when, as is usually the case, there's no problem and everything is fine. By contrast, when error conditions are indicated by exceptions, the error handling can be moved out of the way into a separate block of code later in the method. It can even be moved into other methods and other classes if there's a more appropriate place to handle the problem.
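Java 7 later added an API shaped the way this section recommends: java.nio.file.Files.delete(Path) returns nothing and throws an IOException (for example, NoSuchFileException) when it can't delete the file. A minimal sketch; the Cleanup class and its deleteAll method are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class Cleanup {

    // Hypothetical helper. With java.io.File, each call would need its own
    // if (!file.delete()) { ... } check. Files.delete() throws instead, so
    // the loop contains only the normal flow of execution, and the first
    // failure propagates to whichever caller can handle it.
    static void deleteAll(List<Path> files) throws IOException {
        for (Path file : files) {
            Files.delete(file);  // throws NoSuchFileException if file is missing
        }
    }
}
```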

This brings me to the second anti-pattern in error handling. Programmers coming from a C or C++ background sometimes try to handle exceptions as close as possible to the point of the exception being thrown. Taken to its extreme, it can result in code like Listing 13:

Listing 13. Too-early exception handling
public void readNumberFromFile(String name) {
    FileInputStream in;
    try {
        in = new FileInputStream(name);
    } catch (FileNotFoundException e) {
        System.err.println(e.getMessage());
        return;
    }

    InputStreamReader reader;
    try {
        reader = new InputStreamReader(in, "UTF-8");
    } catch (UnsupportedEncodingException e) {
        System.err.println("This can't happen!");
        return;
    }


    BufferedReader buffer = new BufferedReader(reader);
    String line;
    try {
       line = buffer.readLine();
    } catch (IOException e) {
        System.err.println(e.getMessage());
        return;
    }

    double x;
    try {
        x = Double.parseDouble(line);
    }
    catch (NumberFormatException e) {
        System.err.println(e.getMessage());
        return;
    }

    System.out.println("Read: " + x);
}

This is just as hard to read and even more convoluted than the if (errorCondition) tests that exception handling was designed to replace. Fluent Java code moves the error handling away from the point of failure. It does not mix the error-handling code with the normal flow of execution. The version in Listing 14 is much easier to follow and understand:

Listing 14. Keep the code for the main path of execution together
public void readNumberFromFile(String name) {
    try {
        FileInputStream in = new FileInputStream(name);
        InputStreamReader reader = new InputStreamReader(in, "UTF-8");
        BufferedReader buffer = new BufferedReader(reader);
        String line = buffer.readLine();
        double x = Double.parseDouble(line);
        System.out.println("Read: " + x);
        in.close();
    }
    catch (NumberFormatException e) {
        System.err.println("Data format error");
    }
    catch (IOException e) {
        System.err.println("Error reading from file: " + name);
    }
}

Occasionally you may need to use nested try-catch blocks to separate out different failure modes that produce the same exception, but this is uncommon. The general rule of thumb is that if there's more than one try-block's worth of code in a method, then the method is too big and should probably be broken up into smaller methods anyway.

Finally, programmers new to Java programming from all languages often make the mistake of assuming that they must catch checked exceptions in the method where they're thrown. More often than not, the method that throws the exception is not the method that should catch it. For instance, consider a method that copies streams, like Listing 15:

Listing 15. Too-early exception handling
public static void copy(InputStream in, OutputStream out) {
  try {
    while (true) {
      int datum = in.read();
      if (datum == -1) break;
      out.write(datum);
    }
    out.flush();
  } catch (IOException ex) {
     System.err.println(ex.getMessage());
  }
}

This method simply does not have enough information to handle plausibly the IOExceptions that may occur. It doesn't know who called it, and it doesn't know what the consequences of a failure are. The only reasonable thing for this method to do is to let the IOException bubble up to the caller. The correct way to write this method is shown in Listing 16:

Listing 16. Not all exceptions need to be caught at the first possible moment
public static void copy(InputStream in, OutputStream out) throws IOException {
  while (true) {
    int datum = in.read();
    if (datum == -1) break;
    out.write(datum);
  }
  out.flush();
}

It's shorter. It's simpler. It's more intelligible, and it transmits the error information to the part of the code that's best suited to handle it.
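For completeness, here is a sketch of where the IOException from Listing 16 might finally be handled: in a hypothetical command-line front end that knows which files are involved and that a person is waiting for the result. The copy() method is Listing 16 unchanged; the CopyCommand class name, the usage text, and the message wording are assumptions.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class CopyCommand {

    // Listing 16, unchanged: IOExceptions propagate to the caller.
    public static void copy(InputStream in, OutputStream out) throws IOException {
        while (true) {
            int datum = in.read();
            if (datum == -1) break;
            out.write(datum);
        }
        out.flush();
    }

    // The top level has the context copy() lacks, so it handles the failure.
    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: java CopyCommand source target");
            return;
        }
        // try-with-resources (Java 7 and later) closes both streams.
        try (InputStream in = new FileInputStream(args[0]);
             OutputStream out = new FileOutputStream(args[1])) {
            copy(in, out);
        } catch (IOException ex) {
            System.err.println("Could not copy " + args[0] + " to " + args[1]
                    + ": " + ex.getMessage());
        }
    }
}
```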


How much does this matter, really?

None of these are critical issues. A few of these conventions have real reasons behind them: declare variables at the point of first use; let exceptions propagate when you don't know what to do with them. Others are purely stylistic (args, not argv; i++ instead of ++i). I won't claim that following any of these rules will make your code run faster, and only a few of them will help you avoid bugs. However, all of them are necessary to becoming a natively fluent Java speaker.

For better or worse, speaking (or writing code) without an accent will cause others to respect you more, pay more attention to what you say, and even pay you more for saying it. Plus, speaking the Java language without an accent really is a lot easier than speaking unaccented French or Chinese or English. Once you've learned the language, it's worth making the extra effort to speak it as a native would.
