Classworking toolkit: Generics with ASM

Find out how to access generic type information from Java 5 code using the ASM bytecode framework

Java™ 5 generics provide information that's useful for many classworking applications. Although Java reflection can be used to get generics information for loaded classes, the requirement that classes be loaded into the JVM can be a major drawback. In this article, classworking guru Dennis Sosnoski shows how the ASM Java bytecode manipulation framework offers flexible access to generics information without going through the Java classloading process. Along the way, he looks deeper into the representation of generics in the binary class format.

Dennis Sosnoski (dms@sosnoski.com), Java and XML consultant, Sosnoski Software Solutions Inc.

Dennis Sosnoski is the founder and lead consultant of Seattle-area Java technology consulting company Sosnoski Software Solutions, Inc., specialists in XML and Web services training and consulting. His professional software development experience spans over 30 years, with the last several years focused on server-side XML and Java technologies. Dennis is the lead developer of the open source JiBX XML Data Binding framework built around Java classworking technology and the associated JibxSoap Web services framework, as well as a committer on the Apache Axis2 Web services framework.



07 February 2006

Generics information from Java 5 programs can be very helpful in understanding program data structures. Last time, I showed how you can use runtime reflection to access generics information. This reflection approach works well if you're only interested in getting information from classes you're loading into the JVM. However, sometimes you may want to modify classes before loading them, or you might want to just investigate data structures without loading the classes at all. In these cases, reflection won't work for you -- reflection uses the JVM's class structures as the source of information, so it can only work with classes that have been loaded by the JVM.

To access generics information without loading classes into the JVM, you need a way of reading the generics information stored inside the binary class representation. In some prior articles, I've shown how the ASM classworking library provides a very clean interface for reading and writing binary classes. In this article, I'll show how you can use ASM both to retrieve the raw generics information out of class files and to interpret the generics in a useful manner. Before digging into the ASM details, I'll start off with a look at how generics information is actually encoded into the binary classes.

Tracking generics

The designers of the generics specification needed to add typing information to Java binary classes that could be used by the Java compiler. Fortunately, the Java platform already had a mechanism built into the binary class format that could be used for this purpose. This mechanism is the attribute structure, which basically allows all kinds of information to be associated with a class itself or with the methods, fields, and other components of a class. Certain kinds of attribute information are defined by the JVM specification, but the original designers of the Java language made the wise choice to leave the set of possible attributes open for extension both by later versions of the specification and by users designing their own custom attributes.

Generics information is stored in a new standard attribute: the signature attribute. This attribute is a simple text value that encodes the generics information for a class, field, method, or variable. The updated Java 5 JVM specification (see Resources for a link to the Java 5 changes page) spells out the full syntax of the signature text values. I'm not going to try to cover all that here, but I'll run through a quick introduction to signatures later in this section. First though, I'll give some necessary background with a look at the internal form of class names and the field and method descriptors used by the JVM.

Going internal

Classes in the Java platform are always from some package. When you reference a class name in Java source code, you may or may not actually include the package qualification as part of the name. You're always allowed to include the package qualification (as in java.lang.String), but you can drop it as a convenience if the class is from the java.lang package or has been imported into the source file. The form of the class name that includes the package qualification is called the "fully qualified" class name.

Inside the actual binary class, class names are always specified with a package. The format of the names is a little different from the fully qualified class names used in Java source code, though, using forward slashes ('/') in place of periods ('.'). For example, in the case of the String class, the internal form of the name is java/lang/String. If you try to print or view a class file as text, you'll generally see many strings of this type, each a reference to some class.
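The conversion between the two name forms can be sketched in a few lines of plain Java. The class and method names here (InternalNames, toInternalName, toQualifiedName) are hypothetical helpers made up for illustration, not part of ASM or the article's download. The sketch assumes the input is a binary name of the kind returned by Class.getName(), so nested classes keep their '$' separator.

```java
// Hypothetical helper for illustration only; not part of ASM.
class InternalNames {

    /** Converts a binary name such as java.lang.String to the internal form java/lang/String. */
    static String toInternalName(String qualifiedName) {
        return qualifiedName.replace('.', '/');
    }

    /** Converts an internal name such as java/util/List back to the dotted form. */
    static String toQualifiedName(String internalName) {
        return internalName.replace('/', '.');
    }

    public static void main(String[] args) {
        // Class.getName() returns the binary name, which is the right input here.
        System.out.println(toInternalName(String.class.getName()));
        System.out.println(toQualifiedName("java/util/List"));
    }
}
```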

Class references in this internal form are used as part of field and method descriptors. A field descriptor specifies the exact type of a field defined within a class. The representation used depends on whether the field is a simple object type, a simple primitive type, or an array type. For simple object types, the representation uses a leading 'L', followed by the internal form of the object class name, and terminated by a trailing ';'. For primitive types, the representation uses a single letter code for each type (such as 'I' for an int and 'Z' for a boolean). For array types, the representation adds a leading '[' as a modifier to the field descriptor for the array item type (which can itself be an array type). Table 1 gives some samples for each type of field descriptor, along with the equivalent Java source code declaration:

Table 1. Field descriptor examples
Descriptor                            Source code
Ljava/lang/String;                    String
I                                     int
[Ljava/lang/Object;                   Object[]
[Z                                    boolean[]
[[Lcom/sosnoski/generics/FileInfo;    com.sosnoski.generics.FileInfo[][]
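The three field descriptor rules (object, primitive, and array types) can be sketched as a small recursive function over java.lang.Class. This is only an illustration: it works from loaded classes, which is exactly what the rest of the article avoids, and the FieldDescriptors class and its descriptor method are hypothetical names, not an ASM API.

```java
// Hypothetical helper for illustration only; not part of ASM.
class FieldDescriptors {

    /** Builds the field descriptor for a type, following the three rules in the text. */
    static String descriptor(Class<?> type) {
        if (type.isArray()) {
            // Array: a leading '[' in front of the item type's descriptor.
            return "[" + descriptor(type.getComponentType());
        }
        if (type.isPrimitive()) {
            // Primitive: a single letter code per type.
            if (type == boolean.class) return "Z";
            if (type == byte.class) return "B";
            if (type == char.class) return "C";
            if (type == short.class) return "S";
            if (type == int.class) return "I";
            if (type == long.class) return "J";
            if (type == float.class) return "F";
            if (type == double.class) return "D";
            return "V"; // void.class, which only makes sense as a method return type
        }
        // Object type: 'L', the internal class name, then ';'.
        return "L" + type.getName().replace('.', '/') + ";";
    }

    public static void main(String[] args) {
        System.out.println(descriptor(String.class));
        System.out.println(descriptor(int.class));
        System.out.println(descriptor(Object[].class));
        System.out.println(descriptor(boolean[].class));
    }
}
```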

A method descriptor just combines field descriptors to specify the parameter types and return type of a method. The format for a method descriptor is easy to understand. It always starts with an open parenthesis, followed by the field descriptors for the parameters (all run together), followed by a close parenthesis, and ends with the return type (or 'V' if the return type is void). Table 2 gives a few examples of method descriptors, along with the equivalent Java source code declaration (note that the method names and parameter names are not part of the method descriptor, so I've just used placeholder names for these):

Table 2. Method descriptor examples
Descriptor                                                  Source code
(Ljava/lang/String;)I                                       int mmm(String x)
(ILjava/lang/String;)V                                      void mmm(int x, String y)
(I)Ljava/lang/String;                                       String mmm(int x)
(Ljava/lang/String;)[C                                      char[] mmm(String x)
(ILjava/lang/String;[[Lcom/sosnoski/generics/FileInfo;)V    void mmm(int x, String y, FileInfo[][] z)
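One way to check descriptors like these without writing any parsing code is the java.lang.invoke.MethodType class. It was added in Java 7, so it postdates the Java 5 platform this article targets. The sketch below uses its methodType, toMethodDescriptorString, and fromMethodDescriptorString methods; the MethodDescriptors wrapper class and its describe method are hypothetical names added for illustration.

```java
import java.lang.invoke.MethodType;

// Hypothetical wrapper for illustration; MethodType itself is a standard JDK class (Java 7+).
class MethodDescriptors {

    /** Returns the method descriptor for the given return type and parameter types. */
    static String describe(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // int mmm(String x)
        System.out.println(describe(int.class, String.class));
        // void mmm(int x, String y)
        System.out.println(describe(void.class, int.class, String.class));
        // String mmm(int x)
        System.out.println(describe(String.class, int.class));
        // char[] mmm(String x)
        System.out.println(describe(char[].class, String.class));

        // Going the other way: parse a descriptor (null means the system class loader).
        MethodType parsed = MethodType.fromMethodDescriptorString("(ILjava/lang/String;)V", null);
        System.out.println(parsed.parameterCount() + " parameters, returns " + parsed.returnType());
    }
}
```

Like reflection, this approach resolves the referenced classes through a class loader, so it is only a convenience for experimenting with the descriptor format.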

Sign on the dotted line

Now that you've seen field and method descriptors, you're ready to hear about signatures. The signature format extends the idea of field and method descriptors to include generic type information. Unfortunately, the complexity of generic types (including all the possible variations of bounds and such) means that signatures cannot be described as simply as the descriptors. The grammar for signatures (supplied in the JVM specification changes for Java 1.5, chapter 4) includes 21 separate productions. Rather than go through the whole set, I'll just provide some examples for now, which I'll expand on in the next section.

Listing 1 shows portions of the source code from one of the data structure classes used in the last article, along with the corresponding signature strings. In this case, the class is not itself a parameterized type, but the fields and methods use parameterized java.util.Lists:

Listing 1. Simple signature example
public class DirInfo
{
    private final List<FileInfo> m_files;
    private final List<DirInfo> m_directories;
    ...    
    public List<DirInfo> getDirectories() {
        return m_directories;
    }
    public List<FileInfo> getFiles() {
        return m_files;
    }
    ...
}

Class signature:
 {none}
m_files signature:
 Ljava/util/List<Lcom/sosnoski/generics/FileInfo;>;
m_directories signature:
 Ljava/util/List<Lcom/sosnoski/generics/DirInfo;>;
getDirectories() signature:
 ()Ljava/util/List<Lcom/sosnoski/generics/DirInfo;>;
getFiles() signature:
 ()Ljava/util/List<Lcom/sosnoski/generics/FileInfo;>;

Because the class is not a parameterized type, no signature is added to the binary class representation for the class itself. However, signatures are present for both the fields and methods that use parameterized types. The m_files field signature identifies it as a List of FileInfo, while the m_directories signature says it's a List of DirInfo. Likewise, the getDirectories() method signature says it returns a List of DirInfo, while the getFiles() signature says it returns a List of FileInfo.

Looks easy so far, doesn't it? Now check out Listing 2, which gives a simple parameterized class definition and the corresponding signature strings:

Listing 2. Parameterized class signature example
public class PairCollection<T,U> implements Iterable<T>
{
    /** Collection with first component values. */
    private final ArrayList<T> m_tValues;
    
    /** Collection with second component values. */
    private final ArrayList<U> m_uValues;
    ...
    public void add(T t, U u) {
        m_tValues.add(t);
        m_uValues.add(u);
    }
    
    public U get(T t) {
        int index = m_tValues.indexOf(t);
        if (index >= 0) {
            return m_uValues.get(index);
        } else {
            return null;
        }
    }
    ...
}

Class signature:
 <T:Ljava/lang/Object;U:Ljava/lang/Object;>Ljava/lang/Object;Ljava/lang/Iterable<TT;>;
m_tValues signature:
 Ljava/util/ArrayList<TT;>;
m_uValues signature:
 Ljava/util/ArrayList<TU;>;
add signature:
 (TT;TU;)V
get signature:
 (TT;)TU;

Because the Listing 2 class is a parameterized type, the class signature needs to be present in the binary class representation. The text of the signature is long compared with the source code, but not too difficult to understand when you realize that all the optional components of a type parameter that are left out in the source code are included in the signature. The first part of the signature (within the angle brackets '<...>') is just the list of type parameter definitions for the class. These each take the form of a type parameter name followed by the field descriptors for the class bound and interface bounds (if any) of the type. Each field descriptor is preceded by a ':' character. Because the Listing 2 source code doesn't specify any bounds for the class type parameters, the only bound present for each is the default class bound of java.lang.Object.

The second part of the class signature (following the closing angle bracket) gives the superclass and superinterface (if any) signatures. In the Listing 2 case, no superclass is specified, so the signature gives the superclass as just java.lang.Object. A superinterface is specified, as Iterable<T>. It shows up in the signature pretty much as you'd expect, except that where the source code has just <T>, the signature uses <TT;>. The reason is that the signature needs to distinguish between class names and type variable names; the leading 'T' identifies what follows as a type variable name, while the trailing ';' just marks the end of the name.
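To make the layout of the type parameter section concrete, here is a minimal hand-written sketch that pulls the type parameter names out of a class signature string. It is not how ASM parses signatures, and the FormalTypeParameters class and its methods are hypothetical names for illustration. It assumes a well-formed signature and simply skips over each bound, counting angle brackets so that a ';' inside type arguments does not end the bound early.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not ASM's parser.
class FormalTypeParameters {

    /** Returns the type parameter names declared at the start of a class or method signature. */
    static List<String> names(String signature) {
        List<String> result = new ArrayList<>();
        if (signature.isEmpty() || signature.charAt(0) != '<') {
            return result; // no type parameter section
        }
        int pos = 1;
        while (signature.charAt(pos) != '>') {
            int colon = signature.indexOf(':', pos);
            result.add(signature.substring(pos, colon));
            pos = colon;
            // Every bound is preceded by ':'; an omitted class bound appears as "::".
            while (signature.charAt(pos) == ':') {
                pos++;
                if (signature.charAt(pos) != ':') {
                    pos = skipFieldType(signature, pos);
                }
            }
        }
        return result;
    }

    /** Returns the index just past the field type signature starting at pos. */
    private static int skipFieldType(String s, int pos) {
        char c = s.charAt(pos);
        if (c == '[') {
            return skipFieldType(s, pos + 1); // array: skip the item type
        }
        if (c == 'T') {
            return s.indexOf(';', pos) + 1;   // type variable: name ends at ';'
        }
        if (c == 'L') {
            int depth = 0;                    // nesting level of '<...>'
            while (true) {
                char ch = s.charAt(pos++);
                if (ch == '<') depth++;
                else if (ch == '>') depth--;
                else if (ch == ';' && depth == 0) return pos;
            }
        }
        return pos + 1;                       // primitive code, valid only as an array item type
    }

    public static void main(String[] args) {
        System.out.println(names(
            "<T:Ljava/lang/Object;U:Ljava/lang/Object;>Ljava/lang/Object;Ljava/lang/Iterable<TT;>;"));
    }
}
```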

The field and method signatures from Listing 2 make use of the same type variable format as seen in the superinterface signature, but aside from that, they don't show anything new.


Generics in ASM

As I've explained in some earlier articles in this series (see Resources for links), ASM uses a visitor approach to working with binary class representations. This visitor approach is bidirectional: You can parse an existing class, resulting in a sequence of calls to your handler visitor methods for the components of the class, or you can make the same sort of sequence of calls to the visitor methods of a class writer to generate a binary class representation. This parser/writer symmetry makes ASM especially convenient for situations where you're only modifying certain aspects of a class -- you can base your handler for the class parser events on a class writer, only overriding the base writer handling for the events you want to change. Both the parser (or reader) and writer are also very usable as stand-alone components.

ASM 2.X provides full support for the Java 5 JVM changes, including reading and writing signatures. The basic handling of signatures is automatic, using values passed directly to the appropriate visitor methods. ASM 2.X also adds support for parsing the (sometimes complex) signature string encoding to interpret the details of the signature. In keeping with the basic ASM philosophy, the same interface can also be used with a writer to generate signature strings on demand. In this section, I'll show how ASM handles both the basic signature-as-a-text-blob form and the detailed parse.

Signatures for all

The signature-as-a-text-blob handling in ASM is built directly into the basic class, field, and method visitor calls. Listing 3 shows the relevant methods from the org.objectweb.asm.ClassVisitor interface:

Listing 3. Class, field, and method visitor methods
public interface ClassVisitor
{
    void visit(int version, int access, String name, String signature,
        String superName, String[] interfaces);
        
    FieldVisitor visitField(int access, String name, String desc,
        String signature, Object value);
        
    MethodVisitor visitMethod(int access, String name, String desc,
        String signature, String[] exceptions);
    ...
}

Each of the visitor methods shown takes a signature string as a parameter. If the corresponding class, field, or method is not generic, a null value is passed when calling the method.

Listing 4 shows the signature-related methods in action. Here I've implemented a visitor class using the org.objectweb.asm.commons.EmptyVisitor class as a base, so that I only need to override the methods I want to use. The supplied method implementations just print out the signature information for the class as a whole and the descriptor and signature information for each field and method seen in the class. The bottom of Listing 4 shows the output generated when this visitor is used with the full version of the Listing 1 DirInfo class:

Listing 4. Signature-related methods in action
public class ShowSignaturesVisitor extends EmptyVisitor
{
    public void visit(int version, int access, String name, String sig,
        String sname, String[] inames) {
        System.out.println("Class " + name + " signature:");
        System.out.println(" " + sig);
        super.visit(version, access, name, sig, sname, inames);
    }

    public FieldVisitor visitField(int access, String name, String desc,
        String sig, Object value) {
        System.out.println("Field " + name + " descriptor and signature:");
        System.out.println(" " + desc);
        System.out.println(" " + sig);
        return super.visitField(access, name, desc, sig, value);
    }

    public MethodVisitor visitMethod(int access, String name, String desc,
        String sig, String[] exceptions) {
        System.out.println("Method " + name + "() descriptor and signature:");
        System.out.println(" " + desc);
        System.out.println(" " + sig);
        return super.visitMethod(access, name, desc, sig, exceptions);
    }
}

Class com/sosnoski/generics/DirInfo signature:
 null
Field m_files descriptor and signature:
 Ljava/util/List;
 Ljava/util/List<Lcom/sosnoski/generics/FileInfo;>;
Field m_directories descriptor and signature:
 Ljava/util/List;
 Ljava/util/List<Lcom/sosnoski/generics/DirInfo;>;
Field m_lastModify descriptor and signature:
 Ljava/util/Date;
 null
Method <init>() descriptor and signature:
 (Ljava/io/File;)V
 null
Method getDirectories() descriptor and signature:
 ()Ljava/util/List;
 ()Ljava/util/List<Lcom/sosnoski/generics/DirInfo;>;
Method getFiles() descriptor and signature:
 ()Ljava/util/List;
 ()Ljava/util/List<Lcom/sosnoski/generics/FileInfo;>;
Method getLastModify() descriptor and signature:
 ()Ljava/util/Date;
 null
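Comparing each descriptor and signature pair in the output above shows the effect of type erasure: for these signatures, the descriptor is just the signature with the type arguments removed. A minimal sketch of that relationship follows. The SignatureErasure class and its stripTypeArguments method are hypothetical names, and the shortcut only holds for simple cases like the DirInfo ones. Signatures that declare or use type variables (such as TT;) or parameterized inner classes need real erasure rules, which this sketch does not attempt.

```java
// Hypothetical helper for illustration only; handles only signatures without type variables.
class SignatureErasure {

    /** Drops everything inside '<...>' (including nested arguments) from a signature. */
    static String stripTypeArguments(String signature) {
        StringBuilder out = new StringBuilder();
        int depth = 0; // current nesting level of angle brackets
        for (int i = 0; i < signature.length(); i++) {
            char c = signature.charAt(i);
            if (c == '<') {
                depth++;
            } else if (c == '>') {
                depth--;
            } else if (depth == 0) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripTypeArguments("Ljava/util/List<Lcom/sosnoski/generics/FileInfo;>;"));
        System.out.println(stripTypeArguments("()Ljava/util/List<Lcom/sosnoski/generics/DirInfo;>;"));
    }
}
```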

Analyzing signatures

Besides working with signatures as strings, ASM also supports working with signatures at the detail level. The org.objectweb.asm.signature.SignatureReader class parses a signature string, generating a sequence of calls to an org.objectweb.asm.signature.SignatureVisitor interface. The org.objectweb.asm.signature.SignatureWriter class implements the visitor interface, building up a signature string from a sequence of visitor method calls.

The detailed level interface is unfortunately somewhat complex, but that's because of the complexity of the signature definitions rather than any poor handling in the ASM code. The SignatureVisitor interface shows this complexity, defining 16 separate method calls that may be involved in processing a signature. Of course, most signatures will only use a small portion of these methods.

To illustrate the ASM detailed signature handling, I'll show the methods called by parsing some of the signatures discussed earlier in this article. Listing 5 gives a partial listing of the TraceSignatureVisitor class I wrote for this purpose, along with an AnalyzeSignaturesVisitor to drive the signature processing. When an instance of AnalyzeSignaturesVisitor is used as the visitor for a class, it creates a SignatureReader for each signature found, passing an instance of the TraceSignatureVisitor class as the target for the signature component visitor calls. The SignatureReader call used for parsing the signature depends on the form of the signature: For class and method signatures, the appropriate method is just accept(); for field signatures, use the acceptType() call.

Listing 5. Analyzing signatures
public class TraceSignatureVisitor implements SignatureVisitor
{
    public void visitFormalTypeParameter(String name) {
        System.out.println("  visitFormalTypeParameter(" + name + ")");
    }

    public SignatureVisitor visitClassBound() {
        System.out.println("  visitClassBound()");
        return this;
    }

    public SignatureVisitor visitInterfaceBound() {
        System.out.println("  visitInterfaceBound()");
        return this;
    }

    public SignatureVisitor visitSuperclass() {
        System.out.println("  visitSuperclass()");
        return this;
    }

    public SignatureVisitor visitInterface() {
        System.out.println("  visitInterface()");
        return this;
    }

    public SignatureVisitor visitParameterType() {
        System.out.println("  visitParameterType()");
        return this;
    }
    ...
}

public class AnalyzeSignaturesVisitor extends EmptyVisitor
{
    public void visit(int version, int access, String name, String sig,
        String sname, String[] inames) {
        if (sig != null) {
            System.out.println("Class " + name + " signature:");
            System.out.println(" " + sig);
            new SignatureReader(sig).accept(new TraceSignatureVisitor());
        }
        super.visit(version, access, name, sig, sname, inames);
    }

    public FieldVisitor visitField(int access, String name, String desc,
        String sig, Object value) {
        if (sig != null) {
            System.out.println("Field " + name + " signature:");
            System.out.println(" " + sig);
            new SignatureReader(sig).acceptType(new TraceSignatureVisitor());
        }
        return super.visitField(access, name, desc, sig, value);
    }

    public MethodVisitor visitMethod(int access, String name, String desc,
        String sig, String[] exceptions) {
        if (sig != null) {
            System.out.println("Method " + name + "() signature:");
            System.out.println(" " + sig);
            new SignatureReader(sig).accept(new TraceSignatureVisitor());
        }
        return super.visitMethod(access, name, desc, sig, exceptions);
    }
}

Listing 6 shows the output generated when the AnalyzeSignaturesVisitor class is used to visit the DirInfo class from Listing 1:

Listing 6. DirInfo code and signatures analysis
public class DirInfo
{
    private final List<FileInfo> m_files;
    private final List<DirInfo> m_directories;
    ...    
    public List<DirInfo> getDirectories() {
        return m_directories;
    }
    public List<FileInfo> getFiles() {
        return m_files;
    }
    ...
}

Field m_files signature:
 Ljava/util/List<Lcom/sosnoski/generics/FileInfo;>;
  visitClassType(java/util/List)
  visitTypeArgument(=)
  visitClassType(com/sosnoski/generics/FileInfo)
  visitEnd()
  visitEnd()
Field m_directories signature:
 Ljava/util/List<Lcom/sosnoski/generics/DirInfo;>;
  visitClassType(java/util/List)
  visitTypeArgument(=)
  visitClassType(com/sosnoski/generics/DirInfo)
  visitEnd()
  visitEnd()
Method getDirectories() signature:
 ()Ljava/util/List<Lcom/sosnoski/generics/DirInfo;>;
  visitReturnType()
  visitClassType(java/util/List)
  visitTypeArgument(=)
  visitClassType(com/sosnoski/generics/DirInfo)
  visitEnd()
  visitEnd()
Method getFiles() signature:
 ()Ljava/util/List<Lcom/sosnoski/generics/FileInfo;>;
  visitReturnType()
  visitClassType(java/util/List)
  visitTypeArgument(=)
  visitClassType(com/sosnoski/generics/FileInfo)
  visitEnd()
  visitEnd()

The first block of output lines in Listing 6 shows the visitor methods called in analyzing the m_files signature, Ljava/util/List<Lcom/sosnoski/generics/FileInfo;>;. The first method called is visitClassType("java/util/List"), giving the base class for the field. Then visitTypeArgument("=") says that an actual type is being supplied for the type parameter of the current class (java.util.List), and visitClassType("com/sosnoski/generics/FileInfo") says that the actual type is based on com.sosnoski.generics.FileInfo. Finally, the first call to visitEnd() closes the open FileInfo class signature, and the second closes the open List class signature.
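To see where this sequence of calls comes from, it can help to walk a field signature by hand. The following is a minimal sketch of a recursive walker that records event names in the same style as the trace output. It is not ASM's SignatureReader, and the FieldSignatureTrace class is a hypothetical name for illustration. It assumes a well-formed signature built only from class types, type variables, and exact (non-wildcard) type arguments; arrays, wildcards, and inner classes are deliberately left out and cause an exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not ASM's SignatureReader.
class FieldSignatureTrace {
    private final String sig;
    private final List<String> events = new ArrayList<>();
    private int pos;

    private FieldSignatureTrace(String sig) {
        this.sig = sig;
    }

    /** Returns the visitor-style events for a simple field type signature. */
    static List<String> trace(String signature) {
        FieldSignatureTrace t = new FieldSignatureTrace(signature);
        t.parseType();
        return t.events;
    }

    private void parseType() {
        char c = sig.charAt(pos++);
        if (c == 'T') {
            // Type variable: 'T', the name, then ';'.
            int end = sig.indexOf(';', pos);
            events.add("visitTypeVariable(" + sig.substring(pos, end) + ")");
            pos = end + 1;
        } else if (c == 'L') {
            // Class type: 'L', the internal name, optional '<args>', then ';'.
            int start = pos;
            while (sig.charAt(pos) != '<' && sig.charAt(pos) != ';') {
                pos++;
            }
            events.add("visitClassType(" + sig.substring(start, pos) + ")");
            if (sig.charAt(pos) == '<') {
                pos++; // skip '<'
                while (sig.charAt(pos) != '>') {
                    events.add("visitTypeArgument(=)");
                    parseType(); // each argument is itself a type signature
                }
                pos++; // skip '>'
            }
            pos++; // skip ';'
            events.add("visitEnd()");
        } else {
            throw new IllegalArgumentException("Unsupported signature element: " + c);
        }
    }

    public static void main(String[] args) {
        for (String event : trace("Ljava/util/List<Lcom/sosnoski/generics/FileInfo;>;")) {
            System.out.println("  " + event);
        }
    }
}
```

The recursion in parseType() is what produces the paired visitEnd() events: the inner call closes the type argument's class type, and the outer call closes the enclosing class type.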

As you might guess from looking at the sequence of visitor method calls, some of these calls effectively open a new context for an embedded type signature component. The methods in the SignatureVisitor interface that return a SignatureVisitor instance all have this effect. The interface instance returned by the method call (which may be the same as the instance being called, as in the Listing 5 code, but can be different) is then used for processing the embedded type signature. It's easy to change the Listing 5 code to show the nesting of child signatures with indenting, and the file download has the code with this change. Rather than go through the code in detail here, I'll just show what comes out. Listing 7 gives the (partial) generated output from the indenting version of the code when run on the Listing 2 PairCollection parameterized class:

Listing 7. PairCollection code and signatures analysis
public class PairCollection<T,U> implements Iterable<T>
{
    /** Collection with first component values. */
    private final ArrayList<T> m_tValues;
    
    /** Collection with second component values. */
    private final ArrayList<U> m_uValues;
    ...
    public void add(T t, U u) {
        m_tValues.add(t);
        m_uValues.add(u);
    }
    
    public U get(T t) {
        int index = m_tValues.indexOf(t);
        if (index >= 0) {
            return m_uValues.get(index);
        } else {
            return null;
        }
    }
    ...
}

Class com/sosnoski/generics/PairCollection signature:
 <T:Ljava/lang/Object;U:Ljava/lang/Object;>Ljava/lang/Object;Ljava/lang/Iterable<TT;>;
  visitFormalTypeParameter(T)
  visitClassBound()
   visitClassType(java/lang/Object)
   visitEnd()
  visitFormalTypeParameter(U)
  visitClassBound()
   visitClassType(java/lang/Object)
   visitEnd()
  visitSuperclass()
   visitClassType(java/lang/Object)
   visitEnd()
  visitInterface()
   visitClassType(java/lang/Iterable)
   visitTypeArgument(=)
    visitTypeVariable(T)
   visitEnd()
Field m_tValues signature:
 Ljava/util/ArrayList<TT;>;
  visitClassType(java/util/ArrayList)
  visitTypeArgument(=)
   visitTypeVariable(T)
  visitEnd()
Field m_uValues signature:
 Ljava/util/ArrayList<TU;>;
  visitClassType(java/util/ArrayList)
  visitTypeArgument(=)
   visitTypeVariable(U)
  visitEnd()
Method add() signature:
 (TT;TU;)V
  visitParameterType()
   visitTypeVariable(T)
  visitParameterType()
   visitTypeVariable(U)
  visitReturnType()
   visitBaseType(V)
Method get() signature:
 (TT;)TU;
  visitParameterType()
   visitTypeVariable(T)
  visitReturnType()
   visitTypeVariable(U)

The Listing 7 output shows how nested type definitions are used within the parsed signatures. In the case of the class signature, the nesting even goes two levels deep -- the class signature includes the signature of an interface that the class implements, and that interface signature includes a type argument signature (which is just the type variable "T" in this case).


Going further with ASM generics

In this article, I've gone through the basics of how generics information is stored in the binary class representation and how it can be accessed using ASM. Next month, I'm going to finish my coverage of generics with a recursive data structure analyzer built around ASM. Starting from an initial class, the analyzer chains through all referenced classes, handling substitution of generic types as it goes. The end result is a data structure that reflects all the information you can deduce through the use of generics.


Download

Description    Name                     Size
Source code    j-cwt02076-source.zip    11KB

Resources

Learn

Get products and technologies

  • ASM: Get the ASM Java bytecode manipulation framework.
