Smarter collaboration for the education industry using Lotus Connections, Part 2: Optimize assignment of students to research projects

Use an IBM ILOG optimization model with data from a relational database

This article describes how to build a web application that helps to optimize assignment of students to research project positions. The assignment process retrieves data from a remote DB2 database and uses ILOG CPLEX to perform the optimization. A web browser based client interacts with the application from IBM Lotus® Connections through iWidgets.

Anton Zorin, Software Engineer, IBM

Photograph of Anton ZorinAnton Zorin has been an IBM Software Engineer since 2006. His primary focus is in the areas of web development, linear optimization, and text analytics. He's also been involved in the development of multi-ware systems, image processing, and reconfigurable computing.



Carl Osipov, Software Architect, IBM

CarlCarl Osipov is an experienced software architect with the Strategy and Technology organization in IBM Software Group. His skills are in the area of distributed computing, speech application development, and computational natural language understanding. He has published and presented on Service-Oriented Architecture and conversational dialog management to peers in the industry and academia. His current focus is on design of reuse techniques for composite business services.



08 February 2011

Also available in Russian Japanese Portuguese

Introduction

This article describes how to build a web application that provides recommendations concerning assignment of students to research project positions. The application retrieves data from a remote database and uses ILOG CPLEX to perform an optimization based on the data. A web interface is used to specify constraints and weight parameters and to represent the optimization result to a user. Optimization calculations are requested by the servlet, which retrieves the optimization process input data from a remote DB2 database, launches the process, represents the calculation result as XML output, and sends the output back to the user. The servlet is installed on the same network node and the same operating system as the instance of the ILOG CPLEX optimization engine.

The first part of the article describes tools and technologies used to build the application. It explains each tool in terms of its general purpose and how the tool is used in the application.

The second part gives an example of an interactive client session and explains how a user can specify the input data and launch the optimization process.

The third part of the article presents a high level overview of the application architecture and explores the roles of the servlet, Java™ API, and ILOG CPLEX optimization engine in the application architecture. This part also describes the Dojo library used to implement the web interface to support user interaction and the data exchange format between the servlet and a browser client.

The fourth part concludes the article with an analysis of time measurements of the optimization process stages.

You can download and view the source code for the solution described in this article (see Downloads for the links).

Environment, tools, and technologies.

Let's start with the tools used within the application.

Figure 1. Application architecture
Diagram of application architecture

Calculations for the optimization process are done by ILOG Optimization Decision Manager Enterprise 3.4 (ODME) (see Resources for a trial download). It is a powerful platform for developing and deploying optimization-based planning and scheduling solutions, featuring collaboration and what-if analysis. The article describes how the application uses IBM ILOG CPLEX (CPLEX), shown on right side of Figure 1, which is one of optimization engines in ODME for creating smarter, resource-efficient plans and schedules. (View a larger version of Figure 1.) The platform supports different application domains by providing interfaces for C++, Java, and .Net programming languages. This article will describe use of the ILOG Java API for integration.

The application is accessed via the internet through HTTP requests. Servlets provide the facilities for the application to handle and respond to HTTP requests. Servlets support a component-based, platform independent method for building web-based applications. For this article, we used WebSphere Application Server Community Edition (WASCE) as the container for the Java Servlets.

Since the application is designed for the optimization engine to process large amounts of data (hundreds of thousands of variables), we need a database to store and efficiently navigate through the data. We chose IBM DB2 as the database management system.

Web browser clients interact with the application using a web interface. For this article, we used IBM Lotus Connections as the runtime environment for web applications. In the application architecture, it is responsible for interaction with the servlet and for delivery of the web interface to clients. Lotus Connections supports iWidget, which is an IBM specification supporting a standard definition of a widget and interoperability across IBM platforms and products (see Resources).


Client session example

The goal of the application is to provide recommendations concerning assignment of students to research project positions (seats). If the number of positions and students is large, then manually generating the assignment recommendations becomes time consuming. The application resolves this issue by automating an analysis that optimizes the recommendations.

The recommendations are generated by project (see Figure 2 for project examples). Each project contains one or more associated positions of different types (for example a genetic scientist or a software engineer); multiple positions of the same type are possible.

The application uses compatibility scores that describe how well candidate students fit the position descriptions, with higher score values corresponding to a better fit. There are many options for how to generate the score, including:

  • Rule based—using a rule prescribing that if a candidate has four years of Java experience and the position description calls for five, add 0.8 points to the score.
  • Graph based—using scores for strengths of connections in a candidate's social and semantic graphs.
  • Text analysis based—measuring the frequency of term co-occurrence in a candidate's resume and job description.

The details of computing the compatibility score is outside the scope of this article. However, other articles in this series describe a text analytics approach that can be used to derive a compatibility score. The application finds the assignment that fills as many seats as possible with the available candidates while trying to achieve the highest possible compatibility score for each assignment. The results are displayed as assignment recommendations to a user.

A staffing specialist is the application user who performs the assignment of students to positions. Before a staffing specialist can start using the application, an administrator must ensure that the application database is populated with information about projects, positions, students, and the compatibility between students and positions.

The staffing specialist starts the application by opening the web browser to the application URL (the one used by this article is http://<wasce_host>/smash_ilog/), where an iWidget renders a list of projects (see Figure 2). The specialist must choose a project to proceed.

Figure 2. Select project screen
Diagram of select project screen

Next, the specialist has an option of clicking on Select seats button to see and modify seats data (see Figure 3).

Figure 3. Select seats
Diagram of select seats

When all modifications are done, the specialist requests optimization by pushing the Optimization result button. The solution tab, as seen in Figure 4, shows the project's seats and the recommended candidates for them from the optimization point of view.

Figure 4. Solution
Diagram of solution

The calculation data tab shows the internal CPLEX variables used in the optimization process (see Figure 5).

Figure 5. Calculation data
Diagram of calculation data

The CPLEX engine log and time statistics are shown in the engine log tab (Figure 6).

Figure 6. Engine log
Diagram of engine log

Application internals

IBM DB2 database structure and integration

The input data used by the optimization process is located in the DB2 database. The structure of the database tables is shown below.

Figure 7. Data for optimization
Diagram of data for optimization

The optimization servlet uses a JDBC connection for interaction with the DB2 instance. Assuming that the servlet container (WASCE in our case) has been pre-configured with jdbc/YOUR_DB_NAME as the DB2 database data source, the servlet creates the connection to the DB2 instance and retrieves data from the database:

Create a connection and retrieve data using a query
Context initCtx = new InitialContext();
Context envCtx = (Context) initCtx.lookup("java:comp/env");
DataSource ilogdb = (DataSource) envCtx.lookup("jdbc/YOUR_DB_NAME");
Connection con = (Connection) ilogdb.getConnection();

long id = 10;
ArrayList<Seat> al = new ArrayList<Seatt>();
String sql = "SELECT * FROM SEATS WHERE PROJECT_ID = " + id;
Statement st = con.createStatement();
ResultSet rs = st.executeQuery(sql);
while (rs.next())
{
	Seat sl = new Seat();
	sl.setId(rs.getLong("ID"));
	sl.setPId(rs.getLong("PROJECT_ID"));
	sl.setName(rs.getString("NAME"));
	sl.setNmb(rs.getInt("COUNT"));
	sl.setAvl(rs.getBoolean("IS_AVAILABLE"));
	al.add(sl);
}
rs.close();
st.close();

The Connection, con, is used to submit SQL requests. The resulting table is represented by the instance of the ResultSet class, rs.

The while loop generates an array of Seat instances whose contents correspond to the rows retrieved from the SEATS table in the DB2 database. The following classes describe the other entities stored in the database and used by the optimization: Project, Seat, Practitioner, Compatibility, and OptimizationModel.

They are instantiated and grouped into collections using corresponding *Peer classes (such as ProjectPeer, SeatPeer, and so on) that perform SQL requests to the database for entity life cycle operations: create, retrieve, update, and delete. For example, the following SQL requests are used to retrieve the data, where id is the project identifier passed to the retrieving method.

To request Project information, this is the query:

"SELECT * FROM PROJECTS"

To request compatibility information, this is the query:

"SELECT * FROM COMP_LOSS WHERE SEAT_ID IN "
			+ "(SELECT ID FROM SEATS WHERE PROJECT_ID = "
			+ id
			+ ")"                       
			+ " ORDER BY SEAT_ID"

To request practitioner information, this is the query:

“SELECT * FROM PRACTITIONERS”.

Each of the *Peer classes returns the results of the retrieval as an ArrayList. The entities in the ArrayList are used for the optimization by building a custom ILOG data source as described in the next section.

Java API to IBM ILOG

The servlet uses the ODME Java API to request the CPLEX engine to do the optimization. From the Java API perspective, the optimization process consists of five stages:

  1. Initialize the environment
  2. Set up a model
  3. Set up the source data
  4. Perform calculations
  5. Clean up the environment

This is a sample of using the API to IBM ILOG in the Java language.

Sample code using the Java API to IBM ILOG
The complete listing should appear here

IloOplFactory.setDebugMode(false);
IloOplFactory oplF_ = new IloOplFactory();
IloOplErrorHandler errHandler_ = oplF_.createOplErrorHandler(System.err);
IloOplSettings settings_ = oplF_.createOplSettings(errHandler_);
IloCplex cplex_ = oplF_.createCplex();
cplex_.setParam(IloCplex.DoubleParam.WorkMem, 512);
String modelText = ModelPeer.getModelByName("tonkawa", con).getValue();
IloOplModelSource modelSource = 
	oplF_.createOplModelSourceFromString(modelText, “tonkawa”); 		
IloOplModelDefinition definition = oplF_.createOplModelDefinition(modelSource,settings_); 
opl_ = oplF_.createOplModel(definition, cplex_);
IloOplDataHandler handler = getDataHandler();
handler.startElement("ilog_var"
handler.addIntItem(10);
handler.endElement();
opl_.addDataSource(dataSource);
opl_.generate();
if ( cplex_.solve() == true)
	opl_.postProcess();
opl.end();
cplex.end();		
oplF.end();

Let's look at this listing by stages. In the first stage, the servlet creates a CPLEX optimization engine instance and some secondary variables including error handling and model settings.

IloOplFactory.setDebugMode(false);
IloOplFactory oplF_ = new IloOplFactory();
IloOplErrorHandler errHandler_ = oplF_.createOplErrorHandler(System.err);
IloOplSettings settings_ = oplF_.createOplSettings(errHandler_);
IloCplex cplex_ = oplF_.createCplex();
cplex_.setParam(IloCplex.DoubleParam.WorkMem, 512);

The last string of the listing sets the amount of memory to be used by the optimization engine. This is done to support optimization for large input data. This concludes the initialization.

The optimization model is set up in the following piece of code:

String modelText = ModelPeer.getModelByName("tonkawa", con).getValue();
IloOplModelSource modelSource = 
	oplF_.createOplModelSourceFromString(modelText, “tonkawa”); 		
IloOplModelDefinition definition = oplF_.createOplModelDefinition(modelSource,settings_); 
opl_ = oplF_.createOplModel(definition, cplex_);

The first line retrieves the text of the model's definition from the database where it is stored under the name tonkawa. The next two lines create the object representing the optimization model input source. The next line creates the model definition, which is an internal representation of the model source. Finally, the last line of the code creates an instance of the optimization model.

In the third stage (set up the source data), the application creates the optimization input variables in the internal ILOG format and attaches them to the model. The application uses a custom data source for a model. It implies allocation of optimization variables in an internal CPLEX format and their assignment before the start of the optimization process.

To use a custom data source, it is necessary to define a class derived from IloCustomOplDataSource and a method customRead(), which creates the variables to be used in the optimization process. Definition of variables is performed by using the instance of IloOplDataHander, which is returned by the invocation of getDataHandler method.

IloOplDataHandler handler = getDataHandler();

A new variable can be defined using the startElement() method of handler:

handler.startElement("ilog_var");

Next, the value is assigned by addIntItem() for an integer or addNumItem() for real values.

handler.addIntItem(10);

After the assignment, the endElement() method finishes the definition.

handler.endElement();

Note: if a variable is an array, it can be handled as follows:

handler.startElement("ilog_array");
handler.startArray();
for(int i=0; i<10; ++i)
	handler.addIntItem(i);
handler.endArray();
handler.endElement();

Arrays can also be nested.

After all of the variables are defined, the data source is attached to the OPL model by invoking addDataSource() method of IloOplModel class:

opl_.addDataSource(dataSource);

After the data source is attached, the optimization problem in internal ILOG format is generated and extracted by the generate() method, which reads data, performs preprocessing, and creates the problem's representation.

opl_.generate();

Now the model can be solved:

if ( cplex_.solve() == true)
	opl_.postProcess();

If the solution has been found, the solve() method returns true and the post processing takes place. Usually post processing is used to aggregate, transform, and format data (including results data) for display or for sending to another application such as a spreadsheet.

To clean up the environment, it is necessary to call the end() method for the instances of the model, the CPLEX engine, and the factory:

opl.end();
cplex.end();		
oplF.end();

The diagram in Figure 8 shows the classes interacting with ILOG optimization engine.

Figure 8. ILOG Optimization Wrapper Classes
Diagram of ILOG optimization wrapper classes

IlogOptimizer class works with the ILOG API and submits the optimization calculations. It instantiates an OptimizationDataSource class which represents a data source for the optimization.

IlogDataRendererXML class retrieves and wraps internal ILOG variables in XML form.

Now that we have seen the servlet interaction with the ILOG optimization engine, let's look at the client side.

Dojo library

The Dojo library is a high-level JavaScript library used for building the client side part of the application. It creates and renders the user interface. Dojo provides two ways of describing user interface items: declarative and programmatic as shown below.

Listing 3: Declarative style
<div id="mainTabContainer" dojoType="dijit.layout.TabContainer">
<div id="tab1" dojoType="dijit.layout.ContentPane" 
	title=" solution ">
  </div>
<div id="tab2" dojoType="dijit.layout.ContentPane"
 	title=" calculation data ">
  </div>
<div id="tab3" dojoType="dijit.layout.ContentPane"
 	title="engine log ">                        
  </div>            
</div>
Listing 4: Programmatic style
var labelStep = new dijit.form.TextBox({value:"step:",            
                                     disabled:true,
                                     style:"width:50px;text-align:right"
                                            });        

var valEBox = new dijit.form.NumberTextBox({value:parmProps.defVal,
                                     constraints:
                                     {min:parmProps.minVal},
                                     style:"width:50px;text-align:right"
                                                });

The application also uses the library to submit requests to and parse XML responses from the optimization servlet.

The client user interface is build from two parts: static and dynamic. The first one is described in XML in declarative style while the second one is generated programmatically in real time. The user interface is enclosed in a stack container widget that contains three panes for project selection, seats selection, and optimization result review (see Figure 2 through Figure 4). The stack container enclosing the panes is static but the pane content is created dynamically.

The following classes create the content and implement interaction with the servlet:

ProjectSelectionItem
SeatSelectionItem
OptimizationResultItem

They are subclasses of the IlogInterfaceItem class that provides functionality for submitting requests to and handling responses from the servlet. Instances of these three classes are aggregated by the IlogInterface class that is instantiated in the Dojo OnLoad function.

IlogInterface also sends the initial request telling the servlet how many items should be displayed along the horizontal and the vertical axes. This request is called from the IlogInterfacedoRequest method. The submission of the initial request and creation of the panes handling classes is performed by the constructor of IlogInterface.

Data exchange between a client and the servlet

Client's messages and servlet's responses are described in Table 1. Sub-requests like upd_seats (that is, update seats) are attached to their high-level requests.

Table 1. Client and servlet interaction
Message IDSub-message IDAttributesServlet’s responseNotes
init  maxNumX, maxNumYEmptyThis message is to set up number of items to display in vertical and horizontal directions
sel_project positionY
List of existing projects in the databaseThe postionY attribute is used for pagination and specifies virtual position within the PROJECTS table of the database
sel_seats  projectId, positionY List of seats for the selected projectThe project to process is specified via the user interface; positionY attribute is used for pagination
sel_seatsupd_seatsids[], counts[], availabilities[]emptyThis message is sent to update the database when some of seats data are changed by a user
opt_result calc_req projectIdSolution, engine variables, statisticsThis message submits the calculation process
opt_result sol_req positionY SolutionUsed for pagination
opt_result var_req name, positionX, positionY (optional) An engine variable’s values This message is used to navigate through the engine variables

Servlet sends data to a client in XML format as XML is simple to create and parse in Dojo.

Example of an XML message returned to the client
<solution objective="3054.3639800807873">
	<position title="Software engineer C++" index="0">
		<candidate index="0">
		John Doe
		</candidate>
		<candidate index="2">
		John Smith
		</candidate>
	</position>
	<position title="Software engineer Java" index="1">
		<candidate index="3">
		Jane Smith
		</candidate>
	</position>
</solution>

Wrapping of one-dimensional data is done by tag <d1> where values are separated by spaces. Multi-dimensional arrays are handled in the following way: each dimension is wrapped with tag dN, where N is a number of the dimension. For example, a two-dimensional array is wrapped as shown below. In this example, the XML message is a response to and upd_var request, requesting values of the compatibility variable.

<item name="compatibility" type="MAP_NUM">
	<d2>
	<d1>
	432.67874830187907 157.5568843778936
	987.0323098531833 294.0221485741561 
	</d1>
	<d1>
	441.28662423809857 606.1236490975754
	117.37416729043002 539.3803607400704 
	</d1>
	<d1>
	579.6263501729665 380.9399328362894
	826.7802763365257 65.91520976844086 
	</d1>
	<d1>
	859.9294772959771 944.3249593145671 
	857.7524070538518 443.84433586006287 
	</d1>
	</d2>
</item>

Servlet Architecture

The servlet architecture basically contains three parts: client request handling, DB2 interaction, and ILOG ODM interaction. The latter two have been described above. The architecture of the client requests handling is depicted below.

Figure 9. Client request handling architecture
Diagram of client request handling architecture

The IlogServiceDB class implements servlet functionality (for example, the HttpServlet interface) and is responsible for handling HTTP GET requests from the user interface client.

The ClientRequestsHandler class handles requests from a client while the client session is active. It dispatches requests to handlers, which are aggregated in the instance of the class by a TreeMap container.

ClientRequestSession is a base class for client request handlers. It establishes and holds an active connection to the database while a request is handled.

The InitRequest class receives the maximum number of items to display for dimensional data and stores these parameters in the static variables of the ClientRequestSession class.

The RequestProjects class retrieves data from the PROJECTS table and sends it to the client.

RequestSeats class does the same thing as RequestProjects for the SEATS table.

OptimizationResultsRequest handles requests to either perform the optimization or to get some of the optimization results. Each request type is processed by a corresponding method.

StatisticsLog prints messages to the specified output and attaches time stamps if needed.

The sequence diagrams in Figure 10 and Figure 11 show how client HTTP GET requests are processed. In Figure 10, the getAttribute method call is used to retrieve a session instance of the ClientRequestHandler class. (View a larger version of Figure 10.)

Figure 10. HTTP GET processing sequence
Diagram of HTTP Get processing sequence

The sequence diagram in Figure 11 shows how the calculation request is handled. The diagram also illustrates the interaction with ODME to perform the optimization. (View a larger version of Figure 11.)

Figure 11. Calculation request processing sequence
Diagram of calculation request processing sequence

Conclusion

The application was tested with different sets of test data, and the measured time characteristics are shown in Table 2.

The calculations always took more time than data extraction and transfer because, in the case of the deployment environment used for the measurements, ODME and the DB2 instance were located in different virtual machines but were hosted by a single physical server.

Table 2. Time statistics for the optimization process
Size (seats x practitioners)Transfer of data to the JVM memoryCalculation
4x4seatscompatibilitiespractitionersTotal 
query execution 00:00,000 00:00,000 00:00,015 00:00,015 00:00,063
data transfer 00:00,000 00:00,000 00:00,000
100x100seatscompatibilitiespractitionersTotal 
query execution 00:00,016 00:00,015 00:00,015 00:00,234 00:00,594
data transfer 00:00,000 00:00,188 00:00,000
1000x1000seatscompatibilitiespractitionersTotal 
query execution 00:00,015 00:00,062 00:00,078 00:10,812 01:35,344
data transfer 00:00,063 00:10,579 00:00,015

Each component of the application along with its main-purpose features provides abilities for integration into a single environment. Due to this fact developers have an ability to build a complex distributed infrastructure in simple and straightforward way. Eclipse IDE can be used as a unified development framework for each of the components. Bundled with WASCE web container it provides powerful capabilities for debugging of Java code.

The article presented a web application that retrieves data from a remote database and uses ILOG CPLEX to perform an optimization.


Downloads

DescriptionNameSize
Client sample source codetonkawa_iwidget_jscript.zip9.71KB
Server sample source codetonknawa_src_java.zip10KB

Resources

Learn

Get products and technologies

  • ILOG ODM Enterprise 3.4: Download a trial version of ILOG ODM Enterprise 3.4.
  • IBM trial software: Evaluate IBM software products in the method that suits you best. From trial downloads to cloud-hosted products, developerWorks features software especially for developers.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into IBM collaboration and social software on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Lotus
ArticleID=621380
ArticleTitle=Smarter collaboration for the education industry using Lotus Connections, Part 2: Optimize assignment of students to research projects
publish-date=02082011