March 22, 2012 | Written by: Fang Feng
Read also Part 1 and Part 2 of this interview.
FF: How do you set up your firewall between your portal and your front-end server? Normally we have our portal behind a firewall, separated from the front-end web server, and another firewall separating the portal and the back-end systems.
PK: You can set up DMZs, and you can set up firewall layers. You can also use mechanisms at the operating system level, such as iptables, to allow connections only from certain IP addresses.
You could set up firewalls just as you would set up a DMZ on-premises, with your web server in one layer and maybe a reverse proxy in another; you can set up that entire DMZ infrastructure using the cloud services.
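The idea of allowing only certain IPs through can be illustrated with a small allow-list check. This is a minimal Python sketch of the concept only, not an iptables configuration; the networks listed and the name is_allowed are made up for illustration.

```python
import ipaddress

# Hypothetical allow-list: one internal subnet and one single host.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.1.0/24"),
    ipaddress.ip_network("192.0.2.15/32"),
]


def is_allowed(source_ip: str) -> bool:
    """Return True if the source address falls inside any allowed network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


print(is_allowed("10.0.1.7"))   # inside the allowed subnet
print(is_allowed("10.0.2.7"))   # outside every allowed network
```

A firewall layer applies the same kind of rule at the network level, before a request ever reaches the portal server.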
FF: Oh okay. Thanks for the explanation.
PK: So in that sense, you can protect your portal server, which typically you wouldn’t have. We are discussing this in the context of infrastructure as a service (IaaS), right?
We mentioned that at least from the images we provide, a single machine comes up, and that is your web server, your database, and your portal server.
So there is not really very much flexibility in breaking that image apart and inserting a firewall between your web server and your portal server because it is one image.
But that’s when you start talking about more advanced techniques for using even the existing stuff we have today. There’s a concept of image partitioning, where potentially I launch three portal server AMIs or three portal server images on IBM SmartCloud and – this would be a manual kind of deployment – activate only certain components: only the web server on one of them, only the portal server on another, and only the database component on the third.
And so there, through manual setup, you start to see a platform of WebSphere Portal rather than just the infrastructure. But that’s going to be manual.
FF: Is such a configuration still in development? Is there anything like that which is mature among the cloud offerings?
PK: It depends on your definition of mature. If mature means an entire stack provided by IBM that has the security and the firewall, then no, there’s nothing. We provide the base image, and that’s useful for certain things.
For development and test, it’s very useful to have a single instance that you can trigger and start up in five minutes, and have it configured just as you would have every other developer’s system configured. And maybe it doesn’t have to host more than one or two requests concurrently, right?
So in that case we have it. We have that solution; if you want to call that mature, we can call it mature.
Now that can be interesting, but it gets more interesting trying to host multiple concurrent users in a production type of system.
For that, we can go out to some of our Business Partners, who will use our base images in a manner similar to what I described, and they will build the system up based on your business requirements, because it’s unlikely that two different customers will come in and say they want the exact same setup.
We might be able to generate a couple of templates. We’ll provide a base; a large number of our Business Partners are starting to really take advantage of the images and doing something more complex than just infrastructure service with them, including content generation.
Some content comes with the out-of-the-box Web Content Management (WCM). And, you know, that’s useful for doing demonstration types of things, but nobody’s going to use our out-of-the-box content for their production site. It’s up to our Business Partners or customers to generate content.
We would say, oh by the way, we’ll host it for you. All you’ll have to do is get a monthly bill from us and we’ll consolidate however many image-hours you have. We’ll consolidate whatever it costs us to do the content management.
So it’s a good play for Business Partners to get into, because they can have value to a customer by saying: “You don’t have to know anything about portal, or WCM, nothing. You just tell me about your marketing plans and all of these other things that you would like to have surfaced and we’ll host it, and, by the way, this is how much the cost will be.”
FF: The next question is another one about the public cloud offering. Enterprise users often want their single sign-on (SSO) solutions to work with their other enterprise applications, using a corporate directory like IBM Bluepages. After you log in to IBM Bluepages, you want to go to Amazon to access this portal without being challenged again. How does this federated SSO work in this scenario? Is there any API available?
PK: Is there an API? I mean, you probably know, just based on your background and security knowledge, about the levels of single sign-on (SSO) that Portal and WebSphere Application Server can support.
Now the reason I’m pausing is because some function is coming in version 8 that’s related to single sign-on with Internet social services (OpenID and Facebook integration).
In that sense, integrating with other providers and having federated single sign-on solutions with them, using protocols like OAuth, OpenID, and so on, to implement enterprise-level single sign-on, is one of those things that we’ll have to work on together if people are interested in seeing an SSO solution. Everyone is facing the same issue.
FF: Yes. On LotusLive, they have SAML and its token. So we can basically use the same thing in this portal environment?
PK: Does portal accept SAML tokens? I don’t think it does. SAML assertions, it might. So that would be dependent on the underlying WebSphere stack.
FF: Right, but can this be implemented as a new Trust Association Interceptor (TAI)?
PK: Yes, yes, yes. You would have to have something built in WebSphere Application Server to handle an SSO token of some sort. Portal’s not going to implement the acceptance of SAML tokens if WebSphere isn’t going to accept them. So we rely on WebSphere security.
It comes down to whatever WebSphere Application Server can do, and whether it can accept SAML tokens. This is where I’m drawing a blank on whether it can or not; I want to say it would be able to, but I can’t say that for certain.
FF: The OpenID TAI is written by portal development, not by WebSphere.
PK: Yes. That’s true. That is true.
FF: We can still have some flexibility and develop our own TAI in a sense. Correct?
PK: Yes, you could. So there are ways to do it, but, again, every situation might be different depending on what you’re integrating with and on what role Portal is playing, whether it’s the identity provider or just a service provider.
There’s nothing that we do today out-of-the-box, outside of the normal single sign-on stuff that WebSphere provides.
FF: Okay. So there are still a lot of opportunities for Portal development or even for Business Partners.
PK: Yes. That’s true.
FF: I saw that you had an Open Mic early this year. You spent quite a few slides talking about cloning and farming in the Amazon cloud. Is that possible in IBM’s own offering or is that something we still have in development?
PK: Farming isn’t anything magical and it’s not even directly related to the cloud. You can do a farm on-premises with a couple of your own Linux boxes that are maybe 10 years old.
You have Linux installed on all of them and farming is just another way of doing WebSphere clustering that’s not related to management of nodes. So there’s no deployment manager to manage all of your cluster members. There’s no node agent that runs with your JVMs that are hosting WebSphere portal.
It’s basically a bunch of individual portal server nodes that share the same database domains. So there’s a single database server or multiple database servers but each one of these farm clients is accessing the same databases for the release and other domains.
So they’re tied together in that sense but they’re not cluster members as they would be termed in WebSphere terms.
They’re really just identically configured portal server instances, standalone JVMs, and then in front of them we put a web server, IBM HTTP Server or whatever, with the web server plug-in.
And then the plug-in is manually updated to say “okay, it’s a cluster,” and it’s treated like a cluster from the WebSphere plug-in perspective.
So all the things, such as Session Affinity, are honored; all of those things happen with the farm but we just have to update the plug-in manually with all of our individual farm members.
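To make the manual plug-in update more concrete, here is a rough Python sketch that generates a cluster definition listing each farm member. The element and attribute names (ServerCluster, Server, CloneID, Transport, PrimaryServers) are written from memory of a typical plugin-cfg.xml and should be treated as assumptions to check against a file generated by your own installation; the member names, clone IDs, addresses, and port are all made up.

```python
import xml.etree.ElementTree as ET


def build_farm_cluster(cluster_name, members):
    """Build a ServerCluster element listing every farm member.

    members: list of (server_name, clone_id, hostname, port) tuples.
    Element and attribute names are assumptions modeled on a typical
    plugin-cfg.xml, not a verified schema.
    """
    cluster = ET.Element(
        "ServerCluster", {"Name": cluster_name, "LoadBalance": "Round Robin"}
    )
    primary = ET.Element("PrimaryServers")
    for name, clone_id, host, port in members:
        # The clone ID is what lets the plug-in route a session back
        # to the member that created it (session affinity).
        server = ET.SubElement(cluster, "Server", {"Name": name, "CloneID": clone_id})
        ET.SubElement(
            server, "Transport", {"Hostname": host, "Port": str(port), "Protocol": "http"}
        )
        ET.SubElement(primary, "Server", {"Name": name})
    cluster.append(primary)
    return cluster


# Hypothetical farm members for illustration only.
farm = [
    ("farm_member_1", "clone1", "10.0.0.11", 10039),
    ("farm_member_2", "clone2", "10.0.0.12", 10039),
]
cluster = build_farm_cluster("PortalFarm", farm)
print(ET.tostring(cluster, encoding="unicode"))
```

Adding or removing a farm member then amounts to regenerating this fragment and merging it into the plug-in configuration that the web server reads.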
The Portal farming setup we did on Amazon cloud can also be done on-premises, or on IBM SmartCloud. All that was involved was some scripts that start and stop Portal at the right time based on system load.
There’s a function built into the portal server configuration tasks called enable-farm-mode that allows you to create a farm more easily.
We cannot go into too many details on what enable-farm-mode does. Again, back to Amazon. The reason we did it on Amazon was to take advantage of its identically configured nodes, which suit the farming architecture well. When I say identically configured, you can naturally conclude that virtualized systems might be very well suited for farming, because you have a virtualized image that represents some state of Portal run-time and configuration.
But I can launch that instance, that image, five or six times and I have identically configured machines. And one of the characteristics of a farm is that in every node, Portal is identically configured.
So it’s a fit. It is very well suited to virtualized environments. And along those lines, the benefit is that instead of having to increase my cluster size by federating new secondary nodes, I can very quickly bring up an individual portal server node and have my Java Virtual Machine (JVM) running within five or six minutes, as opposed to the time it might take to federate that node into a formal WebSphere Application Server cluster.
It allows the cluster size to grow and shrink more quickly. So the farm can grow and shrink much faster than a WebSphere Application Server cluster can.
Amazon has some functions that allow us to monitor and to trigger under specified conditions, so that we can automate the addition or deletion of farm nodes.
And I think probably a year and a half or two years ago we went down that path. But that demo was done on Amazon to showcase not only farming, which was an alternative type of clustering, but also to show how virtualized environments can really help with farming and how they kind of play in together. At the same time, they’re not tied together where you have to be in the cloud to be a farm.
FF: Do we have similar auto-scale on our own cloud offering?
PK: We don’t have anything called auto-scale but you can accomplish the things that auto-scale does. Really, auto-scaling is just monitoring, hitting a threshold, and then firing a trigger.
So you could do that with a couple of shell scripts that execute the “top” command and check CPU and memory utilization, and then do a little bit of calculation to determine that, say, the CPU has been at 80 percent for two and a half minutes. At that point, go do something else, and that something else could call into the IBM SmartCloud APIs and launch a new instance of this specially configured farm client.
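The monitor, threshold, trigger loop can be sketched in a few lines. This is a minimal Python illustration rather than the shell scripts from the demo; the class ThresholdTrigger and the function launch_farm_client are hypothetical, and the launch step is a placeholder where a real script would call the cloud provider’s API.

```python
from typing import Callable, Optional


class ThresholdTrigger:
    """Fire an action once a metric stays at or above a threshold long enough."""

    def __init__(self, threshold: float, sustain_seconds: float,
                 action: Callable[[], None]) -> None:
        self.threshold = threshold
        self.sustain_seconds = sustain_seconds
        self.action = action
        self._breach_started: Optional[float] = None  # when the current breach began

    def observe(self, timestamp: float, cpu_percent: float) -> bool:
        """Record one sample; return True if the action fired on this sample."""
        if cpu_percent < self.threshold:
            self._breach_started = None  # load dropped, start over
            return False
        if self._breach_started is None:
            self._breach_started = timestamp
        if timestamp - self._breach_started >= self.sustain_seconds:
            self.action()
            self._breach_started = None  # reset so we do not fire on every sample
            return True
        return False


launched = []


def launch_farm_client() -> None:
    # Hypothetical placeholder: a real script would call the cloud API here.
    launched.append("farm-client-%d" % (len(launched) + 1))


# 80 percent CPU sustained for two and a half minutes (150 seconds).
trigger = ThresholdTrigger(threshold=80.0, sustain_seconds=150.0,
                           action=launch_farm_client)
for t in range(0, 181, 30):  # one sample every 30 seconds
    trigger.observe(float(t), 85.0)
print(launched)
```

Resetting after each firing gives the new farm client time to absorb load before another launch is considered; a production script would likely add an explicit cool-down period as well.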
And then that new instance would be booted. We have scripts that, as I said, do the special configuration; they’re boot-up scripts that say: “All right farm client, now that your operating system is up and running, connect up to my shared file system that contains the portal server stuff and everything you need to start a JVM.”
And there’s another step that happens after the portal server starts up: it communicates with the web server that’s front-ending this whole farm.
And all of these are scripts, which are available from the wiki – we published that Amazon farming demo on our IBM wiki. We included a samples.zip file that has all of those scripts.
There’s a script to check for updates for the web server to say “I have a new farm member.” We have that written out.
There are scripts to determine what to do when you start up a new farm client. And these farm clients are just operating system based images that can communicate to the file server through NFS. That script might be a couple of mount commands. So we have scripts available on the wiki.
FF: Do all those instances or virtual machines basically already exist there, and when you need them, you trigger them to run?
PK: You can do it a lot of ways. The way we did it in the demo was that those images weren’t even in existence and you weren’t charged for them. That’s one of the big pieces.
Because they aren’t even booted up, you’re not charged for them until the auto-scale or your monitoring scripts determine that you need them.
Then it will boot up. You’ll start to incur infrastructure charges or even licensing charges depending on which model you’re on. But then the other side of that trigger is now saying: “I’ve now gotten farm members and my load has all of a sudden gone to nothing. I don’t want to be charged per hour for ten VMs, so shrink them back down.” So you can just very quickly kill off these farm members instead of having to defederate them as you would from a WebSphere Application Server cluster, because managing clusters is a very administration-intensive operation.
There are benefits to using Portal farming compared to a cluster architecture, especially when you’ve got high volatility in your load. In farming mode, the number of Portal nodes can grow and shrink depending on resource usage, which suits highly volatile environments and systems.
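The grow-and-shrink decision itself can be reduced to a small function. The following Python sketch is illustrative only: the function name desired_farm_size, the 80 and 20 percent thresholds, and the size limits are assumptions, not values from the demo scripts.

```python
def desired_farm_size(current: int, avg_cpu: float,
                      high: float = 80.0, low: float = 20.0,
                      minimum: int = 1, maximum: int = 10) -> int:
    """Return the farm size to aim for, one step at a time.

    Add a member when average CPU is at or above `high`, remove one when it
    is at or below `low`, and always stay within [minimum, maximum].
    """
    if avg_cpu >= high:
        target = current + 1
    elif avg_cpu <= low:
        target = current - 1
    else:
        target = current
    return max(minimum, min(maximum, target))


print(desired_farm_size(3, 90.0))  # busy farm: grow by one
print(desired_farm_size(3, 10.0))  # idle farm: shrink by one
```

Changing the size by one member per evaluation keeps the farm from oscillating wildly when the load is volatile, and the floor of one member keeps the site reachable when load drops to nothing.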
FF: So is farming actually based on the new WebSphere Application Server model, or is it purely a Portal thing?
PK: It’s only “certified” for use in Portal and there are several different farm models that you can choose from.
Back in the Portal 6.1 days, you could do a Portal farm. It was a little bit trickier because of some database domain dependencies.
You had to do individual installations for each of your farm members, and that meant you had to have dedicated release domains for each of your farm members.
So the newer way that we actually try to steer customers towards is the shared file system farming pattern, where you have a single installation and you have all of your farm clients communicating through Network File System (NFS) to that shared installation and the shared databases.
That way is much simpler. And so, I would not recommend farming unless you’re on Version 7 because that’s where we officially support the shared file system model.
In order to accomplish file sharing, we had to go to the WebSphere development team and tell them we wanted to do this new model of sharing the WebSphere Portal profile (wp_profile), because sharing that profile across multiple machines wasn’t a supported type of operation.
We found a way, and Marshall (Lamb) did this a while ago; we went to the WebSphere Architecture Board and told them what we wanted to do.
We said yes, we understand that there are certain things that will have contention if you’re trying to have multiple machines access the WP profile and potentially even write to it.
And from that discussion at the Architecture Board, they approved it for us to start looking at the shared file system model in Version 7.
And that’s where our enable-farm-mode task came from, because, if you’re familiar with the WP profile, it has things in there such as configuration files, temp directories, and log files.
All of a sudden, if you have ten different JVMs sharing that same file system you’re going to have problems when it comes time to writing log files.
The configuration task enable-farm-mode offloads some of those directories, some of those sensitive directories. We call those mutable directories, and the local JVMs can write their log files somewhere on the local system instead.
But, to do that, we have to reconfigure the main profile to say “okay, don’t write logs under wp_profile anymore.” They will be written to another directory, maybe called system temp.
So we call it system temp, and that is where we’re offloading some of those mutable directories, some of those temp file directories, and some of those places where we hold compiled JSPs, for example.
All of those things that need to be accessible on each JVM, and need to be written by the JVM, are configured in the wp_profile, and the shared files then point to somewhere on the local system on each farm client.
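The redirection of mutable directories can be pictured as a path-mapping rule. This Python sketch is only a conceptual illustration and does not show what enable-farm-mode actually does internally; the directory names in MUTABLE_DIRS, the mount point, and the local root are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical top-level directories under the shared profile that each
# farm client must be able to write to on its own.
MUTABLE_DIRS = ("logs", "temp")


def local_path_for(shared_profile: str, path: str, local_root: str) -> PurePosixPath:
    """Map a path under the shared profile to where a farm client should use it.

    Paths inside a mutable directory are redirected under local_root on the
    farm client; everything else stays on the shared file system.
    """
    target = PurePosixPath(path)
    relative = target.relative_to(PurePosixPath(shared_profile))
    if relative.parts and relative.parts[0] in MUTABLE_DIRS:
        return PurePosixPath(local_root) / relative
    return target


profile = "/mnt/portal/wp_profile"     # hypothetical NFS mount of the shared profile
local = "/var/portal/systemtemp"       # hypothetical local directory on each client
print(local_path_for(profile, profile + "/logs/SystemOut.log", local))
print(local_path_for(profile, profile + "/config/cells/cell.xml", local))
```

With a rule like this, ten JVMs can read one shared configuration while each writes its logs and temporary files to its own disk, which avoids the write contention described above.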
So it took a little while and that’s, I think, originally why WebSphere didn’t support sharing those profile directories, because as soon as two people tried to write to them at the same time, you got into trouble.
So we said okay we just don’t support that. And we asked “well if we did this and we were able to massage the configuration a little bit and massage the locations of certain things would you allow it?” And they said okay yes. And so the server farm was born.
FF: So will there be any more improvement in Portal Version 8 in this area?
PK: I don’t know how much improvement there will be. There’s not much to improve. What we’re going to do is a little bit more testing on more complex scenarios.
There are certain things that farming doesn’t allow you to do because it’s not a managed node. You’ve got caches that might exist in one JVM and maybe that cached content was overridden by an authoring or a syndication event. How does that non-managed DynaCache instance on farm Node 6 know that the content got updated? There’s no way of triggering that.
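The stale-cache problem can be shown with a toy model. This Python sketch does not represent DynaCache or any Portal mechanism; the class FarmNodeCache and the version-check variant are hypothetical, and the version check is just one generic way a standalone node could notice an update.

```python
class FarmNodeCache:
    """Per-JVM cache with no farm-wide invalidation signal."""

    def __init__(self, store):
        self.store = store    # shared content store: key -> (version, value)
        self.entries = {}     # this node's local cache: key -> (version, value)

    def get(self, key):
        """Serve from the local cache, loading from the store only on a miss."""
        if key not in self.entries:
            self.entries[key] = self.store[key]
        return self.entries[key][1]

    def get_checked(self, key):
        """Variant that compares versions with the shared store before serving."""
        version = self.store[key][0]
        cached = self.entries.get(key)
        if cached is None or cached[0] != version:
            self.entries[key] = self.store[key]
        return self.entries[key][1]


store = {"home": (1, "Welcome")}
node1, node6 = FarmNodeCache(store), FarmNodeCache(store)
node6.get("home")                      # node 6 caches version 1
store["home"] = (2, "Welcome back")    # authoring or syndication update
stale = node6.get("home")              # node 6 still serves the old content
first_load = node1.get("home")         # node 1 never cached it, so it sees the update
fresh = node6.get_checked("home")      # the version check picks up the change
print(stale, "|", first_load, "|", fresh)
```

The trade-off is that every checked read costs a trip to the shared store, which is part of why this kind of coordination is harder without managed nodes.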
So there are considerations. We have a bulleted list of things that aren’t really well suited for the farming topology. It’s very easy for you to get a single installation, and grow and shrink; you get all these benefits but at some cost. There are certain things that farming is not well suited for. Those are the things that we’re trying to understand a little bit better and come up with procedures for.
There are ways around that cache invalidation issue today. But what we’re looking at is more along the lines of how you create golden topologies in a farm, how you do maintenance in a farm, how you establish 24×7 continuous availability in a farm, how you handle small updates and the ripple effect to the farm members, and how you send those signals out to each of those JVMs to indicate something needs to be reloaded.
A lot of those things we’re working on in Version 8 as far as use cases. So we may not see improvements necessarily in the product but we might see improvements in the documentation around scenarios of farming.
That’s where I see a lot of the improvement. Whatever errors we found in version 6.1 and 7 regarding farming, we’ll try to fix them and make the function work correctly in version 8.
FF: Do you see any benefits in using the cloud computing model for portal release cycles? You know, when we use the Release Builder, we always say you need to have a pipeline of systems, from development and testing, to integration or quality assurance, to staging, and then to production, to create various phases to push out your updates. Would portal in the cloud benefit this release procedure, or is there anything else you can really benefit from in the cloud?
PK: The benefit that you will get from cloud would be the same benefit that you would get from virtualization. The pattern of going from development and staging and everything to production is still likely to be almost the very same process.
The benefit, though, comes with virtualization: you can take snapshots and create your own kind of frozen state, and with that get consistency across those phases. You can have certain images for development and some for staging. The systems are captured at the image level. This approach can prevent a lot of errors during configuration changes, because when the image is messed up, it can be thrown away and you can start over with the same initial image.
Virtualization helps you with that because you captured that state, that configuration change, and then it can be propagated across your different stages more effectively.
The benefits there? I wouldn’t say that’s a benefit of cloud though as much as it is just virtualization, the nature of capturing that state in the image.
It also helps with backing up and restoring. You know, if you have a production system and you apply 6.5 to it and all of a sudden your site is going down, but you had 6.4 captured in an image or a set of images that could be re-launched very quickly, then you’ve got a very quick rollback plan.
FF: So this really benefits the disaster recovery (DR) type of system, right?
PK: It benefits DR, yes, but again it’s not cloud-specific. It’s virtualization-specific.
FF: Virtualization, yes. Well the data is actually independent of these captured images right?
PK: Yes. Part of your operation would have to make sure that Portal configuration and file structure are preserved in the virtual machines. The corresponding data should be backed up separately. The VMs should be mapped to the database backups.
You know, you back up your database in a different manner than you might back up your file system. And your application data might have different requirements for backup as well.
So all of them have to be considered. You don’t get anything magical with the cloud that says all of a sudden you don’t have to worry about backups. You still have to worry about backups. And you still have to do backups correctly.
FF: Okay. Alright, last question. I hope this is it. Where will we be in five years – how are you going to grow Portal in the cloud environment? Will everything probably be in the cloud?
PK: We talked about infrastructure as a service already; we’re trying to move into platform as a service.
I see people caring less about what their topology consists of as a software stack. I think they care less about what components make up the software stack, whether that’s portal server, WebSphere, DB2.
I see them worrying less about those and worrying about just their business applications, worrying just about the things that they’re providing to their customers.
So if I’m a banker, I shouldn’t be a WebSphere shop and have 25 WebSphere developers or administrators and five people working on my banking application. I should have 25 people working on my banking application and maybe have five WebSphere administrators.
But ideally, make that one person who manages the deployment, and then that one person knows where to surface, where to access, and where to provide the banking application, and how to define it to manage the SLA, response times, and all those things. And then that person can say “okay deploy my application, my banking application.”
If your business is in a non-IT industry, your employees should concentrate on your own business and your customers, and not spend too much time tuning Portal, which should be done by cloud vendors, or by IBM and its Business Partners. Let IBM tune your WebSphere.
So for that it’s a shift because today we have a lot of people who know how to cluster WebSphere. And really if you can try to offload that and let those people focus on what your business is running on, that gets you towards some of the things that are growing in IBM Workload Deployer, like virtual systems.
To a certain degree, virtual systems help you abstract a little bit of that topology. You can draw five cluster members instead of managing and installing five cluster members, right?
There’s a gap to go from doing the native installation yourself and managing the clusters yourself.
But virtual systems is one aspect of that. Virtual applications is the next step above that where I’m not even worried about drawing five WebSphere clusters. All I’m working on is my application, is my banking application, my banking database.
And then I offload that somehow to some platform. I don’t even care what the platform is as long as it never goes down, as long as it has this response time, and as long as my data is safe and secure. As long as all these criteria are met, I don’t care how you host it or who hosts it.
And that is something that one of our vice presidents was talking about when addressing customers and explaining the complexity of IT, from mainframe, to the client/server model, and now to cloud. IT has become so complicated; companies have to hire so many IT workers. It’s really complex to manage all of these client/server types of machines. You have to know portal server, WebSphere, commerce: all of the pieces that make up your business servers. You must have all these people who know how to draw and connect all these pieces, when really, IBM is supplying all these pieces.
IBM should know how to draw those together. So why don’t we just create a stack for you that we can host whatever you want to host on it and we’ll take care of drawing the connections and we’ll give you that platform, whether that’s platform as a service in the cloud or platform as a service hosted somewhere.
That’s a big paradigm shift back to complete systems, though not necessarily the mainframe, as opposed to disparate components that we’re making customers push together.
So that’s kind of a generalized term but that’s where I see Portal kind of playing into the cloud. But it’s not just Portal. It’s Portal and everything else that’s built on top of Portal and then it’s everything that portal is built on top of.
FF: I can see the IT department of every major enterprise getting drawn into it, with their IT costs drastically reduced.
PK: Right. That would be great. People are not dropped and fired, but moved into contributing more to the business, contributing more to the relationship between a bank and its customers. You know, they’re in that mode now where they’re giving a good response time to the Web site, but they’re doing that at the cost of building their application and the things that their customers can take advantage of.
Maybe there’s a better allocation of those resources that can be gained by shifting some of this effort from systems management to application-centered things. Because that’s kind of where platform as a service and software as a service come into play in my mind.
FF: Great. Alright. I think this is a great discussion. Our readers will benefit a lot from your insights and expertise. Thank you very much.
About Paul Kelsey
Paul Kelsey works in the WebSphere Portal Server and Lotus Web Content Management Software Group Division at IBM, serving as the development lead for reducing total cost of ownership for Portal and WebSphere deployments. In this role, he explores virtualization, public and private cloud computing, and alternative multi-node and multi-tenant topologies to ensure customers get the best value from their hardware and software investments.