[cciug] Summary : License host problem

From: Philippe Chevalier (chevalie@art.alcatel.fr)
Date: Thu Jul 12 2001 - 06:46:52 EDT


Hello,

I posted a question here some time ago.

After nearly 5 months, here is the final answer from Rational about the
license_host file problem we have been experiencing.

In case anyone else has the same problem, here is the non-solution provided.

[Summary of the problem]

In /var/adm/atria/log/view_log on the view server, I see messages of the
following type:

02/23/01 12:40:06 view_server(12669): Error: Unable to open file
"/var/adm/atria/config/license_host": Resource temporarily unavailable.
02/23/01 12:40:06 view_server(12669): Error: You do not have a license to run
ClearCase.
02/23/01 12:40:06 view_server(12669): Error: Your license server is not
specified.
Create "/var/adm/atria/config/license_host" and put the license server
hostname in it.

There *is* a /var/adm/atria/config/license_host file present locally on
each of the view servers, and it contains the license server hostname. So why
can't view_server open it?

[Answer from Rational (customer support name removed)]

"What is happening is deeply linked to the way a ClearCase view server
works. This is a bit technical, but I am putting it here so you are
fully informed.

The issue is with the view_server code. There is one view_server
process for every view. This code uses a system call, fopen(), when
accessing some files on the view server machine; for instance, it
uses this call for reading the /var/adm/atria/config/license_host
file.

The fopen() call returns a socket number every time it is used by a
process. This is an 8-bit integer, giving a maximum of 256 simultaneous
connections to files for each view_server process.
The socket is not closed immediately after use; it is allowed to time
out (this timeout should be 2.5 minutes, i.e. 150 seconds). Engineering
appear to believe that something is stopping this timeout from occurring
when it should.

This form of the fopen() call is used extensively by the current
view_server code. In particular a new socket is created every time a
view accesses a different vob for a unique user. Imagine the
following: You have an integration view, used by three different
users, which is used to build your code for testing. Each build run in
the view accesses code in 20 different vobs as it builds. In that
case, if all three users do a build, 60 fopen() calls will have been
made, and 60 sockets will now be tied up for that view.

As you can see it is very easy for the maximum number of sockets to be
reached. Since the license_host file is the most frequently read file
with this method, it always seems to fail on license_host reads. There
are other possible scenarios, which I do not have space to list here.

This agrees with the way your failures occur: the reproducible
scenarios you identified were all cases where a view was being used
to access multiple vobs.

This issue has the defect number: CMBU00051524

Some points:

- You are not the only site experiencing this, but the number of
affected sites is very small. All use a very large number of vobs.

- The Solaris view server has always used this code, but we only
started seeing issues with it within the last couple of months.

---------------------------------------

That explains the 'why' but now we need to get to the 'What are
Rational doing about it'.

There has been a lot of discussion within Engineering about how to
address this. They have now identified a scheme that they believe will
remove the issue. It is obvious from looking at their discussions that
this is not an easy fix, and a lot of different technical issues have
to be considered.

They are still working on the code to do the fix, and will release
this in a patch. I have been given no indication of the time schedule
on this; the code changes are certainly not yet completed.

Two things I can guarantee are:
1) It will not be fixed in the patches due out at the end of this
month (July).
2) No patch will be issued for ClearCase 3.2.1. Version 3.2.1 is now
in its support discontinuance phase; no new patches will be produced
for this release. The only fully supported releases are 4.0, 4.1 and
4.2.

Given the second point, you must begin your migration plans for v4.x
now, if you have not yet started.

I have asked Engineering if there is -anything- we can do to help you.
Their response was disappointing: they suggest the most immediate
thing you can do is to avoid running scripts and builds that access
large numbers of different vobs, and especially to make sure that any
such builds or scripts that you must run are only ever run by the same
user.

Their second suggestion is that this is a Solaris-specific issue, so
they suggest that if you have any Unix machines that do not run
Solaris (e.g. HP-UX, AIX, Linux) you use them as the server for
the views that have this problem. Instructions for moving views
between different Unix OSes are in the ClearCase administrators
manual.

I cannot pursue this matter any further, Engineering have made their
decisions and I cannot over-rule them. If Alcatel need to go further
with this, it must be done via your Rational Sales and TechRep
contacts.

Once again, I would like to apologize on Rational's behalf for this
issue and the delays in our identifying and resolving it."

[End of forwarded message]
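
To put rough numbers on the explanation above, here is a small Python
model. It is only a sketch and not ClearCase code. It assumes, as the
reply describes, exactly one handle per distinct (user, vob) pair, a hard
limit of 256 handles per view_server, and a 150-second release. The
function names and the "one open every 2 seconds" load are made up for
illustration, and other files held open by view_server are ignored.

```python
# Illustrative model only; none of this is ClearCase code.
MAX_HANDLES = 256   # 8-bit handle, per the support reply
TIMEOUT_S = 150     # intended release timeout, per the support reply


def handles_needed(users, vobs):
    """Handles tied up if each user touches each vob once (one per pair)."""
    return users * vobs


def users_to_exhaust(vobs, limit=MAX_HANDLES):
    """Smallest number of users whose builds need more than `limit` handles."""
    if vobs <= 0:
        raise ValueError("vobs must be positive")
    return limit // vobs + 1


def peak_handles(open_times, timeout=TIMEOUT_S):
    """Peak number of handles held at the same time.

    open_times: seconds at which a handle is opened.
    timeout:    seconds after which a handle is released, or None if the
                release never happens (the suspected defect).
    """
    if timeout is None:
        return len(open_times)
    times = sorted(open_times)
    peak = 0
    start = 0
    for end, now in enumerate(times):
        # Skip handles that have already been released by time `now`.
        while start < end and times[start] + timeout <= now:
            start += 1
        peak = max(peak, end - start + 1)
    return peak


# Hypothetical load: one new handle every 2 seconds for an hour.
opens = list(range(0, 3600, 2))

print(handles_needed(3, 20))              # 60, the figure from the reply
print(users_to_exhaust(20))               # 13 users x 20 vobs = 260 > 256
print(peak_handles(opens))                # timeout working
print(peak_handles(opens, timeout=None))  # timeout never firing
```

Under these assumptions the peak stays far below 256 while the timeout
works, and climbs to the total number of opens (1800 here) when it does
not, which would match the "something is stopping this timeout" remark.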

So, we're stuck with our problem (some users can't compile) until we make our
migration *and* Rational releases a patch.

Great !

Philippe Chevalier

-- 
==========================================================================
Philippe Chevalier             
Tel: 01 55 66 41 64            Alcatel/Mobile Phone Division (DT/SOM)
Fax: 01 55 66 33 37            32, avenue Kleber 92707 COLOMBES  - FRANCE
 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  


