4 replies Latest Post - ‏2012-10-03T21:46:28Z by SystemAdmin
Tucks
78 Posts

Pinned topic Missing data values for programmer (-Y) output in mmlsfileset

‏2012-10-03T08:23:58Z |
I'm writing some Python code around GPFS for various reasons.
One of the things I'm doing is querying filesets and building an object representing each fileset.

Example output of the 'programmer' output of mmlsfileset:


mmlsfileset gpfs -L -Y | head -n 3
mmlsfileset::HEADER:version:reserved:reserved:filesystemName:filesetName:id:rootInode:status:path:parentId:created:inodes:dataInKB:comment:filesetMode:afmTarget:afmState:afmMode:afmFileLookupRefreshInterval:afmFileOpenRefreshInterval:afmDirLookupRefreshInterval:afmDirOpenRefreshInterval:afmAsyncDelay:reserved:afmExpirationTimeout:afmRPO:afmLastPSnapId:inodeSpace:isInodeSpaceOwner:maxInodes:allocInodes:inodeSpaceMask:afmShowHomeSnapshots:afmNumReadThreads:afmNumReadGWs:afmReadBufferSize:afmWriteBufferSize:afmReadSparseThreshold:afmParallelReadChunkSize:afmParallelReadThreshold
mmlsfileset::0:1:::gpfs:root:0:3:Linked:%2Fmnt%2Fgpfs:--:Tue Dec  7 20%3A22%3A21 2010:-:-:root fileset:off:-:-:-:-:-:-:-:-:-:-:-:-:0:1:46630658:46637056:0:-:
mmlsfileset::0:1:::gpfs:vfx-bloc_party-kettling:1:29699512:Linked:%2Fmnt%2Fgpfs%2Fprojects%2Fb%2Fbloc%5Fparty%2Fkettling%2Fvfx:0:Mon Oct  1 17%3A28%3A02 2012:-:-:vfx-fec723b7-da17-4ae6-9868-4083b544daa1:off:-:-:-:-:-:-:-:-:-:-:-:-:0:0:0:0:0:-:


This outputs 42 optional values per fileset (according to the column headings).

I've noticed that my filesets do not have values for afmNumReadGWs through afmParallelReadThreshold - which is fine as they're not AFM filesets.
At the moment, only the first 36 columns are populated per fileset, rather than a value for every column.

Surely the output on each fileset should have a :-: for each of these missing columns?
Also, it would be absolutely marvellous if all commands could support the -Y option. mmlspool is an example of a command which does not.

P.S. marc_of_gpfs asked if any items are missing from the API to aid with building GUIs. I'd argue that the output of all of these commands (mmlspool, mmlsdisk, mmdf, mmlsfileset etc.) should be available via the API. Otherwise, we're just left with screen-scraping shell commands, which is somewhat 'dirty'.
Updated on 2012-10-03T21:46:28Z at 2012-10-03T21:46:28Z by SystemAdmin
  • SystemAdmin
    2092 Posts

    Re: Missing data values for programmer (-Y) output in mmlsfileset

    ‏2012-10-03T17:28:30Z  in response to Tucks
    Disclaimer: This is a non-official answer.

    I think the convention is that any missing column values should be taken as NULL.

    Your code that parses the -Y output from a command should make sure it "recognizes/matches" at least a prefix of the first/header record with the column names.

    That will give you forward compatibility, up to such a time where IBM changes the header record in some way other than simply appending new column names.
    If the column names should happen to change, that is a signal that your code needs to be updated.
    For IBM's own selfish reasons IBM wouldn't want to do that capriciously - but it's a logical possibility that might come.

    Analogy: Apple recently gave up the old 30-pin iPod connector...

    Regarding mmlspool. One can ask, but for starters, I don't see 'mmlspool' in the GPFS pubs.

    See also: `grep statfspool /somewhere-on-your-system/include/gpfs.h`
    • Tucks
      78 Posts

      Re: Missing data values for programmer (-Y) output in mmlsfileset

      ‏2012-10-03T19:56:04Z  in response to SystemAdmin
      > marcofGPFS wrote:
      > I think the convention is that any missing column values should be taken as NULL.

      I'm pondering how to do that programmatically.
      It's quite easy to check if the header's column names have changed.
      It's relatively easy to check if the data passed in a column could be valid (not so easy depending what column it is).

      However, it's not so easy to check if you've received columns 1-9,11-36 (or 1-11,13-36 for that matter.)
      The same number of columns is received each time, but because the trailing columns aren't emitted as :-: (that is, the full set of columns isn't always present), you would never know which value belongs to which column, which makes parsing the column list almost impossible.

      > Your code that parses the -Y output from a command should make sure it "recognizes/matches" at least a prefix of the first/header record with the column names.
      > That will give you forward compatibility, up to such a time where IBM changes the header record in some way other than simply appending new column names.
      > If the column names should happen to change, that is a signal that your code needs to be updated.
      > For IBM's own selfish reasons IBM wouldn't want to do that capriciously - but it's a logical possibility that might come.

      So here's a thought.
      Taking the below output, is the HEADER:version actually used internally as true versioning?
      I.e. if the HEADER:version changed to 0:2, could I then raise a HeaderVersionNotSupported exception?

      
      mmlsfileset::HEADER:version
      mmlsfileset::0:1:
      

      > Analogy: Apple recently gave up the old 30 pin ipod connector...
      >
      > Regarding mmlspool. One can ask, but for starters, I don't see 'mmlspool' in the GPFS pubs.

      Indeed, it's not in the manual - and the google hit rate is somewhat low.
      But it's a damn sight easier than parsing mmdf. Us users will use things if they're there :-)
      So all commands in pubs are supported with -Y ? I shall double check.

      > See also: `grep statfspool /somewhere-on-your-system/include/gpfs.h`

      OK. So I should pay more attention to the header file.
      I can use inline C or I'll write an external API wrapper.
      Should be much quicker too.

      Bang in a big bit of Turbo Gears and we might have something of a GUI forming.
      • SystemAdmin
        2092 Posts

        Re: Missing data values for programmer (-Y) output in mmlsfileset

        ‏2012-10-03T21:09:07Z  in response to Tucks
        Perhaps I wasn't clear enough, or you are expecting more sneakiness than you're ever likely to see ;-)

        My guess is IBM will only ever add extra column names. They will never (almost never?) leave out some old names or re-order names.

        It's safe to just check that the header you expect is a proper prefix of the header you get.
        If one is not a prefix of the other the command has probably been radically changed rather than simply extended, and your program must be fixed (once in a blue moon if ever.)
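The prefix check described above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the `EXPECTED` list is a hypothetical subset of the columns a script might have been written against, and `header_is_compatible` is a made-up helper name, not anything GPFS provides.

```python
def header_is_compatible(expected, actual):
    """Return True if the expected column names are a prefix of the actual ones."""
    # Slicing never raises: if `actual` is shorter, the slice is shorter
    # than `expected` and the comparison is simply False.
    return actual[:len(expected)] == expected


# Hypothetical: the leading columns this script was written against.
EXPECTED = ["mmlsfileset", "", "HEADER", "version", "reserved", "reserved",
            "filesystemName", "filesetName", "id"]

# A shortened header record in the style of the sample output.
header_line = ("mmlsfileset::HEADER:version:reserved:reserved:"
               "filesystemName:filesetName:id:rootInode:status")
actual = header_line.split(":")

compatible = header_is_compatible(EXPECTED, actual)
```

Columns appended after `id` do not affect the result; a renamed, removed or re-ordered column within the expected prefix makes the check fail.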

        That said, if you want to be a little more future proof as well as safe ...
        For each/any column name you're interested in, scan the header record for that name to dynamically determine its position (index) in the subsequent records.
        If the index value is beyond the end of a particular record, then that record is indicating a NULL value for said column.

        Clear enough?
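A rough Python sketch of the by-name lookup suggested above, for what it's worth. Several things here are assumptions rather than documented behaviour: the helper names `column_index` and `get_value` are invented, values are taken to be percent-encoded (as `%2F` and `%3A` in the sample output suggest), and a trailing `:` on a data record is treated as a terminator rather than as one more empty column.

```python
from urllib.parse import unquote


def column_index(header_line):
    """Map each column name in the HEADER record to its position."""
    index = {}
    for position, name in enumerate(header_line.rstrip("\n").split(":")):
        # Names such as "reserved" repeat; keep the first position only.
        index.setdefault(name, position)
    return index


def get_value(index, record_line, name):
    """Return the decoded value of column `name`, or None if the record is too short."""
    fields = record_line.rstrip("\n").split(":")
    # Assumption: the final ':' terminates the record, so drop the empty
    # string that split() leaves behind it.
    if fields and fields[-1] == "":
        fields.pop()
    position = index[name]
    if position >= len(fields):
        return None  # column lies beyond the end of this record: NULL
    return unquote(fields[position])


# Shortened header and record in the style of the sample output.
header = ("mmlsfileset::HEADER:version:reserved:reserved:filesystemName:"
          "filesetName:id:rootInode:status:path:parentId")
record = "mmlsfileset::0:1:::gpfs:root:0:3:Linked:%2Fmnt%2Fgpfs:"

index = column_index(header)
path = get_value(index, record, "path")
parent_id = get_value(index, record, "parentId")
```

In this sample the record stops after `path`, so `parent_id` comes back as `None` while `path` decodes to a normal filesystem path.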
      • SystemAdmin
        2092 Posts

        Re: Missing data values for programmer (-Y) output in mmlsfileset

        ‏2012-10-03T21:46:28Z  in response to Tucks
        > Taking the below output, is the HEADER:version actually used internally as true versioning?
        > I.E. If the HEADER:version changed to 0:2 then I can raise an HeaderVersionNotSupported exception() ?
        >
        > mmlsfileset::HEADER:version
        > mmlsfileset::0:1:
        


        Most of the time, when extra -Y output needs to be produced, it'll be in the form of data in additional columns. If your script is smart enough to ignore columns beyond those that it knows about, it'll continue to work. We do reserve the right to change -Y output format for a given command in a more radical fashion, to a point where an existing parsing script may get broken. If that happens, the 'version' field in the header will be incremented, to warn parsers about the change. So throwing an exception in your script would be appropriate.

        yuri
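A minimal Python sketch of the version check discussed in this thread might look like the following. It rests on assumptions: the version is read from the data record at the position of the `version` column in the header, the `0:1` in the sample output is taken to mean version 1, and `HeaderVersionNotSupported` is the hypothetical exception name from the question, not something shipped with GPFS.

```python
class HeaderVersionNotSupported(Exception):
    """Raised when -Y output reports a version this parser does not know."""


SUPPORTED_VERSION = "1"  # assumption: "0:1" in the sample means version 1


def check_version(header_line, record_line):
    """Return the record's version, or raise if it is not the supported one."""
    header = header_line.rstrip("\n").split(":")
    fields = record_line.rstrip("\n").split(":")
    position = header.index("version")  # raises ValueError if the column is gone
    version = fields[position] if position < len(fields) else None
    if version != SUPPORTED_VERSION:
        raise HeaderVersionNotSupported(f"unsupported -Y version: {version!r}")
    return version


header_line = "mmlsfileset::HEADER:version"
version = check_version(header_line, "mmlsfileset::0:1:")
```

Running this check once per command invocation, before any column parsing, keeps a radically changed format from being silently misread.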