Topic
15 replies Latest Post - ‏2013-11-13T22:20:52Z by Mathias Mamsch
grasswistle
grasswistle
13 Posts
ACCEPTED ANSWER

Pinned topic Launching many scripts to run in parallel

‏2013-11-06T22:55:26Z |

Hello,

I'm currently, processing a large module and its links and their links and their links...  Think like O(n^5)... at least!  I basically, need to look deep in the hierarchy for particular types of objects and report them.  As a very predictable consequence, this takes what is known, in some refined circles, as A Very Long Time (TM). Let's say 60-80 hours.  

I was wondering if there would be a quick and easy way to loop through the range of object absolute numbers in the module (in 100 or 200 number intervals) and assign each chunk to a separate script.  Each script could run in parallel and report its own findings to its own flat text file.  I could then write one file to assemble all that information, once the exceedingly time-consuming part of the task is complete.

Problem is, I'm not quite sure how I could do that.

Anyone done this sort of thing before?

Grasswistle

  • Mathias Mamsch
    Mathias Mamsch
    1938 Posts
    ACCEPTED ANSWER

    Re: Launching many scripts to run in parallel

    ‏2013-11-07T09:00:35Z  in response to grasswistle

    Well ... easy: No ... But doable. Of course the question is if you want to do a quick hack for just one script or a general solution that can be reused I suggested something like a parallelization framework in another post. For a general solution I would be willing to contribute. 

    The general things to be done is here to:

    1. have a means of starting DOORS "worker" processes (e.g. in batch mode) to process the data parts
    2. serialize data to these worker processes. (i.e. store data to a file, have the data part read from the file)
    3. optional: report progress from the worker process to the starting process
    4. wait for and collect the results in the calling process.

    For robustness it would of course be good to be able to detect and handle errors in the worker processes and for development purposes it would of course be good, to be able to run a single worker script easily for debugging purposes.

    Regards, Mathias

    • grasswistle
      grasswistle
      13 Posts
      ACCEPTED ANSWER

      Re: Launching many scripts to run in parallel

      ‏2013-11-08T16:01:30Z  in response to Mathias Mamsch

      Of course, if there was a parallelization framework already existing, that would be ideal, but I don't think I could personally implement such a thing as I don't know how to do steps [1], [3] or [4] in DOORS at this point. 

      I think I may be able, with some guidance, to tackle a simplified version of the steps you have suggested.  I can always make it more fancy afterwards!

      1. have a means of starting DOORS "worker" processes (e.g. in batch mode) to process the data parts in parallel
      2. serialize data to these worker processes. (i.e. store data to a file, have the data part read from the file)
      3. have the calling process wait for the child processes to finish before exiting

      So, to start, do you have any ideas about how one could write any sort of program that would launch multiple DOORS scripts in parallel, would wait until they were all finished, then exit.

       

      I have attached an attempt, but it doesn't work.  I'm not married to using Ruby, so go with something else if you like.  I just wanted to give an idea of the proof of concept I'm after.

       

      When I run the ruby script, the dxl files are created, and a command window pops up, but then nothing happens.  If I close one command window, then the next one pops up...  and so on, and so on...

       

      Thanks for your help,

      Grasswistle

      Attachments

      Updated on 2013-11-08T19:37:14Z at 2013-11-08T19:37:14Z by grasswistle
  • llandale
    llandale
    2943 Posts
    ACCEPTED ANSWER

    Re: Launching many scripts to run in parallel

    ‏2013-11-07T15:21:46Z  in response to grasswistle

    Good luck with your parrellel processing, but you've got the only person who can possibly help you on your side.

    Your problem is memory exhaustion due to too many modules opened concurrently and DOORS is thrashing around mindlessly. 

    [1] You will need to keep track of which modules you have opened and have an algorithm to cleverly close ones you don't figure to need for a while.  In your case, since you are doing recursive link tracing, your algorithm will need to have a "stack" element, making sure you don't close any modules on the stack (i.e. those in the current link-chain).  I think that algorithm is achievable if you forget "cleverly" and just periodically "mindlessly" close all open modules not in the stack.

    With n=5 links, however, I would think such an algorithm will result in you opening and closing the same bottom-most modules multiple times which is basically the same kind of thrashing.

    [2] I would perhaps be tempted to open each module in your project individually and save relevant information about each Object and it's outlinks to some windows File.  Then run your clever recursive link algorithm on the files instead of the modules.  Files take far less memory than Modules, and opening-closing them far less time.

    [3] I would perhaps also be tempted to have link-summary text attributes in all the modules; where you read target modules and store relevant info therein.  You would update these attribute opening modules from bottom to top; which works so long as you know which modules are likely to be N=5 links away from your system modules. 

    My initial opinion is [1] is better than [2] is better than [3]. 

    -Louie

    Dang your eyes, I cannot now stop thinking of "clever" algorithms for [1]...

    • Know when memory is getting exhausted, and then close a lot of modules.
    • Do not close modules on the stack
    • Keep track of most recently opened modules
    • Keep track of most frequently referenced open modules
    • Close most of the modules not on the stack, prioritizing least recently and least frequently opened modules.
    • Yuuuuuck.
    • grasswistle
      grasswistle
      13 Posts
      ACCEPTED ANSWER

      Re: Launching many scripts to run in parallel

      ‏2013-11-07T22:57:04Z  in response to llandale

      I've actually written a series of functions already that manages opened modules.  Let's call them open module managers (OMMs).  The trick is you call read or edit in the program (as you usually would) and use one of the OMM functions to note that the module has been opened.  Then, instead of ever directly closing a module in the program, you call another OMM function that only closes the module when the number of calls to the close-OMM function for that module equals the number of calls to the open-OMM function for that module.  (See link if you are interested.  Suggestions and improvements always welcome!).  Interestingly, I still have issues with memory causing the program to crash every 700 objects (remember: the children!), meaning that I periodically have to restart the program to get through the 6000-object long module.  

      As you said, [1] is better than [2], which, in turn, is better than [3].  Unfortunately, I'm having this problem despite having already implemented a solution like the one you describe in [1]. Furthermore the way I see it, [2] and [3] might not even beget me very good time-economy as I would still need to figure out what all the children are at some point.

      I still see some sort of simulated parallel processing, such as what Mathias suggested as the way out.  

      Thanks for your ideas!

      Grasswistle

      • llandale
        llandale
        2943 Posts
        ACCEPTED ANSWER

        Re: Launching many scripts to run in parallel

        ‏2013-11-08T15:11:57Z  in response to grasswistle

        trackOpenModules.dxl  --snipped my initial reaction--

        Hard coding a max is a disaster.  Searching for strings is slow.  searching Skip lists is very fast.  If you will always adjust the count every time you open a module then you should put those two in the same function.  Something like this:

        //*********************
        bool fSkip_Increment(Skip upd_skpList, string KeyValue)
        { // Insert the KeyValue into the Skip list's 'Key'.
         // The 'Data' portion of the Skip is set to 1 for new inserts and is
         //    incremented for already existing 'Key's in the Skip.
         // 'Data' MUST be of type Integer.
         //   (represents the number of references).
         // Return whether this is a first insert for KeyValue into the Skip.

         bool IsNewValue = false

         int NumReferences = 0 // Remains zero if next "find" doesn't find anything:
         if(find(upd_skpList, KeyValue, NumReferences))
              delete(upd_skpList, KeyValue)  // Remove It
         else IsNewValue = true   // Mark it New
           // ReInsert new or updated counts
         NumReferences++
         put (upd_skpList, KeyValue, NumReferences)
         return(IsNewValue)
        } // end fSkip_Increment()

         


        Skip g_skpOpenCount = createString()  // KEY: 'string' NameMod_Full; DATA: 'int' Open Count

        Module addOpenModule(string NameMod_Full)
        {   Module m = read(NameMod_Full, false, true)
            fSkip_Increment(g_skpOpenCount, NameMod_Full)
            return(m)
        }

        bool  closeOpenModule(Module m)
        {   NameMod_Full = fullName(m)
            if (!find(g_skpOpenCount(NameMod_Full, OpenCount)) then we made a mistake somewhere
            OpenCount--
            if (OpenCount == 0)
            {  close(m)
                m = null
               delete(g_skpOpenCount, NameMod_Full)
               Closed = true
            }
            else
            {  put(g_skpOpenCount, NameMod_Full, OpenCount, true)  // "true" means "yes replace"
               Closed = false
            }
            return(Closed)
        }

        In any event you need to practice with Skip lists; they are VERY useful.

         getLinkedObjects.dxl

        • I notice no calls to addOpenModule() and closeOpenModule()
        • I think there is a problem with the combination of the "all" loop and the "read" statement (should it be "load"?).  Hopefully someone familiar with versioned links will please comment on that.
        • I don't think you <need> to keep counting the number of links; but I guess it doesn't hurt.
        • Hackles on the back of my neck are up vis-a-vis "getting the nth linked object" notion here; but I cannot think of a good reason for that.
          • Let me sleep on it. 
          • I don't see any "for lnk" loop inside the same "for lnk" loop; you got one in the "main" program perhaps?
          • I know I would <prefer> putting the other objects in a Skip, then looping through the Skip; but that doesn't mean this algorithm is "wrong".
        • Your comments are wrong vis-a-vis your "k" loop.  As coded, the loop will not execute even once since when you initially set k=0 then the break condition "k >=0" is already true.   Also, the reference to "k+1" inside the loop has just got to be wrong.
        • The comment code using your functions will result in you opening and closing the other module multiple times.  If that module is not open for "outObj" then it gets opened, Tasked, and closed.  If the next k+1 object is in that same module, then it gets opened and closed AGAIN.  So if this module links to that other module 100 times, then it gets opened and closed 100 times.  Eeeack.

        -Louie

         

        • grasswistle
          grasswistle
          13 Posts
          ACCEPTED ANSWER

          Re: Launching many scripts to run in parallel

          ‏2013-11-08T17:57:39Z  in response to llandale

          In any event you need to practice with Skip lists; they are VERY useful.

          You are absolutely right!  I use hashes all the time in other programming languages and immediately wanted an equivalent when I started using DOORS, but there was some finickiness to Skiplists in DOORS (like most things in DOORS, I feel) that I couldn't get the hang of.  Just this week I finally started using them with success and so it is very topical that you fix my awful work arounds at this point. =D Thanks!

          notice no calls to addOpenModule() and closeOpenModule()

          Where do you  mean?  In getLinkedObjects?  I didn't mean there to be any.  As shown in the examples (in getLinkedObjects.dxl), when using getLinkedObjects functions, I suggest that you also use the trackOpenModules functions to ensure you don't accidentally close modules that are still in use.

          I think there is a problem with the combination of the "all" loop and the "read" statement (should it be "load"?).  Hopefully someone familiar with versioned links will please comment on that.

          What problem do you think might arise?  Are you worried that one of the links might be a versioned link and that read would fail because it doesn't handle versioned links?  I'd love to learn more about this!

          I don't think you <need> to keep counting the number of links; but I guess it doesn't hurt.

          I think you are right.  I could save time by not counting the number of links, and if ever the nth link that is requested does not exist, execution will exit the loop and I could return null at that point.  Is that what you mean?  

          Hackles on the back of my neck are up vis-a-vis "getting the nth linked object" notion here; but I cannot think of a good reason for that.

          Maybe because you're afraid that the links won't be come up in the same order each time.  In fact, I have no reason to assume this is true other than my limited experiences in which it has held true.  Would love to hear a more informed opinion on this.

          I don't see any "for lnk" loop inside the same "for lnk" loop; you got one in the "main" program perhaps?

          I'm sorry.  I don't really understand what you mean here.  Could you maybe elaborate?

          I know I would <prefer> putting the other objects in a Skip, then looping through the Skip; but that doesn't mean this algorithm is "wrong".

          Would this sort of work-flow be more up your alley?

          Skip list = create

          getInlnkedObjs(list, o)

          for o1 in list {...}

          Your comments are wrong vis-a-vis your "k" loop.  As coded, the loop will not execute even once since when you initially set k=0 then the break condition "k >=0" is already true.   Also, the reference to "k+1" inside the loop has just got to be wrong.

          Remember, the conditional in a for loop is not a break condition...  it's like a like while condition.  It's true that it is silly to use k+1 inside a for-loop without ever using k.  Maybe I could just start k at 1.  :P

          The comment code using your functions will result in you opening and closing the other module multiple times.  If that module is not open for "outObj" then it gets opened, Tasked, and closed.  If the next k+1 object is in that same module, then it gets opened and closed AGAIN.  So if this module links to that other module 100 times, then it gets opened and closed 100 times.  Eeeack.

          Yeah..........  This was a first stab at solving the problem though so I feel it's acceptable.

           In the future, I'd like to make it so that when the number of close-OMM function calls equals the number of open-OMM calls, the function does not immediately close the module.  It removes it from the OpenModules Skiplist and registers it in another Skiplist, a ClosingPending Skiplist with a counter.  Every subsequent OMM call (for any module) increments all the counters in the ClosingPending Skiplist and if the counter of any module in this Skiplist reaches a given threshold, that module is closed and removed from the ClosingPending  Skiplist.  If an open-OMM call is received for that module before the threshold is reached, it is removed from the ClosingPending Skiplist and is re-registered in the OpenModules Skiplist.  

          What do you think?
           

          Thanks very much for your detailed reply!

          Grasswistle
           

          Updated on 2013-11-08T19:33:59Z at 2013-11-08T19:33:59Z by grasswistle
          • llandale
            llandale
            2943 Posts
            ACCEPTED ANSWER

            Re: Launching many scripts to run in parallel

            ‏2013-11-11T19:41:07Z  in response to grasswistle

            My mistakes:

            [1] Someone hacked my account and inserted that rediculous comment I made ..err.. they made about your "k" loops.  lol  Maybe I should ..err.. they should think twice and type once.  ..err..

            [2] addOpenModule() and closeOpenModule: I think that was a poor comment.

            My coding philosophy:

            I'm not a simpleton, but my mind cannot handle "complexity" unless I can break it into "simple" parts.  (I note there are lot of other folks who have brain power to handle complexity directly and I'm a bit jealous but I digress).  I therefore keep my code "simple" because I have to; but also because the next dude to read it will appreciate it.  I assert that "complicated" code may be "clever" but tends to be "unmaintainable".  "Simple", to me, includes the notion that loops are well defined and data types are clearly stated and consistent. 

            For example I cringe when I see loop control variables modified inside the loop and I'm real diligent about Skip and Arrays and the data Types put and retrieved therein.

            Anyway..

            Some years ago I concluded that all the "for something in somethingElse do" loops produce "something" in the order in which they were created, except for "Objects" which are ordered based on the current display.  Not sure that is perfenctly true, but I have noticed that the same loop will ALWAYS produce the Something in the exact same order each time.  Your "get nth object" notion seems safe in this regard.

            I had a lot of "issues" with Skip lists until i started declaring my Skip lists with a comment as to their KEY and DATA types.  Mistakes are very hard to debug.  I can review all references to the Skip and check useage against the declaration comment:

            • Skip skpObjects = createString()   // KEY 'string' Identifier; DATA: 'Object' handle.

            Ditto for "Arrays" and other complicated structures.

            Don't forget that Skips with 'string' KEY MUST be created with "createString()"; all others must be created with "create()".  (there is actually an exception but there is no point to it).

            The "for lnk in all(o) ->"*" do" loop indeed gets "versioned" links and may end up pointing to a baseline that cannot be opened via "read".  But I don't do that and don't know for sure the pitfalls.

            Nothing has come of my hackles.  False alarm I guess.

            Every CompSci 101 student knows not to do this:

            • for i in some range do
            • {  for i in some other range do
            •    {}
            • }

            Similarly in DOORS one must not nest identical "for something in something else do" loops:

            • for l1 in o ->"*" do
            • {  for l2 in o ->"*" do
            •    {
            •    }
            • }

            The inner loop will mess up your outer loop.  Took me 2 days to debug a nested "for Baseline in Mod do" loop years ago; had one in my library and then added one to my main.  I then trained myself, in my Main programs, to stage "something" in a Skip:

            • Skip skpLinks = create()   // KEY 'int' Sequencer,DATA "Link"
            • int Sequencer = 0
            • for lnk in o ->"*" do
            • {  put(skpLinks, Sequencer++, lnk)
            • }
            • for lnk in skpLinks do
            • {  // Sequencer = (int key skpLinks)  ... Ignore Sequencer; links are retrieved here in the order inserted
            •    DealWith(lnk)
            • }
            • delete(skpLinks)

            Now I'm protected from any obscure link loop burried within function "DealWith".  I'm also protected against the deletion of links which can wreak havok on the link loop.

            I don't fully grasp your close-module strategy, but the notion of "Pending" certainly is good.

            None of my code follows links recursively.  My close-residual-module strategy centers around modules that I am opening directly.  When "too many" modules are open I then close the extra ones before proceeding to the next module.

            • Get list of Open Modules to protect
            • for each module in current folder and below
            • {  open module
            •    report module
            •    close module
            •    Close Residually opened ones if needed
            • }
            • close residually opened modules
            • Done.

            So My residual algoritm is pretty simple.  If I did open modules recursively then I'd need a complicated strategy like yours.  I tried that once and gave up, didn't do so well with "subjective" and "prioritizing" things.

            -Louie

  • llandale
    llandale
    2943 Posts
    ACCEPTED ANSWER

    Re: Launching many scripts to run in parallel

    ‏2013-11-07T15:32:34Z  in response to grasswistle

    Mmmm.  Some problems are solved by first turning them inside out and then solving them. 

    In your case, higher level objects tend to have more children then the children have parents; thus when considering a top-level parent you end up gathering information about scores of n=5 children.  However, these bottom-most Objects have far fewer parents.  I wonder then if you approach this from the child perspective; gather information about parents.  The number of open parent modules would be drastically reduced.  Once you gather all the child-ancestor information you should be able to translate it into the parent-descendant information you are after.

    -Louie

    • EHcnck
      EHcnck
      77 Posts
      ACCEPTED ANSWER

      Re: Launching many scripts to run in parallel

      ‏2013-11-07T17:33:16Z  in response to llandale

      here was my attempt a few years back. if improved attach updated version to the forum.

      Attachments

  • GregM_dxler
    GregM_dxler
    162 Posts
    ACCEPTED ANSWER

    Re: Launching many scripts to run in parallel

    ‏2013-11-08T14:12:40Z  in response to grasswistle

    How about making 5 dummy clients and running the script for a different section in each client?  If your an admin, you could add users that don't exist.

    Greg

    • grasswistle
      grasswistle
      13 Posts
      ACCEPTED ANSWER

      Re: Launching many scripts to run in parallel

      ‏2013-11-08T15:15:37Z  in response to GregM_dxler

      Actually, on the setup we have here, I can even login multiple times with the same user.  

      So, you caught me.  I lied.  What you suggested is actually what I'm currently doing.  But it's so tedious to manage multiple instances of DOORS, scripts, outputs, etc. manually on a regular basis!  I'd like to find an automatic alternative...

      Thanks for your input!

      Grasswistle

  • Mathias Mamsch
    Mathias Mamsch
    1938 Posts
    ACCEPTED ANSWER

    Re: Launching many scripts to run in parallel

    ‏2013-11-11T00:08:20Z  in response to grasswistle

    I decided to start a tutorial about this topic and start with an implementation of a parallelization framework. 

    Find it on github: https://github.com/domoran/DXLParallels

    See the tutorial.txt for details.

    Regards, Mathias

    • M_vdLaan
      M_vdLaan
      27 Posts
      ACCEPTED ANSWER

      Re: Launching many scripts to run in parallel

      ‏2013-11-11T11:58:50Z  in response to Mathias Mamsch

      Hi Mathias,

       

      Great initiative and a very good write up of the design! I do have some different preferences with regard to some of the used (implementations of the) data structures, but that's probably just personal taste.

       

      Now, all I need to do is find an application to use it... ;-)

       

      Regards,

      M_vdLaan

       

       

  • Mathias Mamsch
    Mathias Mamsch
    1938 Posts
    ACCEPTED ANSWER

    Re: Launching many scripts to run in parallel

    ‏2013-11-12T23:43:56Z  in response to grasswistle

    Ok, so part 2 of the tutorial / parallelization framework is ready: The Command_Worker allows to parallelize commandline jobs. A first test with calculating primes in a batch file shows promising speedups when parallelized on a 8 core CPU :-) Next stop: parallelizing DOORS batch jobs.

    See the progress on github: https://github.com/domoran/DXLParallels

    Regards, Mathias

     

  • Mathias Mamsch
    Mathias Mamsch
    1938 Posts
    ACCEPTED ANSWER

    Re: Launching many scripts to run in parallel

    ‏2013-11-13T22:20:52Z  in response to grasswistle

    Update: Its done!!! The parallelization framework and a tutorial for the creation of the framework is ready - though still in a very drafty state. Code example:

    string sUser     = "Administrator"
    string sPassword = "";

    if (isBatch()) {
        // this is the code that will be executed by the parallelization framework!
        ModName_ mn = module getParallelsParameter "item";
        cout << "Processing module '" (rootName_ mn) "'"
        Module mod = read (fullName mn, false);
        close mod;
    } else {
        // this is the controller code, that will run the parallel jobs
        // and wait for their results ...

        Pool p = createPool 4 // four parallel workers
        
        Project prj = current
        // create a job for each formal module in the current project ...
        Item I; for I in prj do {
            if (type I != "Formal" || isDeleted I) continue
            // Note that the getCurrentFile function here depends on the code being run

            // from a file - e.g. from the DXL Editor or by an include statement in the
            // Exit DXL window. The code tries to find the current executing file.
            Worker wBatch = createDoorsBatchWorker (getCurrentFile(), sUser, sPassword)
            setParallelsParameter(wBatch, "item", fullName I); // here we define the parameters for the job
            enqueue(p, wBatch); // enqueue this job in the pool
        }
        
        // parallel processing starts here
        int iStart = getTickCount_()
        wait(p);
        int iEnd = getTickCount_()
       
        print "Time Taken: " (iEnd - iStart) " ms\n"
        string sReturn; for sReturn in getResults(p) do print sReturn "\n";
    }

    I am up to coding a real-world example to test if parallel batch jobs really will bring any speed benefit. And of course the framework will need to be made much more robust and better tested on different platforms. I will come back with a last post on a real world example, then I will consider this tutorial/framework done until I get feedback... Regards, Mathias