Topic
  • 17 replies
  • Latest Post - ‏2014-04-11T11:27:54Z by MaltePlath
hyaldo
hyaldo
17 Posts

Pinned topic DXL Script for Automatic Linking

‏2014-03-17T16:08:56Z |

Hello,

I need a DXL script that automatically links acronyms in a source module to acronym definitions in a target module. Basically, it would read the object heading, which is the acronym definition in my target module, then search each object in the source module for the acronym and automatically link the two objects if a match is found. Is there a script that would accomplish this?

Thanks,

~ Hanna

 

  • smarti.sj
    smarti.sj
    20 Posts

    Re: DXL Script for Automatic Linking

    ‏2014-03-17T17:08:43Z  

    Hi Hanna

    I may not fully understand what you are trying to do, but what is the goal of these links? To be able to get to the acronym definition easily? If so, may I suggest you create a little script that prompts the user for the unknown acronym, looks up the definition in the Acronym module, and reports it back to the user? You could also assign it a keyboard shortcut to make it even easier to use. I am worried these links may detract from other, more important links if you have many acronyms.

    Sara

  • hyaldo
    hyaldo
    17 Posts

    Re: DXL Script for Automatic Linking

    ‏2014-03-17T17:23:53Z  

    Hello Sara,

    The purpose of these links is to be able to generate a view to create an acronyms table for a specific project (I am using RPE also to generate my doc). My Acronym module is a common module that includes hundreds of acronyms for several projects. Currently, I link my acronyms manually. And instead of doing a manual linking to generate a view for my acronym table for my specific project, I am trying to do the linking automatically for new projects by searching for existing acronyms in my source module and then linking to my Acronyms module. I don't need a lookup script, I need to be able to automatically link my acronyms so I can generate a table of acronyms that are only applicable to my specific project. Is there a similar script that can accomplish this?

    Thanks!

    ~ Hanna
  • llandale
    llandale
    3035 Posts

    Re: DXL Script for Automatic Linking

    ‏2014-03-17T17:30:33Z  
    mAcronum = read(acronym module)
    Skip skpAcs = createString()  // KEY 'string' Acronym, DATA 'Object' handle
    for objAcr in mAcronum do
    {  if this is not an acronym object then continue
       put(skpAcs, Acronym, objAcr)
    }
    mSource = edit(Source module)
    buffer bufText = create()
    for oSource in mSource do
    {  if oSource doesn't qualify then continue
       setempty(bufText)
       bufText = oSource."Object Heading"
       for Acronym in skpAcs do
       {  if (contains(bufText, Acronym))
          { oAcr = (Object key skpAcs)
             if there is no such link then create link from oSource to oAcr using the Acronym Link Module
          }
          else
          {  if there is indeed such a link then delete it
          }
       }
    }
    flushDeletions
    save(mSource)

    The buffer "contains" function, I think, will find the acronym e.g. "ABC" but not when it is in some other string e.g. "sABC" or "ABCD".  You will need to test that.
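
    A quick way to run that test, as a minimal sketch only: it assumes the three-argument form contains(Buffer, string, int), which returns an index (or -1 when nothing is found) rather than a bool. The sample texts are made up, and no expected output is shown because the word-boundary behaviour is exactly what needs checking.

```dxl
// Probe how contains() treats an acronym embedded in other words.
// The sample texts are invented for illustration only.
string samples[] = {"uses ABC here", "uses sABC here", "uses ABCD here"}
Buffer bufText = create
int i
for (i = 0; i < sizeof samples; i++) {
    setempty(bufText)
    bufText += samples[i]
    // contains() returns the index of the word, or -1 when it is not found
    int pos = contains(bufText, "ABC", 0)
    print samples[i] " -> " pos "\n"
}
delete bufText
```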

    -Louie

     

  • hyaldo
    hyaldo
    17 Posts

    Re: DXL Script for Automatic Linking

    ‏2014-03-17T21:40:21Z  

    Thanks for the sample code. I tried your function, but I am still having trouble getting it to work. The 'contains' function doesn't seem to work properly: it constantly returns -1. When I print the acronyms from inside the "for Acronym in skpAcs do" loop, I get garbage data, and it seems to loop over all objects instead of only the acronyms that are in the skip list. I just found some code on the Smart DXL website which has a simple GUI that I'd like to use. I modified it to fit my specific needs and replaced "doCreateLinks" with your sample code, which I am attaching here. If you could please look at the "doCreateLinks" function and see what I am missing, I would appreciate it.

     Thanks!

     ~ Hanna

  • M_vdLaan
    M_vdLaan
    28 Posts

    Re: DXL Script for Automatic Linking

    ‏2014-03-19T14:58:50Z  

    Louie, Hanna,

     

    Looking at your code, Louie, it looks like you may have mixed up the key and the data in skpAcs: you're placing the object in the Skip with the acronym as the key, but in the loop "for Acronym in skpAcs do" you try to read the Skip's data into Acronym (which will probably not work, since Acronym is a string). You also try to get the object using "oAcr = (Object key skpAcs)", but the key is actually the string item. Am I making sense?

    You probably want something like:

    for oAcr in skpAcs do {
        Acronym = (string key skpAcs)
        
        ...
        
    }
    

     

    This is probably the same thing you're getting mixed up, Hanna. Try swapping the key and value in the loop round line 139.

     

    Sorry, cannot test it now, I'm away from DOORS, currently.

     

    Good luck!

    Marcel

  • hyaldo
    hyaldo
    17 Posts

    Re: DXL Script for Automatic Linking

    ‏2014-03-20T21:54:15Z  

    Thank you all for the responses. This was actually my issue, and I did come up with a work-around which works perfectly with a smaller number of objects. However, I seem to have run into a bigger issue now. DOORS is timing out when I run my code, with this warning: "DXL Execution timeout", because I am running my DXL against very large modules with hundreds of objects. I did disable the timer, but my DXL is running extremely slowly and I don't know how to optimize it. This is the code that's causing it to run slowly:

    Any help is very much appreciated!!

    ~ Hanna

    for oSource in mSource do
    {
      // check for references in the Object Text attribute
      if (ObjectTextVal == true)
      {
        for objAcr in mReference do
        {
          Reference = objAcr."Object Heading"

          // only objects with a definition count as reference objects
          if (objAcr."Definition" "" != "")
          {
            put(skpAcs, Reference, objAcr)

            string bufText1 = oSource."Object Text"

            //string Scenario1 = "( )" Reference "( )"
            string Scenario2 = "([ ]|[.]|[:]|[;]|[,]|[(])" Reference "([ ]|[.]|[:]|[;]|[,]|[)])"

            // a new Regexp is compiled for every source/reference pair
            Regexp FindReference = regexp Scenario2
            match1 = matches (Reference, bufText1)

            if (FindReference bufText1)
            {
              //print Reference "\n"
              //print bufText1 "\n"
              oSource -> LinkModName -> objAcr
            }
          }
        }
      }
    }

  • M_vdLaan
    M_vdLaan
    28 Posts

    Re: DXL Script for Automatic Linking

    ‏2014-03-21T19:26:25Z  

    Hello Hanna,

     

    I can indeed see a few places where you can optimize:

    1. You can optimize the loops; you now have two nested loops where this is not really necessary. This is probably the number one resource hog. The inner loop can be moved out into a loop of its own, which builds the reference cache. In the second loop you can use this cache to link the objects.
    2. You are creating a lot of regexps, which may not be necessary. Perhaps you can optimize this, but I don't know how many acronyms you have.

    I've thrown together the following code as a framework for what I mean. This also incorporates a few other improvements.

    Buffer bufAllRefs = create
    
    // First build a cache of all objects that have a definition
    Skip skpAcs = createString
    string sReference
    Object objAcr
    for objAcr in mReference do { 
      if (objAcr."Definition" "" != "") {
        sReference = objAcr."Object Heading"
        put(skpAcs, sReference, objAcr)
        bufAllRefs += sReference 
        bufAllRefs += "|"
      }
    }
    bufAllRefs = tempStringOf(bufAllRefs)[0:length(bufAllRefs)-2]
    
    // Take the Regexp outside the loop, since redefining this on every loop iteration
    // is inefficient
    Regexp reReference = regexp "([ .:;,\\()])(" bufAllRefs ")([ .:;,\\)])"
    
    // Now that we have the regular expression and the cache, iterate over the objects in
    // the 'source' module mSource
    Object oSource
    string sText
    for oSource in mSource do {
      // Not sure what this next condition is meant to do.
      sText = oSource."Object Text"
      if (reReference sText) {
        sReference = sText[match 2]
        if (find(skpAcs, sReference, objAcr)) {
          // Assuming LinkModName is already defined...
          oSource ->LinkModName->objAcr
        }
      }
    }
    
    // clean up the Buffer
    delete bufAllRefs
    

     

    Hope this helps. Let me know how this works out for you.

     

    Good luck,

    Marcel

  • hyaldo
    hyaldo
    17 Posts

    Re: DXL Script for Automatic Linking

    ‏2014-03-24T18:29:49Z  

    Thank you Marcel! I tried your code and for some reason I am getting an error on this line:

    bufAllRefs = tempStringOf(bufAllRefs)[0:length(bufAllRefs)-2]

    The [0:length(bufAllRefs)-2] part seems to be invalid syntax, and I can't find the tempStringOf function in the DXL reference manual. I tried the code without that portion, and now I am having trouble finding the acronyms. The \\) does not seem to be valid either. Also, this code seems to find only one acronym: if I have more than one acronym in my object text, it won't link the others. Please assist.

    Thank you so much for your help!!

    ~ Hanna

  • M_vdLaan
    M_vdLaan
    28 Posts

    Re: DXL Script for Automatic Linking

    ‏2014-03-24T20:19:58Z  

    Hanna,

    Are you getting a compile-time error or run-time? Which version of DOORS are you running?

    Okay, you will need to take into account that that piece of code will fail if no acronyms are found (in which case length(bufAllRefs) will be 0 and the substring extraction will indeed fail). Add a check before you continue, or print out the buffer to check its value. The tempStringOf function is an undocumented one. There are quite a few of these.
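
    One way to sidestep the trimming altogether, as a minimal sketch: add the "|" before every entry except the first, so there is never a trailing separator to cut off and an empty list simply leaves the buffer empty. The acronym array below is a made-up stand-in for the headings read from mReference.

```dxl
// Hypothetical acronym list standing in for the Object Headings in mReference.
string acronyms[] = {"API", "GUI", "RPE"}
Buffer bufAllRefs = create
int i
for (i = 0; i < sizeof acronyms; i++) {
    // Prepend the separator only between entries, so nothing needs trimming later.
    if (length(bufAllRefs) > 0) bufAllRefs += "|"
    bufAllRefs += acronyms[i]
}
// An empty buffer here means no acronyms were found; skip building the Regexp then.
if (length(bufAllRefs) > 0) print stringOf(bufAllRefs) "\n"
delete bufAllRefs
```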

    Also, to find multiple acronyms in the second loop, you'll need to change the if to a while loop that finds the first one, extracts it, and reassigns the sText to be the remaining text. I think it's along the lines of:

    while (reReference sText) {
      sReference = sText[match 2]
      sText = sText[(end 2) + 1:] // Check the syntax here ...
      
      if (find(skpAcs, sReference, objAcr)) {
        // Assuming LinkModName is already defined...
        oSource ->LinkModName->objAcr
      }
      
    }
    

    You might need to check what happens when the same acronym is used in the text multiple times, though.
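
    If repeated acronyms turn out to create duplicate links, a check along these lines might help. This is an untested sketch: alreadyLinked is a hypothetical helper name, and it assumes the link module name is available as a string (as LinkModName appears to be).

```dxl
// Hypothetical helper: true if oSrc already has an out-link through
// linkModName to oTgt. Untested sketch, written without DOORS at hand.
bool alreadyLinked(Object oSrc, Object oTgt, string linkModName) {
    int tgtAbsNo = oTgt."Absolute Number"
    string tgtModule = fullName(module oTgt)
    Link l
    for l in oSrc -> linkModName do {
        ModName_ mnTarget = target l
        // compare both the target module and the absolute number
        if (fullName(mnTarget) == tgtModule && targetAbsNo(l) == tgtAbsNo) {
            return true
        }
    }
    return false
}

// Intended use inside the while loop:
// if (!alreadyLinked(oSource, objAcr, LinkModName)) oSource -> LinkModName -> objAcr
```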

     

    What's not valid with the \\)? It might be a DOORS v8 vs v9 thing.

     

    Sorry I can't be more specific at the moment. No access to DOORS currently.

  • hyaldo
    hyaldo
    17 Posts

    Re: DXL Script for Automatic Linking

    ‏2014-03-24T21:42:23Z  

    I am getting a compile-time error. I use DOORS version 9.3. The length of the buffer is not 0, as there are acronyms in my object text that I was able to print. I added the while loop you suggested above and I am still running into performance issues. It seems to work a little faster, but I have more than 800 acronyms that I am trying to auto-link, which still makes my script run fairly slowly. Is there anything else that I am missing or should try? Is there a way that you can only loop on the acronyms that you find and link those instead of looping on the entire acronym list? Here is the updated code that I have right now:

    Buffer bufAllRefs = create
     // build a cache of all objects that have a definitions
     string sReference
      
     for objAcr in mReference do
     { 
      if (objAcr."Definition" "" != "")
      {
       
       sReference = objAcr."Object Heading"
       put(skpAcs, sReference, objAcr)
       bufAllRefs +=sReference
       bufAllRefs +="|"
      }
     }


     bufAllRefs = tempStringOf(bufAllRefs)
     //[0:length(bufAllRefs)-2]


     string Scenario2 = "([ .:;,\\()])("bufAllRefs")([ .:;,\\)])"

     Regexp reReference = regexp Scenario2

     string sText
     for oSource in mSource do
     { 
      
      //match1 = matches (Reference, sText)
      sText = oSource."Object Text"
      match1 = matches (sReference, sText)
      print sReference"\n"
      print match1 "\n"
      while (reReference sText)
      {
       sReference = sText[match 2]
       sText = sText[(end 2) + 1:]

       if(find(skpAcs, sReference, objAcr))
       {
        print sReference"\n"
        print bufAllRefs "\n"
        oSource ->LinkModName->objAcr
       }
      }
     }


     delete bufAllRefs

     Thank you so much for your help!!

    ~ Hanna 

  • M_vdLaan
    M_vdLaan
    28 Posts

    Re: DXL Script for Automatic Linking

    ‏2014-03-24T22:15:58Z  

    Hanna,

     

    Well, yes, a regular expression with 800 acronyms is not going to be fast...

    "Is there a way that you can only loop on the acronyms that you find and link those instead of looping on the entire acronym list?"

    What do you mean by "acronyms that you find"? Are acronyms identifiable in some way in the Object Texts? If, for example, they are marked using braces or square brackets (e.g. [API]) then that would be a lot easier. Alternatively, you could look for words in ALL CAPS. But this very much depends on how you would identify an acronym in the Object Text...

    Regards,

    Marcel

  • hyaldo
    hyaldo
    17 Posts

    Re: DXL Script for Automatic Linking

    ‏2014-03-25T18:05:26Z  

    What I meant is basically only looping on acronyms that are in the Object Text instead of looping on the 800 acronyms to find a match. The problem with that is my acronyms list is a mix of everything...(all caps, no caps, with braces, etc...), so it's hard to identify. Can we use another method other than regular expressions that may speed it up?

    Thanks!

    ~ Hanna

     

     

  • MaltePlath
    MaltePlath
    4 Posts

    Re: DXL Script for Automatic Linking

    ‏2014-03-31T12:21:44Z  

    I am puzzled by the line

    match1 = matches (sReference, sText)
    

    As far as I could see, sReference holds the last acronym you found in mReference, and then the last acronym you matched.  Why do you want to match that at all, when you are applying your big compound regexp anyway?

    One thing that definitely slows down the script that you posted is the print statements.  (I know that you need them for debugging!)

    As for the regexp, you could optimize that, but that would be another adventure in DXL: don't look for (AAA|AAB|ABC|ABD...) but for (A(A(A|B)|B(C|D))...), and then use character classes, which will be faster than disjunctions ('|'), for the final letters, e.g. A(A[AB]|B[CD]).
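
    To check that such a rewrite still accepts the same acronyms, a small comparison like the following could be used. It is only a sketch with four made-up acronyms plus two near-misses; the anchors ^ and $ are there just to make the test strict.

```dxl
// Flat alternation versus the prefix-factored form with character classes.
Regexp reFlat = regexp "^(AAA|AAB|ABC|ABD)$"
Regexp reTrie = regexp "^A(A[AB]|B[CD])$"
// Invented samples: four that should match and two that should not.
string samples[] = {"AAA", "AAB", "ABC", "ABD", "ABA", "AA"}
int i
for (i = 0; i < sizeof samples; i++) {
    string s = samples[i]
    bool bFlat = reFlat s
    bool bTrie = reTrie s
    // Report only the cases where the two patterns disagree.
    if (bFlat != bTrie) print "patterns disagree on " s "\n"
}
print "comparison finished\n"
```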

    If you notice that it works fast at first and then slows down (you can use a progress bar to observe that), I would take a look at your usage of strings.  The line

    sText = sText[(end 2) + 1:]
    

    in the while loop generates a lot of strings, which will use a lot of memory, esp. if your source module is large as you write.  You should try using a buffer for that purpose.  (Use setempty(Buffer) if you find that assigning your text to the buffer does not properly overwrite the previous content.)  setempty() is faster than deleting and creating a buffer.
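
    A rough sketch of the buffer variant, with made-up pattern and text. It assumes that search(Regexp, Buffer, int) starts matching at the given offset and that start/end are then reported relative to that offset; I have not verified this here, so print the positions before relying on it.

```dxl
// Made-up pattern and text; in the real script these would be the compound
// acronym regexp and the Object Text of the current source object.
Regexp reRef = regexp "([ .:;,])(API|GUI)([ .:;,])"
Buffer bufText = create
bufText = " The API and the GUI, both. "
int iFrom = 0
// ASSUMPTION: search(Regexp, Buffer, int) starts matching at iFrom and
// reports start/end positions relative to iFrom. Verify before relying on it.
while (iFrom < length(bufText) && search(reRef, bufText, iFrom)) {
    int iLast = iFrom + (end 2)                          // last char of group 2
    string sFound = bufText[iFrom + (start 2) : iLast]   // the acronym itself
    print sFound "\n"
    iFrom = iLast + 1   // resume at the trailing delimiter, as sText[(end 2)+1:] does
}
delete bufText
```

    Only the short acronym string is created per match; the remaining text is never copied.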

    I hope that this gives you some ideas for further optimisations.

  • hyaldo
    hyaldo
    17 Posts

    Re: DXL Script for Automatic Linking

    ‏2014-04-09T18:05:34Z  

    Thank you all for the tips! I was able to optimize it based on your suggestions, and now it runs in less than 1 minute instead of hours, which is awesome! A couple of things though: the script does not link to the following acronyms:

    CBM+

    ABD (P)

    (T)

    (O)

    All of the above are acronyms in my acronyms list, and I tried to change the regular expression to handle those cases, but my attempts didn't seem to work. Any ideas?

    I am also trying to expand this script to link to definitions as well, which I assumed would use exactly the same logic we have for acronyms. That only partially works: it links only definitions that consist of one word. If a definition contains two or more words, it won't link it. Any ideas?

     Thanks in advance for all the help!!

     ~ Hanna

     bool ObjectTextVal = get(ObjectTextControl)
     Buffer bufAllRefs = create

     // build a cache of all objects that have a definition
     string sReference

     for objAcr in mReference do
     {
      if (objAcr."Definition" "" != "")
      {
       sReference = objAcr."Object Heading"

       // don't link every "in" to inches and every "T" to tons...
       if (sReference == "T" || sReference == "in" || sReference == "s" || sReference == "g" || sReference == "A")
       {
        // ambiguous: report it and do nothing else with it
        print sReference "\n"
       }
       else
       {
        // add it to the skip list and to the list of regexp alternatives
        put(skpAcs, sReference, objAcr)
        bufAllRefs += sReference
        bufAllRefs += "|"
       }
      }
     }

     // note: the list of alternatives still ends with a trailing "|"
     bufAllRefs = tempStringOf(bufAllRefs)
     //[0:length(bufAllRefs)-2]

     // pattern: delimiter, one of the acronyms, delimiter
     string Scenario2 = "([ .:;,\\()])("bufAllRefs")([ .:;,\\()])"

     Regexp reReference = regexp Scenario2
     string sText

     // count the source objects for the progress bar
     int nos = 0
     for oSource in mSource do nos++

     progressStart(addDB, "Linking References", "Something", nos)
     nos = 0

     // Linking Object Text
     if (ObjectTextVal)
     {
      for oSource in mSource do
      {
       sText = oSource."Object Text"
       match1 = matches (sReference, sText)
       bufText1 = null

       string n = number(oSource)
       string h = oSource."Object Heading"

       // replace an empty heading before the progress message is built
       if (null h) h = "null heading"
       string message = "Linking Object Text" "\n" n " " h

       progressStep ++nos
       progressMessage message
       if (progressCancelled)
       {
        if (confirm("Exit loop?"))
        {
         progressStop
         halt
        }
       }

       // to find multiple acronyms, find the first match, extract it,
       // and reassign sText to be the remaining text.
       while (reReference sText)
       {
        sReference = sText[match 2]          // the acronym without delimiters
        string sReference1 = sText[match 0]  // the whole match, delimiters included

        sText = sText[(end 2) + 1:]

        if (find(skpAcs, sReference, objAcr) || find(skpAcs, sReference1, objAcr))
        {
         oSource -> LinkModName -> objAcr
        }
       }
      }
     }
     

  • MaltePlath
    MaltePlath
    4 Posts

    Re: DXL Script for Automatic Linking

    ‏2014-04-10T10:03:10Z  
    • hyaldo
    • ‏2014-04-09T18:05:34Z

    Thank you all for the tips! Based on the suggestions I was able to optimize the script, and it now runs in less than 1 minute instead of hours, which is awesome! A couple of things, though. The script does not link to the following acronyms:

    CBM+

    ABD (P)

    (T)

    (O)

    All of the above are acronyms in my acronyms list. I tried changing the regular expression to handle those cases, but my attempts didn't work. Any ideas?

    If your acronyms contain special characters, i.e. ones that have a special meaning in regular expressions, you need to escape them with a backslash ('\') before you add them to bufAllRefs. For example, "CBM+" must become "CBM\+" and "ABD (P)" must become "ABD \(P\)".

    For a list of characters to escape, look at this thread: https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014958442&ps=25 and the one linked from one of its first posts.

    We have established a convention that certain entity names have to be written in bold face. That makes it possible (with some margin for error) to find the entities mentioned in a requirement text by looking for the Rich Text mark-up for bold face ("{\b (.+)}") and to validate the strings found against another list. So if you could require your authors to use bold, italics, or underline for acronyms, that would solve your problem with ambiguous words (like "in") by turning the initial problem on its head: you would not be looking for given acronyms in your text but for "acronym mark-up". You could use your current approach to convert the existing text to appropriately marked-up text.

  • hyaldo
    hyaldo
    17 Posts

    Re: DXL Script for Automatic Linking

    ‏2014-04-10T20:24:03Z  

    I added the backslash to my acronyms just to test it out, and my bufAllRefs now looks like:

    Acronym1|CBM\+|Acronym2|

    The acronym still doesn't get linked. Is there anything else that I am missing? Thanks!

  • MaltePlath
    MaltePlath
    4 Posts

    Re: DXL Script for Automatic Linking

    ‏2014-04-11T11:27:54Z  
    If I understand you correctly, you have changed the Object Heading in the acronyms module.

    That is why your escaped acronym (CBM\+) no longer matches what you find in the Object Text: the regexp returns "CBM+" (sText[match 2]), and you look that up in your skip list, but the skip list now contains "CBM\+". You need to escape the special characters in the regexp but not in the skip list.

    For a trial run, you could do a literal comparison for "CBM+" (like you have for "T", "in", etc.) and put the escaped version in the buffer for the regexp. For production, you would definitely want to pass all acronyms through a function for escaping special characters.
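
    A minimal sketch of such an escaping function follows. The names escapeForRegexp, isRegexpSpecial and REGEXP_SPECIALS are hypothetical, and the set of characters is an assumption that should be checked against the list in the thread linked earlier. The skip list key stays unescaped, and only the copy that goes into the regexp buffer is escaped.

```dxl
// Hypothetical helper, not part of the original script.
// Characters treated as special in a regexp; this set is an assumption.
const string REGEXP_SPECIALS = "\\+*?.()[]|^$"

// Returns true if c is one of the characters in REGEXP_SPECIALS.
bool isRegexpSpecial(char c)
{
    int i
    for (i = 0; i < length(REGEXP_SPECIALS); i++)
    {
        if (REGEXP_SPECIALS[i] == c) return true
    }
    return false
}

// Returns a copy of s with a backslash in front of every special character.
string escapeForRegexp(string s)
{
    Buffer buf = create
    int i
    for (i = 0; i < length(s); i++)
    {
        char c = s[i]
        if (isRegexpSpecial(c)) buf += '\\'
        buf += c
    }
    string escaped = stringOf(buf)
    delete buf
    return escaped
}

// In the loop that builds the cache (names as in the script above):
//   put(skpAcs, sReference, objAcr)            // key stays unescaped
//   bufAllRefs += escapeForRegexp(sReference)  // pattern gets the escaped copy
//   bufAllRefs += "|"

// Quick check with two of the acronyms from this thread.
print escapeForRegexp("CBM+") "\n"
print escapeForRegexp("ABD (P)") "\n"
```

    With this in place, the Object Heading in the acronyms module can stay as "CBM+", so the value returned by sText[match 2] is found in the skip list again.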