Topic
  • 8 replies
  • Latest Post - ‏2013-02-11T13:35:41Z by VsV
VsV
31 Posts

Pinned topic Load big data into ICA Studio(LanguageWare) dictionary database

2013-02-02T11:22:53Z
Hello!

How can I load a large data set, for example 10 000 000 records of Wikipedia article titles, into an ICA Studio (LanguageWare) dictionary database?

I know one method: split the big file into chunks of 1 000 000 records and import them into 10 different dictionary databases, but that is not very convenient for the user.
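
A minimal sketch of that splitting step, assuming the source is a UTF-8 text file with one record per line (the thread does not state the import format). The class name `RecordSplitter` and the `chunk-NNN.txt` file names are hypothetical, chosen only for illustration:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ICA Studio or LanguageWare.
final class RecordSplitter {

    /**
     * Splits a one-record-per-line text file into numbered chunk files of at
     * most recordsPerChunk lines each and returns the chunk paths in order.
     */
    static List<Path> split(Path input, Path outputDir, int recordsPerChunk) throws IOException {
        if (recordsPerChunk <= 0) {
            throw new IllegalArgumentException("recordsPerChunk must be positive");
        }
        Files.createDirectories(outputDir);
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            BufferedWriter writer = null;
            try {
                int linesInChunk = 0;
                String line;
                while ((line = reader.readLine()) != null) {
                    // Open a new chunk file when none exists yet or the current one is full.
                    if (writer == null || linesInChunk == recordsPerChunk) {
                        if (writer != null) {
                            writer.close();
                        }
                        Path chunk = outputDir.resolve(String.format("chunk-%03d.txt", chunks.size() + 1));
                        writer = Files.newBufferedWriter(chunk, StandardCharsets.UTF_8);
                        chunks.add(chunk);
                        linesInChunk = 0;
                    }
                    writer.write(line);
                    writer.newLine();
                    linesInChunk++;
                }
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        // Arguments: <input file> <output directory> <records per chunk>
        List<Path> chunks = split(Paths.get(args[0]), Paths.get(args[1]), Integer.parseInt(args[2]));
        for (Path chunk : chunks) {
            System.out.println(chunk);
        }
    }
}
```

The file is streamed line by line, so the splitter itself never holds more than one record in memory, whatever the size of the input.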

Thank you!
Updated on 2013-02-11T13:35:41Z by VsV
  • SystemAdmin
    197 Posts

    Re: Load big data into ICA Studio(LanguageWare) dictionary database

    ‏2013-02-05T15:06:19Z  
    Why load them into 10 different databases? You should be able to import them into one database. The only issue I can see is the memory footprint of the database and the dictionary build.
  • VsV
    31 Posts

    Re: Load big data into ICA Studio(LanguageWare) dictionary database

    ‏2013-02-07T06:19:39Z  
    If I load 10 000 000 records into one dictionary database, LW throws an 'Out of memory' exception.
  • SystemAdmin
    197 Posts

    Re: Load big data into ICA Studio(LanguageWare) dictionary database

    ‏2013-02-07T08:08:03Z  
    Did you try changing the Java maximum heap size? In your installation directory, open studio.ini and change Xmx512m to Xmx1000m (or whatever value suits the memory capacity of your machine). If you have 2 GB of RAM, you can use a value of 1200M or 1500M.
    Hope this helps.
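
    One way to confirm that a heap setting actually took effect is to ask a JVM what it thinks its maximum heap is. This is only a sketch: the class name `HeapCheck` is hypothetical, and the number is meaningful only when the class is run with the same Java installation and the same Xmx value as the Studio. The reported figure can be somewhat lower than the configured value.

```java
// Hypothetical helper, not part of ICA Studio or LanguageWare.
final class HeapCheck {

    /** Returns the maximum heap the running JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

    For example, running it with `java -Xmx1000m HeapCheck` should print a value close to 1000.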
  • VsV
    31 Posts

    Re: Load big data into ICA Studio(LanguageWare) dictionary database

    ‏2013-02-07T16:43:40Z  
    I set the Java heap to Xmx1500m (I have 8 GB of RAM).
    All the data was imported successfully.
    But the dictionary build phase throws an 'Out of memory' exception.
  • SystemAdmin
    197 Posts

    Re: Load big data into ICA Studio(LanguageWare) dictionary database

    ‏2013-02-08T13:59:34Z  
    Please try increasing the heap further, perhaps to 2000M or higher, and see if it works. If not, please let us know.
  • VsV
    31 Posts

    Re: Load big data into ICA Studio(LanguageWare) dictionary database

    ‏2013-02-09T14:42:12Z  
    If I set the Java heap to 1800M or higher, I get "JVM terminated. Exit code =-1".
    That is because LW uses a 32-bit Java version, and I can't configure LW to use 64-bit Java.
    1700M is the maximum I could set.
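
    A 32-bit process has at most 4 GB of address space, and the operating system usually leaves noticeably less than that for the Java heap, which fits a ceiling somewhere below 2000M. A small sketch for checking whether a given Java installation is 32-bit or 64-bit follows. The class name `JvmBitness` is hypothetical, and both system properties it reads are vendor-specific (HotSpot/OpenJDK and IBM JVMs respectively) and may be absent, so treat the result as a hint.

```java
// Hypothetical helper, not part of ICA Studio or LanguageWare.
final class JvmBitness {

    /**
     * Best-effort report of the JVM data model, typically "32" or "64".
     * Returns "unknown" when neither vendor-specific property is set.
     */
    static String dataModel() {
        String model = System.getProperty("sun.arch.data.model"); // HotSpot / OpenJDK
        if (model == null) {
            model = System.getProperty("com.ibm.vm.bitmode");     // IBM JVMs
        }
        return model != null ? model : "unknown";
    }

    public static void main(String[] args) {
        System.out.println("Data model: " + dataModel());
        System.out.println("os.arch:    " + System.getProperty("os.arch"));
    }
}
```

    Run it with the Java executable that the Studio launches, not with whatever `java` happens to be first on the PATH, otherwise it describes a different JVM.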
  • SystemAdmin
    197 Posts

    Re: Load big data into ICA Studio(LanguageWare) dictionary database

    ‏2013-02-11T12:25:52Z  
    It looks like a memory limitation then.
    If you are using the dictionaries in annotations you can do the following:
    • Create more than one dictionary.
    • Create a rule that promotes the types generated by these dictionaries into a single annotation (example: Dictionary1 produces DictType_A, Dictionary2 produces DictType_B and Dictionary3 produces DictType_C; the rule then promotes DictType_A, DictType_B and DictType_C to FinalType).

    Hope this helps
  • VsV
    31 Posts

    Re: Load big data into ICA Studio(LanguageWare) dictionary database

    ‏2013-02-11T13:35:41Z  
    Yes, I could - see my first post )))
    But splitting the big dictionary does not solve the problem, because LW crashes during document analysis if all my dictionaries are in the UIMA pipeline configuration.

    Is it possible to use 64-bit Java for LW?