Topic
8 replies Latest Post - ‏2013-02-11T13:35:41Z by VsV
VsV

Pinned topic Load big data into ICA Studio(LanguageWare) dictionary database

‏2013-02-02T11:22:53Z |
Hello!

How can I load a large data set, for example 10 000 000 records of Wikipedia article titles, into an ICA Studio (LanguageWare) dictionary database?

I know one method: split the big file into chunks of 1 000 000 records and import them into 10 different dictionary databases, but that is not very convenient for the user.
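
For reference, a minimal sketch of that splitting step. It assumes the titles are stored one record per line in a UTF-8 text file; the function name `split_records` and the `titles_NN.txt` output names are only illustrative and are not part of ICA Studio.

```python
from itertools import islice
from pathlib import Path


def split_records(source, out_dir, chunk_size=1_000_000):
    """Split a one-record-per-line text file into numbered chunk files.

    Assumption: each record sits on its own line and the file is UTF-8.
    Returns the list of chunk paths in the order they were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with open(source, encoding="utf-8") as src:
        while True:
            # Read at most chunk_size lines, so the whole file is never in memory.
            lines = list(islice(src, chunk_size))
            if not lines:
                break
            path = out_dir / f"titles_{len(chunk_paths) + 1:02d}.txt"
            path.write_text("".join(lines), encoding="utf-8")
            chunk_paths.append(path)
    return chunk_paths
```

With 10 000 000 records and the default chunk size this would write ten files, each of which still has to be imported separately.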

Thank you!
  • SystemAdmin
    ‏2013-02-05T15:06:19Z  in response to VsV
    Why load them into 10 different databases? You should be able to import them into one database. The only issue I can see is the memory footprint of the database and the dictionary build.
    • VsV
      ‏2013-02-07T06:19:39Z  in response to SystemAdmin
      If I load 10 000 000 records into one dictionary database, LW throws an 'Out of memory' exception.
      • SystemAdmin
        ‏2013-02-07T08:08:03Z  in response to VsV
        Did you try changing the Java maximum heap size? In your installation directory, open studio.ini and change Xmx512m to Xmx1000m (or whatever value you want, based on the memory capacity of your machine). If you have 2 GB of RAM, you can use a value of 1200M or 1500M.
        Hope this helps.
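
        If you prefer to script the edit, here is a minimal sketch. It only assumes that studio.ini contains the heap option as text of the form Xmx512m; the function name `set_max_heap` is illustrative, and the exact layout of your studio.ini may differ.

```python
import re


def set_max_heap(ini_text, new_size):
    """Return ini_text with every Xmx<size> option changed to Xmx<new_size>.

    Assumption: the option appears as plain text such as Xmx512m
    (with or without a leading dash); nothing else in the file is touched.
    """
    # Matches Xmx512m, Xmx1g, Xmx1000M and similar; the unit suffix is optional.
    pattern = re.compile(r"Xmx\d+[kKmMgG]?")
    updated, count = pattern.subn(f"Xmx{new_size}", ini_text)
    if count == 0:
        raise ValueError("no Xmx option found in the given text")
    return updated
```

        Read studio.ini into a string, pass it through `set_max_heap(text, "1000m")`, and write the result back. Keep a backup copy of the original file first.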
        • VsV
          ‏2013-02-07T16:43:40Z  in response to SystemAdmin
          I set the Java heap to Xmx1500m (I have 8 GB of RAM).
          All the data was imported successfully,
          but the dictionary build phase throws an 'Out of memory' exception.
          • SystemAdmin
            ‏2013-02-08T13:59:34Z  in response to VsV
            Please try increasing the heap further and see if that works. Maybe use 2000M or higher. If it still fails, please let us know.
            • VsV
              ‏2013-02-09T14:42:12Z  in response to SystemAdmin
              If I set the Java heap to 1800M or higher, I get "JVM terminated. Exit code=-1".
              This is because LW uses a 32-bit Java version, and I can't configure LW to use 64-bit Java.
              1700M is the maximum I could set.
              • SystemAdmin
                ‏2013-02-11T12:25:52Z  in response to VsV
                It looks like a memory limitation then.
                If you are using the dictionaries in annotations, you can do the following:
                • Create more than one dictionary.
                • Create a rule that promotes the types generated by these dictionaries into an annotation (example: Dictionary1 produces DictType_A, Dictionary2 produces DictType_B, Dictionary3 produces DictType_C, and the rule then promotes DictType_A, DictType_B and DictType_C to FinalType).

                Hope this helps
                • VsV
                  ‏2013-02-11T13:35:41Z  in response to SystemAdmin
                  Yes, I could - see my first post )))
                  But splitting the big dictionary does not solve the problem, because LW crashes during document analysis if I have all my dictionaries in the UIMA pipeline configuration.

                  Is it possible to use 64-bit Java for LW?