• 2 replies
  • Latest Post - ‏2013-09-11T14:01:51Z by smithha
1 Post

How to reduce return data amount in a formal column analysis?

‏2013-05-01T16:31:05Z | analyzer information


I'm new to IA. I ran a column analysis that returned 3.7 million records, and the results stay in the analysis database for as long as I need to keep them. However, I need to run more column analyses, and the analysis database will quickly run out of table space. Should I use sample data instead of the full table, and if so, how much data should I use for a formal (not test) run? I know this depends on the business, but does IA have a way to work around the table space problem in column analysis? Please help.

  • brett@dq
    3 Posts

    Re: How to reduce return data amount in a formal column analysis?


    It is possible to delete the high-volume frequency data and retain the stats of the analysis run. This can be done manually or with the IAADMIN command line, using deleteoutputtable and the deleteExecutionHistory sub command. You should be able to find it in the docs; there are some options around what to retain when you run the command. That's what we do to prune back the database.

  • smithha
    2 Posts

    Re: How to reduce return data amount in a formal column analysis?


    Whether you need to run full volume or a sample depends on what you are trying to identify. If you need to find distinct outliers that make up a low percentage of the data, such as rare default values in a name, then you likely need to evaluate at least certain columns on the full volume. If you're looking at general trends and common data, then sampling will work fine. A sample of roughly 16k records will give you output with high confidence and generally low deviation for millions of rows of data.
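
As a rough sanity check on the 16k figure, here is a minimal Python sketch. It assumes the number comes from the standard sample-size formula for estimating a proportion (the post does not say how it was derived), using 99% confidence and a 1% margin of error as illustrative inputs. The function names `sample_size` and `chance_of_seeing` are made up for this example and are not part of IA.

```python
import math
from statistics import NormalDist


def sample_size(confidence, margin, population=None, p=0.5):
    """Rows needed to estimate a proportion within +/- margin.

    Uses n = z^2 * p * (1 - p) / margin^2, with p = 0.5 as the
    worst case. If a population size is given, the finite
    population correction is applied.
    """
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z * z * p * (1 - p) / (margin * margin)
    if population is not None:
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


def chance_of_seeing(frequency, n):
    """Probability that a value occurring with the given relative
    frequency shows up at least once in a random sample of n rows."""
    return 1 - (1 - frequency) ** n


if __name__ == "__main__":
    # Assumed inputs: 99% confidence, 1% margin, 3.7 million rows.
    print(sample_size(0.99, 0.01))
    print(sample_size(0.99, 0.01, population=3_700_000))

    # A value present in 1 of every million rows is usually missed
    # by a 16k sample; a value present in 1 of every thousand rows
    # is almost always caught.
    print(chance_of_seeing(1e-6, 16_000))
    print(chance_of_seeing(1e-3, 16_000))
```

Under these assumed inputs the formula lands at roughly 16.5k rows, and the result barely changes with table size, which is consistent with one sample size covering millions of rows. The second function illustrates the outlier caveat: a very rare value has only a small chance of appearing in a 16k sample, so columns where such values matter still need the full volume.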

    If you've identified that you need full volume for analysis, then your decision whether to retain, archive, or delete this information depends on how you want to use it downstream.

    If you want to identify and capture reference data, then keep it long enough to construct those tables, then delete it.

    If you want to compare data across different domains, then you need to retain the detail output (cross-domain and foreign key analysis run off the frequency distribution tables) until you've run these additional analyses.

    Once you delete the frequency tables, you will still have the analysis summary, but you will not be able to view any values.