Collecting data with the Data Collection Component

This task lists the high-level steps, with examples, for collecting data by using the Data Collection Component.

About this task

Data collection jobs extract data from the registered applications (defined as resource groups) and load the data into the data warehouse. The process is also known as ETL (extract, transform, and load), and data collection jobs are sometimes referred to as ETL jobs. The data that is extracted during this process is used for historical reporting, in which reports access stored data about application and cross-application metrics, trends, aggregations, and other relationships to produce meaningful report output. Before a historical report can be successfully run, data collection jobs must be completed at least once so that there is data in the data warehouse for the report to access.

To run data collection jobs, you have two data collection tool options: the Data Collection Component or Data Manager in Rational® Insight. The Data Collection Component is designed to improve performance with easier setup, configuration, and deployment. For a comparison between the data collection tools, see the Deciding which data collection option to use? topic.

Remember: The goal of this task is to help you experiment with and quickly set up the Data Collection Component for running data collection jobs. Many examples in this task use Derby as the data warehouse and Apache Tomcat as the application server. Use Derby and Tomcat only in a testing and development environment. If you want to administer the Data Collection Component in a production environment, you must use enterprise-level products, such as DB2® and WebSphere® Application Server.

Procedure

  1. Start the Data Collection Component server.
    The Data Collection Component server is an application server that is running the Data Collection Component application. For the application server, you have the following options:
    Apache Tomcat
    If you want to quickly get started with running data collection jobs, you can run the Apache Tomcat application server that is packaged with the Jazz™ Team Server. By default, the Data Collection Component server uses the same ports as the Jazz Team Server: 9080 for HTTP and 9443 for HTTPS. If you need to change the ports, back up your server.xml file, which is located in <JTSInstallDir>\JazzTeamServer\server\tomcat\conf, and then edit the appropriate port numbers in the original server.xml file.
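    The ports are defined by Connector elements in server.xml. The following excerpt is a minimal sketch of what those elements typically look like in a Tomcat configuration; the server.xml file that is packaged with the Jazz Team Server contains additional attributes (for example, keystore settings), so treat this excerpt only as a guide to which attributes to change:

      <!-- HTTP connector: change port="9080" to the new HTTP port -->
      <Connector port="9080" protocol="HTTP/1.1"
                 connectionTimeout="20000"
                 redirectPort="9443" />

      <!-- HTTPS connector: change port="9443" to the new HTTPS port -->
      <Connector port="9443" protocol="HTTP/1.1"
                 SSLEnabled="true" scheme="https" secure="true"
                 clientAuth="false" sslProtocol="TLS" />

    If you change the HTTPS port, also update the redirectPort attribute on the HTTP connector so that redirects continue to reach the secure port.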
    1. To start the Data Collection Component server running as an Apache Tomcat server that is packaged with the Jazz Team Server, select one of the following methods:
      • On the Windows taskbar, click Start > All Programs > IBM Collaborative Lifecycle Management > Start the Jazz Team Server
      • In a command prompt, start the server from [JTSInstallDir]\JazzTeamServer\server\server.startup.bat
      Tip: A separate Apache Tomcat console window opens. If you close this window, the server stops. Wait a moment for the Tomcat server to start. After the server is started, the startup time in milliseconds is displayed in the Tomcat window. For example: Server startup in 26964 ms.
    2. If you want to later stop the Data Collection Component server running as an Apache Tomcat server that is packaged with the Jazz Team Server, select one of the following methods:
      • On the Windows taskbar, click Start > All Programs > IBM Collaborative Lifecycle Management > Stop the Jazz Team Server
      • In a command prompt, stop the server from [JTSInstallDir]\JazzTeamServer\server\server.shutdown.bat
      Tip: You can confirm that the server has stopped completely when the pop-up window that displays the console output for the Tomcat server closes and is no longer available.
    WebSphere Application Server
    If you want to administer the Data Collection Component application in an enterprise environment, you can install and configure a new or existing WebSphere Application Server. For details on configuring the Data Collection Component application in WebSphere Application Server, see the Configure the application server topic.
    1. To start the Data Collection Component server on WebSphere Application Server, run the following command:
      <WASInstallDir>\profiles\<profile_name>\bin\StartServer.bat server1
      For example:
      C:\Program Files\ibm\WebSphere\AppServer\profiles\DCCProfile\bin\StartServer.bat server1
    2. To stop the Data Collection Component server on WebSphere Application Server later, run the following command:
      <WASInstallDir>\profiles\<profile_name>\bin\StopServer.bat server1
      For example:
      C:\Program Files\ibm\WebSphere\AppServer\profiles\DCCProfile\bin\StopServer.bat server1
  2. Open a web browser to the dedicated Data Collection Component location.

    The URL is https://<server>:<port>/dcc/web, where <server> is the fully qualified domain name or localhost and the context root is /dcc/web.

    For example, if you are running the Apache Tomcat server that is packaged with the Jazz Team Server, in a web browser, type the address: https://localhost:9443/dcc/web

  3. When you are prompted, provide your user ID and password to authenticate with the Data Collection Component application that is set up on the Jazz Team Server.

    The user ID must be a member of the JazzAdmins group.

    1. Default user ID (case-sensitive): ADMIN
    2. Default password (case-sensitive): ADMIN
    3. Click Log In.
    Tip: If the Data Collection Component is not registered with a Jazz Team Server, one of the following error messages is displayed in the web browser:
    Error!
    
    Data Collection Component could not be loaded due to a syntax error or missing dependency.
    Page ID: com.ibm.rational.datacollection.web.ui.internal.pages.DataCollectionManagementPage
    HTTP Status 503 - CRJAZ1173W The com.ibm.rational.datacollection.service.internal.web.IWebRedirector service is not available.
    For details on how to register the Data Collection Component with the Jazz Team Server, see the Configure Jazz Team Server for the Data Collection Component topic.
  4. If this is your first time opening the Data Collection Component application, you might get the following error message:
    Failed to load Licenses Data Collection resource.
    Error 500: CRRCD9002E The Data Collection Component is still initializing. Refresh the page a few minutes later.
    Wait a few minutes while the application initializes and goes into a loading state. You can confirm that initialization is complete when you refresh the web browser and the pages in the Data Collection Component are populated. For example, on the Data Warehouse Connection page, wait for the sections under Data Warehouse Properties and Data Collection Properties to populate.
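    If you prefer to check readiness from a script instead of manually refreshing the browser, the following Python sketch polls the Data Collection Component URL until it responds without the initialization error. This sketch is hypothetical: it assumes the default URL, a self-signed certificate (hence verify=False), and that an HTTP 5xx response or the CRRCD9002E code in the page body indicates that initialization is still in progress. Refreshing the page in a browser, as described above, is the documented method.

      # Hypothetical readiness check for the Data Collection Component web UI.
      # Assumes the default Tomcat URL and a self-signed certificate (verify=False).
      import time

      import requests
      import urllib3

      urllib3.disable_warnings()  # suppress warnings about the self-signed certificate

      DCC_URL = "https://localhost:9443/dcc/web"

      for _ in range(30):  # poll for up to about 15 minutes
          try:
              response = requests.get(DCC_URL, verify=False, timeout=30)
              # Assumption: a 5xx status or the CRRCD9002E code means still initializing.
              if response.status_code < 500 and "CRRCD9002E" not in response.text:
                  print("Data Collection Component appears to be initialized.")
                  break
          except requests.exceptions.ConnectionError:
              pass  # the server might not be accepting connections yet
          time.sleep(30)
      else:
          print("Still initializing after about 15 minutes; check the server logs.")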
  5. Test the data warehouse connection.
    1. In the left pane of the Data Collection Component application, under the Configuration section, click the Data Warehouse Connection link.
    2. On the Data Warehouse Connection page, click the Test Connection button.
    3. A successful data warehouse connection displays the message: Connection tested successfully.

      A failed data warehouse connection displays the message: Failed to connect to the data warehouse. For more details, click the show details link.

  6. Disable data collection jobs that are running or scheduled to run from other data collection options, such as Data Manager in Rational Insight.
    Warning: You must disable the jobs from the other data collection options because the Java™-based data collection jobs in CLM are automatically enabled and scheduled to run. Otherwise, data corruption can occur, particularly if the data warehouse database that is configured in the Data Collection Component is the same database that is used by any of the other data collection options.

    For Data Manager in Rational Insight to collect data from CLM products by using the Data Collection Component, in the Data Manager job you must disable the existing CLM jobs and leave only the Data Collection Component job active. For example, in Cognos® you must disable the Data Movement Task. For details, see the Deleting a component from a job stream section in the Handling job streams topic.

  7. Configure resource groups.
    Important: This step is required if you want to collect data from applications on a remote Jazz Team Server.

    When you run the Data Collection Component on a local Jazz Team Server, resource groups are automatically set up for the applications and data warehouse that are registered on that same Jazz Team Server. If you are collecting data only from applications on the local Jazz Team Server, you can continue to the next step.

    Resource groups are the applications that contain the data that you want to collect; you register them in the Data Collection Component so that you can run data collection jobs against them. For example, the products in the Rational solution for CLM:
    • Jazz Team Server
    • Rational Quality Manager
    • Rational Team Concert™
    • Rational DOORS® Next Generation
    In addition, there is a special resource group, which is the data warehouse. The data warehouse is needed as a data source to create the fact and dimension tables for the data-mart. Under the Rational Reporting Data Warehouse Database section, you must specify the user name, password, and URL of the data warehouse.

    Before you run data collection jobs, applications must be registered in the Data Collection Component as resource groups. Resource groups are automatically configured for the applications and data warehouse that are registered on the same Jazz Team Server as the Data Collection Component. You can use discovery to automatically detect applications that are registered with a local or external Jazz Team Server, or you can manually add the resource groups.

    1. In the left pane of the Data Collection Component application, under the Configuration section, click the Resource Group Configuration link.
    2. To automatically detect applications that are registered with your Jazz Team Server, on the Resource Group Configuration page, click Discover. You can discover applications on either a local or a remote Jazz Team Server, one server at a time; repeat this step to register a mixture of local and remote resource groups:
      • To discover applications on a local Jazz Team Server:
        1. In the Add Resource Group window, view the list of applications that are local to the Jazz Team Server where the Data Collection Component is also registered. By default, under the Application Instance column, the check boxes are selected because the local applications were automatically added as resource groups and are ready for data collection jobs to run against them.
      • To discover applications on a remote Jazz Team Server:
        1. In the Add Resource Group window, select the Use an external JTS to discover applications check box. This check box allows you to discover applications that are registered with an external Jazz Team Server on a remote machine.
        2. In the JTS Root Uri field, specify the location of the external JTS: https://<hostname>:<port>/jts. For example: https://my.remote.host.com:9443/jts
        3. Click the Discover button to display a list of applications registered on the external Jazz Team Server on a remote machine.
        4. For example, under the Application Instance column, select the check boxes for the following applications on the remote machine to add them as resource groups for running data collection jobs against:
          • /rm
          • /qm
          • my.remote.host.com:9443/jts
          • /ccm
        5. Click OK.
        6. Back on the Resource Group Configuration page, the resource groups that you added are displayed under each of the product groups.
    3. To manually add applications that are registered with your Jazz Team Server:
      Tip: You can skip this step if you already discovered the applications automatically in the previous step.
      1. On the Resource Group Configuration page, choose and expand the product group that you want to add a resource group to.
      2. Click Add. A resource group entry is added under the selected product group.
      3. Expand the resource group entry that was added under the product group.
      4. Configure the location of the resource group by specifying the URL of the application that is registered on your Jazz Team Server. For example:
        Jazz Team Server
        https://my.remote.host.com:9443/jts
        Quality Management (QM)
        https://my.remote.host.com:9443/qm
        Change and Configuration Management (CCM)
        https://my.remote.host.com:9443/ccm
        Rational DOORS Next Generation
        https://my.remote.host.com:9443/rm
    4. For each resource group that was added, specify the additional configuration details, such as the authentication type and version:
      1. In the Authentication type list, select the authentication setting of the resource group. The valid values are Username and Password, OAuth-JTS, or Jazz Security Architecture SSO:
        Username and Password
        When you select Username and Password as the authentication type, the Username and Password fields are displayed. In the Username field, specify the user name for the resource group, and in the Password field, specify the password for the resource group.
        OAuth-JTS
        When you select OAuth-JTS as the authentication type, the Consumer key and Secret fields are displayed. In the Consumer key field, specify the consumer key that was obtained from the Jazz Team Server, and in the Secret field, specify the secret for the consumer key.
        Tip: If the resource group is local to the Jazz Team Server where the Data Collection Component is also registered, the Consumer key and Secret fields are automatically completed with the values that were configured when the application was registered and finalized during the Jazz Team Server setup.
        Jazz Security Architecture SSO
        This option is available if you enabled Jazz Security Architecture single sign-on (SSO) authentication on all Jazz applications.
      2. In the Version list, select whether the resource group is from the 6.0 or 5.0 release of the Rational solution for Collaborative Lifecycle Management (CLM).
      Remember: Under the Rational Reporting Data Warehouse Database > A Relational Database DataSource section, there is a special resource group, which is the data warehouse. The data warehouse is needed as a data source to create the fact and dimension tables for the data-mart. You must configure the warehouse by specifying the user name, password, and URL of the data warehouse. For example, for a Derby data warehouse, specify the following configuration settings:
      • Relational Database Type: Derby Client
      • URL: //localhost:1527/conf/jts/derby/warehouseDB
      • For the User name and Password, keep the default value of none.
      • For the Version, select whether the resource group is from the 6.0 or 5.0 release of the Rational solution for Collaborative Lifecycle Management (CLM).
    5. Click the Test Connection link for each resource group that was added.
      • A successful connection to the resource group displays the following message: Successfully connected to the resource group.
      • A failed connection to the resource group displays the following message: Failed to connect to the resource group. For more details, click the show details link.
    6. Remember to click the Save button in the upper-right corner of the Resource Group Configuration page to avoid losing your newly added settings.
  8. Specify the load type for the next data collection job.

    Full loads occur during the initial run of the data collection jobs and build all the data warehouse tables from scratch. These jobs take longer than typical data collection jobs, which run delta builds that collect only the changes since the last time the job ran or since a specified date. Any load after the initial load, or after a job runs with a changed load type configuration, automatically defaults back to the Delta load since previous run setting.

    Remember: You must reserve full loads for populating a new or empty data warehouse database, and run delta loads for a data warehouse that is already populated with data.

    A full load can take a long time to complete, such as several days, depending on the amount of data to process. If you must run a full load, keep in mind that to generate complete and accurate reports, the full load needs to pull in all the data that exists in the configured resource group, for example a specified CLM application. If a full load is interrupted, your reports can contain incomplete or inaccurate data that is generated from a partially loaded data warehouse. In addition, an interrupted full load can cause the data collection jobs to fail.
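    Conceptually, the load types differ only in which records a job extracts. The following Python pseudocode is a purely illustrative sketch of that idea, not the product's implementation; the function and field names are hypothetical:

      from datetime import datetime

      def select_records_to_load(records, load_type,
                                 last_successful_run=None, since_date=None):
          """Illustrative only: which records an ETL job extracts for each load type."""
          if load_type == "full":
              # Full load: rebuild the warehouse tables from scratch with every record.
              return records
          if load_type == "delta_since_previous_run":
              # Delta load (default): only records changed since the last successful load.
              return [r for r in records if r["modified"] > last_successful_run]
          if load_type == "delta_since_date":
              # Delta load since date: only records changed since a user-selected date.
              return [r for r in records if r["modified"] > since_date]
          raise ValueError("unknown load type")

      # Example with hypothetical records: only the record changed after the
      # last successful run (2014-04-15) is extracted by the default delta load.
      records = [{"id": 1, "modified": datetime(2014, 4, 1)},
                 {"id": 2, "modified": datetime(2014, 4, 20)}]
      print(select_records_to_load(records, "delta_since_previous_run",
                                   last_successful_run=datetime(2014, 4, 15)))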

    You can change the load configuration settings for the next run or keep the default Delta load since previous run setting:
    1. In the left pane of the Data Collection Component application, under the Configuration section, click the Delta Load Configuration link.
    2. Under the Enabled column, select the check box of the resource group whose load configuration you want to change.
    3. Under the Load Type column, select one of the following load type options from the drop-down menu:
      Full load
      This load job rebuilds all the data warehouse tables from scratch.
      Delta load since previous run
      This load job only loads the data that has changed or been added since the last successful load. This is the default setting.
      Delta load since date
      This load job requires you to specify a date to load the data that has changed or been added since the selected date.
    4. To apply the changes, click Save.
    Remember: When you change the next load job to the Full load or Delta load since date option, after the job runs with that load type, the setting automatically reverts to Delta load since previous run.
  9. Verify that the data collection jobs that you want to run are selected.
    Restriction: There is a known issue that after an upgrade to or new installation of Data Collection Component 5.0.2 is completed, some of the job options are cleared on the Data Collection Jobs page.
    1. On the Data Collection Jobs page, select the check box for each data collection job that you want to run.
      Tip: To enable all the data collection jobs, click the Select All button on the Data Collection Jobs page.
  10. Run the data collection jobs.

    You can choose to run all the data collection jobs, run a specific data collection, or run a specific data collection job.

    On the Data Collection Jobs page, you can see that the Data Collection Component categorizes the jobs into data collections, such as Operational Data Store (ODS), Data-Mart, and Licenses.

    You no longer need to run the jobs in a strict sequence. The Data Collection Component is designed to improve performance by using parallel and concurrent processing. Jobs within a data collection, for example within the ODS data collection, run in parallel. However, the jobs in the Data-Mart data collection cannot start until the jobs in the ODS data collection are complete, because the Data-Mart data collection extracts data from the ODS data collection, transforms it by using a set of rules, and then loads it into the metrics tables as sets of point-in-time information and relationships. This process is cumulative, meaning that a new set of information is added each time the Data-Mart data collection job is run, resulting in a collection of many sets of point-in-time metrics and relationships. This type of data can be used for reports that show metrics, trends, aggregations, and other relationships among the data. A conceptual sketch of this ordering follows the tip below.
    Tip: Schedule the Data-Mart data collection jobs to run as infrequently as possible, for example at most once a day. Processing Data-Mart data collections can take a significant amount of time because Data-Mart jobs are not delta jobs and typically insert a large amount of data into the data warehouse.
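    The following Python sketch is a purely conceptual illustration of this ordering, not the product's code; the job names are placeholders. Jobs within one data collection are submitted concurrently, and the Data-Mart data collection starts only after every ODS job has finished:

      from concurrent.futures import ThreadPoolExecutor, wait

      def run_job(name):
          print(f"running {name}")  # placeholder for a real data collection job

      ods_jobs = ["ods_requirements", "ods_work_items", "ods_test_results"]  # hypothetical
      datamart_jobs = ["datamart_metrics", "datamart_trends"]                # hypothetical

      with ThreadPoolExecutor() as pool:
          # Jobs within the ODS data collection run in parallel.
          wait([pool.submit(run_job, job) for job in ods_jobs])
          # The Data-Mart data collection starts only after all ODS jobs complete,
          # because it reads the data that the ODS jobs loaded.
          wait([pool.submit(run_job, job) for job in datamart_jobs])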

    Select your scope for running data collection jobs:

    You can trigger the data collection jobs for all the registered applications at once. The jobs within the Operational Data Store data collection run in parallel. However, the jobs in the Data-Mart data collection cannot start until the jobs in the Operational Data Store data collection are complete.

    1. On the Data Collection Jobs page, click the Run all data warehouse collection jobs link.

    You can trigger all the jobs in a data collection. The data collection groups are Operational Data Store (ODS), Data-Mart, and Licenses.

    1. On the Data Collection Jobs page, choose a data collection that you want to run:
      • ODS Data Collection
      • Data-Mart Data Collection
      • Licenses Data Collection
    2. Click the corresponding Run link available in the section heading of the data collection that you chose in the previous step.

    You can trigger a particular data collection job to run.

    1. On the Data Collection Jobs page, locate the job that you want to run.
    2. Under the Schedule column, hover over the job and click the Run data collection job icon.
    The status of each data collection job can change to one of these states:
    Idle
    The data collection job is inactive.
    Running
    The data collection job is currently in progress.
    Stopping
    The data collection job is terminating.
  11. Monitor the data collection jobs.

    On the Data Collection Jobs Status page, you can view your collection status. You can check which jobs are completed or currently running, and find the job logs.

    1. Open the Data Collection Jobs Status page: in the left pane of the Data Collection Component application, under the Data Collection section, click the Data Collection Jobs Status link.
    2. To refresh the view and see which jobs are still running or completed, click the Refresh button next to the Data Collection Jobs Status link in the left pane, under the Data Collection section.
    3. You can view the list of jobs that are running under the Currently Running Jobs section, which displays the following details:
      Job Name
      Displays the name of the data collection job that is running.
      Start Time
      Displays a Waiting to start status when a job is issued but has not yet started. Displays the date and 24-hour clock timestamp when a job is started, for example: 2014-04-23 15:53:56
      Running Time
      Displays the length of time that the job has been running, as of the last time that you refreshed the Data Collection Jobs Status page. For example: 0 hours, 0 minutes, 12 seconds.
    4. You can view the list of jobs that have completed under the Job History section, which displays the following details:
      Job Name
      Displays the name of the data collection job that is completed.
      Status
      Displays whether the completed job's status is Success, Failed, or Cancelled.
      Start Time
      Displays the date and 24-hour clock timestamp when a job is started, for example: 2014-04-23 16:02:53
      End time
      Displays the date and 24-hour clock timestamp when a job is completed, for example: 2014-04-23 16:02:55
      Time Taken
      Displays the total length of time to complete the job, for example 0 hours, 0 minutes, 2 seconds
    5. When you expand a job under the Job History section, the following additional details are displayed:
      Resource Group
      Displays the registered application that the job is collecting data from.
      Status
      Displays whether the completed job's status is Success, Failed, or Cancelled.
      Start Time
      Displays the date and 24-hour clock timestamp when a job is started, for example: 2014-04-23 16:02:53
      End Time
      Displays the date and 24-hour clock timestamp when a job is completed, for example: 2014-04-23 16:02:55
      Running Time
      Displays the total length of time to complete the job, for example 0 hours, 0 minutes, 2 seconds
      Processed Resources
      Displays the number of records processed.
      Delivered Rows
      Displays the number of records that were delivered to the data warehouse.
      Failed Resources
      Displays the number of records that failed to deliver to the data warehouse.
      Log
      A Log link that you can click to download and view the log file.
    6. You can limit the number of records that are displayed in the job history. Under the Job History section, in the Number of previous jobs to display data for field, type the number of records that you want to display in the job history, for example 20, and then click the Run button.
    7. You can delete the job history. Click the Delete History link and select one of the following drop-down options:
      Delete All
      A Delete Job History window opens. Click OK to delete all job history, or click Cancel.
      Delete Before
      You can select a date to delete all job history on or before the selected date.
      Delete After
      You can select a date to delete all job history on or after the selected date.
      Delete Between
      You can select two dates to delete all job history inclusively between the two selected dates.
  12. Schedule data collection jobs.
    After you are satisfied with the initial run of your data collection jobs, you can schedule the data collection jobs to run at multiple intervals or times throughout the day. You can trigger a data collection job to run by using either of the two schedule options:
    Interval schedule
    Runs a data collection job every X minutes throughout the day.
    Daily schedule
    Runs a data collection job at particular times of the day.
    1. Open the Data Collection Jobs page: in the left pane of the Data Collection Component application, under the Data Collection section, click the Data Collection Jobs link.
    2. On the Data Collection Jobs page, choose the data collection group whose jobs you want to schedule:
      • ODS Data Collection
      • Data-Mart Data Collection
      • Licenses Data Collection
    3. Click the Schedule link on the right side of your selected data collection group.
    4. In the Edit schedule window, use the drop-down menu on the upper right to select one of the following schedule options:
      Interval Schedule
      1. Under the Interval group, you can specify to collect data every X minutes by typing a number in the Current time interval field. The minimum interval is 5 minutes.
      2. Under the Week Days group, you can specify to collect data on certain days.
        • To select multiple days of the week, press and hold the Ctrl key and click the days of the week on which you want to collect data; for example, Monday, Wednesday, and Friday.
        • To select all the days of the week to collect data, click the All button.
      3. After you specify the interval schedule, click the OK button in the Edit schedule window.
      Daily Schedule
      1. Under the Times group, you can specify more than one time of day at which you want to collect data. For example, run data collection jobs at 9:00 AM, at noon, and before the end of the day at 4:00 PM.
        1. Click the Create Time button each time you want to add a time of the day entry.
        2. In the first drop-down menu, specify the hour of the day in a 24-hour clock.
        3. In the second drop-down menu, specify the minutes of the selected hour.
        4. To remove a time, click the Remove time icon that is next to the time entry that you want to remove.
      2. Under the Week Days group, you can specify to collect data on certain days at the previously specified times of the day.
        • To select multiple days of the week, press and hold the Ctrl key and click the days of the week on which you want to collect data; for example, Monday through Friday.
        • To select all the days of the week to collect data, click the All button.
      3. After you specify the daily schedule, click the OK button in the Edit schedule window.
    5. To apply the schedule settings, click the Save button in the upper-right corner of the Data Collection Jobs page.
    6. To remove a schedule setting:
      1. On the Data Collection Jobs page, choose the data collection group whose schedule setting you want to remove:
        • ODS Data Collection
        • Data-Mart Data Collection
        • Licenses Data Collection
      2. Click the Unschedule link on the right side of your selected data collection group.
      3. To apply the removal of the schedule settings, click the Save button in the upper-right corner of the Data Collection Jobs page.
