The Harvest Tracker tool

Find details about the command syntax and the tool's output.

Command syntax

The Harvest Tracker tool is a Python script that you run on the data server. The command syntax is as follows:
python32 /usr/local/storediq/bin/util/harvest_tracker.pyc
    [-l loops | --loop=loops]
    [-t loop_time | --loop-time=loop_time]
    [-o logfile | --logfile=logfile]
    [-q qcount | --qcount=qcount]
loops
Specifies how many times the tool checks the data server services. The default value is maxint, which corresponds to 2,147,483,647.
loop_time
Specifies the time interval for the checks in seconds. The default value is 30.
logfile
Defines the path to the log file. The default log file is /deepfs/config/harvest_tracker.log
qcount
Specifies the maximum number of objects in queue to be listed by name.

Running the command python32 /usr/local/storediq/bin/util/harvest_tracker.pyc -h displays the supported options and the default values.
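
As an illustration of how the options combine, the following minimal sketch assembles the command line from the short options described above. The helper names build_command and approx_runtime_seconds are hypothetical and not part of the tool, and the runtime estimate assumes that each check takes roughly loop_time seconds, which the tool does not guarantee.

```python
import subprocess

# Interpreter and script path as given in the command syntax above.
TRACKER = "/usr/local/storediq/bin/util/harvest_tracker.pyc"


def build_command(loops=None, loop_time=None, logfile=None, qcount=None):
    """Return the argument list for a Harvest Tracker run.

    Options left as None are omitted, so the tool's own defaults apply.
    """
    cmd = ["python32", TRACKER]
    options = (("-l", loops), ("-t", loop_time), ("-o", logfile), ("-q", qcount))
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def approx_runtime_seconds(loops, loop_time=30):
    """Rough upper bound on the run time (assumption: one check per interval)."""
    return loops * loop_time


cmd = build_command(loops=20, loop_time=60)
print(" ".join(cmd))
print("Runs for about %d seconds" % approx_runtime_seconds(20, 60))
# On the data server, the command could then be started with:
# subprocess.call(cmd)
```

With 20 loops at a 60-second interval, the tool would stop by itself after about 20 minutes instead of running until it is stopped manually.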

At any time, you can stop the tool by pressing Enter and then entering y as confirmation.

While the tool is running, you will see additional messages like this one on the terminal where you started the tool:
Tue Feb 26 16:41:14 2019:pubsub/reconnectingpbclientfactory.py:64:ReconnectingPBClientFactory._onRemoteOk
These messages come from the data server and cannot be suppressed. They have nothing to do with the Harvest Tracker tool and can, therefore, be ignored. They are not written to the harvest_tracker.log file.
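
If you capture the terminal output, for example with a terminal logger, you might want to filter these messages out afterward. The following sketch is based only on the sample message above: it drops lines that start with an unbracketed timestamp that is immediately followed by a colon. It assumes that the tool's own terminal lines look like the log lines, which start with a timestamp in square brackets. The function name drop_data_server_messages is hypothetical.

```python
import re

# Matches lines such as
# "Tue Feb 26 16:41:14 2019:pubsub/reconnectingpbclientfactory.py:64:..."
# that is, a weekday, month, day, time, and year followed directly by a colon.
DATA_SERVER_MSG = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}:"
)


def drop_data_server_messages(lines):
    """Return only the lines that do not look like data server messages."""
    return [line for line in lines if not DATA_SERVER_MSG.match(line)]


captured = [
    "Tue Feb 26 16:41:14 2019:pubsub/reconnectingpbclientfactory.py:64:"
    "ReconnectingPBClientFactory._onRemoteOk",
    "[Thu Oct 31 20:47:08 2019] Harvest: Obj/s: 3.03",
]
for line in drop_data_server_messages(captured):
    print(line)
```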

Output for active harvests

The statistics are written to an individual output block for each harvest, where the blocks are separated by dashed lines. The first line of each block shows the volume ID and the start time of that specific harvest. The statistics include the following information:
Ingest Q: Cur len:
Number of objects in the queue that are waiting for text extraction
Ingest Q: Total in:
Total number of objects that entered the queue since the harvest started
Ingest Q: File:
Full path to the file in the queue
Output Q: Cur len:
Number of objects in the queue that are waiting for node-indexing (PostGres)
Output Q: Total in:
Total number of objects that entered the queue since the harvest started
Output Q: File:
Full path to the file present in the queue
Findex Q: Cur len:
Number of objects that are queued for full-text indexing (into Lucene)
Findex Q: Total in:
Total number of objects that entered the full-text indexing queue since the harvest started
Findex Q: File:
Full path to the file in the full-text indexing queue
TktMstr: Cur len:
Number of objects being actively tracked by Ticket Master
TktMstr: Total in:
Total number of objects tracked by Ticket Master so far
Harvest: Time to complete:
Expected remaining processing time
Harvest: Percent complete:
Percent complete
Harvest: Estimated vol size:
Estimated size of the volume being harvested
Harvest: Obj/s:
Object processing rate
Harvest: Max time file:
Name of the file with the longest processing time so far
Harvest: Max time val:
Processing time of the file listed under Harvest: Max time file:
Harvest: Max size file:
Name of the largest file encountered so far
Harvest: Max size val:
Size of the file listed under Harvest: Max size file:
ObjTracker: VolId:
The ID of the volume being harvested
ObjTracker: Files extended:
A comma-separated list of paths to files that are taking longer than the normal processing time
ObjTracker: Files expired:
A comma-separated list of paths to files whose processing could not be completed in the allotted maximum time
[Thu Oct 31 20:46:38 2019] ------------------------------
[Thu Oct 31 20:47:08 2019] Harvest: ---- VolId: 260, Name: ThisVolume, Start: Thu Oct 31 20:41:23 2019 ----(active)
[Thu Oct 31 20:47:08 2019] Ingest Q: Cur len: 55, Total in: 813
[Thu Oct 31 20:47:08 2019] Ingest Q: File: d3/xlsfiles/Certificate or license number.xls
[Thu Oct 31 20:47:08 2019] Ingest Q: File: d3/xlsfiles/Companies.xls
[Thu Oct 31 20:47:08 2019] Ingest Q: File: d3/xlsfiles/Countries.xls
[Thu Oct 31 20:47:08 2019] Output Q: Cur len: 12, Total in: 1203
[Thu Oct 31 20:47:08 2019] Output Q: File: d3/xlsfiles/Allattributes3.xls
[Thu Oct 31 20:47:08 2019] Findex Q: Cur len: 472, Total in: 1085
[Thu Oct 31 20:47:08 2019] Findex Q: File: d3/nsf/smoke.nsf
[Thu Oct 31 20:47:08 2019] Findex Q: File: d3/nsf/smoke.nsf
[Thu Oct 31 20:47:08 2019] Findex Q: File: d3/nsf/smoke.nsf
[Thu Oct 31 20:47:08 2019] Tkt Mstr: Cur len: 34, Total in: 112
[Thu Oct 31 20:47:08 2019] Tkt Mstr: File: d3/pptfiles/Device identifier or serial.ppt
[Thu Oct 31 20:47:08 2019] Tkt Mstr: File: d3/pptfiles/Discharge date.ppt
[Thu Oct 31 20:47:08 2019] Tkt Mstr: File: d3/pptfiles/Relative's full name.ppt
[Thu Oct 31 20:47:08 2019] Harvest: Time to complete: 1 minute 29 seconds
[Thu Oct 31 20:47:08 2019] Harvest: Percent complete: 78.01
[Thu Oct 31 20:47:08 2019] Harvest: Estimated vol size: 773 objects
[Thu Oct 31 20:47:08 2019] Harvest: Obj/s: 3.03
[Thu Oct 31 20:47:08 2019] Harvest: Max time file: d1/pst/standard.pst, Max time val: 276.00
[Thu Oct 31 20:47:08 2019] Harvest: Max size file: sips/bigsip_58m.pdf, Max size val: 58616009
[Thu Oct 31 20:47:08 2019] ObjTracker: files extended: []
[Thu Oct 31 20:47:08 2019] ObjTracker: files expired: []
[Thu Oct 31 20:47:08 2019] ------------------------------
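
To work with these statistics programmatically, the queue lines of an output block can be parsed with regular expressions. The following sketch is derived only from the sample block above and is not an official parser: the function name parse_queues and the dictionary layout are hypothetical, and the line format might differ between product versions.

```python
import re

# "[<timestamp>] <section>: <rest>", for example
# "[Thu Oct 31 20:47:08 2019] Ingest Q: Cur len: 55, Total in: 813"
LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] (?P<section>[^:]+): (?P<rest>.*)$")
COUNTS = re.compile(r"^Cur len: (\d+), Total in: (\d+)$")
FILE_PREFIX = "File: "


def parse_queues(lines):
    """Collect 'Cur len', 'Total in', and listed files for each queue."""
    queues = {}
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue  # separator lines and unrelated output
        section, rest = match.group("section"), match.group("rest")
        counts = COUNTS.match(rest)
        if counts:
            queues[section] = {
                "cur_len": int(counts.group(1)),
                "total_in": int(counts.group(2)),
                "files": [],
            }
        elif rest.startswith(FILE_PREFIX) and section in queues:
            queues[section]["files"].append(rest[len(FILE_PREFIX):])
    return queues


sample = [
    "[Thu Oct 31 20:46:38 2019] ------------------------------",
    "[Thu Oct 31 20:47:08 2019] Ingest Q: Cur len: 55, Total in: 813",
    "[Thu Oct 31 20:47:08 2019] Ingest Q: File: d3/xlsfiles/Companies.xls",
    "[Thu Oct 31 20:47:08 2019] Output Q: Cur len: 12, Total in: 1203",
    "[Thu Oct 31 20:47:08 2019] Harvest: Obj/s: 3.03",
]
for name, stats in sorted(parse_queues(sample).items()):
    print(name, stats["cur_len"], stats["total_in"], stats["files"])
```

Lines that do not describe a queue, such as the Harvest and ObjTracker lines, are skipped by this sketch and would need their own patterns.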

Output for finished harvests

After a harvest is complete, the tool can provide a summary of the harvest operation. The summary contains the following information:
Ingest Q: Longest Q len:
Largest number of objects in the queue that waited for text extraction
Output Q: Longest Q len:
Largest number of objects in the queue that waited for node-indexing (PostGres)
Findex Q: Longest Q len:
Largest number of objects in the queue that waited for full-text indexing (into Lucene)
TktMstr: Longest Q len:
Maximum number of objects tracked
Harvest: Min Obj/s:
Lowest object processing rate
Harvest: Max Obj/s:
Highest object processing rate
Harvest: Max time file:
Name of the file that took the longest to process
Harvest: Max time val:
Actual processing time of the file listed under Harvest: Max time file:
Harvest: Max size file:
Name of the largest file processed
Harvest: Max size val:
Size of the file listed under Harvest: Max size file:
Harvest: Total:
Total harvest time
ObjTracker: VolId:
The ID of the volume that was harvested
ObjTracker: files extended:
A comma-separated list of paths to files that took longer than the normal processing time
ObjTracker: files expired:
A comma-separated list of paths to files whose processing could not be completed in the allotted maximum time
Object analysis: Total obj types:
Total number of different file extensions encountered so far
Object analysis: Total obj count:
Total number of system level objects processed so far. Note that this count currently does not include the count of objects in a container.
Object analysis: Longest tkt time:
Longest life of a ticket tracked by Ticket Master
Object analysis: ext:
Extension (type) of object
Object analysis: Count:
Number of tickets of this extension/type that were processed
Object analysis: Max tkt time:
Longest life of a ticket of this extension/type tracked by Ticket Master
[Thu Oct 31 20:48:38 2019] ------------------------------
[Thu Oct 31 20:48:38 2019] Harvest_Tracker Summary...
[Thu Oct 31 20:48:38 2019] Harvest: ---- VolId: 260, Name: ThisVolume, Start: Thu Oct 31 20:41:23 2019 ----(stale)
[Thu Oct 31 20:48:38 2019] IngestQ: Longest Q len: 813
[Thu Oct 31 20:48:38 2019] OutputQ: Longest Q len: 1324
[Thu Oct 31 20:48:38 2019] FindexQ: Longest Q len: 1196
[Thu Oct 31 20:48:38 2019] TktMstr: Longest Q len: 112
[Thu Oct 31 20:48:38 2019] Harvest: Min Obj/s: 0.01, Max Obj/s: 3.63
[Thu Oct 31 20:48:38 2019] Harvest: Max time file: d1/pst/standard.pst, Max time val: 282.00
[Thu Oct 31 20:48:38 2019] Harvest: Max size file: sips/bigsip_58m.pdf, Max size val: 58616009
[Thu Oct 31 20:48:38 2019] Harvest: total: 0d 0h:7m:15s
[Thu Oct 31 20:48:38 2019] ObjTracker: files extended: []
[Thu Oct 31 20:48:38 2019] ObjTracker: files expired: []
[Thu Oct 31 20:48:38 2019] Object analysis:
[Thu Oct 31 20:48:38 2019]     Total obj types: 22, total obj count: 793, Longest tkt time: 32.19:
[Thu Oct 31 20:48:38 2019]         ext: 'xls': count: 193 (24.00%), Max tkt time: 2.39
[Thu Oct 31 20:48:38 2019]         ext: 'ppt': count: 191 (24.00%), Max tkt time: 4.91
[Thu Oct 31 20:48:38 2019]         ext: 'doc': count: 185 (23.00%), Max tkt time: 32.19
[Thu Oct 31 20:48:38 2019]         ext: 'pdf': count: 181 (22.00%), Max tkt time: 15.01
[Thu Oct 31 20:48:38 2019]         ext: 'mail': count: 11 (1.00%), Max tkt time: 0.03
[Thu Oct 31 20:48:38 2019]         ext: 'msg': count: 7 (0.00%), Max tkt time: 0.02
[Thu Oct 31 20:48:38 2019]         ext: 'eml': count: 4 (0.00%), Max tkt time: 0.00
[Thu Oct 31 20:48:38 2019]         ext: 'mbx': count: 3 (0.00%), Max tkt time: 0.00
[Thu Oct 31 20:48:38 2019]         ext: 'note': count: 2 (0.00%), Max tkt time: 0.00
[Thu Oct 31 20:48:38 2019]         ext: 'pst': count: 2 (0.00%), Max tkt time: 0.00
[Thu Oct 31 20:48:38 2019]         ext: 'rar': count: 2 (0.00%), Max tkt time: 0.00
[Thu Oct 31 20:48:38 2019]         ext: 'nsf': count: 2 (0.00%), Max tkt time: 0.00
[Thu Oct 31 20:48:38 2019]         ext: 'jpg': count: 1 (0.00%), Max tkt time: 0.01
[Thu Oct 31 20:48:38 2019]         ext: 'rtf': count: 1 (0.00%), Max tkt time: 0.00
[Thu Oct 31 20:48:38 2019]         ext: 'h': count: 1 (0.00%), Max tkt time: 0.00
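
The Object analysis lines of a summary can be read in a similar way. The following sketch is based only on the sample summary above: the function name parse_object_analysis and the result layout are hypothetical. In the sample, the percentages appear to be truncated to whole numbers (193 of 793 objects is about 24.3% but is listed as 24.00%). The last loop checks that observation against the parsed lines. It is not a documented property of the tool.

```python
import re

TOTAL = re.compile(r"total obj count: (\d+)")
EXT = re.compile(
    r"ext: '(?P<ext>[^']*)': count: (?P<count>\d+) "
    r"\((?P<pct>[\d.]+)%\), Max tkt time: (?P<max>[\d.]+)"
)


def parse_object_analysis(lines):
    """Return (total object count, list of per-extension dictionaries)."""
    total = None
    rows = []
    for line in lines:
        total_match = TOTAL.search(line)
        if total_match:
            total = int(total_match.group(1))
            continue
        ext_match = EXT.search(line)
        if ext_match:
            rows.append({
                "ext": ext_match.group("ext"),
                "count": int(ext_match.group("count")),
                "percent": float(ext_match.group("pct")),
                "max_tkt_time": float(ext_match.group("max")),
            })
    return total, rows


summary = [
    "[Thu Oct 31 20:48:38 2019]     Total obj types: 22, total obj count: 793, Longest tkt time: 32.19:",
    "[Thu Oct 31 20:48:38 2019]         ext: 'xls': count: 193 (24.00%), Max tkt time: 2.39",
    "[Thu Oct 31 20:48:38 2019]         ext: 'doc': count: 185 (23.00%), Max tkt time: 32.19",
    "[Thu Oct 31 20:48:38 2019]         ext: 'msg': count: 7 (0.00%), Max tkt time: 0.02",
]
total, rows = parse_object_analysis(summary)
for row in rows:
    # Integer division reproduces the truncated percentages of the sample.
    truncated = row["count"] * 100 // total
    print(row["ext"], row["count"], row["percent"], truncated)
```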