Content, or unstructured Data, has a lifecycle. These are the typical states that content goes through. I will elaborate each one in another entry.
Data Science, Machine Learning & API / SOA: Insights and Best Practices
Ali_Arsanjani 120000D8QB 8,927 Views
Ali_Arsanjani 120000D8QB Tags:  ai_ethics data_curation ethics ai machine_learning cognitive_systems augmented_intelligence 11,254 Views
Most major businesses are embarking on augmenting their analytics capabilities with Machine Learning. They are either demonstrating or are planning projects that showcase how machine learning can be applied to their applications, with Machine Learning. (This is called #MLInfused.) Enterprise software development firms are attempting to bring impactful predictive capabilities into their suites of products. IBM Watson through either Bluemix APIs or IBM ML On prem offers such capabilities.
When a mortgage application is submitted, ultimately human underwriters make the decision based on a set of rules. Although we cannot claim that the process is completely understandable, we can probably hold a human plus the regulations accountable in an audit, court case or compliance situation to demonstrate lack of unjust discrimination or bias. Enter Machine Learning. The data we use to train such a system, the humans who will curate and annotate the data, and the process they went through (biased or unbiased) will have a significant if not cardinal impact on the trained ML algorithm and thus it's recommendations and output.
To prepare ourselves as a society of scientists, engineers, businesspersons and regulators, etc., for a world where such processes and data will have major impact on individuals and society, we need a set of rules and regulations. They will come in due time forced by errors and omissions and uproar. But to prempt that we suggest for consideration some "laws" or best practices around machine learning of cognitive systems.
1. A cognitive system will not be trained on dark curated data . Yes maybe the hidden layers of neural networks are a black box, but the data itself should not be "dark": it's sourcing, inputs, annotations, training workflow and outcomes should be transparent. Training, testing and validation data sets should be traceable, white boxed and accessible for enterprise governance and compliance .
2. The data curation process will be transparent. Ends do not justify means : transparency of data curation process for machine learning is paramount . This means traceability and governance around where data is sourced, who curated and annotated it, who verified the annotations .
3. Cognitive system recommendations must provide traceable justification . Outcomes must be coupled with references to why they made the decision or recommendation.
4. Where human health is at stake cognitive systems with relevant but different training backgrounds will cross check each other before making a recommendation to a human expert .
These practices or initial variations of them should be considered as part of an overall governance process for the training, curating of datasets for machine learning of cognitive systems.