Data Science, Machine Learning & API / SOA: Insights and Best Practices
Ali_Arsanjani 120000D8QB Tags:  artificialintelligence ai_ethics ml machinelearning ai machine_learning 2,148 Views
Tags: virtual_agent, intent-based_architecture, watson_conversation_servi..., chatbot
Chatbots, or Virtual Agents, are rapidly ramping up to augment human-computer interaction, starting from self-help and gradually moving up the knowledge-management chain to the point where an agent (such as a technical-support call-center agent) uses a specialized in-house chatbot for Agent Assist.
Some of these technologies are nascent, others more mature, such as Watson Virtual Agent.
The first thing a virtual assistant does is detect your intent: this is accomplished using Natural Language Understanding and Natural Language Classification. The process starts with recognizing the goals of the interacting personas, as well as training on the most commonly recurring calls, requests, chat transcripts, and so on.
What we are talking about here is more than Search or Text Analytics: it is about intent-based understanding of content.
Search is more about federation of search indices, content-level security, content display facets, filtering, community and social extensions, and connectors to ERP systems, databases, and so on.
Content Analytics, on the other hand, is about finding relevant words in text and counting occurrences, analytics for entity and association extraction, and integration with analytics and prediction systems.
Intent-based architectures allow you to understand a query down to the user's intent, execute interactive query refinement to make it actionable, generate a recommendation, interactively access data with implied meaning and relationships, and establish word/phrase proximity and document relationships.
All these capabilities are predicated upon the discovery of intent: intents that can be reused across personas in the interaction dialog.
An intent represents the purpose of a user's input. Each capability uses a natural language classifier that can evaluate a user utterance and find a predefined intent if it is present.
For example, a recommended practice is to review the intents that are identified most often just before users request to speak to a human agent. Investigating the causes of escalations can help you prioritize where to focus future training efforts. You can determine whether user inquiries are being misinterpreted, whether your service is missing a common intent altogether, whether the responses that are associated with an intent need improving, and so on.
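As a toy illustration of intent detection as described above (this is not the Watson Virtual Agent API; it is a minimal bag-of-words matcher, and the intents and utterances are hypothetical), a natural language classifier can be sketched as:

```python
# Toy intent classifier: a stand-in for the natural language classifier
# described above, not a real NLU service.
from collections import Counter
from typing import Optional

# Hypothetical training utterances grouped by predefined intent.
TRAINING = {
    "reset_password": ["reset my password", "forgot password", "cannot log in"],
    "billing_question": ["question about my bill", "charge on my invoice"],
    "speak_to_agent": ["talk to a human", "connect me to an agent"],
}

def classify_intent(utterance: str) -> Optional[str]:
    """Return the intent whose training utterances share the most words."""
    words = set(utterance.lower().split())
    scores = Counter()
    for intent, examples in TRAINING.items():
        for example in examples:
            scores[intent] += len(words & set(example.lower().split()))
    best, score = scores.most_common(1)[0]
    return best if score > 0 else None
```

A real deployment would use a trained statistical classifier and confidence thresholds, but the shape of the interaction, utterance in, predefined intent (or nothing) out, is the same.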
Tags: cognitive_systems, traceable_ml, governance, traceability, cognitive_system_governan...
In the age of voluminous unstructured data, which can be curated manually, semi-automatically, or eventually with a fuller degree of automation (with approval processes and governance in place), it is more necessary than ever to secure not only the data itself but also the process around it: how the data is curated and prepared, how the ML algorithm(s) are selected, and how the resulting data is used for training. That curation process spans data collection and aggregation sources (lineage) as well as data preparation, including redundancy removal, null-value replacement, range consolidation, homogeneity of content, and so on.
Thus the age of Traceable Machine Learning Governance is born. Standards bodies should embark on consolidating views across vendors, consumers, services organizations, data providers, curators, and other stakeholders in the ML training life-cycle.
The problem of ML governance is not only one of data governance as data passes through traceable steps until it is ingested into an ML model, but also of the process around collection, curation, algorithm selection, and the selection and partitioning of training sets.
So, here is our call to action:
1. Standards bodies need to consolidate a standard around Traceable ML Governance to reduce the risk of fake training: bogus data used to train a "paper tiger ML" that has no substance.
2. Corporations should give serious consideration, during the ML and Cognitive Computing training life-cycle, to securing a traceable ML and cognitive governance process. This process can begin in a lightweight fashion, but it should address the legal implications arising from the use and administration of cognitive systems that use Machine Learning (ML) to make recommendations, provide insights, and generate summaries, reports, news, reviews, and so on.
3. Furthermore, as the global impact of well-trained cognitive systems (ML systems, AIs) becomes more and more tangible, we will want demonstrable traceability of how these systems were trained and retrained; where the human in the loop influenced the AI and how that influence was brought to bear (SME input, curation, etc.) on the source data; where data was sourced; and how it was curated (whether by a human or by another ML or cognitive system). Traceability in the cognitive-system or ML-training life-cycle will therefore play a cardinal role in adoption, trust, and the veracity of recommendations.
The claim that Deep Learning is beyond governance or governability is bogus, founded only in the illusion that since the hidden layers of a neural network and their interactions are "dark matter," we cannot govern their inputs and outputs. In fact we can and should secure the training process, as well as the input and output configurations of ML systems, in logs that are themselves inspected by ML systems with a human-in-the-loop approval process in place.
4. Organizations and Standard Bodies should consider the use of Blockchain technologies to secure and govern the process chain within the ML-training or Cognitive system life-cycle in a demonstrably traceable manner, which with little doubt will gradually find its way in local and global legislation.
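The blockchain suggestion above can be sketched in miniature as a hash-chained lineage log, where each training-life-cycle step records a hash that covers the previous entry, so tampering with any earlier step is detectable. The step names and record fields here are illustrative assumptions:

```python
# Sketch of a hash-chained training-lineage log for traceable ML governance.
import hashlib
import json

def record_step(log, step, details):
    """Append a lineage step whose hash covers the previous entry."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps({"step": step, "details": details, "prev": prev_hash},
                         sort_keys=True)
    log.append({"step": step, "details": details, "prev": prev_hash,
                "hash": hashlib.sha256(payload.encode()).hexdigest()})
    return log

def verify_chain(log):
    """Recompute every hash; tampering with an earlier step breaks the chain."""
    prev_hash = "0" * 64
    for entry in log:
        payload = json.dumps({"step": entry["step"], "details": entry["details"],
                              "prev": prev_hash}, sort_keys=True)
        if entry["prev"] != prev_hash or \
           entry["hash"] != hashlib.sha256(payload.encode()).hexdigest():
            return False
        prev_hash = entry["hash"]
    return True
```

A production system would anchor these hashes in a shared ledger and attach approvals, but even this minimal chain makes the curation and training steps demonstrably traceable.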
Tags: longitudinal, time-sequenced, machinelearning, rnn, lstm, ml
Traditional artificial neural networks (ANNs) have a memory problem: they cannot recall their previous reasoning about events to inform new ones.
If you have time-sequenced, i.e., longitudinal, data, you might want to consider using a Recurrent Neural Network (RNN), which allows information to flow from the past to the present using a feedback loop, just as our senses constantly provide feedback as we reevaluate an unfolding situation.
Events are a connected series of vectors (or tensors) that bring past experience to bear on the next step of the deliberation.
In recent years, RNNs have enjoyed an "unreasonable effectiveness," to quote Andrej Karpathy of Stanford University. Applications include speech recognition and categorization, language modeling and translation, and image captioning and recognition at multiple points in a sequence of images.
So RNNs add a feedback loop to the neurons, allowing information to flow without being reset or restarted every time they are executed.
Long Short-Term Memory (LSTM) is a variation of the RNN that has been most effective in processing these kinds of longitudinal data. We will discuss LSTMs further in a future entry.
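The feedback loop described above can be sketched drastically simplified, with scalar weights and a toy hidden state rather than learned weight matrices; this is not a production RNN or LSTM, only the recurrence itself:

```python
# Minimal recurrent step in pure Python: the hidden state h carries
# information forward from one time step to the next (the feedback loop).
import math

def rnn_step(x, h, w_x, w_h, b):
    """One RNN cell update: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    return [math.tanh(w_x * xi + w_h * hi + b) for xi, hi in zip(x, h)]

def run_sequence(xs, size=3, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence through the cell; the final h summarizes the past."""
    h = [0.0] * size  # no memory yet at the start of the sequence
    for x_t in xs:
        # the previous h feeds back into the next step's computation
        h = rnn_step([x_t] * size, h, w_x, w_h, b)
    return h
```

Note how each call to `rnn_step` receives the previous hidden state: that single argument is what an ordinary feed-forward network lacks, and it is exactly what lets past events influence the current deliberation.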
Tags: mlinfused, bigdata, combine_structured_unstru..., machinelearning, datascience
Context and problem. You have terabytes of structured data and petabytes of unstructured data that are not quite visible; many areas are dark and yield little valuable information. You have lots of dark data, but not a whole lot of insight. How can you go beyond summaries, means, and trend graphs to gain insight you can count on, associate it with specific value-adding business activities, and make strategic business decisions based on it? How can you shine light on all the dark data sitting in those object stores, relational databases, data warehouses, content stores, voice, images, text, and more?
Considerations. Leveraging data through analytical processing is not just about processing the structured data that often resides in an organization's many systems of record, transaction-processing systems, or even online analytical processing databases. It is rather about the ability to combine unstructured data coming in from IoT devices (which produce semi-structured data) and from content-based systems (think Enterprise Content Management (ECM)) that contain images, attachments, documents, and free-format text.
Solution path. Extracting data from unstructured content or text from images, transforming semi- and unstructured content into structured data, and storing it in a data lake (and possibly in other case-management systems) will allow you to start gathering the raw data you need to begin curation and data wrangling.
Then you can apply data science and machine learning to the curated data lake to gain insights, and incorporate the conditional actions you wish to take as part of a BPM or case-management solution. Alternatively, you can wire together a set of microservices via APIs exposed by software-as-a-service vendors to evaluate the insights and, based on certain thresholds, invoke a workflow or display a result for human knowledge workers to act on.
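The solution path above can be sketched end to end in miniature; the extraction rule, the threshold, and all field names are purely illustrative, and the "data lake" is just an in-memory list:

```python
# Sketch of the solution path: turn unstructured records into structured
# rows, land them in a (toy) data lake, and flag rows that cross a
# threshold for a downstream workflow or knowledge worker.
import re

def extract_structured(raw_text):
    """Pull a crude structured record out of free text (toy extraction)."""
    amount = re.search(r"\$([\d.]+)", raw_text)
    return {"text": raw_text,
            "amount": float(amount.group(1)) if amount else 0.0}

def pipeline(raw_docs, threshold=1000.0):
    """Extract every document, then select the rows that warrant action."""
    lake = [extract_structured(doc) for doc in raw_docs]
    actions = [row for row in lake if row["amount"] > threshold]
    return lake, actions
```

In a real system the extraction step would be OCR, NLP, or entity extraction, and the action step would invoke a BPM workflow or case-management API, but the flow, extract, land, evaluate, act, is the same.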
Tags: cognitive_systems, ai, ethics, ai_ethics, machine_learning, data_curation, augmented_intelligence
Most major businesses are embarking on augmenting their analytics capabilities with Machine Learning. They are either demonstrating or planning projects that showcase how machine learning can be applied to their applications. (This is called #MLInfused.) Enterprise software development firms are attempting to bring impactful predictive capabilities into their suites of products. IBM Watson, through either Bluemix APIs or IBM ML on-prem, offers such capabilities.
When a mortgage application is submitted, human underwriters ultimately make the decision based on a set of rules. Although we cannot claim that the process is completely understandable, we can probably hold a human, plus the regulations, accountable in an audit, court case, or compliance situation to demonstrate the absence of unjust discrimination or bias. Enter Machine Learning. The data we use to train such a system, the humans who curate and annotate the data, and the process they went through (biased or unbiased) will have a significant if not cardinal impact on the trained ML algorithm and thus its recommendations and output.
To prepare ourselves, as a society of scientists, engineers, businesspersons, regulators, and others, for a world where such processes and data will have major impact on individuals and society, we need a set of rules and regulations. They will come in due time, forced by errors, omissions, and uproar. But to preempt that, we suggest for consideration some "laws", or best practices, around the machine learning of cognitive systems.
1. A cognitive system will not be trained on dark curated data. Yes, the hidden layers of neural networks may be a black box, but the data itself should not be "dark": its sourcing, inputs, annotations, training workflow, and outcomes should be transparent. Training, testing, and validation data sets should be traceable, white-boxed, and accessible for enterprise governance and compliance.
2. The data curation process will be transparent. Ends do not justify means: transparency of the data curation process for machine learning is paramount. This means traceability and governance around where data is sourced, who curated and annotated it, and who verified the annotations.
3. Cognitive system recommendations must provide traceable justification. Outcomes must be coupled with references explaining why the system made the decision or recommendation.
4. Where human health is at stake, cognitive systems with relevant but different training backgrounds will cross-check each other before making a recommendation to a human expert.
These practices, or initial variations of them, should be considered part of an overall governance process for training and curating datasets for the machine learning of cognitive systems.
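Practice 3 above, traceable justification, can be sketched as a rule-based recommender whose output always carries the evidence behind it; the rules, fields, and source citations here are hypothetical, standing in for the underwriting regulations mentioned earlier:

```python
# Sketch of traceable justification: every recommendation is returned
# together with the per-rule evidence trace that produced it.
RULES = [
    {"id": "R1",
     "condition": lambda a: a["income"] >= 3 * a["payment"],
     "source": "underwriting-guide section 2.1"},  # hypothetical citation
    {"id": "R2",
     "condition": lambda a: a["credit_score"] >= 620,
     "source": "underwriting-guide section 4.3"},  # hypothetical citation
]

def recommend(applicant):
    """Approve only if every rule passes; return the decision with its trace."""
    trace = [{"rule": r["id"],
              "passed": r["condition"](applicant),
              "source": r["source"]} for r in RULES]
    return {"approved": all(t["passed"] for t in trace),
            "justification": trace}
```

An ML-based system cannot expose its hidden layers this way, but it can still log which training data, model version, and input features produced each recommendation, which is the auditable trail these practices call for.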
Tags: governance, agile, api, microservices, soa
Fred Brooks taught us in The Mythical Man-Month that if a project is going off track and running behind schedule, adding people to the project will only make things worse: the ramp-up time to orient and onboard developers reaches diminishing returns. We have also seen this countless times on projects we have been involved in.
But where is the root cause? Not what, but where. Say you have a multi-tier architecture with a business-logic tier or service-component layer: the tier that implements the server-side functionality. The backend logic can often be represented as some form of object model, or class diagram. This object model is where the problem is. Class diagrams, or object models, represent the design that implements the services in the services layer; those services do indeed decrease complexity and risk, decouple systems, and make life easier for everyone.
When developers are onboarded, they have to learn the complex, convoluted backend dependencies of the business-logic tier: essentially they need to digest the object model, the domain object model, the class diagram (or whatever other representation or name you choose to give it). Most of the time these dependencies are unnecessary, and ramp-up time increases because developers feel they have to digest so much of what everyone else is doing.
So, partition the object model, the domain model, into a set of smaller models (each with perhaps as close as you can get to 5-9 main classes). These partitions of the domain model allow a natural separation of concerns and let developers ramp up much faster.
This has nothing to do with building microservices, or small-grained services. In fact, if you focus only on smaller-grained services, you will fall into the Service Proliferation trap.
It's about building tiny applications, not tiny services. Tiny applications can have medium-grained or even one or more larger-grained services.
The success of microapplications lies not in the granularity of the services they are composed of, but in the size and complexity of their underlying domain model, which focuses on only one functional area.
Remember: tiny domain models result in small code bases. Large code bases overwhelm developers, slow their development environments, increase dependencies and complexity, and make integration even harder.
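The partitioning idea above can be sketched as two tiny domain-model partitions whose only coupling is an identifier and a lookup function; all class names here are illustrative, not a prescribed model:

```python
# Sketch of partitioning one large domain model into tiny per-team models.
# Each partition stays well under 5-9 classes and owns one functional area.
from dataclasses import dataclass, field

# --- catalog partition (owned by team A) ---
@dataclass
class Product:
    sku: str
    price: float

# --- cart partition (owned by team B): refers to the catalog only by sku ---
@dataclass
class CartLine:
    sku: str
    quantity: int

@dataclass
class Cart:
    lines: list = field(default_factory=list)

    def add(self, sku, quantity=1):
        self.lines.append(CartLine(sku, quantity))

    def total(self, price_lookup):
        # price_lookup is the only coupling back to the catalog partition,
        # so a new developer on team B never needs to digest team A's model
        return sum(price_lookup(l.sku) * l.quantity for l in self.lines)
```

A developer onboarded onto the cart partition has three small classes to learn instead of the whole domain model; the catalog is just an opaque lookup.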
SOA (service-oriented architecture) evolved to reflect our deepening understanding of separating interface from implementation from deployment, not just programmatically but architecturally and in a business-impactful fashion. It introduced a new layer in the software application architecture focused on, well, just services, or really service descriptions, or service contracts. Interfaces were already separating interface from implementation in the object-oriented world, which we take for granted today, but this was limited to mostly a programming paradigm.
SOA elevated this separation of concerns between interfaces/contracts, implementations and deployments higher in the foodchain: to the level of an architectural construct.
As we were compelled to move implementations and deployment locations off premise, in view of the savings, flexibility (elasticity), and consolidating power of the cloud, we moved to creating or using software as a service, or consolidating deployments in a private cloud. Software as a service, or SaaS, is primarily a software licensing and delivery model that has grown out of the SOA world. Software is licensed on a subscription or "pay as you go" basis and is often hosted as a managed service, whether in a public, private, or, most often, hosted scenario. Platform, Infrastructure, and other XaaS kinds of software creatures started evolving out of the cosmic goo of the ubiquitous and elastic cloud model.
SOA has morphed into RESTful APIs as the implementation or realization mechanism of choice. When people decide to use whatever underlying technology they wish, choose whatever programming language or framework they want, and deploy little applications that are pretty much standalone, they tend to use the term microservices. This is often a misnomer: the granularity may historically have started with fine-grained services, but it has stabilized into a more balanced, medium-grained service stature.
So how do we apply SOA principles to build cloud-ready applications? We build APIs. Here are some tried and tested recommended practices for successful API development.
API Best Practice 1: Use tiny object models for design. SOA implemented not with one huge underlying object model, but with a partitioned set of smaller object models in its design phases, smaller models that you can divvy up and feed to smaller, independently functioning teams, has proven particularly useful and is an adopted best practice. What is tiny? 7 ± 2 main classes.
API Best Practice 2: Each team manages its own service portfolio. Plan the services that you need in your functional area. Maintain a categorized list of services that may break down into smaller-grained services. Try to keep them as stateless as possible.
API Best Practice 3: Each team owns a functional area. When you break up the design object model into tiny object models with minimal dependencies, each of those parts should align to an area of the business: a functional area. These can be different departments or finer-grained divisions such as a shopping cart, a product catalog, an order, and so on.
API Best Practice 4: Each team deploys independently. This is so teams are not on each other's critical path and can test and run relatively independently. There will be semantic integration, dependencies, and connections that are necessary, for example a single sign-on security or session token that may need to be passed.
API Best Practice 5: Each team builds the finest-grained APIs possible. Build RESTful APIs for the leaf nodes and medium-grained services in your service portfolio.
API Best Practice 6: Have an integration team that governs the overall service portfolio and all things integration. Allow API access to the database that each team requested (fields, tables, and all, or go NoSQL if you wish). The integration team monitors and manages the dependencies between the teams and the functional areas.
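These practices can be sketched in miniature as a team-owned service portfolio behind a single integration layer; the service names, handlers, and routing are illustrative (an in-process registry standing in for a real API gateway), not a real framework's API:

```python
# Sketch of a team-owned service portfolio behind one integration layer.
# Each handler is stateless; cross-team calls go through the registry.
PORTFOLIO = {}

def service(name):
    """Register a stateless handler under a team-owned service name."""
    def register(fn):
        PORTFOLIO[name] = fn
        return fn
    return register

def invoke(name, request):
    """Integration layer: route a request to the owning team's service."""
    return PORTFOLIO[name](request)

@service("catalog.get_price")   # owned by the catalog team
def get_price(req):
    prices = {"A1": 9.99, "B2": 4.50}  # illustrative data
    return {"price": prices.get(req["sku"])}

@service("cart.quote")          # owned by the cart team
def quote(req):
    # the cross-team call goes through the integration layer,
    # not through direct imports of the other team's code
    price = invoke("catalog.get_price", {"sku": req["sku"]})["price"]
    return {"total": price * req["quantity"]}
```

In a deployed system each registered name would be a separately deployed RESTful API and `invoke` would be an HTTP call through a gateway, but the governance shape, named portfolio entries, stateless handlers, one mediated integration point, is the same.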