Smarter Everyone, Smarter Everything, Smarter Everywhere
John M. Boyer 060000VMNY 1,042 Views
XForms 1.0 Second Edition has been published today at http://www.w3.org/TR/xforms/
To get an idea of the quality and quantity of improvements made to XForms, please see http://www.w3.org/2003/10/REC-xforms-10-20031014-errata.html
Based on this improved foundation, the XForms working group will now be focusing its energies on the completion of XForms 1.1. To get an idea of what will be available, check here: http://www.w3.org/TR/xforms11/
The one comment I would make about the above working draft is that we will almost certainly revert to using the same namespace currently used for XForms 1.0, and instead use some mechanism within the language to do versioning.
In keeping with my prior post about signatures and namespaces, it is important to version a language either internally or by updating the namespace URI. Previously, we chose the namespace route because XForms is designed to be hosted within another language, so it has no root element of its own to which a version could be attached.
However, XForms 1.1 is using some special schema wizardry that allows it to have a "chameleon" namespace, which will make it easier to import XForms into a host language like XHTML without namespace qualification. I'm not a big fan of doing this, especially for host languages other than XHTML because it becomes harder to find the XForms within another document and host the XForms functionality separately from the original host language.
Nonetheless, the feature is there and it occurred to me during the W3C tech plenary that the chameleon namespace could be used to put XForms 1.1 back into the XForms 1.0 namespace. That means that XForms processors trying to determine what semantics to attach to the vocabulary need some other way to make their decisions. So we simply have to solve the versioning problem without using a change of namespace URI.
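For illustration, here is one plausible shape of internally versioned markup that stays in the XForms 1.0 namespace; the `version` attribute shown is a hypothetical sketch of the kind of in-language mechanism discussed above, not settled syntax:

```xml
<!-- Hypothetical sketch: XForms 1.1 markup kept in the 1.0 namespace,
     with the version expressed inside the language rather than by a
     new namespace URI -->
<xforms:model xmlns:xforms="http://www.w3.org/2002/xforms" version="1.1">
  <xforms:instance>
    <data xmlns=""/>
  </xforms:instance>
</xforms:model>
```

A processor would read the attribute to decide which semantics to attach, instead of dispatching on the namespace URI.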
The camp that wanted us not to change namespaces will be happy. My own XSLTs will be happy too.
John M. Boyer
A lightning talk at the tech plenary lasts about 3 minutes and introduces something very specific to the W3C.
I gave a lightning talk on the effects of adding or changing the stuff in a namespace. You can see the diagram and notes here: http://www.w3.org/2006/03/01-Boyer-Lightning/SignaturesAndNamespaces_Boyer.html
Basically, I got a lot of nods, all the way up to TimBL himself, when I said that you either have to use a new namespace or internally version the language, so that old processors for a vocabulary don't try to render new documents with graceful degradation of unrecognized content when the documents have been signed.
Too bad this is exactly what happened with xml:id. It got added to the XML namespace rather than some other namespace, and the version of XML didn't change. Lo and behold, it broke something. When doing a C14N of a document subset containing orphaned nodes, C14N copies XML-namespaced attributes onto the orphaned nodes when they don't contain their own settings for those attributes.
This is good for xml:space, xml:lang and xml:base, but it isn't good for xml:id.
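An illustrative fragment (element names invented) shows why the inherited-copy behavior goes wrong for xml:id but not for the others:

```xml
<!-- Original document: the parent carries both a heritable and a
     non-heritable xml:* attribute -->
<doc xml:lang="en" xml:id="root">
  <item>text</item>
</doc>

<!-- Canonicalizing a subset that orphans <item> from its parent:
     C14N renders the ancestral xml:* attributes onto the orphan.
     Inheriting xml:lang preserves meaning, but copying xml:id
     duplicates an identifier that belonged only to <doc>. -->
<item xml:lang="en" xml:id="root">text</item>
```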
Truth be told, it's kind of an edge case. In XFDL, we don't even allow you to orphan nodes during signature filtering, because the structure of the language is such that an orphaned node is useless without its ancestral chain.
Still, while the problem doesn't affect Workplace Forms, we (in W3C capacity) will still endeavor to fix the problem.
Since the ship has already sailed on which namespace xml:id lives in, we're going to define a new C14N algorithm that either skips the inheritance behavior just for xml:id, or skips it for everything except xml:lang, xml:space and xml:base.
Actually, it's a little more complicated than that, since either of the above choices means that C14N will again be broken in the future when either a non-heritable or a heritable attribute, respectively, is added to the XML namespace.
I think we may have to add a parameterization to the new C14N that allows the author to specify the heritable attributes. This will allow document authors to keep up with adjustments to the XML namespace.
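To make the parameterization idea concrete, here is a minimal hypothetical sketch (not the actual C14N specification algorithm) of how an author-supplied set of heritable attributes would decide what gets copied onto an orphaned node:

```python
# Hypothetical sketch of parameterized xml:* attribute inheritance for
# a future C14N; the heritable set is supplied by the document author.

DEFAULT_HERITABLE = {"xml:lang", "xml:space", "xml:base"}

def inherited_attributes(ancestor_chain, heritable=DEFAULT_HERITABLE):
    """Walk the ancestor chain from the root down to the orphaned
    node's parent and collect the nearest value of each heritable
    xml:* attribute; non-heritable ones such as xml:id are skipped."""
    inherited = {}
    for attrs in ancestor_chain:  # nearer ancestors override farther ones
        for name, value in attrs.items():
            if name in heritable:
                inherited[name] = value
    return inherited

# Ancestors of an orphaned node carried both xml:lang and xml:id:
chain = [{"xml:lang": "en", "xml:id": "sec1"}, {"xml:space": "preserve"}]
print(inherited_attributes(chain))  # {'xml:lang': 'en', 'xml:space': 'preserve'}
```

If the XML namespace later gains a new heritable or non-heritable attribute, the author adjusts the parameter rather than waiting for yet another C14N revision.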
The core WG feels that further additions to the XML namespace are highly unlikely, but I'm not convinced. Just at this tech plenary alone, I heard calls for xml:role (like HTML's role) and xml:profile (like DOM's hasFeature, it would declare that a document has a feature so the processor needs to have the feature or the document won't work). In the past, I've heard a need for xml:src (like HTML src, except HTML's default is wrong-- content should override the attribute rather than the reverse). And my personal fave would be xml:compute to express that the content of an element is computationally derived from other content. The list really does go on once you start to think about XML as an intelligent object...
A Data Science Consumer's Lesson: Beware the Training Data, or How to Misuse IBM Watson Personality Insights
I recently ran across this article (https://lctech.vn/blog/ibm-watson-compares-trumps-inauguration-speech-obamas/). It describes the author's attempt at a comparative analysis of the personalities of Barack Obama and Donald Trump, based on applying the IBM Watson Personality Insights API to their US Presidential inauguration speeches. The article has many charts, figures and analyses, covering various capabilities of the API. But these cannot make up for the logical fallacy under which the API was applied in the first place.
UPDATE: Even as I was publishing this article, a similar misuse of the IBM Watson Personality Insights API was reported by CNBC (http://www.cnbc.com/2017/07/17/tim-cook-is-silicon-valleys-most-imaginative-ceo-says-ibm-data.html). The analysis produced results such as that Apple's CEO Tim Cook is Silicon Valley's most imaginative tech leader and that Microsoft's CEO Satya Nadella is one of the most assertive tech leaders. These are non sequiturs (they may be true or false, but the analysis doesn't actually establish the truths it asserts).
One of the most important principles in data science is that the test set for a machine-learned model must be representative of the model's expected usage. Otherwise, the accuracy of the model on the test set will have little to do with its accuracy in practice. In the field of psychometrics, this principle actually has a name: construct validity. Generally, it makes sense to take cues on measuring machine learning from the vast experience of educational psychologists who measure human learning.
A corollary principle in data science is that the training set for a machine learned model must be consistent with the test set. Otherwise, the machine learning algorithm will not be likely to learn the construct that the test set tests. In fact, it's not uncommon to draw the test set randomly from the training set, in which case the two sets are likely to be consistent, and the challenge reduces to determining whether the training and test sets provide a good representation of the intended use case. Essentially, data scientists spend a lot of time thinking about and working on training set quality in order to attain high construct validity.
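The random-draw approach mentioned above can be sketched in a few lines; the corpus below is invented purely for illustration:

```python
import random

def split_train_test(examples, test_fraction=0.2, seed=42):
    """Draw the test set at random from the full labeled set, so both
    sets come from the same distribution. Construct validity then
    reduces to whether that shared distribution matches the use case."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * test_fraction)
    return shuffled[cut:], shuffled[:cut]  # (train, test)

# Hypothetical corpus of 100 labeled writing samples:
corpus = [("sample %d" % i, "label") for i in range(100)]
train, test = split_train_test(corpus)
print(len(train), len(test))  # 80 20
```

The split guarantees train/test consistency, but it says nothing about whether the corpus itself represents the population the model will actually see, which is the harder question.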
But, if you are a data science consumer, then you have to think about these principles in reverse. If you are a software developer who uses an API that offers the inferential function of a machine-learned model trained by a data scientist or data science team, then you are a consumer of their data science results. Such is the case when you use IBM Watson Personality Insights.
If this situation describes you, then it is important for you to look into how the API's machine learned model was trained so that you can determine whether that training reflects your use case. In the case of IBM Watson Personality Insights, this information is provided here: https://www.ibm.com/watson/developercloud/doc/personality-insights/science.html
According to this source, the API was trained by mapping personality test results to the linguistic patterns of 200 tweets from the 600 participants. There is no evidence to suggest that our tweet writing is linguistically consistent with how we write emails, blogs, or other documents, much less speech transcripts from US Presidents' inauguration speeches or CEO speeches. For one thing, we know that, except for tweet storms, successive tweets aren't necessarily all that related to each other, whereas the sentences and paragraphs of these other forms of writing are much more logically and sequentially connected. After all, that's why we have speech writers.
By comparison, if your use case is to determine personality traits of, say, a prospective customer or employee, based on their Twitter feed, then you're more likely to be appropriately using IBM Watson Personality Insights API.
In the case of this API, there are further questions that a psychologist would ask, and therefore that you should ask, too. In particular, the training data was drawn from a sample of 600 participants. But, are those participants representative of the target population on whom you will be doing the inferences with the API? For example, if your prospective customer or employee base comes from, say, the fashion industry, and if the training data participants came dominantly from, say, the tech industry or even from the population at large, then your results with the API may be significantly affected by the difference. Do your best to find out the demographics of the training data participants and your target population to see if there are mismatches. There are other similar questions. Are members of your target population more prone to tweet storms, retweeting, and/or replying to tweets than the training sample population? All of these tendencies are reflective of personality traits, so if there are differences between the training sample and the target population, then you may not be able to use the API.
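A minimal sketch of the kind of sanity check a data science consumer can run; the industry labels and counts below are entirely invented for illustration:

```python
from collections import Counter

def proportion_gap(training_sample, target_population):
    """Compare category proportions (e.g. industry of participants)
    between the training sample and the target population; a large
    maximum gap flags a possible construct-validity problem."""
    def proportions(items):
        counts = Counter(items)
        total = sum(counts.values())
        return {k: v / total for k, v in counts.items()}
    p, q = proportions(training_sample), proportions(target_population)
    categories = set(p) | set(q)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)

# Hypothetical demographics: 600 training participants vs. your users
training = ["tech"] * 450 + ["other"] * 150
target = ["fashion"] * 80 + ["tech"] * 20
print(round(proportion_gap(training, target), 2))  # 0.8
```

A gap this large would not prove the API is unusable for your population, but it is exactly the kind of mismatch worth investigating before trusting the inferences.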
For any API, you as a software developer are practicing a basic form of data science by checking these issues because you are ensuring construct validity between the inferences in your use case and the training data for the machine learned model you are consuming.
John M. Boyer
The Workplace Forms Designer allows the form author to use an XML schema to automatically generate the instance data for an XForms model. The data nodes are needed to allow drag-and-drop associations to be created between the data and user interface components. You can drag from an existing UI control onto the data to make the association, or you can drag the data onto the form design canvas to cause the creation of a UI control that will then be bound to the data node. Either way, in XForms the association is made between instance data nodes and UI controls.
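The association the designer creates can be sketched in XForms markup; the instance element names below are hypothetical stand-ins for schema-generated data:

```xml
<!-- Hypothetical sketch: a generated instance data node and a UI
     control bound to it by a ref expression -->
<xforms:model xmlns:xforms="http://www.w3.org/2002/xforms">
  <xforms:instance>
    <order xmlns="">
      <customerName/>
    </order>
  </xforms:instance>
</xforms:model>

<xforms:input ref="/order/customerName"
              xmlns:xforms="http://www.w3.org/2002/xforms">
  <xforms:label>Customer name</xforms:label>
</xforms:input>
```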
This is true even when the XML schema contains an element whose content is a choice. When this happens, the Workplace Forms Designer generates data according to the content of all possible choices. This allows form design to proceed for each of the possible choices. The expectation from XForms is that the form author will make all of the possible choices be non-relevant except the one actually chosen by the end-user. For example, if you have a choice of address block based on country selection, once you select "UK" then you get the UK address block, and once you select "US" then you get the US address block, and so forth. All address blocks other than the one for the selected country become non-relevant.
The use of non-relevance in XForms is significant. It affects two things, but the most important is that all non-relevant nodes are removed from the XML data at the start of submission processing. This is important because an XML element with a schema choice content model is not valid until all but one of the choices is removed. This is called "pruning non-relevant nodes" in XForms, and it means that XForms does not expect instance data to be schema valid.
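The country-driven address block example can be expressed with bind elements carrying the relevant model item property; the data paths below are hypothetical:

```xml
<!-- Hypothetical sketch: only the address block for the selected
     country is relevant, so the other branches are pruned from the
     data before submission -->
<xforms:bind nodeset="/order/address/uk"
             relevant="/order/country = 'UK'"
             xmlns:xforms="http://www.w3.org/2002/xforms"/>
<xforms:bind nodeset="/order/address/us"
             relevant="/order/country = 'US'"
             xmlns:xforms="http://www.w3.org/2002/xforms"/>
```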
John M. Boyer
The powers that be asked the dW bloggers to help spread the word about the download availability today of DB2 Viper Release Candidate 1.
Please see www.ibm.com/db2/viper
You can also get more info from
Anyway, why does this make sense on a blog about Workplace Forms?
As I said before in this blog, the purpose of a form is to collect data. If all you want is to collect data about a pizza order, you don't need us, but obviously there are more sophisticated information needs than that, right?
Question is, once you've got the data, what do you do with it?
There's always a server side to any web application. Products like DB2, Content Manager, and Portal Document Manager are about providing high-strength persistence tiers for the consumption of data collected across an enterprise or across enterprises. Products like WebSphere Application Server and WebSphere Portal Server are about creating that middle layer, the logical bridge from the point where you have the data to the point where you know what to do with it (something has to decide where to put the data, what data to retrieve, etc.).
And IBM Workplace Forms is there to put a beautiful face on it all by intelligently collecting data of any level of complexity while providing a richly textured user experience. And DB2 Viper is an especially good fit with IBM Workplace Forms because our forms contain XForms, which is a pure XML play, and the most significant addition to DB2 in Viper is the native-XML data model.