At first I had hoped to keep up with the physical world and show live icons.
In the end, I decided to demonstrate the recognition skills by photographing the order in which the bricks were put on the transport belt.
This is how I mounted the Raspberry Pi on the Lego machine:
I have trained Watson for 6 different bricks.
By grabbing enough frames for each picture (ca. 35) and a lot of negatives (ca. 150), I found the identification results to be above 98%!
I used the official Lego brick part numbers as labels (and named the zip files accordingly to keep things consistent).
When you use the Watson Java SDK, the code for training is as simple as this:
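A minimal sketch of how such zip files could be produced, assuming the grabbed frames for each brick sit in a folder named after its part number. The folder layout, the .jpg extension and the class name TrainingSetZipper are illustrative assumptions, not taken from the original setup.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: packs <baseDir>/<label>/*.jpg into <baseDir>/<label>.zip.
final class TrainingSetZipper {

    // Zips all .jpg frames of one label and returns the path of the zip file.
    static Path zipLabel(Path baseDir, String label) throws IOException {
        Path frames = baseDir.resolve(label);
        Path zip = baseDir.resolve(label + ".zip");
        try (OutputStream out = Files.newOutputStream(zip);
             ZipOutputStream zos = new ZipOutputStream(out);
             DirectoryStream<Path> files = Files.newDirectoryStream(frames, "*.jpg")) {
            for (Path file : files) {
                // Store each frame under its plain file name inside the zip.
                zos.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zos);
                zos.closeEntry();
            }
        }
        return zip;
    }

    // Usage: java TrainingSetZipper <baseDir> <label> [<label> ...]
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: TrainingSetZipper <baseDir> <label>...");
            return;
        }
        Path baseDir = Paths.get(args[0]);
        for (int i = 1; i < args.length; i++) {
            System.out.println("created " + zipLabel(baseDir, args[i]));
        }
    }
}
```

Naming the zip after the label means the label string and the file name can never drift apart.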
VisualRecognition service = new VisualRecognition(VisualRecognition.VERSION_DATE_2016_05_20);
String classifier = "legos_123";
// One class per brick: the label is the Lego part number, the zip holds its training frames.
ClassifierOptions createOptions = new ClassifierOptions.Builder()
    .classifierName(classifier)
    .addClass("300321", new File(basePath + "\\300321.zip"))
    .addClass("303921", new File(basePath + "\\303921.zip"))
    .addClass("300324", new File(basePath + "\\300324.zip"))
    .addClass("366024", new File(basePath + "\\366024.zip"))
    .addClass("4161674", new File(basePath + "\\4161674.zip"))
    .addClass("4211637", new File(basePath + "\\4211637.zip"))
    .build();
When you work with TensorFlow you need to know either Python or C++, so NO Java this time.
Since I know neither, I decided to learn a bit of Python on the go.
The people from PyDev have put a lot of effort into creating an Eclipse environment for Python, so it all feels a bit more familiar.
Google has some good tutorials on TensorFlow, and the codelab "TensorFlow for Poets" describes almost exactly what I want to achieve.
This approach allows you to retrain an existing model, Inception, with your own images.
All of this runs on my laptop, so in this case there are no limits on the training set: I used all 3000+ photos (per twin) to retrain.
The output of the process is visible in the screenshot below:
As you can see, the test accuracy is not that high, but that is what I will use for now.
Again, the cloud (Watson) set the limit here: I tested 20 images per twin.
The results are displayed in the table below:
As you can see, Watson does not give any result in 50% of the cases. When it has a result, the score is never higher than 0.6, and there are two wrong classifications.
TensorFlow always gives a result, and from time to time with scores higher than 0.9. Eight of the classifications were wrong for TensorFlow.
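To put those counts side by side, here is a small arithmetic sketch. It assumes a total of 40 test images (two twins with 20 images each), which is my reading of "20 images per twin" and not a number stated in the table; the class and method names are made up for the example.

```java
// Hypothetical helper to compare the two result sets on equal terms.
final class ScoreSummary {

    // Fraction of ALL test images that were classified correctly.
    static double overallAccuracy(int total, int unanswered, int wrong) {
        return (double) (total - unanswered - wrong) / total;
    }

    // Fraction of the ANSWERED test images that were classified correctly.
    static double accuracyWhenAnswered(int total, int unanswered, int wrong) {
        int answered = total - unanswered;
        return (double) (answered - wrong) / answered;
    }

    public static void main(String[] args) {
        int total = 40; // assumption: 2 twins x 20 images

        // Watson: no result for 50% of the images, two wrong classifications.
        System.out.println("Watson overall:      " + overallAccuracy(total, total / 2, 2));
        System.out.println("Watson if answered:  " + accuracyWhenAnswered(total, total / 2, 2));

        // TensorFlow: always a result, eight wrong classifications.
        System.out.println("TensorFlow overall:  " + overallAccuracy(total, 0, 8));
    }
}
```

Under that assumption Watson gets 18 of 40 images right (45% overall, 90% of the images it answered) and TensorFlow gets 32 of 40 right (80%).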
Given the differences in training:
- Cloud versus On-Prem
- Limited (free) training set versus large training set
I find the final results of TensorFlow much more convincing.
The IBM Watson way of working makes it so much easier to get started (long live cloud and APIs) that I am inclined to proceed with Watson for now, and let TensorFlow rest until Google's "Machine Learning as a Service" becomes available.
Sure, once I had set up my TensorFlow environment, the actual training part was not that much work anymore.
A little bonus for me is that I took my first steps with Python, which feels very doable.
I created the crawler based on the sample you can find in the
As Deepika Devarajan points out, there is no publicly available documentation, so if you need more info please contact your IBM rep to ask for this documentation.
Custom Crawler configuration
Before you start coding, you first need to get the option "Custom Crawler" in the list of Crawler types:
To get this option in the list, modify the config.properties file in .../webapps/ESAdmin/WEB-INF/
Restart the admin session to see the effect.
Another effect of this setting is the Custom Crawler tab in the System Settings:
Implementing the crawler
In the customcrawler sample code, you will find several classes.
The CustomManager's role is to instantiate a TopSpace. It is your CustomManager class that you specify in the Custom Crawler Type settings.
The TopSpace is used to collect information about the system you are connecting to (like hostname and credentials).
The TopSpace is also used to generate the SubSpace(s).
The information collected by the TopSpace is accessible via the CustomInfo class.
SubSpaces can have their own configuration, which can also be accessed through the CustomInfo class.
SubSpaces are responsible for getting the list of content.
SubSpaces also provide the fields that you want to add to the index, in addition to the body and standard fields.
CustomContent is used to get the actual content.
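The division of roles above can be sketched roughly as follows. Since the real API is not publicly documented, every type and method here (CrawlerRolesSketch, Info, SubSpace, TopSpace, fetchContent, createTopSpace) is a hypothetical stand-in that only mirrors the responsibilities described; these are not the actual classes or signatures of the customcrawler sample.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical types that mirror the roles described above,
// NOT the actual classes or signatures of the customcrawler sample.
final class CrawlerRolesSketch {

    // Connection details collected at the top level.
    static final class Info {
        final String hostname;
        final String user;
        Info(String hostname, String user) {
            this.hostname = hostname;
            this.user = user;
        }
    }

    // One crawlable unit: knows how to list the ids of its documents.
    static final class SubSpace {
        final String name;
        final Info info;
        SubSpace(String name, Info info) {
            this.name = name;
            this.info = info;
        }
        List<String> listContentIds() {
            // Placeholder ids; a real implementation would query the source system.
            List<String> ids = new ArrayList<>();
            ids.add(name + "-1");
            ids.add(name + "-2");
            return ids;
        }
    }

    // Top level: holds the connection info and generates the sub spaces.
    static final class TopSpace {
        final Info info;
        TopSpace(Info info) {
            this.info = info;
        }
        List<SubSpace> subSpaces(String... names) {
            List<SubSpace> result = new ArrayList<>();
            for (String n : names) {
                result.add(new SubSpace(n, info));
            }
            return result;
        }
    }

    // Content role: fetches the body of one document (placeholder text here).
    static String fetchContent(SubSpace subSpace, String id) {
        return "content of " + id + " from " + subSpace.info.hostname;
    }

    // Manager role: the entry point that instantiates the top space.
    static TopSpace createTopSpace(String hostname, String user) {
        return new TopSpace(new Info(hostname, user));
    }

    public static void main(String[] args) {
        TopSpace top = createTopSpace("crm.example.com", "crawler");
        for (SubSpace s : top.subSpaces("account", "activity")) {
            for (String id : s.listContentIds()) {
                System.out.println(fetchContent(s, id));
            }
        }
    }
}
```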
Implementation for CRM Dynamics
In our case the TopSpace asks for this information:
To access CRM Dynamics Online you need to register your application with Azure AD. The configuration of this part is outside the scope of this article.
The SubSpaces for our case are the CRM entities. The screenshot below shows a list of all the known entities in our CRM system (notice the scrollbar).
In this case I want to crawl Accounts and Activities. Unfortunately not all entities use the same field names for their title- and/or memo-fields.
In our case we need to set the title field to name for the entity account. The first three options are pre-populated by the TopSpace. Since description is the most common field name for the memo field, this default value can stay.
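The idea of per-entity field names with a fallback could look like the sketch below. The default memo field description and the title field name for account come from the text; the class EntityFieldConfig and the default title field subject are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-entity field mapping with fallbacks.
final class EntityFieldConfig {
    static final String DEFAULT_MEMO_FIELD = "description"; // default named in the text
    static final String DEFAULT_TITLE_FIELD = "subject";    // assumption for this sketch

    private final Map<String, String> titleFields = new HashMap<>();
    private final Map<String, String> memoFields = new HashMap<>();

    // Overrides the title field for one entity; returns this for chaining.
    EntityFieldConfig titleField(String entity, String field) {
        titleFields.put(entity, field);
        return this;
    }

    // Overrides the memo field for one entity; returns this for chaining.
    EntityFieldConfig memoField(String entity, String field) {
        memoFields.put(entity, field);
        return this;
    }

    String titleFieldFor(String entity) {
        return titleFields.getOrDefault(entity, DEFAULT_TITLE_FIELD);
    }

    String memoFieldFor(String entity) {
        return memoFields.getOrDefault(entity, DEFAULT_MEMO_FIELD);
    }

    public static void main(String[] args) {
        EntityFieldConfig config = new EntityFieldConfig().titleField("account", "name");
        System.out.println("account title field:  " + config.titleFieldFor("account"));
        System.out.println("account memo field:   " + config.memoFieldFor("account"));
        System.out.println("activity title field: " + config.titleFieldFor("activity"));
    }
}
```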
Now we have two configured search spaces that we can start to crawl:
When the crawling has finished, we can go to the miner to inspect the results.
In the screenshot you can see that we added the entity as an extra field. This way we can see that we have 7979 accounts and that, of all the activities, email has the most records, but that we also have 1 fax record :-)
The timestamps on the documents also transfer nicely into (in this case) the deviations view.
- Security: we did not implement document level security
- Social features: it would be great if we could see who created these records (other than in text)
- Leverage scheduling: although the second run takes a lot less time, we still need to fetch all documents to get the modifiedon timestamps. If we could somehow learn from the scheduler what the last crawl time was, we could use this information in our data retrieval.
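As a sketch of that last point: if the crawler stored the time of its previous run itself, it could ask the source for changed records only. The example assumes the data is retrieved through an OData-style $filter query and that the last crawl time is persisted somewhere by our own code; both are assumptions, and IncrementalFilter is a made-up name.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper that turns the last crawl time into a query filter.
final class IncrementalFilter {

    // Returns an OData filter expression on modifiedon,
    // or an empty string when there was no earlier crawl (full crawl).
    static String modifiedSince(Instant lastCrawl) {
        if (lastCrawl == null) {
            return "";
        }
        // Drop fractions of a second so the literal stays short and predictable.
        return "modifiedon gt " + lastCrawl.truncatedTo(ChronoUnit.SECONDS);
    }

    public static void main(String[] args) {
        // In a real crawler this value would be read from persisted state.
        Instant lastCrawl = Instant.parse("2016-05-01T00:00:00Z");
        System.out.println("$filter=" + modifiedSince(lastCrawl));
    }
}
```

The first crawl passes null and fetches everything; later crawls only need the records changed since the stored timestamp.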
2. While the crawler was running, I created a few "dictionaries" that should be used to analyse the content. I used Content Analytics Studio for this purpose.
The dictionaries:
Companies (I imported 6k+ accounts from our CRM system)
Products (I manually entered those). I used the lemma (synonym) function to group products like Office365 and Office 365
Digital Workplace vocabulary (buzzwords that match our corporate strategy, also entered manually)
Explorer itself has a couple of standard dictionaries that understand the language (in this case Dutch and English).
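To illustrate what grouping by lemma means, here is a small stand-alone sketch. It is not how Studio stores its dictionaries; ProductLemmas and its methods are hypothetical, and only the Office365 / Office 365 pair comes from the text.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical dictionary that maps spelling variants to one lemma.
final class ProductLemmas {
    private final Map<String, String> lemmaByVariant = new HashMap<>();

    // Registers a lemma together with its variants; returns this for chaining.
    ProductLemmas add(String lemma, String... variants) {
        lemmaByVariant.put(key(lemma), lemma);
        for (String variant : variants) {
            lemmaByVariant.put(key(variant), lemma);
        }
        return this;
    }

    // Returns the lemma for a known variant, or the term itself when unknown.
    String lemmaFor(String term) {
        return lemmaByVariant.getOrDefault(key(term), term);
    }

    // Lookups ignore case and surrounding whitespace.
    private static String key(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        ProductLemmas products = new ProductLemmas().add("Office 365", "Office365");
        System.out.println(products.lemmaFor("office365"));
        System.out.println(products.lemmaFor("SharePoint"));
    }
}
```

Counting on the lemma instead of the literal text means both spellings end up in the same facet value.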
3. Parsing Rules: In Studio you can create rules that are used to match documents in the index to the specifications you formalized in such a rule (also called annotation).
Did I already mention, that these are first steps ? ....
I created a rule that should find content where abbreviated (company) names (IBM, NASA, HP, ...) appear in a sentence that also has a verb and an entry from the products dictionary, see the screenshot below:
The result of applying this rule to the content of the crawled websites is shown in a screenshot of the "Miner" application of Watson Explorer.
What is obvious is the incorrect match for "ICT": this is an abbreviation, but NOT for a company!
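Outside of Studio, the logic of such a rule can be approximated in a few lines, which also shows one possible way to deal with the ICT case. This is only a sketch: the regular expression, the small verb list (a crude stand-in for real part-of-speech tagging) and the exclusion list are all assumptions, and AbbreviationRule is not a Studio concept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical re-implementation of the abbreviation rule for one sentence.
final class AbbreviationRule {
    // Two to five consecutive capital letters, e.g. IBM, NASA, HP.
    private static final Pattern ABBREVIATION = Pattern.compile("\\b[A-Z]{2,5}\\b");

    private final Set<String> products;     // lower-case product names
    private final Set<String> verbs;        // lower-case verbs (stand-in for POS tagging)
    private final Set<String> notCompanies; // abbreviations that are not companies

    AbbreviationRule(Collection<String> products, Collection<String> verbs,
                     Collection<String> notCompanies) {
        this.products = new HashSet<>(products);
        this.verbs = new HashSet<>(verbs);
        this.notCompanies = new HashSet<>(notCompanies);
    }

    // Returns the abbreviations in the sentence, but only when the sentence
    // also contains a known verb and a known product.
    List<String> companyCandidates(String sentence) {
        String lower = sentence.toLowerCase(Locale.ROOT);
        boolean hasProduct = products.stream().anyMatch(lower::contains);
        boolean hasVerb = Arrays.stream(lower.split("\\W+")).anyMatch(verbs::contains);
        List<String> result = new ArrayList<>();
        if (!hasProduct || !hasVerb) {
            return result;
        }
        Matcher m = ABBREVIATION.matcher(sentence);
        while (m.find()) {
            if (!notCompanies.contains(m.group())) {
                result.add(m.group());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        AbbreviationRule rule = new AbbreviationRule(
                Arrays.asList("office 365"), Arrays.asList("uses"), Arrays.asList("ICT"));
        System.out.println(rule.companyCandidates("IBM uses Office 365 for ICT projects."));
    }
}
```

An exclusion list is the bluntest fix; it has to be maintained by hand, just like the dictionaries.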
The next screenshot shows company types and how often each type occurs in the documents in the index. The type of a company is imported from our CRM system (next to its name).
To display documents that contain companies of the type "Influencer" ("Beinvloeder" in Dutch), use this view. The company names are highlighted.
Again there are mismatches: Ilse in the document is a reference to "Ilse de Lange" and not to the company Ilse.
The same analysis can be performed for the Digital Workplace vocabulary. Here I have selected the term "EMM" and this is the result. Notice the synonyms that also show up.
I don't have the rule working yet that combines the CRM data dictionary with the Products dictionary (should be similar to the abbreviation rule). I think one of the reasons is that the formal company name is not often used in the websites that we have crawled.
Machine learning could help us out here, but since we like to communicate in Dutch, we cannot go there today. For now it will be Human Learning, meaning that I have to incrementally improve the rules and dictionaries :-)
To get started I have asked several questions in this forum