In part 1 of this blog we saw how the data handled in an ecommerce system can be bucketed into various perspectives and categorized by the role it plays or supports. We also discussed two use cases, which I shall restate here:
UC1: A shopper must be allowed to define and manage any number of pieces of personal information about themselves.
UC2: A shopper must be shown personalized messages based on the personal information they have shared.
In this blog let us get down to the solutioning decisions and see how the puzzle pieces fall into place.
To solution this properly we must revisit the data that we will end up capturing and understand what qualities the data store must support. Apart from the obvious needs of storing, retrieving and updating data, the chosen store should allow discovery and exploration of the data above and beyond the relationships established through the data model. Let us discuss this need through a couple of examples of the second use case.
Beyond saved relations
Example 1: Suppose the retailer wants to show a personalized message such as "Your favorite sports team is the most popular team among shoppers in our stores". For this we must be able to discover whether the sports team entered by the current shopper is indeed the one mentioned by the most shoppers. A simple document store with some faceting capability can give us this information.
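To make the aggregation in Example 1 concrete, here is a minimal sketch in Python that counts values over in-memory profile documents. The field name favorite_team and the helper names are my own assumptions for illustration; a real document store would compute the same counts with its own faceting or aggregation feature rather than in application code.

```python
from collections import Counter


def normalize(value):
    """Lower-case and trim free-text input so 'Lions ' and 'lions' count together."""
    return value.strip().lower()


def most_popular_value(profiles, field):
    """Return the most frequently mentioned value of `field`, or None if nobody set it."""
    counts = Counter(normalize(p[field]) for p in profiles if p.get(field))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def is_favorite_most_popular(profiles, shopper, field="favorite_team"):
    """True when the shopper's own value is also the most popular one overall."""
    own = shopper.get(field)
    if not own:
        return False
    return normalize(own) == most_popular_value(profiles, field)
```

Notice that even this toy version has to normalize the free text before counting, which is a first hint of the categorization problem.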
Example 2: On the weekend before Valentine's Day, every female shopper who has red, pink, dark brown or chocolate among her favorite colors will be shown a marketing widget that lists a set of select apparel accessories.
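The segmentation in Example 2 boils down to a membership test on each profile. The sketch below assumes hypothetical field names (gender, favorite_colors, member_id) and leaves out the date check for the campaign weekend; it only shows how a shopper is matched against the segment.

```python
# Colors named in the example; the constant and function names are illustrative only.
TARGET_COLORS = {"red", "pink", "dark brown", "chocolate"}


def in_valentine_segment(profile, target_colors=TARGET_COLORS):
    """True for a female shopper with at least one favorite color in the target set."""
    if str(profile.get("gender", "")).strip().lower() != "female":
        return False
    colors = {c.strip().lower() for c in profile.get("favorite_colors", [])}
    return bool(colors & target_colors)


def select_segment(profiles):
    """Return the member ids of all shoppers that fall into the segment."""
    return [p["member_id"] for p in profiles if in_valentine_segment(p)]
```

In practice the same predicate would be expressed as a query against the data store, so that the segment is computed where the data lives.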
The above examples may or may not make sense as real use cases, but they illustrate two distinct problems: one is aggregation and the other is segmentation. Similarly, from the retailer's perspective, this design should also support deduplication and standardization; let me use a term that is a superset of both: categorization. Without a view of categorization, merchandisers and marketers won't be able to utilize the rich data gathered. And we need robust analytics to help us translate the data into trends and relate them to behaviors that a retailer can target in their merchandising and marketing decisions.
Now NoSQL can help us solve the shopper perspective well and provide some help with categorization. It can integrate with an analytics solution for the last bit. The data we gather about the shopper's personal profile should then be used to personalize their shopping experience. The rules of influence could be self-contained within the profile information, or they may need to be executed in the context of, or in combination with, information from the catalog, orders or marketing sub-systems. This is actually one of the reasons why we traditionally choose a relational data model for this.
And this leads to the first decision point: the data must be available for direct queries from client systems and from ecommerce business logic systems, and it should be mappable to the existing relational data model. This is just to say that when I save information in my NoSQL data store, I will include a few key values such as member_id, store_id, catalog_id, address_id and language_id.
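As a rough illustration of this first decision point, the sketch below builds a profile document that carries the relational keys alongside the free-form attributes. The document layout, the _id scheme and the choice to require all five keys are my assumptions for the example, not a prescribed schema.

```python
# Keys that tie the document back to the relational model (taken from the text above).
REQUIRED_KEYS = ("member_id", "store_id", "catalog_id", "address_id", "language_id")


def build_profile_document(keys, attributes):
    """Combine relational key values and shopper-defined attributes into one document."""
    missing = [k for k in REQUIRED_KEYS if k not in keys]
    if missing:
        raise ValueError("missing relational keys: " + ", ".join(missing))
    doc = {k: keys[k] for k in REQUIRED_KEYS}
    # Hypothetical id scheme: one profile document per shopper per store.
    doc["_id"] = f"profile:{keys['store_id']}:{keys['member_id']}"
    doc["type"] = "shopper_profile"
    doc["attributes"] = dict(attributes)
    return doc
```

With the keys stored on every document, business logic can start from a relational row, look up the matching profile, and go back the other way just as easily.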
The second decision point is to mirror the relevant promotional, marketing and search rule information from the relational data store to the NoSQL data store. This is particularly easy since this information is available as XML-format CLOBs in WebSphere Commerce and lends itself to easy import into databases like Cloudant or MongoDB. Obviously, every time a stageprop happens, a process should push the updates from the relational model to the NoSQL model.
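The mirroring step can be sketched as a generic XML-to-document conversion. The code below uses only the Python standard library; the XML shape in the usage note is invented for illustration and is not the actual WebSphere Commerce rule format, and the document fields (_id, store_id, type, rule) are likewise assumptions.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively turn an XML element into plain dicts, lists and strings."""
    node = dict(elem.attrib)
    children = list(elem)
    if not children:
        text = (elem.text or "").strip()
        if not node:
            return text  # leaf without attributes becomes a plain string
        if text:
            node["text"] = text
        return node
    for child in children:
        value = element_to_dict(child)
        if child.tag in node:
            # Repeated tags are collected into a list.
            existing = node[child.tag]
            if isinstance(existing, list):
                existing.append(value)
            else:
                node[child.tag] = [existing, value]
        else:
            node[child.tag] = value
    return node


def clob_to_document(xml_clob, promotion_id, store_id):
    """Wrap the converted rule in a document that keeps the relational keys."""
    root = ET.fromstring(xml_clob)
    return {
        "_id": f"promotion:{promotion_id}",
        "store_id": store_id,
        "type": root.tag,
        "rule": element_to_dict(root),
    }
```

For example, a made-up CLOB such as <Promotion><Name>Spring</Name></Promotion> would become a document whose rule field is {"Name": "Spring"}, ready to be saved with whatever client the chosen data store provides.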
The third decision point is to extract information from the NoSQL data store for analytics and feed the results back into the system of record, so that they can be factored into promotional, pricing, search or merchandising rules. In the worst case we need a process that exports the data, transfers it (for example over FTP) to the analytics system and imports it into that system's data store. I advocate instead running the analytics directly on the NoSQL data store, so that they become real time (subject to performance considerations), and writing the outcome of the analytics into the same data store.
The fourth decision point is to keep authorization, contract entitlement and organizational logic away from this NoSQL data store. These can be enforced at a different layer.