How to make good recommendations

Today, most online shop systems offer product recommendations. In addition to page navigation and onsite search, recommendations are another good way to lead customers to suitable products. If recommendations are based on the click streams and purchases of customers, external standard software is usually used to generate them. Otto.de uses the prudsys Realtime Decisioning Engine (prudsys RDE).

How to get many recommendations

Usually a very simple system architecture is implemented:

[Figure: simple architecture]

This setup is typically used when, for example, the shop does not yet define where recommendations are placed or how they are configured. In this case, the prudsys RDE only computes the products to be recommended, while the shop software itself handles the concrete presentation of those products. Hence, this simple architecture can compute a comparatively large number of recommendations.

How to get more recommendations

Once customers have accepted product recommendations, the need for a deeper technical integration into the shop grows: uninterrupted delivery of recommendations becomes mandatory.
For this reason, the prudsys RDE ships by default with a load balancer that allows more than one system to run in parallel.

[Figure: extended architecture]

This architecture allows delivery of product recommendations without interruption. Furthermore, load is now distributed over several recommendation servers. However, this combination of components has some disadvantages: each recommendation server learns its recommendation rules independently. Because the load balancer assigns each customer to one particular recommendation server, every server generates (‘learns’) its own rule set, and these rule sets have to be merged from time to time.
On the other hand, the load balancer cannot distribute the requests of one customer across more than one server, because recommendations based on that customer’s click stream history would then have to be computed across several servers. This is not feasible in the given system architecture.
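To illustrate the constraint, here is a minimal Python sketch of how a load balancer might pin each customer to exactly one server. The hash-based scheme, the function name `assign_server` and the server names are assumptions for illustration only; the text does not describe how the prudsys load balancer actually makes this assignment.

```python
import hashlib


def assign_server(customer_id, servers):
    """Pick one fixed server per customer via a stable hash of the ID.

    Hypothetical scheme: the same customer always maps to the same
    server, so that server sees the customer's whole click stream.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["rec-1", "rec-2", "rec-3"]  # hypothetical server names
for customer in ("alice", "bob", "alice"):
    print(customer, "->", assign_server(customer, servers))
```

Since every server only ever sees its own share of customers, each one learns from a different subset of the traffic, which is why the rule sets drift apart and need merging.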

How to get recommendations for Otto.de

To meet Otto.de’s functional and non-functional goals, the internal architecture of the prudsys RDE was improved. The concrete goals were:

  • Reach a consistent set of recommendation rules: All customers should get recommendations on the same rule base.
  • Improve scalability: Scaling out should be easy, and the additional operational effort should be minimal.
  • Improve availability: Updates of product data should be possible online and in an incremental mode. This is a challenging goal because the prudsys RDE currently has to function correctly with about 600,000 products as the base for products to be recommended. In the future, the number of products will grow to a multiple of that.

Therefore, the following system architecture was developed in collaboration with prudsys.

[Figure: high-performance architecture]

What is striking is that the recommendation system is now separated into two types of nodes (‘node’ meaning one server in the recommendation system). The learning node performs all operations necessary to maintain the rule base and informs all other nodes about changes to it. The non-learning nodes only have to deliver product recommendations; they do not have to update their rule base themselves. This works because the load balancer sends every customer request to the learning node as well as to one non-learning node. The learning node immediately responds with HTTP status code 200 and then starts processing the received request. The non-learning node, in contrast, looks up appropriate recommendations in its rule base and sends them back to the load balancer, which can then respond to the customer’s (technically, Otto.de’s) request.

[Figure: RDE request flow]
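The request flow can be sketched in a few lines of Python. This is a simplified in-process model, not the prudsys implementation: the class names, the event fields and the round-robin choice of the non-learning node are all assumptions made for illustration.

```python
import itertools


class LearningNode:
    """Accepts every request, acknowledges at once, learns later."""

    def __init__(self):
        self.pending = []  # events waiting to be processed

    def handle(self, event):
        self.pending.append(event)  # actual learning happens off the request path
        return 200  # immediate acknowledgement, like HTTP 200


class NonLearningNode:
    """Only answers lookups against its copy of the rule base."""

    def __init__(self, rules):
        self.rules = rules  # hypothetical shape: product id -> recommended ids

    def handle(self, event):
        return self.rules.get(event["product"], [])


class LoadBalancer:
    def __init__(self, learner, serving_nodes):
        self.learner = learner
        # Round-robin is an assumption; the text only says "one non-learning node".
        self._next_node = itertools.cycle(serving_nodes)

    def handle(self, event):
        self.learner.handle(event)  # copy 1: feeds the rule base
        return next(self._next_node).handle(event)  # copy 2: answers the shop


rules = {"p1": ["p2", "p3"]}
balancer = LoadBalancer(LearningNode(), [NonLearningNode(rules), NonLearningNode(rules)])
print(balancer.handle({"customer": "c1", "product": "p1"}))
```

The point of the split is that the response to the shop never waits for learning: the lookup path and the learning path only share the incoming request.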

The learning node can handle a larger number of requests because it exploits a statistical effect: once a certain number of requests has been processed, further requests no longer change the results. For this reason, the learning node has a queue for requests. If this queue reaches a certain level, the learning node starts to skip requests. Because of the large amount of previously processed data, the computed results stay the same.
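A minimal sketch of such a queue in Python might look as follows. The class name `LearningQueue`, the fixed threshold `max_pending` and the drop-newest policy are assumptions; the text does not say which requests the prudsys RDE skips or how the level is configured.

```python
from collections import deque


class LearningQueue:
    """Bounded queue that skips new events once it is full."""

    def __init__(self, max_pending):
        self.max_pending = max_pending  # hypothetical threshold
        self.pending = deque()
        self.skipped = 0

    def offer(self, event):
        """Enqueue the event, or skip it if the queue is at its limit."""
        if len(self.pending) >= self.max_pending:
            self.skipped += 1
            return False
        self.pending.append(event)
        return True

    def drain(self, learn):
        """Feed all queued events to the `learn` callback, oldest first."""
        processed = 0
        while self.pending:
            learn(self.pending.popleft())
            processed += 1
        return processed


queue = LearningQueue(max_pending=2)
for event in ("click-1", "click-2", "click-3"):
    queue.offer(event)
print("queued:", len(queue.pending), "skipped:", queue.skipped)
```

Skipping is acceptable here only because of the statistical argument above: with enough data already processed, a dropped event is assumed not to change the learned rules.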

Of course, a number of parameters, e.g. hardware size, determine when the queue comes into play. In practice, we have not needed the queue yet. For a state-of-the-art quad-core system, our current traffic of about 100 requests per second is not a challenge. More important for us, however, is that this system architecture has enough capacity for future load peaks as well as for the use of sophisticated filters for selecting and computing suitable recommendations.

Another new function helps if a node crashes: every node is now able to synchronize itself with the rules of another node. By default, the non-learning nodes synchronize themselves with the learning one. Furthermore, this ability allows us to easily set up test environments with the same data as the production system. Finally, to reduce the impact of a crashed node on the performance of the complete system, every non-learning node has a slave system that takes over if its master crashes.
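The two mechanisms, resynchronizing from another node and failing over to a standby, can be modelled roughly like this in Python. All names (`Node`, `sync_from`, `ServingPair`) and the in-memory rule dictionary are hypothetical; the real synchronization protocol is not described in this post.

```python
import copy


class Node:
    def __init__(self, name, rules=None):
        self.name = name
        self.rules = dict(rules or {})
        self.alive = True

    def sync_from(self, other):
        """Replace the local rule base with a copy of another node's rules."""
        self.rules = copy.deepcopy(other.rules)


class ServingPair:
    """A non-learning node (master) together with its standby (slave)."""

    def __init__(self, master, slave):
        self.master = master
        self.slave = slave

    def active(self):
        """Return the node that should answer requests right now."""
        return self.master if self.master.alive else self.slave


learner = Node("learner", {"p1": ["p2"]})
pair = ServingPair(Node("serve-1"), Node("serve-1-standby"))
for node in (pair.master, pair.slave):
    node.sync_from(learner)  # default: sync with the learning node

pair.master.alive = False  # simulate a crash
print(pair.active().name, pair.active().rules)
```

The same `sync_from` step is what makes it cheap to bring up a test environment with production rules: a fresh node only needs a source node to copy from.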

Compared to former versions of the prudsys RDE, the load balancer has also been improved. It can now import updated product data in a full import mode as well as in an incremental update mode and distribute it to all system nodes. Additionally, the load balancer is now able to import new recommendation rules, e.g. for the computation of similar products.
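The difference between the two import modes can be shown with a small Python sketch. The function names, the product dictionaries and the way data is pushed to nodes are assumptions for illustration and do not reflect the actual prudsys import format.

```python
def full_import(products):
    """Build a fresh catalog; anything not in `products` is dropped."""
    return {product["id"]: product for product in products}


def incremental_update(catalog, changed, deleted_ids=()):
    """Apply only the delta to a copy of the existing catalog."""
    updated = dict(catalog)
    for product in changed:
        updated[product["id"]] = product
    for product_id in deleted_ids:
        updated.pop(product_id, None)
    return updated


def distribute(catalog, nodes):
    """Hand every node its own copy of the current catalog."""
    for node in nodes:
        node["catalog"] = dict(catalog)


catalog = full_import([{"id": "p1", "price": 10}, {"id": "p2", "price": 20}])
catalog = incremental_update(catalog, [{"id": "p2", "price": 25}], deleted_ids=["p1"])
nodes = [{"name": "node-1"}, {"name": "node-2"}]  # hypothetical node records
distribute(catalog, nodes)
print(nodes[0]["catalog"])
```

With several hundred thousand products, the incremental path matters because it touches only the changed entries instead of rebuilding and redistributing the whole catalog.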

Conclusion

The close collaboration with prudsys has led to a more powerful recommendation system, and requirements for future capabilities were implemented. The system’s performance was improved and new functions were added, such as the continuous provisioning of product data and a cluster resync function for single systems after a restart.

As a result, OTTO now runs a high-performance recommendation system with fast request processing, high availability and a consistent base of recommendation data.