Project lead on a new data warehouse initiative based on the cutting-edge Cloudera stack, including HDFS, YARN, ZooKeeper, Hive, and Impala.
I led the design and implementation of the entire data ingestion and management pipeline, which includes real-time synchronisation of multiple heavily used, geographically dispersed trading systems built on a range of storage technologies. Technologies employed include Cloudera Impala, Reactive Java, RabbitMQ, Redis, HDFS, ZooKeeper and Elasticsearch. The pipeline feeds a highly scalable Cloudera Impala cluster that provides real-time cross-asset analytics to the business.