Apache Kafka[1]
Developer(s) Apache Software Foundation
Initial release January 2011[2]
Stable release 0.10.2.0 / February 22, 2017
Repository git-wip-us.apache.org/repos/asf/kafka.git
Development status Active
Written in Scala, Java
Operating system Cross-platform
Type Stream processing, Message broker
License Apache License 2.0
Website kafka.apache.org

Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation and written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a "massively scalable pub/sub message queue architected as a distributed transaction log,"[3] which makes it well suited to processing streaming data in enterprise infrastructures. Kafka also connects to external systems (for data import and export) via Kafka Connect and provides Kafka Streams, a Java stream processing library.

The design is heavily influenced by transaction logs.[4]
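The idea of a pub/sub queue built on a log can be illustrated with a small sketch. This is a toy, single-process model and not Kafka's implementation or API; the names `CommitLog` and `Subscriber` are hypothetical and chosen only for illustration. It shows the two properties the description above relies on: records are only ever appended, and each subscriber tracks its own read position (offset), so subscribers can consume the same records at different speeds.

```python
class CommitLog:
    """Toy append-only log (hypothetical): records are only added at the end."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


class Subscriber:
    """Toy subscriber (hypothetical) that remembers how far it has read."""

    def __init__(self, log):
        self._log = log
        self.offset = 0  # next offset this subscriber will read

    def poll(self, max_records=10):
        """Fetch the next batch and advance this subscriber's own offset."""
        batch = self._log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = CommitLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

# Two independent subscribers read the same log at different paces.
fast = Subscriber(log)
slow = Subscriber(log)
first_fast = fast.poll()               # reads all three records
first_slow = slow.poll(max_records=1)  # reads only the first record
```

Because reading does not remove anything from the log, the slow subscriber can pick up the remaining records later without affecting the fast one.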

History

Apache Kafka was originally developed by LinkedIn and was open sourced in early 2011. It graduated from the Apache Incubator on 23 October 2012. In November 2014, several engineers who had worked on Kafka at LinkedIn founded a new company, Confluent,[5] with a focus on Kafka.

Enterprises that use Kafka

The following is a list of notable enterprises that have used or are using Kafka:

Kafka performance

Because Kafka is widely integrated into enterprise-level infrastructures, monitoring its performance at scale has become increasingly important. Monitoring end-to-end performance requires tracking metrics from brokers, consumers, and producers, as well as monitoring ZooKeeper, which Kafka uses for coordination among consumers.[18][19] Several monitoring platforms currently track Kafka performance, both open-source, such as LinkedIn's Burrow, and paid, such as Datadog. Kafka metrics can also be collected with tools commonly bundled with Java, including JConsole.[20]
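One example of a metric that combines broker-side and consumer-side information is consumer lag, the number of records written to a partition that a consumer has not yet processed. The sketch below is an illustration under stated assumptions, not the method used by any of the tools named above: the function `consumer_lag` is hypothetical, and the offsets are made-up sample values standing in for numbers that a real deployment would obtain from its brokers and consumers.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Return per-partition lag: records written but not yet consumed.

    Both arguments map a partition number to an offset. A partition with
    no committed offset is treated as unread from offset 0 (an assumption
    made for this sketch).
    """
    lag = {}
    for partition, end_offset in log_end_offsets.items():
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end_offset - committed, 0)
    return lag


# Made-up sample values for three partitions of one topic.
log_end_offsets = {0: 1500, 1: 1480, 2: 1510}
committed_offsets = {0: 1500, 1: 1200}  # partition 2 has no commit yet

lag = consumer_lag(log_end_offsets, committed_offsets)
total_lag = sum(lag.values())
```

A lag that keeps growing over time suggests that consumers are not keeping up with producers, which is why it is useful to watch alongside broker and producer metrics.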

References

  1. ^ "Mirror of Apache Kafka at GitHub". github.com. Retrieved 6 March 2017. 
  2. ^ "Open-sourcing Kafka, LinkedIn's distributed message queue". Retrieved 27 October 2016. 
  3. ^ Monitoring Kafka performance metrics, Datadog Engineering Blog, accessed 23 May 2016
  4. ^ The Log: What every software engineer should know about real-time data's unifying abstraction, LinkedIn Engineering Blog, accessed 5 May 2014
  5. ^ Primack, Dan. "LinkedIn engineers spin out to launch 'Kafka' startup Confluent". fortune.com. Retrieved 10 February 2015. 
  6. ^ "Exchange Market Data Streaming with Kafka". 
  7. ^ "OpenSOC: An Open Commitment to Security". Cisco blog. Retrieved 2016-02-03. 
  8. ^ "More data, more data". 
  9. ^ Doyung Yoon. "S2Graph : A Large-Scale Graph Database with HBase". 
  10. ^ "Kafka Usage in Ebay Communications Delivery Pipeline". 
  11. ^ "Kafka at HubSpot: Critical Consumer Metrics". 
  12. ^ Cheolsoo Park and Ashwin Shankar. "Netflix: Integrating Spark at Petabyte Scale". 
  13. ^ Shibi Sudhakaran of PayPal. "PayPal: Creating a Central Data Backbone: Couchbase Server to Kafka to Hadoop and Back (talk at Couchbase Connect 2015)". Couchbase. Retrieved 2016-02-03. 
  14. ^ "Shopify - Sarama is a Go library for Apache Kafka". 
  15. ^ "Concurrency and At Least Once Semantics with the New Kafka Consumer". 
  16. ^ Josh Baer. "How Apache Drives Spotify's Music Recommendations". 
  17. ^ "Stream Processing in Uber". InfoQ. Retrieved 2015-12-06. 
  18. ^ "Monitoring Kafka performance metrics". 2016-04-06. Retrieved 2016-10-05. 
  19. ^ Mouzakitis, Evan (2016-04-06). "Monitoring Kafka performance metrics". datadoghq.com. Retrieved 2016-10-05. 
  20. ^ "Collecting Kafka performance metrics - Datadog". 2016-04-06. Retrieved 2016-10-05. 
