Pulsar Clients | Apache Pulsar (2024)

Pulsar exposes a client API with language bindings for Java, C++, Go, Python, Node.js and C#. The client API optimizes and encapsulates Pulsar's client-broker communication protocol and exposes a simple and intuitive API for use by applications.

Pulsar client libraries support transparent reconnection and/or connection failover to brokers, queuing of messages until acknowledged by the broker, and heuristics such as connection retries with backoff.

Client setup phase

Before an application creates a producer/consumer, the Pulsar client library needs to initiate a setup phase including two steps:

  1. The client attempts to determine the owner of the topic by sending an HTTP lookup request to the broker. The request can reach any of the active brokers, which consults the (cached) ZooKeeper metadata to find out which broker is serving the topic or, if no broker is serving it, tries to assign the topic to the least loaded broker.
  2. Once the client library has the broker address, it creates a TCP connection (or reuses an existing connection from the pool) and authenticates it. Within this connection, the client and broker exchange binary commands from a custom protocol. At this point, the client sends a command to create the producer or consumer to the broker, which complies after validating the authorization policy.

Whenever the TCP connection breaks, the client immediately re-initiates this setup phase and keeps trying with exponential backoff to re-establish the producer or consumer until the operation succeeds.
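The retry behavior described above can be sketched in a few lines of plain Python. This is only an illustration of exponential backoff, not the Pulsar client's implementation: the function names, the 100 ms initial delay, the 60 s cap and the doubling factor are all assumptions made for the example.

```python
def backoff_schedule(attempts, initial_ms=100, max_ms=60_000, factor=2):
    """Return the wait (in ms) before each of the first `attempts` retries.

    All default values are illustrative, not Pulsar's real settings.
    """
    delays = []
    delay = initial_ms
    for _ in range(attempts):
        delays.append(min(delay, max_ms))  # never wait longer than the cap
        delay *= factor                    # grow the delay after every failure
    return delays


def run_setup_with_retries(setup, sleep_ms, initial_ms=100, max_ms=60_000, factor=2):
    """Call setup() until it succeeds, waiting longer after each failure.

    `setup` stands in for the lookup + connect + create-producer/consumer
    phase; `sleep_ms` is injected so the caller decides how to wait.
    Returns (result, number_of_attempts). Only ConnectionError is retried.
    """
    delay = initial_ms
    attempts = 0
    while True:
        attempts += 1
        try:
            return setup(), attempts
        except ConnectionError:
            sleep_ms(min(delay, max_ms))
            delay *= factor
```

In a real program `sleep_ms` could be `lambda ms: time.sleep(ms / 1000)`; injecting it keeps the sketch easy to exercise without actually waiting.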

Producer

A producer is a process that attaches to a topic and publishes messages to a Pulsar broker. The Pulsar broker processes the messages.

Send mode

Producers send messages to brokers synchronously (sync) or asynchronously (async).

| Mode | Description |
| --- | --- |
| Sync send | The producer waits for an acknowledgment from the broker after sending every message. If the acknowledgment is not received, the producer treats the sending operation as a failure. |
| Async send | The producer puts a message in a blocking queue and returns immediately. The client library sends the message to the broker in the background. If the queue is full (you can configure the maximum size), the producer is blocked or fails immediately when calling the API, depending on arguments passed to the producer. |
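To make the "blocked or fails immediately" behavior of async send concrete, here is a small model built on Python's standard `queue.Queue`. It is a sketch under stated assumptions: `AsyncSendModel`, `ProducerQueueFull`, `max_pending` and `block_if_full` are hypothetical names chosen for this example and are not the Pulsar client API.

```python
import queue


class ProducerQueueFull(Exception):
    """Raised by the model when the pending queue is full and blocking is off."""


class AsyncSendModel:
    """Toy model of a producer's bounded pending-message queue."""

    def __init__(self, max_pending, block_if_full):
        self._pending = queue.Queue(maxsize=max_pending)
        self._block_if_full = block_if_full

    def send_async(self, message, timeout=None):
        """Enqueue a message and return immediately when there is room.

        When the queue is full, either block (optionally up to `timeout`
        seconds) or fail at once, depending on how the model was configured.
        """
        try:
            self._pending.put(message, block=self._block_if_full, timeout=timeout)
        except queue.Full:
            raise ProducerQueueFull("pending queue is full") from None

    def drain_one(self):
        """Stand-in for the background sender taking the oldest message."""
        return self._pending.get_nowait()
```

With `block_if_full=False` the third send on a queue of size two raises at once; with `block_if_full=True` the same call would wait until the background sender frees a slot.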

Access mode

You can have different types of access modes on topics for producers.

| Access mode | Description |
| --- | --- |
| Shared | Multiple producers can publish on a topic. This is the default setting. |
| Exclusive | Only one producer can publish on a topic. If there is already a producer connected, other producers trying to publish on this topic get errors immediately. If the "old" producer experiences a network partition with the broker, it is evicted and a "new" producer is selected to be the next exclusive producer. |
| ExclusiveWithFencing | Only one producer can publish on a topic. If there is already a producer connected, it is removed and invalidated immediately. |
| WaitForExclusive | If there is already a producer connected, the producer creation is pending (rather than timing out) until the producer gets the Exclusive access. The producer that succeeds in becoming the exclusive one is treated as the leader. Consequently, if you want to implement a leader election scheme for your application, you can use this access mode. Note that this leader pattern refers to using Pulsar as a Write-Ahead Log (WAL), which means the leader writes its "decisions" to the topic. In error cases, the leader learns that it is no longer the leader only when it tries to write a message and the broker rejects the write with the corresponding error. |

note

Once an application creates a producer with Exclusive or WaitForExclusive access mode successfully, the instance of this application is guaranteed to be the only writer to the topic. Any other producers trying to produce messages on this topic will either get errors immediately or have to wait until they get the Exclusive access. For more information, see PIP 68: Exclusive Producer.

You can set the producer access mode through the Java client API. For more information, see ProducerAccessMode in the ProducerBuilder.java file.
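The four access modes can be compared side by side with a toy broker-side model. This is a simplified sketch of the rules in the table above, not Pulsar's implementation: `TopicModel`, its methods and the string results ("attached", "error", "pending") are hypothetical, and details such as network partitions and timeouts are left out.

```python
from enum import Enum


class AccessMode(Enum):
    SHARED = "Shared"
    EXCLUSIVE = "Exclusive"
    EXCLUSIVE_WITH_FENCING = "ExclusiveWithFencing"
    WAIT_FOR_EXCLUSIVE = "WaitForExclusive"


class TopicModel:
    """Toy model of how a topic admits producers under each access mode."""

    def __init__(self):
        self.producers = []  # (name, mode) pairs currently attached
        self.waiting = []    # names queued by WaitForExclusive

    def _exclusive_held(self):
        return any(mode is not AccessMode.SHARED for _, mode in self.producers)

    def attach(self, name, mode):
        if mode is AccessMode.EXCLUSIVE_WITH_FENCING:
            # Any existing producer is removed and the newcomer takes over.
            self.producers = [(name, mode)]
            return "attached"
        if mode is AccessMode.WAIT_FOR_EXCLUSIVE and self.producers:
            # Creation stays pending until the topic is free.
            self.waiting.append(name)
            return "pending"
        if mode is AccessMode.SHARED:
            blocked = self._exclusive_held()
        else:
            blocked = bool(self.producers)
        if blocked:
            return "error"
        self.producers.append((name, mode))
        return "attached"

    def detach(self, name):
        self.producers = [(n, m) for n, m in self.producers if n != name]
        if not self.producers and self.waiting:
            # The longest-waiting producer becomes the exclusive one (the leader).
            leader = self.waiting.pop(0)
            self.producers.append((leader, AccessMode.WAIT_FOR_EXCLUSIVE))
```

The promotion step in `detach` is what makes WaitForExclusive usable for the leader election pattern described above: a standby producer simply waits until the current exclusive producer goes away.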

Consumer

A consumer is a process that attaches to a topic via a subscription and then receives messages.

A consumer sends a flow permit request to a broker to get messages. There is a queue at the consumer side to receive messages pushed from the broker. You can configure the queue size with the receiverQueueSize parameter (the default size is 1000). Each time consumer.receive() is called, a message is dequeued from the buffer.
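The interplay between flow permits and the receiver queue can be modeled as follows. This is a sketch, not the client's real logic: `BrokerModel` and `ConsumerModel` are hypothetical, and the rule of requesting new permits once half of the queue has been consumed is an assumption made for the example.

```python
from collections import deque


class BrokerModel:
    """Toy broker that pushes messages only while it holds permits."""

    def __init__(self, backlog):
        self.backlog = deque(backlog)
        self.permits = 0

    def flow(self, consumer, permits):
        # The consumer grants the broker the right to push `permits` messages.
        self.permits += permits
        while self.permits > 0 and self.backlog:
            consumer.buffer.append(self.backlog.popleft())
            self.permits -= 1


class ConsumerModel:
    """Toy consumer with a bounded receiver queue."""

    def __init__(self, broker, receiver_queue_size=1000):
        self.broker = broker
        self.size = receiver_queue_size
        self.buffer = deque()
        self.consumed_since_flow = 0
        broker.flow(self, receiver_queue_size)  # initial permits fill the queue

    def receive(self):
        # Raises IndexError when the buffer is empty; a real client would block.
        message = self.buffer.popleft()
        self.consumed_since_flow += 1
        if self.consumed_since_flow >= max(1, self.size // 2):
            refill = self.consumed_since_flow
            self.consumed_since_flow = 0
            self.broker.flow(self, refill)  # ask for as many as were consumed
        return message
```

Because permits are only re-issued for messages that were already dequeued, the buffer in this model never holds more than `receiver_queue_size` messages.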

Receive mode

Messages are received from brokers either synchronously (sync) or asynchronously (async).

| Mode | Description |
| --- | --- |
| Sync receive | A sync receive is blocked until a message is available. |
| Async receive | An async receive returns immediately with a future value (for example, a CompletableFuture in Java) that completes once a new message is available. |
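The difference between the two receive modes comes down to what the caller gets back when no message is available yet. The following sketch models async receive with Python's `concurrent.futures.Future`; `InboxModel` and its methods are hypothetical names for illustration and do not mirror the Pulsar client API.

```python
from collections import deque
from concurrent.futures import Future


class InboxModel:
    """Toy consumer inbox that hands out futures for messages."""

    def __init__(self):
        self._messages = deque()  # messages that arrived before anyone asked
        self._waiters = deque()   # futures handed out before a message arrived

    def receive_async(self):
        """Return at once with a future that completes when a message exists."""
        future = Future()
        if self._messages:
            future.set_result(self._messages.popleft())
        else:
            self._waiters.append(future)
        return future

    def deliver(self, message):
        """Stand-in for the broker pushing a message to the consumer."""
        if self._waiters:
            self._waiters.popleft().set_result(message)
        else:
            self._messages.append(message)
```

A sync receive corresponds to calling `receive_async().result()`, which blocks the caller until the future completes.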

Listener

Client libraries provide a listener implementation for consumers. For example, the Java client provides a MessageListener interface. In this interface, the received method is called whenever a new message is received.
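As a rough illustration of the listener style, the model below invokes a callback for every delivered message instead of waiting to be polled. `ListenerConsumerModel`, `deliver` and `acknowledge` are hypothetical names for this sketch; passing both the consumer and the message to the callback is a design choice of the example so that the callback can acknowledge what it processed.

```python
class ListenerConsumerModel:
    """Toy consumer that pushes each message to a callback."""

    def __init__(self, listener):
        self._listener = listener
        self.acknowledged = []

    def acknowledge(self, message):
        # Record the acknowledgment; a real client would notify the broker.
        self.acknowledged.append(message)

    def deliver(self, message):
        # A real client would call the listener from its own threads.
        self._listener(self, message)
```

A callback used with this model takes `(consumer, message)`, processes the message, and then calls `consumer.acknowledge(message)`.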

Reader

In Pulsar, the "standard" consumer interface involves using consumers to listen on topics, process incoming messages, and finally acknowledge those messages when they are processed. Whenever a new subscription is created, it is initially positioned at the end of the topic (by default), and consumers associated with that subscription begin reading with the first message created afterward. Whenever a consumer connects to a topic using a pre-existing subscription, it begins reading from the earliest message un-acked within that subscription. In summary, with the consumer interface, subscription cursors are automatically managed by Pulsar in response to message acknowledgments.

The reader interface for Pulsar enables applications to manually manage cursors. When you use a reader to connect to a topic (rather than a consumer), you need to specify which message the reader begins reading from when it connects to a topic. When connecting to a topic, the reader interface enables you to begin with:

  • The earliest available message in the topic.
  • The latest available message in the topic.
  • Some other message between the earliest and the latest. If you select this option, you'll need to explicitly provide a message ID. Your application will be responsible for "knowing" this message ID in advance, perhaps fetching it from a persistent data store or cache.

The reader interface is helpful for use cases like using Pulsar to provide effectively-once processing semantics for a stream processing system. For this use case, the stream processing system must be able to "rewind" topics to a specific message and begin reading there. The reader interface provides Pulsar clients with the low-level abstraction necessary to "manually position" themselves within a topic.

Internally, the reader interface is implemented as a consumer using an exclusive, non-durable subscription to the topic with a randomly-allocated name.
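The three starting positions listed above can be sketched with a reader over an in-memory log. This is a toy model, not the Pulsar reader API: `ReaderModel`, the `EARLIEST` and `LATEST` sentinels, and the choice to start at the given message ID inclusively are assumptions made for the example.

```python
EARLIEST = object()  # sentinel: start at the first message in the log
LATEST = object()    # sentinel: start after the last message currently in the log


class ReaderModel:
    """Toy reader over a list of (message_id, payload) pairs."""

    def __init__(self, log, start):
        self._log = log
        if start is EARLIEST:
            self._next = 0
        elif start is LATEST:
            self._next = len(log)
        else:
            # The application must already know the message ID.
            ids = [message_id for message_id, _ in log]
            self._next = ids.index(start)  # raises ValueError if unknown

    def has_message_available(self):
        return self._next < len(self._log)

    def read_next(self):
        _, payload = self._log[self._next]
        self._next += 1  # the position lives in the reader, not in the broker
        return payload
```

Note that the read position is held only by the reader object, which mirrors the point that a reader's progress is not persisted the way a durable subscription cursor is.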

tip

Unlike subscriptions and consumers, readers are non-durable in nature and do not prevent data in a topic from being deleted, so it is strongly advised that data retention be configured. If data retention for a topic is not configured for an adequate amount of time, messages that the reader has not yet read might be deleted, which causes the reader to skip those messages. Configuring data retention for a topic guarantees the reader a certain amount of time to read a message.

Please also note that a reader can have a "backlog", but the metric only lets users know how far behind the reader is. The metric is not considered for any backlog quota calculations.

TableView

The TableView interface serves an encapsulated access pattern, providing a continuously updated key-value map view of the compacted topic data. Messages without keys will be ignored.

With TableView, Pulsar clients can fetch all the message updates from a topic and construct a map with the latest values of each key. These values can then be used to build a local cache of data. In addition, you can register consumers with the TableView by specifying a listener to perform a scan of the map and then receive notifications when new messages are received. Consequently, event handling can be triggered to serve use cases, such as event-driven applications and message monitoring.

note

Each TableView uses one Reader instance per partition, and reads the topic starting from the compacted view by default. It is highly recommended to enable automatic compaction by configuring the topic compaction policies for the given topic or namespace. More frequent compaction results in shorter startup times because less data is replayed to reconstruct the TableView of the topic. Starting from Pulsar 2.11.0, TableView also supports reading non-persistent topics, but it does not guarantee data consistency.
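The core idea of a TableView, a map holding the latest value per key with listeners notified on updates, can be sketched as below. This is a simplified in-memory model and not the Pulsar TableView implementation: `TableViewModel`, `apply` and `for_each_and_listen` are names chosen for this example, and compaction, partitions and readers are left out.

```python
class TableViewModel:
    """Toy key-value view built from a stream of (key, value) messages."""

    def __init__(self):
        self._table = {}
        self._listeners = []

    def apply(self, key, value):
        """Fold one message into the view; messages without a key are ignored."""
        if key is None:
            return
        self._table[key] = value  # a newer value replaces the older one
        for listener in self._listeners:
            listener(key, value)

    def for_each_and_listen(self, listener):
        """Scan the current map once, then notify on every later update."""
        for key, value in self._table.items():
            listener(key, value)
        self._listeners.append(listener)

    def get(self, key):
        return self._table.get(key)
```

Replaying fewer messages into `apply` at startup is the reason more frequent compaction shortens the time needed to rebuild the view.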

The following figure illustrates the dynamic construction of a TableView updated with newer values of each key.

[Figure: dynamic construction of a TableView updated with newer values of each key]

FAQs

Which companies use Pulsar?

Who uses Apache Pulsar?

| Company | Website | Country |
| --- | --- | --- |
| Overstock.com Inc | overstock.com | United States |
| Narvar | narvar.com | United States |
| Impulso | impulso.it | Italy |
| Alibaba Group Holding Ltd | alibabagroup.com | China |

What is Apache Pulsar client?

Pulsar client APIs allow you to create and configure producers, consumers, and readers; produce and consume messages; perform authentication and authorization tasks, and so on via programmable interfaces.

How do I use Pulsar client?

To use the latest version, add the pulsar-client library to your build configuration. pulsar-client and pulsar-client-admin shade dependencies via maven-shade-plugin to avoid conflicts of the underlying dependency packages (such as Netty). If you do not want to manage dependency conflicts manually, you can use them.

Is Pulsar better than Kafka?

Kafka excels in event streaming, incorporating tools that enable the creation of streaming pipelines, features for processing events, ordered parallel message delivery, and more. Pulsar offers the majority of these features, with the exception of event streaming pipelines.

What are the different types of Pulsar consumers?

Apache Pulsar is a publish-subscribe distributed messaging system. When consumers subscribe to topics in Pulsar, there are four subscription types to choose from: Exclusive, Failover, Shared, and Key_Shared.

What is Pulsar used for?

Apache® Pulsar™ is an open-source, distributed messaging and streaming platform built for the cloud.

What is the difference between Pulsar and NiFi?

NiFi is focused on making it easy to move data between software systems, rather than doing anything with it long term. Pulsar, meanwhile, was designed to act as a long-term repository of event data and provides strong integration with popular stream processing frameworks such as Flink and Spark.

What is the difference between Pulsar reader and consumer?

A reader is essentially a consumer without a durable cursor. This means that Pulsar does not keep track of your progress and there is no need to acknowledge messages. A reader can, for example, begin reading from the earliest available message on a topic.

What is a Pulsar subscription?

Subscriptions in Pulsar describe which consumers are consuming data from a topic and how they want to consume that data. Subscriptions are managed in the broker as a collection of metadata about a topic and its subscribed consumers.

Does Netflix use Apache Kafka?

Netflix embraces Apache Kafka® as the de-facto standard for its eventing, messaging, and stream processing needs. Kafka acts as a bridge for all point-to-point and Netflix Studio wide communications.

Does Tesla use Apache Kafka?

Tesla has built a Kafka-based data platform infrastructure “to support millions of devices and trillions of data points per day”.

Why use Pulsar over Kafka?

Pulsar provides the message streaming and publishing features that Kafka does, but adds the ability to persist the data for longer periods. Pulsar offers data storage persistence using Apache Bookkeeper. Bookkeeper maintains the data and helps offload the data persistence outside the cluster.

How popular is Apache Pulsar?

Apache Pulsar is the #12 ranked solution in Streaming Analytics tools.

Article information

Author: Reed Wilderman
