Components
ConsumerGroup
members rebalance the partitions they consume when consumers join, leave, or fail
:!: consumers stop consuming during a rebalance, and need to refresh their caches after it
only one consumer in a group will consume a specific partition, exclusively
the max parallelism of topic consumption is the number of partitions; redundant consumers stay idle
:star: one consumer per partition, one consumer per thread ( :question: too many threads concern? )
HeartBeating
(before 0.10.1) Consumer sends heartbeat to group coordinator broker when
a) consumer polls (e.g. retrieves records)
b) consumer commits records it has consumed
(after 0.10.1) a separate heartbeat thread decouples the heartbeat frequency from the consumption frequency
it is configurable (max.poll.interval.ms) how long the app can go without calling poll() before the consumer leaves the group and triggers a rebalance, to avoid livelock
consumers send heartbeats to group coordinator broker to maintain membership of consumer group and ownership of partitions
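A minimal sketch of the related consumer settings with the Java client (imports omitted; the broker address, group id and timeout values are illustrative assumptions, not recommendations):

```java
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // assumption: local broker
props.put("group.id", "my-group");                  // hypothetical group id
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// heartbeat thread: liveness of the consumer process
props.put("session.timeout.ms", "10000");    // coordinator evicts the consumer if no heartbeat arrives within this window
props.put("heartbeat.interval.ms", "3000");  // how often the background thread sends heartbeats
// poll loop: liveness of the processing logic (post-0.10.1)
props.put("max.poll.interval.ms", "300000"); // max gap between poll() calls before the consumer leaves the group
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
```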
poll()
- find GroupCoordinator broker, join consumer group and receive partition assignment
- rebalancing :question: see Rebalance Listener and Internals mindmap
- heartbeating :question: see HeartBeating
:!: ALWAYS close() consumer before exiting
close() will 1) close network connections and sockets, 2) trigger a consumer group rebalance immediately instead of waiting for the session timeout
Exiting Poll Loop
use another thread (e.g. a ShutdownHook) to call consumer.wakeup() - the only consumer method that is safe to call from another thread
wakeup() causes poll() to throw WakeupException; you must still call close() after that (see the sketch below)
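A sketch of the poll loop plus shutdown hook, reusing the `consumer` built above; the topic name and process() are placeholders, and poll(Duration) assumes a 2.0+ client:

```java
final Thread mainThread = Thread.currentThread();
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    consumer.wakeup();      // the only consumer method safe to call from another thread
    try {
        mainThread.join();  // wait for the poll loop to finish close()
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}));

try {
    consumer.subscribe(Collections.singletonList("my-topic"));   // hypothetical topic
    while (true) {
        // poll() also drives coordinator discovery, group membership, rebalancing and heartbeats
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            process(record);   // placeholder for application logic
        }
    }
} catch (WakeupException e) {
    // expected on shutdown: wakeup() makes the next poll() throw
} finally {
    consumer.close();   // closes connections/sockets and triggers an immediate rebalance
}
```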
group.id
consumers with the same group.id subscribed to the same topic each read a disjoint subset of the topic's messages
enable.auto.commit
when true, you have no control over the number of duplicated records (anything processed after the last auto-commit is re-delivered after a crash or rebalance)
commit
the consumer produces a message containing the committed offset for each partition to the __consumer_offsets topic
after a rebalance, a consumer may be assigned a new set of partitions; it then continues from the last committed offset of each partition
:!: it's possible Kafka processes messages twice: those between the last committed offset and the last message the client actually processed
:!: it's possible Kafka misses messages: those between the last message the client processed and the last committed offset
auto commit
enable.auto.commit=true, and use auto.commit.interval.ms to set commit interval
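Sketch, continuing the consumer Properties above (the interval value is arbitrary):

```java
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "5000");  // commit the offsets of the last poll() every 5 s
// anything processed after the most recent auto-commit may be re-delivered after a crash/rebalance
```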
manual commit
Commit Latest Offset
commitSync()
:!::!::!: make sure you call commitSync() only after you are done processing all the records returned by poll()
automatically retries the commit unless there is an unrecoverable exception such as CommitFailedException
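A minimal commitSync() sketch (process() is a placeholder):

```java
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        process(record);         // finish processing everything returned by poll() first
    }
    try {
        consumer.commitSync();   // commits the latest offsets returned by the last poll()
    } catch (CommitFailedException e) {
        // unrecoverable, e.g. the group has already rebalanced; log and decide how to proceed
        e.printStackTrace();
    }
}
```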
commitAsync()
Retrying Async Commits
Compare-And-Set
use a monotonically increasing sequence number.
Increase the seq no each time you commit and capture it in the commit callback.
Before retrying, if the seq no captured in the callback is lower than the current seq no, a newer commit has already happened - do NOT retry
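A hedged sketch of that check inside your own commit helper; the field name, the AtomicLong and the fallback to commitSync() are illustrative choices, not a fixed recipe:

```java
private final AtomicLong sequence = new AtomicLong();   // monotonically increasing commit attempt number

void commitWithRetry(KafkaConsumer<String, String> consumer,
                     Map<TopicPartition, OffsetAndMetadata> offsets) {
    long attempt = sequence.incrementAndGet();          // seq no captured for this attempt
    consumer.commitAsync(offsets, (committed, exception) -> {
        if (exception == null) {
            return;                                     // commit succeeded, nothing to do
        }
        if (attempt < sequence.get()) {
            return;                                     // a newer commit was issued: do NOT retry
        }
        consumer.commitSync(offsets);                   // no newer commit in flight: safe to retry
    });
}
```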
Commit Specified Offset
commitSync() / commitAsync() overloads that take a Map<TopicPartition, OffsetAndMetadata>
OffsetAndMetadata contains the offset of the NEXT message to read, not the offset of the last message processed
committing an offset implicitly commits all offsets before it, so do NOT commit if some records in the middle failed
option 1: commit the last record that succeeded, store the rest in a buffer and keep trying to process them;
use consumer.pause() to ensure no new data is returned by subsequent poll() calls
option 2: dead-letter queue - write failed records to a separate retry topic, and use a separate consumer group to handle retries there
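A sketch of committing explicit offsets mid-batch (process() and the 1000-record interval are placeholders):

```java
Map<TopicPartition, OffsetAndMetadata> currentOffsets = new HashMap<>();
int count = 0;

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        process(record);
        // note the +1: commit the offset of the NEXT message to read, not the one just processed
        currentOffsets.put(new TopicPartition(record.topic(), record.partition()),
                           new OffsetAndMetadata(record.offset() + 1));
        if (++count % 1000 == 0) {
            consumer.commitAsync(currentOffsets, null);   // commit in the middle of the batch
        }
    }
}
```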
Producer
- create a ProducerRecord(topic, [partition], [key], value)
- serialize with the provided key/value serializers
- the partitioner decides which partition to write to
- return RecordMetadata(topic, partition, offset) to the producer
- retry on retriable failures (handled by the producer itself), or return an error
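From the application side, a sketch of that round trip (imports omitted; assumes a configured KafkaProducer<String, String> named producer, see the configs sketch below; topic, key and value are hypothetical):

```java
// topic, key and value; the partition is normally left to the partitioner
ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", "key-1", "value-1");
try {
    RecordMetadata metadata = producer.send(record).get();   // block until the broker responds
    System.out.printf("written to %s-%d at offset %d%n",
            metadata.topic(), metadata.partition(), metadata.offset());
} catch (ExecutionException | InterruptedException e) {
    // non-retriable error, or the producer has already exhausted its own retries
    e.printStackTrace();
}
```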
configs
acks
0: fire-and-forget, no delivery guarantee
1: wait only for the leader replica to acknowledge the write
all: wait until all in-sync replicas have acknowledged the write
-1: equivalent to acks=all
compression.type
:star: snappy
low CPU overhead, good perf
gzip
more CPU and time, better compression ratio
max.request.size
make this match message.max.bytes in the broker config
to avoid the broker rejecting messages for being too large
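A sketch of these settings together (broker address, serializers and values are illustrative):

```java
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // assumption: local broker
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("acks", "all");                  // wait for all in-sync replicas
props.put("compression.type", "snappy");   // low CPU overhead, good throughput
props.put("max.request.size", "1048576");  // keep consistent with the broker's message.max.bytes
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
```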
sync send
producer.send() returns a Future<RecordMetadata>; use future.get() to block until a response is received from the broker
async send
producer.send(record, callback) returns immediately and executes the callback asynchronously once the broker responds
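A sketch of the async variant, reusing the hypothetical record from the flow above:

```java
producer.send(record, (metadata, exception) -> {
    if (exception != null) {
        exception.printStackTrace();   // retriable errors were already retried by the producer
    } else {
        System.out.printf("acked: %s-%d @ offset %d%n",
                metadata.topic(), metadata.partition(), metadata.offset());
    }
});
```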
scaling
start with one thread per producer,
add multiple threads sharing the same producer for better throughput,
when that no longer helps, add more producers
Storage
Partitions are composed of segments (chunks of the log stored as files).
Whole segments are deleted when retention (configured per topic) expires; deleting whole files avoids random reads/writes and performs better
log compacted
when used, Kafka retains only the most recent message produced with each key
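A hedged sketch of creating a compacted topic with the Java AdminClient (topic name, partition count and replication factor are arbitrary; error handling for the checked exceptions of get() is omitted):

```java
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // assumption: local broker
try (AdminClient admin = AdminClient.create(props)) {
    NewTopic topic = new NewTopic("user-profiles", 3, (short) 1)   // hypothetical topic
            .configs(Collections.singletonMap("cleanup.policy", "compact"));  // keep only the last value per key
    admin.createTopics(Collections.singleton(topic)).all().get();  // wait for the broker to create it
}
```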
VerifiableProducer
produces a sequence of messages containing the numbers from 1 up to a value you choose, to check how producing behaves with your config
VerifiableConsumer
consumes events (e.g. from a VerifiableProducer) and prints them in order, with info about commits and rebalances
Optimization
change send/receive buffers memory size for each socket
default (net.core.wmem_default = 131072/128KB, net.core.rmem_default = 131072/128KB) and
maximum (net.core.wmem_max = 2097152/2MB, net.core.rmem_max = 2097152/2MB)
change send/receive buffer for TCP sockets via
net.ipv4.tcp_wmem = 4096 65536 2048000 and
net.ipv4.tcp_rmem = 4096 65536 2048000
enable TCP window scaling with net.ipv4.tcp_window_scaling = 1, so clients transfer data more efficiently and the broker can buffer it
increase net.ipv4.tcp_max_syn_backlog above the default 1024, to allow a greater number of simultaneous connections to be accepted
increase net.core.netdev_max_backlog above the default 1000, to assist with bursts of network traffic
set the noatime mount option for the mount point,
because atime (last access time) is updated every time a file is read, which is of no use to Kafka and generates a large number of writes
increase vm.dirty_ratio to a value in [60, 80] (default is 20)
GC Options
:!: the default Kafka start script uses ParNew and CMS; change the KAFKA_JVM_PERFORMANCE_OPTS env var to override this (e.g. to use G1 instead)