
Posts

Design and Implementation of a Data Synchronization Tool Based on Eventual Consistency (2)

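For the data synchronization tool Syncer, consistency is the key factor in guaranteeing correctness, and it provides the theoretical basis for many implementation choices. This post therefore gives a brief introduction to the relevant theory.

FLP Impossibility

The FLP impossibility result [Fischer, M. J. et al., 1985] is named after the three authors of the paper: Fischer, Lynch, and Paterson. The paper studies whether the consensus problem can be solved in an asynchronous system and what limits apply (the conclusion also applies to synchronous systems with Byzantine failures).

Consensus is a fundamental problem that appears throughout distributed systems: the processes in the system must eventually agree on the same result value. A protocol that solves consensus in the presence of crash failures must satisfy three properties, "termination", "agreement", and "validity" [Consensus, 2019]:

- Termination: every non-faulty process eventually decides on a value;
- Agreement: all non-faulty processes that decide, decide on the same value;
- Validity: if all correct processes propose the same value, then every correct process decides on that value.

The FLP paper assumes the following system model:

- No Byzantine failures. A Byzantine failure, also known as the Byzantine Generals Problem [Lamport, L. et al, 1982], is a situation in which some processes in a distributed system present inconsistent results to different peers, which prevents the system as a whole from reaching agreement. Byzantine failures are relatively rare; they can occur when, for example, the system is attacked.
- Message delivery is reliable and asynchronous. Every message is eventually delivered correctly to its receiver, and is delivered exactly once. "Asynchronous" here means that there is no upper bound on the time a process takes to handle a message, nor on the time a message takes to be delivered.

Using proof by contradiction, the FLP paper reaches two main conclusions:

- In the asynchronous model, no distributed consensus algorithm is completely fault tolerant. Not only is the "strong" form of consensus that satisfies the three properties above impossible; the paper also proves that a weaker form, which only requires that "the value finally decided must have been proposed by at least one process", is impossible as well.
- In other words, in the asynchronous model, even if only a single process may crash, no distributed algorithm can solve the consensus problem. This is the theoretical
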
Latest posts

Design and Implementation of a Data Synchronization Tool Based on Eventual Consistency (1)

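This series of posts is based on my master's thesis and introduces the data synchronization tool Syncer.

Data synchronization, as the name suggests, means keeping data in sync across systems. Depending on the business goal and the application scenario, different data synchronization tools and frameworks emphasize different features, so people also use different names for this kind of tool, such as data transfer tool or data collection tool.

The need for data synchronization arises as a system grows and becomes more complex. The system has to synchronize data to a homogeneous database as a redundant backup, or to a heterogeneous database that provides other services. Data synchronization tools have many application scenarios: database administrators perform real-time backup and data recovery; microservice developers keep redundant copies of data so that each service owns its data, or join two tables into one to improve query efficiency; big data platforms pull business data for large-scale online and offline computation.

Common Implementations

Many companies have tried to solve the data synchronization problem. Among the tools available, there are two main approaches:

- Query the data in batches, then import it in batches. A common example: to synchronize MySQL data into Hive, connect directly to the MySQL database, Select the rows of a table, save them in a specific format to a local file as intermediate storage, and finally Load the file into a Hive table. This approach is simple to implement, but it has obvious drawbacks: the time it takes grows with the data volume; the large number of Select statements puts heavy load on the data source and affects normal business traffic; and the data is relatively stale. Most companies have therefore gradually abandoned this approach in favor of the next one.
- Distribute data changes based on the database's operation log. For example, the Binary Log is a log that MySQL stores in binary form. It records the data changes in MySQL, and the primary/replica replication of a MySQL cluster is itself built on the Binary Log. A synchronization tool receives Binary Log change events in real time and merges the changes to reconstruct the data of the source. This asynchronous change notification greatly reduces the impact on the original data source and offers good latency and performance. The rest of this series focuses on tools of this kind.

There are many data synchronization tools based on the MySQL Binary Log, such as Alibaba's Canal [Canal, 2019] and Debezium [Debezium, 2019], which is maintained by the community on GitHub. Canal supports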
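The idea of merging change events to reconstruct the source data can be illustrated with a minimal sketch. The `RowChange` type, its fields, and the single-table, single-column model below are all hypothetical simplifications for illustration; they are not the Binary Log format and not Syncer's or Canal's actual data model.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ChangeReplayDemo {

    /** Kinds of row-level change, loosely modeled on what an operation log records. */
    enum Kind { INSERT, UPDATE, DELETE }

    /** Hypothetical, simplified change event: one row keyed by id with a single value. */
    static final class RowChange {
        final Kind kind;
        final long id;
        final String value; // unused for DELETE

        RowChange(Kind kind, long id, String value) {
            this.kind = kind;
            this.id = id;
            this.value = value;
        }
    }

    /** Applies the events in log order and returns the reconstructed table. */
    static Map<Long, String> replay(List<RowChange> changes) {
        Map<Long, String> table = new LinkedHashMap<>();
        for (RowChange change : changes) {
            switch (change.kind) {
                case INSERT:
                case UPDATE:
                    table.put(change.id, change.value);
                    break;
                case DELETE:
                    table.remove(change.id);
                    break;
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<RowChange> log = Arrays.asList(
                new RowChange(Kind.INSERT, 1L, "a"),
                new RowChange(Kind.INSERT, 2L, "b"),
                new RowChange(Kind.UPDATE, 1L, "c"),
                new RowChange(Kind.DELETE, 2L, null));
        System.out.println(replay(log)); // prints {1=c}
    }
}
```

The result depends on applying events in log order, which is one reason ordering matters so much for a tool of this kind.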

Deadlock Diagnosis

This post records how we diagnosed a deadlock in a test environment, so that the methodology and commands can be reused in similar situations.

State

Suddenly, requests to some paths of a microservice returned 504 (Gateway Timeout). Strangely, the other microservices worked fine and, even more strangely, requests to other paths of the same microservice still succeeded. We soon found that the failed requests were all database writes, so we guessed that the cause might be a transaction lock in MySQL.

MySQL Lock

To find out whether there were deadlocks in MySQL, we used the following commands:

    # identify locked tables
    show open tables where In_use > 0;
    # display the [InnoDB Monitor](https://mariadb.com/kb/en/xtradbinnodb-monitors/) output, which is extensive InnoDB information (including transaction info) that can be useful in diagnosing problems
    show engine innodb status;
    # show the processes connected to MySQL
    SHOW PROCESSLIST

Interrupt Thread

Recently, I have been working on the graceful shutdown of Syncer. During the implementation, I was surprised by some counterintuitive parts of Java. Let's see whether you know them.

Thread States

The following enumeration lists the states of a Java Thread, as cited from the Java documentation:

- NEW: A thread that has not yet started is in this state.
- RUNNABLE: A thread executing in the Java virtual machine is in this state.
- BLOCKED: A thread that is blocked waiting for a monitor lock is in this state.
- WAITING: A thread that is waiting indefinitely for another thread to perform a particular action is in this state.
- TIMED_WAITING: A thread that is waiting for another thread to perform an action for up to a specified waiting time is in this state.
- TERMINATED: A thread that has exited is in this state.

The state names are fine, but the descriptions are not clear enough:

- What is a Thread's state if it is blocked in an IO operation?
- What is a Thread's state if it calle
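One way to explore questions like these is to observe `Thread.getState()` directly. The sketch below is not taken from the original post; the class and helper names are made up for illustration. It checks two cases: a thread inside `Thread.sleep` and a thread waiting to enter a `synchronized` block. Because thread scheduling is involved, the helper polls for up to about two seconds instead of assuming the state is reached immediately.

```java
final class ThreadStateDemo {

    /** Polls until the thread reports the wanted state, giving up after about two seconds. */
    static Thread.State awaitState(Thread thread, Thread.State wanted) {
        long deadline = System.nanoTime() + 2_000_000_000L;
        try {
            while (thread.getState() != wanted && System.nanoTime() < deadline) {
                Thread.sleep(5);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return thread.getState();
    }

    /** Joins without forcing callers to handle the checked exception. */
    static void joinQuietly(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Observes a thread that is inside Thread.sleep with a timeout. */
    static Thread.State stateWhileSleeping() {
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // interrupted by the caller below; fall through and exit
            }
        });
        sleeper.start();
        Thread.State seen = awaitState(sleeper, Thread.State.TIMED_WAITING);
        sleeper.interrupt();
        joinQuietly(sleeper);
        return seen;
    }

    /** Observes a thread that is waiting to enter a monitor held by the caller. */
    static Thread.State stateWhileContendingForMonitor() {
        final Object lock = new Object();
        Thread contender = new Thread(() -> {
            synchronized (lock) {
                // acquires the monitor only after the caller releases it
            }
        });
        Thread.State seen;
        synchronized (lock) {
            contender.start();
            seen = awaitState(contender, Thread.State.BLOCKED);
        }
        joinQuietly(contender);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(new Thread(() -> { }).getState()); // NEW: never started
        System.out.println(stateWhileSleeping());             // expected TIMED_WAITING
        System.out.println(stateWhileContendingForMonitor()); // expected BLOCKED
    }
}
```

The same polling helper can be pointed at a thread blocked in a socket or file read to answer the IO question experimentally.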

Deep in Transaction (2)

In the last post, we mentioned that the job of a transaction is to keep state consistent when:

- multiple transactions access objects concurrently;
- the server crashes or fails.

We also discussed how a transaction handles crashes of the server and the client at the design level. Today, we focus on how transactions keep objects consistent when multiple clients access them, i.e. the Concurrency Control of transactions.

Why Transaction

Before diving into how to do it, we need to understand why we need it, or what problems may arise when concurrent access happens and we do not control it.

Access at Same Time

We will illustrate two famous problems of concurrent access with bank deposit/withdraw examples.

Lost Update

Transaction T: balance=b.getBalance(); b.setBalance(balance+10);
Transaction U: balance=b.getBalance(); b.setBalance(balance+20);

b 10 b 10 balance=b.getBalance(); 10 balance=b.getBalance();
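The lost update can be reproduced deterministically by writing out the problematic interleaving by hand. This is only a sketch: the `Account` class and the initial balance of 100 are assumptions made for illustration, while the two operations mirror transactions T and U from the excerpt. No threads are used, so the result is the same on every run.

```java
final class LostUpdateDemo {

    /** Hypothetical account object standing in for `b` in the excerpt. */
    static final class Account {
        private int balance;

        Account(int balance) {
            this.balance = balance;
        }

        int getBalance() {
            return balance;
        }

        void setBalance(int balance) {
            this.balance = balance;
        }
    }

    /** Both transactions read before either writes, so T's update is overwritten. */
    static int interleaved(int initial) {
        Account b = new Account(initial);
        int balanceT = b.getBalance();  // T reads
        int balanceU = b.getBalance();  // U reads the same old value
        b.setBalance(balanceT + 10);    // T writes
        b.setBalance(balanceU + 20);    // U overwrites T's write
        return b.getBalance();
    }

    /** T runs to completion before U starts, so both updates survive. */
    static int serial(int initial) {
        Account b = new Account(initial);
        int balanceT = b.getBalance();
        b.setBalance(balanceT + 10);
        int balanceU = b.getBalance();
        b.setBalance(balanceU + 20);
        return b.getBalance();
    }

    public static void main(String[] args) {
        System.out.println(interleaved(100)); // prints 120: the +10 is lost
        System.out.println(serial(100));      // prints 130
    }
}
```

Concurrency control exists to rule out schedules like `interleaved` while still allowing as much overlap as possible.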

Deep in Transaction (1)

When it comes to transactions, we think of the ACID properties, and of a group of actions that should be done all or nothing.

- Atomicity: guarantees that each transaction is treated as a single "unit"
- Consistency: ensures that a transaction can only bring the database from one valid state to another
- Isolation: ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially
- Durability: guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure

It may seem that the four rules of ACID are equally important properties of a transaction, but they are not.

Why We Need Transaction

To make it clear, the very first goal of transactions is to ensure that all of the objects managed by a server remain in a consistent state when they are accessed by multiple transactions and in the presence of server crashes.

Compile Java Using Java

Recently, we decided to refactor the Filter module of [Syncer](https://github.com/zzt93/syncer/), which used to be implemented with Spring EL. We wanted to refactor it for three reasons:

- Spring EL is not as fast as native Java code;
- the config specified in Spring EL is rather hard to write and maintain;
- the config expressions I defined myself support only a limited syntax.

So, we came up with two options to upgrade it:

- first, upgrade the config expressions with a better hierarchy and support for more powerful expressions;
- second, introduce Java code to do the config.

The first option is not related to this post, so we will skip to the second one: configuring the application with Java code.

Background

In Syncer, we listen for changes on different inputs, manipulate the data in the filter module, and then write to the target output, as in the following config:

    version: 1.2
    consumerId: sample1
    input:
      masters:
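Configuring through Java code requires compiling source text at runtime. The sketch below shows one way to do that with the JDK's standard `javax.tools` API; it is not necessarily how Syncer implements it, and the class name `GeneratedFilter`, the method `describe`, and the helper `compileAndCall` are hypothetical names chosen for illustration. It assumes the program runs on a JDK, since `ToolProvider.getSystemJavaCompiler()` returns `null` when no compiler is available.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class RuntimeCompileDemo {

    /**
     * Writes the source to a temp directory, compiles it, loads the class,
     * and invokes a public static no-argument method on it.
     */
    static Object compileAndCall(String className, String source, String methodName) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no system Java compiler; run on a JDK");
        }
        try {
            Path dir = Files.createTempDirectory("runtime-compile");
            Path file = dir.resolve(className + ".java");
            Files.write(file, source.getBytes(StandardCharsets.UTF_8));

            // null streams mean System.in, System.out and System.err are used
            int exitCode = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
            if (exitCode != 0) {
                throw new IllegalStateException("compilation failed with exit code " + exitCode);
            }

            try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
                Class<?> compiled = loader.loadClass(className);
                Method method = compiled.getMethod(methodName);
                return method.invoke(null); // static method, so no receiver
            }
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The generated class must be public so that reflection may call into it.
        String source = "public class GeneratedFilter {"
                + " public static String describe() { return \"compiled at runtime\"; } }";
        System.out.println(compileAndCall("GeneratedFilter", source, "describe"));
    }
}
```

Writing to a temporary directory keeps the sketch short; an in-memory `JavaFileManager` avoids touching disk but takes considerably more code.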