This is awesome. I have experience running a CitusDB cluster, and it pretty much solved a lot of the scaling problems I was having at the time. It going open source now is a huge benefit for my future projects.
> With the release of newly open sourced Citus v5.0, pg_shard's codebase has been merged into Citus...
This is fantastic, sounds like the setup process is much simpler.
I wonder if they have introduced the Active/Active Master solution they were working on? As I recall, there used to be one Master and multiple Worker nodes, and the solution was to keep a passive backup of the Master.
If they release the Active/Active Master later this year, that's huge. I could pretty much consider my DB solution done at that point.
(Ozgun from Citus Data)
We're working on making Citus masterless. In all openness, we evaluated two different approaches to this in the past six months, and wrapped up the design for one. This design works well on the cloud, and we already demonstrated a working version: https://youtu.be/_nun2S6EdWo?t=411
For on-premise deployments, the primary challenge is set-up complexity. We're now prototyping one of those designs to know more: https://github.com/citusdata/citus/issues/389
We expect to share all the details and a concrete timeline in April.
Excellent news! Really looking forward to this one! :)
Can Citus handle geospatial sharding?
You could always compute a geohash and use that as a shard key... I'm not familiar enough with Citus's specific approach here, but using a truncated (lower-precision) geohash would get you close to what you're looking for.
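A minimal sketch of the idea, assuming you compute the key in your application before inserting rows. This is plain geohash encoding, not a Citus API; the helper names `geohash` and `shard_key` and the default precision of 4 are hypothetical choices for illustration. Because truncating a geohash just drops trailing characters, nearby points usually share the same prefix, so a short prefix groups them onto the same shard.

```python
# Standard geohash base32 alphabet (omits a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat: float, lon: float, precision: int = 12) -> str:
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    even = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, val = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits = (bits << 1) | 1  # upper half of the interval
            rng[0] = mid
        else:
            bits = bits << 1        # lower half of the interval
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:          # every 5 bits become one base32 char
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def shard_key(lat: float, lon: float, precision: int = 4) -> str:
    """Hypothetical shard key: a truncated geohash, i.e. a coarse cell."""
    return geohash(lat, lon, precision)
```

For example, `geohash(57.64911, 10.40744, 11)` gives `"u4pruydqqvj"`, and `shard_key(57.64911, 10.40744)` gives its 4-character prefix `"u4pr"`. One caveat of this design: two points that are physically close but sit on opposite sides of a cell boundary can get different prefixes, so queries near boundaries may still touch more than one shard.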
good work! congrats =)
also in Turkish: kolaylıklar dilerim (roughly, "wishing you an easy time of it") :)
Would it be possible (eventually) to use Citus for sharding within the datacenter, and BDR for master/master replication between datacenters?
Or is Citus taking over the master/master replication? (or is it doing something different?)
(Marco from Citus Data)
It seems at least theoretically possible.
Since the Citus master executes distributed queries by sending regular SQL queries to the Citus workers, you could already use BDR servers as workers, replicate the data between pairs of workers in different data centers, and manually copy the metadata over to the master. However, some distributed joins and data loading features wouldn't work.
For all features to work, and to replicate the master, you would have to compile Citus against BDR, which probably requires a few code changes.
Postgres with sharding and master/master replication would be so awesome.
What does "works well on the cloud" mean specifically? Is there some difference when run on your own hardware?
I imagine it means it tolerates unreliable or somewhat higher-latency networks, and maybe avoids multicast.
Can you elaborate on the scaling problems you were having?