DB_ENV->rep_elect
#include <db.h>
int DB_ENV->rep_elect(DB_ENV *env, u_int32_t nsites, u_int32_t nvotes, u_int32_t flags);
The DB_ENV->rep_elect method holds an election for the master of a replication group.
The DB_ENV->rep_elect method is not called by most replication applications. It should only be called by applications implementing their own network transport layer, explicitly holding replication group elections and handling replication messages outside of the replication manager framework.
If the election is successful, Berkeley DB will notify the application of the results of the election by means of either the DB_EVENT_REP_ELECTED or DB_EVENT_REP_NEWMASTER events (see DB_ENV->set_event_notify method for more information). The application is responsible for adjusting its relationship to the other database environments in the replication group, including directing all database updates to the newly selected master, in accordance with the results of the election.
The thread of control that calls the DB_ENV->rep_elect method must not be the thread of control that processes incoming messages; processing the incoming messages is necessary to successfully complete an election.
Parameters
Elections are done in two parts: first, replication sites collect information from the other replication sites they know about, and second, replication sites cast their votes for a new master. The second phase is triggered by one of two things: either the replication site gets election information from nsites sites, or the election timeout expires. Once the second phase is triggered, the replication site will cast a vote for the new master of its choice if, and only if, the site has election information from at least nvotes sites. If a site receives nvotes votes for it to become the new master, then it will become the new master.
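The two-phase rule above can be modelled as three small predicates. This is only an illustrative sketch of the rule as described, not Berkeley DB's implementation: the struct election_view type and all function names are hypothetical, and whether a site counts itself in the totals is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of one site's view of an election in progress. */
struct election_view {
    unsigned info_received;   /* sites this site has election information from */
    unsigned votes_received;  /* votes cast for this site to become master */
    bool timeout_expired;     /* true once the election timeout has elapsed */
};

/* Phase two starts when information from nsites sites has arrived,
 * or when the election timeout expires, whichever comes first. */
static bool phase_two_triggered(const struct election_view *v, unsigned nsites)
{
    assert(v != NULL);
    return v->info_received >= nsites || v->timeout_expired;
}

/* A site casts its vote only if phase two has been triggered and it
 * holds election information from at least nvotes sites. */
static bool may_cast_vote(const struct election_view *v,
                          unsigned nsites, unsigned nvotes)
{
    return phase_two_triggered(v, nsites) && v->info_received >= nvotes;
}

/* A site becomes the new master once it has received nvotes votes. */
static bool becomes_master(const struct election_view *v, unsigned nvotes)
{
    assert(v != NULL);
    return v->votes_received >= nvotes;
}
```

In this model, a site that has heard from only two sites when the timeout expires enters phase two but does not vote if nvotes is 3.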
We recommend nvotes be set to at least:
(sites participating in the election / 2) + 1
to ensure there are never more than two masters active at the same time even in the case of a network partition. When a network partitions, the side of the partition with more than half the environments will elect a new master and continue, while the environments communicating with fewer than half of the environments will fail to find a new master, as no site can get nvotes votes.
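As a worked example of the majority recommendation, the sketch below computes nvotes and checks which side of a partition could still reach it. The helper names are hypothetical, and the formula is assumed to use integer division.

```c
#include <assert.h>
#include <stdbool.h>

/* Recommended minimum nvotes: a strict majority of the participating
 * sites. Integer division is assumed, so 5 sites yield 3 votes. */
static unsigned majority_nvotes(unsigned participating)
{
    assert(participating > 0);
    return participating / 2 + 1;
}

/* A side of a partition can elect a master only if it contains at
 * least nvotes sites, since no site can collect more votes than
 * there are sites it can reach. */
static bool side_can_elect(unsigned reachable_sites, unsigned nvotes)
{
    return reachable_sites >= nvotes;
}
```

For a five-site group split three and two, nvotes is 3: the three-site side can elect a master and the two-site side cannot. For a four-site group split evenly, neither side can reach the three votes required.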
We recommend nsites be set to:
number of sites in the replication group - 1
when choosing a new master after a current master fails. This allows the group to reach a consensus without having to wait for the timeout to expire.
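A small sketch of why the smaller value avoids the timeout. The helper names are hypothetical, and it assumes the count of sites with election information includes the local site.

```c
#include <assert.h>
#include <stdbool.h>

/* Recommended nsites when the current master has failed: every site
 * in the group except the failed master. */
static unsigned nsites_after_master_failure(unsigned group_size)
{
    assert(group_size >= 2);
    return group_size - 1;
}

/* If fewer than nsites sites can ever respond, the vote is triggered
 * only when the election timeout expires. */
static bool waits_for_timeout(unsigned responding_sites, unsigned nsites)
{
    return responding_sites < nsites;
}
```

In a five-site group whose master has failed, four sites can respond. With nsites set to 4 the election proceeds as soon as all four have reported; with nsites set to 5 it would stall until the timeout.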
When choosing a master from among a group of client sites all restarting at the same time, it makes more sense to set nsites to the total number of sites in the group, since there is no known missing site. Furthermore, in order to ensure the best choice from among sites that may take longer to boot than the local site, setting nvotes also to this same total number of sites will guarantee that every site in the group is considered. Alternatively, using the special timeout for full elections allows full participation on restart but allows election of a master if one site does not reboot and rejoin the group in a reasonable amount of time. (See the Elections section in the Berkeley DB Reference Guide for more information.)
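For the simultaneous-restart case, both parameters take the same value. The sketch below only packages that choice; struct elect_params and full_restart_params are hypothetical names, not part of the Berkeley DB API.

```c
#include <assert.h>

/* Hypothetical pair of arguments for an election call. */
struct elect_params {
    unsigned nsites;  /* sites expected to participate */
    unsigned nvotes;  /* votes required to win */
};

/* Full-group restart: no site is known to be missing, so expect all
 * sites and require every one of them to be considered. */
static struct elect_params full_restart_params(unsigned group_size)
{
    struct elect_params p;

    assert(group_size > 0);
    p.nsites = group_size;
    p.nvotes = group_size;
    return p;
}
```

An application would then pass these values as the nsites and nvotes arguments, for example env->rep_elect(env, p.nsites, p.nvotes, 0), where passing 0 for flags is an assumption of this sketch. With these settings a single site that never rejoins prevents any site from winning, which is the case the full-election timeout is meant to cover.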
Setting nsites to lower values can increase the speed of an election, but can also result in election failure, and is usually not recommended.
The DB_ENV->rep_elect method may fail and return one of the following non-zero errors:
Copyright (c) 1996,2008 Oracle. All rights reserved.