Documentation on various internal structures.

The most important structures use an anonymous shared mmap()
so that child processes can watch them. (All the CLI connections
are handled in child processes.)

TODO: Re-investigate threads to see if we can use a thread to handle
CLI connections without killing forwarding performance.

An array of session structures. This is one of the two
major data structures that are synced across the cluster.

This array is statically allocated at startup time to a
compile-time size (currently 50k sessions). This sets a
hard limit on the number of sessions a cluster can handle.

There is one element per L2TP session (i.e. one per active user).

The zeroth session is always invalid.

An array of tunnel structures. This is the other major data structure
that's actively synced across the cluster.

As with sessions, this is statically allocated at startup time
to a compile-time size limit.

There is one element per L2TP tunnel (normally one per BRAS
that this cluster talks to).

The zeroth tunnel is always invalid.

A table holding all the IP addresses in the pool. As addresses
are used, they are tagged with the username of the session
and the session index.

When they are freed, the username tag ISN'T cleared. This is
to ensure that, where possible, we re-allocate the same IP
address to the same user.
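
The "sticky" allocation described above can be sketched as follows. This is only an illustration: the structure layout, the names pool_entry, pool_alloc and pool_free, and the fallback policy (take the first free entry when no entry carries the user's tag) are assumptions, not the actual implementation.

```c
#include <string.h>

#define POOL_SIZE 4
#define MAXUSER   32

/* Hypothetical pool entry; field names are illustrative only. */
struct pool_entry {
    unsigned int address;   /* IPv4 address, host byte order */
    int in_use;             /* non-zero while assigned to a session */
    int session;            /* owning session index, 0 = none */
    char user[MAXUSER];     /* last user of this address; kept on free */
};

/* Assign an address to `user`; returns the pool index, or -1 if the pool is full. */
static int pool_alloc(struct pool_entry *pool, int n, const char *user, int session)
{
    int pick = -1;

    for (int i = 0; i < n; i++) {
        if (pool[i].in_use)
            continue;
        if (strcmp(pool[i].user, user) == 0) {
            pick = i;            /* free entry last used by this user: prefer it */
            break;
        }
        if (pick < 0)
            pick = i;            /* remember the first free entry as a fallback */
    }
    if (pick < 0)
        return -1;

    pool[pick].in_use = 1;
    pool[pick].session = session;
    strncpy(pool[pick].user, user, MAXUSER - 1);
    pool[pick].user[MAXUSER - 1] = '\0';
    return pick;
}

/* Release an address. The username tag is deliberately left in place. */
static void pool_free(struct pool_entry *pool, int i)
{
    pool[i].in_use = 0;
    pool[i].session = 0;
}
```

A user who reconnects after both their own and another user's address have been freed gets their old entry back, because the tag match is checked before the first-free fallback is used.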

A table holding active RADIUS sessions. Whenever a RADIUS
conversation is needed (login, accounting et al), a radius

A mapping of IP address to session structure. This is a
ternary tree (each byte of the IP address is used in turn
to index that level of the tree).

If the value is positive, it's considered to be an index
into the session table.

If it's negative, it's considered to be an index into

If it's zero, then there is no associated value.
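
A byte-at-a-time lookup of the kind described above can be sketched like this. The node layout and the names ip_node, ip_insert, ip_lookup and ip_classify are hypothetical. What a negative value refers to is not stated in the text here, so the sketch only labels it IP_OTHER.

```c
#include <stdlib.h>

/* One level of the tree: 256 slots, one per possible byte value. */
struct ip_node {
    struct ip_node *child[256];  /* used on the first three levels */
    int value[256];              /* used on the last level only */
};

enum ip_kind { IP_NONE, IP_SESSION, IP_OTHER };

/* Store `value` for `ip` (host byte order). Returns 0 on success, -1 on OOM. */
static int ip_insert(struct ip_node *root, unsigned int ip, int value)
{
    struct ip_node *n = root;

    for (int shift = 24; shift > 0; shift -= 8) {
        unsigned int b = (ip >> shift) & 0xff;
        if (!n->child[b]) {
            n->child[b] = calloc(1, sizeof *n->child[b]);
            if (!n->child[b])
                return -1;
        }
        n = n->child[b];
    }
    n->value[ip & 0xff] = value;
    return 0;
}

/* Walk one level per address byte; returns 0 when nothing is stored. */
static int ip_lookup(const struct ip_node *root, unsigned int ip)
{
    const struct ip_node *n = root;

    for (int shift = 24; shift > 0 && n; shift -= 8)
        n = n->child[(ip >> shift) & 0xff];
    return n ? n->value[ip & 0xff] : 0;
}

/* Interpret a stored value by its sign, as the text describes. */
static enum ip_kind ip_classify(int v)
{
    if (v > 0)
        return IP_SESSION;   /* index into the session table */
    if (v < 0)
        return IP_OTHER;     /* index into another table (not named here) */
    return IP_NONE;          /* no associated value */
}
```

Because the zeroth session is always invalid, zero is free to mean "no entry", and a lookup costs at most four array indexing steps regardless of how many addresses are stored.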

============================================================

Clustering: How it works.

At a high level, the various members of the cluster elect
a master. All other machines become slaves. Slaves handle normal
packet forwarding. Whenever a slave gets a 'state changing' packet
(i.e. tunnel setup/teardown, session setup etc.) it _doesn't_ handle
it, but instead forwards it to the master.

'State changing' is defined to be "a packet that would cause
a change in either a session or tunnel structure that isn't just
updating the idle time or byte counters". In practice, this also
means all LCP, IPCP, and L2TP control packets.

The master then handles the packet normally, updating
the session/tunnel structures. The changed structures are then
flooded out to the slaves via a multicast packet.

The master sends out a multicast 'heartbeat' packet
at least once every second. This packet contains a sequence number
and any changes to the session/tunnel structures that have
been queued up. If there is room in the packet, it also sends
out a number of extra session/tunnel structures.

Sending out these 'extra' structures means that the
master slowly walks the entire session and tunnel tables.
This allows a new slave to catch up on cluster state.

Each heartbeat has an in-order sequence number. If a
slave receives a heartbeat with a sequence number other than
the one it was expecting, it drops the unexpected packet and
unicasts C_LASTSEEN to tell the master the last heartbeat it
had seen. The master normally then unicasts the missing packets
to the slave. If the master doesn't have the old packet any more
(i.e. it's outside the transmission window) then the master
unicasts C_KILL to the slave, asking it to die. (It should then
restart and catch up on state via the normal process.)
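
The sequence-number handling above can be sketched as two small decision functions. Only C_LASTSEEN and C_KILL come from the text. The function and type names, the window size parameter, and the reliance on unsigned wrap-around are assumptions made for illustration; how a freshly started slave picks its first expected number is not covered.

```c
/* What a slave does with an incoming heartbeat. */
enum slave_action {
    HB_APPLY,           /* in order: apply the changes it carries */
    HB_SEND_LASTSEEN    /* out of order: drop it, unicast C_LASTSEEN */
};

/* What the master does on receiving C_LASTSEEN. */
enum master_action {
    M_RESEND,           /* unicast the missing heartbeats */
    M_SEND_KILL         /* too far behind: unicast C_KILL */
};

struct slave_state {
    unsigned int last_seen;   /* sequence number of the last heartbeat applied */
};

/* Slave side: accept only the heartbeat that directly follows the last one. */
static enum slave_action slave_on_heartbeat(struct slave_state *s, unsigned int seq)
{
    if (seq == s->last_seen + 1) {
        s->last_seen = seq;
        return HB_APPLY;
    }
    return HB_SEND_LASTSEEN;  /* caller reports s->last_seen to the master */
}

/*
 * Master side: `current` is the newest sequence number sent and `window`
 * is how many past heartbeats are still held for retransmission.
 */
static enum master_action master_on_lastseen(unsigned int current,
                                             unsigned int window,
                                             unsigned int last_seen)
{
    unsigned int missing = current - last_seen;  /* heartbeats the slave lacks */
    return missing <= window ? M_RESEND : M_SEND_KILL;
}
```

A slave that claims a last_seen ahead of the master makes the unsigned subtraction wrap to a very large value, so in this sketch it is also told to restart.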

All slaves send out a 'ping' once per second as a
multicast packet. This 'ping' contains the slave's IP address
and, most importantly, the number of seconds since the epoch
at which the slave started up (i.e. the value of time(2) at
the moment the process started).

All machines start up as slaves.

Each slave listens for a heartbeat from the master.
If a slave fails to hear a heartbeat for N seconds then it
checks to see if it should become master.

A slave will become master if:
* It hasn't heard from a master for N seconds.
* It is the oldest of all its peers (the other slaves).
* In the event of a tie, the machine with the
  lowest IP address will win.

A 'peer' is any other slave machine that has sent out a
ping in the last N seconds (i.e. we must have seen
a recent ping from that slave for it to be considered).

The upshot of this is that no special communication
takes place when a slave becomes a master.
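
The election rules above can be expressed as a single predicate that each slave evaluates locally. This is a sketch under assumptions: the struct peer layout, the function name should_become_master and the exact boundary comparisons are hypothetical. The value N is passed in as `timeout`.

```c
#include <time.h>

/* What a slave remembers about another slave, learned from its pings. */
struct peer {
    unsigned int addr;   /* IPv4 address, host byte order */
    time_t started;      /* start time carried in the peer's ping */
    time_t last_ping;    /* local time at which we last heard its ping */
};

/*
 * Returns 1 if this machine should promote itself to master.
 * `timeout` is the N seconds from the text.
 */
static int should_become_master(unsigned int my_addr, time_t my_start,
                                time_t last_heartbeat, time_t now, int timeout,
                                const struct peer *peers, int npeers)
{
    if (now - last_heartbeat < timeout)
        return 0;                          /* a master was heard recently */

    for (int i = 0; i < npeers; i++) {
        if (now - peers[i].last_ping > timeout)
            continue;                      /* no recent ping: not a peer */
        if (peers[i].started < my_start)
            return 0;                      /* an older peer exists */
        if (peers[i].started == my_start && peers[i].addr < my_addr)
            return 0;                      /* tie: the lowest IP address wins */
    }
    return 1;
}
```

Every slave sees the same pings and applies the same rule, so they normally reach the same answer independently, which is why no extra election messages are needed.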

On initial cluster startup, the process would be (for example):

* 3 machines start up simultaneously, all as slaves.
* Each machine sends out a multicast 'ping' every second.
* 15 seconds later, the machine with the lowest IP
  address becomes master, and starts sending
  heartbeats.
* The remaining two machines hear the heartbeat and
  set that machine as their master.

When a slave becomes master, the only structures kept up
to date are the tunnel and session structures. This means
the master has to rebuild a number of mappings.

#0. All the session and tunnel structures are marked as
defined. (Even if we weren't fully up to date, it's

#1. All the token bucket filters are rebuilt from scratch,
with the associated session-to-tbf pointers being rebuilt.

TODO: These changed tbf pointers aren't flooded to the slaves right away!
Throttled sessions could take a couple of minutes to start working again

#2. The ipcache to session hash is rebuilt. (This isn't
strictly needed, but it's a safety measure.)

#3. The mapping from the ippool into the session table
(and vice versa) is rebuilt.

At startup the entire session and tunnel structures are
marked as undefined.

As it sees updates from the master, the updated structures
are marked as defined.

When there are no undefined tunnel or session structures, the
slave marks itself as 'up-to-date' and starts advertising routes
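
The defined/undefined bookkeeping above could be tracked with a per-table counter, as in the sketch below. The names sync_table, sync_reset, sync_mark and slave_up_to_date are hypothetical, and the tiny table size is only for the demo. Whether the always-invalid zeroth slot is counted is not stated in the text; this sketch counts every slot.

```c
#include <string.h>

#define DEMO_SLOTS 4

/* Tracks which slots of one table (sessions or tunnels) have been received. */
struct sync_table {
    unsigned char defined[DEMO_SLOTS];
    int undefined;   /* number of slots still awaiting an update */
};

/* At startup every slot is undefined. */
static void sync_reset(struct sync_table *t)
{
    memset(t->defined, 0, sizeof t->defined);
    t->undefined = DEMO_SLOTS;
}

/* Called when an update for slot `idx` arrives from the master. */
static void sync_mark(struct sync_table *t, int idx)
{
    if (idx < 0 || idx >= DEMO_SLOTS || t->defined[idx])
        return;              /* out of range, or already counted */
    t->defined[idx] = 1;
    t->undefined--;
}

/* The slave is up to date once neither table has undefined slots left. */
static int slave_up_to_date(const struct sync_table *sessions,
                            const struct sync_table *tunnels)
{
    return sessions->undefined == 0 && tunnels->undefined == 0;
}
```

Keeping a counter avoids rescanning both tables on every heartbeat; the check becomes two integer comparisons.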