Maximum Matching in Bipartite Graphs

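The previous few posts were mostly about data structures for range updates and range queries. The next topic is graph theory and graph algorithms, and I hope to learn and recall most of them along the way.

Today I ran into a problem that reduces to deciding whether a perfect matching exists in a bipartite graph. We first look at the theory behind perfect matchings, Hall’s Marriage Theorem, and then introduce two algorithms for the more general problem that contains it: maximum matching.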

Hall’s Marriage Theorem#

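Let $G$ be a bipartite graph whose left and right parts are $X$ and $Y$. For $W \subseteq X$, let $N_G(W)$ be the set of vertices in $Y$ adjacent to at least one vertex of $W$.

Then there is a matching that covers all of $X$ if and only if

$\forall W \subseteq X, |W| \le |N_G(W)|$, that is, every subset of $X$ has enough neighbors to be matched.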
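To make the condition concrete, here is a minimal brute-force sketch that checks it on a tiny graph. The function name `satisfies_hall` and the bitmask representation (bit $y$ of `adj[x]` is set when $x$ and $y$ are adjacent) are my own assumptions for illustration; it enumerates all $2^{|X|}$ subsets, so it is only meant for sanity checks on small inputs.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

// adj[x] is a bitmask of the vertices in Y adjacent to x (so |Y| <= 64).
// Returns true when every non-empty subset W of X satisfies |W| <= |N_G(W)|.
bool satisfies_hall(const std::vector<std::uint64_t>& adj) {
    const int n = static_cast<int>(adj.size());
    assert(n < 31);  // keep the subset enumeration small
    for (std::uint32_t w = 1; w < (1u << n); ++w) {
        std::uint64_t neighbors = 0;  // N_G(W) as a bitmask
        int size_w = 0;               // |W|
        for (int x = 0; x < n; ++x) {
            if (w & (1u << x)) {
                neighbors |= adj[x];
                ++size_w;
            }
        }
        const int size_n = static_cast<int>(std::bitset<64>(neighbors).count());
        if (size_n < size_w) return false;  // W violates Hall's condition
    }
    return true;
}
```

For example, three left vertices with neighbor masks 3, 6 and 5 pass the check, while two left vertices that share a single neighbor do not.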

Corollary 1 [1]#

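Suppose a bipartite graph $G(X + Y, E)$ satisfies $|X| = |Y|$, $G$ is connected, and the vertices of $X$ have pairwise distinct degrees. Then $G$ has a perfect matching.

Proof:

Since $G$ is connected, $\forall u \in X, \deg(u) \ge 1$. For any $W \subseteq X$, the degrees of the vertices in $W$ are $|W|$ distinct positive integers, so $\max\limits_{u \in W}\{\deg(u)\} \ge |W|$. All neighbors of a vertex attaining this maximum lie in $N_G(W)$, hence $|N_G(W)| \ge |W|$. Hall’s condition therefore holds and some matching covers $X$; because $|X| = |Y|$, that matching is perfect.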

Hungarian Algorithm#

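In fact, after comparing sources for quite a while, I could not find an algorithm for unweighted bipartite maximum matching that is actually called the Hungarian Algorithm. The only Hungarian Algorithm I found solves the Assignment Problem, that is, finding an optimal matching in a weighted bipartite graph.

Because bipartite maximum matching reduces easily to maximum flow, I also found many algorithms built on similar or even identical ideas, such as the Ford-Fulkerson algorithm and the Edmonds-Karp algorithm.

As for which book used the name "Hungarian algorithm" for this, I no longer remember … it was probably some graph theory textbook I once studied. Here I mainly cover its idea and the proof.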

Alternating Path & Augmenting Path#

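Suppose we have a bipartite graph $G(U + V, E)$ and a matching $M \subseteq E$. A vertex that is not an endpoint of any edge in $M$ is called unmatched (free), and an edge that is not in $M$ is called an unmatched edge.

Alternating path: a path that starts at an unmatched vertex and then alternates between unmatched and matched edges (unmatched, matched, unmatched, …). Augmenting path: an alternating path that starts at an unmatched vertex in $U$ and ends at an unmatched vertex in $V$.

By definition, an augmenting path contains exactly one more unmatched edge than matched edges. Turning every unmatched edge on the path into a matched edge and every matched edge into an unmatched edge yields a matching $M'$ with $|M'| = |M| + 1$.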
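The flip can be sketched in a few lines of C++. This is only an illustration under my own assumptions: the names `Edge` and `augment` are hypothetical, a matching is stored as a set of (vertex in $U$, vertex in $V$) pairs, and the path is given as the vertex sequence $u_0, v_0, u_1, v_1, \dots, u_k, v_k$.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // (vertex in U, vertex in V)

// path alternates U and V vertices: u0, v0, u1, v1, ..., uk, vk.
// Edges (u_i, v_i) are unmatched and edges (u_{i+1}, v_i) are matched.
// Returns the symmetric difference of the matching and the path's edges.
std::set<Edge> augment(std::set<Edge> matching, const std::vector<int>& path) {
    assert(!path.empty() && path.size() % 2 == 0);
    for (std::size_t i = 0; i + 1 < path.size(); ++i) {
        // Even positions hold U vertices, odd positions hold V vertices.
        Edge e = (i % 2 == 0) ? Edge{path[i], path[i + 1]}
                              : Edge{path[i + 1], path[i]};
        // Toggle membership: matched edges leave, unmatched edges enter.
        if (matching.erase(e) == 0) matching.insert(e);
    }
    return matching;
}
```

Starting from the matching {(1, 0)} and the path 0, 0, 1, 1 (read as $u_0, v_0, u_1, v_1$), the result is {(0, 0), (1, 1)}, one edge larger than before.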

Theorem of Augmenting Path#

A matching $M$ is a maximum matching if and only if $G$ contains no augmenting path with respect to $M$.

The proof goes as follows:

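If an augmenting path exists, $M$ is clearly not maximum. So we only need to show that when no augmenting path exists, $M$ is maximum.

Suppose, for contradiction, that $M$ admits no augmenting path and yet is not a maximum matching. Let $M^*$ be a maximum matching of $G$; then $|M^*| > |M|$.

Removing the common edges from both sides gives $|M^* - M| > |M - M^*|$.

Consider the edges in the symmetric difference of $M^*$ and $M$, that is $(M^* \cup M) - (M^* \cap M)$, and let $G'$ be the graph formed by the vertices $U + V$ and these edges.

Since the edges of $G'$ come from two matchings, every vertex of $G'$ is incident to at most two edges.

Hence every connected component of $G'$ is either a path or a cycle, its edges alternate between $M^* - M$ and $M - M^*$, and every cycle has an even number of edges.

Because $|M^* - M| > |M - M^*|$ and every cycle contains equally many edges from each set, some component must be a path with more edges from $M^* - M$ than from $M - M^*$. Its first and last edges both belong to $M^* - M$, and its edges alternate between $M^* - M$ and $M - M^*$. Its two endpoints are unmatched in $M$: an $M$-edge at an endpoint would belong to $M - M^*$ and extend the path. So this path is an augmenting path for $M$, a contradiction!

This completes the proof.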

Pseudocode#

The Hungarian algorithm has two implementations, one based on DFS and one on BFS; both run in $\mathcal{O}(|V||E|)$ time.

Below is the pseudocode of the BFS version:

Algorithm MaximumBipartiteMatching(G)
    initialize set M of edges // can be the empty set
    initialize queue Q with all the free vertices in V
    while not Empty(Q) do
        w ← Front(Q); Dequeue(Q)
        if w ∈ V then
            for every vertex u adjacent to w do // u must be in U
                if u is free then // augment
                    M ← M ∪ (w, u)
                    v ← w
                    while v is labeled do // follow the augmenting path
                        u ← label of v
                        M ← M - (v, u) // (v, u) was in previous M
                        v ← label of u
                        M ← M ∪ (v, u) // add the edge to the path
                    // start over
                    remove all vertex labels
                    reinitialize Q with all the free vertices in V
                    break // exit the for loop
                else // u is matched
                    if (w, u) not in M and u is unlabeled then
                        label u with w // represents an edge in E-M
                        Enqueue(Q, u)
                        // only way for a U vertex to enter the queue
        else // w ∈ U and therefore is matched with v
            v ← w's mate // (w, v) is in M
            label v with w // represents an edge in M
            Enqueue(Q, v) // only way for a mated v to enter Q
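For reference, here is a runnable C++ sketch of the same idea. It is a simplified variant rather than a line-by-line translation of the pseudocode: it runs one BFS per free left vertex instead of keeping a single queue with labels, which keeps the bookkeeping short. The name `max_matching_bfs` and the adjacency-list layout are my own assumptions.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// adj[u] lists the right-side neighbors of left vertex u.
// Returns the size of a maximum matching in O(|V||E|) time.
int max_matching_bfs(const std::vector<std::vector<int>>& adj, int n_right) {
    const int n_left = static_cast<int>(adj.size());
    std::vector<int> mate_left(n_left, -1), mate_right(n_right, -1);
    int matched = 0;
    for (int root = 0; root < n_left; ++root) {
        std::vector<int> parent(n_right, -1);   // left vertex that discovered v
        std::vector<bool> seen(n_right, false);
        std::queue<int> q;                      // holds left vertices only
        q.push(root);
        bool augmented = false;
        while (!q.empty() && !augmented) {
            const int u = q.front();
            q.pop();
            for (int v : adj[u]) {
                if (seen[v]) continue;
                seen[v] = true;
                parent[v] = u;
                if (mate_right[v] != -1) {
                    q.push(mate_right[v]);  // continue along the matched edge
                    continue;
                }
                // v is free: walk back to the root, flipping edges on the way.
                while (v != -1) {
                    const int pu = parent[v];
                    const int next = mate_left[pu];  // pu's old mate, -1 at the root
                    mate_left[pu] = v;
                    mate_right[v] = pu;
                    v = next;
                }
                augmented = true;
                break;
            }
        }
        if (augmented) ++matched;
    }
    return matched;
}
```

Each successful search flips one augmenting path, so the loop over `root` performs at most $|V|$ searches of $\mathcal{O}(|E|)$ each, matching the bound stated above.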

Compared with the BFS version, the DFS version of the Hungarian algorithm is easier to implement; its C++ code can be found in the appendix.

Hopcroft-Karp Algorithm#

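The Hopcroft-Karp algorithm is dedicated to the bipartite maximum matching problem. Its worst-case time complexity is $\mathcal{O}(|E|\sqrt{|V|})$ and its worst-case space overhead is $\mathcal{O}(|V|)$.

The algorithm was discovered in 1973 by the computer scientists John Hopcroft and Richard Karp.

Like the Hungarian algorithm, Hopcroft-Karp enlarges a partial matching by repeatedly finding augmenting paths. The difference is that the Hungarian algorithm finds a single augmenting path per iteration, whereas Hopcroft-Karp finds a maximal set of shortest augmenting paths per iteration, so only $\mathcal{O}(\sqrt{|V|})$ iterations are needed.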

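Hopcroft-Karp repeats the following two phases:

Use BFS to find the next (shortest) augmenting path length, visiting every augmenting path of that length (the maximal set mentioned above).
If BFS found an augmenting path, run a DFS from every possible start vertex u to find and record augmenting paths of that length.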

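With every iteration, the length of the shortest augmenting path found by BFS grows by at least 1, so after $\sqrt{|V|}$ iterations the shortest augmenting path has length at least $\sqrt{|V|}$. Let $M$ be the current partial matching (an edge set). The symmetric difference of $M$ and a maximum matching forms a collection of vertex-disjoint augmenting paths and alternating cycles. If every path in this collection has length at least $\sqrt{|V|}$, there can be at most $\sqrt{|V|}$ of them, because they are vertex-disjoint and the graph has only $|V|$ vertices. So the size of a maximum matching exceeds $|M|$ by at most $\sqrt{|V|}$. Each iteration increases the matching size by at least 1, so at most $\sqrt{|V|}$ further iterations remain before the algorithm terminates.

In each iteration, BFS traverses every edge at most once and so does DFS, so one iteration costs $\mathcal{O}(|E|)$ and the total time complexity is $\mathcal{O}(|E|\sqrt{|V|})$.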

Pseudocode#

/*
 G = U ∪ V ∪ {NIL}
 where U and V are partition of graph and NIL is a special null vertex
*/

function BFS ()
    for each u in U
        if Pair_U[u] == NIL
            Dist[u] = 0
            Enqueue(Q,u)
        else
            Dist[u] = ∞
    Dist[NIL] = ∞
    while Empty(Q) == false
        u = Dequeue(Q)
        if Dist[u] < Dist[NIL]
            for each v in Adj[u]
                if Dist[ Pair_V[v] ] == ∞
                    Dist[ Pair_V[v] ] = Dist[u] + 1
                    Enqueue(Q,Pair_V[v])
    return Dist[NIL] != ∞

function DFS (u)
    if u != NIL
        for each v in Adj[u]
            if Dist[ Pair_V[v] ] == Dist[u] + 1
                if DFS(Pair_V[v]) == true
                    Pair_V[v] = u
                    Pair_U[u] = v
                    return true
        Dist[u] = ∞
        return false
    return true

function Hopcroft-Karp
    for each u in U
        Pair_U[u] = NIL
    for each v in V
        Pair_V[v] = NIL
    matching = 0
    while BFS() == true
        for each u in U
            if Pair_U[u] == NIL
                if DFS(u) == true
                    matching = matching + 1
    return matching
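The pseudocode translates fairly directly into C++. The sketch below is one possible rendering, not a reference implementation: the struct name `HopcroftKarp`, the member names, and the choice to represent NIL by the index `n_left` on the left side (and by -1 in `pair_u`) are my own assumptions.

```cpp
#include <cassert>
#include <limits>
#include <queue>
#include <vector>

constexpr int kInf = std::numeric_limits<int>::max();

struct HopcroftKarp {
    int n_left;
    std::vector<std::vector<int>> adj;  // adj[u]: right neighbors of left vertex u
    std::vector<int> pair_u;  // right mate of u, or -1 when u is free
    std::vector<int> pair_v;  // left mate of v, or n_left (NIL) when v is free
    std::vector<int> dist;    // BFS layer per left vertex; dist[n_left] is Dist[NIL]

    HopcroftKarp(int n_left, int n_right)
        : n_left(n_left), adj(n_left), pair_u(n_left, -1),
          pair_v(n_right, n_left), dist(n_left + 1, kInf) {}

    void add_edge(int u, int v) { adj[u].push_back(v); }

    // Layers the left vertices by alternating-path distance from free vertices.
    bool bfs() {
        std::queue<int> q;
        for (int u = 0; u < n_left; ++u) {
            dist[u] = (pair_u[u] == -1) ? 0 : kInf;
            if (pair_u[u] == -1) q.push(u);
        }
        dist[n_left] = kInf;
        while (!q.empty()) {
            const int u = q.front();
            q.pop();
            if (dist[u] >= dist[n_left]) continue;  // past the shortest length
            for (int v : adj[u]) {
                const int w = pair_v[v];
                if (dist[w] == kInf) {
                    dist[w] = dist[u] + 1;
                    if (w != n_left) q.push(w);
                }
            }
        }
        return dist[n_left] != kInf;  // true if some augmenting path exists
    }

    // Follows the BFS layers from u and flips the path when NIL is reached.
    bool dfs(int u) {
        if (u == n_left) return true;
        for (int v : adj[u]) {
            const int w = pair_v[v];
            if (dist[w] == dist[u] + 1 && dfs(w)) {
                pair_v[v] = u;
                pair_u[u] = v;
                return true;
            }
        }
        dist[u] = kInf;  // dead end: skip u for the rest of this phase
        return false;
    }

    int max_matching() {
        int matching = 0;
        while (bfs())
            for (int u = 0; u < n_left; ++u)
                if (pair_u[u] == -1 && dfs(u)) ++matching;
        return matching;
    }
};
```

A typical use is to construct `HopcroftKarp hk(n_left, n_right)`, call `hk.add_edge(u, v)` for every edge, and then read `hk.max_matching()`. Sizing `dist` as `n_left + 1` gives NIL its own slot, so `dist[pair_v[v]]` is always a valid access.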

A Problem#

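Here is a problem that can be reduced to perfect matching. Roughly, the statement is:

There are n people and n air-raid shelters on a 2D plane. Assign the n people to the shelters so that each shelter holds exactly one person and the time until everyone is inside is minimized, i.e. the person who enters a shelter last does so as early as possible. Moving from (X, Y) to (X1, Y1) takes |X - X1| + |Y - Y1| time. 1 <= n <= 100

I could not think of a good way to solve this directly, but note that the answer must be the travel time of some person to some shelter. So we compute the travel time from every person to every shelter and sort these values.

Take some person-to-shelter move with travel time T; then every move with travel time at most T is allowed. Build a bipartite graph with the people on the left and the shelters on the right, and add one edge for each allowed move. A valid assignment within time T exists exactly when this graph has a perfect matching, and the goal is to find the smallest such T.

In the worst case our bipartite graph is a complete bipartite graph, so deciding whether a perfect matching exists costs $\mathcal{O}(n^3)$ with the Hungarian algorithm and $\mathcal{O}(n^2\sqrt{n})$ with Hopcroft-Karp. Each check is therefore fairly expensive.

If we add the edges one at a time in order of travel time, the worst case needs $n^2$ checks, which is too many. It is tolerable here because the input is small, but it would be hard to say if n grew to 1000.

Remember the trick from earlier posts for reducing the number of checks? Right, binary search: the $n^2$ sorted candidates need only about $\log_2 n^2 = 2\log_2 n$ checks.

There is also a simulation cost here, namely building the corresponding bipartite graph, which takes $\mathcal{O}(n^2)$ per check in the worst case. The total time complexity is therefore $\mathcal{O}(n^3\log n)$ or $\mathcal{O}(n^2\sqrt{n}\log n)$.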
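The binary search step on its own can be sketched as follows. This is a generic helper under my own assumptions, not the appendix code: the name `smallest_feasible` is hypothetical, `times` must be sorted in ascending order, and `feasible` must be monotone (once true for some T, true for every larger T) and true for the largest candidate. In this problem `feasible(T)` would build the graph of moves with time at most T and test for a perfect matching.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Returns the smallest value in `times` for which `feasible` holds,
// calling `feasible` only O(log |times|) times.
int smallest_feasible(const std::vector<int>& times,
                      const std::function<bool(int)>& feasible) {
    assert(!times.empty());
    std::size_t lo = 0, hi = times.size() - 1;
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (feasible(times[mid])) {
            hi = mid;      // the answer is at mid or to its left
        } else {
            lo = mid + 1;  // the answer is strictly to the right
        }
    }
    return times[lo];
}
```

With $n^2$ candidates the loop runs about $2\log_2 n$ times, which is where the logarithmic factor in the total complexity comes from.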

The implementation is in the appendix.

References#

[1] https://en.wikipedia.org/wiki/Hall%27s_marriage_theorem

[2] https://math.stackexchange.com/questions/1204270/bipartite-graph-has-perfect-matching

[3] https://en.wikipedia.org/wiki/Matching_(graph_theory)

[4] https://en.wikipedia.org/wiki/Hopcroft%E2%80%93Karp_algorithm

[5] https://www.topcoder.com/community/data-science/data-science-tutorials/maximum-flow-augmenting-path-algorithms-comparison/

[6] http://www.csl.mtu.edu/cs4321/www/Lectures/Lecture%2022%20-%20Maximum%20Matching%20in%20Bipartite%20Graph.htm

Appendix#

Air Defense Exercise#

#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
#include <stack>
#include <cmath>
#include <deque>
#include <queue>
#include <map>
#include <bitset>
#include <set>
#include <list>
#include <unordered_map>
#include <unordered_set>
#include <sstream>
#include <numeric>
#include <climits>
#include <utility>
#include <iomanip>
#include <cassert>

using namespace std;

using ll = long long;
using ii = pair<int, int>;
using iii = pair<int, ii>;
template <class T>
using vv = vector<vector<T>>;

#define rep(i, b) for (int i = 0; i < int(b); ++i)
#define reps(i, a, b) for (int i = int(a); i < int(b); ++i)
#define rrep(i, b) for (int i = int(b) - 1; i >= 0; --i)
#define rreps(i, a, b) for (int i = int(b) - 1; i >= a; --i)
#define repe(i, b) for (int i = 0; i <= int(b); ++i)
#define repse(i, a, b) for (int i = int(a); i <= int(b); ++i)
#define rrepe(i, b) for (int i = int(b); i >= 0; --i)
#define rrepse(i, a, b) for (int i = int(b); i >= int(a); --i)

#define all(a) a.begin(), a.end()
#define rall(a) a.rbegin(), a.rend()
#define sz(a) int(a.size())
#define mp(a, b) make_pair(a, b)

#define inf (INT_MAX / 2)
#define infl (LONG_MAX / 2)
#define infll (LLONG_MAX / 2)

#define X first
#define Y second
#define pb push_back
#define eb emplace_back

// tools for pair<int, int> & graph
template <class T, size_t M, size_t N>
class graph_delegate_t {
    T (&f)[M][N];

public:
    graph_delegate_t(T (&f)[M][N]) : f(f) {}
    T& operator[](const ii& s) { return f[s.first][s.second]; }
    const T& operator[](const ii& s) const { return f[s.first][s.second]; }
};
ii operator+(const ii& lhs, const ii& rhs) {
    return mp(lhs.first + rhs.first, lhs.second + rhs.second);
}

// clang-format off
template <class S, class T> ostream& operator<<(ostream& os, const pair<S, T>& t) { return os << "(" << t.first << "," << t.second << ")"; }
template <class T> ostream& operator<<(ostream& os, const vector<T>& t) { os << "{"; rep(i, t.size() - 1) { os << t[i] << ","; } if (!t.empty()) os << t.back(); os << "}"; return os; }
vector<string> __macro_split(const string& s) { vector<string> v; int d = 0, f = 0; string t; for (char c : s) { if (!d && c == ',') v.pb(t), t = ""; else t += c; if (c == '\"' || c == '\'') f ^= 1; if (!f && c == '(') ++d; if (!f && c == ')') --d; } v.pb(t); return v; }
void __args_output(vector<string>::iterator, vector<string>::iterator) { cerr << endl; }
template <typename T, typename... Args>
void __args_output(vector<string>::iterator it, vector<string>::iterator end, T a, Args... args) { cerr << it->substr((*it)[0] == ' ', it->length()) << " = " << a; if (++it != end) { cerr << ", "; } __args_output(it, end, args...); }
#define out(args...) { vector<string> __args = __macro_split(#args); __args_output(__args.begin(), __args.end(), args); }
// clang-format on

const int MAX_N = 100;
int n;
ii p[MAX_N], h[MAX_N];  // people and shelters
iii d[MAX_N * MAX_N];   // (travel time, (person, shelter))

vector<int> edges[MAX_N];  // edges[u]: right-side neighbors of left vertex u
int match[MAX_N];          // match[v]: left vertex matched to right vertex v, or -1
bool visited[MAX_N];

void link(int u, int v) { edges[u].push_back(v); }

// DFS version of the Hungarian algorithm: look for an augmenting path from u.
bool dfs(int u) {
    for (auto v : edges[u]) {
        if (visited[v]) continue;
        visited[v] = true;
        if (match[v] == -1 || dfs(match[v])) {
            match[v] = u;
            return true;
        }
    }
    return false;
}

// Returns true if the current graph has a perfect matching.
bool hungarian() {
    int m = 0;
    fill_n(match, n, -1);
    rep(i, n) {
        fill_n(visited, n, false);
        if (dfs(i)) ++m;
    }
    return m == n;
}

int match_u[MAX_N], match_v[MAX_N];
int dist[MAX_N + 1];  // one extra slot so that dist[NIL] stays in bounds
const int NIL = MAX_N;

// Hopcroft-Karp phase 1: layer the left vertices by BFS distance.
bool bfs() {
    queue<int> q;
    rep(u, n) {
        if (match_u[u] == NIL) {
            dist[u] = 0;
            q.push(u);
        } else {
            dist[u] = inf;
        }
    }

    dist[NIL] = inf;
    while (!q.empty()) {
        auto u = q.front();
        q.pop();
        if (dist[u] >= dist[NIL]) continue;  // beyond the shortest path length
        for (auto v : edges[u]) {
            if (dist[match_v[v]] == inf) {
                dist[match_v[v]] = dist[u] + 1;
                if (match_v[v] != NIL) q.push(match_v[v]);
            }
        }
    }
    return dist[NIL] != inf;
}

// Hopcroft-Karp phase 2: follow the layers from u and flip the path found.
bool dfs_h(int u) {
    if (u == NIL) return true;
    for (auto v : edges[u]) {
        if (dist[match_v[v]] == dist[u] + 1 && dfs_h(match_v[v])) {
            match_u[u] = v;
            match_v[v] = u;
            return true;
        }
    }
    dist[u] = inf;
    return false;
}

// Returns true if the current graph has a perfect matching.
bool hopcroft_karp() {
    int m = 0;
    fill_n(match_u, n, NIL);
    fill_n(match_v, n, NIL);
    while (bfs()) {
        rep(u, n) {
            if (match_u[u] == NIL && dfs_h(u)) {
                ++m;
            }
        }
    }
    return m == n;
}

// Swap in hungarian() here to use the O(n^3) check instead.
bool pfmatch() { return hopcroft_karp(); }

int main() {
    cin >> n;
    rep(i, n) { cin >> p[i].first >> p[i].second; }
    rep(i, n) { cin >> h[i].first >> h[i].second; }
    rep(i, n) {
        rep(j, n) {
            int idx = i * n + j;
            int dis = abs(p[i].X - h[j].X) + abs(p[i].Y - h[j].Y);
            d[idx] = {dis, {i, j}};
        }
    }
    sort(d, d + n * n);
    int l = 0, r = n * n - 1;
    // binary search over the sorted travel times
    // time complexity: O(n^3 lg n) for hungarian,
    // O(n^2 √n * lg n) for hopcroft-karp
    while (l < r && d[l].first != d[r].first) {
        // rebuild the graph from the mid + 1 shortest moves
        rep(i, n) { edges[i].clear(); }
        int mid = (l + r) / 2;
        repe(i, mid) {
            int pi = d[i].second.first, hj = d[i].second.second;
            // shelters are used as the left side here; since both sides have
            // n vertices, this does not change whether a perfect matching exists
            link(hj, pi);
        }
        if (pfmatch()) {
            r = mid;
        } else {
            l = mid + 1;
        }
    }
    cout << d[l].first << endl;
    return 0;
}