1. Hyperplanes

Consider a set of $d+1$ points in $\mathbb{R}^d$ and assume they are in general position, meaning that every subset of $d$ points is affinely independent; this way, any set of $d$ points spans a unique hyperplane. (In the plane this is the familiar condition that no three points are collinear.) First, we shall demonstrate that if a set of $d$ points can be shattered in $\mathbb{R}^{d-1}$, then $d+1$ points can also be shattered in $\mathbb{R}^d$. We can use this to reduce the problem to two dimensions, where we have seen that $\mathrm{VCdim} = 3$.
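As a sanity check of the two-dimensional base case, here is a minimal numerical sketch (our own illustration; the three point coordinates are an arbitrary choice in general position, and NumPy is assumed). For each of the $2^3$ labelings it solves $y_i(w \cdot x_i + b) = 1$, a $3 \times 3$ linear system that is invertible precisely because the points are affinely independent:

```python
import numpy as np
from itertools import product

# Three affinely independent points in R^2 (arbitrary choice).
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])

for labels in product([-1.0, 1.0], repeat=3):
    y = np.array(labels)
    # Solve y_i (w . x_i + b) = 1 for (w_1, w_2, b): three equations in
    # three unknowns, invertible because the points are in general position.
    A = y[:, None] * np.hstack([X, np.ones((3, 1))])
    w1, w2, b = np.linalg.solve(A, np.ones(3))
    assert np.all(np.sign(X @ np.array([w1, w2]) + b) == y)

print("all 8 labelings realized: 3 points are shattered in R^2")
```

Each solution places every point strictly on the side of the hyperplane given by its label, so every labeling is realized by some halfspace.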

Consider the representation in the picture below. Choose $d$ of the points and take the hyperplane $H_d$ that they span. If the remaining point belongs to this hyperplane, then we can consider the projection to $d-1$ dimensions, and we are left with the case of $(d-1)+2$ points in $\mathbb{R}^{d-1}$, which we shall analyze later. If this is not the case, then we can show that if the $d$ points on the hyperplane can be shattered, we can always find a hyperplane in $\mathbb{R}^d$ that realizes any labeling of all $d+1$ points. In the figure below, the dashed line on $H_d$ represents the hyperplane in $\mathbb{R}^{d-1}$ that separates the set of $d$ points. Any hyperplane in $\mathbb{R}^d$ that contains both the remaining point and the dashed line (a hyperplane of one lower dimension) solves the problem; tilting it slightly about the dashed line places the remaining point strictly on either side.
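To make this step concrete, assume without loss of generality that $H_d = \{x \in \mathbb{R}^d : x_d = 0\}$ and that the dashed hyperplane realizing a given labeling of the $d$ points inside $H_d$ is $w' \cdot x' + b = 0$, where $x'$ collects the first $d-1$ coordinates (this coordinate choice is our own simplification of the figure). For any $\gamma \in \mathbb{R}$, consider the hyperplane of $\mathbb{R}^d$

$$w' \cdot x' + \gamma\, x_d + b = 0$$

It induces exactly the same labels on the $d$ points, since $x_d = 0$ for them, while the remaining point $p$ has $p_d \neq 0$, so the value $w' \cdot p' + \gamma\, p_d + b$ can be given either sign by a suitable choice of $\gamma$.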

We shall now consider the case of $d+2$ points in $\mathbb{R}^d$. For this purpose, we shall use Radon's theorem, which states that any set of $d+2$ points in $\mathbb{R}^d$ can be partitioned into two disjoint sets $X_1$ and $X_2$ whose convex hulls intersect. This theorem implies that $d+2$ points in $\mathbb{R}^d$ cannot be shattered: if they were, some hyperplane would label $X_1$ positively and $X_2$ negatively, and that hyperplane would separate the two convex hulls, making them non-intersecting and thus contradicting the theorem.

Proof

For $d+2$ points $x_i$ in $\mathbb{R}^d$ one can always choose $d+2$ parameters $\alpha_i$, not all zero, such that:

$$\sum_{i=1}^{d+2}\alpha_i x_i = 0, \qquad \sum_{i=1}^{d+2}\alpha_i = 0$$
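In matrix form (this rewriting is ours, not part of the original argument), the two conditions combine into the single homogeneous system

$$\begin{pmatrix} x_1 & x_2 & \cdots & x_{d+2} \\ 1 & 1 & \cdots & 1 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_{d+2} \end{pmatrix} = 0$$

where each $x_i$ stands for a column of $d$ coordinates, so the matrix has $d+1$ rows and $d+2$ columns and its kernel is necessarily non-trivial.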

The reason is that one has $d+2$ unknowns (the $\alpha_i$) but only $d+1$ homogeneous equations ($d$ coming from the first, vector-valued condition and one more from the constraint on the $\alpha_i$), so a non-trivial solution always exists. Because the $\alpha_i$ are not all zero and sum to zero, some are strictly positive and some strictly negative. Write $\alpha_i^{>}$ for the positive coefficients and $\alpha_i^{<}$ for the absolute values of the negative ones; the second condition then reads $\sum_i \alpha_i^{>} = \sum_i \alpha_i^{<}$. Define $\alpha = \sum_i \alpha_i^{>}$; then we have

$$\sum_i \frac{\alpha_i^{>}}{\alpha} = \sum_i \frac{\alpha_i^{<}}{\alpha}$$

where each side is a sum of numbers in the interval $(0,1]$ that add up to $1$ (by the definition of $\alpha$), i.e., a set of convex-combination weights. The vector equation likewise separates into two terms

$$\sum_i \frac{\alpha_i^{>}}{\alpha}\, x_i = \sum_i \frac{\alpha_i^{<}}{\alpha}\, x_i$$

and each side is a convex combination: the left-hand side of the points in $X_1 = \{x_i : \alpha_i > 0\}$ and the right-hand side of the points in $X_2 = \{x_i : \alpha_i < 0\}$. Since the two sides are equal, they define a single point lying in both convex hulls, which means that the convex hulls of $X_1$ and $X_2$ intersect.
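The proof is constructive, and the following sketch (our own illustration; the dimension, the random points, and the use of NumPy's SVD to obtain a kernel vector are all arbitrary choices) computes a Radon partition and checks that the two convex combinations meet in the same point:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
X = rng.standard_normal((d + 2, d))  # d+2 random points in R^d

# Homogeneous system: (d+1) x (d+2) matrix whose columns are (x_i, 1).
A = np.vstack([X.T, np.ones(d + 2)])

# A non-trivial kernel vector: the last right-singular vector of A.
alpha = np.linalg.svd(A)[2][-1]

pos, neg = alpha > 0, alpha < 0
a = alpha[pos].sum()  # equals -alpha[neg].sum() since sum(alpha) = 0

# The Radon point, computed once from each side of the partition.
p1 = (alpha[pos] / a) @ X[pos]
p2 = (-alpha[neg] / a) @ X[neg]

print("partition sizes:", pos.sum(), neg.sum())
print("same point in both convex hulls:", np.allclose(p1, p2))
```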
