If we instantiate the PCA class with the n_components parameter set to None (which is in fact the default value), no subset of the principal components is selected and all of them are kept:
pca = PCA(n_components = None)
pca.fit(X_scaled)
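As a minimal sketch of what n_components = None means in practice, the snippet below compares it with an explicit n_components = 2. The array X_demo is a hypothetical random stand-in for X_scaled (50 samples, 13 features), used only so the example runs on its own; it is not the data set from this tutorial.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for X_scaled: 50 samples, 13 features of random noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 13))

pca_all = PCA(n_components=None).fit(X_demo)  # keep every component
pca_two = PCA(n_components=2).fit(X_demo)     # keep only the first two

# n_components_ holds the number of components actually kept after fitting
n_all = pca_all.n_components_
n_two = pca_two.n_components_
shape_all = pca_all.components_.shape  # (n_components_, n_features)
print(n_all, n_two, shape_all)
```

With the default solver, the fitted model keeps min(n_samples, n_features) components, so here every one of the 13 directions is retained.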
The .explained_variance_ratio_ attribute returns the fraction of the total variance explained by each principal component:
explained_variance = pca.explained_variance_ratio_.round(3)
explained_variance
array([0.366, 0.139, 0.116, 0.091, 0.066, 0.056, 0.046, 0.033, 0.028,
       0.023, 0.015, 0.015, 0.006])
We can see that the first two principal components already explain about 50% of the variance (0.366 + 0.139 = 0.505).
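The same reasoning can be sketched in code: accumulate the ratios and count how many leading components are needed to reach a given share of the variance. The helper name components_needed is hypothetical, introduced only for this illustration, and the array simply repeats the rounded values printed above.

```python
import numpy as np

# Rounded explained variance ratios, copied from the output above
explained_variance = np.array([0.366, 0.139, 0.116, 0.091, 0.066, 0.056, 0.046,
                               0.033, 0.028, 0.023, 0.015, 0.015, 0.006])

# Running total of the variance explained by the first k components
cum_explained_variance = np.cumsum(explained_variance)

def components_needed(cumulative, threshold):
    """Smallest number of leading components whose cumulative ratio reaches threshold.

    Assumes the threshold is actually reached; np.argmax returns 0 otherwise.
    """
    return int(np.argmax(cumulative >= threshold)) + 1

n_50 = components_needed(cum_explained_variance, 0.50)
n_90 = components_needed(cum_explained_variance, 0.90)
print(n_50, n_90)
```

With these values, two components reach the 50% mark and eight are needed to pass 90%.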
We can plot the percentage of variance explained by each principal component together with the cumulative percentage:
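import numpy as np
import matplotlib.pyplot as plt

# Assumed definitions (not shown in this section): cumulative variance
# and a 1-based index for the components
cum_explained_variance = np.cumsum(explained_variance)
x_range = np.arange(1, len(explained_variance) + 1)

fig, ax = plt.subplots(figsize = (7, 5))
# Variance explained by each component (drawn on top of the cumulative bars)
ax.bar(x = x_range, height = explained_variance, zorder = 20,
       color = "navy", label = "% of variance")
# Cumulative variance explained
ax.bar(x = x_range, height = cum_explained_variance, alpha = 0.3,
       label = "cumulative % of variance")
ax.step(x = x_range, y = cum_explained_variance, where = "mid")
ax.set_xlabel("Principal component")
ax.set_ylabel("Percentage of variance")
# Label each bar with its percentage
for i, v in enumerate(explained_variance):
    ax.text(x = i + 0.65, y = v + 0.01, s = str(round(v * 100)) + "%")
# The first cumulative value equals the first individual one, so skip its label
for i, v in enumerate(cum_explained_variance[1:], start = 1):
    ax.text(x = i + 0.65, y = v + 0.01, s = str(round(v * 100)) + "%")
ax.set_xticks(x_range, labels = x_range)
ax.legend()
plt.show()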