hi! I'm just following along with this challenge for the learning experience, and I noticed that the example resource provided for subchallenge 2 contains a line that says:

```
# TODO: Investigate which features were chosen...
```

Can someone explain to me how to go about this? I'm not sure I fully understand what this output means:

```
pandas.Series(gcv.best_estimator_.coef_[:,0]).to_numpy().nonzero()
(array([  20,   26,   55,   85,   93,   99,  159,  187,  200,  317,  460,
        511,  512,  522,  637,  653,  660,  679,  705,  717,  768,  772,
        909,  916,  954, 1071, 1085, 1099, 1148, 1160, 1166, 1177, 1180,
       1194, 1203, 1209, 1210]),)
```

Thanks!

Created by sererenaa
Hi @sererenaa , The TODO is beyond the scope of this challenge, but it comes down to some interesting biology: "Which features are most predictive of survival? Is it patient age, or expression of some genes, or ex-vivo sensitivity to some drug?" To expand on Team Resham's response, `gcv.best_estimator_.coef_[:,0]` is the vector of feature weights produced by our linear regression. Our regularization made it so that, out of more than a thousand features, only the few most useful features get non-zero weights and the rest are zero. So, as Team Resham says, that little bit of Python displays the indices of the non-zero features, aka the most useful features. Best, Jacob
Hi @sererenaa , If you do `X.columns[pandas.Series(gcv.best_estimator_.coef_[:,0]).to_numpy().nonzero()]`, it will tell you which features (feature names) are important. Hope this helps. Regards, Team Resham
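To make the two answers above concrete, here is a minimal, self-contained sketch of the same idea using plain `Lasso` on made-up toy data (the challenge's actual estimator and feature matrix aren't reproduced here; in the challenge code `coef_` is 2-D, hence the `[:,0]`, while `Lasso.coef_` is 1-D). The feature names and data below are purely hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Toy stand-in for the challenge's feature matrix X (hypothetical gene names)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 20)),
                 columns=[f"gene_{i}" for i in range(20)])

# Outcome depends on only two features; L1 regularization
# should drive the weights of the other 18 to exactly zero
y = 3 * X["gene_3"] - 2 * X["gene_7"] + rng.normal(scale=0.1, size=200)

model = Lasso(alpha=0.1).fit(X, y)

# Indices of the features with non-zero weights -- the "most useful" ones,
# analogous to the array of indices in the original question
nonzero_idx = np.nonzero(model.coef_)[0]

# Map the indices back to feature names, as Team Resham suggests
print(X.columns[nonzero_idx])
```

The key point is that `nonzero()` just converts the sparse weight vector into a list of positions, and indexing `X.columns` with those positions turns the positions into human-readable feature names.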
