hi! I'm just following along with this challenge for the learning experience, and I noticed that the example resource provided for subchallenge 2 contains a line that says:

```
# TODO: Investigate which features were chosen...
```

Can someone explain to me how to go about this? I'm not sure I fully understand what this output means:

```
pandas.Series(gcv.best_estimator_.coef_[:,0]).to_numpy().nonzero()
(array([  20,   26,   55,   85,   93,   99,  159,  187,  200,  317,  460,
        511,  512,  522,  637,  653,  660,  679,  705,  717,  768,  772,
        909,  916,  954, 1071, 1085, 1099, 1148, 1160, 1166, 1177, 1180,
       1194, 1203, 1209, 1210]),)
```

Thanks!

Created by sererenaa
Hi @sererenaa , The TODO is beyond the scope of this challenge, but it comes down to some interesting biology: "Which features are most predictive of survival? Is it patient age, or expression of some genes, or ex-vivo sensitivity to some drug?" To expand on Team Resham's response, `gcv.best_estimator_.coef_[:,0]` is the vector of feature weights produced by our linear regression. Our regularization made it so that, out of more than a thousand features, only the few most useful features get non-zero weights and the rest are zero. So, as Team Resham says, that little bit of Python displays the indices of the non-zero features, aka the most useful features. Best, Jacob
Hi @sererenaa , If you do `X.columns[pandas.Series(gcv.best_estimator_.coef_[:,0]).to_numpy().nonzero()]`, it will tell you which features (feature names) are important. Hope this helps. Regards, Team Resham
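To make the two answers above concrete, here is a minimal, self-contained sketch of the same idea using plain `Lasso` on made-up toy data (the challenge's actual estimator and feature matrix aren't reproduced here; in the challenge code `coef_` is 2-D, hence the `[:,0]`, while `Lasso.coef_` is 1-D). The feature names and data below are purely hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Toy stand-in for the challenge's feature matrix X (hypothetical gene names)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 20)),
                 columns=[f"gene_{i}" for i in range(20)])

# Outcome depends on only two features; L1 regularization
# should drive the weights of the other 18 to exactly zero
y = 3 * X["gene_3"] - 2 * X["gene_7"] + rng.normal(scale=0.1, size=200)

model = Lasso(alpha=0.1).fit(X, y)

# Indices of the features with non-zero weights -- the "most useful" ones,
# analogous to the array of indices in the original question
nonzero_idx = np.nonzero(model.coef_)[0]

# Map the indices back to feature names, as Team Resham suggests
print(X.columns[nonzero_idx])
```

The key point is that `nonzero()` just converts the sparse weight vector into a list of positions, and indexing `X.columns` with those positions turns the positions into human-readable feature names.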
