My question, based on my understanding: we have the full expression matrix **_E_** with dimension around 9,000 x 1,297. We also have the "true" cell locations calculated from the in situ matrix of 84 genes; call this **_M1_**, with dimension 3,039 x 1,297. We pre-select a subset of 60 of the 84 in situ genes and call the resulting matrix **_A_**, with dimension 3,039 x 60. We should then use some machine learning function **_f_** to predict **_M2_**: $$\(M2 = f(E, A, M1)\)$$ so as to minimize the distance summed over the top 10 predicted locations, $$\(\Sigma_{i=1}^{10} dist(columnMax(M1), columnMax_i(M2))\)$$, right? In that case, I would simply let **_f_** return **_M1_**. So what is the challenging part? Thanks!
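To make the loss above concrete, here is a minimal NumPy sketch of it as I wrote it. It assumes a hypothetical array `coords` holding the spatial coordinates of each of the 3,039 locations (one row per location) and uses Euclidean distance for `dist`; neither is given in this thread, and the challenge's actual scoring may differ.

```python
import numpy as np

def top_k_distance_loss(M1, M2, coords, k=10):
    """Per-cell sum of distances between the true location and the
    k highest-scoring predicted locations.

    M1, M2 : (n_locations, n_cells) score matrices
    coords : (n_locations, d) location coordinates (assumed input)
    """
    # columnMax(M1): index of the best-scoring location for each cell
    true_bin = np.argmax(M1, axis=0)                 # (n_cells,)
    # columnMax_i(M2), i = 1..k: the k best-scoring locations per cell
    top_bins = np.argsort(-M2, axis=0)[:k, :]        # (k, n_cells)
    # Euclidean distance from each predicted location to the true one
    diff = coords[top_bins] - coords[true_bin][None, :, :]
    dist = np.linalg.norm(diff, axis=2)              # (k, n_cells)
    return dist.sum(axis=0)                          # (n_cells,)

# Toy example: 4 locations on a line, 1 cell, k = 2
coords = np.array([[0.0], [1.0], [2.0], [3.0]])
M1 = np.array([[0.0], [0.0], [1.0], [0.0]])   # true location is index 2
M2 = np.array([[0.1], [0.4], [0.3], [0.2]])   # top 2 predictions: 1 and 2
loss = top_k_distance_loss(M1, M2, coords, k=2)
```

In the toy example the two predicted locations are at distance 1 and 0 from the true one, so the loss for that cell is 1.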

Created by Zhi Huang hz9423
That's more clear. Thank you!
Indeed, you CANNOT iterate and change the subset of matrix A you use; doing so would be equivalent to using all 84 genes.
Oh thanks, that's clearer. One more question: we have the original in situ matrix $$\(A\)$$ (with dimension 3039 x 84). During the learning process, can we subset matrix $$\(A\)$$ to get a matrix $$\(A_{active}^k\)$$ (with dimension 3039 x 60), where $$\(A_{active}^k\)$$ may differ across iterations $$\(k\)$$, without using the information in the corresponding remaining $$\(A_{inactive}^k\)$$ (with dimension 3039 x 24)? (One way is to use only the 60 genes determined from $$\(E\)$$ at each iteration.) Or is such subsetting not allowed during iteration, i.e., do we have to fix $$\(A_{active}\)$$ (with dimension 3039 x 60) arbitrarily and never change it again during training? (Kindly note: this is different from the question in [thread 3472](https://www.synapse.org/#!Synapse:syn15665609/discussion/threadId=3472).)
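For reference, the "fix it once" variant could look like the sketch below. The selection rule (rank the in situ genes by their variance in $$\(E\)$$) is only an illustrative assumption, and `choose_active_genes`, `e_genes` and `insitu_genes` are hypothetical names; the point is that the column indices are chosen a single time, using only $$\(E\)$$, and then frozen.

```python
import numpy as np

def choose_active_genes(E, e_genes, insitu_genes, n_active=60):
    """Pick n_active in situ genes once, using only information from E.

    E            : (n_genes, n_cells) expression matrix
    e_genes      : gene names for the rows of E
    insitu_genes : gene names for the columns of A
    Returns sorted column indices into A.
    """
    row_of = {g: i for i, g in enumerate(e_genes)}
    # Variance in E as an illustrative ranking score (an assumption);
    # in situ genes that are absent from E are ranked last.
    score = np.array([
        E[row_of[g]].var() if g in row_of else -np.inf
        for g in insitu_genes
    ])
    order = np.argsort(-score, kind="stable")[:n_active]
    return np.sort(order)

# Toy example: 4 genes in E, 3 in situ genes, keep 2 of them
E = np.array([[1.0, 1.0, 1.0],     # g0: variance 0
              [0.0, 5.0, 10.0],    # g1: largest variance
              [0.0, 1.0, 2.0],     # g2: medium variance
              [3.0, 3.0, 4.0]])    # g3: not an in situ gene
e_genes = ["g0", "g1", "g2", "g3"]
insitu_genes = ["g2", "g0", "g1"]
A = np.zeros((5, 3))               # toy in situ matrix: 5 locations x 3 genes

active = choose_active_genes(E, e_genes, insitu_genes, n_active=2)
A_active = A[:, active]            # fixed for the whole training run
```

With the real data, `A` would be 3039 x 84 and `A_active` 3039 x 60; the other 24 columns are never read after this step.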
No: you are not allowed to use what you define as M1; you are only allowed to use the max(MCC) for each of the 1297 cells. So what I meant is to partition max(MCC). Anyway, I am sure you can figure it out.
Dear Pablo: I may not have fully understood. My $$\(M1\)$$ is in fact the "true" MCC matrix. Did you mean partitioning $$\(M1\)$$ (with dimension 3039 x 1297) into $$\(M1_{train}\)$$ (with dimension 3039 x 1000) and $$\(M1_{test}\)$$ (with dimension 3039 x 297)? Then we would train the machine learning function $$\(f\)$$ on the inputs $$\(M1_{train}\)$$, $$\(A\)$$, and $$\(E_{train}\)$$ (where $$\(E_{train}\)$$ has dimension 8934 x 1000), and then apply the trained $$\(f\)$$ to the inputs $$\(A\)$$ and $$\(E_{test}\)$$ (where $$\(E_{test}\)$$ has dimension 8934 x 297) to derive $$\(M2\)$$ (with dimension 3039 x 297) and calculate the top 10 distance loss $$\(\Sigma_{i=1}^{10} dist(columnMax(M1_{test}), columnMax_i(M2))\)$$? Thanks for your reply.
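In code, the split I have in mind would be roughly the following. This is a minimal sketch: `split_cells` is a hypothetical helper, the split is a plain random one, and the toy matrices use only a few rows so the example stays small (the real $$\(E\)$$ and $$\(M1\)$$ would have 8934 and 3039 rows).

```python
import numpy as np

def split_cells(n_cells, n_train, seed=0):
    """Randomly partition cell (column) indices into train and test sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_cells)
    return np.sort(perm[:n_train]), np.sort(perm[n_train:])

n_cells = 1297
train_idx, test_idx = split_cells(n_cells, n_train=1000)

# Toy stand-ins: same number of cells as the real data, fewer rows
E = np.zeros((50, n_cells))    # real E:  8934 x 1297
M1 = np.zeros((30, n_cells))   # real M1: 3039 x 1297

# The same column indices are applied to both matrices so cells stay aligned
E_train, E_test = E[:, train_idx], E[:, test_idx]
M1_train, M1_test = M1[:, train_idx], M1[:, test_idx]
```

$$\(f\)$$ would be fitted on `E_train` and `M1_train` (plus $$\(A\)$$) and evaluated only on the 297 held-out cells.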
Hi Zhi, well of course you have to somehow partition M1 so that you can train and test your ML function f... if you don't do that, there is no problem to be solved. Pablo

Question about using "True" cell locations from 84 in situ matrix