Hi, could anyone clarify how the bins are defined from the expression data? For example, is a sequence with the value 11.3 part of the 11th bin or the 12th?

Created by Mohan Dash pyareedash
Thank you, that cleared things up!
Hi @pyareedash, the bins are defined when quantifying expression by cell sorting. There are additional details in [this paper](https://www.nature.com/articles/s41587-019-0315-8), and the competition dataset uses the same approach. Each expression value is a weighted average of the bins in which that promoter was seen. For instance, you could get 11.3 if you had 7 reads in bin 11 and 3 reads in bin 12: (7*11 + 3*12) / (7+3) = 113/10 = 11.3. It's actually slightly more complex than this, because we also account for the number of cells sorted into each bin and the number of reads per bin. So in that example, the promoter was seen in both bin 11 and bin 12. However, a value of 11.3 does not mean the promoter was seen in bin 11 or bin 12 at all: it could be in neither (e.g., x reads in bin 10 and y reads in bin 13 that result in a weighted average of 11.3).
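A minimal sketch of the simplified weighted average described above, assuming reads per bin are given as a dict of bin index to read count. The function name `expression_from_reads` and the optional `scale_per_bin` argument are hypothetical; `scale_per_bin` only stands in for the extra correction for cells sorted and reads per bin, whose exact form is described in the linked paper and is not reproduced here.

```python
from typing import Mapping, Optional


def expression_from_reads(
    reads_per_bin: Mapping[int, float],
    scale_per_bin: Optional[Mapping[int, float]] = None,
) -> float:
    """Return the weighted mean bin index for one promoter.

    reads_per_bin: bin index -> number of reads observed in that bin.
    scale_per_bin: hypothetical per-bin multiplier (e.g. a cells-sorted /
        reads-sequenced correction); if None, raw read counts are used.
    """
    weights = {}
    for bin_idx, n_reads in reads_per_bin.items():
        scale = 1.0 if scale_per_bin is None else scale_per_bin[bin_idx]
        weights[bin_idx] = n_reads * scale

    total = sum(weights.values())
    if total <= 0:
        raise ValueError("promoter has no (weighted) reads in any bin")

    # Weighted average of bin indices; can fall between bins.
    return sum(bin_idx * w for bin_idx, w in weights.items()) / total


# 7 reads in bin 11 and 3 reads in bin 12
print(expression_from_reads({11: 7, 12: 3}))   # 11.3
# 17 reads in bin 10 and 13 reads in bin 13: same value, but never in bin 11 or 12
print(expression_from_reads({10: 17, 13: 13}))  # 11.3
```

The second call shows why a value like 11.3 can't be mapped back to a single bin: different read distributions, including ones that skip bins 11 and 12 entirely, can produce the same weighted average.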
