Dear organizers,

We are working on parametric models for the single-cell phosphorylation data, and we have a question about how to properly interpret the marker distributions.

If we aggregate the marker measurements by cell line, most of the expression distributions show a peak at the very first value. As an example, take the 100-bin expression distribution of the p.RB marker for the HCC1187 cell line, under EGF treatment at time 0.0: https://drive.google.com/file/d/1YhN7vJXVUVRRkOu-NUONVclmDP3SdH-L/view?usp=sharing

Of the 12,738 measured cells, 644 have a p.RB expression of 1.65488, which, converted back to the original values, represents a count of 13 ions. That is roughly 10 times more cells than at the next reading. Checking the other cell lines for the same marker, we found an initial value of 2.168072 (22 ions), which is even more frequent (roughly 20 times the next reading).

We are trying to understand why those peaks occur, so we can decide how best to treat them in the data distribution. They seem to absorb lower counts which, for some reason, could not be properly expressed by the mass cytometry data pipeline, but as we are new to this kind of data, we are not sure.

Some questions we have so far:

1. Is the initial peak accumulating lower counts which, if not accumulated, would follow the same distribution as the rest of the expression counts?
2. Why does the initial value vary with the cell line analyzed? Is it due to a parameter setting (or calibration) of each mass cytometry run?

Can someone help us better understand this phenomenon?

Thanks!

Created by Eduardo Seiti de Oliveira eduseiti
Hi @attilagabor, thanks a lot for your quick response. I'll try to better understand the batch effect removal process, since I hadn't considered that it might be the cause of that artifact in the expression distribution. Hopefully @marco.tognetti can further clarify this point. Thanks a lot once again!
Dear @eduseiti,

I'm tagging @marco.tognetti here, who generated the data, but my guess is that the first bin represents the cells for which no marker expression was detected. It is a bit tricky to compute the raw counts from the asinh-transformed values, because we removed batch effects, which shifted the asinh-transformed data.

If you want to investigate this further, I would start from the raw data from the paper: Tognetti et al. (2021), [Deciphering the signaling network of breast cancer improves drug sensitivity prediction](https://www.sciencedirect.com/science/article/pii/S2405471221001113#sec4.1), Cell Systems. The batch effect removal is discussed in the "Mass cytometry analysis" subsection of Methods. The raw data in .fcs format is available from Mendeley Data: [here](https://doi.org/10.17632/gvh2vtg86r.1)

Best,
Attila
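For anyone following along, here is a minimal Python sketch of the back-conversion discussed above and of a simple check for the size of the peak at the lowest value. It assumes the common mass cytometry transform `asinh(count / 5)`; the cofactor of 5 is an assumption (it is not stated in this thread), although it does reproduce the 13 and 22 ion figures quoted in the question. The function names `asinh_to_counts` and `floor_fraction` are hypothetical helpers, and because batch correction shifted the transformed values, the recovered counts are only approximate.

```python
import math

# Assumption: data were transformed as asinh(count / COFACTOR).
# A cofactor of 5 is a common mass cytometry convention, not confirmed here.
COFACTOR = 5.0


def asinh_to_counts(value, cofactor=COFACTOR):
    """Invert asinh(count / cofactor); ignores any batch-correction shift."""
    return cofactor * math.sinh(value)


def floor_fraction(values, tol=1e-6):
    """Return the lowest value and the fraction of cells sitting on it."""
    lowest = min(values)
    at_floor = sum(1 for v in values if v - lowest <= tol)
    return lowest, at_floor / len(values)


# Values quoted in the question for p.RB.
print(round(asinh_to_counts(1.65488)))    # approximate ion count for HCC1187
print(round(asinh_to_counts(2.168072)))   # approximate ion count, other cell lines

# Toy example (made-up numbers) showing how the floor peak can be measured.
toy = [1.65488] * 3 + [1.9, 2.4, 3.1, 3.3]
print(floor_fraction(toy))
```

If the fraction at the floor is large, one option is to model those cells separately (for example as a point mass or as left-censored observations) instead of fitting them with the same parametric distribution as the rest.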

Why there is a peak on most of the markers phosphorylation expression distribution?