Hi, in the all_pbmcs_metadata.csv file for all_pbmcs in the GEX data, why is donor D06 listed with ages of both 57 and 60? The paper says that the samples were collected over a three-year period, so what is the reason for this? Thanks.

Created by suyunjin1234
My problem is solved. Thanks again for your patience and for providing the great data!!
Yes, it is the same situation as in your first question. For A33, the first visit and the follow-up were 358 days apart, which is just under a year, so their birthday probably falls in the few days of the year outside that interval.
Thank you for your patience! I used the Donor_id and Age columns to count the samples and got a series of incorrect numbers. I noticed that the same Donor_id can have more than one Tube_id at the same age (e.g. A33 at 31 years old), and I couldn't figure out why, since the paper says that each donor is sampled only once a year. Could it be that the later sample was taken slightly early, before the donor's next birthday? Thanks!!

```
import pandas as pd

df = pd.read_csv('all_pbmcs_metadata.csv')
# Keep one row per unique (Donor_id, Age) pair
data_subset = df[['Donor_id', 'Age']]
unique_data_subset = data_subset.drop_duplicates()
print(len(unique_data_subset))  # wrong count: 313 samples overall
```
Hi! Could you please let me know how you calculated it? This is my script for counting the donors with 3 visits (3 samples):

```
library(dplyr)  # provides %>% and filter()

data <- read.csv('all_pbmcs_metadata.csv')
tube_donor <- data[, c('Tube_id', 'Donor_id')]
tube_donor <- tube_donor[!duplicated(tube_donor), ]  # Overall 317 samples
number_of_tubes <- as.data.frame(table(tube_donor$Donor_id))  # Calculate how many samples each donor has
three_tubes <- number_of_tubes %>% filter(Freq == 3)  # Choose donors with 3 samples
print(dim(three_tubes)[1])  # 49
```

So the output is 49, which is the same as in the paper. You can do the same calculation for the donors who have only 2 visits:

```
two_tubes <- number_of_tubes %>% filter(Freq == 2)  # Choose donors with 2 samples
print(dim(two_tubes)[1])  # 53
```

53 is 39 + 14 (not including the ones that have 3 visits).

Kind regards,
Marina
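For readers working in pandas, the counting approach above can be sketched as follows. This is only an illustration on made-up toy data: the donor IDs `X1` and `X2`, the tube IDs, and the ages are hypothetical, and only the column names `Tube_id`, `Donor_id` and `Age` come from the metadata file discussed in this thread. It shows why deduplicating on (Donor_id, Age) can undercount visits when two visits fall at the same age, while deduplicating on (Tube_id, Donor_id) does not.

```python
import pandas as pd

# Toy metadata with one row per cell; all values are made up for illustration.
meta = pd.DataFrame({
    "Tube_id":  ["T1", "T1", "T2", "T3", "T4", "T5", "T5"],
    "Donor_id": ["X1", "X1", "X1", "X1", "X2", "X2", "X2"],
    # Donor X1 is 31 at two different visits (tubes T1 and T2).
    "Age":      [31, 31, 31, 32, 57, 60, 60],
})

# One row per sample: a sample is identified by its tube.
samples = meta[["Tube_id", "Donor_id"]].drop_duplicates()
visits_per_donor = samples.groupby("Donor_id").size()

# One row per (donor, age) pair: visits at the same age collapse into one.
donor_ages = meta[["Donor_id", "Age"]].drop_duplicates()
ages_per_donor = donor_ages.groupby("Donor_id").size()

n_samples_by_tube = len(samples)
n_samples_by_age = len(donor_ages)

three_visits_by_tube = int((visits_per_donor == 3).sum())
three_visits_by_age = int((ages_per_donor == 3).sum())
two_visits_by_tube = int((visits_per_donor == 2).sum())
two_visits_by_age = int((ages_per_donor == 2).sum())

print(n_samples_by_tube, n_samples_by_age)
print(three_visits_by_tube, three_visits_by_age)
print(two_visits_by_tube, two_visits_by_age)
```

In the toy data, donor X1 moves from the three-visit group to the two-visit group when ages are used instead of tubes. The same kind of shift would be consistent with the numbers in this thread, where each count differs by four (317 vs 313 samples, 49 vs 45 donors, 53 vs 57 donors), although the actual file would need to be checked to confirm it.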
Is it because four individuals have anomalous sample data?
So it is! Another question: in the csv, the number of people with samples from two years (whichever two years are chosen) is 57, while the number in the paper is 53 (39+14), and the number of people with samples from all three years is 45 in the csv, while the number in the paper is 49. Why does this happen? Thanks again for your time!
Dear @suyunjin1234, This happens because slightly more than 2 years (2 years 4 months, to be exact) passed between the first and third sample collections from donor D06, so the donor probably had a birthday during those 4 months. Thanks, Petr Tsurinov
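The age arithmetic behind both answers in this thread can be checked with a small sketch. All dates below are hypothetical and chosen only to reproduce the reported gaps (about 2 years 4 months for the D06 case, 358 days for the A33 case); the real birth and visit dates are not given here, and `age_on` is an illustrative helper, not part of any dataset tooling.

```python
from datetime import date, timedelta


def age_on(birth: date, day: date) -> int:
    """Return the age in completed years on the given day."""
    had_birthday = (day.month, day.day) >= (birth.month, birth.day)
    return day.year - birth.year - (0 if had_birthday else 1)


# Hypothetical D06-like case: 2 years 4 months between first and third visit.
birth_a = date(1960, 3, 10)
first_visit_a = date(2017, 12, 1)
third_visit_a = date(2020, 4, 1)
# Three birthdays (March 2018, 2019, 2020) fall inside the interval.
age_first_a = age_on(birth_a, first_visit_a)
age_third_a = age_on(birth_a, third_visit_a)

# Hypothetical A33-like case: follow-up 358 days after the first visit.
birth_b = date(1988, 1, 5)
first_visit_b = date(2019, 1, 8)
follow_up_b = first_visit_b + timedelta(days=358)
# No birthday falls inside the interval, so the recorded age is unchanged.
age_first_b = age_on(birth_b, first_visit_b)
age_follow_up_b = age_on(birth_b, follow_up_b)

print(age_first_a, age_third_a)
print(age_first_b, age_follow_up_b)
```

So an interval only slightly longer than two years can move the recorded age by three years, and an interval just under one year can leave it unchanged, depending on where the birthday falls.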

Problems with the age of the sample (Completed)