

Lymphocyte receptor repertoires are continually shaped throughout the lifetime of an individual in response to environmental and pathogenic exposure. Thus, they may serve as a fingerprint of an individual's ongoing immunological status (e.g., healthy, infected, vaccinated), with far-reaching implications for immunodiagnostic applications. The advent of high-throughput immune repertoire sequencing now enables the interrogation of immune repertoire diversity in an unprecedented and quantitative manner. However, steadily increasing sequencing depth has revealed that immune repertoires vary greatly in composition among individuals; correspondingly, it has been reported that there are few shared sequences indicative of immunological status ('public clones'). Disconcertingly, this means that the wealth of information gained from repertoire sequencing remains largely unused for determining the current status of immune responses, thereby hampering the implementation of immune-repertoire-based diagnostics.

Here, we introduce a bioinformatics repertoire-profiling framework that has the advantage of capturing the diversity and distribution of entire immune repertoires, as opposed to singular public clones. The framework relies on Hill-based diversity profiles composed of a continuum of single diversity indices, which enable the quantification of the extent of immunological information contained in immune repertoires. We coupled diversity profiles with unsupervised (hierarchical clustering) and supervised (support vector machine and feature selection) machine learning approaches in order to correlate patients' immunological statuses with their B- and T-cell repertoire data.

We could predict a wide range of immunological statuses, such as healthy, transplantation recipient, and lymphoid cancer, with high accuracy (≥80 %), suggesting, as a proof of principle, that diversity profiling can recover a substantial amount of immunodiagnostic fingerprint information from immune repertoire data. Our framework is highly scalable: it readily allowed for the analysis of 1000 simulated immune repertoires, which exceeds the size of published immune repertoire datasets by one to two orders of magnitude.

Our framework offers the possibility to advance immune-repertoire-based fingerprinting, which may in the future enable a systems immunogenomics approach for vaccine profiling and the accurate and early detection of disease and infection.

The lymphocyte repertoire of B and T cells is shaped throughout the lifetime of an individual: in response to environmental and pathogenic antigen challenge, lymphocytes clonally expand and are selected in a highly specific manner. Therefore, the immune receptor clonal diversity and distribution, which summarize the state of clonal selection and expansion, may serve as a fingerprint of an individual's current immunological status (e.g., healthy, infected, vaccinated) and may thus be exploited for immunodiagnostic applications. There is an enormous diversity of B-cell receptors (BCRs, antibodies) and T-cell receptors (TCRs), theoretically approaching 10¹³ and 10¹⁸ protein sequences, respectively. Only very recently, through the advent of high-throughput sequencing (HTS), has it become possible to capture the immense clonal diversity and distribution of BCR and TCR repertoires at high resolution. While immune repertoire sequencing datasets have steadily grown from 10³ to 10⁶ sequencing reads per sample, it has remained a challenge to extract immunological status-specific fingerprints of entire repertoires from such large-scale data for systems medicine and immunodiagnostic applications. In fact, for both biological and technological reasons, immune repertoire data are quasi-distinct across individuals (humans or mice) with respect to their clonal composition. Clones are predominantly defined based on the complementarity determining region 3 (CDR3) of BCR heavy chains or TCR beta chains, which contributes most to BCR/TCR binding specificity.
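To make the notion of a Hill-based diversity profile more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that clones are defined by exact identity of CDR3 amino acid sequences (a simplification of real clonotyping) and uses the standard Hill number D_q = (Σ_i p_i^q)^(1/(1−q)), where p_i is the relative frequency of clone i. The q → 1 limit is exp(−Σ_i p_i ln p_i), the exponential of the Shannon entropy, and the q → ∞ limit is 1/max_i p_i. The function names `clonal_frequencies`, `hill_number`, and `diversity_profile`, as well as the grid of orders q, are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def clonal_frequencies(cdr3_sequences):
    """Hypothetical helper: treat identical CDR3 amino acid sequences as one
    clone (a simplifying assumption) and return each clone's relative
    frequency; the returned values sum to 1."""
    counts = Counter(cdr3_sequences)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one sequence")
    return [c / total for c in counts.values()]


def hill_number(freqs, q):
    """Hill diversity D_q = (sum_i p_i**q) ** (1 / (1 - q)) of order q >= 0."""
    p = [f for f in freqs if f > 0]  # absent clones contribute nothing
    if q == 1:
        # Limit q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(f * math.log(f) for f in p))
    if math.isinf(q):
        # Limit q -> infinity: inverse frequency of the largest clone.
        return 1.0 / max(p)
    return sum(f ** q for f in p) ** (1.0 / (1.0 - q))


def diversity_profile(freqs, qs):
    """Evaluate D_q over a grid of orders q; the resulting curve is the profile."""
    return [hill_number(freqs, q) for q in qs]


if __name__ == "__main__":
    # Toy repertoire: four reads forming three clones, one of them expanded.
    reads = ["CASSLGQETQYF", "CASSLGQETQYF", "CASSPTGNYGYTF", "CARDYW"]
    freqs = clonal_frequencies(reads)
    profile = diversity_profile(freqs, [0, 1, 2, math.inf])
    print([round(d, 3) for d in profile])  # → [3.0, 2.828, 2.667, 2.0]
```

The profile is non-increasing in q: D_0 simply counts clones (richness), while larger orders weight the most expanded clones more heavily, so a repertoire dominated by a few expanded clones shows a steep drop. In a setting like the one described above, such profiles could serve as feature vectors for clustering or classification; that step is not shown here.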
