Starts at: 2025-03-06 03:10PM
Ends at: 2025-03-06 03:30PM
Abstract:
When studying mechanisms of DNA repair, one often observes short mutations at the repair site, typically the insertion of a short nucleotide sequence over the alphabet {A, C, G, T}. Since these insertions arise across millions of DNA molecules, they generate a set of short words, each occurring with its own frequency. Our goal is to find a suitable mathematical object for analyzing such word sets and for distinguishing patterns across experimental conditions. In this talk, we introduce the Insertion Chain Complex, a higher-dimensional generalization of insertion graphs, whose homology serves as a measure of the complexity of a set of words. We present its construction, its fundamental properties, and applications to biological data. In our case study, we analyze data from human cells in which DNA breaks were induced and the repaired sequences were then sequenced. Our findings show that counting the highest-dimensional cells of these insertion complexes effectively distinguishes between different break locations.
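As a rough illustration of the 1-dimensional case only, the sketch below builds a graph on a set of words. It assumes (our assumption, not necessarily the definition used in the talk) that two words are joined by an edge when one is obtained from the other by inserting a single letter. It then computes the graph's Betti numbers: b0 counts connected components and b1 = E - V + b0 counts independent cycles. The function names are hypothetical. Word frequencies, the Insertion Chain Complex itself and its higher-dimensional cells are not modeled here.

```python
from itertools import combinations


def is_single_insertion(short: str, long: str) -> bool:
    """True if `long` can be obtained from `short` by inserting exactly one letter."""
    if len(long) != len(short) + 1:
        return False
    # Deleting some single position of `long` must give back `short`.
    return any(long[:i] + long[i + 1:] == short for i in range(len(long)))


def insertion_graph(words):
    """Hypothetical insertion graph: vertices are distinct words, edges join
    pairs of words that differ by one inserted letter (assumed definition)."""
    vertices = sorted(set(words))
    edges = set()
    for u, v in combinations(vertices, 2):
        # Order each pair so the shorter word comes first.
        a, b = (u, v) if len(u) <= len(v) else (v, u)
        if is_single_insertion(a, b):
            edges.add((a, b))
    return vertices, edges


def betti_numbers(vertices, edges):
    """Return (b0, b1) of a graph: components and independent cycles."""
    parent = {w: w for w in vertices}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    b0 = len({find(w) for w in vertices})
    # Euler characteristic of a graph: V - E = b0 - b1.
    b1 = len(edges) - len(vertices) + b0
    return b0, b1


if __name__ == "__main__":
    words = ["A", "C", "AC", "CA", "GG"]
    vertices, edges = insertion_graph(words)
    print(sorted(edges))
    print(betti_numbers(vertices, edges))
```

For the word set {A, C, AC, CA, GG}, the four edges A-AC, A-CA, C-AC, C-CA form a single cycle and GG is isolated, so the sketch reports b0 = 2 and b1 = 1.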