%SUMMARY
%- ABSTRACT
%- INTRODUCTION
%# BASICS
%- \acs{DNA} STRUCTURE
%- DATA TYPES
% - BAM/FASTQ
% - NON STANDARD
%- COMPRESSION APPROACHES
% - SAVING DIFFERENCES WITH GIVEN BASE \acs{DNA}
% - HUFFMAN ENCODING
% - PROBABILITY APPROACHES (WITH BASE?)
%
%# COMPARING TOOLS
%-
%# POSSIBLE IMPROVEMENT
%- \acs{DNA}S STOCHASTICAL ATTRIBUTES
%- IMPACT ON COMPRESSION
\chapter{The Structure of the Human Genome and How Its Digital Form Is Compressed}
\section{Structure of Human \acs{DNA}}
To clarify how and where biological information is stored, this section begins with a brief, general overview of the structure of living organisms.\\
\begin{figure}[ht]
\centering
\includegraphics[width=6cm]{k2/cell.png}
\caption{A simplified representation of where the genome is physically located, showing a double helix (bottom), a chromosome (upper right) and a cell (upper center).}
\label{k2:gene-overview}
\end{figure}
All living organisms, such as plants and animals, are made of cells. To give a rough impression of scale, a human body can consist of several trillion cells.
A cell is itself the smallest living unit. Most cells consist of an outer section and a core, which is called the nucleus. In figure~\ref{k2:gene-overview}, the nucleus is illustrated as a purple, circle-like shape inside a lighter circle. The nucleus contains chromosomes, which hold the genetic information about the organism in the form of \ac{DNA} \cite{cells}.\\
\acs{DNA} is usually depicted as a double helix, as shown in figure~\ref{k2:dna-struct}. As the name suggests, a double helix consists of two single helices \cite{dna_structure}.
\begin{figure}[ht]
\centering
\includegraphics[width=15cm]{k2/dna.png}
\caption{A purely diagrammatic figure of the components \acs{DNA} is made of. The smaller, inner rods symbolize nucleotide links and the outer ribbons the phosphate-sugar chains \cite{dna_structure}.}
\label{k2:dna-struct}
\end{figure}
Each helix consists of two main components: the sugar-phosphate backbone, which is not relevant for this work, and the bases. In figure~\ref{k2:dna-struct}, the sugar-phosphate backbones are illustrated as flat ribbons winding around the horizontal axis, and pairs of bases are symbolized as vertical bars between the backbones.
The arrangement of the bases represents the information stored in the \acs{DNA}. What is described here as a base is an organic molecule, which this work also refers to as a nucleotide \cite{dna_structure}.\\
For this work, nucleotides are the most important part of \acs{DNA}. A nucleotide occurs in one of four forms: adenine, thymine, guanine or cytosine. Each of them has a counterpart with which it can form a bond: adenine bonds with thymine, and guanine bonds with cytosine.\\
From the perspective of a computer scientist, only the content of one helix must be stored to preserve the full information. In more practical terms, the nucleotides of only one (entire) helix need to be stored physically to save the information of the whole \acs{DNA}. The other half can be determined by ``inverting'' the stored one, that is, by replacing each nucleotide with its counterpart.
% todo OPT -> figure?
For example, the counterpart of the chain \texttt{adenine, guanine, adenine} is the chain \texttt{thymine, cytosine, thymine}. For the sake of simplicity, nucleotides are not written out by their full names but only by their initials, so the example becomes \texttt{AGA} in one helix and \texttt{TCT} in the other.\\
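The pairing rule can be sketched in a few lines of Python. This is only an illustration of the example above, not part of any tool discussed in this work: the names \texttt{PAIRING} and \texttt{complement} are chosen freely, and the sketch assumes that a helix is given as a string containing only the four initials. It pairs nucleotides position by position, exactly as in the example, and does not model any further biological detail.
\begin{verbatim}
# Pairing rule from the text: A pairs with T, G pairs with C.
PAIRING = str.maketrans("ACGT", "TGCA")

def complement(strand: str) -> str:
    """Return the position-by-position counterpart of a strand."""
    invalid = set(strand) - set("ACGT")
    if invalid:
        # Assumption: anything besides the four initials is an error.
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return strand.translate(PAIRING)

print(complement("AGA"))  # prints TCT
\end{verbatim}
Applying \texttt{complement} twice returns the original string, which reflects why storing one helix is sufficient.\\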
This representation is commonly used to store \acs{DNA} digitally. Depending on the sequencing procedure and other factors, more information may be stored, which requires additional characters, but for now \texttt{A}, \texttt{C}, \texttt{G} and \texttt{T} are the only concern.
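
As long as only these four characters occur, each nucleotide carries at most two bits of information, whereas a plain text character commonly occupies eight bits. The following Python sketch illustrates this observation under stated assumptions: the bit codes in \texttt{CODES} and the functions \texttt{pack} and \texttt{unpack} are hypothetical and do not describe any specific file format, and the sequence length has to be kept separately because the last byte may be padded.
\begin{verbatim}
# Hypothetical 2-bit codes; any fixed assignment would work.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
LETTERS = "ACGT"  # index in this string equals the 2-bit code

def pack(seq: str) -> bytes:
    """Pack four nucleotides per byte, first one in the high bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for ch in chunk:
            byte = (byte << 2) | CODES[ch]
        # Pad an incomplete last chunk with zero bits on the right.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)

def unpack(data: bytes, length: int) -> str:
    """Restore the first `length` nucleotides from packed bytes."""
    letters = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            letters.append(LETTERS[(byte >> shift) & 0b11])
    return "".join(letters[:length])

packed = pack("ACGTACGT")
print(len(packed))         # prints 2
print(unpack(packed, 8))   # prints ACGTACGT
\end{verbatim}
In this sketch, eight nucleotides fit into two bytes instead of eight, before any actual compression technique is applied.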