3 years ago
parent
commit
91ee5e93de

BIN
latex/tex/bilder/k2/dna.png


+ 0 - 0
latex/tex/bilder/k_datatypes/01_sam-structure.png → latex/tex/bilder/k3/01_sam-structure.png


+ 1 - 0
latex/tex/kapitel/abkuerzungen.tex

@@ -7,4 +7,5 @@
 \begin{acronym}[IEEE]
   \acro{DNA}{Deoxyribonucleic acid}
   \acro{ANS}{Arithmetic numeral system}
+  \acro{GA4GH}{Global Alliance for Genomics and Health}
 \end{acronym}

+ 6 - 1
latex/tex/kapitel/k1_introduction.tex

@@ -1,2 +1,7 @@
 \chapter{Introduction}
-
+% general information and intro
+Understanding how things in our cosmos work was and still is a pleasure human beings always want to fulfill. Getting insights into the rawest form of organic life is possible through reading and storing the information embedded in genetic code. Since life is complex, this information requires a lot of memory to be stored digitally. Communication with other researchers means sending huge chunks of data through cables or through waves over the air, which costs time and makes raw data vulnerable to errors.\\
+% compression values and goals
+With compression tools this problem is reduced. Compressed data requires less space, requires less time to be sent over networks and, due to its smaller size, is statistically a little less vulnerable to errors. This advantage is scalable, and since there is much to discover about genomes, new findings in this field are nothing unusual. From some of these findings, new tools can be developed which optimally increase two factors: the speed at which data is compressed and the compression ratio, meaning the difference between uncompressed and compressed data.\\
+% more exact explanation
+New discoveries in the universal rules of the stochastic organisation of genomes might provide a basis for new algorithms and therefore new tools for genome compression. The aim of this work is to analyze the current state of the art of probabilistic compression tools and their algorithms, and ultimately determine whether the mentioned discoveries are already used. If this is not the case, there will be an analysis of how this new approach could improve compression methods.\\

+ 14 - 7
latex/tex/kapitel/k2_dna_structure.tex

@@ -18,17 +18,24 @@
 %- IMPACT ON COMPRESSION
 
 \chapter{Structure Of Biological Data}
-To strengthen the understanding how and where biological information is stored, this section starts with a quick and general rundown on the structure of any living organism.
+To strengthen the understanding of how and where biological information is stored, this section starts with a quick and general rundown on the structure of any living organism.
 % todo add picture
-All living organisms, like plants and animals, are made of cells (a human body can consist out of several trillion cells).
-% human body estimated 3.72 x 10^13 cells https://www.tandfonline.com/doi/full/10.3109/03014460.2013.807878
-A cell in itsel is a living organism, the smalles one possible. A cell got two layers, the inner one is called nucleus wich contains chromosomes. The chromosomes hold the genetic information in form of \acs{DNA}. 
+All living organisms, like plants and animals, are made of cells (a human body can consist of several trillion cells) \cite{cells}.
+A cell in itself is a living organism; the smallest one possible. It has two layers, of which the inner one is called the nucleus. The nucleus contains chromosomes, and those chromosomes hold the genetic information in the form of \ac{DNA}. 
  
 \section{DNA}
-\ac{DNA} is often seen in the form of a double helix. A double helix consists, as the name suggestes, of two single helix. Each of them consists of two main components: the Suggar Phosphat backbone, which is irelavant for this Paper and the Bases. The arrangement of Bases represents the Information stored in the \acs{DNA}. A base is an organic molecule, they are called Nucleotides. %Nucleotides have special attributes and influence other Nucleotides in the \acs{DNA} Sequence
+\ac{DNA} is often seen in the form of a double helix. A double helix consists, as the name suggests, of two single helices. 
+
+\begin{figure}[ht]
+  \centering
+  \includegraphics[width=15cm]{k2/dna.png}
+  \caption{A purely diagrammatic figure of the components \ac{DNA} is made of. The smaller, inner rods symbolize nucleotide links and the outer ribbons the phosphate-sugar chains \cite{dna_structure}.}
+  \label{k2:dna-struct}
+\end{figure}
+
+Each of them consists of two main components: the sugar-phosphate backbone, which is irrelevant for this work, and the bases. The arrangement of bases represents the information stored in the \ac{DNA}. The bases are organic molecules called nucleotides \cite{dna_structure}. %Nucleotides have special attributes and influence other Nucleotides in the \acs{DNA} Sequence
 % describe Genomes?
 
 \section{Nucleotides}
-For this paper, nucleotides are the most important parts of the \acs{DNA}. A Nucleotide can have one of four forms: it can be either adenine, thymine, guanine or cytosine. Each of them got a Counterpart on the helix, to be more explicit: adenine can only bond with thymine, guanine can only bond with cytosine. This means with the content of one helix, the other one can be determined by ``inverting'' the first. The counterpart for e.g.: adenine, guanine, adenine would be: thymine, cytosine, thymine. For the sake of simplicity, one does not write out the full name of each nucleotide but only use its initial: AGA in one Helix, TCT in the other.
+For this work, nucleotides are the most important parts of the \acs{DNA}. A nucleotide can have one of four forms: it can be either adenine, thymine, guanine or cytosine. Each of them has a counterpart with which a bond can be established: adenine bonds with thymine, guanine bonds with cytosine. For someone who wishes to persist this information, this means the content of one helix can be determined by ``inverting'' the other one; in other words, only the nucleotides of one helix need to be stored physically to save the information of the whole \ac{DNA}. The counterpart of e.g.\ an \texttt{adenine, guanine, adenine} chain would be a \texttt{thymine, cytosine, thymine} chain. For the sake of simplicity, one does not write out the full name of each nucleotide but only uses its initial: \texttt{AGA} in one helix, \texttt{TCT} in the other.
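The ``inversion'' described above can be sketched in a few lines of Python (an illustrative sketch only; the function and table names are our own, not taken from any surveyed tool):

```python
# Base-pairing rules from the text: A <-> T and G <-> C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement_strand(strand: str) -> str:
    """Return the complementary helix for a sequence of base initials."""
    return "".join(COMPLEMENT[base] for base in strand)

print(complement_strand("AGA"))  # -> TCT
```

Storing only one strand plus this rule therefore preserves the information of the whole double helix.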
 
-% it there is only one section -> remove it or move everything into introduction

+ 11 - 3
latex/tex/kapitel/k_datatypes.tex → latex/tex/kapitel/k3_datatypes.tex

@@ -19,9 +19,11 @@
 
 \chapter{Datatypes}
 % \section{overview}
-\acs{DNA} can be represented by a String with the buildingblocks A,T,G and C. Using a common fileformat for saving text would be impractical because the encoding defines that other possible letters require more space per letter. So storing a single \textit{A} in ASCII encoding requires 4 bit (excluding the magic bytes in the fileheader), whereas only two bits are needed to save a letter with a four letter alphabet e.g.: \texttt{00 -> A, 01 -> T, 10 -> G, 11 -> C}. More common Text encodings like unicode require even more storage spcae per letter. So settling with ASCII has improvement capabilitie but is, on the other side, more efficient than using bulkier alternatives.
+As described in previous chapters, \ac{DNA} can be represented by a string over the building blocks A, T, G and C. Using a common file format for saving text would be impractical because the encoding reserves space for many other possible letters, which costs more space per letter. Storing a single \textit{A} in ASCII encoding requires one byte (excluding the magic bytes in the file header), whereas only two bits are needed to store a letter of a four-letter alphabet, e.g.: \texttt{00 -> A, 01 -> T, 10 -> G, 11 -> C}. More general text encodings like Unicode can require even more storage space per letter. So settling with ASCII leaves room for improvement but is, on the other hand, more efficient than using bulkier alternatives like Unicode.
 \\
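The two-bit packing idea above can be sketched as follows (a minimal illustration under our own assumptions; real formats add headers and store the sequence length separately, and the names here are hypothetical):

```python
# 2 bits per base, as in the text: 00->A, 01->T, 10->G, 11->C.
ENCODE = {"A": 0b00, "T": 0b01, "G": 0b10, "C": 0b11}
DECODE = {v: k for k, v in ENCODE.items()}

def pack(seq: str) -> bytes:
    """Pack a nucleotide string into 2 bits per base, zero-padded."""
    bits = 0
    for base in seq:
        bits = (bits << 2) | ENCODE[base]
    pad = (-len(seq)) % 4           # unused 2-bit slots in the last byte
    bits <<= 2 * pad
    return bits.to_bytes((len(seq) + 3) // 4, "big")

def unpack(data: bytes, length: int) -> str:
    """Decode the first `length` bases; the length must be known."""
    bits = int.from_bytes(data, "big")
    total = 4 * len(data)           # number of 2-bit slots
    return "".join(DECODE[(bits >> 2 * (total - 1 - i)) & 0b11]
                   for i in range(length))

packed = pack("ATGC")
print(len(packed))        # 1 byte instead of 4 ASCII bytes
print(unpack(packed, 4))  # -> ATGC
```

This shows the factor-of-four saving over one-byte-per-letter text storage that motivates dedicated genome file formats.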
-To optimize this, people have developed other filetypes, that focuse on storing nucleotides only. Standard formats include:
+Several people and groups have developed different file formats to store genomes. Unfortunately for this work, there is no defined standard file type or set of file types, so one has to gather information on which types exist and how they function. In order to not go beyond scope, this work will focus only on file formats that fulfill two criteria:\\
+1. The format has a reputation, either through a scientific paper that proves its superiority by comparison with other relevant tools, or through broad usage of the format.\\
+2. 
 \begin{itemize}
   \item{FASTQ}
   \item{SAM/BAM}
@@ -75,7 +77,7 @@ The regulare expression, shown above, filters touple of characters from a to z i
- allows viewing BAM data (locally and remotely via ftp/http)
- file extension: <filename>.bam.bai
 
-- saves more data than FASTQ
+- stores more data than FASTQ
  
 % src: https://support.illumina.com/help/BS_App_RNASeq_Alignment_OLH_1000000006112/Content/Source/Informatics/BAM-Format.htm
- alignment section includes
@@ -87,3 +89,9 @@ The regulare expression, shown above, filters touple of characters from a to z i
  - XN amplicon name tag
 
 - BAM index files nameschema: <filename>.bam.bai 
+
+\subsection{CRAM - Compressed Reference-oriented Alignment Map}
+% src https://ena-docs.readthedocs.io/en/latest/retrieval/programmatic-access.html#cram-format
+% ga4ah https://www.ga4gh.org/cram/
+A highly space-efficient file format for sequenced data, maintained by the \ac{GA4GH}. It features both lossy and lossless compression modes. Even though it is part of the \ac{GA4GH} suite, the file format can be used independently.\\
+The basic idea behind this format is to split data into smaller sections 

+ 46 - 0
latex/tex/kapitel/k4_algorithms.tex

@@ -0,0 +1,46 @@
+%SUMMARY
+%- ABSTRACT
+%- INTRODUCTION
+%# BASICS
+%- \acs{DNA} STRUCTURE
+%- DATA TYPES
+% - BAM/FASTQ
+% - NON STANDARD
+%- COMPRESSION APPROACHES
+% - SAVING DIFFERENCES WITH GIVEN BASE \acs{DNA}
+% - HUFFMAN ENCODING
+% - PROBABILITY APPROACHES (WITH BASE?)
+%
+%# COMPARING TOOLS
+%- 
+%# POSSIBLE IMPROVEMENT
+%- \acs{DNA}S STOCHASTICAL ATTRIBUTES 
+%- IMPACT ON COMPRESSION
+
+\chapter{Compression approaches}
+The process of compressing data serves the goal of generating an output that is smaller than its input. In many cases, as in genome compression, the compression is ideally lossless. This means that for every piece of compressed data, decompressing it recovers the full information that was available in the original data. Lossy compression, on the other hand, may exclude parts of the data during compression in order to increase the compression rate. The excluded parts are typically not necessary to transmit the original information. This works for certain audio and picture files, or for network protocols used to transmit live video/audio streams.
+For \acs{DNA} a lossless compression is needed. To be precise, a lossy compression is not possible, because there is no unnecessary data: every nucleotide and its position is needed for the sequenced \acs{DNA} to be complete.
+
+\section{Huffman encoding}
+% list of algos and the tools that use them
+The well-known Huffman coding is used in several tools for genome compression. This section should give the reader a general impression of how this algorithm works, without going into detail. To use Huffman coding one must first define an alphabet; in our case a four-letter alphabet containing \texttt{A, C, G and T} is sufficient. The basic structure is symbolized as a tree, to which a few simple rules apply:
+% binary view for alphabet
+% length n of sequence to compromize
+% greedy algo
+\begin{itemize}
+  \item every symbol of the alphabet is one leaf
+  \item the right branch from every node is marked with a 1, the left one with a 0
+  \item every symbol has a weight; the weight is defined by the frequency with which the symbol occurs in the input text
+  \item the more weight a node has, the higher the probability that its symbol occurs next in the symbol sequence
+\end{itemize}
+The tree construction starts with the nodes with the lowest weights and builds up to the highest: each step merges the two lightest remaining nodes into a new node whose weight is their sum. As a result, the shortest path ends at the symbol with the highest weight, i.e.\ the symbol that occurs most often in the input data.
+Following one path from the root to a leaf yields the binary representation of one symbol. For an alphabet like the one described above, a fixed-length representation could initially look like this: \texttt{A -> 00, C -> 01, G -> 10, T -> 11}. For a sequence with the distribution \texttt{A -> 10, C -> 8, G -> 3, T -> 2}, a corresponding Huffman tree could yield the codes \texttt{A -> 0, C -> 11, G -> 101, T -> 100}.
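The merging procedure described in the text can be sketched in Python (a minimal illustration; the function name and the tie-breaking between subtrees are our own choices, and equally valid trees with the same code lengths exist):

```python
import heapq
from itertools import count

def huffman_codes(freqs: dict) -> dict:
    """Build prefix codes by repeatedly merging the two lightest nodes."""
    tick = count()  # tie-breaker so the heap never compares dicts
    heap = [(w, next(tick), {sym: ""}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # lightest node  -> left  / 0
        w2, _, right = heapq.heappop(heap)  # next lightest  -> right / 1
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tick), merged))
    return heap[0][2]

# Distribution from the text: the heaviest symbol (A) gets the
# shortest code, the rarest symbols (G, T) the longest ones.
print(huffman_codes({"A": 10, "C": 8, "G": 3, "T": 2}))
```

With these codes the example sequence shrinks from 2 bits per symbol on average to roughly 1.7, since \texttt{A} alone covers 10 of the 23 symbols with a single bit.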
+
+% begriffdef. alphabet,
+% leafs
+% weights
+% paths
+
+% (genomic squeeze <- official | inofficial -> GDC, GRS). Further \ac{ANS} or rANS ... TBD.
+
+\section{Probability approaches}

+ 0 - 29
latex/tex/kapitel/k_algorithms.tex

@@ -1,29 +0,0 @@
-%SUMMARY
-%- ABSTRACT
-%- INTRODUCTION
-%# BASICS
-%- \acs{DNA} STRUCTURE
-%- DATA TYPES
-% - BAM/FASTQ
-% - NON STANDARD
-%- COMPRESSION APPROACHES
-% - SAVING DIFFERENCES WITH GIVEN BASE \acs{DNA}
-% - HUFFMAN ENCODING
-% - PROBABILITY APPROACHES (WITH BASE?)
-%
-%# COMPARING TOOLS
-%- 
-%# POSSIBLE IMPROVEMENT
-%- \acs{DNA}S STOCHASTICAL ATTRIBUTES 
-%- IMPACT ON COMPRESSION
-
-\chapter{Compression aproaches}
-The process of compressing data serves the goal to generate an output that is smaller than its input data. In many cases, like in gene compressing, the compression is idealy lossless. This means it is possible for every compressed data, to receive the full information that were available in the origin data, by decompressing it. Lossy compression on the other hand, might excludes parts of data in the compression process, in order to increase the compression rate. The excluded parts are typicaly not necessary to transmit the origin information. This works with certain audio and pictures files or network protocols that are used to transmit video/audio streams live.
-For \acs{DNA} a lossless compression is needed. To be preceice a lossy compression is not possible, because there is no unnecessary data. Every nucleotide and its position is needed for the sequenced \acs{DNA} to be complete.
-
-% list of algos and the tools that use them
-The well known Huffman coding, is used in several Tools for genome compression (genomic squeeze <- official | inofficial -> GDC, GRS). Further \ac{ANS} or rANS ... TBD.
-
-\subsection{Huffman encoding}
-
-\section{Probability aproaches}

+ 208 - 117
latex/tex/literatur.bib

@@ -1,118 +1,209 @@
-@online{Gao2017,
-	Author = {Gao, Liangcai and Yi, Xiaohan and Hao, Leipeng and Jiang, Zhuoren and Tang, Zhi},
-	Date-Added = {2017-06-19 19:21:12 +0000},
-	Date-Modified = {2017-06-19 19:21:12 +0000},
-	Title = {{ICDAR 2017 POD Competition: Evaluation}},
-	Url = {http://www.icst.pku.edu.cn/cpdp/ICDAR2017_PODCompetition/evaluation.html},
-	Urldate = {2017-05-30},
-	Year = {2017},
-	Bdsk-Url-1 = {http://www.icst.pku.edu.cn/cpdp/ICDAR2017_PODCompetition/evaluation.html}}
-
-@book{Kornmeier2011,
-	Author = {Martin Kornmeier},
-	Date-Added = {2012-04-04 12:07:45 +0000},
-	Date-Modified = {2012-04-04 12:09:25 +0000},
-	Edition = {4. Auflage},
-	Keywords = {Writing},
-	Publisher = {UTB},
-	Title = {Wissenschaftlich schreiben leicht gemacht},
-	Year = {2011}}
-
-@book{Kramer2009,
-	Author = {Walter Kr{\"a}mer},
-	Date-Added = {2011-10-27 13:55:22 +0000},
-	Date-Modified = {2011-10-27 14:01:55 +0000},
-	Edition = {3. Auflage},
-	Keywords = {Writing},
-	Month = {9},
-	Publisher = {Campus Verlag},
-	Title = {Wie schreibe ich eine Seminar- oder Examensarbeit?},
-	Year = {2009}}
-
-@book{Willberg1999,
-	Author = {Hans Peter Willberg and Friedrich Forssmann},
-	Date-Added = {2011-11-10 08:58:09 +0000},
-	Date-Modified = {2012-01-24 19:24:12 +0000},
-	Keywords = {Writing},
-	Publisher = {Verlag Hermann Schmidt},
-	Title = {Erste Hilfe in Typographie},
-	Year = {1999}}
-
-@book{Forssman2002,
-	Author = {Friedrich Forssman and Ralf de Jong},
-	Date-Added = {2012-01-24 19:20:46 +0000},
-	Date-Modified = {2012-01-24 19:21:56 +0000},
-	Keywords = {Writing},
-	Publisher = {Verlag Hermann Schmidt},
-	Title = {Detailtypografie},
-	Year = {2002}}
-
-@online{Weber2006,
-	Author = {Stefan Weber},
-	Date-Added = {2011-10-27 14:30:30 +0000},
-	Date-Modified = {2011-10-27 14:32:34 +0000},
-	Journal = {Telepolis},
-	Keywords = {Writing},
-	Lastchecked = {2011-10-27},
-	Month = {12},
-	Title = {Wissenschaft als Web-Sampling},
-	Url = {http://www.heise.de/tp/druck/mb/artikel/24/24221/1.html},
-	Urldate = {2011-10-27},
-	Year = {2006},
-	Bdsk-Url-1 = {http://www.heise.de/tp/druck/mb/artikel/24/24221/1.html}}
-
-@online{Wikipedia_HarveyBalls,
-	Author = {{Harvey Balls}},
-	Date-Added = {2011-10-27 14:30:30 +0000},
-	Date-Modified = {2011-10-27 14:32:34 +0000},
-	Lastchecked = {2018-02-07},
-	Month = {4},
-	Title = {Harvey Balls -- Wikipedia},
-	Url = {https://de.wikipedia.org/w/index.php?title=Harvey_Balls&oldid=116517396},
-	Urldate = {2018-02-07},
-	Year = {2013}}
-
-@online{Volere,
-	Author = {{Volere Template}},
-	Date-Added = {2011-10-27 14:30:30 +0000},
-	Date-Modified = {2011-10-27 14:32:34 +0000},
-	Lastchecked = {2019-01-31},
-	Month = {1},
-	Title = {Snowcards -- Volere},
-	Url = {http://www.volere.co.uk},
-	Urldate = {2019-01-31},
-	Year = {2018}}
-
-@techreport{Barbacci2003,
-	abstract = {The Quality Attribute Workshop (QAW) is a facilitated method that engages system stake- holders early in the life cycle to discover the driving quality attributes of a software-intensive system. The QAW was developed to complement the Architecture Tradeoff Analysis Meth- odSM (ATAMSM) and provides a way to identify important quality attributes and clarify system requirements before the software architecture has been created. This is the third edition of a technical report describing the QAW. We have narrowed the scope of a QAW to the creation of prioritized and refined scenarios. This report describes the newly revised QAW and describes potential uses of the refined scenarios generated during it.},
-	address = {Pttsburgh},
-	author = {Barbacci, Mario R. and Ellison, Robert and Lattanze, Anthony J. and Stafford, Judith A. and Weinstock, Charles B. and Wood, William G.},
-	booktitle = {Quality},
-	file = {::},
-	institution = {Software Engineering Institue - Carnegie Mellon},
-	keywords = {QAW,Quality Attribute Workshop,attribute requirements,attribute tradeoffs,quality attributes,scenarios},
-	mendeley-groups = {SEI,Architecture},
-	number = {August},
-	title = {{Quality Attribute Workshops (QAWs), Third Edition}},
-	year = {2003}}
-
-@book{Bass2003,
-author = {Bass, Len and Clements, Paul and Kazman, Rick},
-edition = {2nd editio},
-keywords = {Architecture},
-publisher = {Addison-Wesley},
-series = {SEI Series in Software Engineering},
-title = {{Software Architecture in Practice}},
-year = {2003}
-}
-@techreport{ISO25010,
-author = {{International Organization for Standardization}},
-type = {Standard},
-key = {ISO/IEC 25010:2011(E)},
-month = mar,
-year = {2011},
-title = {{Systems and software engineering -- Systems and software Quality Requirements -- and Evaluation (SQuaRE) -- System and software quality models}},
-volume = {2011},
-address = {Case postale 56, CH-1211 Geneva 20},
-institution = {International Organization for Standardization}
+@Online{Gao2017,
+  author        = {Gao, Liangcai and Yi, Xiaohan and Hao, Leipeng and Jiang, Zhuoren and Tang, Zhi},
+  title         = {{ICDAR 2017 POD Competition: Evaluation}},
+  url           = {http://www.icst.pku.edu.cn/cpdp/ICDAR2017_PODCompetition/evaluation.html},
+  urldate       = {2017-05-30},
+  bdsk-url-1    = {http://www.icst.pku.edu.cn/cpdp/ICDAR2017_PODCompetition/evaluation.html},
+  date-added    = {2017-06-19 19:21:12 +0000},
+  date-modified = {2017-06-19 19:21:12 +0000},
+  ranking       = {rank1},
+  relevance     = {relevant},
+  year          = {2017},
 }
+
+@Book{Kornmeier2011,
+  author        = {Martin Kornmeier},
+  title         = {Wissenschaftlich schreiben leicht gemacht},
+  edition       = {4. Auflage},
+  publisher     = {UTB},
+  date-added    = {2012-04-04 12:07:45 +0000},
+  date-modified = {2012-04-04 12:09:25 +0000},
+  keywords      = {Writing},
+  ranking       = {rank1},
+  relevance     = {relevant},
+  year          = {2011},
+}
+
+@Book{Kramer2009,
+  author        = {Walter Kr{\"a}mer},
+  title         = {Wie schreibe ich eine Seminar- oder Examensarbeit?},
+  edition       = {3. Auflage},
+  publisher     = {Campus Verlag},
+  date-added    = {2011-10-27 13:55:22 +0000},
+  date-modified = {2011-10-27 14:01:55 +0000},
+  keywords      = {Writing},
+  month         = {9},
+  ranking       = {rank1},
+  relevance     = {relevant},
+  year          = {2009},
+}
+
+@Book{Willberg1999,
+  author        = {Hans Peter Willberg and Friedrich Forssmann},
+  title         = {Erste Hilfe in Typographie},
+  publisher     = {Verlag Hermann Schmidt},
+  date-added    = {2011-11-10 08:58:09 +0000},
+  date-modified = {2012-01-24 19:24:12 +0000},
+  keywords      = {Writing},
+  ranking       = {rank1},
+  relevance     = {relevant},
+  year          = {1999},
+}
+
+@Book{Forssman2002,
+  author        = {Friedrich Forssman and Ralf de Jong},
+  title         = {Detailtypografie},
+  publisher     = {Verlag Hermann Schmidt},
+  date-added    = {2012-01-24 19:20:46 +0000},
+  date-modified = {2012-01-24 19:21:56 +0000},
+  keywords      = {Writing},
+  ranking       = {rank1},
+  relevance     = {relevant},
+  year          = {2002},
+}
+
+@Online{Weber2006,
+  author        = {Stefan Weber},
+  title         = {Wissenschaft als Web-Sampling},
+  url           = {http://www.heise.de/tp/druck/mb/artikel/24/24221/1.html},
+  urldate       = {2011-10-27},
+  bdsk-url-1    = {http://www.heise.de/tp/druck/mb/artikel/24/24221/1.html},
+  date-added    = {2011-10-27 14:30:30 +0000},
+  date-modified = {2011-10-27 14:32:34 +0000},
+  journal       = {Telepolis},
+  keywords      = {Writing},
+  lastchecked   = {2011-10-27},
+  month         = {12},
+  ranking       = {rank1},
+  relevance     = {relevant},
+  year          = {2006},
+}
+
+@Online{Wikipedia_HarveyBalls,
+  author        = {{Harvey Balls}},
+  title         = {Harvey Balls -- Wikipedia},
+  url           = {https://de.wikipedia.org/w/index.php?title=Harvey_Balls&oldid=116517396},
+  urldate       = {2018-02-07},
+  date-added    = {2011-10-27 14:30:30 +0000},
+  date-modified = {2011-10-27 14:32:34 +0000},
+  lastchecked   = {2018-02-07},
+  month         = {4},
+  ranking       = {rank1},
+  relevance     = {relevant},
+  year          = {2013},
+}
+
+@Online{Volere,
+  author        = {{Volere Template}},
+  title         = {Snowcards -- Volere},
+  url           = {http://www.volere.co.uk},
+  urldate       = {2019-01-31},
+  date-added    = {2011-10-27 14:30:30 +0000},
+  date-modified = {2011-10-27 14:32:34 +0000},
+  lastchecked   = {2019-01-31},
+  month         = {1},
+  ranking       = {rank1},
+  relevance     = {relevant},
+  year          = {2018},
+}
+
+@TechReport{Barbacci2003,
+  author          = {Barbacci, Mario R. and Ellison, Robert and Lattanze, Anthony J. and Stafford, Judith A. and Weinstock, Charles B. and Wood, William G.},
+  institution     = {Software Engineering Institute - Carnegie Mellon},
+  title           = {{Quality Attribute Workshops (QAWs), Third Edition}},
+  number          = {August},
+  abstract        = {The Quality Attribute Workshop (QAW) is a facilitated method that engages system stake- holders early in the life cycle to discover the driving quality attributes of a software-intensive system. The QAW was developed to complement the Architecture Tradeoff Analysis Meth- odSM (ATAMSM) and provides a way to identify important quality attributes and clarify system requirements before the software architecture has been created. This is the third edition of a technical report describing the QAW. We have narrowed the scope of a QAW to the creation of prioritized and refined scenarios. This report describes the newly revised QAW and describes potential uses of the refined scenarios generated during it.},
+  address         = {Pittsburgh},
+  booktitle       = {Quality},
+  file            = {::},
+  keywords        = {QAW, Quality Attribute Workshop, attribute requirements, attribute tradeoffs, quality attributes, scenarios},
+  mendeley-groups = {SEI,Architecture},
+  ranking         = {rank1},
+  relevance       = {relevant},
+  year            = {2003},
+}
+
+@Book{Bass2003,
+  author    = {Bass, Len and Clements, Paul and Kazman, Rick},
+  title     = {{Software Architecture in Practice}},
+  edition   = {2nd edition},
+  publisher = {Addison-Wesley},
+  series    = {SEI Series in Software Engineering},
+  keywords  = {Architecture},
+  ranking   = {rank1},
+  relevance = {relevant},
+  year      = {2003},
+}
+
+@TechReport{ISO25010,
+  author      = {{International Organization for Standardization}},
+  institution = {International Organization for Standardization},
+  title       = {{Systems and software engineering -- Systems and software Quality Requirements -- and Evaluation (SQuaRE) -- System and software quality models}},
+  type        = {Standard},
+  address     = {Case postale 56, CH-1211 Geneva 20},
+  key         = {ISO/IEC 25010:2011(E)},
+  month       = mar,
+  ranking     = {rank1},
+  relevance   = {relevant},
+  volume      = {2011},
+  year        = {2011},
+}
+
+@Article{Al_Okaily_2017,
+  author       = {Anas Al-Okaily and Badar Almarri and Sultan Al Yami and Chun-Hsi Huang},
+  date         = {2017-04-01},
+  journaltitle = {Journal of Computational Biology},
+  title        = {Toward a Better Compression for {DNA} Sequences Using Huffman Encoding},
+  doi          = {10.1089/cmb.2016.0151},
+  number       = {4},
+  pages        = {280--288},
+  volume       = {24},
+  publisher    = {Mary Ann Liebert Inc},
+}
+
+@Online{bam,
+  author  = {The SAM/BAM Format Specification Working Group},
+  date    = {2022-08-22},
+  title   = {Sequence Alignment/Map Format Specification},
+  url     = {https://github.com/samtools/hts-specs},
+  urldate = {2022-09-12},
+  version = {44b4167},
+}
+
+@Article{Cock_2009,
+  author       = {Peter J. A. Cock and Christopher J. Fields and Naohisa Goto and Michael L. Heuer and Peter M. Rice},
+  date         = {2009-12},
+  journaltitle = {Nucleic Acids Research},
+  title        = {The Sanger {FASTQ} file format for sequences with quality scores, and the Solexa/Illumina {FASTQ} variants},
+  doi          = {10.1093/nar/gkp1137},
+  number       = {6},
+  pages        = {1767--1771},
+  volume       = {38},
+  publisher    = {Oxford University Press ({OUP})},
+}
+
+@Article{cells,
+  author       = {Eva Bianconi and Allison Piovesan and Federica Facchin and Alina Beraudi and Raffaella Casadei and Flavia Frabetti and Lorenza Vitale and Maria Chiara Pelleri and Simone Tassani and Francesco Piva and Soledad Perez-Amodio and Pierluigi Strippoli and Silvia Canaider},
+  date         = {2013-07},
+  journaltitle = {Annals of Human Biology},
+  title        = {An estimation of the number of cells in the human body},
+  doi          = {10.3109/03014460.2013.807878},
+  number       = {6},
+  pages        = {463--471},
+  volume       = {40},
+  publisher    = {Informa {UK} Limited},
+}
+
+@Article{dna_structure,
+  author       = {J. D. Watson and F. H. C. Crick},
+  date         = {1953-04},
+  journaltitle = {Nature},
+  title        = {Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid},
+  doi          = {10.1038/171737a0},
+  number       = {4356},
+  pages        = {737--738},
+  volume       = {171},
+  publisher    = {Springer Science and Business Media {LLC}},
+}
+
+@Comment{jabref-meta: databaseType:biblatex;}

+ 206 - 0
latex/tex/literatur.bib.sav.tmp

@@ -0,0 +1,206 @@
+@Online{Gao2017,
+  author        = {Gao, Liangcai and Yi, Xiaohan and Hao, Leipeng and Jiang, Zhuoren and Tang, Zhi},
+  title         = {{ICDAR 2017 POD Competition: Evaluation}},
+  url           = {http://www.icst.pku.edu.cn/cpdp/ICDAR2017_PODCompetition/evaluation.html},
+  urldate       = {2017-05-30},
+  bdsk-url-1    = {http://www.icst.pku.edu.cn/cpdp/ICDAR2017_PODCompetition/evaluation.html},
+  date-added    = {2017-06-19 19:21:12 +0000},
+  date-modified = {2017-06-19 19:21:12 +0000},
+  ranking       = {rank1},
+  relevance     = {relevant},
+  year          = {2017},
+}
+
+@Book{Kornmeier2011,
+  author        = {Martin Kornmeier},
+  title         = {Wissenschaftlich schreiben leicht gemacht},
+  edition       = {4. Auflage},
+  publisher     = {UTB},
+  date-added    = {2012-04-04 12:07:45 +0000},
+  date-modified = {2012-04-04 12:09:25 +0000},
+  keywords      = {Writing},
+  ranking       = {rank1},
+  relevance     = {relevant},
+  year          = {2011},
+}
+
+@Book{Kramer2009,
+  author        = {Walter Kr{\"a}mer},
+  title         = {Wie schreibe ich eine Seminar- oder Examensarbeit?},
+  edition       = {3. Auflage},
+  publisher     = {Campus Verlag},
+  date-added    = {2011-10-27 13:55:22 +0000},
+  date-modified = {2011-10-27 14:01:55 +0000},
+  keywords      = {Writing},
+  month         = {9},
+  ranking       = {rank1},
+  relevance     = {relevant},
+  year          = {2009},
+}
+
+@Book{Willberg1999,
+  author        = {Hans Peter Willberg and Friedrich Forssmann},
+  title         = {Erste Hilfe in Typographie},
+  publisher     = {Verlag Hermann Schmidt},
+  date-added    = {2011-11-10 08:58:09 +0000},
+  date-modified = {2012-01-24 19:24:12 +0000},
+  keywords      = {Writing},
+  ranking       = {rank1},
+  relevance     = {relevant},
+  year          = {1999},
+}
+
+@Book{Forssman2002,
+  author        = {Friedrich Forssman and Ralf de Jong},
+  title         = {Detailtypografie},
+  publisher     = {Verlag Hermann Schmidt},
+  date-added    = {2012-01-24 19:20:46 +0000},
+  date-modified = {2012-01-24 19:21:56 +0000},
+  keywords      = {Writing},
+  ranking       = {rank1},
+  relevance     = {relevant},
+  year          = {2002},
+}
+
+@Online{Weber2006,
+  author        = {Stefan Weber},
+  title         = {Wissenschaft als Web-Sampling},
+  url           = {http://www.heise.de/tp/druck/mb/artikel/24/24221/1.html},
+  urldate       = {2011-10-27},
+  bdsk-url-1    = {http://www.heise.de/tp/druck/mb/artikel/24/24221/1.html},
+  date-added    = {2011-10-27 14:30:30 +0000},
+  date-modified = {2011-10-27 14:32:34 +0000},
+  journal       = {Telepolis},
+  keywords      = {Writing},
+  lastchecked   = {2011-10-27},
+  month         = {12},
+  ranking       = {rank1},
+  relevance     = {relevant},
+  year          = {2006},
+}
+
+@Online{Wikipedia_HarveyBalls,
+  author        = {{Harvey Balls}},
+  title         = {Harvey Balls -- Wikipedia},
+  url           = {https://de.wikipedia.org/w/index.php?title=Harvey_Balls&oldid=116517396},
+  urldate       = {2018-02-07},
+  date-added    = {2011-10-27 14:30:30 +0000},
+  date-modified = {2011-10-27 14:32:34 +0000},
+  lastchecked   = {2018-02-07},
+  month         = {4},
+  ranking       = {rank1},
+  relevance     = {relevant},
+  year          = {2013},
+}
+
+@Online{Volere,
+  author        = {{Volere Template}},
+  title         = {Snowcards -- Volere},
+  url           = {http://www.volere.co.uk},
+  urldate       = {2019-01-31},
+  date-added    = {2011-10-27 14:30:30 +0000},
+  date-modified = {2011-10-27 14:32:34 +0000},
+  lastchecked   = {2019-01-31},
+  month         = {1},
+  ranking       = {rank1},
+  relevance     = {relevant},
+  year          = {2018},
+}
+
+@TechReport{Barbacci2003,
+  author          = {Barbacci, Mario R. and Ellison, Robert and Lattanze, Anthony J. and Stafford, Judith A. and Weinstock, Charles B. and Wood, William G.},
+  institution     = {Software Engineering Institute - Carnegie Mellon},
+  title           = {{Quality Attribute Workshops (QAWs), Third Edition}},
+  number          = {August},
+  abstract        = {The Quality Attribute Workshop (QAW) is a facilitated method that engages system stakeholders early in the life cycle to discover the driving quality attributes of a software-intensive system. The QAW was developed to complement the Architecture Tradeoff Analysis MethodSM (ATAMSM) and provides a way to identify important quality attributes and clarify system requirements before the software architecture has been created. This is the third edition of a technical report describing the QAW. We have narrowed the scope of a QAW to the creation of prioritized and refined scenarios. This report describes the newly revised QAW and describes potential uses of the refined scenarios generated during it.},
+  address         = {Pittsburgh},
+  booktitle       = {Quality},
+  keywords        = {QAW, Quality Attribute Workshop, attribute requirements, attribute tradeoffs, quality attributes, scenarios},
+  mendeley-groups = {SEI,Architecture},
+  ranking         = {rank1},
+  relevance       = {relevant},
+  year            = {2003},
+}
+
+@Book{Bass2003,
+  author    = {Bass, Len and Clements, Paul and Kazman, Rick},
+  title     = {{Software Architecture in Practice}},
+  edition   = {2},
+  publisher = {Addison-Wesley},
+  series    = {SEI Series in Software Engineering},
+  keywords  = {Architecture},
+  ranking   = {rank1},
+  relevance = {relevant},
+  year      = {2003},
+}
+
+@TechReport{ISO25010,
+  author      = {{International Organization for Standardization}},
+  institution = {International Organization for Standardization},
+  title       = {{Systems and software engineering -- Systems and software Quality Requirements and Evaluation ({SQuaRE}) -- System and software quality models}},
+  type        = {Standard},
+  address     = {Case postale 56, CH-1211 Geneva 20},
+  key         = {ISO/IEC 25010:2011(E)},
+  month       = mar,
+  ranking     = {rank1},
+  relevance   = {relevant},
+  volume      = {2011},
+  year        = {2011},
+}
+
+@Article{Al_Okaily_2017,
+  author       = {Anas Al-Okaily and Badar Almarri and Sultan Al Yami and Chun-Hsi Huang},
+  date         = {2017-04-01},
+  journaltitle = {Journal of Computational Biology},
+  title        = {Toward a Better Compression for {DNA} Sequences Using Huffman Encoding},
+  doi          = {10.1089/cmb.2016.0151},
+  number       = {4},
+  pages        = {280--288},
+  volume       = {24},
+  publisher    = {Mary Ann Liebert Inc},
+}
+
+@Online{bam,
+  author  = {The SAM/BAM Format Specification Working Group},
+  date    = {2022-08-22},
+  title   = {Sequence Alignment/Map Format Specification},
+  url     = {https://github.com/samtools/hts-specs},
+  urldate = {2022-09-12},
+  version = {44b4167},
+}
+
+@Article{Cock_2009,
+  author       = {Peter J. A. Cock and Christopher J. Fields and Naohisa Goto and Michael L. Heuer and Peter M. Rice},
+  date         = {2009-12},
+  journaltitle = {Nucleic Acids Research},
+  title        = {The Sanger {FASTQ} file format for sequences with quality scores, and the Solexa/Illumina {FASTQ} variants},
+  doi          = {10.1093/nar/gkp1137},
+  number       = {6},
+  pages        = {1767--1771},
+  volume       = {38},
+  publisher    = {Oxford University Press ({OUP})},
+}
+
+@Article{cells,
+  author       = {Eva Bianconi and Allison Piovesan and Federica Facchin and Alina Beraudi and Raffaella Casadei and Flavia Frabetti and Lorenza Vitale and Maria Chiara Pelleri and Simone Tassani and Francesco Piva and Soledad Perez-Amodio and Pierluigi Strippoli and Silvia Canaider},
+  date         = {2013-07},
+  journaltitle = {Annals of Human Biology},
+  title        = {An estimation of the number of cells in the human body},
+  doi          = {10.3109/03014460.2013.807878},
+  number       = {6},
+  pages        = {463--471},
+  volume       = {40},
+  publisher    = {Informa {UK} Limited},
+}
+
+@Article{dna_structure,
+  author       = {J. D. Watson and F. H. C. Crick},
+  date         = {1953-04},
+  journaltitle = {Nature},
+  title        = {Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid},
+  doi          = {10.1038/171737a0},
+  number       = {4356},
+  pages        = {737--738},
+  volume       = {171},
+  publisher    = {Springer Scie

+ 2 - 2
latex/tex/thesis.tex

@@ -136,8 +136,8 @@
 % Hauptteil der Arbeit
 \input{kapitel/k1_introduction} 
 \input{kapitel/k2_dna_structure}
-\input{kapitel/k_datatypes} 
-\input{kapitel/k_algorithms} % Externe Datei einbinden
+\input{kapitel/k3_datatypes} 
+\input{kapitel/k4_algorithms} % Externe Datei einbinden
 % ------------------------------------------------------------------
 
 \label{lastpage}