Abstract
Bioinformatics researchers now face the analysis of ultra-large-scale data sets, a problem that will only grow, at an alarming rate, in the coming years. Recent developments in open source software, namely the Hadoop project and associated software, provide a foundation for scaling to petabyte-scale data warehouses on Linux clusters and for fault-tolerant, parallelized analysis of such data using a programming style named MapReduce.
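To make the MapReduce programming style concrete, the following is a minimal in-memory sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the toy sequence reads are assumptions introduced purely for illustration. In a real cluster, the framework would run the map and reduce steps in parallel across machines and handle the grouping step and fault tolerance itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (base, 1) pair for every base in one sequence record."""
    return [(base, 1) for base in record.upper()]


def shuffle(pairs):
    """Shuffle: group emitted values by key (done by the framework in practice)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce sequentially over an in-memory list."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical toy input: three short reads (not data from the article).
reads = ["ACGT", "AACC", "ggta"]
counts = run_job(reads)
print(counts)
```

Because each call to `map_phase` depends only on its own record and each call to `reduce_phase` depends only on one key's values, both steps can in principle be distributed across many nodes, which is what makes the style suitable for very large data sets.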
| Original language | English |
|---|---|
| Publisher | BMC Bioinformatics |
| Volume | 11 |
| State | Published - Dec 21 2010 |
Keywords
- bioinformatics
Disciplines
- Biology
- Biotechnology