Abstract
Bioinformatics researchers now face the analysis of ultra-large-scale data sets, a problem that will grow at an alarming rate in the coming years. Recent developments in open source software, namely the Hadoop project and associated software, provide a foundation for scaling to petabyte-scale data warehouses on Linux clusters and for fault-tolerant, parallelized analysis of such data using a programming style named MapReduce.
Original language | American English |
---|---|
Publisher | BMC Bioinformatics |
Volume | 11 |
State | Published - Dec 21 2010 |
Keywords
- bioinformatics
Disciplines
- Biology
- Biotechnology