A list of ETL utilities worth a look.
A framework that makes building production data processing procedures both easy and robust, and helps you deal with parallel execution, performance tuning, error handling, debugging, and data reprocessing.
When building a procedure to process data, there are usually some challenges:
- Collect data from different data sources and process it in real time.
- Use a different procedure for each type of data.
- Structure the processing code in a clean and elegant form, so that it is easy to write, easy to read, and easy to extend.
- Reprocess the data and clean up the output easily.
- Debug easily by collecting unhandled exceptions, and reprocess the data once the bugs are fixed.
- Do not miss a single piece of data.
- Run in parallel, and scale up easily.
- Monitor the status of ongoing processes.
This framework is being built to solve all these problems.
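
To make a few of the challenges above concrete (parallel execution, collecting unhandled exceptions, and reprocessing once a bug is fixed), here is a minimal sketch using only the Python standard library. It is not this framework's API: `run_batch`, `buggy_parse`, and `fixed_parse` are hypothetical names chosen for illustration, and a thread pool is assumed as the parallel backend.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batch(records, handler, max_workers=4):
    """Apply handler to every record in parallel (hypothetical helper).

    Returns (results, failures):
      results  - list of (record, output) for records that succeeded
      failures - list of (record, exception) for records that raised
    No record is dropped: each one ends up in exactly one of the two lists.
    """
    results, failures = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the record it was created from.
        futures = {pool.submit(handler, rec): rec for rec in records}
        for fut in as_completed(futures):
            rec = futures[fut]
            try:
                results.append((rec, fut.result()))
            except Exception as exc:
                # Keep the failing record so it can be reprocessed later.
                failures.append((rec, exc))
    return results, failures


def buggy_parse(rec):
    # First version of the handler: fails on thousands separators.
    return int(rec)


def fixed_parse(rec):
    # Corrected handler: strips thousands separators before parsing.
    return int(rec.replace(",", ""))


# First pass: one record fails and is collected instead of being lost.
results, failures = run_batch(["1", "2", "3,000"], buggy_parse)

# After the bug is fixed, reprocess only the records that failed.
retried, remaining = run_batch([rec for rec, _ in failures], fixed_parse)
```

Completion order is not guaranteed with `as_completed`, so callers that need ordered output should sort `results` or key it by a record identifier.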