While data size is a consideration, it is not the only one, and I'd like to offer an operational perspective here if I may. Most of our customers get hooked on the flexibility well before PB+ scale-out becomes a driver. When getting started, the "cost" of experimentation and mixed data set analytics, compared to conventional approaches, matters more than size. There is no schema to get "wrong", and no ELT or MDM is required before getting started (notice I didn't say there wouldn't be a need for those in the future, however), so the "cost" to get started is low and the price of failure is equally low. Anyway, something to consider.