When a table in a large-scale distributed data warehouse has no distribution key or partition strategy, the system cannot tell which node holds the rows a query needs. It typically has to scan the full table on every node and move rows across the network, either by broadcasting a copy of the table to all nodes or by shuffling (redistributing) rows between nodes. A distribution key is a column used to determine how data rows are physically spread across the multiple storage nodes (servers) in a cluster. A partition strategy is a method of dividing large tables into smaller, more mana....
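
As a rough illustration of how a distribution key decides row placement, the sketch below hashes a key column to choose a node. It is a toy model, not any particular warehouse's implementation: the function names `node_for` and `distribute`, the `customer_id` column, the use of CRC32 as the hash, and the four-node cluster are all assumptions made for the example.

```python
import zlib

def node_for(key, num_nodes):
    """Map a distribution-key value to a node index using a stable hash (CRC32 here, as an assumption)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes

def distribute(rows, key_column, num_nodes):
    """Place each row on the node chosen by hashing its distribution-key value."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[key_column], num_nodes)].append(row)
    return nodes

# Hypothetical table: 20 orders belonging to 5 customers.
orders = [{"order_id": i, "customer_id": i % 5} for i in range(20)]

# Distribute on customer_id across a hypothetical 4-node cluster.
placement = distribute(orders, "customer_id", num_nodes=4)

for index, node_rows in enumerate(placement):
    customers = sorted({row["customer_id"] for row in node_rows})
    print(f"node {index}: {len(node_rows)} rows, customers {customers}")
```

Because every row with the same `customer_id` hashes to the same node, a query that filters or joins on that column can be answered where the rows already live. Without such a key, the planner has no placement rule to rely on and falls back to moving data between nodes.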