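MIT's HiP system helps robots complete long-horizon goals by combining three foundation models: a large language model that breaks a goal into subtasks, a video diffusion model that predicts how each subtask would play out visually, and an egocentric action model that turns those predicted observations into actions. Iterative refinement between the models improves the plan at each step, with household and manufacturing tasks as target applications.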
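The toy sketch below illustrates how the three levels could hand work to one another, with feedback from the visual level flowing back up to the language level as one simple reading of "iterative refinement". It is not HiP's implementation: `language_model`, `video_model`, `action_model`, `hierarchical_plan`, the candidate plans, and the feasibility set are all hypothetical stand-ins for trained networks, chosen only to make the data flow runnable.

```python
from __future__ import annotations

# Hypothetical data standing in for what trained models would know.
CANDIDATE_PLANS = [
    ["pick up book", "teleport to shelf", "place book on shelf"],
    ["pick up book", "walk to shelf", "place book on shelf"],
]
FEASIBLE = {"pick up book", "walk to shelf", "place book on shelf"}


def language_model(goal: str, rejected: set[str]) -> list[str] | None:
    """Toy LLM stub: return the first candidate plan that avoids rejected subgoals.

    `goal` is unused here; a real language model would condition on it."""
    for candidate in CANDIDATE_PLANS:
        if not rejected.intersection(candidate):
            return candidate
    return None


def video_model(subgoal: str, frame: str) -> str | None:
    """Toy video-model stub: 'imagine' the next frame, or None if infeasible."""
    if subgoal not in FEASIBLE:
        return None
    return f"{frame} -> {subgoal}"


def action_model(frame: str, next_frame: str) -> str:
    """Toy egocentric action-model stub: infer the action linking two frames."""
    return "do(" + next_frame.rsplit(" -> ", 1)[-1] + ")"


def hierarchical_plan(goal: str, start_frame: str = "start", max_rounds: int = 3):
    """Refine the plan until every subgoal has a feasible visual rollout."""
    rejected: set[str] = set()
    for _ in range(max_rounds):
        subgoals = language_model(goal, rejected)
        if subgoals is None:
            return None
        frames = [start_frame]
        for subgoal in subgoals:
            next_frame = video_model(subgoal, frames[-1])
            if next_frame is None:
                rejected.add(subgoal)  # feedback flows back up to the LLM level
                break
            frames.append(next_frame)
        else:
            # All subgoals were feasible: convert consecutive frames into actions.
            actions = [action_model(a, b) for a, b in zip(frames, frames[1:])]
            return subgoals, actions
    return None


if __name__ == "__main__":
    print(hierarchical_plan("put the book on the shelf"))
```

In this sketch the first proposal fails at the "teleport to shelf" step, the rejection is fed back, and the second round returns the walk-based plan along with one action per subgoal.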