The upgraded model improves lip-sync precision, physical plausibility, and long-form video stability, and it supports multi-person interactions for commercial video production.