FlameF0X/CanoPy

@@ -27,6 +27,9 @@ It uses PPO (Proximal Policy Optimization) to learn 2v2 gameplay through self-pl
   - Ball velocity toward goal
   - Goal scoring reward
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6615494716917dfdc645c44e/1v9m5G8WSuJACQOs0AdDp.png)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6615494716917dfdc645c44e/WTXHjHXw1ZmmMvZEr_DI5.png)
 ## Training Configuration (from `config.json`)
 - **Number of processes:** 4