
`SwimmerEnv` (`gymnasium/envs/mujoco/swimmer_v5.py`), credits: Kallinteris-Andreas, Rushiv Arora.

## Description
This environment corresponds to the Swimmer environment described in Rémi Coulom's PhD thesis ["Reinforcement Learning Using Neural Networks, with Applications to Motor Control"](https://tel.archives-ouvertes.fr/tel-00003985/document).
The environment aims to increase the number of independent state and control variables compared to classical control environments.
The swimmers consist of three or more segments ('***links***') and one fewer articulation joints ('***rotors***'): each rotor joint connects exactly two links, forming a linear chain.
The swimmer is suspended in a two-dimensional pool and always starts in the same position (subject to some deviation drawn from a uniform distribution),
and the goal is to move as fast as possible towards the right by applying torque to the rotors and using fluid friction.

## Notes

The problem parameters are:
* *n*: number of body parts
* *m<sub>i</sub>*: mass of part *i* (*i* ∈ {1...n})
* *l<sub>i</sub>*: length of part *i* (*i* ∈ {1...n})
* *k*: viscous-friction coefficient

The default environment has *n* = 3, *l<sub>i</sub>* = 0.1, and *k* = 0.1.
It is possible to pass a custom MuJoCo XML file during construction to increase the number of links or to tweak any of the parameters.


## Action Space
```{figure} action_space_figures/swimmer.png
:name: swimmer
```

The action space is a `Box(-1, 1, (2,), float32)`. An action represents the torques applied between *links*.

| Num | Action                             | Control Min | Control Max | Name (in corresponding XML file) | Joint | Type (Unit)  |
|-----|------------------------------------|-------------|-------------|----------------------------------|-------|--------------|
| 0   | Torque applied on the first rotor  | -1          | 1           | motor1_rot                       | hinge | torque (N m) |
| 1   | Torque applied on the second rotor | -1          | 1           | motor2_rot                       | hinge | torque (N m) |
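
As a minimal sketch of what the bounds in the table mean, the snippet below keeps a candidate torque vector inside the `[-1, 1]` control range. The helper name `clip_action` is hypothetical and not part of Gymnasium, and it works on plain Python lists rather than on the environment's `Box` space.

```python
def clip_action(action, low=-1.0, high=1.0):
    """Clamp each torque to the control range listed in the table above.

    Hypothetical helper for illustration only; not a Gymnasium API.
    """
    if len(action) != 2:
        raise ValueError("the default Swimmer expects exactly 2 torques")
    return [min(max(float(a), low), high) for a in action]


# The first torque exceeds the upper bound and is clamped to 1.0.
clipped = clip_action([1.7, -0.25])
print(clipped)
```

Clamping on the caller's side is only one option; a policy can also be designed to emit values in range directly, for example through a `tanh` output layer.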


## Observation Space
The observation space consists of the following parts (in order):

- *qpos (3 elements by default):* Position values of the robot's body parts.
- *qvel (5 elements):* The velocities of these individual body parts (their derivatives).

By default, the observation does not include the x- and y-coordinates of the front tip.
These can be included by passing `exclude_current_positions_from_observation=False` during construction.
In this case, the observation space will be a `Box(-Inf, Inf, (10,), float64)`, where the first two observations are the x- and y-coordinates of the front tip.
Regardless of whether `exclude_current_positions_from_observation` is set to `True` or `False`, the x- and y-coordinates are returned in `info` with the keys `"x_position"` and `"y_position"`, respectively.

By default, however, the observation space is a `Box(-Inf, Inf, (8,), float64)` where the elements are as follows:

| Num | Observation                          | Min  | Max | Name (in corresponding XML file) | Joint | Type (Unit)              |
| --- | ------------------------------------ | ---- | --- | -------------------------------- | ----- | ------------------------ |
| 0   | angle of the front tip               | -Inf | Inf | free_body_rot                    | hinge | angle (rad)              |
| 1   | angle of the first rotor             | -Inf | Inf | motor1_rot                       | hinge | angle (rad)              |
| 2   | angle of the second rotor            | -Inf | Inf | motor2_rot                       | hinge | angle (rad)              |
| 3   | velocity of the tip along the x-axis | -Inf | Inf | slider1                          | slide | velocity (m/s)           |
| 4   | velocity of the tip along the y-axis | -Inf | Inf | slider2                          | slide | velocity (m/s)           |
| 5   | angular velocity of front tip        | -Inf | Inf | free_body_rot                    | hinge | angular velocity (rad/s) |
| 6   | angular velocity of first rotor      | -Inf | Inf | motor1_rot                       | hinge | angular velocity (rad/s) |
| 7   | angular velocity of second rotor     | -Inf | Inf | motor2_rot                       | hinge | angular velocity (rad/s) |
| excluded | position of the tip along the x-axis | -Inf | Inf | slider1                          | slide | position (m)           |
| excluded | position of the tip along the y-axis | -Inf | Inf | slider2                          | slide | position (m)           |
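
The layout described above can be sketched as follows. This is an illustration under stated assumptions, not the environment's code: `build_observation` is a hypothetical name, the inputs are plain lists standing in for the 5-element `qpos` and `qvel` vectors, and the sample values are made up.

```python
def build_observation(qpos, qvel, exclude_current_positions=True):
    """Concatenate positions and velocities in the order given by the table.

    Hypothetical helper: the first two qpos entries are the x- and
    y-coordinates of the front tip, which are dropped by default.
    """
    position = list(qpos)
    velocity = list(qvel)
    if exclude_current_positions:
        position = position[2:]
    return position + velocity


# Made-up state: x, y, tip angle, rotor 1 angle, rotor 2 angle.
qpos = [0.5, -0.2, 0.1, 0.0, 0.3]
qvel = [1.0, 0.0, 0.2, -0.1, 0.05]

obs_default = build_observation(qpos, qvel)
obs_full = build_observation(qpos, qvel, exclude_current_positions=False)
print(len(obs_default), len(obs_full))
```

With the default setting the observation has 8 entries and starts at the tip angle; with the flag set to `False` it has 10 entries and starts with the x- and y-coordinates.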


## Rewards
The total reward is: ***reward*** *=* *forward_reward - ctrl_cost*.

- *forward_reward*:
A reward for moving forward, which is positive when the Swimmer moves in the positive $x$ direction (to the right).
It is computed as $w_{forward} \times \frac{dx}{dt}$, where
$dx$ is the displacement of the (front) "tip" ($x_{after-action} - x_{before-action}$),
$dt$ is the time between actions, which is the product of the `frame_skip` parameter (default is $4$)
and `frametime`, which is $0.01$, so the default is $dt = 4 \times 0.01 = 0.04$,
and $w_{forward}$ is the `forward_reward_weight` (default is $1$).
- *ctrl_cost*:
A penalty subtracted from the reward to discourage the Swimmer from taking actions that are too large.
It is computed as $w_{control} \times \|action\|_2^2$,
where $w_{control}$ is `ctrl_cost_weight` (default is $10^{-4}$).
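
The two terms above can be worked through numerically. The sketch below is a plain-Python restatement of the formulas with the documented defaults; the function name `swimmer_reward` and the sample displacement and action are assumptions made for illustration.

```python
def swimmer_reward(
    x_before,
    x_after,
    action,
    frame_skip=4,
    frametime=0.01,
    forward_reward_weight=1.0,
    ctrl_cost_weight=1e-4,
):
    """Return (reward, forward_reward, ctrl_cost) for one step.

    Hypothetical helper that restates the formulas in this section.
    """
    dt = frame_skip * frametime                   # time between actions
    x_velocity = (x_after - x_before) / dt        # dx / dt
    forward_reward = forward_reward_weight * x_velocity
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)  # squared L2 norm
    return forward_reward - ctrl_cost, forward_reward, ctrl_cost


# The tip moves 0.02 m to the right in one step while both rotors apply full torque.
reward, forward_reward, ctrl_cost = swimmer_reward(0.0, 0.02, [1.0, -1.0])
print(reward, forward_reward, ctrl_cost)
```

With these inputs the forward term is about 0.5 and the control cost about 0.0002, which shows how small the default penalty is relative to even a modest forward velocity.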

`info` contains the individual reward terms.


## Starting State
The initial position state is $\mathcal{U}_{[-reset\_noise\_scale \times I_{5}, reset\_noise\_scale \times I_{5}]}$.
The initial velocity state is $\mathcal{U}_{[-reset\_noise\_scale \times I_{5}, reset\_noise\_scale \times I_{5}]}$.

where $\mathcal{U}$ is the multivariate uniform continuous distribution.
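
A minimal sketch of this sampling, using only the standard library: each of the 5 position and 5 velocity entries receives independent uniform noise in `[-reset_noise_scale, reset_noise_scale]`. The function name `sample_initial_state` is hypothetical, and the sketch assumes the noise is added to a nominal state that is all zeros.

```python
import random


def sample_initial_state(reset_noise_scale=0.1, size=5, seed=None):
    """Draw an initial (qpos, qvel) pair with uniform noise.

    Hypothetical helper; assumes a nominal state of all zeros.
    """
    rng = random.Random(seed)
    low, high = -reset_noise_scale, reset_noise_scale
    qpos = [0.0 + rng.uniform(low, high) for _ in range(size)]
    qvel = [0.0 + rng.uniform(low, high) for _ in range(size)]
    return qpos, qvel


init_qpos, init_qvel = sample_initial_state(reset_noise_scale=0.1, seed=0)
print(init_qpos)
print(init_qvel)
```

Passing a seed makes the draw reproducible, which mirrors the usual practice of seeding an environment at reset.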


## Episode End
### Termination
The Swimmer never terminates.

### Truncation
The default duration of an episode is 1000 timesteps.
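
The interplay between the two end conditions can be sketched with a stand-in loop. Nothing here calls the real environment: `run_episode` and its `policy` argument are hypothetical, and the loop only models the step counter that ends an episode by truncation.

```python
def run_episode(policy, max_episode_steps=1000):
    """Count steps until the episode ends; return (steps, terminated, truncated).

    Hypothetical stand-in for an interaction loop, for illustration only.
    """
    steps = 0
    terminated = False  # the Swimmer never terminates
    truncated = False
    while not (terminated or truncated):
        policy(steps)  # a real loop would pass this action to the environment
        steps += 1
        truncated = steps >= max_episode_steps
    return steps, terminated, truncated


result = run_episode(lambda t: [0.0, 0.0])
print(result)
```

Because termination never fires, every episode in this model runs for the full step budget and ends with `truncated` set.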


## Arguments
Swimmer provides a range of parameters to modify the observation space, reward function, and initial state.
These parameters can be applied during `gymnasium.make` in the following way:

```python
import gymnasium as gym
env = gym.make('Swimmer-v5', xml_file=...)
```

| Parameter                                  | Type      | Default       |Description                                                                                                                                                                                                  |
|--------------------------------------------| --------- |-------------- |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|`xml_file`                                  | **str**   |`"swimmer.xml"`| Path to a MuJoCo model                                                                                                                                                                                      |
|`forward_reward_weight`                     | **float** | `1`           | Weight for _forward_reward_ term (see `Rewards` section)                                                                                                                                                    |
|`ctrl_cost_weight`                          | **float** | `1e-4`        | Weight for _ctrl_cost_ term (see `Rewards` section)                                                                                                                                                         |
|`reset_noise_scale`                         | **float** | `0.1`         | Scale of random perturbations of initial position and velocity (see `Starting State` section)                                                                                                               |
|`exclude_current_positions_from_observation`| **bool**  | `True`        | Whether or not to omit the x- and y-coordinates from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies (see `Observation Space` section) |
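
As a hedged sketch of combining several of these parameters, the snippet below collects them in a dictionary and forwards them to `gymnasium.make`. The specific values are arbitrary choices for illustration, not recommendations, and the `try`/`except` guard is there because creating the environment requires `gymnasium` and `mujoco` to be installed.

```python
# Illustrative values only; the parameter names come from the table above.
swimmer_kwargs = {
    "forward_reward_weight": 2.0,
    "ctrl_cost_weight": 1e-3,
    "reset_noise_scale": 0.05,
    "exclude_current_positions_from_observation": False,
}

try:
    import gymnasium as gym

    env = gym.make("Swimmer-v5", **swimmer_kwargs)
    observation, info = env.reset(seed=0)
    print(observation.shape)
    env.close()
except Exception as exc:  # e.g. gymnasium or mujoco is not installed
    print(f"Swimmer-v5 could not be created here: {exc}")
```

Keeping the overrides in one dictionary makes it easy to log the exact configuration alongside experiment results.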


## Version History
* v5:
    - Minimum `mujoco` version is now 2.3.3.
    - Added support for fully custom/third party `mujoco` models using the `xml_file` argument (previously only a few changes could be made to the existing models).
    - Added `default_camera_config` argument, a dictionary for setting the `mj_camera` properties, mainly useful for custom environments.
    - Added `env.observation_structure`, a dictionary for specifying the observation space composition (e.g. `qpos`, `qvel`), useful for building tooling and wrappers for the MuJoCo environments.
    - Return a non-empty `info` with `reset()`; previously an empty dictionary was returned. The new keys are the same state information as `step()`.
    - Added `frame_skip` argument, used to configure the `dt` (duration of `step()`); the default varies by environment, so check the environment documentation pages.
    - Restored the `xml_file` argument (was removed in `v4`).
    - Added `forward_reward_weight`, `ctrl_cost_weight`, to configure the reward function (defaults are effectively the same as in `v4`).
    - Added `reset_noise_scale` argument to set the range of initial states.
    - Added `exclude_current_positions_from_observation` argument.
    - Replaced `info["reward_fwd"]` and `info["forward_reward"]` with `info["reward_forward"]` to be consistent with the other environments.
* v4: All MuJoCo environments now use the MuJoCo bindings in mujoco >= 2.1.3.
* v3: Support for `gymnasium.make` kwargs such as `xml_file`, `ctrl_cost_weight`, `reset_noise_scale`, etc. rgb rendering comes from tracking camera (so agent does not run away from screen). Moved to the [gymnasium-robotics repo](https://github.com/Farama-Foundation/gymnasium-robotics).
* v2: All continuous control environments now use mujoco-py >= 1.50. Moved to the [gymnasium-robotics repo](https://github.com/Farama-Foundation/gymnasium-robotics).
* v1: max_time_steps raised to 1000 for robot based tasks. Added reward_threshold to environments.
* v0: Initial version release.