Credits: Kallinteris-Andreas, Rushiv Arora

## Description
This environment corresponds to the Swimmer environment described in Rémi Coulom's PhD thesis ["Reinforcement Learning Using Neural Networks, with Applications to Motor Control"](https://tel.archives-ouvertes.fr/tel-00003985/document).
The environment aims to increase the number of independent state and control variables compared to classical control environments.
The swimmers consist of three or more segments ('***links***') connected by one fewer articulation joints ('***rotors***'). Each rotor joint connects exactly two links, so the links form a linear chain.
The swimmer is suspended in a two-dimensional pool and always starts in the same position (subject to some deviation drawn from a uniform distribution). The goal is to move as fast as possible towards the right by applying torque to the rotors and using fluid friction.

## Notes

The problem parameters are:

* *n*: number of body parts
* *m_i*: mass of part *i* (*i* ∈ {1...n})
* *l_i*: length of part *i* (*i* ∈ {1...n})
* *k*: viscous-friction coefficient

While the default environment has *n* = 3, *l_i* = 0.1, and *k* = 0.1, it is possible to pass a custom MuJoCo XML file during construction to increase the number of links or to tweak any of the parameters.

## Action Space
```{figure} action_space_figures/swimmer.png
:name: swimmer
```

The action space is a `Box(-1, 1, (2,), float32)`.
An action represents the torques applied between *links*.

| Num | Action                             | Control Min | Control Max | Name (in corresponding XML file) | Joint | Type (Unit)  |
|-----|------------------------------------|-------------|-------------|----------------------------------|-------|--------------|
| 0   | Torque applied on the first rotor  | -1          | 1           | motor1_rot                       | hinge | torque (N m) |
| 1   | Torque applied on the second rotor | -1          | 1           | motor2_rot                       | hinge | torque (N m) |

## Observation Space
The observation space consists of the following parts (in order):

- *qpos (3 elements by default):* Position values of the robot's body parts.
- *qvel (5 elements):* The velocities of these individual body parts (their derivatives).

By default, the observation does not include the x- and y-coordinates of the front tip.
These can be included by passing `exclude_current_positions_from_observation=False` during construction.
In this case, the observation space will be a `Box(-Inf, Inf, (10,), float64)`, where the first two observations are the x- and y-coordinates of the front tip.
Regardless of whether `exclude_current_positions_from_observation` is set to `True` or `False`, the x- and y-coordinates are returned in `info` with the keys `"x_position"` and `"y_position"`, respectively.
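The ordering of the observation parts can be illustrated with a minimal sketch in plain Python. `build_observation` is a hypothetical helper (not part of the Gymnasium API) that works on lists rather than MuJoCo's `data.qpos` and `data.qvel` arrays, and it assumes the joint order of the default `swimmer.xml`: `slider1`, `slider2`, `free_body_rot`, `motor1_rot`, `motor2_rot`.

```python
from typing import Sequence


def build_observation(
    qpos: Sequence[float],
    qvel: Sequence[float],
    exclude_current_positions_from_observation: bool = True,
) -> list[float]:
    """Flatten joint positions and velocities into one observation vector."""
    position = list(qpos)
    if exclude_current_positions_from_observation:
        # Drop slider1/slider2, i.e. the x- and y-coordinates of the front tip.
        position = position[2:]
    # Positions come first, followed by all velocities.
    return position + list(qvel)


# Assumed order: slider1, slider2, free_body_rot, motor1_rot, motor2_rot.
qpos = [0.5, -0.2, 0.1, 0.3, -0.3]
qvel = [1.0, 0.0, 0.2, -0.1, 0.4]

obs_default = build_observation(qpos, qvel)
obs_full = build_observation(qpos, qvel, exclude_current_positions_from_observation=False)
print(len(obs_default), len(obs_full))  # 8 10
```

With the default setting the three remaining positions (tip angle and two rotor angles) are followed by all five velocities, which gives the 8-element layout; keeping the tip coordinates gives 10 elements.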

By default, the observation space is a `Box(-Inf, Inf, (8,), float64)`, where the elements are as follows:

| Num      | Observation                          | Min  | Max | Name (in corresponding XML file) | Joint | Type (Unit)              |
| -------- | ------------------------------------ | ---- | --- | -------------------------------- | ----- | ------------------------ |
| 0        | angle of the front tip               | -Inf | Inf | free_body_rot                    | hinge | angle (rad)              |
| 1        | angle of the first rotor             | -Inf | Inf | motor1_rot                       | hinge | angle (rad)              |
| 2        | angle of the second rotor            | -Inf | Inf | motor2_rot                       | hinge | angle (rad)              |
| 3        | velocity of the tip along the x-axis | -Inf | Inf | slider1                          | slide | velocity (m/s)           |
| 4        | velocity of the tip along the y-axis | -Inf | Inf | slider2                          | slide | velocity (m/s)           |
| 5        | angular velocity of front tip        | -Inf | Inf | free_body_rot                    | hinge | angular velocity (rad/s) |
| 6        | angular velocity of first rotor      | -Inf | Inf | motor1_rot                       | hinge | angular velocity (rad/s) |
| 7        | angular velocity of second rotor     | -Inf | Inf | motor2_rot                       | hinge | angular velocity (rad/s) |
| excluded | position of the tip along the x-axis | -Inf | Inf | slider1                          | slide | position (m)             |
| excluded | position of the tip along the y-axis | -Inf | Inf | slider2                          | slide | position (m)             |

## Rewards
The total reward is: ***reward*** *=* *forward_reward - ctrl_cost*.

- *forward_reward*:
  A reward for moving forward; it is positive if the Swimmer moves forward (in the positive $x$ direction, i.e. to the right).
  $w_{forward} \times \frac{dx}{dt}$, where
  $dx$ is the displacement of the (front) "tip" ($x_{after-action} - x_{before-action}$),
  $dt$ is the time between actions, which is the product of the `frame_skip` parameter (default is $4$) and the `frametime` of $0.01$, so the default is $dt = 4 \times 0.01 = 0.04$,
  $w_{forward}$ is the `forward_reward_weight` (default is $1$).
- *ctrl_cost*:
  A negative reward to penalize the Swimmer for taking actions that are too large.
  $w_{control} \times \|action\|_2^2$,
  where $w_{control}$ is `ctrl_cost_weight` (default is $10^{-4}$).

`info` contains the individual reward terms.

## Starting State
The initial position state is $\mathcal{U}_{[-reset\_noise\_scale \times I_{5}, reset\_noise\_scale \times I_{5}]}$.
The initial velocity state is $\mathcal{U}_{[-reset\_noise\_scale \times I_{5}, reset\_noise\_scale \times I_{5}]}$.

Here, $\mathcal{U}$ is the multivariate uniform continuous distribution.

## Episode End
### Termination
The Swimmer never terminates.

### Truncation
The default duration of an episode is 1000 timesteps.

## Arguments
Swimmer provides a range of parameters to modify the observation space, reward function, and initial state.
These parameters can be applied during `gymnasium.make` in the following way:

```python
import gymnasium as gym
env = gym.make('Swimmer-v5', xml_file=...)
```

| Parameter                                    | Type      | Default         | Description |
|----------------------------------------------|-----------|-----------------|-------------|
| `xml_file`                                   | **str**   | `"swimmer.xml"` | Path to a MuJoCo model |
| `forward_reward_weight`                      | **float** | `1`             | Weight for _forward_reward_ term (see `Rewards` section) |
| `ctrl_cost_weight`                           | **float** | `1e-4`          | Weight for _ctrl_cost_ term (see `Rewards` section) |
| `reset_noise_scale`                          | **float** | `0.1`           | Scale of random perturbations of initial position and velocity (see `Starting State` section) |
| `exclude_current_positions_from_observation` | **bool**  | `True`          | Whether or not to omit the x- and y-coordinates from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies (see `Observation Space` section) |

## Version History
* v5:
  - Minimum `mujoco` version is now 2.3.3.
  - Added support for fully custom/third party `mujoco` models using the `xml_file` argument (previously only a few changes could be made to the existing models).
  - Added `default_camera_config` argument, a dictionary for setting the `mj_camera` properties, mainly useful for custom environments.
  - Added `env.observation_structure`, a dictionary for specifying the composition of the observation space (e.g. `qpos`, `qvel`), useful for building tooling and wrappers for the MuJoCo environments.
  - Return a non-empty `info` with `reset()`; previously an empty dictionary was returned. The new keys are the same state information as `step()`.
  - Added `frame_skip` argument, used to configure the `dt` (duration of `step()`); the default varies by environment, so check the environment documentation pages.
  - Restored the `xml_file` argument (was removed in `v4`).
  - Added `forward_reward_weight` and `ctrl_cost_weight` arguments to configure the reward function (defaults are effectively the same as in `v4`).
  - Added `reset_noise_scale` argument to set the range of initial states.
  - Added `exclude_current_positions_from_observation` argument.
  - Replaced `info["reward_fwd"]` and `info["forward_reward"]` with `info["reward_forward"]` to be consistent with the other environments.
* v4: All MuJoCo environments now use the MuJoCo bindings in mujoco >= 2.1.3.
* v3: Support for `gymnasium.make` kwargs such as `xml_file`, `ctrl_cost_weight`, `reset_noise_scale`, etc. RGB rendering comes from a tracking camera (so the agent does not run away from the screen). Moved to the [gymnasium-robotics repo](https://github.com/Farama-Foundation/gymnasium-robotics).
* v2: All continuous control environments now use mujoco-py >= 1.50. Moved to the [gymnasium-robotics repo](https://github.com/Farama-Foundation/gymnasium-robotics).
* v1: max_time_steps raised to 1000 for robot based tasks. Added reward_threshold to environments.
* v0: Initial version release.
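The reward terms from the `Rewards` section can be sketched in plain Python as follows. `swimmer_reward` is a hypothetical helper, not the environment's own method; `timestep=0.01` stands in for the `frametime` mentioned there. `reward_forward` follows the `info` key named in the v5 notes, while `reward_ctrl` (stored as the negated control cost) is an assumed name for the second term.

```python
from typing import Sequence


def swimmer_reward(
    x_before: float,
    x_after: float,
    action: Sequence[float],
    frame_skip: int = 4,
    timestep: float = 0.01,
    forward_reward_weight: float = 1.0,
    ctrl_cost_weight: float = 1e-4,
) -> tuple[float, dict[str, float]]:
    """Return (reward, reward_terms) for one step (hypothetical sketch)."""
    dt = frame_skip * timestep  # 4 * 0.01 = 0.04 with the defaults
    x_velocity = (x_after - x_before) / dt  # dx / dt of the front tip
    forward_reward = forward_reward_weight * x_velocity
    # Squared L2 norm of the action, scaled by w_control.
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    reward = forward_reward - ctrl_cost
    # Assumed key names; the control term is reported as a negative value.
    return reward, {"reward_forward": forward_reward, "reward_ctrl": -ctrl_cost}


reward, terms = swimmer_reward(x_before=0.0, x_after=0.002, action=[1.0, -1.0])
print(round(reward, 6), terms)
```

In this usage example the tip moves 0.002 m over $dt = 0.04$ s, so the forward term is about 0.05, and two saturated torques cost $10^{-4} \times 2$, which leaves a total of about 0.0498.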
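The `Starting State` distribution can be sketched as independent uniform noise added to a nominal state. `sample_initial_state` and its `init_qpos`/`init_qvel` parameters are hypothetical; with an all-zero nominal state of five positions and five velocities, each sample follows the $\mathcal{U}$ distribution given in that section, with `reset_noise_scale` setting the half-width of the interval.

```python
import random
from typing import Optional, Sequence


def sample_initial_state(
    init_qpos: Sequence[float],
    init_qvel: Sequence[float],
    reset_noise_scale: float = 0.1,
    rng: Optional[random.Random] = None,
) -> tuple[list[float], list[float]]:
    """Perturb each coordinate by noise drawn from U[-scale, scale] (sketch)."""
    if rng is None:
        rng = random.Random()
    low, high = -reset_noise_scale, reset_noise_scale
    # Positions and velocities are perturbed independently, coordinate by coordinate.
    qpos = [q + rng.uniform(low, high) for q in init_qpos]
    qvel = [v + rng.uniform(low, high) for v in init_qvel]
    return qpos, qvel


# Hypothetical nominal state: five positions and five velocities, all zero.
qpos, qvel = sample_initial_state([0.0] * 5, [0.0] * 5, rng=random.Random(0))
print(qpos, qvel)
```

Passing a seeded `random.Random` makes the draw reproducible, which mirrors the idea of seeding an environment before `reset()`.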