
    hMC                         S r SSKrSSKJrJrJr  SSKrSSKJrJ	r	  SSK
Jr  SSKJr  Sr/ SQrS	rS
r " S S\5      rS rSS jrS rg)zclassic Acrobot task    N)cospisin)Envspaces)utils)DependencyNotInstalledz,Copyright 2013, RLPy http://acl.mit.edu/RLPy)zAlborz GeramifardzRobert H. KleinzChristoph DannzWilliam DabneyzJonathan P. HowzBSD 3-ClausezChristoph Dann <cdann@cdann.de>c                      ^  \ rS rSrSrSS/SS.rSrSrSrSr	Sr
S	rS	rSrS
\-  rS\-  r/ SQrSrSrSrSrSrSrSS\S-  4S jjrSSS.S\S-  S\S-  4U 4S jjjrS rS rS rS r S r!S r"Sr#U =r$$ ) 
AcrobotEnv   u5  
## Description

The Acrobot environment is based on Sutton's work in
["Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding"](https://papers.nips.cc/paper/1995/hash/8f1d43620bc6bb580df6e80b0dc05c48-Abstract.html)
and [Sutton and Barto's book](http://www.incompleteideas.net/book/the-book-2nd.html).
The system consists of two links connected linearly to form a chain, with one end of
the chain fixed. The joint between the two links is actuated. The goal is to apply
torques on the actuated joint to swing the free end of the linear chain above a
given height while starting from the initial state of hanging downwards.

As seen in the **Gif**, the system is drawn as two blue links connected by two green joints. The joint
between the two links is actuated. The goal is to swing the free end of the outer link
up to the target height (the black horizontal line above the system) by applying torque at
the actuator.

## Action Space

The action is discrete, deterministic, and represents the torque applied on the actuated
joint between the two links.

| Num | Action                                | Unit         |
|-----|---------------------------------------|--------------|
| 0   | apply -1 torque to the actuated joint | torque (N m) |
| 1   | apply 0 torque to the actuated joint  | torque (N m) |
| 2   | apply 1 torque to the actuated joint  | torque (N m) |
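
The table above amounts to a lookup from action index to torque. The following is a minimal sketch of that mapping, not the environment's own code: `AVAIL_TORQUE` mirrors the class attribute of the same name, while `action_to_torque` is a hypothetical helper introduced only for illustration.

```python
# Torque (N m) applied for actions 0, 1, and 2, as listed in the table.
AVAIL_TORQUE = [-1.0, 0.0, 1.0]


def action_to_torque(action: int) -> float:
    """Hypothetical helper: return the torque for a discrete action."""
    if action not in (0, 1, 2):
        raise ValueError(f"invalid action: {action!r}")
    return AVAIL_TORQUE[action]
```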

## Observation Space

The observation is an `ndarray` with shape `(6,)` that provides information about the
two rotational joint angles as well as their angular velocities:

| Num | Observation                  | Min                 | Max               |
|-----|------------------------------|---------------------|-------------------|
| 0   | Cosine of `theta1`           | -1                  | 1                 |
| 1   | Sine of `theta1`             | -1                  | 1                 |
| 2   | Cosine of `theta2`           | -1                  | 1                 |
| 3   | Sine of `theta2`             | -1                  | 1                 |
| 4   | Angular velocity of `theta1` | ~ -12.567 (-4 * pi) | ~ 12.567 (4 * pi) |
| 5   | Angular velocity of `theta2` | ~ -28.274 (-9 * pi) | ~ 28.274 (9 * pi) |

where
- `theta1` is the angle of the first joint, where an angle of 0 indicates the first link is pointing directly
downwards.
- `theta2` is ***relative to the angle of the first link.***
    An angle of 0 means the second link is aligned with the first link.

The angular velocities of `theta1` and `theta2` are bounded at ±4π and ±9π rad/s, respectively.
A state of `[1, 0, 1, 0, ..., ...]` indicates that both links are pointing downwards.
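
As a rough illustration of how the underlying state maps to the observation described above, the sketch below builds the six-element observation from the two angles and their velocities. `state_to_observation` is a hypothetical name, and a plain list stands in for the `float32` `ndarray` the environment returns.

```python
from math import cos, sin


def state_to_observation(theta1, theta2, dtheta1, dtheta2):
    """Hypothetical helper: encode the 4-element state as the 6-element observation."""
    return [
        cos(theta1),  # 0: cosine of theta1
        sin(theta1),  # 1: sine of theta1
        cos(theta2),  # 2: cosine of theta2
        sin(theta2),  # 3: sine of theta2
        dtheta1,      # 4: angular velocity of theta1
        dtheta2,      # 5: angular velocity of theta2
    ]


# Both links hanging straight down and at rest.
print(state_to_observation(0.0, 0.0, 0.0, 0.0))  # [1.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```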

## Rewards

The goal is to have the free end reach a designated target height in as few steps as possible,
and as such all steps that do not reach the goal incur a reward of -1.
Achieving the target height results in termination with a reward of 0. The reward threshold is -100.

## Starting State

Each parameter in the underlying state (`theta1`, `theta2`, and the two angular velocities) is initialized
uniformly between -0.1 and 0.1. This means both links are pointing downwards with some initial stochasticity.
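
The initialization can be sketched as four independent uniform draws. This is an illustration under stated assumptions: it uses the standard-library `random.Random` in place of the environment's NumPy-based random generator, and `sample_initial_state` is a hypothetical helper.

```python
import random


def sample_initial_state(rng, low=-0.1, high=0.1):
    """Hypothetical helper: draw theta1, theta2, dtheta1, dtheta2 uniformly in [low, high]."""
    return [rng.uniform(low, high) for _ in range(4)]


rng = random.Random(0)  # seeded stdlib generator, assumed here for reproducibility
state = sample_initial_state(rng)
```

Passing wider bounds, for example `low=-0.2, high=0.2`, corresponds to what the `options` argument of `reset` allows.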

## Episode End

The episode ends if one of the following occurs:
1. Termination: The free end reaches the target height, which is constructed as:
`-cos(theta1) - cos(theta2 + theta1) > 1.0`
2. Truncation: Episode length is greater than 500 (200 for v0)
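
The termination condition and the per-step reward can be written down directly. The sketch below is illustrative only; `is_terminal` and `step_reward` are hypothetical names, not functions exported by the environment.

```python
from math import cos, pi


def is_terminal(theta1, theta2):
    """True when the free end is above the target height."""
    return -cos(theta1) - cos(theta2 + theta1) > 1.0


def step_reward(terminated):
    """Reward is 0 on the step that reaches the goal and -1 otherwise."""
    return 0.0 if terminated else -1.0


# Hanging straight down: the height term is -2, so the episode continues.
assert not is_terminal(0.0, 0.0)
# Both links pointing straight up: the height term is 2, so the goal is reached.
assert is_terminal(pi, 0.0)
```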

## Arguments

Acrobot only has `render_mode` as a keyword for `gymnasium.make`.
On reset, the `options` parameter allows the user to change the bounds used to determine the new random state.

```python
>>> import gymnasium as gym
>>> env = gym.make('Acrobot-v1', render_mode="rgb_array")
>>> env
<TimeLimit<OrderEnforcing<PassiveEnvChecker<AcrobotEnv<Acrobot-v1>>>>>
>>> env.reset(seed=123, options={"low": -0.2, "high": 0.2})  # default low=-0.1, high=0.1
(array([ 0.997341  ,  0.07287608,  0.9841162 , -0.17752565, -0.11185605,
       -0.12625128], dtype=float32), {})

```

By default, the dynamics of the acrobot follow those described in Sutton and Barto's book
[Reinforcement Learning: An Introduction](http://incompleteideas.net/book/11/node4.html).
However, a `book_or_nips` parameter can be modified to change the pendulum dynamics to those described
in the original [NeurIPS paper](https://papers.nips.cc/paper/1995/hash/8f1d43620bc6bb580df6e80b0dc05c48-Abstract.html).

```python
# To change the dynamics as described above
env.unwrapped.book_or_nips = 'nips'
```

See the following note for details:

> The dynamics equations were missing some terms in the NIPS paper which are present in the book.
  R. Sutton confirmed in personal correspondence that the experimental results shown in the paper and the book were
  generated with the equations shown in the book. However, there is the option to run the domain with the paper equations
  by setting `book_or_nips = 'nips'`.

## Version History

- v1: Maximum number of steps increased from 200 to 500. The observation space for v0 provided direct readings of
`theta1` and `theta2` in radians, having a range of `[-pi, pi]`. The v1 observation space as described here provides the
sine and cosine of each angle instead.
- v0: Initial version release

## References
- Sutton, R. S. (1996). Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding.
    In D. Touretzky, M. C. Mozer, & M. Hasselmo (Eds.), Advances in Neural Information Processing Systems (Vol. 8).
    MIT Press. https://proceedings.neurips.cc/paper/1995/file/8f1d43620bc6bb580df6e80b0dc05c48-Paper.pdf
- Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction. The MIT Press.
human	rgb_array   )render_modes
render_fps皙?      ?g      ?   	   )                 r   i  bookN   render_modec                 N   Xl         S U l        S U l        SU l        [        R
                  " SSSSU R                  U R                  /[        R                  S9nU* n[        R                  " X2[        R                  S9U l        [        R                  " S5      U l        S U l        g )NTr   dtype)lowhighr   r   )r   screenclockisopennparray	MAX_VEL_1	MAX_VEL_2float32r   Boxobservation_spaceDiscreteaction_spacestate)selfr   r    r   s       `/home/james-whalen/.local/lib/python3.13/site-packages/gymnasium/envs/classic_control/acrobot.py__init__AcrobotEnv.__init__   s    &
xx#sC@


 e!'bjj!Q"OOA.
    )seedoptionsr3   r4   c                .  > [         TU ]  US9  [        R                  " USS5      u  p4U R                  R                  X4SS9R                  [        R                  5      U l	        U R                  S:X  a  U R                  5         U R                  5       0 4$ )N)r3   皙皙?)r   )r   r    sizer   )superresetr   maybe_parse_reset_bounds	np_randomuniformastyper$   r(   r-   r   render_get_ob)r.   r3   r4   r   r    	__class__s        r/   r:   AcrobotEnv.reset   s    4  22T3
	 ^^++T+JQQJJ

 w&KKM||~r!!r2   c                    U R                   nUc   S5       eU R                  U   nU R                  S:  a3  X0R                  R	                  U R                  * U R                  5      -  n[
        R                  " X#5      n[        U R                  USU R                  /5      n[        US   [        * [        5      US'   [        US   [        * [        5      US'   [        US   U R                  * U R                  5      US'   [        US   U R                  * U R                  5      US'   XPl         U R                  5       nU(       d  SOSnU R                   S:X  a  U R#                  5         U R%                  5       XvS	0 4$ )
N*Call reset before using AcrobotEnv object.r   r      r   r   r   r   F)r-   AVAIL_TORQUEtorque_noise_maxr<   r=   r$   appendrk4_dsdtdtwrapr   boundr&   r'   	_terminalr   r?   r@   )r.   astorques_augmentedns
terminatedrewards           r/   stepAcrobotEnv.step   sH   JJ}JJJ}""1%   1$nn,,&&&(=(= F ii*[1dgg,7RURC$1RURC$1bednn_dnn=1bednn_dnn=1
^^%
'Sw&KKM||~v5"<<r2   c           	          U R                   nUc   S5       e[        R                  " [        US   5      [	        US   5      [        US   5      [	        US   5      US   US   /[        R
                  S9$ )NrD   r   r   rE   r   r   )r-   r$   r%   r   r   r(   r.   rP   s     r/   r@   AcrobotEnv._get_ob   sj    JJ}JJJ}xx1YAaD	3qt9c!A$i1qtDBJJ
 	
r2   c                     U R                   nUc   S5       e[        [        US   5      * [        US   US   -   5      -
  S:  5      $ )NrD   r   r   r   )r-   boolr   rY   s     r/   rN   AcrobotEnv._terminal   sK    JJ}JJJ}S1YJQqTAaD[!11C788r2   c                 ~   U R                   nU R                  nU R                  nU R                  nU R                  nU R
                  nU R
                  nSn	US   n
US S nUS   nUS   nUS   nUS   nX%S-  -  X4S-  US-  -   SU-  U-  [        U5      -  -   -  -   U-   U-   nX6S-  XF-  [        U5      -  -   -  U-   nX6-  U	-  [        X-   [        S-  -
  5      -  nU* U-  U-  US-  -  [        U5      -  SU-  U-  U-  U-  U-  [        U5      -  -
  X%-  X4-  -   U	-  [        U[        S-  -
  5      -  -   U-   nU R                  S:X  a#  U
UU-  U-  -   U-
  X6S-  -  U-   US-  U-  -
  -  nO<U
UU-  U-  -   X4-  U-  US-  -  [        U5      -  -
  U-
  X6S-  -  U-   US-  U-  -
  -  nUU-  U-   * U-  nXUUS	4$ )
Ng#@r   r   rE   r          @nipsr   )
LINK_MASS_1LINK_MASS_2LINK_LENGTH_1LINK_COM_POS_1LINK_COM_POS_2LINK_MOIr   r   r   book_or_nips)r.   rR   m1m2l1lc1lc2I1I2grO   rP   theta1theta2dtheta1dtheta2d1d2phi2phi1ddtheta2ddtheta1s                         r/   rJ   AcrobotEnv._dsdt   sO   !!!!]]]]O11A$A$q&[2Qa!b&3,V2L!LMMPRRUWW6BHs6{223b8x!|c&/BH"<==C"HsNWaZ'#f+5"frkC')G3c&kABx"'!Q&Vb1f_)==>  	 & BGdN*T1b6kB6FQQS6STH
 BGdN"RWs]WaZ%?#f+%MMPTT1fr!BEBJ.0H (]T)*R/8S88r2   c           
      ^
   U R                   cG  U R                  c   e[        R                  R	                  SU R                  R
                   S35        g  SS KnSSKJn  U R                  c  UR                  5         U R                   S:X  aQ  UR                  R                  5         UR                  R                  U R                  U R                  45      U l
        O,UR                  U R                  U R                  45      U l
        U R                   c  UR"                  R%                  5       U l        UR                  U R                  U R                  45      nUR'                  S5        U R(                  nU R*                  U R,                  -   S-   nU R                  US	-  -  nU R                  S	-  nUc  g U R*                  * [/        US   5      -  U-  U R*                  [1        US   5      -  U-  /n	U	S   U R,                  [/        US   US
   -   5      -  U-  -
  U	S
   U R,                  [1        US   US
   -   5      -  U-  -   /n
[2        R4                  " SS/X/5      S S 2S S S24   nUS   [6        S	-  -
  US   US
   -   [6        S	-  -
  /nU R*                  U-  U R,                  U-  /nUR8                  R;                  USU-  U-   S
U-  U-   4SU-  U-   S
U-  U-   4SS9  [=        XU5       GH  u  u  pnnX-   nX-   nSUSU-  SU-  4u  nnnnUU4UU4UU4UU4/n/ nU HN  nUR>                  RA                  U5      RC                  U5      nUS   U-   US
   U-   4nURE                  U5        MP     URG                  UUS5        URI                  UUS5        URK                  U[M        U5      [M        U5      [M        SU-  5      S5        URO                  U[M        U5      [M        U5      [M        SU-  5      S5        GM     URP                  RS                  USS5      nU R                  RU                  US5        U R                   S:X  a]  URV                  RY                  5         U R                   R[                  U R\                  S   5        UR                  RS                  5         g U R                   S:X  aL  [2        R^                  " [2        R4                  " UR`                  Rc                  U R                  5      5      SS9$ g ! [         a  n[        S5      UeS nAff = f)NzYou are calling render method without specifying any render mode. You can specify the render_mode at initialization, e.g. gym.make("z", render_mode="rgb_array")r   )gfxdrawzGpygame is not installed, run `pip install "gymnasium[classic-control]"`r   )   r~   r~   r   rE   r   r_   gg@)r   r   r   )	start_posend_poscolorr7   r6   )r      r   )r   r   r   FT)r   r   r   r   )r   r   rE   )axes)2r   specgymloggerwarnidpygamer}   ImportErrorr	   r!   initdisplayset_mode
SCREEN_DIMSurfacer"   timeClockfillr-   rd   LINK_LENGTH_2r   r   r$   r%   r   drawlinezipmathVector2
rotate_radrH   	aapolygonfilled_polygonaacircleintfilled_circle	transformflipbliteventpumptickmetadata	transpose	surfarraypixels3d)r.   r   r}   esurfrP   rM   scaleoffsetp1p2xysthetaslink_lengthsxythllenlrtbcoordstransformed_coordscoords                            r/   r?   AcrobotEnv.render  s   #99(((JJOO""&)),,/JL
 	& ;;KKM7*##%$nn55__doo6 %nndoot-OP::**,DJ~~t@A		/"JJ""T%7%77#=519-1$9 #ad)+e3QqT*U2
 qED&&QqTAaD[)99EAAqED&&QqTAaD[)99EAA

 hhA'(DbD1A$a-1!rAv!56**U2D4F4F4NOe|f,a%i&.@A5[6)1u9v+=>	 	 	
 !$C >FQB
A
AD#+te|;JAq!Q!fq!fq!fq!f5F!#++E2==bAqAuQx!|4"))%0   d$6F""4);]KT3q63q63sU{3C]S!!$AAC%K8H-X !?  $$T5$7v&w&LLJJOODMM,78NN!,<<))224;;?@y  -S  	(Y	s   
T 
T,T''T,c                     U R                   b6  SS KnUR                  R                  5         UR                  5         SU l        g g )Nr   F)r!   r   r   quitr#   )r.   r   s     r/   closeAcrobotEnv.closes  s4    ;;"NN!KKMDK #r2   )r,   r"   r#   r*   r   r!   r-   N)%__name__
__module____qualname____firstlineno____doc__r   rK   rd   r   rb   rc   re   rf   rg   r   r&   r'   rF   rG   r   rh   action_arrow
domain_figactions_numstrr0   r   dictr:   rV   r@   rN   rJ   r?   r   __static_attributes____classcell__)rA   s   @r/   r   r      s    qh !+.H
 
BMMKKNNHBIBI"LJ LLJKC$J  +/t "S4Z " " "=<
9
#9JYv   r2   r   c                 N    X!-
  nX:  a  X-
  n X:  a  M  X:  a  X-   n X:  a  M  U $ )aH  Wraps `x` so m <= x <= M; but unlike `bound()` which
truncates, `wrap()` wraps x around the coordinate system defined by m,M.

For example, m = -180, M = 180 (degrees), x = 360 --> returns 0.

Args:
    x: a scalar
    m: minimum possible value in range
    M: maximum possible value in range

Returns:
    x: a scalar, wrapped
 )r   mMdiffs       r/   rL   rL   |  s6     5D
%H %
%H %Hr2   c                 F    Uc
  US   nUS   n[        [        X5      U5      $ )a  Either have m as scalar, so bound(x,m,M) which returns m <= x <= M *OR*
have m as length 2 vector, bound(x,m, <IGNORED>) returns m[0] <= x <= m[1].

Args:
    x: scalar
    m: The lower bound
    M: The upper bound

Returns:
    x: scalar, bound between min (m) and Max (M)
r   r   )minmax)r   r   r   s      r/   rM   rM     s,     	yaDaDs1y!r2   c                     [        U5      n[        R                  " [        U5      U4[        R                  5      nXS'   [        R
                  " [        U5      S-
  5       H  nX%   nX%S-      U-
  nUS-  nXE   n[        R                  " U " U5      5      n	[        R                  " U " XU	-  -   5      5      n
[        R                  " U " XU
-  -   5      5      n[        R                  " U " XU-  -   5      5      nXS-  U	SU
-  -   SU-  -   U-   -  -   XES-   '   M     US   SS $ ! [         a3    [        R                  " [        U5      4[        R                  5      n GN%f = f)	a   
Integrate a 1-D or N-D system of ODEs using 4th-order Runge-Kutta.

Example for 2D system:

    >>> def derivs(x):
    ...     d1 =  x[0] + 2*x[1]
    ...     d2 =  -3*x[0] + 4*x[1]
    ...     return d1, d2

    >>> dt = 0.0005
    >>> t = np.arange(0.0, 2.0, dt)
    >>> y0 = (1,2)
    >>> yout = rk4(derivs, y0, t)

Args:
    derivs: function returning the derivative of the system, with the signature `dy = derivs(yi)`
    y0: initial state vector
    t: sample times

Returns:
    yout: Runge-Kutta approximation of the ODE
r   r   r`   g      @rE   r_   Nr   )lenr$   zerosfloat64	TypeErrorarangeasarray)derivsy0r   NyyoutithisrK   dt2k1k2k3k4s                r/   rI   rI     s=   22W xxQbjj1GYYs1vz"t1uX_3hWZZr
#ZZr"H}-.ZZr"H}-.ZZrG|,-8rAF{QV';b'@AAU # 8BQ<'  /xxQ	2::./s   D& &9E#"E#r   )r   numpyr$   r   r   r   	gymnasiumr   r   r   gymnasium.envs.classic_controlr   gymnasium.errorr	   __copyright____credits____license__
__author__r   rL   rM   rI   r   r2   r/   <module>r      sU         ! 0 2 ? .
^  ^ B*&.r2   