Vectorization

gymnasium.vector.VectorEnv

class gymnasium.vector.VectorEnv[source]

Base class for vectorized environments to run multiple independent copies of the same environment in parallel.

Vector environments can provide a linear speed-up in the steps taken per second by sampling multiple sub-environments at the same time. Gymnasium contains two generalised vector environments, AsyncVectorEnv and SyncVectorEnv, along with several custom vector environment implementations. reset() and step() batch the observations, rewards, terminations, truncations and info across the sub-environments; see the example below. The rewards, terminations, and truncations are packaged into NumPy arrays of shape (num_envs,). For observations (and actions), the batching process depends on the type of observation (and action) space and is generally optimised for neural network inputs/outputs. For info, the data is kept as a dictionary such that each key gives the data for all sub-environments.

For creating environments, make_vec() is the vector-environment equivalent of make() for easily creating vector environments; it provides several unique arguments for modifying environment qualities, the number of environments, the vectorizer type, and the vectorizer arguments.

To avoid having to wait for all sub-environments to terminate before resetting, implementations can autoreset sub-environments on episode end (when terminated or truncated is True). This is crucial for correctly implementing training algorithms with vector environments. By default, Gymnasium's implementation uses next-step autoreset, with the AutoresetMode enum listing the available options. The mode used by a vector environment should be available in metadata["autoreset_mode"]. Warning: some vector implementations or training algorithms will only support particular autoreset modes. For more information, read https://farama.org/Vector-Autoreset-Mode.
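For example, a minimal check of the autoreset mode in use, assuming the AutoresetMode enum is importable from gymnasium.vector and that the default sync vectorizer reports next-step autoreset as described above:

>>> import gymnasium as gym
>>> from gymnasium.vector import AutoresetMode
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> envs.metadata["autoreset_mode"] == AutoresetMode.NEXT_STEP
True
>>> envs.close()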

Note

The info parameter of reset() and step() was originally implemented before v0.25 as a list of dictionaries, one for each sub-environment. However, this was modified in v0.25+ to be a dictionary with a NumPy array for each key. To use the old info style, utilise the DictInfoToList wrapper.
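For instance, a brief sketch of wrapping a vector environment with DictInfoToList so that infos are returned as a list of per-environment dictionaries (here the dictionaries are empty because CartPole's reset returns no info):

>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> envs = gym.wrappers.vector.DictInfoToList(envs)
>>> _, infos = envs.reset(seed=123)
>>> infos
[{}, {}, {}]
>>> envs.close()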

Examples

>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync", wrappers=(gym.wrappers.TimeAwareObservation,))
>>> envs = gym.wrappers.vector.ClipReward(envs, min_reward=0.2, max_reward=0.8)
>>> envs
<ClipReward, SyncVectorEnv(CartPole-v1, num_envs=3)>
>>> envs.num_envs
3
>>> envs.action_space
MultiDiscrete([2 2 2])
>>> envs.observation_space
Box([[-4.80000019        -inf -0.41887903        -inf  0.        ]
 [-4.80000019        -inf -0.41887903        -inf  0.        ]
 [-4.80000019        -inf -0.41887903        -inf  0.        ]], [[4.80000019e+00            inf 4.18879032e-01            inf
  5.00000000e+02]
 [4.80000019e+00            inf 4.18879032e-01            inf
  5.00000000e+02]
 [4.80000019e+00            inf 4.18879032e-01            inf
  5.00000000e+02]], (3, 5), float64)
>>> observations, infos = envs.reset(seed=123)
>>> observations
array([[ 0.01823519, -0.0446179 , -0.02796401, -0.03156282,  0.        ],
       [ 0.02852531,  0.02858594,  0.0469136 ,  0.02480598,  0.        ],
       [ 0.03517495, -0.000635  , -0.01098382, -0.03203924,  0.        ]])
>>> infos
{}
>>> _ = envs.action_space.seed(123)
>>> actions = envs.action_space.sample()
>>> observations, rewards, terminations, truncations, infos = envs.step(actions)
>>> observations
array([[ 0.01734283,  0.15089367, -0.02859527, -0.33293587,  1.        ],
       [ 0.02909703, -0.16717631,  0.04740972,  0.3319138 ,  1.        ],
       [ 0.03516225, -0.19559774, -0.01162461,  0.25715804,  1.        ]])
>>> rewards
array([0.8, 0.8, 0.8])
>>> terminations
array([False, False, False])
>>> truncations
array([False, False, False])
>>> infos
{}
>>> envs.close()

Vector environments have additional attributes that help users understand the implementation.

Methods

VectorEnv.step(actions: ActType) → tuple[ObsType, ArrayType, ArrayType, ArrayType, dict[str, Any]][source]

Take an action for each parallel environment.

Parameters:

actions – Batch of actions with the action_space shape.

Returns:

Batch of (observations, rewards, terminations, truncations, infos)

Note

As vector environments autoreset terminating and truncating sub-environments, the reset occurs on the next step after terminated or truncated is True.

Example

>>> import gymnasium as gym
>>> import numpy as np
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> _ = envs.reset(seed=42)
>>> actions = np.array([1, 0, 1], dtype=np.int32)
>>> observations, rewards, terminations, truncations, infos = envs.step(actions)
>>> observations
array([[ 0.02727336,  0.18847767,  0.03625453, -0.26141977],
       [ 0.01431748, -0.24002443, -0.04731862,  0.3110827 ],
       [-0.03822722,  0.1710671 , -0.00848456, -0.2487226 ]],
      dtype=float32)
>>> rewards
array([1., 1., 1.])
>>> terminations
array([False, False, False])
>>> truncations
array([False, False, False])
>>> infos
{}
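As a concrete illustration of the note above, a hypothetical rollout loop under next-step autoreset: when a sub-environment reports terminated or truncated, the observation returned by the following step() call is the first observation of a new episode, so that boundary transition is typically excluded from training data. The replay_buffer reference below is purely illustrative.

>>> import gymnasium as gym
>>> import numpy as np
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> obs, _ = envs.reset(seed=0)
>>> autoreset = np.zeros(envs.num_envs, dtype=bool)  # True where the next obs starts a new episode
>>> for _ in range(100):
...     actions = envs.action_space.sample()
...     next_obs, rewards, terminations, truncations, infos = envs.step(actions)
...     valid = ~autoreset  # skip the episode-boundary transition when storing data
...     # e.g. replay_buffer.add(obs[valid], actions[valid], rewards[valid], next_obs[valid])
...     autoreset = np.logical_or(terminations, truncations)
...     obs = next_obs
>>> envs.close()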
VectorEnv.reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → tuple[ObsType, dict[str, Any]][source]

Reset all parallel environments and return a batch of initial observations and info.

Parameters:
  • seed – The environment reset seed

  • options – Reset options passed to each sub-environment's reset()

Returns:

A batch of observations and info from the vectorized environment.

Example

>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> observations, infos = envs.reset(seed=42)
>>> observations
array([[ 0.0273956 , -0.00611216,  0.03585979,  0.0197368 ],
       [ 0.01522993, -0.04562247, -0.04799704,  0.03392126],
       [-0.03774345, -0.02418869, -0.00942293,  0.0469184 ]],
      dtype=float32)
>>> infos
{}
VectorEnv.render() → tuple[RenderFrame, ...] | None[source]

Returns the rendered frames from the parallel environments.

Returns:

A tuple of rendered frames from the parallel environments
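For example, a minimal sketch assuming the sub-environments are created with render_mode="rgb_array"; the frame shape shown assumes CartPole's default 600x400 render size:

>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=2, vectorization_mode="sync", render_mode="rgb_array")
>>> _ = envs.reset(seed=42)
>>> frames = envs.render()
>>> len(frames)
2
>>> frames[0].shape
(400, 600, 3)
>>> envs.close()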

VectorEnv.close(**kwargs: Any)[source]

Close all parallel environments and release resources.

It also closes all the existing image viewers, then calls close_extras() and sets closed to True.

Warning

This function itself does not close the environments; that should be handled in close_extras(). This is generic for both synchronous and asynchronous vectorized environments.

Note

This will be called automatically when the environment is garbage collected or when the program exits.

Parameters:

**kwargs – Keyword arguments passed to close_extras()
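For example, a sketch of forwarding keyword arguments to close_extras(), assuming AsyncVectorEnv's close_extras() accepts a timeout (in seconds) for shutting down its worker processes:

>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=2, vectorization_mode="async")
>>> _ = envs.reset(seed=42)
>>> envs.close(timeout=5)  # forwarded to AsyncVectorEnv.close_extras(timeout=5)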

Attributes

VectorEnv.num_envs: int

The number of sub-environments in the vector environment.

VectorEnv.action_space: gym.Space

The (batched) action space. The input actions of `step` must be valid elements of `action_space`.

VectorEnv.observation_space: gym.Space

The (batched) observation space. The observations returned by `reset` and `step` are valid elements of `observation_space`.

VectorEnv.single_action_space: gym.Space

The action space of a sub-environment.

VectorEnv.single_observation_space: gym.Space

The observation space of a sub-environment.
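For example, a minimal sketch contrasting the per-sub-environment spaces with their batched counterparts:

>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> envs.single_action_space
Discrete(2)
>>> envs.action_space
MultiDiscrete([2 2 2])
>>> envs.single_observation_space.shape
(4,)
>>> envs.observation_space.shape
(3, 4)
>>> envs.close()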

VectorEnv.spec: EnvSpec | None = None

The EnvSpec of the environment, normally set during gymnasium.make_vec().

VectorEnv.metadata: dict[str, Any] = {}

The metadata of the environment, containing rendering modes, rendering fps, etc.

VectorEnv.render_mode: str | None = None

The render mode of the environment, which should follow a specification similar to `Env.render_mode`.

VectorEnv.closed: bool = False

Whether the vector environment has already been closed.

Additional Methods

property VectorEnv.unwrapped

Return the base environment.

property VectorEnv.np_random: Generator

Returns the environment’s internal _np_random; if not set, it will be initialised with a random seed.

Returns:

Instances of `np.random.Generator`

property VectorEnv.np_random_seed: int | None

Returns the environment’s internal _np_random_seed; if not set, it will first be initialised with a random integer as the seed.

If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.

Returns:

int – the seed of the current np_random, or -1 if the seed of the rng is unknown
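For example, a hedged sketch of reading these properties; on the built-in vector environments they may return per-sub-environment values rather than a single generator or seed, so no output is asserted here:

>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=2, vectorization_mode="sync")
>>> _ = envs.reset(seed=42)
>>> rng = envs.np_random        # internal numpy random generator(s)
>>> seed = envs.np_random_seed  # seed(s) backing the generator, or -1 if set directly
>>> envs.close()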

Making Vector Environments

To create vector environments, gymnasium provides :func:`gymnasium.make_vec` as an equivalent function to :func:`gymnasium.make`.
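For example, a brief sketch of creating vector environments with the two built-in vectorization modes:

>>> import gymnasium as gym
>>> sync_envs = gym.make_vec("CartPole-v1", num_envs=4, vectorization_mode="sync")    # sub-environments stepped in the current process
>>> async_envs = gym.make_vec("CartPole-v1", num_envs=4, vectorization_mode="async")  # sub-environments stepped in separate processes
>>> sync_envs.close()
>>> async_envs.close()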