Vectorization
gymnasium.vector.VectorEnv
- class gymnasium.vector.VectorEnv
Base class for vectorized environments to run multiple independent copies of the same environment in parallel.
Vector environments can provide a linear speed-up in the steps taken per second by sampling multiple sub-environments at the same time. Gymnasium contains two generalised vector environments: AsyncVectorEnv and SyncVectorEnv, along with several custom vector environment implementations.
For reset() and step(), the observations, rewards, terminations, truncations and info are batched over the sub-environments; see the example below. The rewards, terminations, and truncations are packaged into NumPy arrays of shape (num_envs,). For observations (and actions), the batching process depends on the type of observation (and action) space, and is generally optimised for neural network inputs/outputs. For info, the data is kept as a dictionary such that a key gives the data for all sub-environments.
For creating environments, make_vec() is the vector-environment equivalent of make(); it contains several unique arguments for modifying environment qualities, the number of environments, the vectorizer type, and vectorizer arguments.
To avoid having to wait for all sub-environments to terminate before resetting, implementations can autoreset sub-environments on episode end (when terminated or truncated is True). This is crucial for correctly implementing training algorithms with vector environments. By default, Gymnasium's implementation uses next-step autoreset, with the AutoresetMode enum listing the options. The mode used by a vector environment should be available in metadata["autoreset_mode"]. Warning: some vector implementations or training algorithms will only support particular autoreset modes. For more information, read https://farama.org/Vector-Autoreset-Mode.
Note
The info parameter of reset() and step() was originally implemented before v0.25 as a list of dictionaries, one for each sub-environment. However, this was modified in v0.25+ to be a dictionary with a NumPy array for each key. To use the old info style, utilise the DictInfoToList wrapper.
Examples
>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync", wrappers=(gym.wrappers.TimeAwareObservation,))
>>> envs = gym.wrappers.vector.ClipReward(envs, min_reward=0.2, max_reward=0.8)
>>> envs
<ClipReward, SyncVectorEnv(CartPole-v1, num_envs=3)>
>>> envs.num_envs
3
>>> envs.action_space
MultiDiscrete([2 2 2])
>>> envs.observation_space
Box([[-4.80000019 -inf -0.41887903 -inf 0.]
     [-4.80000019 -inf -0.41887903 -inf 0.]
     [-4.80000019 -inf -0.41887903 -inf 0.]],
    [[4.80000019e+00 inf 4.18879032e-01 inf 5.00000000e+02]
     [4.80000019e+00 inf 4.18879032e-01 inf 5.00000000e+02]
     [4.80000019e+00 inf 4.18879032e-01 inf 5.00000000e+02]], (3, 5), float64)
>>> observations, infos = envs.reset(seed=123)
>>> observations
array([[ 0.01823519, -0.0446179 , -0.02796401, -0.03156282,  0.        ],
       [ 0.02852531,  0.02858594,  0.0469136 ,  0.02480598,  0.        ],
       [ 0.03517495, -0.000635  , -0.01098382, -0.03203924,  0.        ]])
>>> infos
{}
>>> _ = envs.action_space.seed(123)
>>> actions = envs.action_space.sample()
>>> observations, rewards, terminations, truncations, infos = envs.step(actions)
>>> observations
array([[ 0.01734283,  0.15089367, -0.02859527, -0.33293587,  1.        ],
       [ 0.02909703, -0.16717631,  0.04740972,  0.3319138 ,  1.        ],
       [ 0.03516225, -0.19559774, -0.01162461,  0.25715804,  1.        ]])
>>> rewards
array([0.8, 0.8, 0.8])
>>> terminations
array([False, False, False])
>>> truncations
array([False, False, False])
>>> infos
{}
>>> envs.close()
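As noted above, vector environments return info as a dictionary of arrays, while the pre-v0.25 style was a list of per-environment dictionaries, restored by the DictInfoToList wrapper. A rough sketch of that conversion (a hypothetical helper, not the wrapper's actual code; it assumes underscore-prefixed keys are boolean masks marking which sub-environments produced a value):

```python
import numpy as np

def dict_info_to_list(info: dict, num_envs: int) -> list[dict]:
    """Convert vector-style info (dict of arrays) into one dict per sub-environment."""
    list_info = [{} for _ in range(num_envs)]
    for key, values in info.items():
        if key.startswith("_"):
            continue  # "_key" entries are masks consumed below, not data
        mask = info.get(f"_{key}", np.ones(num_envs, dtype=bool))
        for i in range(num_envs):
            if mask[i]:
                list_info[i][key] = values[i]
    return list_info

# e.g. only sub-environments 0 and 1 reported a value for "reached_goal"
info = {
    "reached_goal": np.array([True, False, False]),
    "_reached_goal": np.array([True, True, False]),
}
list_info = dict_info_to_list(info, 3)  # third dict stays empty
```

Keys without a matching mask are treated as present for every sub-environment.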
The vector environments have additional attributes for users to understand the implementation:
- num_envs - The number of sub-environments in the vector environment
- observation_space - The batched observation space of the vector environment
- single_observation_space - The observation space of a single sub-environment
- action_space - The batched action space of the vector environment
- single_action_space - The action space of a single sub-environment
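The shape convention behind the single and batched attributes can be illustrated with plain NumPy shapes (a sketch of the convention only, not the actual Space classes): batched observations stack num_envs copies of the single observation shape along a leading axis, while rewards, terminations and truncations are always one-dimensional over the sub-environments.

```python
import numpy as np

num_envs = 3
single_obs_shape = (4,)  # e.g. CartPole-v1's observation vector

# observations returned by reset()/step() are batched along a leading axis
observations = np.zeros((num_envs,) + single_obs_shape)

# rewards, terminations, and truncations always have shape (num_envs,)
rewards = np.ones(num_envs)
terminations = np.zeros(num_envs, dtype=bool)

print(observations.shape)  # (3, 4)
print(rewards.shape)       # (3,)
```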
Methods
- VectorEnv.step(actions: ActType) → tuple[ObsType, ArrayType, ArrayType, ArrayType, dict[str, Any]]
Take an action for each parallel environment.
- Parameters:
actions – Batch of actions with the action_space shape.
- Returns:
Batch of (observations, rewards, terminations, truncations, infos)
Note
As vector environments autoreset terminating and truncating sub-environments, the reset will occur on the next step after terminated or truncated is True.
Example
>>> import gymnasium as gym
>>> import numpy as np
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> _ = envs.reset(seed=42)
>>> actions = np.array([1, 0, 1], dtype=np.int32)
>>> observations, rewards, terminations, truncations, infos = envs.step(actions)
>>> observations
array([[ 0.02727336,  0.18847767,  0.03625453, -0.26141977],
       [ 0.01431748, -0.24002443, -0.04731862,  0.3110827 ],
       [-0.03822722,  0.1710671 , -0.00848456, -0.2487226 ]], dtype=float32)
>>> rewards
array([1., 1., 1.])
>>> terminations
array([False, False, False])
>>> truncations
array([False, False, False])
>>> infos
{}
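Because of the next-step autoreset behaviour noted above, the transition returned on the step after terminated or truncated was True is a reset transition rather than a real one, so training code typically masks it out when filling a replay buffer. A minimal sketch of that bookkeeping with plain NumPy (the buffer and function names are illustrative, not part of the API):

```python
import numpy as np

def store_real_transitions(buffer, prev_done, obs, actions, rewards,
                           terminations, truncations, next_obs):
    """Append only non-autoreset transitions; return the new done mask.

    prev_done marks sub-environments whose episode ended on the previous
    step - under next-step autoreset their current transition is the reset
    transition and is therefore skipped.
    """
    for i in np.flatnonzero(~prev_done):
        done = bool(terminations[i] or truncations[i])
        buffer.append((obs[i], actions[i], rewards[i], done, next_obs[i]))
    return terminations | truncations

buffer = []
prev_done = np.array([False, True, False])   # sub-env 1 ended last step
obs, next_obs = np.zeros((3, 4)), np.ones((3, 4))
actions = np.array([0, 1, 0])
rewards = np.array([1.0, 0.0, 1.0])
terminations = np.array([False, False, True])
truncations = np.array([False, False, False])
prev_done = store_real_transitions(buffer, prev_done, obs, actions, rewards,
                                   terminations, truncations, next_obs)
# sub-env 1's autoreset transition was skipped; sub-envs 0 and 2 were stored
```

The returned mask feeds the next iteration, so the sub-environment that terminated now (index 2) will have its autoreset transition skipped on the following step.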
- VectorEnv.reset(*, seed: int | None = None, options: dict[str, Any] | None = None) → tuple[ObsType, dict[str, Any]]
Reset all parallel environments and return a batch of initial observations and info.
- Parameters:
seed – The environment reset seed
options – Option information for resetting the environment
- Returns:
A batch of observations and info from the vectorized environment.
Example
>>> import gymnasium as gym
>>> envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="sync")
>>> observations, infos = envs.reset(seed=42)
>>> observations
array([[ 0.0273956 , -0.00611216,  0.03585979,  0.0197368 ],
       [ 0.01522993, -0.04562247, -0.04799704,  0.03392126],
       [-0.03774345, -0.02418869, -0.00942293,  0.0469184 ]], dtype=float32)
>>> infos
{}
- VectorEnv.render() → tuple[RenderFrame, ...] | None
Returns the rendered frames from the parallel environments.
- Returns:
A tuple of rendered frames from the parallel environments
- VectorEnv.close(**kwargs: Any)
Close all parallel environments and release resources.
It also closes all the existing image viewers, then calls close_extras() and sets closed to True.
Warning
This function itself does not close the environments; that should be handled in close_extras(). This is generic for both synchronous and asynchronous vectorized environments.
Note
This will be called automatically when the environment is garbage collected or when the program exits.
- Parameters:
**kwargs – Keyword arguments passed to close_extras()
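The division of labour described above, close() doing the shared bookkeeping and close_extras() doing the implementation-specific cleanup, can be sketched with a small stand-in class (purely illustrative; only the close()/close_extras()/closed names mirror the real API):

```python
class MiniVectorEnv:
    """Toy stand-in mirroring VectorEnv's close()/close_extras() division."""

    def __init__(self):
        self.closed = False
        self.cleanup_log = []

    def close_extras(self, **kwargs):
        # subclasses put the real work here, e.g. terminating worker
        # processes in an asynchronous implementation
        self.cleanup_log.append("extras")

    def close(self, **kwargs):
        if self.closed:
            return  # already closed: calling again is a no-op
        self.close_extras(**kwargs)
        self.closed = True

env = MiniVectorEnv()
env.close()
env.close()  # second call does nothing, cleanup runs exactly once
```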
Attributes
- VectorEnv.num_envs: int
The number of sub-environments in the vector environment.
- VectorEnv.observation_space: gym.Space
The (batched) observation space. The observations returned by reset() and step() are valid elements of observation_space.
- VectorEnv.single_action_space: gym.Space
The action space of a sub-environment.
- VectorEnv.single_observation_space: gym.Space
The observation space of a sub-environment.
- VectorEnv.spec: EnvSpec | None = None
The EnvSpec of the environment, normally set during gymnasium.make_vec().
- VectorEnv.metadata: dict[str, Any] = {}
The metadata of the environment, containing rendering modes, rendering fps, etc.
- VectorEnv.closed: bool = False
Whether the vector environment has already been closed.
Additional Methods
- property VectorEnv.unwrapped
Return the base environment.
- property VectorEnv.np_random: Generator
Returns the environment's internal _np_random; if not set, it will be initialised with a random seed.
- Returns:
Instances of np.random.Generator
- property VectorEnv.np_random_seed: int | None
Returns the environment's internal _np_random_seed; if not set, it will first be initialised with a random int as the seed.
If np_random_seed was set directly instead of through reset() or set_np_random_through_seed(), the seed will take the value -1.
- Returns:
int – the seed of the current np_random or -1, if the seed of the rng is unknown
Making Vector Environments
To create vector environments, gymnasium provides gymnasium.make_vec() as an equivalent function to gymnasium.make().