Detailed Explanation of GreenhouseEnv Class
The `GreenhouseEnv` class is a custom reinforcement learning environment that follows the OpenAI Gym interface standard. It simulates a greenhouse system, allowing reinforcement learning agents to learn how to control greenhouse temperature so as to maximize crop yield and energy efficiency.
Class Overview
The `GreenhouseEnv` class inherits from `gym.Env` and implements the standard reinforcement learning environment interface, including the `step`, `reset`, and `render` methods. It uses `GreenLightModel` to simulate the physical processes of the greenhouse and defines the observation and action spaces.
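Because the class exposes the standard Gym API, it can be driven with the usual reset/step loop. Below is a minimal sketch; the import path and the exact contents of `env_config` are assumptions based on this document, not fixed names from the project.

```python
# Hypothetical import path; adjust to wherever GreenhouseEnv is defined in your project.
from greenhouse_env import GreenhouseEnv

# A minimal config; see the __init__ section below for the full set of keys.
env_config = {"epw_path": "weather/example.epw", "season_length": 60, "season_interval": 1 / 24}

env = GreenhouseEnv(env_config)
obs, info = env.reset()
episode_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()  # random temperature action, for illustration only
    obs, reward, terminated, truncated, info = env.step(action)
    episode_reward += reward
    done = terminated or truncated
print("episode reward:", episode_reward)
```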
Main Attributes
- `first_day`: Starting date of the simulation
- `new_first_day`: Randomly selected new starting date for each episode
- `isMature`: Whether the crop starts from a mature state
- `epw_path`: Path to the weather data file
- `season_length`: Length of the simulated season (in days)
- `season_interval`: Time interval for each model run (in days)
- `current_step`: Current simulation step
- `init_state`: Initial state parameters
- `target_yield`: Target yield
- `target_yield_unit_energy_input`: Target energy consumption per unit yield
- `target_harvest_unit_energy_input`: Target energy consumption per unit harvest
- `model`: GreenLightModel instance
- `yield_unit_energy_input`: Actual energy consumption per unit yield
- `total_energy_input`: Total energy consumption
- `growth_energy_input`: Energy consumption during the growth period
- `harvest_energy_input`: Energy consumption during the harvest period
- `harvest_unit_energy_input`: Energy consumption per unit harvest
- `total_yield`: Total yield
- `total_reward`: Cumulative reward
- `cost_penalty`: Cost penalty
- `yield_change`: Yield change
- `action_space`: Action space (temperature control)
- `observation_space`: Observation space
Main Methods
__init__
Initializes a `GreenhouseEnv` instance.
```python
def __init__(self, env_config):
    super(GreenhouseEnv, self).__init__()
    # Initialize various attributes
    self.first_day = env_config.get("first_day", 1)
    self.new_first_day = random.randint(90, 120)
    self.isMature = env_config.get("isMature", False)
    self.epw_path = env_config.get("epw_path", "")
    self.season_length = env_config.get("season_length", 60)
    self.season_interval = env_config.get("season_interval", 1/24)
    self.current_step = env_config.get("current_step", 0)
    self.init_state = env_config.get("init_state", {})
    self.target_yield = env_config.get("target_yield", 0)
    self.target_yield_unit_energy_input = env_config.get("target_yield_unit_energy_input", 0)
    self.target_harvest_unit_energy_input = env_config.get("target_harvest_unit_energy_input", 0)

    # Initialize GreenLightModel
    self.model = GreenLightModel(epw_path=self.epw_path, first_day=self.new_first_day, isMature=self.isMature)

    # Initialize performance indicators
    self.yield_unit_energy_input = 0
    self.total_energy_input = 0
    self.growth_energy_input = 0
    self.harvest_energy_input = 0
    self.harvest_unit_energy_input = 0
    self.total_yield = 0
    self.total_reward = 0
    self.cost_penalty = 0
    self.yield_change = 0

    # Run initial model
    self.new_gl = self.model.run_model(gl_params=self.init_state, season_length=self.season_length,
                                       season_interval=self.season_interval, step=self.current_step)

    # Define action space and observation space
    self.action_space = gym.spaces.Discrete(11)
    self.observation_space = gym.spaces.Box(low=np.array([...]), high=np.array([...]), shape=(14,), dtype=np.float64)

    self.episode_unit_energy_inputs = []
    self.episode_total_yields = []
```
This method sets up the initial parameters for the environment, creates a GreenLightModel instance, and defines the action and observation spaces.
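For reference, here is a hedged sketch of an `env_config` dictionary that exercises all of the keys read above; the concrete values (paths, targets, units) are illustrative assumptions, not defaults from the project.

```python
# All values below are illustrative; only the key names come from __init__ above.
env_config = {
    "first_day": 1,                              # starting day of year (the model itself uses the random day 90-120)
    "isMature": False,                           # start from a young crop
    "epw_path": "weather/example_location.epw",  # hypothetical EPW weather file
    "season_length": 60,                         # days to simulate
    "season_interval": 1 / 24,                   # run the model in 1-hour (1/24-day) slices
    "current_step": 0,
    "init_state": {},                            # empty: let GreenLightModel use its defaults
    "target_yield": 30,                          # assumed target yield
    "target_yield_unit_energy_input": 20,        # assumed target energy per unit yield
    "target_harvest_unit_energy_input": 15,      # assumed target energy per unit harvest
}

env = GreenhouseEnv(env_config)
```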
step
Executes one environment step, processes the given action, and returns the next state, reward, and other information.
```python
def step(self, action):
    self.gl = self.new_gl
    is_daytime = self.gl["d"]["isDay"]
    temperature_change = action + 18
    if is_daytime == 1:
        self.gl["p"]["tSpDay"] = temperature_change
    else:
        self.gl["p"]["tSpNight"] = temperature_change
    self.new_gl = self.model.run_model(self.gl, self.season_length, self.season_interval, self.current_step)
    self.current_step += 1
    observation = self._get_observation()
    reward = self._get_reward()
    terminated, is_mature = self._is_done()
    truncated = False
    info = {}
    return observation, reward, terminated, truncated, info
```
This method is the core of the reinforcement learning environment. It takes an action (temperature control), updates the environment state, and returns the new observation, reward, and termination signals.
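The discrete action index maps linearly to a temperature setpoint (`temperature_change = action + 18` above), applied to either the day or the night setpoint depending on `isDay`. A small sketch of that mapping:

```python
# Action index -> temperature setpoint in °C: action 0 -> 18 °C, action 10 -> 28 °C.
for action in range(11):
    setpoint = action + 18
    print(f"action {action:2d} -> setpoint {setpoint} °C")
```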
_get_observation
Retrieves the observation of the current environment state.
```python
def _get_observation(self):
    day_of_year = self.new_first_day + self.current_step * self.season_interval
    lampIn = 1e-6 * calculate_energy_consumption(self.new_gl, "qLampIn", "qIntLampIn")
    boilIn = 1e-6 * calculate_energy_consumption(self.new_gl, "hBoilPipe", "hBoilGroPipe")
    self.total_energy_input += lampIn + boilIn
    dmc = 0.06
    self.yield_change = 1e-6 * calculate_energy_consumption(self.new_gl, "mcFruitHar") / dmc
    self.total_yield += self.yield_change
    self.new_gl = extract_last_value_from_nested_dict(self.new_gl)
    params = [
        ("p", "tSpNight"), ("p", "tSpDay"), ("p", "co2SpDay"),
        ("x", "co2Air"), ("x", "vpAir"), ("x", "tAir"), ("x", "cFruit"),
        ("a", "mcOrgAir"), ("a", "mcAirBuf"), ("d", "iGlob"), ("d", "tOut")
    ]
    param_values = [self.new_gl[param[0]][param[1]] for param in params]
    current_obs = np.array([day_of_year, *param_values, lampIn, boilIn])
    return current_obs
```
This method calculates and returns an observation vector of the current environment state, including date, various environmental parameters, and energy consumption.
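Putting the pieces together, the 14-element observation is ordered as sketched below. The short glosses on the GreenLight variable names are informal readings for orientation, not definitions from the model documentation.

```python
# Index layout of the observation vector built in _get_observation.
OBSERVATION_LAYOUT = [
    "day_of_year",  # new_first_day + current_step * season_interval
    "tSpNight",     # night temperature setpoint ('p' group)
    "tSpDay",       # day temperature setpoint ('p' group)
    "co2SpDay",     # daytime CO2 setpoint ('p' group)
    "co2Air",       # indoor CO2 concentration ('x' group)
    "vpAir",        # indoor vapour pressure ('x' group)
    "tAir",         # indoor air temperature ('x' group)
    "cFruit",       # fruit carbohydrate content ('x' group)
    "mcOrgAir",     # crop respiration carbon flux ('a' group)
    "mcAirBuf",     # crop photosynthesis carbon flux ('a' group)
    "iGlob",        # outdoor global radiation ('d' group)
    "tOut",         # outdoor temperature ('d' group)
    "lampIn",       # lamp energy input over this step
    "boilIn",       # boiler energy input over this step
]
assert len(OBSERVATION_LAYOUT) == 14
```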
_get_reward
Calculates the reward value for the current state.
```python
def _get_reward(self):
    terminated, is_mature = self._is_done()
    if not is_mature:
        cFruit_growth = self.new_gl["x"]["cFruit"] - self.gl["x"]["cFruit"]
        reward = cFruit_growth * 1e-3
        self.growth_energy_input = self.total_energy_input
        self.harvest_period_hours = self.season_length * 24 - self.current_step
    else:
        reward = self.yield_change * 1e3
        harvest_passed_hours = self.current_step - (self.season_length * 24 - self.harvest_period_hours)
        harvest_progress = harvest_passed_hours / self.harvest_period_hours
        incremental_target_yield = self.target_yield * (0.1 + 0.9 * harvest_progress)
        if self.total_yield >= incremental_target_yield:
            reward *= 1.2
        else:
            reward *= 0.8
        self.harvest_energy_input = self.total_energy_input - self.growth_energy_input
        self.harvest_unit_energy_input = self.harvest_energy_input / self.total_yield
        if self.harvest_unit_energy_input <= self.target_harvest_unit_energy_input:
            reward *= 1.2
        else:
            reward *= 0.8
    if terminated:
        self.yield_unit_energy_input = self.total_energy_input / self.total_yield
        energy_diff_pct = (self.target_yield_unit_energy_input - self.yield_unit_energy_input) / self.target_yield_unit_energy_input
        yield_diff_pct = (self.total_yield - self.target_yield) / self.target_yield
        reward += (energy_diff_pct + yield_diff_pct) * self.total_reward
        self.episode_unit_energy_inputs.append(self.yield_unit_energy_input)
        self.episode_total_yields.append(self.total_yield)
    self.total_reward += reward
    return reward
```
This method calculates the reward based on the current state. It accounts for the crop's growth stage, yield, and energy efficiency, and adds an extra reward or penalty at the end of each episode based on how the final yield and unit energy input compare to their targets.
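As a worked sketch of the harvest-stage shaping, with made-up numbers (a 60-day season at hourly steps, a target yield of 30, and maturity reached at step 960):

```python
# All numbers are illustrative; they only trace the arithmetic used in _get_reward.
season_length = 60     # days
target_yield = 30      # assumed target
maturity_step = 960    # hypothetical step at which the crop matured
current_step = 1200    # some later step during the harvest period

harvest_period_hours = season_length * 24 - maturity_step                          # 1440 - 960 = 480
harvest_passed_hours = current_step - (season_length * 24 - harvest_period_hours)  # 1200 - 960 = 240
harvest_progress = harvest_passed_hours / harvest_period_hours                     # 0.5
incremental_target_yield = target_yield * (0.1 + 0.9 * harvest_progress)           # 30 * 0.55 = 16.5

# If total_yield has reached 16.5 by this step, the step reward is scaled by 1.2; otherwise by 0.8.
print(incremental_target_yield)
```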
_is_done
Determines if the current episode has ended.
```python
def _is_done(self):
    terminated = self.current_step >= self.season_length * (1 / self.season_interval)
    is_mature = self.new_gl["a"]["mcFruitHar"] > 0.01
    return terminated, is_mature
```
This method returns two flags: `terminated`, which becomes True once the number of steps reaches the season length divided by the season interval, and `is_mature`, which becomes True once the harvested-fruit flow `mcFruitHar` exceeds a small threshold.
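With the default configuration (a 60-day season simulated in 1/24-day slices), the episode length works out as follows:

```python
season_length = 60        # days (default)
season_interval = 1 / 24  # days per step, i.e. one hour
max_steps = season_length * (1 / season_interval)
print(max_steps)  # 1440.0 -> the episode terminates after 1440 hourly steps
```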
reset
Resets the environment to its initial state.
```python
def reset(self, *, seed=None, options=None):
    self.current_step = 0
    self.gl = self.init_state
    self.new_gl = self.model.run_model(gl_params=self.init_state, season_length=self.season_length,
                                       season_interval=self.season_interval, step=self.current_step)
    # Reset various energy consumption and yield variables
    self.total_energy_input = 0
    self.growth_energy_input = 0
    self.harvest_energy_input = 0
    self.harvest_unit_energy_input = 0
    self.yield_unit_energy_input = 0
    self.total_yield = 0
    self.total_reward = 0
    self.cost_penalty = 0
    info = {}
    observation = self._get_observation()
    return observation, info
```
This method is called at the beginning of each new episode. It resets all state variables and returns the initial observation.
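Continuing the interaction-loop sketch from the overview, a quick sanity check after a reset might look like this:

```python
obs, info = env.reset()
print(obs.shape)                    # (14,) -> one value per entry in the observation layout
print(env.observation_space.shape)  # (14,)
```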
render
Renders a visualization of the environment. In this implementation, it's an empty method.
```python
def render(self):
    pass
```
Core Functionality Analysis
- Dynamic Environment Simulation:
  - Uses `GreenLightModel` to simulate the physical processes of the greenhouse environment.
  - Supports different starting dates, increasing training diversity.
- Flexible Action Space:
  - Uses a discrete action space to control temperature, ranging from 18°C to 28°C.
  - Adjusts different temperature settings based on day/night status.
- Complex Observation Space:
  - Contains 14 continuous variables, covering various aspects of the greenhouse environment.
  - Observations include time information, environmental parameters, and energy consumption data.
- Multi-Objective Reward Function:
  - Considers crop growth, yield, and energy efficiency.
  - Uses different reward calculation methods for different growth stages (growth period and harvest period).
  - Includes dynamic assessment of target yield and energy efficiency.
- Environment Reset Mechanism:
  - Randomly selects a starting date for each new episode.
  - Resets all state variables, ensuring the independence of each training session.
- Performance Tracking:
  - Records unit energy consumption and total yield for each episode.
  - Provides detailed performance metrics, helpful for evaluating and adjusting reinforcement learning algorithms.
Considerations
- Ensure the correct weather data file path is provided (EPW format).
- Initial state parameters (`init_state`) have a significant impact on simulation results and need to be set carefully.
- The design of the reward function directly affects learning outcomes and may need to be adjusted based on specific requirements.
- The range settings of the observation space should be adjusted according to actual conditions to ensure coverage of all possible states.
- This environment does not implement the `render` method; if visualization is needed, additional development is required.
- Since the action space is discrete (`gym.spaces.Discrete(11)`), training should use reinforcement learning algorithms that support discrete action spaces (e.g., DQN or PPO).
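Finally, a hedged sketch of plugging the environment into a training library. This assumes Ray RLlib (whose per-worker `env_config` convention the constructor already matches) in a recent 2.x release; it is not the project's documented training setup, and the import paths and values are illustrative.

```python
# Sketch only: assumes Ray RLlib 2.x and that GreenhouseEnv is importable from your project.
from ray.rllib.algorithms.ppo import PPOConfig

from greenhouse_env import GreenhouseEnv  # hypothetical import path

env_config = {
    "epw_path": "weather/example_location.epw",  # illustrative EPW path
    "season_length": 60,
    "season_interval": 1 / 24,
}

config = (
    PPOConfig()
    .environment(env=GreenhouseEnv, env_config=env_config)
    .training(gamma=0.99)
)
algo = config.build()
for _ in range(10):
    result = algo.train()  # result is a dict of training metrics for that iteration
```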