Data
Label Data

Weather Prediction Model Label Data Processing

Introduction

This script is designed to extract and process data from EnergyPlus simulation results. It focuses on greenhouse environments, considering factors like air temperature, relative humidity, and wind speed. The script connects to a SQLite database, retrieves relevant data, and calculates additional metrics such as vapor density and vapor pressure. It also applies a window-opening schedule based on environmental conditions.

Data Processing

The script performs the following data processing steps:

  1. Executes the SQL query to retrieve environmental data and store it in a DataFrame.
  2. Reads the window schedule CSV file into a DataFrame.
  3. Defines two functions, calculate_window_opening_actual and rh2vaporDens, which calculate the actual window opening based on temperature and wind speed, and the vapor density from temperature and relative humidity, respectively.
  4. Applies these functions to the DataFrame to calculate new columns for vapor density, vapor pressure, and actual window opening schedule.

Configuration Variables

The script starts by defining the following configuration variables:

  • city_name: The name of the city where the greenhouse is located.
  • roof_type: The type of roof the greenhouse has. It affects the file path for database and schedule data.
  • time_interval: The time interval in minutes for the window opening schedule.
  • area: The area of the greenhouse.

These variables must be set according to the user's specific context.

city_name = "Amsterdam"
roof_type = "triangle"
time_interval = 5
area = 400

Database Connection

The script creates a connection to the EnergyPlus SQLite output database:

conn = sqlite3.connect(
    f"./data/greenhouse_geometry/model_files/greenhouse_{roof_type}/run/eplusout.sql"
)

The database file path depends on the roof_type variable.

SQL Query

A SQL query is executed to retrieve environmental data from the ReportVariableWithTime table. The query filters out design days and groups the results by Month, Day, Hour, and Minute.

 
sql = """
SELECT  
        '1995' || '-' || printf("%02d", Month) || '-' || printf("%02d", Day) || ' ' || printf("%02d", Hour) || ':' || printf("%02d", Minute) AS DateTime, 
        MAX(CASE WHEN Name = 'Zone Air Temperature' THEN Value ELSE NULL END) AS 'ZoneAirTemperature °C', 
        MAX(CASE WHEN Name = 'Zone Air Relative Humidity' THEN Value ELSE NULL END) AS 'ZoneAirRelativeHumidity %',
        MAX(CASE WHEN Name = 'Zone Outdoor Air Wind Speed' THEN Value ELSE NULL END) AS 'ZoneOutdoorAirWindSpeed m/s',
        MAX(CASE WHEN Name = 'Site Outdoor Air Drybulb Temperature' THEN Value ELSE NULL END) AS 'SiteOutdoorAirDrybulbTemperature °C',
        MAX(CASE WHEN Name = 'Zone Outdoor Air Drybulb Temperature' THEN Value ELSE NULL END) AS 'ZoneOutdoorAirDrybulbTemperature °C'
FROM ReportVariableWithTime
WHERE ((KeyValue = 'GREENHOUSE ZONE 2' AND (Name = 'Zone Air Temperature' OR Name = 'Zone Outdoor Air Drybulb Temperature' OR Name = 'Zone Air Relative Humidity' OR Name = 'Zone Outdoor Air Wind Speed')) 
OR (KeyValue = 'Environment' AND Name = 'Site Outdoor Air Drybulb Temperature'))
AND DayType IS NOT 'SummerDesignDay' 
AND DayType IS NOT 'WinterDesignDay'
GROUP BY Month, Day, Hour, Minute
ORDER BY Month, Day, Hour, Minute
"""
 

Functions

The script contains three custom functions:

  • calculate_window_opening_actual: Determines the actual window opening status based on environmental conditions.
def calculate_window_opening_actual(row):
    if (
        (row["ZoneAirTemperature °C"] > 40)
        or (row["ZoneAirTemperature °C"] < 0)
        or (row["ZoneOutdoorAirDrybulbTemperature °C"] > 40)
        or (row["ZoneOutdoorAirDrybulbTemperature °C"] < 0)
        or (row["ZoneAirTemperature °C"] - row["ZoneOutdoorAirDrybulbTemperature °C"])
        > 30
        or (row["ZoneOutdoorAirWindSpeed m/s"] > 20)
    ):
        return 0
    else:
        return row["window_opening_schedule"]
 
  • rh2vaporDens: Converts relative humidity and temperature to vapor density.
def rh2vaporDens(temp, rh):
    R = 8.3144598  # molar gas constant [J mol^{-1} K^{-1}]
    C2K = 273.15  # conversion from Celsius to Kelvin [K]
    Mw = 18.01528e-3  # molar mass of water [kg mol^-{1}]
    p = [610.78, 238.3, 17.2694, -6140.4, 273, 28.916]
    # Saturation vapor pressure of air in given temperature [Pa]
    satP = p[0] * np.exp(p[2] * temp / (temp + p[1]))
    pascals = (rh / 100) * satP  # Partial pressure of vapor in air [Pa]
    # convert to density using the ideal gas law
    vaporDens = pascals * Mw / (R * (temp + C2K))
    return vaporDens
 
  • vaporDens2pres: Converts vapor density and temperature to vapor pressure.
def vaporDens2pres(temp, vaporDens):
    p = [610.78, 238.3, 17.2694, -6140.4, 273, 28.916]
    rh = vaporDens / rh2vaporDens(temp, 100)  # relative humidity [0-1]
    # Saturation vapor pressure of air in given temperature [Pa]
    satP = p[0] * np.exp(p[2] * temp / (temp + p[1]))
    vaporPres = satP * rh
    return vaporPres

Each function is thoroughly commented and contains the necessary physical constants for the calculations.

Output Generation

After processing the data, the script saves the resulting DataFrame to a CSV file. The file path includes the configuration variables to reflect the specifics of the data it contains.

# Use the apply method and lambda function to create the Indoor Vapor Density(kg{H20} m⁻³) column
eplusout_data["Indoor Vapor Density(kg{H20} m⁻³)"] = eplusout_data.apply(
    lambda row: rh2vaporDens(
        row["ZoneAirTemperature °C"], row["ZoneAirRelativeHumidity %"]
    ),
    axis=1,
)
 
# Use the apply method and lambda function to create the Indoor Vapor Pressure(Pa) column
eplusout_data["Indoor Vapor Pressure(Pa)"] = eplusout_data.apply(
    lambda row: vaporDens2pres(
        row["ZoneAirTemperature °C"], row["Indoor Vapor Density(kg{H20} m⁻³)"]
    ),
    axis=1,
)
 
# Create a column for the window opening schedule
eplusout_data["window_opening_schedule"] = window_schedule["Window_Schedule"]
 
# Repeat the window schedule to match the length of the entire year's data
eplusout_data["window_opening_schedule"] = np.tile(
    window_schedule["Window_Schedule"].values,
    len(eplusout_data) // (24 * (60 // time_interval)),
)
 
# Use the apply method and lambda function to create the actual window opening column, determined by the window opening schedule and the difference in indoor and outdoor temperature
window_schedule["window_opening_actual"] = eplusout_data.apply(
    calculate_window_opening_actual, axis=1
)
 
# Repeat the window schedule to match the length of the entire year's data
repeated_window_schedule = np.tile(
    window_schedule["window_opening_actual"].values,
    len(eplusout_data) // (24 * (60 // time_interval)),
)
 
eplusout_data["window_opening_actual"] = repeated_window_schedule
 
# Save the EnergyPlus simulation run dataframe as a CSV file
os.makedirs("./data/energyPlus", exist_ok=True)
eplusout_data.to_csv(
    f"./data/energyPlus/eplusout_{city_name}_{roof_type}_{time_interval}_{area}.csv",
    index=False,
)
 

Cleanup

Finally, the script closes the database connection and prints out confirmation messages indicating the completion of the process and the location of the saved CSV file.

conn.close()

Complete Code

import json
import sqlite3
import pandas as pd
import os
import numpy as np
from datetime import datetime, timedelta
import os
 
# Define some constants
city_name = "Amsterdam"
roof_type = "triangle"
time_interval = 5
area = 400
 
# Connect to the SQLite database
conn = sqlite3.connect(
    f"./data/greenhouse_geometry/model_files/greenhouse_{roof_type}/run/eplusout.sql"
)
 
# SQL query to extract the required data from the database
sql = """
...
"""
 
# Execute the SQL query and store the result in a pandas DataFrame
eplusout_data = pd.read_sql_query(sql, conn)
 
# Load the window schedule data from a CSV file
window_schedule = pd.read_csv(
    f"./data/greenhouse_geometry/window_schedule/window_schedule_triangle_5.csv"
)
 
# Function to calculate the actual window opening based on various conditions
def calculate_window_opening_actual(row):
    if (
        (row["ZoneAirTemperature °C"] > 40)
        or (row["ZoneAirTemperature °C"] < 0)
        or (row["ZoneOutdoorAirDrybulbTemperature °C"] > 40)
        or (row["ZoneOutdoorAirDrybulbTemperature °C"] < 0)
        or (row["ZoneAirTemperature °C"] - row["ZoneOutdoorAirDrybulbTemperature °C"])
        > 30
        or (row["ZoneOutdoorAirWindSpeed m/s"] > 20)
    ):
        return 0
    else:
        return row["window_opening_schedule"]
 
# Function to convert relative humidity to vapor density
def rh2vaporDens(temp, rh):
    R = 8.3144598  # molar gas constant [J mol^{-1} K^{-1}]
    C2K = 273.15  # conversion from Celsius to Kelvin [K]
    Mw = 18.01528e-3  # molar mass of water [kg mol^-{1}]
    p = [610.78, 238.3, 17.2694, -6140.4, 273, 28.916]
    # Saturation vapor pressure of air in given temperature [Pa]
    satP = p[0] * np.exp(p[2] * temp / (temp + p[1]))
    pascals = (rh / 100) * satP  # Partial pressure of vapor in air [Pa]
    # convert to density using the ideal gas law
    vaporDens = pascals * Mw / (R * (temp + C2K))
    return vaporDens
 
# Function to convert vapor density to vapor pressure
def vaporDens2pres(temp, vaporDens):
    p = [610.78, 238.3, 17.2694, -6140.4, 273, 28.916]
    rh = vaporDens / rh2vaporDens(temp, 100)  # relative humidity [0-1]
    # Saturation vapor pressure of air in given temperature [Pa]
    satP = p[0] * np.exp(p[2] * temp / (temp + p[1]))
    vaporPres = satP * rh
    return vaporPres
 
 
# Add a new column to the DataFrame for Indoor Vapor Density
eplusout_data["Indoor Vapor Density(kg{H20} m⁻³)"] = eplusout_data.apply(
    lambda row: rh2vaporDens(
        row["ZoneAirTemperature °C"], row["ZoneAirRelativeHumidity %"]
    ),
    axis=1,
)
 
# Add a new column to the DataFrame for Indoor Vapor Pressure
eplusout_data["Indoor Vapor Pressure(Pa)"] = eplusout_data.apply(
    lambda row: vaporDens2pres(
        row["ZoneAirTemperature °C"], row["Indoor Vapor Density(kg{H20} m⁻³)"]
    ),
    axis=1,
)
 
# Add a new column to the DataFrame for window opening schedule
eplusout_data["window_opening_schedule"] = window_schedule["Window_Schedule"]
 
# Repeat the window schedule to match the length of the annual data
eplusout_data["window_opening_schedule"] = np.tile(
    window_schedule["Window_Schedule"].values,
    len(eplusout_data) // (24 * (60 // time_interval)),
)
 
# Add a new column to the DataFrame for actual window opening, based on the window schedule and the difference in indoor and outdoor temperatures
window_schedule["window_opening_actual"] = eplusout_data.apply(
    calculate_window_opening_actual, axis=1
)
 
# Repeat the actual window opening to match the length of the annual data
repeated_window_schedule = np.tile(
    window_schedule["window_opening_actual"].values,
    len(eplusout_data) // (24 * (60 // time_interval)),
)
 
eplusout_data["window_opening_actual"] = repeated_window_schedule
 
# Save the DataFrame to a CSV file
os.makedirs("./data/energyPlus", exist_ok=True)
eplusout_data.to_csv(
    f"./data/energyPlus/eplusout_{city_name}_{roof_type}_{time_interval}_{area}.csv",
    index=False,
)
print(
    f"./data/energyPlus/eplusout_{city_name}_{roof_type}_{time_interval}_{area}.csv saved!"
)
 
# Close the database connection
conn.close()
print(f"city_name is {city_name}")
print("EnergyPlus output extraction completed!")

This documentation provides an overview of the script's functionality and usage. For more detailed information, users should refer to the inline comments within the script itself.