Run full simulations in parallel

In this section, we will learn how to:

  • run full timeseries simulations in parallel (with multiprocessing) using the run_parallel_engine() function

Note: it may help to read the previous tutorial section on running full timeseries simulations sequentially before going through this one.

Imports and settings

[1]:
# Import external libraries
import os
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import pandas as pd
import warnings

# Settings
%matplotlib inline
np.set_printoptions(precision=3, linewidth=300)
warnings.filterwarnings('ignore')
# Paths
LOCAL_DIR = os.getcwd()
DATA_DIR = os.path.join(LOCAL_DIR, 'data')
filepath = os.path.join(DATA_DIR, 'test_df_inputs_MET_clearsky_tucson.csv')

Get timeseries inputs

[2]:
def export_data(fp):
    tz = 'US/Arizona'
    df = pd.read_csv(fp, index_col=0)
    df.index = pd.DatetimeIndex(df.index).tz_convert(tz)
    return df

df = export_data(filepath)
df_inputs = df.iloc[:48, :]
[3]:
# Plot the data
f, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3))
df_inputs[['dni', 'dhi']].plot(ax=ax1)
df_inputs[['solar_zenith', 'solar_azimuth']].plot(ax=ax2)
df_inputs[['surface_tilt', 'surface_azimuth']].plot(ax=ax3)
plt.show()
[Figure: input timeseries in three panels: DNI and DHI, solar zenith and azimuth, surface tilt and azimuth]
[4]:
# Use a fixed albedo
albedo = 0.2

Prepare PV array parameters

[5]:
pvarray_parameters = {
    'n_pvrows': 3,            # number of pv rows
    'pvrow_height': 1,        # height of pvrows (measured at center / torque tube)
    'pvrow_width': 1,         # width of pvrows
    'axis_azimuth': 0.,       # azimuth angle of rotation axis
    'gcr': 0.4,               # ground coverage ratio
    'rho_front_pvrow': 0.01,  # pv row front surface reflectivity
    'rho_back_pvrow': 0.03    # pv row back surface reflectivity
}
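
As a quick sanity check on the geometry, the row-to-row spacing implied by these parameters can be worked out by hand. The sketch below assumes the usual definition of the ground coverage ratio, gcr = pvrow_width / pitch, where pitch is the distance between the axes of two neighboring PV rows; the helper name row_pitch is hypothetical and not part of pvfactors.

```python
def row_pitch(pvrow_width, gcr):
    """Distance between neighboring PV row axes, assuming gcr = width / pitch."""
    if gcr <= 0:
        raise ValueError("gcr must be positive")
    return pvrow_width / gcr


# Same width and gcr values as in pvarray_parameters above
pitch = row_pitch(pvrow_width=1.0, gcr=0.4)
print(pitch)
```

Under that assumption, a PV row width of 1 and a gcr of 0.4 correspond to rows spaced 2.5 apart, in the same length unit as the row width.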

Run simulations in parallel with run_parallel_engine()

Running full mode timeseries simulations in parallel is done using the run_parallel_engine() function.
In the previous tutorial section on running timeseries simulations, we showed that a function needed to be passed in order to build a report out of the timeseries simulation.
The parallel mode is very similar, but we need to pass a class (or an object) instead. The reason is that Python multiprocessing uses pickling to send work to the different processes, and not every Python function can be pickled (lambdas and functions defined inside other functions, for example, cannot). A class or an object with the necessary methods therefore needs to be passed in order to build a report.
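
The pickling constraint can be checked directly with the standard library. The following is a minimal sketch, not pvfactors code: can_pickle, make_local_builder and DemoReportBuilder are hypothetical names used only for illustration. It shows that a lambda and a nested function are rejected by pickle, while a class defined at module level is pickled by reference to its name.

```python
import pickle


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def make_local_builder():
    """Return a nested function, which pickle cannot look up by name."""
    def build(pvarray):
        return {'n_items': [len(pvarray)]}
    return build


class DemoReportBuilder:
    """Hypothetical report builder defined at module level."""

    @staticmethod
    def build(pvarray):
        return {'n_items': [len(pvarray)]}


print(can_pickle(lambda pvarray: {}))   # lambdas cannot be pickled
print(can_pickle(make_local_builder())) # nested functions cannot be pickled
# Expected to succeed when the class lives at the top level of a module or script
print(can_pickle(DemoReportBuilder))
```

Passing a class with static methods keeps everything the worker processes need under one importable name.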

An example of a report building class is provided in the report.py module of the pvfactors package.

[6]:
# Choose the number of workers
n_processes = 3
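
The number of workers is a free choice, but asking for more processes than there are CPU cores or timestamps rarely helps. Below is a small sketch of one way to cap the value; choose_n_processes is a hypothetical helper, not part of pvfactors, and the capping rule is an assumption rather than a recommendation from the package.

```python
import os


def choose_n_processes(requested, n_timestamps):
    """Cap the requested worker count by available CPUs and by the amount of work."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(requested, available, n_timestamps))


# 48 timestamps are used in this tutorial (df.iloc[:48, :])
n_workers = choose_n_processes(requested=3, n_timestamps=48)
print(n_workers)
```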
[7]:
# import function to run simulations in parallel
from pvfactors.run import run_parallel_engine
# import the report building class for the simulation run
from pvfactors.report import ExampleReportBuilder

# run simulations in parallel mode
report = run_parallel_engine(ExampleReportBuilder, pvarray_parameters, df_inputs.index,
                            df_inputs.dni, df_inputs.dhi,
                            df_inputs.solar_zenith, df_inputs.solar_azimuth,
                            df_inputs.surface_tilt, df_inputs.surface_azimuth,
                            albedo, n_processes=n_processes)

# make a dataframe out of the report
df_report = pd.DataFrame(report, index=df_inputs.index)
df_report.iloc[6:11, :]
INFO:pvfactors.run:Parallel calculation elapsed time: 0.19188380241394043 sec
[7]:
                           qinc_front  qinc_back  iso_front  iso_back
2019-01-01 07:00:00-07:00         NaN        NaN        NaN       NaN
2019-01-01 08:00:00-07:00  117.632919   9.703464   5.070103  0.076232
2019-01-01 09:00:00-07:00  587.344197   4.906038  12.087407  2.150237
2019-01-01 10:00:00-07:00  685.115436  33.478098  17.516188  3.115967
2019-01-01 11:00:00-07:00  652.526254  52.534503  24.250780  1.697046
[8]:
f, ax = plt.subplots(1, 2, figsize=(10, 3))
df_report[['qinc_front', 'qinc_back']].plot(ax=ax[0])
df_report[['iso_front', 'iso_back']].plot(ax=ax[1])
plt.show()
[Figure: report timeseries in two panels: qinc_front and qinc_back, iso_front and iso_back]

The results above are consistent with running the simulations without parallel mode (this is also tested in the package).

Building a report for parallel mode

For parallel simulations, a class (or object) that builds the report must be specified; otherwise the simulation returns nothing.
Here is an example of a report building class that returns the total incident irradiance ('qinc') on the back surface of the middle PV row (index 1 of the 3 PV rows). A good way to get started on a reporting class is to use the example provided in the report.py module of the pvfactors package.
The class is also responsible for merging the reports produced by the parallel simulations: since users decide how the reports are built, they must also specify how to merge them after a parallel run.
The static method that builds a report needs to be named build, and it takes the PV array as its argument: build(pvarray).
The static method that merges the reports needs to be named merge, and it takes the list of reports: merge(reports).
[9]:
class NewReportBuilder(object):
    """A class is required to build reports when running calculations with
    multiprocessing because of python constraints"""

    @staticmethod
    def build(pvarray):
        # Return back side qinc of the middle PV row (index 1 of 3)
        return {'total_inc_back': pvarray.ts_pvrows[1].back.get_param_weighted('qinc').tolist()}

    @staticmethod
    def merge(reports):
        """Works for dictionary reports"""
        report = reports[0]
        # Merge other reports
        keys_report = list(reports[0].keys())
        for other_report in reports[1:]:
            for key in keys_report:
                report[key] += other_report[key]
        return report
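
To see what the merge step does, it can be run by hand on a few small dictionary reports. The sketch below is a standalone variant of the merge logic above; the report values are made up for illustration, and it assumes that each worker returns the report for one contiguous chunk of timestamps, in order. Unlike the method above, it copies the first report instead of extending it in place.

```python
def merge(reports):
    """Concatenate dictionary reports key by key, without mutating the inputs."""
    merged = {key: list(values) for key, values in reports[0].items()}
    for other_report in reports[1:]:
        for key in merged:
            merged[key] += other_report[key]  # list += list concatenates
    return merged


# Hypothetical per-worker reports, one per chunk of timestamps
chunk_reports = [
    {'total_inc_back': [0.0, 1.5]},
    {'total_inc_back': [2.5]},
    {'total_inc_back': [3.0, 4.0]},
]
merged = merge(chunk_reports)
print(merged)
```

This also shows why build() converts its values with tolist(): with Python lists, += concatenates the chunks, whereas with NumPy arrays it would attempt an element-wise addition.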

[10]:
# run simulations in parallel mode using the new reporting class
new_report = run_parallel_engine(NewReportBuilder, pvarray_parameters, df_inputs.index,
                                df_inputs.dni, df_inputs.dhi,
                                df_inputs.solar_zenith, df_inputs.solar_azimuth,
                                df_inputs.surface_tilt, df_inputs.surface_azimuth,
                                albedo, n_processes=n_processes)

# make a dataframe out of the report
df_new_report = pd.DataFrame(new_report, index=df_inputs.index)
INFO:pvfactors.run:Parallel calculation elapsed time: 0.19736433029174805 sec
[11]:
f, ax = plt.subplots(figsize=(5, 3))
df_new_report.plot(ax=ax)
plt.show()
[Figure: timeseries of total_inc_back from the new report]

The plot above shows that the new report building class gives the same results as those obtained in the previous tutorial section.