Run full simulations in parallel

In this section, we will learn how to:

run full timeseries simulations in parallel (with multiprocessing) using the run_parallel_engine() function

Note: it may help to read the previous tutorial section on running full timeseries simulations sequentially before going through this one.

Imports and settings

[1]:

# Import external libraries
import os
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import pandas as pd
import warnings

# Settings
%matplotlib inline
np.set_printoptions(precision=3, linewidth=300)
warnings.filterwarnings('ignore')
# Paths
LOCAL_DIR = os.getcwd()
DATA_DIR = os.path.join(LOCAL_DIR, 'data')
filepath = os.path.join(DATA_DIR, 'test_df_inputs_MET_clearsky_tucson.csv')

Get timeseries inputs

[2]:

def export_data(fp):
    tz = 'US/Arizona'
    df = pd.read_csv(fp, index_col=0)
    df.index = pd.DatetimeIndex(df.index).tz_convert(tz)
    return df

df = export_data(filepath)
df_inputs = df.iloc[:48, :]

[3]:

# Plot the data
f, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3))
df_inputs[['dni', 'dhi']].plot(ax=ax1)
df_inputs[['solar_zenith', 'solar_azimuth']].plot(ax=ax2)
df_inputs[['surface_tilt', 'surface_azimuth']].plot(ax=ax3)
plt.show()

[Figure: input timeseries; left: dni and dhi, center: solar_zenith and solar_azimuth, right: surface_tilt and surface_azimuth]

[4]:

# Use a fixed albedo
albedo = 0.2

Prepare PV array parameters

[5]:

pvarray_parameters = {
    'n_pvrows': 3,            # number of pv rows
    'pvrow_height': 1,        # height of pvrows (measured at center / torque tube)
    'pvrow_width': 1,         # width of pvrows
    'axis_azimuth': 0.,       # azimuth angle of rotation axis
    'gcr': 0.4,               # ground coverage ratio
    'rho_front_pvrow': 0.01,  # pv row front surface reflectivity
    'rho_back_pvrow': 0.03    # pv row back surface reflectivity
}
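As a quick sanity check on these parameters, the ground coverage ratio is commonly defined as the PV row width divided by the row pitch (the center-to-center distance between adjacent rows), so the pitch implied by the dictionary above can be computed directly. The sketch below assumes that definition; the helper `row_pitch` is hypothetical and not part of pvfactors.

```python
def row_pitch(pvrow_width, gcr):
    """Center-to-center row distance, assuming gcr = pvrow_width / pitch (hypothetical helper)."""
    if not 0 < gcr <= 1:
        raise ValueError("gcr must be in (0, 1]")
    return pvrow_width / gcr


# Subset of pvarray_parameters used for the check
params = {'n_pvrows': 3, 'pvrow_width': 1, 'gcr': 0.4}
pitch = row_pitch(params['pvrow_width'], params['gcr'])
# Distance between the centers of the first and last PV rows
array_span = (params['n_pvrows'] - 1) * pitch
print(pitch, array_span)  # 2.5 5.0
```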

Run simulations in parallel with `run_parallel_engine()`

Full mode timeseries simulations are run in parallel with the run_parallel_engine() function.
In the previous tutorial section on running timeseries simulations, we showed that a function needs to be passed in order to build a report out of the timeseries simulation.
The parallel mode is not very different, but a class (or an object) needs to be passed instead. The reason is that Python multiprocessing pickles the objects it sends to the worker processes, and not every Python function can be pickled (lambdas and locally defined functions cannot, for instance), so pvfactors requires a class or an object with the necessary methods in order to build the report.
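The pickling constraint can be checked with the standard library alone. In this sketch, `is_picklable` is a hypothetical helper: a lambda fails to pickle, while an importable function such as `math.sqrt`, or a module-level class exposing static methods (like the report builders used in this tutorial), is pickled by reference. `SketchReportBuilder` is an illustrative stand-in, not a pvfactors class.

```python
import math
import pickle


def is_picklable(obj):
    """Return True if `obj` can be serialized with pickle, as multiprocessing requires."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


class SketchReportBuilder:
    """Hypothetical builder: a module-level class is pickled by reference."""

    @staticmethod
    def build(pvarray):
        return {}

    @staticmethod
    def merge(reports):
        return reports[0]


print(is_picklable(math.sqrt))            # True: importable function
print(is_picklable(SketchReportBuilder))  # True: module-level class
print(is_picklable(lambda x: x))          # False: a lambda has no importable name
```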

An example of a report building class is provided in the report.py module of the pvfactors package.

[6]:

# Choose the number of workers
n_processes = 3
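With the 48 timestamps of df_inputs and `n_processes = 3`, each worker handles a slice of the timeseries. The sketch below assumes the inputs are split into contiguous, near-equal chunks in the style of `numpy.array_split` (chunk sizes differ by at most one, larger chunks first); the actual splitting logic lives in pvfactors' run module and may differ. `split_contiguous` is a hypothetical helper.

```python
def split_contiguous(items, n_chunks):
    """Split a sequence into n_chunks contiguous slices, like numpy.array_split."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional item
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


hours = list(range(48))  # stand-in for the 48 timestamps of df_inputs
print([len(c) for c in split_contiguous(hours, 3)])            # [16, 16, 16]
print([len(c) for c in split_contiguous(list(range(50)), 3)])  # [17, 17, 16]
```

Because the chunks are contiguous, concatenating per-chunk results in chunk order restores the original timeline, which is why the order in which reports are merged matters.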

[7]:

# import function to run simulations in parallel
from pvfactors.run import run_parallel_engine
# import the report building class for the simulation run
from pvfactors.report import ExampleReportBuilder

# run simulations in parallel mode
report = run_parallel_engine(ExampleReportBuilder, pvarray_parameters, df_inputs.index,
                            df_inputs.dni, df_inputs.dhi,
                            df_inputs.solar_zenith, df_inputs.solar_azimuth,
                            df_inputs.surface_tilt, df_inputs.surface_azimuth,
                            albedo, n_processes=n_processes)

# make a dataframe out of the report
df_report = pd.DataFrame(report, index=df_inputs.index)
df_report.iloc[6:11, :]

INFO:pvfactors.run:Parallel calculation elapsed time: 0.19188380241394043 sec

[7]:

                           qinc_front  qinc_back  iso_front  iso_back
2019-01-01 07:00:00-07:00         NaN        NaN        NaN       NaN
2019-01-01 08:00:00-07:00  117.632919   9.703464   5.070103  0.076232
2019-01-01 09:00:00-07:00  587.344197   4.906038  12.087407  2.150237
2019-01-01 10:00:00-07:00  685.115436  33.478098  17.516188  3.115967
2019-01-01 11:00:00-07:00  652.526254  52.534503  24.250780  1.697046

[8]:

f, ax = plt.subplots(1, 2, figsize=(10, 3))
df_report[['qinc_front', 'qinc_back']].plot(ax=ax[0])
df_report[['iso_front', 'iso_back']].plot(ax=ax[1])
plt.show()

[Figure: qinc_front and qinc_back (left), iso_front and iso_back (right) from the parallel simulation report]

The results above are consistent with running the simulations in sequential (non-parallel) mode; this is also tested in the package.
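Such a comparison has to cope with the NaN values the reports contain (see the 07:00 row in the report above), since NaN never compares equal to NaN. A minimal sketch, assuming both reports are dictionaries of equal-length lists of floats; `reports_match` is a hypothetical helper, not the test used in pvfactors.

```python
import math


def reports_match(report_a, report_b, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two dict-of-lists reports, treating NaN vs NaN as a match."""
    if report_a.keys() != report_b.keys():
        return False
    for key in report_a:
        values_a, values_b = report_a[key], report_b[key]
        if len(values_a) != len(values_b):
            return False
        for a, b in zip(values_a, values_b):
            if math.isnan(a) and math.isnan(b):
                continue  # both missing, e.g. when the sun is below the horizon
            if math.isnan(a) or math.isnan(b):
                return False
            if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True


# Illustrative values; each float('nan') call creates a distinct NaN object
report_a = {'qinc_back': [float('nan'), 9.703464, 4.906038]}
report_b = {'qinc_back': [float('nan'), 9.703464, 4.906038]}
print(report_a == report_b)               # False: the two NaNs compare unequal
print(reports_match(report_a, report_b))  # True
```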

Building a report for parallel mode

For parallel simulations, a class (or object) that builds the report must be specified; otherwise the simulation returns nothing.
Here is an example of a report building class that returns the total incident irradiance ('qinc') on the back surface of the center PV row (index 1 of the 3-row array). A good way to get started is to adapt the example provided in the report.py module of the pvfactors package.
The class must also merge the reports produced by the parallel simulations: since users decide how the reports are built, they are also responsible for specifying how to merge them after a parallel run.

The static method that builds the report needs to be named build and take the PV array as its only argument: build(pvarray).

And the static method that merges the reports needs to be named merge(reports).

[9]:

class NewReportBuilder(object):
    """A class is required to build reports when running calculations with
    multiprocessing because of python constraints"""

    @staticmethod
    def build(pvarray):
        # Return the back-side qinc of the center PV row (index 1 of 3).
        # .tolist() gives a plain list, so merge() can concatenate reports with +=
        return {'total_inc_back': pvarray.ts_pvrows[1].back.get_param_weighted('qinc').tolist()}

    @staticmethod
    def merge(reports):
        """Merge dict-of-lists reports by concatenating values key by key
        (modifies reports[0] in place)"""
        report = reports[0]
        # Merge other reports
        keys_report = list(reports[0].keys())
        for other_report in reports[1:]:
            for key in keys_report:
                report[key] += other_report[key]
        return report
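The merge() method above modifies reports[0] in place. A variant that leaves the per-worker reports untouched can build a new dictionary instead. The sketch below assumes, like the original, that every report is a dictionary of lists sharing the same keys and that the reports are passed in chronological chunk order; `merge_reports` is a hypothetical name.

```python
from itertools import chain


def merge_reports(reports):
    """Concatenate dict-of-lists reports key by key, without mutating the inputs."""
    keys = list(reports[0].keys())
    return {key: list(chain.from_iterable(r[key] for r in reports)) for key in keys}


# Illustrative per-worker reports for three consecutive chunks of a timeseries
chunk_reports = [
    {'total_inc_back': [0.0, 1.0]},
    {'total_inc_back': [2.0, 3.0]},
    {'total_inc_back': [4.0]},
]
merged = merge_reports(chunk_reports)
print(merged)            # {'total_inc_back': [0.0, 1.0, 2.0, 3.0, 4.0]}
print(chunk_reports[0])  # {'total_inc_back': [0.0, 1.0]} (unchanged)
```

Passing the reports out of chunk order would scramble the timeline, so the ordering of the list given to merge() matters as much as the concatenation itself.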

[10]:

# run simulations in parallel mode using the new reporting class
new_report = run_parallel_engine(NewReportBuilder, pvarray_parameters, df_inputs.index,
                                df_inputs.dni, df_inputs.dhi,
                                df_inputs.solar_zenith, df_inputs.solar_azimuth,
                                df_inputs.surface_tilt, df_inputs.surface_azimuth,
                                albedo, n_processes=n_processes)

# make a dataframe out of the report
df_new_report = pd.DataFrame(new_report, index=df_inputs.index)

INFO:pvfactors.run:Parallel calculation elapsed time: 0.19736433029174805 sec

[11]:

f, ax = plt.subplots(figsize=(5, 3))
df_new_report.plot(ax=ax)
plt.show()

[Figure: total_inc_back timeseries from the new report building class]

The plot above shows that the new report building class gives the same results as the new report function from the previous tutorial section.
