Data Validation

Executive Summary: Data Validation

SolarAnywhere® is a software-as-a-service product maintained by Clean Power Research that provides on-demand access to bankable solar data and intelligence. Both the software and the data are mature in development, adoption and validation.

Irradiance data are generated using visible- and infrared-channel data captured by geosynchronous orbiting satellites. The dataset is spatially and temporally consistent, and geographically precise. SolarAnywhere offers more than two decades of record, operates independently of ground measurements and is available in near real time.

SolarAnywhere is widely used for independent and bankable solar resource assessment, operational monitoring and solar forecasting. SolarAnywhere global horizontal irradiance (GHI) data are shown to be accurate to within +/- 4.5% on an annual basis with 95% confidence.


This document provides up-to-date reference information and validation statistics for Clean Power Research’s SolarAnywhere Data historical irradiance data product. The document version 2019.04_3.3 was last updated in April 2019 for SolarAnywhere historical irradiance model version 3.3.

About Clean Power Research

Clean Power Research® has delivered award-winning cloud software solutions to utilities and industry for 20 years. Our PowerClerk®, WattPlan® and SolarAnywhere® product families allow our customers to make sense of and thrive amid the energy transformation. Clean Power Research has offices in Napa, CA, and Kirkland, WA. For more information, visit

About SolarAnywhere

SolarAnywhere irradiance data are generated using visible- and infrared-channel data captured by geosynchronous orbiting satellites. The satellite images are processed using the most advanced algorithms developed by Dr. Richard Perez at the University at Albany (SUNY). These algorithms extract cloud indices from the satellite’s visible and infrared data. A self-calibrating feedback process adjusts for arbitrary ground surfaces such as terrain and albedo. The cloud indices are used to modulate physically based radiative transfer models describing localized clear sky climatology.

The Perez model is applied in a pseudo-empirical fashion that is periodically calibrated with a select few ground stations. However, it operates largely independently of ongoing ground data input. This approach is unique in the industry and enables ground-to-satellite correlation studies to be truly based on two independently derived measurement sources.

SolarAnywhere irradiance data are generated in both global horizontal (GHI) and direct normal (DNI) irradiance components. The following geometric balancing equation is used to calculate diffuse horizontal irradiance (DHI):

DHI = GHI - DNI \cdot \cos(\alpha_{zenith})
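
As an illustration only, the balancing equation can be sketched in a few lines of Python. The function name, the use of degrees for the zenith angle and the clamping of the cosine at zero when the sun is below the horizon are assumptions made for this sketch; they are not taken from the SolarAnywhere implementation.

```python
import math


def diffuse_horizontal(ghi, dni, zenith_deg):
    """Return DHI (W/m^2) from GHI, DNI (W/m^2) and the solar zenith angle in degrees."""
    cos_zenith = math.cos(math.radians(zenith_deg))
    # Assumption for this sketch: when the sun is below the horizon the direct
    # beam contributes nothing to the horizontal plane, so clamp at zero.
    cos_zenith = max(cos_zenith, 0.0)
    return ghi - dni * cos_zenith


# Hypothetical values chosen only for illustration.
dhi = diffuse_horizontal(ghi=800.0, dni=700.0, zenith_deg=30.0)
print(round(dhi, 1))
```

With these hypothetical inputs, the direct beam projects about 606 W/m² onto the horizontal plane and the remainder of the 800 W/m² is attributed to diffuse irradiance.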

Clean Power Research has an exclusive relationship with Dr. Perez and SUNY to implement the latest satellite-to-solar irradiance methodology advances. More information on the extensive validation of the Perez model can be found in the references section.

In agreement with the U.S. Department of Energy through the National Renewable Energy Laboratory (NREL), Perez model-based satellite irradiance data comprised the 2005 (SUNY version 1) and 2010 (SolarAnywhere version 2.3) National Solar Radiation Database (NSRDB) releases. While the output format of SolarAnywhere satellite irradiance data is similar to NSRDB data, SolarAnywhere now provides more recent and more accurate datasets intended for commercial use.

The newest version of the SolarAnywhere model has been implemented operationally as Version 3.3. SolarAnywhere satellite irradiance data are available for specific sites on a 1 km x 1 km or 10 km x 10 km basis, and from 1998 to the present hour depending on geographic availability.

Validation Methodology

Data from select ground stations are used as a reference to calculate the error and uncertainty of the SolarAnywhere irradiance model. A high-quality reference dataset is required for the validation statistics to represent, to the maximum extent possible, model performance rather than ground data inaccuracies. Therefore, only the highest quality reference stations are used as validation sites, and the data are screened for data quality issues. The validation sites span a wide geographic area and a variety of terrain and climate types to assess the model’s performance in heterogeneous conditions. Most are part of the World Radiation Monitoring Center Baseline Surface Radiation Network (BSRN).

The SolarAnywhere model has two critical properties for the purposes of data validation and overall confidence in the model’s performance. First, the model operates largely independently of ongoing ground data input. In other words, the model is not trained with ground data. Second, SolarAnywhere utilizes a single historical model regardless of the time and location. Because of these properties, the validation statistics are an excellent representation of the model’s performance not just at the validation sites, but also for the model generally.

The GHI, DNI and diffuse horizontal irradiance (DHI) data are compared at hourly, monthly and annual intervals using traditional error metrics such as Mean Bias Error (MBE), Mean Absolute Error (MAE) and Root Mean Square Error (RMSE).

The error metrics are defined by the following formulas:

  1. rMBE = \frac{\sum_{i=1}^{N} (x_i^{SA} - x_i^{obs})}{N} \cdot \frac{100\%}{\overline{x^{obs}}}
  2. rMAE = \frac{\sum_{i=1}^{N} | x_i^{SA} - x_i^{obs} |}{N} \cdot \frac{100\%}{\overline{x^{obs}}}
  3. rRMSE = \sqrt{\frac{\sum_{i=1}^{N} (x_i^{SA} - x_i^{obs})^{2}}{N}} \cdot \frac{100\%}{\overline{x^{obs}}}

Here x represents the variable being considered (GHI, DNI or DHI); N is the number of data points used; the superscripts SA and obs stand for SolarAnywhere and ground-observed data; and the overline denotes the mean of the ground-observed data. The error metrics are normalized by that mean and denoted by rMBE, rMAE and rRMSE. Hoff et al. have previously discussed the applicability of various error metrics for solar in the paper, “Reporting of Irradiance Model Relative Errors.”
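
The three relative metrics defined above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name and the sample values are hypothetical and are not drawn from the validation dataset.

```python
import math


def relative_errors(model, observed):
    """Return (rMBE, rMAE, rRMSE) in percent of the mean observed value."""
    if len(model) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    diffs = [m - o for m, o in zip(model, observed)]
    mean_obs = sum(observed) / n
    mbe = sum(diffs) / n                             # mean bias error
    mae = sum(abs(d) for d in diffs) / n             # mean absolute error
    rmse = math.sqrt(sum(d * d for d in diffs) / n)  # root mean square error
    scale = 100.0 / mean_obs                         # normalize by the mean observation
    return mbe * scale, mae * scale, rmse * scale


# Hypothetical hourly GHI values in W/m^2, for illustration only.
sa = [510.0, 290.0, 805.0, 95.0]
obs = [500.0, 300.0, 800.0, 100.0]
r_mbe, r_mae, r_rmse = relative_errors(sa, obs)
print(f"rMBE={r_mbe:.2f}%  rMAE={r_mae:.2f}%  rRMSE={r_rmse:.2f}%")
```

In this example the positive and negative differences cancel, so rMBE is zero while rMAE and rRMSE remain positive. This is why bias and scatter are reported separately.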

In addition to summary error statistics, standard deviations, standard errors and confidence intervals are presented to provide a complete picture of the accuracy and uncertainty of the SolarAnywhere data.


Validation results are organized first by period (long-term, annual, monthly and hourly) and second by geography. Each period is associated with a use case for the data. For example, long-term averages are appropriate for solar resource assessment, while hourly statistics are more relevant to real-time performance monitoring.

Long-Term Averages

The SolarAnywhere database includes more than twenty years of satellite-derived irradiance data. The long, consistent record is particularly useful for solar resource assessment. Long-term averages are derived from the full time series, and they may be summarized by a typical year file and used to project the energy production for the life of a solar plant. For these purposes, it’s useful to understand the uncertainty in the long-term solar resource data for a given site.

Twenty years of reference data (1998-2017) are considered for most Surface Radiation Budget Network (SURFRAD) stations. Previous validation studies of satellite irradiance models have considered approximately 5-15 years of data for each site. Studying two decades for each site has only recently become possible. The expanded period confirms the consistency of the model across many years and multiple generations of satellite hardware. In addition, twenty years is significant in that the term is equivalent to a typical solar power purchase agreement (PPA). The validation period used for other validation sites depends on data availability.

The GHI and DNI average mean bias error for each validation site is plotted on the maps below. Each point represents the full period for which validation data are considered. A bias close to zero means that the model is well calibrated; in other words, on average, the model does not over- or underpredict the solar resource.

SolarAnywhere Validation v3.3 Fig1-2

In the USA, the GHI average annual mean bias error for all 14 sites falls within +/- 3.0%. Boulder is known to be a challenging site for the satellite-to-solar model because of rapid cloud movement and nearby mountains. If Boulder is excluded, the long-term averages for the remaining 13 sites fall within +/- 2.0%.

In South America, the bias falls within +/- 3.5%. Higher errors are expected in South America for two reasons. First, the quality of the ground data is expected to be lower than that of the reference sites in North America, which are some of the most trusted anywhere. Second, historically, satellite images are available for the South America region every three hours compared to at least hourly in North America.

DNI is also considered for the North America validation sites. An additional explanation of data selection and data quality is presented in the appendix.


Annual insolation is used in a variety of applications including:

  • Resource assessment — compare to a year-long ground measurement campaign
  • Development and financing — assess interannual variability and the ability of a solar project to meet debt coverage requirements in a low insolation year
  • Operations and asset management — compare expected to actual production annually

For these or related purposes, annual mean bias error is useful for characterizing the performance of the model. The histograms below show the distribution of annual mean bias error for both GHI and DNI.

SolarAnywhere Validation v3.3 Fig3-4

For GHI, the data form a tight distribution around neutral bias. The mean of all points is -0.4%, indicating that on average, the model does not materially over- or under-predict annual insolation. The standard deviation is 1.9%, meaning roughly two thirds of the site-years fall within +/- 2% of the reference. Assuming a normal distribution and that each site-year of data is independent, the 95% confidence interval of GHI annual mean bias error is [-4.3%, +3.5%] (N = 180).
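
The arithmetic behind such an interval can be sketched as below, under the same normal-distribution assumption. The function name is hypothetical. Because the mean and standard deviation quoted above are rounded, the result of this sketch will not exactly reproduce the published interval, which was presumably computed from unrounded statistics.

```python
from statistics import NormalDist


def normal_interval(mean, std_dev, confidence=0.95):
    """Return the (low, high) bounds containing `confidence` of a normal distribution."""
    # Two-sided quantile, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * std_dev
    return mean - half_width, mean + half_width


# Rounded GHI statistics quoted in the text, in percent.
low, high = normal_interval(mean=-0.4, std_dev=1.9)
print(f"[{low:+.1f}%, {high:+.1f}%]")
```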

The distribution for DNI is also centered around neutral bias. The mean of all points is -0.5%. The wider distribution reflects the higher uncertainty for DNI, which is expected due to the introduction of the decomposition model. The 95% confidence interval of DNI annual mean bias error is [-8.8%, +7.7%] (N = 180).

All statistics make use of the 1 km x 1 km spatial resolution model, which is standard for SolarAnywhere time-series data (where available). Previously published validation of the lower-resolution 10 km x 10 km model, which is standard for SolarAnywhere typical year (TGY) data, has shown that the 95% confidence interval of GHI annual mean bias error is +/- 5%. The higher accuracy demonstrated here is attributed to model improvements and the increased spatial resolution. The higher-resolution data better capture cloud formation in heterogeneous terrain and microclimates. Over complex terrain and coastal areas, the 1 km SolarAnywhere data are more accurate than the 10 km SolarAnywhere data and should be strongly considered as the preferred bankable resource measurement.

As in the previous section, it’s also possible to look at the distribution of errors by validation site. The box plot below illustrates the interquartile range (exclusive median) of GHI annual mean bias error for each of the 14 validation sites in North America. Each dot represents one site-year of data. A line represents the median and an “x” represents the mean. In all, the same 180 site-years of data are shown.

SolarAnywhere Validation v3.3 Fig5

All sites have a similar error distribution and a mean within a few percent of neutral bias. The scatter of the data after grouping by site indicates that model performance depends on both location and weather. Because of this, and because there are relatively few high-quality ground stations compared to the variety of terrain and climate regions, it’s not possible to extrapolate model performance at a given location to the surrounding geographic region or climate region. Rather, the data indicate that the model performs well at all of the validation locations.

Clean Power Research has performed hundreds of site-specific resource assessment and ground tuning studies that compare SolarAnywhere Data to privately owned, high-quality ground-based irradiance measurements. The results of these studies support the conclusion that the validation presented here fairly represents the accuracy and uncertainty of SolarAnywhere Data for the multitude of locations where solar is being considered.

Monthly and Hourly

Asset managers may use monthly data to understand PV performance in the context of recent weather. Plant performance is weather-normalized to isolate performance metrics from variability in monthly insolation, temperature, etc. Hourly data may be used to support real-time operations and maintenance (O&M). In addition, hourly error metrics are useful for understanding how the model performs in various weather conditions.

In general, as the averaging period is shortened (e.g., from annual to monthly, or monthly to hourly), errors increase because fewer data points are averaged and positive and negative errors have less opportunity to cancel. MAE and RMSE succinctly summarize the accuracy of hourly and monthly data. For these shorter periods, the scatter in the data dominates the bias, which is already characterized for annual and long-term averages.
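
The effect of the averaging period can be illustrated with a toy example. The series below is hypothetical and far shorter than real data: eight "hourly" values grouped into two blocks of four, standing in for a longer averaging period. The helper names are assumptions for this sketch.

```python
def mean_abs_error(model, observed):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(m - o) for m, o in zip(model, observed)) / len(observed)


def block_means(values, block_size):
    """Average consecutive blocks; assumes len(values) is a multiple of block_size."""
    return [sum(values[i:i + block_size]) / block_size
            for i in range(0, len(values), block_size)]


# Hypothetical irradiance values in W/m^2, for illustration only.
observed = [400.0, 500.0, 600.0, 500.0, 300.0, 400.0, 500.0, 400.0]
model = [420.0, 480.0, 630.0, 480.0, 290.0, 420.0, 480.0, 415.0]

hourly_mae = mean_abs_error(model, observed)
block_mae = mean_abs_error(block_means(model, 4), block_means(observed, 4))
print(hourly_mae, block_mae)
```

The individual errors are tens of W/m², but because they have mixed signs, the error of the block averages is much smaller than the error of the individual values.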

A scatterplot of SolarAnywhere versus reference hourly global horizontal irradiance shows how the model performs for cloudy, partly cloudy and sunny conditions. Perfect correlation would result in a straight line from zero irradiance in the bottom left corner of the graph, to maximum irradiance in the top right.

SolarAnywhere Validation v3.3 Fig6

Data for the Desert Rock SURFRAD ground station, 2017, are shown as an example. The model has a correlation R² of 0.97 around the optimal line. Inspection reveals that the model performs well for low through high irradiance conditions.
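
The text does not spell out how R² is computed. One possible reading, sketched here as an assumption, is the coefficient of determination measured against the 1:1 line rather than against a fitted regression line. The function name and sample values are hypothetical.

```python
def r_squared_one_to_one(model, observed):
    """Coefficient of determination of model vs. observed about the 1:1 line."""
    mean_obs = sum(observed) / len(observed)
    # Residuals are measured against the 1:1 line (model == observed).
    ss_res = sum((o - m) ** 2 for m, o in zip(model, observed))
    # Total variation of the observations; assumes they are not all identical.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical hourly GHI values in W/m^2, for illustration only.
obs = [100.0, 300.0, 500.0, 800.0]
sa = [95.0, 290.0, 510.0, 805.0]
print(round(r_squared_one_to_one(sa, obs), 4))
```

Under this definition, perfect agreement gives a value of exactly 1, and a systematic offset lowers the score even if the points fall on a straight line.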


SolarAnywhere is the most trusted provider of bankable solar resource data to the solar industry. The satellite-derived dataset is consistent across the geographic coverage area and 20+ years of record. What’s more, the data are available on-demand.

The information presented validates the low uncertainty of SolarAnywhere Data and its use in solar resource assessment for photovoltaic (PV) project financing. In North America, annual GHI uncertainty of 1-km SolarAnywhere Data is less than +/- 4.5% with 95% confidence. Long-term site average biases fall within +/- 3.0%. Monthly and hourly statistics demonstrate the ability of the model to capture shorter periods and the full range of possible weather for operational use cases.

SolarAnywhere ensures the highest data quality globally by using a single, versioned satellite-to-solar model everywhere. SolarAnywhere may be used as a long-term solar reference independent of the need for regional or site-specific ground-tuning. This unique capability was clearly demonstrated when SolarAnywhere Data made it possible to detect an unreported irradiance sensor calibration issue at one of the nation’s most trusted reference stations: the SURFRAD station of Fort Peck, Montana.

The somewhat lower reported accuracy outside of North America is driven by differences in geosynchronous satellite coverage and the availability of high-quality ground data to use as a reference.

Clean Power Research continues to invest in SolarAnywhere to meet the needs of the solar industry and accelerate the clean energy transformation.




Input Data Sources

SolarAnywhere uses a satellite-to-solar algorithm to estimate irradiance from geosynchronous satellite images. The raw input data for the irradiance data include:

Auxiliary data including aerosol optical depth, wind speed, ambient temperature, relative humidity, solid precipitation, liquid precipitation and snow depth are derived from various numerical weather models.

Reference Stations
Station Selection Criteria

Meaningful validation requires a high-quality reference. To ensure the validation statistics are representative of the model, the validation stations are required to meet the following criteria:

  • A credible organization maintains responsibility for the installation
  • Metadata such as sensor type, location, etc. exists; sensors are secondary standard or better
  • The data are publicly available
  • The period of record is at least 5 years
  • The data generally pass standard quality control and the station has >70% availability

The following stations met these criteria and were used in validation of SolarAnywhere:

Quality Control

Data from each station are reviewed by a data analyst to ensure the data are not affected by the following common issues:

  • Soiling
  • Shading
  • Calibration drift

Data that fail to pass the quality control are excluded from the analysis. If the issues with the station are persistent, the station is excluded from the analysis entirely. Deviation from SolarAnywhere is not used as a reason to exclude data.

High-quality DNI data are more difficult to obtain than GHI. In an analysis of the North America validation sites, the quality of the ground measurements was checked by comparing measured GHI to GHI calculated from measured direct and diffuse irradiance (DHI + DNI*cos(αzenith)). The values would be equal for perfect measurements. Ground measurements that are consistent within 20 W/m² are considered a valid reference for the DNI statistics.
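
A minimal sketch of such a consistency check is shown below. The function name, the degree-based zenith input and the inclusive comparison at the threshold are assumptions for illustration; the actual screening procedure may differ in detail.

```python
import math


def passes_closure(ghi, dni, dhi, zenith_deg, tolerance=20.0):
    """True if measured GHI agrees with DHI + DNI*cos(zenith) within `tolerance` W/m^2."""
    ghi_calc = dhi + dni * math.cos(math.radians(zenith_deg))
    return abs(ghi - ghi_calc) <= tolerance


# Hypothetical measurements in W/m^2, for illustration only.
print(passes_closure(ghi=800.0, dni=700.0, dhi=190.0, zenith_deg=30.0))  # consistent
print(passes_closure(ghi=800.0, dni=700.0, dhi=150.0, zenith_deg=30.0))  # inconsistent
```

In the first case the calculated GHI is within about 4 W/m² of the measurement. In the second the gap is roughly 44 W/m², so that record would not serve as a DNI reference under the 20 W/m² rule.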


Version Control


SolarAnywhere uses version control to ensure results are reproducible from one user or context to another. Irradiance data released as “Archive,” denoted by prefix “A” in the irradiance observation type, will not change for that model version. New model versions are expected to have somewhat different irradiance data as newer, more accurate modeling techniques are applied to the historical record of satellite images.



The following peer-reviewed articles describe the SUNY model underlying SolarAnywhere simulations:

  1. Perez R., P. Ineichen, K. Moore, M. Kmiecik, C. Chain, R. George and F. Vignola, (2002): A New Operational Satellite-to-Irradiance Model. Solar Energy 73, 5, pp. 307-317.
  2. Perez R., P. Ineichen, M. Kmiecik, K. Moore, R. George and D. Renné, (2004): Producing satellite-derived irradiances in complex arid terrain. Solar Energy 77, 4, 363-370.

The following articles describe the performance of the SolarAnywhere V3 model:

  1. Perez R., S. Kivalov, A. Zelenka, J. Schlemmer and K. Hemker Jr., (2010): Improving the Performance of Satellite-to-Irradiance Models using the Satellite’s Infrared Sensors. Proc., ASES Annual Conference, Phoenix, Arizona.
  2. Dise J., A. Kankiewicz, J. Schlemmer, K. Hemker, S. Kivalov, T. Hoff and R. Perez, (2013): Operational Improvements in the Performance of the SUNY Satellite-to-Solar Irradiance Model Using Satellite Infrared Channels. Proc., 39th Annual IEEE Photovoltaic Specialists Conference, Tampa, Florida.
  3. Perez R., J. Schlemmer, K. Hemker, S. Kivalov, A. Kankiewicz and C. Gueymard, (2015): Satellite-to-Irradiance Modeling – A New Version of the SUNY Model. doi:10.1109/PVSC.2015.7356212.

The following articles include validations of the SUNY/SolarAnywhere model in different environments:

  1. Wilcox, S., R. Perez, R. George, W. Marion, D. Myers, D. Renné, A. DeGaetano, and C. Gueymard, (2005): Progress on an Updated National Solar Radiation Data Base for the United States. Proc. ISES World Congress, Orlando, FL.
  2. Vignola F., and R. Perez (2005): Solar Resource Data base for the Pacific Northwest Using Satellite Data. Final Report to USDOE.
  3. Wilcox, S., M. Anderberg, R. George, W. Marion, D. Myers, D. Renné, W. Beckman, A. DeGaetano, C. Gueymard, R. Perez, N. Lott, P. Stackhouse and F. Vignola, (2006): Towards Production of an Updated National Solar Radiation Data base. Proc. ASES Annual Conference, Denver, CO.
  4. Stackhouse, P. W., Jr., T. Zhang, W. S. Chandler, C. H. Whitlock, J. M. Hoell, D. J. Westberg, R. Perez and S. Wilcox, (2008): Satellite Based Assessment of the NSRDB Site Irradiances and Time Series from NASA and SUNY/Albany Algorithms. Proc. ASES Annual Meeting, San Diego, CA.
  5. Perez R., J. Schlemmer, D. Renné, S. Cowlin, R. George and B. Bandyopadhyay, (2009): Validation of the SUNY Satellite Model in a Meteosat Environment. Proc. ASES Annual Conference, Buffalo, New York.

Reference station references:

  1. BSRN: Driemel, A., Augustine, J., Behrens, K., Colle, S., Cox, C., Cuevas-Agulló, E., Denn, F. M., Duprat, T., Fukuda, M., Grobe, H., Haeffelin, M., Hodges, G., Hyett, N., Ijima, O., Kallis, A., Knap, W., Kustov, V., Long, C. N., Longenecker, D., Lupi, A., Maturilli, M., Mimouni, M., Ntsangwane, L., Ogihara, H., Olano, X., Olefs, M., Omori, M., Passamani, L., Pereira, E. B., Schmithüsen, H., Schumacher, S., Sieger, R., Tamlyn, J., Vogt, R., Vuilleumier, L., Xia, X., Ohmura, A., and König-Langlo, G.: Baseline Surface Radiation Network (BSRN): structure and data description (1992–2017), Earth Syst. Sci. Data, 10, 1491-1501, doi:10.5194/essd-10-1491-2018, 2018.
