ESS Principles on Open Source Software

LAST UPDATE: May 22, 2023

Introduction

Software is an indispensable asset in the production of official statistics. We need it in the design of our processes, in data collection, in data processing, during analysis and to disseminate statistical results. Without software there would be no modern official statistics. However, software development and maintenance are costly. Fortunately, statistical organisations share common needs and similar processes, and official statistics is standards-driven. Hence there is already a strong culture of international and open collaboration.

Re-use of software assets across organisations in the statistical process chain is profitable. Reducing duplicated effort through co-investment increases efficiency. Moreover, sharing software as open source software (OSS) adds to the principle of transparency as contained in the Statistical code of practice and is in line with EU and national open source strategies. A statistical open source community is most effective and innovative if it works from a common understanding, across statistical organisations, of the underlying drivers for open source. This is why there is a need to identify the core principles underlying open source in official statistics.

This document lists a number of principles on OSS. They are believed to be generally applicable to all organisations involved in the production chains of official statistics, also known in Europe as the ESS (European Statistical System). This includes National Statistical Institutes (NSIs) as well as International Organisations that provide official statistics. These principles are kept as generic as possible, meaning that they are technology-agnostic (applicable to all platforms and programming languages) and domain-agnostic (applicable to software in all statistical domains). Finally, they are formulated in general terms and are meant as a starting point for more concrete implementation details as the statistical open source community becomes more effective and mature.

1. OSS by default

Statement

In the production of official statistics we prefer the use of open source software solutions over closed software solutions. Moreover, we share our own software solutions as open source.

Rationale

This principle adds to core values in official statistics, such as being transparent and independent in how we make statistics and striving for high quality and reproducibility. Using and sharing open source software increases transparency about how we work, avoids black boxes in the implementation of official statistics and adds to the efficiency of the ESS as a whole.

Implications

For the ESS this means that when implementing, redesigning or creating new processes, open source software solutions have preference. Only when no viable open source solution exists is it possible to derogate from the default OSS option. The same holds for sharing: sharing as open source is the default, but it is possible to derogate from this in justified cases. For NSIs this means that the methods used in the production of official statistics are not just described; the code used to actually apply the method is also shared as OSS. For International Organisations this means openness, via OSS solutions, on how international aggregates are computed.

2. Work in the open

Statement

We start our projects in the open from the beginning and clearly mark maturity status.

Rationale

Many projects intend to publish their results as open source but have difficulty deciding on the best moment to do so. It might feel uncomfortable to put early ideas and rough implementation sketches online, but sharing them too late prevents others from providing valuable comments and ideas or volunteering to work together on the project. To avoid this dilemma we start working in the open right from the beginning wherever possible, and we clearly mark and update our projects' development phase over time.

Implications

For the ESS this means that it is recommended and accepted to start development projects in public. We clearly show the development status, which may vary from pre-alpha to stable and proven, by providing a public roadmap, a public source code repository and a public backlog of features, issues, bugs etc.

3. Improve and give back

Statement

We prefer improving existing open source solutions to creating new ones, and we give our improvements back to the respective open source community.

Rationale

There are cases where existing open source solutions do not exactly cover the functionality needed in official statistics. The quickest way to cope with this is to copy a solution, adapt it and use it. However, later improvements to the original solution will then not reach the copy, and our own improvements will not be visible in a wider context. Therefore we strive to give our improvements back to the open source community as change requests or suggestions, even if that takes additional resources. In the end this is an investment in the effectiveness and efficiency of the ESS as a whole.

Implications

For the ESS this means that statistical organisations actively search for solutions that can be re-used instead of creating new solutions. Even if a solution does not exactly fit the needed functionality, it is examined how it could be improved, keeping its intended functionality in mind or even widening it. This also holds for partial solutions such as code snippets and (machine learning) models that could be valuable for others. The changes or extensions are tested, documented and given back to the respective community, which decides on possible integration into its solution.

4. Think generic statistical building blocks

Statement

In our open source work we strive for re-usable generic functional building blocks that support well-defined methodology in statistical processes.

Rationale

Publishing source code as open source is not enough for effective re-use in the ESS. It is necessary to think about the design of what is to be shared and to identify generic statistical building blocks that can be used in different contexts. Therefore we design the software from the intended user's point of view and in such a way that it can be re-used in as many statistical domains or organisations as possible. This helps maintain complex statistical processes and guarantee high-quality official statistics.

Implications

For the ESS this means that monolithic applications are componentised as much as possible into generic, configurable statistical building blocks. We put statistical functionality in code and make statistical expert knowledge configurable. We make these components as generic as possible over time, across statistical domains and across statistical organisations. For NSIs this means that not just “their” statistical production process should be kept in mind when developing tools but also the possible wider applicability. International Organisations should actively encourage the development and sharing of generic OSS solutions within their domain of expertise.
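As an illustration of putting statistical functionality in code while keeping expert knowledge configurable, the following is a minimal sketch only. The names `OutlierRule` and `flag_outliers`, the median-ratio rule and the threshold value are hypothetical and are not prescribed by these principles.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Sequence


@dataclass(frozen=True)
class OutlierRule:
    """Expert knowledge kept as configuration instead of hard-coded (hypothetical)."""
    max_ratio_to_median: float  # flag values above this multiple of the median


def flag_outliers(values: Sequence[float], rule: OutlierRule) -> List[bool]:
    """Generic building block: flag each value above rule.max_ratio_to_median * median."""
    if not values:
        return []
    limit = rule.max_ratio_to_median * median(values)
    return [value > limit for value in values]


# The same block can be reused with a domain-specific configuration.
turnover_rule = OutlierRule(max_ratio_to_median=5.0)  # assumed threshold
flags = flag_outliers([100.0, 120.0, 90.0, 1000.0], turnover_rule)
# flags == [False, False, False, True]
```

The function knows nothing about a particular survey or organisation; a different domain would only supply a different `OutlierRule`.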

5. Test, package and document

Statement

We test, package and document our open source software for easy re-use.

Rationale

Re-using generic statistical software in the ESS is not always easy due to differences in statistical processes, technological environments and ways of working. Testing our software for functionality and security, and packaging it with good documentation, is of utmost importance as it improves the chances of re-use. General-purpose package management systems offer versioning and documentation facilities for exchanging generic statistical software in the ESS. The use of such packaging systems helps maintain complex statistical processes and guarantee high-quality official statistics.

Implications

For the ESS this means that we invest in testing, security scans, packaging and documentation to enable re-use. Security patches are applied with priority and as soon as possible. Documentation is designed from the viewpoint of a statistical user: concise and understandable, yet complete, covering at least the basic functionality and a complete API reference. Packaging is a key success criterion for open source projects. Larger projects should adopt modern approaches such as containerisation and automate as much as possible; smaller projects can also follow these practices. Every package is downloadable without registration, installable with minimal effort and has a minimal viable example that can be executed. Dependencies are managed and minimised as much as possible. Versioning is implemented according to the principles of the respective package exchange platform, with a preference for semantic versioning. For NSIs this means that published OSS is maintained and updated according to the policies of the relevant platforms, e.g. CRAN. International Organisations should play an active role in knowledge exchange on testing, packaging and documentation policies in their domain of expertise.
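To make the preference for semantic versioning concrete, here is a minimal sketch of how a re-user might check whether an upgrade stays within the same major version. The helper names are hypothetical. It assumes plain `MAJOR.MINOR.PATCH` strings, and it ignores pre-release and build tags as well as the special status of `0.y.z` versions.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Split a plain 'MAJOR.MINOR.PATCH' string into three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible_upgrade(installed: str, candidate: str) -> bool:
    """True if candidate is not older than installed and keeps the same major version."""
    old, new = parse_version(installed), parse_version(candidate)
    return new[0] == old[0] and new >= old


print(is_compatible_upgrade("1.4.2", "1.5.0"))  # True: minor release, same major
print(is_compatible_upgrade("1.4.2", "2.0.0"))  # False: major version changed
```

Under semantic versioning a change in the major number signals incompatible API changes, which is why the check treats it as a boundary.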

6. Choose permissive

Statement

We choose the most permissive open source license possible for sharing our software.

Rationale

Re-using software in official statistics is in the common interest of the whole ESS. Re-use of our software outside the ESS is also of added value, since a larger user community adds to software quality. To maximise re-use by others it is necessary to choose an open source license that maximally allows re-use and minimises conflicts with other licenses. Such a license is known as “permissive”. When choosing the appropriate open source license we strive for maximum re-use.

Implications

For the ESS this means that when sharing software we opt for a permissive license (e.g. Apache 2.0/MIT) over a “Copyleft” license, taking into account legal, organisational and societal considerations. Mandatory acknowledgement / attribution of sources and authors is a viable additional option.
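A project that wants to stay permissively licensed also has to watch the licenses of its dependencies. The sketch below shows one possible check, assuming dependency licenses are available as SPDX identifiers. The two sets are deliberately incomplete, and the function and package names are hypothetical.

```python
from typing import Dict, List

# SPDX identifiers grouped by license family (illustrative, not exhaustive).
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"}
COPYLEFT = {"GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only", "LGPL-3.0-only"}


def classify_license(spdx_id: str) -> str:
    """Return 'permissive', 'copyleft' or 'unknown' for an SPDX identifier."""
    if spdx_id in PERMISSIVE:
        return "permissive"
    if spdx_id in COPYLEFT:
        return "copyleft"
    return "unknown"


def needs_review(dependency_licenses: Dict[str, str]) -> List[str]:
    """List dependencies whose license is not in the permissive set."""
    return sorted(
        name
        for name, spdx_id in dependency_licenses.items()
        if classify_license(spdx_id) != "permissive"
    )


# Hypothetical dependency list for illustration.
to_check = needs_review({"pkg_a": "MIT", "pkg_b": "GPL-3.0-only", "pkg_c": "Proprietary"})
# to_check == ["pkg_b", "pkg_c"]
```

Anything flagged here would still need the legal and organisational assessment mentioned above; the sketch only surfaces candidates for that review.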

7. Promote

Statement

We invest in promoting new developments in, and improvements to, our open source software within the ESS community and, where applicable, in a wider context.

Rationale

Re-use of generic software is not going to happen if no one knows what can be re-used. On the other hand, it is difficult to know beforehand what value our software has for others. The only way out is to communicate, even if we have no clue whether it is usable in a wider context. We advertise our software in an honest, brief way, mentioning its core functionality. We let the public know our plans for new developments and improvements, and we are open to suggestions for improvement.

Implications

For the ESS this means working together on communication facilities targeted at the open source ESS community. A community-driven approach to sharing knowledge, possible open source building blocks and their application in statistical production should be preferred over centrally maintained repositories. A centrally maintained repository of software tools can quickly become outdated, and collecting information from the community can be a big effort. Therefore, such a repository should be maintained by the whole community. For NSIs this means that they are active participants in the OSS community by participating in events, joining relevant online forums, etc. International Organisations should play an active role in the organisation of the statistical OSS community in their domain of expertise.

References