
Population census statistical results in RDF

Introduction

In connexion with the SemStats 2013 challenge, Insee publishes in RDF a selection of statistical results from the Population Census. These results concern population estimates as of January 1, 2010 by fine-grained geographic areas, age, sex and activity status.

Data modelling

Data are structured according to the Data Cube vocabulary, which relies on the SDMX data model. According to this standard, the data structure is defined by a DSD (Data Structure Definition), which describes in particular the dimensions of the data cube and the concept measured; various attributes can also be specified (unit of measure, observation status, additional information, etc.).

In the present case, two data cubes are defined. The larger one provides population estimates by municipality (boundaries as of January 1, 2012), sex, five-year age group and activity status. In the second cube, the municipality is replaced by the municipal district, which is defined only in the largest cities (Paris, Lyon, Marseille). For both cubes, the measure is the population aged 15 and over as of January 1, 2010. Two attributes are also attached at the observation level, giving the department and region in which the municipality or municipal district is located.

The code lists used in the different cube dimensions are formalized as SKOS concept schemes.
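To make the cube structure described above concrete, here is a minimal Python sketch of a single observation: four dimensions, one measure and two attributes. The class name, field names and code values are hypothetical and chosen for illustration only; the actual property URIs and codes are those defined in the DSD and the SKOS concept schemes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One cell of the cube (field names are illustrative, not taken from the DSD)."""
    # Dimensions: together they identify the observation
    area: str             # municipality or municipal district code
    sex: str
    age_group: str        # five-year age group
    activity_status: str
    # Measure: estimated population aged 15 and over
    population: float
    # Attributes attached at the observation level
    department: str
    region: str

    def key(self):
        """Return the tuple of dimension values identifying this observation."""
        return (self.area, self.sex, self.age_group, self.activity_status)


# Hypothetical codes and value, for illustration only
obs = Observation(
    area="AREA-1", sex="S1", age_group="A1", activity_status="T1",
    population=12.345678, department="DEP-1", region="REG-1",
)
print(obs.key())
```

The department and region are kept out of the key because they are attributes that follow from the area, not dimensions of the cube.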

Important note: the URIs used in the data and metadata are not definitive and should not be publicised outside of the SemStats challenge context. Those URIs are also not dereferenceable.

Access to the data

The complete data can be downloaded as a 7z archive (28 MB) containing two Turtle files: one for the data set on municipalities (pop5-2010-com.ttl) and one for the data set on municipal districts (pop5-2010-arm.ttl). A second archive contains the data structure definition (pop5-dsd-2010.ttl) and the SKOS concept schemes for the municipalities (cs-com-2012.ttl), municipal districts (cs-arm-2012.ttl), departments (cs-dep-2012.ttl) and regions (cs-reg-2012.ttl), as well as for the sex codes (cs-sexe.ttl), the five-year age groups (cs-ageq65.ttl) and the activity status (cs-tactr.ttl).

Note: 7z is an open compression format supported by different archiving software packages, and in particular by the 7-zip open source software.

Methodological notes

The French population census

The census is based on a rolling method with a five-year cycle. Each year, the survey covers a sample of 14% of the total French population. The census results are published each year, based on the five most recent annual surveys. Thus, the 2010 results use data from the surveys conducted from 2008 to 2012.
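The relationship between a reference year and the surveys behind it can be sketched as follows. The function name is hypothetical, and the assumption that the five-survey window is centred on the reference year is inferred from the 2010 example above (surveys from 2008 to 2012).

```python
from typing import List


def survey_years(reference_year: int) -> List[int]:
    """Return the five annual survey years assumed to feed the results
    for reference_year (window centred on the reference year)."""
    return list(range(reference_year - 2, reference_year + 3))


print(survey_years(2010))
```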

A general description of the census can be found at http://www.insee.fr/en/methodes/default.asp?page=sources/ope-rp.htm.

A more detailed article, published in 2004, is available at http://www.epsilon.insee.fr/jspui/handle/1/14396.

How to use the data?

The data sets are designed to allow users to group variable values according to their needs, or to obtain results on custom groupings of municipalities for the variables defining the non-geographic dimensions.
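As an illustration of such a custom grouping, the sketch below sums population estimates over user-defined zones while keeping one non-geographic dimension (activity status). All codes, zone names and values are hypothetical, and the observations are shown as plain tuples rather than parsed from the Turtle files.

```python
from collections import defaultdict

# Hypothetical observations:
# (municipality, sex, age group, activity status, population)
observations = [
    ("COM-A", "S1", "A1", "T1", 10.5),
    ("COM-A", "S2", "A1", "T1", 12.25),
    ("COM-B", "S1", "A1", "T1", 7.0),
    ("COM-C", "S1", "A1", "T2", 3.25),
]

# Hypothetical custom grouping of municipalities into zones
custom_area = {"COM-A": "ZONE-1", "COM-B": "ZONE-1", "COM-C": "ZONE-2"}


def aggregate(observations, area_of):
    """Sum population by (custom zone, activity status)."""
    totals = defaultdict(float)
    for municipality, sex, age_group, status, population in observations:
        totals[(area_of[municipality], status)] += population
    return dict(totals)


totals = aggregate(observations, custom_area)
print(totals)
```

Sex and age group are summed out here; keeping them in the grouping key would give a finer breakdown for each zone.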

The population estimates are computed with sampling techniques and provided with six decimal places; these must be used for all calculations in order to avoid rounding errors.
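The effect of premature rounding can be seen in a small example. The values below are hypothetical estimates with six decimal places; the sketch compares rounding each estimate before summing with summing first and rounding only the final result.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical six-decimal estimates for five small cells
estimates = [
    Decimal("0.412345"),
    Decimal("0.398765"),
    Decimal("0.405432"),
    Decimal("0.441234"),
    Decimal("0.420001"),
]

unit = Decimal("1")

# Sum at full precision, then round once at the end
exact_total = sum(estimates)
rounded_last = exact_total.quantize(unit, rounding=ROUND_HALF_UP)

# Round every estimate to an integer first, then sum
rounded_first = sum(e.quantize(unit, rounding=ROUND_HALF_UP) for e in estimates)

print(exact_total, rounded_last, rounded_first)
```

In this example each cell rounds to zero on its own, so rounding first loses the whole total, whereas rounding last keeps it.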

Estimates greater than 500 can normally be used with confidence. Figures below 200 must be treated with caution: because of the imprecision inherent in sampling, they may not be significant. Comparisons between small areas should therefore be avoided.
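These thresholds can be turned into a simple check, sketched below. The function name and the labels are hypothetical. The text gives no guidance for estimates between 200 and 500, so the "intermediate" label for that range is an assumption of this sketch.

```python
def reliability(estimate: float) -> str:
    """Classify an estimate using the thresholds quoted in the text."""
    if estimate > 500:
        return "reliable"       # can normally be used with confidence
    if estimate < 200:
        return "caution"        # may not be significant
    return "intermediate"       # assumption: range not covered by the text


for value in (750.0, 350.0, 150.5):
    print(value, reliability(value))
```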

Definitions

Age

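Age is the time elapsed since birth. It can be coded using two definitions: the age in completed years (the age at the last birthday) and the age difference in years (the difference between the census year and the year of birth).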

The results of the population census are now (since 2004) presented using the age in completed years. The results of the previous population censuses (1999 and before) were presented using the age difference in years.
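The two age definitions named above can be compared with a short sketch. It assumes that the age in completed years is the age at the last birthday and that the age difference in years is the reference year minus the year of birth; the function names and the example dates are hypothetical.

```python
from datetime import date


def age_in_completed_years(birth: date, reference: date) -> int:
    """Age at the last birthday on or before the reference date."""
    had_birthday = (reference.month, reference.day) >= (birth.month, birth.day)
    return reference.year - birth.year - (0 if had_birthday else 1)


def age_difference_in_years(birth: date, reference: date) -> int:
    """Reference year minus year of birth, ignoring month and day."""
    return reference.year - birth.year


birth = date(1990, 6, 15)
reference = date(2010, 1, 1)
print(age_in_completed_years(birth, reference))   # 19
print(age_difference_in_years(birth, reference))  # 20
```

With a reference date of January 1, the two definitions differ by one year for anyone not born on January 1, which matters when comparing age groups across censuses.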

Population

The population figures correspond to all persons whose usual residence is in the area under consideration. The population of this area includes:

Adult pupils and students living in boarding schools, and military personnel living in barracks while also having a personal residence, are now counted in the community population of the municipality where their establishment is located. Before 2004, these people were attached to their family residence and were therefore counted in the household population of the municipality of their family home.

Activity status

The activity status divides the population into active and inactive persons. The active are those who are employed (including those in a traineeship or paid internship) and the unemployed. Among the inactive, one can distinguish pupils, students and unpaid interns; retirees or early retirees; and housewives or househusbands.

Note

The definition of unemployment in the census differs from that of the International Labour Office (ILO). Unemployment measured by the census is higher than unemployment according to the ILO, because inactive people sometimes declare themselves unemployed even though they do not meet all the ILO criteria.