General Questions

The WBG-LinkedIn DDD is the first attempt to harness LinkedIn’s data, which covers more than 100 countries, to support the analytical, advisory, and operational work of the World Bank Group.

The data is publicly accessible to all, but recommended for policy analysts, economists, and researchers.

You can access the data here or through the “Download Data” link in the main navigation.

LinkedIn’s vision is to create economic opportunity for every member of the global workforce. One of the ways LinkedIn operationalizes this vision is through its pro-bono Economic Graph initiative, which uses LinkedIn data to study the global economy and develop new insights for policymakers. The World Bank Group’s mission to end extreme poverty and boost shared prosperity can help further advance these goals, especially in developing countries, where data may be scarce and expensive to collect. In general, the World Bank Group is engaging key players (innovative thinkers, governments, the private sector, and global partners) to develop solutions that use emerging technologies and business models in ways that help the poor and advance development goals.

You can use this citation: "World Bank LinkedIn Digital Data for Development" by World Bank Group & LinkedIn Corporation, licensed under CC BY 3.0

To protect member privacy, this project requires a minimum of 50 observations in a data cell before a value is reported. See LinkedIn’s Privacy Policy. LinkedIn has also implemented GDPR (the EU’s General Data Protection Regulation) globally.

The LinkedIn collaboration and Memorandum of Understanding (MoU) are strictly nonremunerated, and the benefits for LinkedIn are mostly intangible, such as adding rigor to its analytics through collaborations with WB economists and demonstrating the value of LinkedIn data in helping address global development challenges.

This collaboration is strictly nonremunerated as stipulated in the MoU legal clauses.

LinkedIn has implemented GDPR (the EU’s General Data Protection Regulation) globally. LinkedIn also adheres to local laws in the 200+ countries and territories in which it conducts business, including local regulations related to data privacy. See LinkedIn’s Privacy Policy for details. In addition, to further protect member privacy, this collaboration requires a minimum of 50 observations in a cell before a value is reported. No “personal data” is shared.

Data Questions

Our initial comparisons to government surveys show that WBG-LinkedIn metrics have the strongest representation in knowledge-intensive and tradable sectors, including: Financial Services, Professional Services, Information & Communication Technology (ICT), the Arts & Creative Industries, Manufacturing, and Mining/Quarrying. Tech-savvy workers, business professionals, young people, and women are also more likely to be on LinkedIn than the average worker. Our assessments of representativeness are based on comparisons to the latest available data from ILOStat, a consolidated dataset of national labor surveys covering over 100 countries. In general, the metrics represent the world as seen through the lens of LinkedIn data, which is influenced by how LinkedIn members choose to use the platform. This can vary based on professional, social, and regional culture, as well as overall site availability and accessibility in individual countries. These variances cannot be fully accounted for in the analysis and are detailed in the methodology paper here.

The data only show countries that had at least 100,000 LinkedIn members at the end of 2017 to maximize confidence in our samples and data quality. The team will add more countries as they cross this threshold in the next data refresh.

To protect member privacy, the data only show results when at least 50 LinkedIn members meet the criteria being queried (i.e. a data cell needs at least 50 observations to report a value).
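For readers who aggregate the data further, the 50-observation rule can be sketched as a simple suppression step. This is only an illustration, not the project's actual pipeline: the field names (`member_count`, `value`), the function name, and the sample rows are all made up for the example.

```python
# Illustrative sketch of small-cell suppression; not LinkedIn's or the
# World Bank's actual code. Field names and sample rows are hypothetical.

MIN_OBSERVATIONS = 50  # threshold stated in this FAQ


def suppress_small_cells(cells, min_obs=MIN_OBSERVATIONS):
    """Return copies of the cells, blanking values backed by too few members."""
    published = []
    for cell in cells:
        # Keep the value only if the cell meets the minimum observation count.
        value = cell["value"] if cell["member_count"] >= min_obs else None
        published.append({**cell, "value": value})
    return published


# Made-up cells: one well above, one just below, one exactly at the threshold.
sample = [
    {"country": "A", "industry": "X", "member_count": 120, "value": 0.031},
    {"country": "B", "industry": "X", "member_count": 49, "value": 0.120},
    {"country": "C", "industry": "X", "member_count": 50, "value": 0.007},
]

published = suppress_small_cells(sample)
```

Cells that fall below the threshold keep their row but carry no value, which is why gaps in the published data should be read as "not reported" rather than as zero.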

To the extent possible, the project team has conducted data quality checks for all the countries included in the dataset and validated the data against 23 external data sources before publishing (see the methodology paper for details). To balance making data available for as many developing countries as possible against keeping data quality in check, the project team imposes some thresholds in the data extraction rules, with a forward-looking perspective: as adoption and usage of the LinkedIn platform keep increasing in developing countries, the quality and coverage of these metrics will keep improving over time. If certain data points are confusing (e.g. whether the “Information Technology and Service” industry in Lebanon has really been growing fast over the past three years), please either cross-check with other data sources or at least apply a “smell test”: log on to www.linkedin.com, search for the country “Lebanon” in the search box, click on the “Location” tab, and choose to show “companies”. You will see a list of companies located in Lebanon, with their industry classifications. Browse the company names and descriptions under the “Information Technology and Service” industry. Having seen them, you may have a better sense of whether the employment data derived from these companies reflects reality on the ground.

Please refer to the methodology paper – it provides an overview of the methodologies, data representativeness, and limitations. 

The project team will update the data and online visuals on an annual basis. Note that each annual refresh of the data may change previously reported numbers, as new members join LinkedIn and their employment history is captured and aggregated for analysis.

The publicly available data only goes back to 2015 due to LinkedIn’s data retention constraints. 

Yes, though this is subject to end-user demand and feedback. The project team will monitor downloads and citations of this dataset to assess demand.

LinkedIn members self-report their skills on their LinkedIn profiles. Currently, there are more than 50,000 distinct, standardized skills (second row below) classified by LinkedIn. These have been coded and classified by taxonomists at LinkedIn into 249 skill groupings (first row below), which are the skill groups represented in the dataset. Descriptions of each of the 249 skill groups can be found in the appendix of our methodology paper.

[Figure: skill groupings (first row) and the standardized skills they contain (second row). Image not available.]

The dataset uses LinkedIn’s own industry classifications. Maintaining LinkedIn’s industry classifications permits the data to be updated more easily. The project team has mapped LinkedIn’s industry categories to the International Standard Industrial Classification (ISIC) Rev. 4 at the two-digit level. This mapping is available in the appendix section of the methodology paper. The industry a member belongs to is based on the industries declared by the companies in a member’s work history. Please note that in the industry employment shift visuals, since the mining and quarrying sector contains only a few sub-industries, the project team combines manufacturing and mining & quarrying into one tab for visualization.

In general, metric coverage and accuracy are correlated with LinkedIn membership. This means that users should interpret data on, for example, the ICT sector in middle-income countries with higher confidence than data on the Manufacturing sector in low-income countries. These trade-offs are outlined in the public methodology paper, and detailed country-specific coverage statistics are available to World Bank project teams upon request.

This first phase of collaboration is focused only on data derived from LinkedIn member profiles. Extensions to other parts of the LinkedIn ecosystem, such as job postings, will be considered for future projects and depend on end-user demand and feedback.

The data is grouped by metric into separate Excel files. Each metric’s Excel file contains background information as well as the dataset itself, which is formatted so that it can be transferred easily into the import formats used by common statistical software (e.g. csv, tsv).
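As a minimal sketch of moving between the import formats mentioned above, the following converts comma-separated text to tab-separated text using only the Python standard library. It assumes the dataset sheet has already been saved from Excel as CSV; the function name and the sample columns are hypothetical and do not reflect the actual file layout.

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Convert comma-separated text to tab-separated text."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n" endings.
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical columns, for illustration only.
example_csv = "country,industry,year,value\nA,X,2017,0.031\n"
example_tsv = csv_to_tsv(example_csv)
```

Going through the `csv` module rather than replacing commas directly keeps quoted fields that themselves contain commas (such as some industry names) intact.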

Data points are extracted from member profiles in a number of ways, depending on the desired information (e.g. industry affiliation from company of employment; skills from user-entered fields, if any; sex from user name). As a result, the number of members reporting a skill in their profile may differ from the number of members associated with a given industry, and the total number of members in a country may differ from both.

As with any dataset, assumptions and criteria are imposed in collecting and cleaning the LinkedIn data. The major limitations of these datasets are: 1) they are based on self-declared information; 2) they are based on proprietary taxonomies that differ from international standard taxonomies (although we’ve done our best to create mappings and intend to share these additional resources publicly for researchers); 3) LinkedIn data is a by-product, not an end-product, of users interacting on a digital platform, and hence its representativeness largely correlates with LinkedIn user demographics and behaviors (note: this is also one of the key characteristics of social media “big data”).

Because some sectors have lower penetration rates, the first phase of the World Bank Group-LinkedIn collaboration shares data only from the six knowledge-intensive and tradable sectors, to ensure data quality and minimize the risk of misinterpreting the LinkedIn data due to small sample sizes. The sectors left out (named according to ISIC Rev. 4) are:

  • L. Real estate activities
  • D. Electricity, gas, steam and air conditioning supply
  • N. Administrative and support service activities
  • P. Education
  • O. Public administration and defense; compulsory social security
  • S. Other service activities
  • Q. Human health and social work activities
  • H. Transportation and storage
  • G. Wholesale and retail trade; repair of motor vehicles and motorcycles
  • F. Construction
  • I. Accommodation and food service activities
  • A. Agriculture, forestry and fishing

WBG Operational Team Questions

The metrics provided on the website are targeted at answering the following questions:

  • Industry employment growth - What are the most recent employment growth trends in my country or city, especially in knowledge-intensive and tradable sectors?
  • Industry skills needs - For the industries I am interested in, what are the latest skills requirements? Which skills are becoming more or less important over time? How are skills being applied across industries?
  • Talent migration - Which countries do I compete against for talent? What industries and skills are associated with these movements?

Yes. Examples include the South Africa Economic Update (September 2017), the Macedonia Systematic Country Diagnostic (2018), and the “Mid-Term Assessment of Beijing’s 13th 5-Year Plan: Note on Innovation and Local Economic Development” (2018). As these data and insights are more widely adopted, the project team encourages WBG project teams to share their experience in the form of blogs or case studies, thereby encouraging “open analytics” from this data.

A number of resources can be accessed on the Economic Graph site, including case studies from other LinkedIn projects with policymakers. For additional resources, such as how WBG staff can access more granular LinkedIn data, please go to data.worldbank.org and search for “LinkedIn” in the search box.

The LinkedIn data shows that developing countries were most likely to lose frontier digital economy skills, such as Cloud Computing, Artificial Intelligence, Software Testing, and Data-Driven Decision Making, to high-income countries, but were able to retain foundational skills such as Research, Digital Literacy, and Business Management. Given that the digital economy is borderless and the physical location of employees is increasingly less relevant, regional programs that promote talent exchange and collaboration should be considered to partially offset the challenges associated with talent loss, alongside continued investment in transversal foundational skills.