As part of a personal research project, I need to create a simulated group of web developers, each with a random, unique skill set constrained by the actual probability of having those skills. To calculate this probability, one piece of information I needed was the ‘shape’ of the developers, that is, how many separate technologies they know. While looking for usable data, I figured I could simply extract this information from the Stack Overflow annual survey. By the way, thanks to Stack Overflow for this open data.
In the 2021 survey, I used only the answers from the 58,153 participants who identified as “I am a developer by profession”. Using the survey questionnaire, I created one classification for front-end developers and six classifications for back-end developers: JavaScript, Java, C#, Python, PHP, and Ruby. With this, the shape of a web developer can be described by any combination of the front-end skill and the six back-end skills. All of this was done using Python. Here are the results.
Figure 1 shows the distribution of web and non-web developers.
Figure 1
Figure 2 shows the distribution of back-end developers across the six stacks.
Figure 2
To generate the shape of the simulated developer population, I used the information shown in Figure 1 and Figure 2, but I also needed two extra pieces of information. The first was the number of back-end stacks mastered by each developer, shown in Figure 3.
Figure 3
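For anyone who wants to reproduce a count like this, here is a minimal sketch of how the number of back-end stacks per developer could be tallied. It assumes each respondent has already been reduced to a set of back-end stack names. The `developers` list is made-up illustrative data, not survey results, and this is not necessarily how devrange does it.

```python
from collections import Counter

# Hypothetical, hand-made sample: one set of back-end stacks per developer.
developers = [
    {"Python"},
    {"Java", "C#"},
    {"PHP"},
    set(),  # front-end only, no back-end stack
    {"JavaScript", "Python", "Ruby"},
]


def stack_count_distribution(devs):
    """Return {number_of_stacks: share_of_developers}."""
    counts = Counter(len(stacks) for stacks in devs)
    total = sum(counts.values())
    return {n: counts[n] / total for n in sorted(counts)}


distribution = stack_count_distribution(developers)
```

The resulting dictionary can be used directly as weights when drawing the number of stacks for a simulated developer.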
I have to admit that the Figure 3 results were a surprise to me. I was sure the peak would be at 2. I couldn’t have been more wrong. The second and final piece of information I needed was, for the developers who master two or more back-end stacks, the proportion of each possible stack pair, shown in Figure 4.
Figure 4
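A pair breakdown like this one can be sketched with `itertools.combinations`. The example below is only an illustration under assumptions: the `developers` data is invented, and counting every pair inside a developer's stack set (so a developer with three stacks contributes three pairs) is my guess at one reasonable convention, not a description of the original code.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, hand-made sample: one set of back-end stacks per developer.
developers = [
    {"Java", "C#"},
    {"JavaScript", "Python"},
    {"JavaScript", "Python", "PHP"},
    {"Ruby"},  # single stack, contributes no pair
]


def pair_proportions(devs):
    """Return {(stack_a, stack_b): share_of_all_pairs} for multi-stack developers."""
    pairs = Counter()
    for stacks in devs:
        if len(stacks) >= 2:
            # sorted() gives each pair one canonical order, so
            # ("JavaScript", "Python") and ("Python", "JavaScript") are not counted apart.
            pairs.update(combinations(sorted(stacks), 2))
    total = sum(pairs.values())
    return {pair: n / total for pair, n in pairs.items()}


proportions = pair_proportions(developers)
```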
I can now use this data as probability distributions to generate a pool of fake web developers whose combinations of skills respect the Stack Overflow 2021 dataset.
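To give an idea of what that generation step might look like, here is a rough sketch using only the standard library. All the probabilities are placeholders I made up for the example, not the survey numbers, and the names (`P_FRONT_END`, `STACK_COUNT_WEIGHTS`, `STACK_WEIGHTS`, `generate_developer`) are hypothetical. It is also a simplification: stacks are drawn one by one according to their individual weights, without using the pair proportions.

```python
import random

# Placeholder probabilities for illustration only; NOT the survey results.
P_FRONT_END = 0.5
STACK_COUNT_WEIGHTS = {0: 0.2, 1: 0.5, 2: 0.2, 3: 0.1}
STACK_WEIGHTS = {
    "JavaScript": 0.30, "Python": 0.20, "Java": 0.20,
    "C#": 0.15, "PHP": 0.10, "Ruby": 0.05,
}


def weighted_sample_without_replacement(weights, k, rng):
    """Draw k distinct keys, each draw weighted among the keys still available."""
    remaining = dict(weights)
    chosen = []
    for _ in range(k):
        names = list(remaining)
        pick = rng.choices(names, weights=[remaining[n] for n in names])[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


def generate_developer(rng):
    """Generate one simulated developer shape from the placeholder distributions."""
    n_stacks = rng.choices(
        list(STACK_COUNT_WEIGHTS), weights=list(STACK_COUNT_WEIGHTS.values())
    )[0]
    return {
        "front_end": rng.random() < P_FRONT_END,
        "back_end": weighted_sample_without_replacement(STACK_WEIGHTS, n_stacks, rng),
    }


rng = random.Random(42)  # fixed seed so the pool is reproducible
pool = [generate_developer(rng) for _ in range(1000)]
```

In this sketch a developer with no front-end skill and zero back-end stacks simply ends up as a non-web developer; a fuller version would swap the placeholder weights for the measured distributions and use the pair proportions when picking two or more stacks.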
- The results of the Stack Overflow 2021 survey can be found here
- On GitHub you will find my hacky Python code (devrange) and the results of this work in a handy JSON format