6.3. Example: Population Trends¶

We are now ready to work with large tables of data. The file below contains “Annual Estimates of the Resident Population by Single Year of Age and Sex for the United States.” Notice that read_table can read data directly from a URL.

# As of August 2021, this census file is online here: 
data = 'http://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2010-2019/nc-est2019-agesex-res.csv'

# A local copy can be accessed here in case census.gov moves the file:
# data = path_data + 'nc-est2019-agesex-res.csv'

full_census_table = Table.read_table(data)
full_census_table

AGE	CENSUS2010POP	ESTIMATESBASE2010	POPESTIMATE2010	POPESTIMATE2011	POPESTIMATE2012	POPESTIMATE2013	POPESTIMATE2014	POPESTIMATE2015	POPESTIMATE2016	POPESTIMATE2017	POPESTIMATE2018	POPESTIMATE2019
0	3944153	3944160	3951430	3963092	3926570	3931258	3954787	3983981	3954773	3893990	3815343	3783052
1	3978070	3978090	3957730	3966225	3977549	3942698	3948891	3973133	4002903	3972711	3908830	3829599
2	4096929	4096939	4090621	3970654	3978925	3991740	3958711	3966321	3991349	4020045	3987032	3922044
3	4119040	4119051	4111688	4101644	3981531	3991017	4005928	3974351	3982984	4006946	4033038	3998665
4	4063170	4063186	4077346	4121488	4111490	3992502	4004032	4020292	3989750	3997280	4018719	4043323
5	4056858	4056872	4064521	4087054	4131049	4121876	4004576	4017589	4035033	4003452	4008443	4028281
6	4066381	4066412	4072904	4074531	4096631	4141126	4133372	4017388	4031568	4048018	4014057	4017227
7	4030579	4030594	4042990	4082821	4084175	4106756	4152666	4145872	4030888	4044139	4058370	4022319
8	4046486	4046497	4025501	4052773	4092559	4094513	4118349	4165033	4158848	4042924	4054236	4066194
9	4148353	4148369	4125312	4035319	4062726	4103052	4106068	4130887	4177895	4170813	4053179	4061874

... (296 rows omitted)

Only the first 10 rows of the table are displayed. Later we will see how to display the entire table; however, this is typically not useful with large tables.

A description of the table appears online.

The SEX column contains numeric codes: 0 stands for the total, 1 for male, and 2 for female. The assumptions underlying this binary categorization are questionable. We will discuss them in the next section.

In this section we will focus on age and population size.

The AGE column contains ages in completed years, but the special value 999 is a sum of the total population. The “age” 100 too has a special status. In that category, the Census Bureau includes all people aged 100 or more.

The other columns contain estimates of the US population in each category of sex and age in the years 2010 through 2019. The Census is decennial: it takes place every 10 years. The most recent Census was held in 2020 and the one before that in 2010. The Census Bureau also estimates population changes each year. As explained in the Bureau’s description of its methodology, it “adds [the estimated changes] to the last decennial census to produce updated population estimates every year.”

Typically, a publicly available table will contain more information than necessary for a particular investigation or analysis. To get the large table into a more usable form, we have to do some data cleaning.

Suppose we are only interested in the population changes from 2014 to 2019. Let’s select the relevant columns.

partial_census_table = full_census_table.select('SEX', 'AGE', 'POPESTIMATE2014', 'POPESTIMATE2019')
partial_census_table

AGE	POPESTIMATE2014	POPESTIMATE2019
0	3954787	3783052
1	3948891	3829599
2	3958711	3922044
3	4005928	3998665
4	4004032	4043323
5	4004576	4028281
6	4133372	4017227
7	4152666	4022319
8	4118349	4066194
9	4106068	4061874

... (296 rows omitted)

We can simplify the labels of the selected columns.

us_pop = partial_census_table.relabeled('POPESTIMATE2014', '2014').relabeled('POPESTIMATE2019', '2019')
us_pop

AGE	2014	2019
0	3954787	3783052
1	3948891	3829599
2	3958711	3922044
3	4005928	3998665
4	4004032	4043323
5	4004576	4028281
6	4133372	4017227
7	4152666	4022319
8	4118349	4066194
9	4106068	4061874

... (296 rows omitted)

6.3.1. Ages 97-100¶

As a warm-up, let’s examine the total population, labeled by SEX code 0. Since all these rows will have the same value 0 in the SEX column, we will drop that column.

us_pop_by_age = us_pop.where('SEX', are.equal_to(0)).drop('SEX')
us_pop_by_age

AGE	2014	2019
0	3954787	3783052
1	3948891	3829599
2	3958711	3922044
3	4005928	3998665
4	4004032	4043323
5	4004576	4028281
6	4133372	4017227
7	4152666	4022319
8	4118349	4066194
9	4106068	4061874

... (92 rows omitted)

Now let’s look at the population in the highest ages.

us_pop_by_age.where('AGE', are.between(97, 101))

AGE	2014	2019
97	82948	116969
98	59546	86150
99	41277	57124
100	70685	100322

Not surprisingly, the numbers of people are smaller at higher ages. For example, there are fewer 99-year-olds than 98-year-olds.

But the numbers for AGE 100 are quite a bit larger than those for age 99. That is because the row with AGE 100 doesn’t just represent 100-year-olds. It also includes those who are older than 100.

6.3.2. Percent Change¶

Each column of the table us_pop_by_age is an array of the same length, and so columns can be combined using arithmetic. The array below contains the change in population between 2014 and 2019. There is one entry corresponding to each row of us_pop_by_age.

change = us_pop_by_age.column('2019') - us_pop_by_age.column('2014')
change

array([-171735, -119292,  -36667, ...,   15847,   29637, 9938515])

We can augment us_pop_by_age with a column that contains these changes, both in absolute terms and as percents relative to the value in 2014.

us_pop_change = us_pop_by_age.with_columns(
    'Change', change,
    'Percent Change', change/us_pop_by_age.column('2014')
)
us_pop_change.set_format('Percent Change', PercentFormatter)

AGE	2014	2019	Change	Percent Change
0	3954787	3783052	-171735	-4.34%
1	3948891	3829599	-119292	-3.02%
2	3958711	3922044	-36667	-0.93%
3	4005928	3998665	-7263	-0.18%
4	4004032	4043323	39291	0.98%
5	4004576	4028281	23705	0.59%
6	4133372	4017227	-116145	-2.81%
7	4152666	4022319	-130347	-3.14%
8	4118349	4066194	-52155	-1.27%
9	4106068	4061874	-44194	-1.08%

... (92 rows omitted)

Almost all the entries displayed in the Percent Change column are negative, demonstrating a drop in population at the youngest ages. However, the overall population grew by about 9.9 million people, a percent change of just over 3%.

us_pop_change.where('AGE', are.equal_to(999))

AGE	2014	2019	Change	Percent Change
999	318301008	328239523	9938515	3.12%

Let us compare this to the change at each age. For ease of interpretation, we will sort the table in decreasing order of the absolute change in population, contained in the column Change.

us_pop_change.where(
    'AGE', are.below(999)
).sort('Change', descending=True)

AGE	2014	2019	Change	Percent Change
72	2191642	3191048	999406	45.60%
68	2567511	3345475	777964	30.30%
69	2530460	3252423	721963	28.53%
70	2461426	3136704	675278	27.43%
71	2516392	3083083	566691	22.52%
76	1692960	2222392	529432	31.27%
62	3677408	4156645	479237	13.03%
28	4345247	4818725	473478	10.90%
64	3481789	3950578	468789	13.46%
38	3848856	4305576	456720	11.87%

... (91 rows omitted)

Take a look at the top few rows. While the percent change is about 3% for the overall population, it jumps to well over 20% for the people in their late sixties and early seventies. This stunning change contributes to what is known as the greying of America.

What could explain this large increase? We can explore this question by examining the years in which the relevant groups were born.

Those who were in the age group 69 to 72 in 2014 were born in the years 1942 to 1945. The attack on Pearl Harbor was in late 1941, and by 1942 U.S. forces were heavily engaged in a massive war that ended in 1945.
Those who were 69 to 72 years old in 2019 were born in the years 1947 to 1950, at the height of the post-WWII baby boom in the United States.

The post-war jump in births is a major reason for the large changes that we have observed.

6.2. Selecting Rows

6.4. Example: Sex Ratios

Computational and Inferential Thinking

6.3. Example: Population Trends¶

6.3.1. Ages 97-100¶

6.3.2. Percent Change¶