Rework income process using IndexDistribution #1024

Mv77 · 2021-06-23T02:35:14Z

@sbenthall recently introduced IndexDistribution to deal (among other things) with distributions that time-varying indices.

Seb, this is a first attempt at using that class in the income-process constructor. The current code works and I quite like the change, I'd like your feedback.

The big issue is that, as we discussed before, we need to think a bit more about what is the appropriate thing to do about RNG seeds with this change. I've not taken care of it and that's why tests are failing.

Tests for new functionality/models or Tests to reproduce the bug-fix in code.
Updated documentation of features that add new functionality.
Update CHANGELOG.md with major/minor changes.

Mv77 · 2021-06-23T02:37:12Z

TODOS:

Handle seeds appropriately.
Replace the code that draws from the income distribution to make use of the new functionality.
Docstrings and linting.

HARK/ConsumptionSaving/ConsIndShockModel.py

sbenthall · 2021-06-23T12:59:56Z

Yes, thank you, I like this.

There is an additional failing test:

    def setUp(self):
        # Make one agent type and an economy for it to live in
        self.agent = AggShockMarkovConsumerType()
        self.agent.cycles = 0
        self.agent.AgentCount = 1000  # Low number of simulated agents
>       self.agent.IncShkDstn[0] = 2 * [self.agent.IncShkDstn[0]]  ## see #557
E       TypeError: 'IndexDistribution' object does not support item assignment

I didn't write a __setitem__ method for the IndexDistribution class. One could be easily added.

https://www.geeksforgeeks.org/__getitem__-and-__setitem__-in-python/

The seed logic in HARK is quite complex and I'm not sure I fully understand all the motivations for it.

# Conflicts: # HARK/ConsumptionSaving/ConsIndShockModel.py # HARK/core.py

Mv77 · 2022-01-17T20:06:42Z

I am reviving this and will work on it during the next few weeks. It is part of a broad goal of making the income-process construction code more flexible. However, the next step is to update the current code using newer tools that have been added, like IndexDistribution and calc_expectations (both of which I think are awesome).

A big issue with this kind of update is the RNG. For reasons yet to be determined (maybe the order in which shocks are drawn or some seed that needs to be passed) using IndexDistribution changes the shocks that are drawn by tests. Being the income process that pretty much every AgentType in HARK inherits, this change makes almost all of the simulation tests fail.

Before I dive into it, I want to summon the RNG maestros to ask if they have any idea or opinion on how this issue should be resolved. Do I edit the tests to check for the new values? Do I tweak the new implementation until matches the current tests? Do I look for a happy middle?

Have the team's views on simulation tests and how they should be done changed?

sbenthall · 2022-01-17T21:00:03Z

Hooray. Glad you're looking into this.

I don't think this addresses the whole issue, but one way to tackle the problem is to rewrite tests so that rather than targeting a specific numerical value, they tests a functional relationship:

Here's an example:
https://github.com/econ-ark/HARK/blob/master/HARK/ConsumptionSaving/tests/test_ConsPortfolioFrameModel.py#L81-L84

This is actually a much better test, since it tests the functionality of the code, and not some accidental combination of the code and a random seed.

Mv77 · 2022-01-17T21:25:09Z

That test is evaluating a property of the solution. I think we have a fair bit of those.

I still see value in testing simulation results. Imagine for instance that some indexing error in SimOnePeriod made us draw the wrong shocks. Or that, as it happened 2 years ago, some handling of a seed makes agents always draw the same shock. Tests to catch that are great. So I am happy that we have both kinds, but I wonder if in your discussions there has been any development in how we think we should carry out simulation tests going forward.

Mv77 · 2022-01-17T21:40:43Z

Something reassuring is that in the current state of the PR, tests that deal with the solution itself (not the simulation history) pass, so we know the current machinery is leaving the distributions that the agents get in the solution process unchanged. We know the miss-matches are generated when we draw from the IndexDistributions as opposed to the lists of discrete distributions currently in master.

sbenthall · 2022-01-17T21:58:52Z

These are great points.

This is an example of a test of simulation functionality that does not depend on the random seed:
https://github.com/econ-ark/HARK/blob/master/HARK/ConsumptionSaving/tests/test_ConsPortfolioFrameModel.py#L69-L74

I am not aware of any new consensus for doing simulation tests.
But automated testing wisdom would say that it's always better to test for a specific unit of functionality than to test for a general 'correct answer' from a complicated system. The current way of testing simulations -- just testing for some predetermined output -- is a lazy way of writing tests. I wrote most of them, so I know how lazy they are. This problem with the random seeds exposes the cost of that laziness.

sbenthall · 2022-01-17T22:17:51Z

Is it possible that there is an error in IndexDistribution though?
If so, a new unit test for IndexDistribution might catch it.

Mv77 · 2022-01-17T22:32:02Z

Is it possible that there is an error in IndexDistribution though? If so, a new unit test for IndexDistribution might catch it.

I am 98% sure that:

There is no bug in terms of representing different distributions from the ones we want to represent. If that were the case we would get errors in tests that deal with model solutions.
It does seem to change the order or seeds with which shocks are drawn, resulting in different simulation histories even if the distributions we are drawing from are the same as before. That's why we are getting failures in the simulation tests.

Mv77 · 2022-01-18T00:43:48Z

@sbenthall I have a clue of what might be happening.

If you run

from HARK.distribution import IndexDistribution, Lognormal, MeanOneLogNormal
import numpy as np

n_app = 5
sigmas = [0.1,0.2]

RNG = np.random.RandomState(1)

a = []
for s in sigmas:
    a.append(MeanOneLogNormal(s, seed = RNG.randint(0, 2 ** 31 - 1)))

a_draws = []
for d in a:
    a_draws.append(d.draw(1))

print(a_draws)

b = IndexDistribution(engine = MeanOneLogNormal, conditional  = {'sigma': sigmas}, seed = 1)

b_draws = []
for d in b:
    b_draws.append(d.draw(1))

print(b_draws)

You will find that a_draws and b_draws have the exact same shock draws.

Notice:

Approach "a" is creating a random-number generator with seed 1, then using it to draw the seeds of the time-specific distributions.
Approach "b" creates an index distribution with seed=1 and then draws from the time-specific distributions.

The key of why these two approaches give the same result is that IndexDistribution creates its own internal random-number generator with the given seed

HARK/HARK/distribution.py

Line 67 in f338c97

super().__init__(seed)

HARK/HARK/distribution.py

Lines 19 to 20 in f338c97

    
           def __init__(self, seed=0): 
        
               self.RNG = np.random.RandomState(seed)

and then uses that random-number generator to draw the seeds of the time-specific distributions

HARK/HARK/distribution.py

Lines 72 to 78 in f338c97

    
           def __getitem__(self, y): 
        
               # test one item to determine case handling 
        
               item0 = list(self.conditional.values())[0] 
        
               if type(item0) is list: 
        
                   cond = {key: val[y] for (key, val) in self.conditional.items()} 
        
                   return self.engine(seed=self.RNG.randint(0, 2 ** 31 - 1), **cond)

That is why the two previous approaches yield the same shocks.

However, notice that representing the time-varying income distribution with an IndexDistribution generates and extra RNG. There will be t+1 random-number generators:

A "master" one, that will draw the seed of the time-specific distributions.
t time-specific ones.

The difference is that under the current approach the seeds for the time-specific distributions are drawn directly from the agent's RNG

HARK/HARK/ConsumptionSaving/ConsIndShockModel.py

Lines 2737 to 2742 in f338c97

    
           IncShkDstn.append( 
        
               combine_indep_dstns( 
        
                   PermShkDstn_t, 
        
                   TranShkDstn_t, 
        
                   seed=self.RNG.randint(0, 2 ** 31 - 1), 
        
               )

Thus, without IndexDstn we have
Agent's RNG --(draws seeds for)--> Time-specific RNGs
and with IndexDstn we have
Agent's RNG --(draws seed for)--> IndexDstn's RNG --(draws seeds for)--> Time-specific RNGs

The extra step must make simulations not exactly match (but it might not be the only thing).

Mv77 · 2022-01-18T00:49:02Z

A solution might be to let IndexDstn (optionally) receive an RNG from which to draw time-specific seeds, instead of creating its own.

Mv77 · 2022-01-18T04:00:49Z

@sbenthall, I'd like to ask why you think IndexDistribution must inherit from the Distribution class.

I think the above issue is being generated by the fact that the current implementation thinks of the IndexDistribution as a random variable in itself. I wonder what you would think of a re-interpretation in which IndexDistribution is not itself a distribution but a way to organize time-varying distributions.

Mv77 · 2022-01-19T04:04:40Z

A solution might be to let IndexDstn (optionally) receive an RNG from which to draw time-specific seeds, instead of creating its own.

@sbenthall Just confirmed this (mostly) works.

Passes all of IndShock's tests except Harmenberg (it fails most with the current implementation).

# Conflicts: # HARK/distribution.py

codecov-commenter · 2022-01-24T19:26:45Z

Codecov Report

Merging #1024 (7a479af) into master (84ee03e) will increase coverage by 0.06%.
The diff coverage is 97.56%.

@@            Coverage Diff             @@
##           master    #1024      +/-   ##
==========================================
+ Coverage   73.66%   73.73%   +0.06%     
==========================================
  Files          69       69              
  Lines       10568    10577       +9     
==========================================
+ Hits         7785     7799      +14     
+ Misses       2783     2778       -5

Impacted Files	Coverage Δ
HARK/ConsumptionSaving/ConsIndShockModel.py	`86.29% <97.05%> (+0.43%)`	⬆️
.../ConsumptionSaving/tests/test_ConsAggShockModel.py	`98.23% <100.00%> (ø)`
.../ConsumptionSaving/tests/test_ConsRepAgentModel.py	`100.00% <100.00%> (ø)`
...nsumptionSaving/tests/test_IndShockConsumerType.py	`73.07% <100.00%> (+0.38%)`	⬆️
HARK/core.py	`89.77% <100.00%> (ø)`
HARK/distribution.py	`83.33% <0.00%> (+0.51%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 84ee03e...7a479af. Read the comment docs.

Mv77 · 2022-01-24T20:43:33Z

This PR finally passes tests! I managed to resolve all the simulation discrepancies.

It does kick the can of simulation tests down the road, but there is already an issue for that #1105 and I feel getting to an unified solution and a single fix in a different PR would be better. It was also satisfying to get it to match: I now know there is no bug in IndexDstn.

The actual change that is introduced by this pr is not that transcendental: it simply uses IndexDstn for the income process. However, this innovation makes it much easier to flexibilize the income process with things like time-varying unemployment, which is not currently supported (and I will need and implement soon).

I am re-requesting review.

sbenthall · 2022-01-26T21:23:10Z

This is looking very good.

There's one change I'd suggest.

According to its documentation, the engine parameter of the IndexDistribution should be of type Distribution. You have made standalone methods (e.g. IncShk_engine) for your distribution engines. (I believe I've done this in my own code elsewhere).

I wonder whether it would make sense to make these engine methods into model-specific subclasses of Distribution. Basically, you would just be putting the substance of your engine methods into the __init__ method of the class.

Mv77 · 2022-01-26T21:57:30Z

Thanks Seb,

Just to verify that I follow, you'd like me to move my ***Shk_engine methods to the inside of the Agent type? so that they are called by self.***Shk_engine?

I think that'd make sense, since these are specifically tailored for the given model.

sbenthall · 2022-01-26T23:31:49Z

No, that's not what I mean.

I mean something more like:

class IncomeShockDistribution(Distribution):

    def __init__(self,
        sigma_Perm,
        sigma_Tran,
        n_approx_Perm,
        n_approx_Tran,
        UnempPrb,
        IncUnemp,
        seed=0,):
        ...

IncShkDstn = IndexDistribution(
    engine=IncomeShockDistribution,
    conditional={
        "sigma_Perm": PermShkStd,
        "sigma_Tran": TranShkStd,
        "n_approx_Perm": PermShkCount_list,
        "n_approx_Tran": TranShkCount_list,
        "UnempPrb": UnempPrb_list,
        "IncUnemp": IncUnemp_list,
    },
    RNG=self.RNG,
)

sbenthall · 2022-01-27T13:38:07Z

To elaborate on this @Mv77 ...

The single-period income distributions are used by several models (in the current HARK architecture, via inheritance). But where I think we should move to is that these distributions should be define in model configuration, not hard coded into model code. See issue #620

In the FramedAgent code, this is precisely what happens: the PermShk frame takes, as its 'transition function', a Distribution.

https://github.com/econ-ark/HARK/blob/master/HARK/ConsumptionSaving/ConsPortfolioFrameModel.py#L193-L206

That is is then used by the simulation code to know that that variable must be sampled as a shock.

https://github.com/econ-ark/HARK/blob/master/HARK/frame.py#L257-L265

At the very least, if you defined the IncShk engine in terms of classes that are subclasses of Distribution (or, DiscreteDistribution, which may be more appropriate), then I could use that same distribution definition in the PortfolioFrameModel. That's an indication of how this would support code reuse.

Mv77 · 2022-01-27T14:19:09Z

Thanks Seb.

I initially thought of this as type puritanism!

But the more I think of it, the more it makes sense. It pushes us in the direction of income distributions being their own object, modular entities easily switched around. I am fully onboard; will revise over the weekend.

Mv77 · 2022-01-27T18:47:03Z

Not that type puritanism is bad; it is just one of those things that have a huge social return but are annoying to enforce in one's own life.

…umbing/income_process # Conflicts: # HARK/ConsumptionSaving/ConsIndShockModel.py

Mv77 · 2022-01-30T15:41:56Z

@sbenthall

Seb, I've now made engines their own class. Tests still pass. This is ready for your review again.

Thank you!

llorracc · 2022-01-30T15:53:30Z

A small suggestion: Many different choices could be made for the IncomeShockDistribution. The choices here are highly specific: Transitory and Permanent shocks, with a transitory unemployment spell. As long as we are inventing a new name anyway, let's call it something more specific, like FriedmanBufferStockShockDistribution or TranPermWithIIDUnempShockDistribution or BufferStockTheoryShockDistribution. (Or perhaps substituting Shk For Shock in accord with guidelines in the NARK. And maybe Dstn for Distribution, to prevent the name of the object from being too long.

Mv77 · 2022-01-30T16:08:28Z

Agreed. I've now gone with

Permanent income shocks: LognormPermIncShk.
Transitory income shocks: MixtureTranIncShk.
Joint permanent + transitory income shock specification: BufferStockIncShkDstn.

sbenthall · 2022-01-31T15:28:47Z

Great work, @Mv77 . Thanks for the careful design and implementation. :)

Mv77 · 2022-02-01T21:38:06Z

Thank you for the reviews! :)

Mv77 added 2 commits June 22, 2021 17:48

Define engines for lognormal income process

7b27a96

Rework construct_lognormal_income_process

3c1922d

sbenthall reviewed Jun 23, 2021

View reviewed changes

HARK/ConsumptionSaving/ConsIndShockModel.py Outdated Show resolved Hide resolved

sbenthall reviewed Jun 23, 2021

View reviewed changes

HARK/ConsumptionSaving/ConsIndShockModel.py Outdated Show resolved Hide resolved

sbenthall reviewed Jun 23, 2021

View reviewed changes

HARK/ConsumptionSaving/ConsIndShockModel.py Outdated Show resolved Hide resolved

Mv77 added 5 commits June 23, 2021 11:35

InkShk -> IncShk

c7a1a3e

Merge branch 'master' into income_process

da285a8

Merge branch 'master' into plumbing/income_process

41ed520

# Conflicts: # HARK/ConsumptionSaving/ConsIndShockModel.py # HARK/core.py

Add support for neutral_measure

70645dd

Lint

7b888cf

IndexDstn gets RNG instead of creating its own

63d1840

Mv77 mentioned this pull request Jan 19, 2022

IndexDstn rework #1104

Merged

3 tasks

Mv77 added 4 commits January 24, 2022 10:53

Merge branch 'master' into plumbing/income_process

b3f7671

# Conflicts: # HARK/distribution.py

Make Harm test easier to diagnose

a43f508

Create new RNGs for Perm and Tran for backward compatibility

a326805

Let time_vary elements be IndexDistributions

30c1ed0

Mv77 added 2 commits January 24, 2022 14:00

Fix markov tests

c4f97fa

Update CHANGELOG.md

d8846c3

Mv77 added 3 commits January 24, 2022 14:51

Fix markov examples

5b395f4

Write docstring for engines

5896574

Lint

de41dad

Mv77 changed the title ~~[WIP] Rework income process using IndexDistribution~~ Jan 24, 2022

Mv77 requested a review from sbenthall January 24, 2022 20:45

sbenthall self-assigned this Jan 26, 2022

Mv77 added 2 commits January 26, 2022 18:17

Make engines a proprety of the agent

8c4c2f4

Stop passing neutral_measure around

d75fff5

Mv77 added 2 commits January 30, 2022 10:11

Make engines their own classes

e10da8e

Merge branch 'income_process' of https://github.com/Mv77/HARK into pl…

8b35b2c

…umbing/income_process # Conflicts: # HARK/ConsumptionSaving/ConsIndShockModel.py

More informative names for income shocks

7a479af

sbenthall merged commit 4bc5984 into econ-ark:master Jan 31, 2022

sbenthall mentioned this pull request Jan 31, 2022

define distributions directly in initial parameters for models #620

Closed

sbenthall added this to the 0.13.0 milestone Jan 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework income process using IndexDistribution #1024

Rework income process using IndexDistribution #1024

Mv77 commented Jun 23, 2021 •

edited

Loading

Mv77 commented Jun 23, 2021 •

edited

Loading

sbenthall commented Jun 23, 2021

Mv77 commented Jan 17, 2022

sbenthall commented Jan 17, 2022

Mv77 commented Jan 17, 2022

Mv77 commented Jan 17, 2022

sbenthall commented Jan 17, 2022

sbenthall commented Jan 17, 2022

Mv77 commented Jan 17, 2022

Mv77 commented Jan 18, 2022 •

edited

Loading

Mv77 commented Jan 18, 2022

Mv77 commented Jan 18, 2022 •

edited

Loading

Mv77 commented Jan 19, 2022 •

edited

Loading

codecov-commenter commented Jan 24, 2022 •

edited

Loading

Mv77 commented Jan 24, 2022

sbenthall commented Jan 26, 2022

Mv77 commented Jan 26, 2022

sbenthall commented Jan 26, 2022 •

edited

Loading

sbenthall commented Jan 27, 2022

Mv77 commented Jan 27, 2022

Mv77 commented Jan 27, 2022

Mv77 commented Jan 30, 2022

llorracc commented Jan 30, 2022

Mv77 commented Jan 30, 2022

sbenthall commented Jan 31, 2022

Mv77 commented Feb 1, 2022

Rework income process using IndexDistribution #1024

Rework income process using IndexDistribution #1024

Conversation

Mv77 commented Jun 23, 2021 • edited Loading

Mv77 commented Jun 23, 2021 • edited Loading

sbenthall commented Jun 23, 2021

Mv77 commented Jan 17, 2022

sbenthall commented Jan 17, 2022

Mv77 commented Jan 17, 2022

Mv77 commented Jan 17, 2022

sbenthall commented Jan 17, 2022

sbenthall commented Jan 17, 2022

Mv77 commented Jan 17, 2022

Mv77 commented Jan 18, 2022 • edited Loading

Mv77 commented Jan 18, 2022

Mv77 commented Jan 18, 2022 • edited Loading

Mv77 commented Jan 19, 2022 • edited Loading

codecov-commenter commented Jan 24, 2022 • edited Loading

Codecov Report

Mv77 commented Jan 24, 2022

sbenthall commented Jan 26, 2022

Mv77 commented Jan 26, 2022

sbenthall commented Jan 26, 2022 • edited Loading

sbenthall commented Jan 27, 2022

Mv77 commented Jan 27, 2022

Mv77 commented Jan 27, 2022

Mv77 commented Jan 30, 2022

llorracc commented Jan 30, 2022

Mv77 commented Jan 30, 2022

sbenthall commented Jan 31, 2022

Mv77 commented Feb 1, 2022

Mv77 commented Jun 23, 2021 •

edited

Loading

Mv77 commented Jun 23, 2021 •

edited

Loading

Mv77 commented Jan 18, 2022 •

edited

Loading

Mv77 commented Jan 18, 2022 •

edited

Loading

Mv77 commented Jan 19, 2022 •

edited

Loading

codecov-commenter commented Jan 24, 2022 •

edited

Loading

sbenthall commented Jan 26, 2022 •

edited

Loading