Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework income process using IndexDistribution #1024

Merged
merged 23 commits into from
Jan 31, 2022

Conversation

Mv77
Copy link
Contributor

@Mv77 Mv77 commented Jun 23, 2021

@sbenthall recently introduced IndexDistribution to deal (among other things) with distributions that time-varying indices.

Seb, this is a first attempt at using that class in the income-process constructor. The current code works and I quite like the change, I'd like your feedback.

The big issue is that, as we discussed before, we need to think a bit more about what is the appropriate thing to do about RNG seeds with this change. I've not taken care of it and that's why tests are failing.

  • Tests for new functionality/models or Tests to reproduce the bug-fix in code.
  • Updated documentation of features that add new functionality.
  • Update CHANGELOG.md with major/minor changes.
@Mv77
Copy link
Contributor Author

Mv77 commented Jun 23, 2021

TODOS:

  • Handle seeds appropriately.
  • Replace the code that draws from the income distribution to make use of the new functionality.
  • Docstrings and linting.
@sbenthall
Copy link
Contributor

Yes, thank you, I like this.

There is an additional failing test:

    def setUp(self):
        # Make one agent type and an economy for it to live in
        self.agent = AggShockMarkovConsumerType()
        self.agent.cycles = 0
        self.agent.AgentCount = 1000  # Low number of simulated agents
>       self.agent.IncShkDstn[0] = 2 * [self.agent.IncShkDstn[0]]  ## see #557
E       TypeError: 'IndexDistribution' object does not support item assignment

I didn't write a __setitem__ method for the IndexDistribution class. One could be easily added.

https://www.geeksforgeeks.org/__getitem__-and-__setitem__-in-python/

The seed logic in HARK is quite complex and I'm not sure I fully understand all the motivations for it.

@Mv77
Copy link
Contributor Author

Mv77 commented Jan 17, 2022

I am reviving this and will work on it during the next few weeks. It is part of a broad goal of making the income-process construction code more flexible. However, the next step is to update the current code using newer tools that have been added, like IndexDistribution and calc_expectations (both of which I think are awesome).

A big issue with this kind of update is the RNG. For reasons yet to be determined (maybe the order in which shocks are drawn or some seed that needs to be passed) using IndexDistribution changes the shocks that are drawn by tests. Being the income process that pretty much every AgentType in HARK inherits, this change makes almost all of the simulation tests fail.

Before I dive into it, I want to summon the RNG maestros to ask if they have any idea or opinion on how this issue should be resolved. Do I edit the tests to check for the new values? Do I tweak the new implementation until matches the current tests? Do I look for a happy middle?

Have the team's views on simulation tests and how they should be done changed?

@sbenthall
Copy link
Contributor

Hooray. Glad you're looking into this.

I don't think this addresses the whole issue, but one way to tackle the problem is to rewrite tests so that rather than targeting a specific numerical value, they tests a functional relationship:

Here's an example:
https://github.com/econ-ark/HARK/blob/master/HARK/ConsumptionSaving/tests/test_ConsPortfolioFrameModel.py#L81-L84

This is actually a much better test, since it tests the functionality of the code, and not some accidental combination of the code and a random seed.

@Mv77
Copy link
Contributor Author

Mv77 commented Jan 17, 2022

That test is evaluating a property of the solution. I think we have a fair bit of those.

I still see value in testing simulation results. Imagine for instance that some indexing error in SimOnePeriod made us draw the wrong shocks. Or that, as it happened 2 years ago, some handling of a seed makes agents always draw the same shock. Tests to catch that are great. So I am happy that we have both kinds, but I wonder if in your discussions there has been any development in how we think we should carry out simulation tests going forward.

@Mv77
Copy link
Contributor Author

Mv77 commented Jan 17, 2022

Something reassuring is that in the current state of the PR, tests that deal with the solution itself (not the simulation history) pass, so we know the current machinery is leaving the distributions that the agents get in the solution process unchanged. We know the miss-matches are generated when we draw from the IndexDistributions as opposed to the lists of discrete distributions currently in master.

@sbenthall
Copy link
Contributor

These are great points.

This is an example of a test of simulation functionality that does not depend on the random seed:
https://github.com/econ-ark/HARK/blob/master/HARK/ConsumptionSaving/tests/test_ConsPortfolioFrameModel.py#L69-L74

I am not aware of any new consensus for doing simulation tests.
But automated testing wisdom would say that it's always better to test for a specific unit of functionality than to test for a general 'correct answer' from a complicated system. The current way of testing simulations -- just testing for some predetermined output -- is a lazy way of writing tests. I wrote most of them, so I know how lazy they are. This problem with the random seeds exposes the cost of that laziness.

@sbenthall
Copy link
Contributor

Is it possible that there is an error in IndexDistribution though?
If so, a new unit test for IndexDistribution might catch it.

@Mv77
Copy link
Contributor Author

Mv77 commented Jan 17, 2022

Is it possible that there is an error in IndexDistribution though? If so, a new unit test for IndexDistribution might catch it.

I am 98% sure that:

  • There is no bug in terms of representing different distributions from the ones we want to represent. If that were the case we would get errors in tests that deal with model solutions.
  • It does seem to change the order or seeds with which shocks are drawn, resulting in different simulation histories even if the distributions we are drawing from are the same as before. That's why we are getting failures in the simulation tests.
@Mv77
Copy link
Contributor Author

Mv77 commented Jan 18, 2022

@sbenthall I have a clue of what might be happening.

If you run

from HARK.distribution import IndexDistribution, Lognormal, MeanOneLogNormal
import numpy as np

n_app = 5
sigmas = [0.1,0.2]

RNG = np.random.RandomState(1)

a = []
for s in sigmas:
    a.append(MeanOneLogNormal(s, seed = RNG.randint(0, 2 ** 31 - 1)))

a_draws = []
for d in a:
    a_draws.append(d.draw(1))

print(a_draws)

b = IndexDistribution(engine = MeanOneLogNormal, conditional  = {'sigma': sigmas}, seed = 1)

b_draws = []
for d in b:
    b_draws.append(d.draw(1))

print(b_draws)

You will find that a_draws and b_draws have the exact same shock draws.

Notice:

  • Approach "a" is creating a random-number generator with seed 1, then using it to draw the seeds of the time-specific distributions.
  • Approach "b" creates an index distribution with seed=1 and then draws from the time-specific distributions.

The key of why these two approaches give the same result is that IndexDistribution creates its own internal random-number generator with the given seed

super().__init__(seed)

def __init__(self, seed=0):
self.RNG = np.random.RandomState(seed)

and then uses that random-number generator to draw the seeds of the time-specific distributions
def __getitem__(self, y):
# test one item to determine case handling
item0 = list(self.conditional.values())[0]
if type(item0) is list:
cond = {key: val[y] for (key, val) in self.conditional.items()}
return self.engine(seed=self.RNG.randint(0, 2 ** 31 - 1), **cond)

That is why the two previous approaches yield the same shocks.

However, notice that representing the time-varying income distribution with an IndexDistribution generates and extra RNG. There will be t+1 random-number generators:

  • A "master" one, that will draw the seed of the time-specific distributions.
  • t time-specific ones.

The difference is that under the current approach the seeds for the time-specific distributions are drawn directly from the agent's RNG

IncShkDstn.append(
combine_indep_dstns(
PermShkDstn_t,
TranShkDstn_t,
seed=self.RNG.randint(0, 2 ** 31 - 1),
)

Thus, without IndexDstn we have
Agent's RNG --(draws seeds for)--> Time-specific RNGs
and with IndexDstn we have
Agent's RNG --(draws seed for)--> IndexDstn's RNG --(draws seeds for)--> Time-specific RNGs

The extra step must make simulations not exactly match (but it might not be the only thing).

@Mv77
Copy link
Contributor Author

Mv77 commented Jan 18, 2022

A solution might be to let IndexDstn (optionally) receive an RNG from which to draw time-specific seeds, instead of creating its own.

@Mv77
Copy link
Contributor Author

Mv77 commented Jan 18, 2022

@sbenthall, I'd like to ask why you think IndexDistribution must inherit from the Distribution class.

I think the above issue is being generated by the fact that the current implementation thinks of the IndexDistribution as a random variable in itself. I wonder what you would think of a re-interpretation in which IndexDistribution is not itself a distribution but a way to organize time-varying distributions.

@Mv77
Copy link
Contributor Author

Mv77 commented Jan 19, 2022

A solution might be to let IndexDstn (optionally) receive an RNG from which to draw time-specific seeds, instead of creating its own.

@sbenthall Just confirmed this (mostly) works.

image

Passes all of IndShock's tests except Harmenberg (it fails most with the current implementation).

@Mv77 Mv77 mentioned this pull request Jan 19, 2022
3 tasks
@codecov-commenter
Copy link

codecov-commenter commented Jan 24, 2022

Codecov Report

Merging #1024 (7a479af) into master (84ee03e) will increase coverage by 0.06%.
The diff coverage is 97.56%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1024      +/-   ##
==========================================
+ Coverage   73.66%   73.73%   +0.06%     
==========================================
  Files          69       69              
  Lines       10568    10577       +9     
==========================================
+ Hits         7785     7799      +14     
+ Misses       2783     2778       -5     
Impacted Files Coverage Δ
HARK/ConsumptionSaving/ConsIndShockModel.py 86.29% <97.05%> (+0.43%) ⬆️
.../ConsumptionSaving/tests/test_ConsAggShockModel.py 98.23% <100.00%> (ø)
.../ConsumptionSaving/tests/test_ConsRepAgentModel.py 100.00% <100.00%> (ø)
...nsumptionSaving/tests/test_IndShockConsumerType.py 73.07% <100.00%> (+0.38%) ⬆️
HARK/core.py 89.77% <100.00%> (ø)
HARK/distribution.py 83.33% <0.00%> (+0.51%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 84ee03e...7a479af. Read the comment docs.

@Mv77 Mv77 changed the title [WIP] Rework income process using IndexDistribution Jan 24, 2022
@Mv77
Copy link
Contributor Author

Mv77 commented Jan 24, 2022

This PR finally passes tests! I managed to resolve all the simulation discrepancies.

It does kick the can of simulation tests down the road, but there is already an issue for that #1105 and I feel getting to an unified solution and a single fix in a different PR would be better. It was also satisfying to get it to match: I now know there is no bug in IndexDstn.

The actual change that is introduced by this pr is not that transcendental: it simply uses IndexDstn for the income process. However, this innovation makes it much easier to flexibilize the income process with things like time-varying unemployment, which is not currently supported (and I will need and implement soon).

I am re-requesting review.

@Mv77 Mv77 requested a review from sbenthall January 24, 2022 20:45
@sbenthall sbenthall self-assigned this Jan 26, 2022
@sbenthall
Copy link
Contributor

This is looking very good.

There's one change I'd suggest.

According to its documentation, the engine parameter of the IndexDistribution should be of type Distribution. You have made standalone methods (e.g. IncShk_engine) for your distribution engines. (I believe I've done this in my own code elsewhere).

I wonder whether it would make sense to make these engine methods into model-specific subclasses of Distribution. Basically, you would just be putting the substance of your engine methods into the __init__ method of the class.

@Mv77
Copy link
Contributor Author

Mv77 commented Jan 26, 2022

Thanks Seb,

Just to verify that I follow, you'd like me to move my ***Shk_engine methods to the inside of the Agent type? so that they are called by self.***Shk_engine?

I think that'd make sense, since these are specifically tailored for the given model.

@sbenthall
Copy link
Contributor

sbenthall commented Jan 26, 2022

No, that's not what I mean.

I mean something more like:

class IncomeShockDistribution(Distribution):

    def __init__(self,
        sigma_Perm,
        sigma_Tran,
        n_approx_Perm,
        n_approx_Tran,
        UnempPrb,
        IncUnemp,
        seed=0,):
        ...

IncShkDstn = IndexDistribution(
    engine=IncomeShockDistribution,
    conditional={
        "sigma_Perm": PermShkStd,
        "sigma_Tran": TranShkStd,
        "n_approx_Perm": PermShkCount_list,
        "n_approx_Tran": TranShkCount_list,
        "UnempPrb": UnempPrb_list,
        "IncUnemp": IncUnemp_list,
    },
    RNG=self.RNG,
)
@sbenthall
Copy link
Contributor

To elaborate on this @Mv77 ...

The single-period income distributions are used by several models (in the current HARK architecture, via inheritance). But where I think we should move to is that these distributions should be define in model configuration, not hard coded into model code. See issue #620

In the FramedAgent code, this is precisely what happens: the PermShk frame takes, as its 'transition function', a Distribution.

https://github.com/econ-ark/HARK/blob/master/HARK/ConsumptionSaving/ConsPortfolioFrameModel.py#L193-L206

That is is then used by the simulation code to know that that variable must be sampled as a shock.

https://github.com/econ-ark/HARK/blob/master/HARK/frame.py#L257-L265

At the very least, if you defined the IncShk engine in terms of classes that are subclasses of Distribution (or, DiscreteDistribution, which may be more appropriate), then I could use that same distribution definition in the PortfolioFrameModel. That's an indication of how this would support code reuse.

@Mv77
Copy link
Contributor Author

Mv77 commented Jan 27, 2022

Thanks Seb.

I initially thought of this as type puritanism!

But the more I think of it, the more it makes sense. It pushes us in the direction of income distributions being their own object, modular entities easily switched around. I am fully onboard; will revise over the weekend.

@Mv77
Copy link
Contributor Author

Mv77 commented Jan 27, 2022

Not that type puritanism is bad; it is just one of those things that have a huge social return but are annoying to enforce in one's own life.

…umbing/income_process

# Conflicts:
#	HARK/ConsumptionSaving/ConsIndShockModel.py
@Mv77
Copy link
Contributor Author

Mv77 commented Jan 30, 2022

@sbenthall

Seb, I've now made engines their own class. Tests still pass. This is ready for your review again.

Thank you!

@llorracc
Copy link
Collaborator

A small suggestion: Many different choices could be made for the IncomeShockDistribution. The choices here are highly specific: Transitory and Permanent shocks, with a transitory unemployment spell. As long as we are inventing a new name anyway, let's call it something more specific, like FriedmanBufferStockShockDistribution or TranPermWithIIDUnempShockDistribution or BufferStockTheoryShockDistribution. (Or perhaps substituting Shk For Shock in accord with guidelines in the NARK. And maybe Dstn for Distribution, to prevent the name of the object from being too long.

@Mv77
Copy link
Contributor Author

Mv77 commented Jan 30, 2022

Agreed. I've now gone with

  • Permanent income shocks: LognormPermIncShk.
  • Transitory income shocks: MixtureTranIncShk.
  • Joint permanent + transitory income shock specification: BufferStockIncShkDstn.
@sbenthall
Copy link
Contributor

Great work, @Mv77 . Thanks for the careful design and implementation. :)

@Mv77
Copy link
Contributor Author

Mv77 commented Feb 1, 2022

Thank you for the reviews! :)

@sbenthall sbenthall added this to the 0.13.0 milestone Jan 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
4 participants