Preface

Unlike the dynamic programming (DP) methods discussed in Chapter 4, which evaluate state values or find the optimal policy by computation based on knowledge of the environment's dynamics, Monte Carlo (MC) methods rely on experience, namely the sequences of states, actions, and rewards that the agent has gone through. Either actual or simulated experience can be used.
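To make the contrast concrete, here is a minimal sketch of learning state values from experience alone, using first-visit MC prediction. The environment is an assumption made for illustration and is not from the text: a 5-state random walk where states 0 and 4 are terminal, the agent starts in state 2, moves left or right with equal probability, and receives reward 1 only on reaching state 4. The names `sample_episode` and `first_visit_mc` are hypothetical. Note that the estimator only ever sees sampled episodes and never the transition probabilities.

```python
import random
from collections import defaultdict


def sample_episode(rng):
    """Simulate one episode of the (assumed) 5-state random walk.

    Returns a list of (state, reward) pairs, where reward is the reward
    received on leaving that state.
    """
    state = 2
    episode = []
    while state not in (0, 4):
        next_state = state + rng.choice((-1, 1))
        reward = 1.0 if next_state == 4 else 0.0
        episode.append((state, reward))
        state = next_state
    return episode


def first_visit_mc(num_episodes, gamma=1.0, seed=0):
    """Estimate state values by averaging first-visit returns."""
    rng = random.Random(seed)
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)

    for _ in range(num_episodes):
        episode = sample_episode(rng)

        # Record the time step at which each state is first visited.
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            first_visit.setdefault(s, t)

        # Walk backwards, accumulating the discounted return g.
        g = 0.0
        for t in range(len(episode) - 1, -1, -1):
            s, r = episode[t]
            g = gamma * g + r
            if first_visit[s] == t:
                returns_sum[s] += g
                returns_count[s] += 1

    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


if __name__ == "__main__":
    values = first_visit_mc(20000)
    for s in sorted(values):
        print(s, round(values[s], 3))
```

For this particular walk the true values of states 1, 2, and 3 are 0.25, 0.5, and 0.75, so the printed estimates should move toward those numbers as the number of episodes grows. A DP method would instead compute them directly from the known transition probabilities.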

... to be continued