Optimizing Large-Scale Systems with Reinforcement Learning

Sayak Ray Chowdhury

$59.95 $51.16

Paperback

Not in-store but you can order this

English

Classichouse
29 March 2024

Database design & theory; Data capture & analysis; Data mining; Artificial intelligence; Information architecture

Summary

Reinforcement learning (RL) is concerned with learning to take actions to maximize rewards, by trial and error, in environments that can evolve in response to actions. A Markov decision process (MDP) [6] is a popular framework for modeling decision making in RL environments. In an MDP, starting from an initial observed state, an agent repeatedly (a) takes an action, (b) receives a reward, and (c) observes the next state of the MDP. The traditional objective in RL is a search goal - find a policy (a rule to select an action for each state) with high total reward using as few interactions with the environment as possible; the number of interactions needed is known as the sample complexity of the RL problem [7]. This is, however, quite different from the corresponding optimization goal, where the learner seeks to maximize the total reward earned from all its decisions, or equivalently, to minimize the regret, that is, the shortfall in total reward compared to that of an optimal policy [8]. This objective is relevant in many practical sequential decision-making settings in which every decision that is taken carries utility or value - recommendation systems (clicks by consumers translate into revenue), sequential investment and portfolio allocation (financial holdings make profits or losses), and dynamic resource allocation in communication systems (scheduling decisions affect data throughput), to name a few.
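
As a rough illustration of the interaction loop and the notion of regret described above, here is a minimal Python sketch. It is not taken from the book: the two-state MDP, its reward values, and the names `TRANSITIONS`, `REWARDS`, `run_episode`, `optimal_policy`, and `regret` are all hypothetical, chosen only to make the definitions concrete. Transitions are assumed deterministic for simplicity.

```python
import random

# Hypothetical toy MDP (an assumption for illustration): 2 states, 2 actions.
# TRANSITIONS[state][action] gives the next state (deterministic here).
# REWARDS[state][action] gives the reward for taking that action in that state.
TRANSITIONS = {0: {0: 0, 1: 1}, 1: {0: 0, 1: 1}}
REWARDS = {0: {0: 0.0, 1: 0.5}, 1: {0: 0.0, 1: 1.0}}


def run_episode(policy, horizon, start_state=0):
    """Run the agent-environment loop for `horizon` steps; return total reward."""
    state = start_state
    total_reward = 0.0
    for _ in range(horizon):
        action = policy(state)                  # (a) take an action
        total_reward += REWARDS[state][action]  # (b) receive a reward
        state = TRANSITIONS[state][action]      # (c) observe the next state
    return total_reward


def optimal_policy(state):
    """In this toy MDP, action 1 always pays more and leads to the better state."""
    return 1


def make_random_policy(seed):
    """A policy that ignores the state and picks an action uniformly at random."""
    rng = random.Random(seed)
    return lambda state: rng.choice([0, 1])


def regret(policy, horizon):
    """Shortfall in total reward relative to the optimal policy."""
    return run_episode(optimal_policy, horizon) - run_episode(policy, horizon)


if __name__ == "__main__":
    print("optimal total reward:", run_episode(optimal_policy, 5))
    print("regret of a random policy:", regret(make_random_policy(0), 5))
```

In this sketch the regret of the optimal policy is zero by construction, and any other policy has non-negative regret; a learner pursuing the optimization goal would try to keep this quantity small while it is still learning.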

By: Sayak Ray Chowdhury
Imprint: Classichouse
Dimensions: Height: 279mm, Width: 216mm, Spine: 10mm
Weight: 454g
ISBN: 9798224721306
Pages: 190
Publication Date: 29 March 2024
Audience: General/trade, ELT Advanced
Format: Paperback
Publisher's Status: Active