What we cannot control, we do not understand. — Adapted from Richard Feynman: “What I cannot create, I do not understand.” The best way to predict the future is to control it. — Adapted from: “The best way to predict the future is to invent it.” (See https://quoteinvestigator.com/2012/09/27/invent-the-future/ )
Song Fang, Quanyan Zhu. A connection between feedback capacity and Kalman filter for colored Gaussian noises. IEEE International Symposium on Information Theory (ISIT), 2020. Video: https://www.bilibili.com/video/BV19v411B7SG/ Paper: https://arxiv.org/abs/2001.03108
Towards Integrating Information and Control Theories: From Information-Theoretic Measures to System Performance Limitations (信息論與控制論之融合:從信息論測度到系統性能局限)
Link: https://scholars.cityu.edu.hk/en/theses/towards-integrating-information-and-control-theories-from-informationtheoretic-measures-to-system-performance-limitations(8f21be24-127e-495d-a0cc-4a6e728450ee).html
---
The thesis cites quite a few interesting quotes (see also http://blog.sciencenet.cn/blog-286797-1022865.html and http://blog.sciencenet.cn/blog-286797-1021452.html ):

Frontispiece
When one submerges the gourd bowl in water, there floats the gourd ladle. — Chinese proverb

Chapter 1
There is an obvious analogy between the problem of smoothing the data to eliminate or reduce the effect of tracking errors and the problem of separating a signal from interfering noise in communications systems. — R. B. Blackman, H. W. Bode, and C. E. Shannon, “Data Smoothing and Prediction in Fire-Control Systems,” 1946
(We) become aware of the essential unity of the set of problems centring about communication, control, and statistical mechanics, whether in the machine or living tissue... We have decided to call the entire field of control and communication theory, whether in the machine or the animal, by the name Cybernetics. — N. Wiener, “Cybernetics,” 1948
Fundamental limits are actually at the core of many fields of engineering, science and mathematics... Firstly, they evolve from basic axioms about the nature of the universe. Secondly, they describe inescapable performance bounds that act as benchmarks for practical systems. And thirdly, they are recognized as being central to the design of real systems. — M. M. Seron, J. H. Braslavsky, and G. C. Goodwin, “Fundamental Limitations in Filtering and Control,” 1997

Chapter 2
The sciences do not try to explain, they hardly even try to interpret, they mainly make models. By a model is meant a mathematical construct which, with the addition of certain verbal interpretations, describes observed phenomena. The justification of such a mathematical construct is solely and precisely that it is expected to work. — John von Neumann
The only way of discovering the limits of the possible is to venture a little way past them into the impossible. — Arthur C. Clarke
A theory is the more impressive the greater the simplicity of its premises, the more different kinds of things it relates, and the more generalized its area of applicability. Therefore the deep impression that classical thermodynamics made upon me. It is the only physical theory of universal content which I am convinced will never be overthrown, within the framework of applicability of its basic concepts. — Albert Einstein

Chapter 3
The idea of a statistical message source is central to Shannon’s work. The study of random processes had entered into communication before his communication theory. There was a growing understanding of and ability to deal with problems of random noise... Wiener had dealt extensively with the extrapolation, interpolation, and smoothing of time series. Although Wiener’s book was published in 1949, it had been available earlier in a wartime version known as the Yellow Peril (the cover was yellow). Shannon and Bode took considerable pains to put Wiener’s work in a form more directly useful to them (and to many others). — J. R. Pierce, “The Early Days of Information Theory,” 1973
We said before: “It feeds upon negative entropy,” attracting, as it were, a stream of negative entropy upon itself, to compensate the entropy increase it produces by living and thus to maintain itself on a stationary and fairly low entropy level. — Erwin Schrödinger, “What is Life,” 1944
If one has really technically penetrated a subject, things that previously seemed in complete contrast, might be purely mathematical transformations of each other. — John von Neumann

Chapter 4
However, by building an amplifier whose gain is deliberately made, say 40 decibels higher than necessary (10000 fold excess on energy basis), and then feeding the output back on the input in such a way as to throw away that excess gain, it has been found possible to effect extraordinary improvement in constancy of amplification and freedom from nonlinearity. — H. S. Black, “Stabilized Feedback Amplifiers,” 1934
In control and communication we are always fighting nature’s tendency to degrade the organized and to destroy the meaningful; the tendency, as Gibbs has shown us, for entropy to increase. — N. Wiener, “The Human Use of Human Beings,” 1950
All stable processes we shall predict. All unstable processes we shall control. — John von Neumann

Chapter 5
What’s in a name? In the case of Shannon’s measure the naming was not accidental. In 1961 one of us (Tribus) asked Shannon what he had thought about when he had finally confirmed his famous measure. Shannon replied: “My greatest concern was what to call it. I thought of calling it ‘information’, but the word was overly used, so I decided to call it ‘uncertainty’. When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, ‘You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name. In the second place, and more importantly, no one knows what entropy really is, so in a debate you will always have the advantage.’” — M. Tribus, E. C. McIrvine, “Energy and information,” 1971
The bottom line for mathematicians is that the architecture has to be right. In all the mathematics that I did, the essential point was to find the right architecture. It’s like building a bridge. Once the main lines of the structure are right, then the details miraculously fit. The problem is the overall design. — Freeman Dyson
Far waters fail to quench near fires. — Chinese Proverb

Chapter 6
I like to think of Bode’s integrals as conservation laws. They state precisely that a certain quantity—the integrated value of the log of the magnitude of the sensitivity function—is conserved under the action of feedback. The total amount of this quantity is always the same. It is equal to zero for stable plant/compensator pairs, and it is equal to some fixed positive amount for unstable ones... This applies to every controller, no matter how it was designed. Sensitivity improvements in one frequency range must be paid for with sensitivity deteriorations in another frequency range, and the price is higher if the plant is open-loop unstable. — G. Stein, “Respect the Unstable,” 2003
The average performance of any pair of algorithms across all possible problems is identical. This means in particular that if some algorithm’s performance is superior to that of another algorithm over some set of optimization problems, then the reverse must be true over the set of all other optimization problems. — D. H. Wolpert, W. G. Macready, “No Free Lunch Theorems for Optimization,” 1997
We know the past but cannot control it. We control the future but cannot know it. — Claude Shannon

Chapter 7
Consider the case where you are the controller and you observe samples of the process output whose average has been satisfactorily close to set point and that suffers only from white noise disturbances. Should you make an adjustment to the control output upon observing a sample of the process output that is not on set point? If the average of the process output is indeed nearly at the set point then any deviation, if it is really white or unautocorrelated, will be completely independent of the previous value of the control output and it will have no impact on subsequent disturbances. Therefore, if you should react to such a deviation, you would be wasting your time because the next observation will contain another deviation that has nothing to do with the previous deviation on which you acted. You, in fact, may make things worse... A feedback controller cannot decrease the standard deviation of the white noise riding on the process output. At best it can keep the average on set point. — D. M. Koenig, “Practical Control Engineering,” 2009
In respect of military method, we have, firstly, Measurement; secondly, Estimation of quantity; thirdly, Calculation; fourthly, Balancing of chances; fifthly, Victory. — Sun Tzu, “The Art of War”
All the evidence shows that God was actually quite a gambler, and the universe is a great casino, where dice are thrown, and roulette wheels spin on every occasion. Over a large number of bets, the odds even out and we can make predictions... But over a very small number of rolls of the dice, the uncertainty principle is very important. — Stephen Hawking

Chapter 8
The world is continuous, but the mind is discrete. — David Mumford
Time is defined so that motion looks simple. — John Wheeler
If everything seems under control, you’re just not going fast enough. — Mario Andretti

Chapter 9
Essentially, all models are wrong, but some are useful. — George E. P. Box
In theory, theory and practice are the same. In practice, they are not. — Albert Einstein
Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin. — John von Neumann

Chapter 10
An understanding of fundamental limitations is an essential element in all engineering. Shannon’s early results on channel capacity have always had center court in signal processing. Strangely, the early results of Bode were not accorded the same attention in control. — K. J. Åström, in G. Stein, “Respect the Unstable,” 2003
If I turn toward a science not for external reasons such as earning an income, or for ambition, and also not — at least not exclusively — for the mere sportive joy and the fun of brain-acrobatics, then I must ask myself the question: what is the final goal that the science I am devoted to will and can reach? To what extent are its general results “true”? What is essential and what is based only on accident in its development? — Albert Einstein
As a mathematical discipline travels far from its empirical source, or still more, if it is a second and third generation only indirectly inspired by ideas coming from “reality” it is beset with very grave dangers. It becomes more and more purely aestheticizing, more and more purely l’art pour l’art. This need not be bad, if the field is surrounded by correlated subjects, which still have closer empirical connections, or if the discipline is under the influence of men with an exceptionally well-developed taste. But there is a grave danger that the subject will develop along the line of least resistance, that the stream, so far from its source, will separate into a multitude of insignificant branches, and that the discipline will become a disorganized mass of details and complexities. In other words, at a great distance from its empirical source, or after much “abstract” inbreeding, a mathematical subject is in danger of degeneration. At the inception the style is usually classical; when it shows signs of becoming baroque, then the danger signal is up... In any event, whenever this stage is reached, the only remedy seems to me to be the rejuvenating return to the source: the re-injection of more or less directly empirical ideas. I am convinced that this was a necessary condition to conserve the freshness and the vitality of the subject and that this will remain equally true in the future. — John von Neumann
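From the late 1950s to the early 1960s, the development of aerospace technology raised a large number of optimal control problems for multi-input multi-output systems, which classical control theory could hardly solve. The advent of the digital computer made the state-space formulation of Henri Poincaré (1854-1912) usable both as a mathematical model of the controlled plant and as a tool for controller design and analysis. This gave rise to modern control theory, built around the maximum principle, dynamic programming, and the state-space method.

1. The classical state-space approach: State Space Model
A state-space model consists of two models. The first is the state equation, which describes the state that the dynamic system transitions to at a given time under the action of the input variables. The second is the output (or measurement) equation, which relates the system output at a given time to the system state and the input variables. The discrete-time state-space model reads
$x_k = f_k(x_{k-1}, u_k)$, $y_k = h_k(x_k, v_k)$,
where $k$ is the discrete time index, $x_k$ is the state, $y_k$ is the observation, $u_k$ and $v_k$ are noises, $f_k(\cdot)$ is the state model, and $h_k(\cdot)$ is the observation model.
The state-space model provides a convenient and efficient framework for time-recursive Bayesian optimal estimation, and therefore rests on a solid theoretical foundation. The pioneering work is the Kalman filter; see the following review:
Approximate Gaussian Conjugacy: Parametric Recursive Filtering under Nonlinearity, Multimodality, Uncertainty, and Constraint, and Beyond, Frontiers of Information Technology & Electronic Engineering, 2017, 18(12):1913-1939.
Particularly worth mentioning is the classic 1964 paper in IEEE TAC by Yu-Chi Ho (何毓琦), tenured professor at Harvard and academician, which was (one of) the first to explain the relationship between the Kalman filter and Bayesian optimal estimation. This greatly helped the Kalman filter flourish over the nearly sixty years since (a method tied to a great theory is like a tiger that has grown wings!):
Ho, Y., Lee, R., 1964. A Bayesian approach to problems in stochastic estimation and control. IEEE Trans. Autom. Contr., 9(4):333-339.
The state-space model assumes that the dynamic system has the Markov property of a Hidden Markov Model (HMM), i.e., the $x_k = f_k(x_{k-1}, u_k)$ above: given the present state of the system, its future is independent of its past. This greatly simplifies modeling and recursive computation. However, the HMM is restrictive in many ways, and its description of the real world is not necessarily accurate or even valid. With the arrival of the era of big sensing data in particular, some of its drawbacks have become increasingly prominent. After all, today's sensors and operating conditions are a world apart from those of Kalman and Ho in the 1960s: targets have become more and more cunning and are hard to model with a simple HMM. In particular, when statistical information about the system is missing (for example, the target motion model is unknown, the process noise model or even the observation noise model is unknown, or there are complex system correlations, time delays, and couplings), it is simply impossible to construct a reasonably accurate, or even valid, state-space model.

2. Discarding the HMM:
As sensor data become ever more abundant and sensors ever more accurate, can there be new solutions that do without the HMM altogether? See the posts "If I have hundreds or thousands of sensors, do I still need a dynamic model?" and "Multi-sensor multi-target detection and tracking made easy!". Such schemes mainly address the case where the system background is completely unknown but the amount of data is large.
Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful. -- Box, George E. P.; Norman R. Draper (1987). Empirical Model-Building and Response Surfaces, p. 74

3. A new data-driven framework:
Since the HMM is both the making and the undoing of the classical methods, a more constructive remedy than simply discarding it (which is a little too passive) is to look for a replacement model that better conforms to natural laws and describes the real world more accurately. The following paper proposes a new framework that replaces the HMM:
Joint Smoothing, Tracking, and Forecasting Based on Continuous-Time Target Trajectory Fitting, IEEE Trans. Automation Science and Engineering, Oct. 2018. DOI:10.1109/TASE.2018.2882641.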
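Available at IEEE Xplore; pre-print at arXiv:1708.02196 (Joint Smoothing and Tracking Based on Continuous-Time Target Trajectory Function Fitting). Source code is provided with the paper.
Abstract: This paper presents a joint trajectory smoothing and tracking framework for a specific class of targets with smooth motion. We model the target trajectory by a continuous function of time (FoT), which leads to a curve fitting approach that finds a trajectory FoT fitting the sensor data in a sliding time-window. A simulation study is conducted to demonstrate the effectiveness of our approach in tracking a maneuvering target, in comparison with the conventional filters and smoothers.

Compared with the classical HMM-based state-space approach, the core of the new data-driven estimation framework is to replace the HMM with a continuous-time target trajectory function, the FoT (Function of Time), $x_k = f(t)$, with $t$ the time of step $k$. Traditional estimation problems such as filtering, smoothing, and forecasting then become a curve fitting and parameter learning problem within a continuous time window: the trajectory function is approximated by a parametric function, $F(t; C_k) \approx f(t)$, where $C_k$ are the parameters to be determined. Data-driven tools such as clustering, fitting, and machine learning can then be applied to (multi-)target detection, tracking, and forecasting in complex scenarios, which promises to overcome difficulties of traditional methods such as heavy reliance on target model assumptions, delay in maneuver detection, and sensitivity to out-of-sequence data.
[Figure: classical filtering methods (left) versus the new data-driven paradigm (right)]
The left side of the figure lists classical filtering and estimation methods: KF: Kalman Filter, AGC: Approximate Gaussian Conjugacy, PF: Particle Filter, MHT: Multiple Hypothesis Tracking, FISST: Finite-Set Statistics, and so on. Nearly sixty years of development have produced a great many theories and methods. The right side lists the new data-driven paradigm: O2: Observation-only, C4F: Clustering for Filtering, F4S: Fitting for Smoothing, FTC: Flooding-then-Clustering, T-FoT: Trajectory Function of Time. Both sides use the same observation model $y_k = h_k(x_k, v_k)$ but different state models: the classical state-space approach uses the HMM, while the new paradigm uses the trajectory FoT.
At the mention of curve fitting or regression analysis, one might suspect that it is computationally inefficient compared with recursive computation and therefore cannot run in real time. In fact:
For a linear observation system, a linear fit suffices. With the measurement error defined, as is usual, by the 2-norm Mahalanobis distance, the curve fit reduces to weighted least squares, which has a closed-form solution and is computationally more efficient than the linear Kalman filter.
For a nonlinear observation system, even a linear fit such as a polynomial fit usually requires iterative approximation. Here the decisive factor for computational efficiency is the initialization of the parameters: $C_k = C_{k-1} + \rho_k$ can greatly speed up the computation (one or two gradient-descent steps may already reach the converged parameter estimate), so that the fit may even run faster than the extended Kalman filter, which needs to compute Jacobian matrices. This may defy intuition; you will not know until you try!
Going further, what if the system is subject to constraints? These can still be handled effectively; see the following paper.

4. SSM and trajectory curve fitting under constraints:
Single-Road-Constrained Positioning Based on Deterministic Trajectory Geometry, Tiancheng Li, IEEE Communications Letters, vol. 23, no. 1, Jan. 2019, pp. 80-83. Source code is provided with the paper.
Abstract: We consider the single-road-constrained estimation problem for positioning a target that moves on a single, deterministic and exactly known trajectory. Based on the geometry of the trajectory curve, we cast the constrained estimation problem as an unconstrained problem with reduced state dimension. Two approaches are devised based on a Markov transition model for unscented Kalman filtering and a continuous function of time for (weighted) least square fitting, respectively. A popular simulation model has been used for demonstrating the performance of the proposed approaches in comparison to existing approaches.
Please refer to the paper for details; the original post followed with screenshots of the key parts of this letter.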
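To make the trajectory function of time idea concrete, here is a minimal sketch in Python with NumPy. It is not the code released with the papers. It assumes a scalar position observed directly (a linear observation model), a polynomial as the parametric function F(t; C), and known noise variances; the function names `fit_fot` and `eval_fot` and all numbers are hypothetical, chosen only for illustration. Under these assumptions the fit over a sliding time window is a closed-form weighted least-squares problem, and the same fitted coefficients give smoothed, current, and forecast estimates simply by evaluating the polynomial at different times.

```python
import numpy as np


def fit_fot(times, obs, noise_var, degree=2):
    """Weighted least-squares fit of a polynomial F(t; C) to scalar observations.

    Returns the coefficient vector C, lowest order first.
    """
    times = np.asarray(times, dtype=float)
    obs = np.asarray(obs, dtype=float)
    weights = 1.0 / np.asarray(noise_var, dtype=float)   # inverse-variance weights
    A = np.vander(times, degree + 1, increasing=True)    # columns: 1, t, t^2, ...
    sw = np.sqrt(weights)
    # Scaling rows by sqrt(weight) turns weighted LS into ordinary LS.
    C, *_ = np.linalg.lstsq(A * sw[:, None], obs * sw, rcond=None)
    return C


def eval_fot(C, t):
    """Evaluate the fitted trajectory F(t; C) at one or more times."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.vander(t, len(C), increasing=True) @ C


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(0.0, 5.0, 0.1)
    truth = 1.0 + 2.0 * t - 0.5 * t**2                    # assumed smooth trajectory
    sigma = 0.2
    y = truth + sigma * rng.normal(size=t.size)           # noisy position observations

    window = 20                                           # sliding time window length
    tw, yw = t[-window:], y[-window:]
    C = fit_fot(tw, yw, np.full(window, sigma**2), degree=2)

    print("smoothed (5 steps back):", eval_fot(C, tw[-6])[0])
    print("current estimate:       ", eval_fot(C, tw[-1])[0])
    print("one-step forecast:      ", eval_fot(C, tw[-1] + 0.1)[0])
```

For a nonlinear observation model the same structure would need an iterative solver, warm-started from the previous window's coefficients as the text suggests; that case is not covered by this sketch.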
Approximate Gaussian conjugacy: parametric recursive filtering under nonlinearity, multimode, uncertainty, and constraint, and beyond
Author(s): Tian-cheng Li, Jin-ya Su, Wei Liu, Juan M. Corchado
Affiliation(s): School of Sciences, University of Salamanca, Salamanca 37007, Spain
Corresponding email(s): t.c.li@usal.es , J.Su2@lboro.ac.uk , w.liu@sheffield.ac.uk , corchado@usal.es
Key Words: Kalman filter, Gaussian filter, time series estimation, Bayesian filtering, nonlinear filtering, constrained filtering, Gaussian mixture, maneuver, unknown inputs
Abstract: Since the landmark work of R. E. Kalman in the 1960s, considerable efforts have been devoted to time series state space models for a large variety of dynamic estimation problems. In particular, parametric filters that seek exact analytical estimates based on closed-form Markov-Bayes recursion, e.g., recursion from a Gaussian or Gaussian mixture (GM) prior to a Gaussian/GM posterior (termed Gaussian conjugacy in this paper), form the backbone for general time series filter design. Due to challenges arising from nonlinearity, multimode (including target maneuver), intractable uncertainties (such as unknown inputs and/or non-Gaussian noises) and constraints (including circular quantities), and so on, new theories, algorithms and technologies are continuously being developed in order to maintain or, to be more precise, approximate such a conjugacy. They have in a large part contributed to the prospective developments of time series parametric filters in the last six decades. This paper reviews the state-of-the-art in distinctive categories and highlights some insights which may otherwise be overlooked. In particular, specific attention is paid to nonlinear systems with very informative observation, multimodal systems including Gaussian mixture posterior and maneuvers, intractable unknown inputs and constraints, to fill the voids in existing reviews/surveys.
To go beyond a pure review, we also provide some new thoughts on alternatives to the first order Markov transition model and on filter evaluation with regard to computing complexity.

10 Highlights presented in the paper:
1. CRLB (Cramer-Rao Lower Bound) limits only the variance of unbiased estimators and lower MSE (mean squared error) can be obtained by allowing for a bias in the estimation, while ensuring that the overall estimation error is reduced.
2. The KF (Kalman filter) is conditionally biased with a non-zero process noise realization in the given state sequence and is not an efficient estimator in a conditional sense, even in a linear and Gaussian system.
3. Among all possible distributions of the observation noise $\mathbf{w}$ with a fixed covariance matrix, the CRLB for $\mathbf{x}$ attains its maximum when $\mathbf{w}$ is Gaussian, i.e., the Gaussian scenario is the "worst-case" for estimating $\mathbf{x}$.
4. For sufficiently precise measurements, none of the KF variants, including the KF itself, are based on an accurate approximation of the joint density. Conversely, for imprecise measurements all KF variants accurately approximate the joint density, and therefore the posterior density. Differences between the KF variants become evident for moderately precise measurements.
5. While the BCRLB (Bayesian Cramer-Rao Lower Bound) sets a best line (in the sense of MMSE) that any unbiased sequential estimator can at maximum achieve, the O2 inference sets the bottom line that any "effective" estimator shall at worst achieve.
6. Many adaptive-model approaches proposed for MTT (maneuvering target tracking) may show superiority when the target indeed maneuvers but perform disappointingly or even significantly worse than those without using an adaptive model, when there is actually no maneuver. We call this over-reaction due to adaptability.
7. The theoretically best achievable second order error performance, namely the CRLB, in target state estimation is independent of knowledge (or the lack of it) of the observation noise variance.
8. Robust filtering is much more related to robustness with respect to statistical variations than it is to optimality with respect to a specified statistical model. Typically, the worst case estimation error rather than the MSE needs to be minimized in a robust filter. As a result, robustness is usually achieved by sacrificing the performance in terms of other criteria such as MSE and computing efficiency.
9. The standard structure of recursive filtering is based on infinite impulse response (IIR), namely all the observations prior to the present time have effect on the state estimate at present time and therefore the filter suffers from legacy errors.
10. Computing speed matters!

Open access page: http://www.jzus.zju.edu.cn/iparticle.php?doi=10.1631/FITEE.1700379
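The first highlight (the CRLB bounds only unbiased estimators, and a biased one can reach lower MSE) can be checked numerically on a textbook case that is not taken from the paper: estimating the variance of i.i.d. Gaussian samples with unknown mean. The sketch below is an illustration under that assumption; the function name `variance_estimator_mse` and the sample sizes are hypothetical. For n samples, the unbiased estimator (divisor n-1) has MSE 2σ⁴/(n-1), the CRLB for unbiased estimators of σ² is 2σ⁴/n, and the biased estimator with divisor n+1 has MSE 2σ⁴/(n+1), which lies below the bound.

```python
import numpy as np


def variance_estimator_mse(n=10, sigma2=1.0, trials=100_000, seed=0):
    """Monte Carlo MSE of two variance estimators for Gaussian data.

    Returns (mse_unbiased, mse_biased, crlb) where crlb = 2*sigma2**2/n is the
    Cramer-Rao lower bound for unbiased estimators of the variance.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
    # Sum of squared deviations from the sample mean, one value per trial.
    s = np.sum((x - x.mean(axis=1, keepdims=True)) ** 2, axis=1)
    mse_unbiased = np.mean((s / (n - 1) - sigma2) ** 2)   # divisor n-1: unbiased
    mse_biased = np.mean((s / (n + 1) - sigma2) ** 2)     # divisor n+1: biased
    crlb = 2.0 * sigma2**2 / n
    return mse_unbiased, mse_biased, crlb


if __name__ == "__main__":
    mse_u, mse_b, crlb = variance_estimator_mse()
    print(f"unbiased estimator MSE: {mse_u:.4f}")
    print(f"CRLB (unbiased only):   {crlb:.4f}")
    print(f"biased estimator MSE:   {mse_b:.4f}")
```

The biased estimator trades a small systematic underestimate for a larger reduction in variance, which is the mechanism the highlight refers to.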