This paper provides a new distributed optimal cooperative tracking control method with
disturbance rejection for multiple mobile robots. The method avoids separating the main
controller into kinematic and dynamic parts and reduces the ACD structure with three NNs to a
structure with only one NN, for which novel NN weight-tuning laws were designed. Online
optimal cooperative control algorithms, which do not require knowledge of the internal
dynamics, were proposed to approximate the Nash equilibrium solutions of the HJI
equations. The algorithms guarantee that the value functions and the control and disturbance
laws converge simultaneously to approximately optimal values and that the cooperative tracking
errors of the closed-loop systems and the approximation errors of the NN weights are uniformly
ultimately bounded. Comparative simulations were carried out, and the experimental results on a
testbed equipped with an omnidirectional vision system were consistent with the simulations.
Based on the experimental results, it can be inferred that our method is effective for practical
control applications involving multiple nonholonomic mobile mechanical agents
or autonomous vehicles that track both positions and velocities.
Intelligent distributed cooperative control for multiple nonholonomic mobile robots subject to unknown dynamics and external disturbances - Nguyen Tan Luy
advantages. First, they included two iterative loops, i.e., as the parameters of
Nguyen Tan Luy, Tran Ngoc Anh, Dang Quang Minh
the disturber NN were updated in an iterative loop, the parameters of the actor NN had to wait
for updating in the other loop. Second, the knowledge of system internal dynamics was required.
Finally, the initial stability of the system strongly depended on how the weights of the
three NNs were initialized. For those reasons, increased computational complexity and
wasted resources were inevitable [21, 22].
In our previous work [23], the design of an optimal cooperative control of multiple MIMO
nonlinear systems overcame drawbacks of using many NNs. However, disturbance rejection was
only considered for each agent, not for neighborhoods.
To the best of our knowledge, optimal cooperative tracking control schemes with
disturbance rejection for multiple nonlinear agents without knowledge of internal
dynamics, with application to nonholonomic robot systems, have not yet been considered. In this
paper, we provide such a scheme with the following main contributions.
1. A bounded $L_2$-gain synchronization problem of a multi-NMR system in a distributed
communication graph is formulated. In contrast with the work in [18], we avoid
separating kinematics and dynamics when designing the control scheme. Thus, a
performance index function subject to all signals will be minimized.
2. The design of an optimal cooperative tracking control scheme is proposed. This work
extends the work of [14] to three cases: (i) Consideration of nonlinear agents instead of
linear agents; (ii) Use of only one NN for each agent instead of three to overcome some
disadvantages caused by the large number of NNs; (iii) No prior knowledge of system
internal dynamics for analyzing and designing control algorithms. We prove that the
system parameters converge to the approximately optimal values, and the cooperative
tracking errors and the NN approximation errors are uniformly ultimately bounded.
3. Through simulations, the proposed algorithms are compared with other algorithms to
demonstrate effectiveness. To test our algorithms in practical applications, a hardware
testbed, consisting of NMRs equipped with an omnidirectional vision system, is
designed and constructed. Based on the experimental results, it can be inferred that our
method is effective for practical control applications involving
multiple nonholonomic mobile mechanical agents or autonomous vehicles.
The paper is organized as follows. Section 2 provides the theoretical background of graph
and nonholonomic mobile robots from which integrated cooperative control is derived. Section 3
designs an optimal cooperative tracking control scheme. Section 4 shows the results of the
simulation and experiment. A brief conclusion is given in Section 5.
2. BACKGROUND AND PRELIMINARIES
2.1. Distributed Communication Graph Theory
Consider $m$ robots in a cooperative system. The distributed communication of the system
can be represented by a directed graph $\mathcal{G} = (\mathcal{S}, \mathcal{E}, \mathcal{A})$, where the robots are characterized by the
set of nodes $\mathcal{S} = \{s_0, \ldots, s_m\}$, where $s_0$ is a leader node. Relationships among the robots are
determined by the set of edges $\mathcal{E} \subseteq \mathcal{S} \times \mathcal{S}$ with a connectivity weight matrix $\mathcal{A} = [a_{ij}]$, where
$a_{ii} = 0$, $a_{ij} > 0$ for $(s_i, s_j) \in \mathcal{E}$ and $a_{ij} = 0$ otherwise. If the states of the robot $i$ are available to the
Intelligent distributed cooperative control for multiple nonholonomic mobile robots
robot $j$, then $s_j$ is a neighborhood of $s_i$. All neighborhoods of $s_i$ give a set
$\mathcal{N}_i = \{j : s_j \in \mathcal{S}, (s_i, s_j) \in \mathcal{E}\}$. Define a graph Laplacian matrix
$\mathcal{L} = \mathcal{B} - \mathcal{A} \in \mathbb{R}^{(m+1) \times (m+1)}$,
where $\mathcal{B} = \mathrm{diag}(b_i)$, $b_i = \sum_{j \in \mathcal{N}_i} a_{ij}$. Note that the row sums of $\mathcal{L}$ are equal to zero.
A directed path is a sequence of ordered edges $(s_i, s_{i+1})$, $i = 0, \ldots, m-1$. If a directed path
from $s_i$ to $s_j$ exists for every pair $(s_i, s_j)$ with $s_i \neq s_j$, then the directed graph is strongly connected.
The graph has a directed spanning tree if the set $\{s_1, \ldots, s_m\}$ contains at least one node with a directed
path to all other nodes.
The connectivity matrix between the $i$th robot and its leader is defined as
$$\mathcal{C} = \mathrm{diag}\{c_1, c_2, \ldots, c_m\} \tag{1}$$
where $c_i = 1$ if the $i$th robot connects to its leader, and $c_i = 0$ otherwise.
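The graph quantities above can be illustrated numerically. The following Python sketch is only an illustration under stated assumptions: it works with the follower-only part of the graph (leader links enter through the pinning gains $c_i$), the function name `graph_matrices` is hypothetical, and the three-robot chain is an invented example rather than the topology used later in the paper.

```python
import numpy as np

def graph_matrices(adjacency, pinned):
    """Return (L, C) for follower weights a_ij and pinning gains c_i."""
    A = np.asarray(adjacency, dtype=float)
    if np.any(np.diag(A) != 0):
        raise ValueError("a_ii must be zero")
    B = np.diag(A.sum(axis=1))      # b_i = sum over neighbors of a_ij
    L = B - A                       # Laplacian: every row sums to zero
    C = np.diag(np.asarray(pinned, dtype=float))
    return L, C

# Hypothetical chain: robot 1 hears the leader, robot 2 hears robot 1,
# robot 3 hears robot 2 (row i lists the neighbors of robot i).
A = [[0, 0, 0],
     [1, 0, 0],
     [0, 1, 0]]
c = [1, 0, 0]
L, C = graph_matrices(A, c)
row_sums = L.sum(axis=1)
# L + C has full rank here because the leader reaches every follower.
is_invertible = np.linalg.matrix_rank(L + C) == L.shape[0]
```

In this example `L + C` is lower triangular with ones on the diagonal, which is why it is invertible even though `L` alone is singular.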
2.2. Nonholonomic Mobile Robot and Integrated Cooperative Control Problem
Consider an NMR represented by the node $s_i$. Its mass $m_i$, including the mass of the platform
without wheels and the mass of the wheels, is concentrated at the center point. The distance between the driven
wheels is $b_i$, the radius of each wheel is $r_i$, and the distance from the center point to the driving axle is
$l_i$. Without loss of generality, $l_i$ can be set to zero. The NMR is a mechanical system with $n$
generalized configuration variables $(q_{i1}, q_{i2}, \ldots, q_{in})$ subject to $p$ nonholonomic constraints [19].
The kinematics and dynamics of the $i$th NMR are written as
$$\begin{aligned}
\dot q_i &= S_i(q_i)\, v_i(t) \\
\bar M_i(q_i)\,\dot v_i + \bar C_i(q_i, \dot q_i)\, v_i + \bar F_i(\dot q_i) + \bar\eta_{di} &= \bar B_i(q_i)\,\eta_i
\end{aligned} \tag{2}$$
where $q_i = [q_{i1}, q_{i2}, \ldots, q_{in}]^T \in \mathbb{R}^n$ are position vectors, and $v_i(t) = [\upsilon_i, \omega_i]^T \in \mathbb{R}^{n-p}$ are velocity
vectors, where $\upsilon_i$ and $\omega_i$ are translational and rotational velocities. $S_i \in \mathbb{R}^{n \times (n-p)}$ are full-rank
matrices, and $\bar B_i = S_i^T B_i$ with $B_i \in \mathbb{R}^{n \times (n-p)}$ are the input transformation matrices. $\bar M_i = S_i^T M_i S_i$
with $M_i \in \mathbb{R}^{n \times n}$ are inertia matrices consisting of the total mass $m_i$ and moment of inertia $I_i$.
Centripetal and Coriolis matrices are defined as $\bar C_i = S_i^T (M_i \dot S_i + C_i S_i)$ with $C_i \in \mathbb{R}^{n \times n}$, and
surface friction and gravitational vectors are defined as $\bar F_i = S_i^T F_i$ with $F_i \in \mathbb{R}^n$. Bounded
disturbances including unstructured unmodeled dynamics and external disturbances are denoted
by $\bar\eta_{di} = \bar B_i \eta_{di}$ with $\eta_{di} \in \mathbb{R}^{n-p}$, and control torque vectors are denoted by $\eta_i = [\eta_{li}, \eta_{ri}]^T \in \mathbb{R}^{n-p}$,
where $\eta_{li}$ and $\eta_{ri}$ are left and right torques, respectively.
Properties 1: $\bar M_i$ are symmetric positive definite matrices. The parameters in (2) are bounded
(Boundedness [19, 24]), i.e., $m_{i\min} \le \|\bar M_i\| \le m_{i\max}$, $c_{i\min} \le \|\bar C_i\| \le c_{i\max}$, $\|\bar\eta_{di}\| \le \eta_{di\max}$,
$s_{i\min} \le \|g_{qi}\| \le s_{i\max}$, $m_{i\max}^{-1}\|\bar B_i\| \le \|g_{vi}\| \le m_{i\min}^{-1}\|\bar B_i\|$, $m_{i\max}^{-1} \le \|k_{vi}\| \le m_{i\min}^{-1}$, and
$\|\bar F_i\| \le f_{i\max}\|\dot q_i\| \le f_{i\max} s_{i\max}\|v_i\|$, with positive constant scalars $m_{i\min}$, $m_{i\max}$, $c_{i\max}$, $c_{i\min}$, $\eta_{di\max}$, $s_{i\min}$,
$s_{i\max}$ and $f_{i\max}$.
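From (2), the nonlinear dynamics of NMR $i$ with unknown internal dynamics $f_{vi}$ and
disturbances $\bar\eta_{di}$ can be written as
$$\begin{aligned}
\dot q_i &= f_{qi}(q_i) + g_{qi}(q_i)\, v_i \\
\dot v_i &= f_{vi}(q_i, v_i) + g_{vi}(q_i, v_i)\,\eta_i + k_{vi}(q_i, v_i)\,\bar\eta_{di}
\end{aligned} \tag{3}$$
where $f_{qi} = 0$, $g_{qi} = S_i$, $f_{vi} = -\bar M_i^{-1}(\bar C_i v_i + \bar F_i)$, $g_{vi} = \bar M_i^{-1}\bar B_i$, $k_{vi} = -\bar M_i^{-1}$.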
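The kinematic part of (3) can be made concrete with a small Python sketch. It assumes the usual differential-drive form of $S_i$ for a pose $[x, y, \theta]$ with $l_i = 0$, which the text does not spell out, so the matrix below and the function names `S` and `q_dot` are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def S(q):
    """Assumed kinematic matrix of a differential-drive robot with l_i = 0."""
    _, _, theta = q
    return np.array([[np.cos(theta), 0.0],
                     [np.sin(theta), 0.0],
                     [0.0,           1.0]])

def q_dot(q, v):
    """Kinematic part of (3): f_q = 0 and g_q = S, so q_dot = S(q) v."""
    return S(q) @ np.asarray(v, dtype=float)

q = np.array([0.0, 0.0, np.pi / 2])   # pose [x, y, theta], heading along +y
v = np.array([0.3, 0.1])              # [translational, rotational] velocity
dq = q_dot(q, v)
# No side-slip (nonholonomic) constraint: -sin(theta)*x_dot + cos(theta)*y_dot = 0
slip = -np.sin(q[2]) * dq[0] + np.cos(q[2]) * dq[1]
```

With this choice $n = 3$ and $p = 1$, so the velocity vector has $n - p = 2$ components, in line with the dimensions stated after (2).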
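Remark 1: $f_{vi}$ is Lipschitz on the compact set $\Omega_i \subseteq \mathbb{R}^{n-p}$ such that
$$\|f_{vi}(q_i, v_i)\| \le m_{i\min}^{-1}(c_{i\max} + f_{i\max} s_{i\max})\|v_i\| \le \mu_{i\max}\|v_i\|$$
for a positive constant scalar $\mu_{i\max}$ [25]. $\bar\eta_{di}$ is $L_2$-bounded, i.e., $\int_0^\infty \|\bar\eta_{di}\|^2 dt < \infty$ [26].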
Definition 1: It is assumed that a leader robot (virtual robot) generates the bounded smooth
trajectory $q_0 \in \mathbb{R}^n$ that holds
$$\begin{aligned}
\dot q_0 &= S_0(q_0)\, v_0 \\
\dot v_0 &= f_{v0}(q_0, v_0)
\end{aligned} \tag{4}$$
where $v_0$ is the desired velocity, and $f_{v0}$ is an acceleration function which satisfies the Lipschitz
assumption [27]. Then the cooperative tracking control problem of the multi-NMR system is to
design $\eta_i$ in (3) so that, when $\bar\eta_{di} = 0$, each NMR that directly connects to its leader achieves
$\|q_i(t) - q_0(t)\| \to 0$ and $\|v_i(t) - v_0(t)\| \to 0$, and each NMR that directly connects to its neighborhoods achieves, $\forall j \in \mathcal{N}_i$,
$\|q_i(t) - q_j(t)\| \to 0$ and $\|v_i(t) - v_j(t)\| \to 0$.
For the $i$th NMR, the local tracking error functions are defined as [7]
$$e_{qi} = \sum_{j \in \mathcal{N}_i} a_{ij}(q_i - q_j) + c_i(q_i - q_0) \tag{5}$$
$$e_{vi} = \sum_{j \in \mathcal{N}_i} a_{ij}(v_i - v_j) + c_i(v_i - v_0). \tag{6}$$
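As a minimal sketch of how (5) and (6) are evaluated, the Python code below computes the local errors agent by agent and compares them with the equivalent stacked Kronecker form. The function name `local_errors`, the chain topology, and the numerical poses are hypothetical values chosen only for illustration.

```python
import numpy as np

def local_errors(A, c, x, x0):
    """e_i = sum_j a_ij (x_i - x_j) + c_i (x_i - x0), as in (5) and (6)."""
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float)
    x = np.asarray(x, dtype=float)      # shape (m, n): one row per follower
    x0 = np.asarray(x0, dtype=float)    # shape (n,): leader state
    e = np.zeros_like(x)
    for i in range(len(x)):
        for j in range(len(x)):
            e[i] += A[i, j] * (x[i] - x[j])
        e[i] += c[i] * (x[i] - x0)
    return e

A = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]])   # hypothetical chain graph
c = np.array([1, 0, 0])                           # only robot 1 sees the leader
q = np.array([[1.0, 0.0, 0.1], [0.5, 0.5, 0.0], [0.0, 1.0, -0.1]])
q0 = np.array([1.0, 1.0, 0.0])
e_q = local_errors(A, c, q, q0)

# Stacked form: e = ((L + C) kron I_n)(q - 1 kron q0), L the follower Laplacian
L = np.diag(A.sum(axis=1)) - A
stacked = np.kron(L + np.diag(c), np.eye(3)) @ (q - q0).reshape(-1)
```

The same function applies unchanged to the velocity errors (6) by passing the velocities instead of the positions.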
Furthermore, to avoid collisions, (5) is written as
$$e_{qi} = \sum_{j \in \mathcal{N}_i} a_{ij}\big((q_i + \delta_i) - (q_j + \delta_j)\big) + c_i(q_i + \delta_i - q_0) \tag{7}$$
where $\delta_i \in \mathbb{R}^n$ are coordinates of the front points on NMRs $i$ and $j$. Taking the derivative of
(5) or (7) and (6), with the notation of (3) and (4), the tracking dynamics are rewritten as
$$\dot e_{qi} = -c_i \dot q_0 + (b_i + c_i)\, g_{qi} v_i - \sum_{j \in \mathcal{N}_i} a_{ij}\, g_{qj} v_j \tag{8}$$
$$\dot e_{vi} = f_{evi}(t) + (b_i + c_i)(g_{vi}\eta_i + k_{vi}\bar\eta_{di}) - \sum_{j \in \mathcal{N}_i} a_{ij}(g_{vj}\eta_j + k_{vj}\bar\eta_{dj}) \tag{9}$$
where
$$f_{evi}(t) = \sum_{j \in \mathcal{N}_i} a_{ij}(f_{vi} - f_{vj}) + c_i(f_{vi} - f_{v0}).$$
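For $m$ NMRs, the overall functions of tracking dynamics are given by
$$\dot e_q = -(\mathcal{C} \otimes I_n)\,\dot{\underline{q}}_0 + \big((\bar{\mathcal{L}} + \mathcal{C}) \otimes I_n\big)\, g_q(q)\, v \tag{10}$$
$$\dot e_v = f_{ev}(t) + \big((\bar{\mathcal{L}} + \mathcal{C}) \otimes I_{n-p}\big)\big(g_v(q, v)\,\eta + k_v(q, v)\,\bar\eta_d\big) \tag{11}$$
where $\otimes$ is the Kronecker product operator, and $\underline{q}_0 = (\underline{1} \otimes I_n)\, q_0$, with $\underline{1} = [1, \ldots, 1]^T \in \mathbb{R}^m$, and
$I_n$, $I_{n-p}$ are identity matrices, $q = [q_1^T, \ldots, q_m^T]^T$, $e_q = [e_{q1}^T, \ldots, e_{qm}^T]^T$, $g_q(q) = \mathrm{diag}[g_{q1}, \ldots, g_{qm}]$,
$v = [v_1^T, \ldots, v_m^T]^T$, $e_v = [e_{v1}^T, \ldots, e_{vm}^T]^T$, $f_{ev}(t) = [f_{ev1}^T, \ldots, f_{evm}^T]^T$, $g_v(q, v) = \mathrm{diag}[g_{v1}, \ldots, g_{vm}]$,
$k_v(q, v) = \mathrm{diag}[k_{v1}, \ldots, k_{vm}]$, $\eta = [\eta_1^T, \ldots, \eta_m^T]^T$, $\bar\eta_d = [\bar\eta_{d1}^T, \ldots, \bar\eta_{dm}^T]^T$. $\bar{\mathcal{L}} = \bar{\mathcal{B}} - \bar{\mathcal{A}}$, where $\bar{\mathcal{B}}$ and $\bar{\mathcal{A}}$
are sub-matrices of $\mathcal{B}$ and $\mathcal{A}$, formed by removing the elements of the leader.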
Note that (10) and (11) denote kinematic and dynamic equations. In most reported
studies, kinematic and dynamic controllers were designed successively based on these equations.
In this paper, the objective of our design is to obtain integrated controllers without separating
kinematics and dynamics. To this end, the following transformations are performed.
Adding and subtracting ( )n q vI g q e to (10), yields
( )( ) ( )( ) ( )q n q a q ve I g q v v g q e (12)
where 1[ ,..., ]a a amv v v , holds
0( ) ( ) .( ) ( )n q a n n q vI g q v I q I g q e (13)
Adding and subtracting ( )n q qI g q e to (11), yields
( ) ( ) ( , )( ) ( , ) ( )( )v ev n p v a v d n q qe f t I g q v η η k q v η I g q e (14)
where 1[ ,..., ]a a amη η η , holds
( ) ( , ) ( ) .n p v a n q qI g q v η I g q e (15)
Next, the pseudo-control inputs of (10) and the real-control inputs of (11) are defined as
$$v = v_a + v^* \tag{16}$$
$$\eta = \eta_a + \eta^* \tag{17}$$
where the bounded $L_2$-gain optimal control inputs $v^* = [v_1^{*T}, \ldots, v_m^{*T}]^T$ and $\eta^* = [\eta_1^{*T}, \ldots, \eta_m^{*T}]^T$
will be designed in the next section.
Now, we introduce Lemma 1 to convert the cooperative tracking control problem of the
multi-NMR system into the stabilization of an integrated cooperative dynamical system.
Lemma 1: Let the integrated cooperative control inputs $u_r = [v^T, \eta^T]^T = u_a + u^*$ hold (16) and
(17), where the inputs $u_a = [v_a^T, \eta_a^T]^T$ hold (13) and (15), and the bounded $L_2$-gain optimal
control inputs $u^* = [v^{*T}, \eta^{*T}]^T$ stabilize the following integrated cooperative dynamical system
$$\dot e = f_e(t) + \big((\bar{\mathcal{L}} + \mathcal{C}) \otimes I_{2n-p}\big)\big(g(x)\, u^* + k(x)\, d\big) \tag{18}$$
where $e = [e_q^T, e_v^T]^T$, $x = [q^T, v^T]^T$, $d = [0_{1 \times mn}, \bar\eta_d^T]^T$, $f_e(t) = [0_{1 \times mn}, f_{ev}^T]^T$ denotes unknown
internal dynamics, $k(x) = \mathrm{diag}[0_{mn \times m(n-p)}, k_v]$ and $g(x) = \mathrm{diag}[g_q, g_v]$ are input matrices.
Then, the optimal cooperative tracking control scheme of the multi-NMR system with dynamics
(3) is equivalent to the bounded $L_2$-gain optimal control scheme of (18). In other words, if the
control law $u_r = u_a + u^*$ is applied to the dynamical systems (3), the tracking error dynamics
are transformed into the cooperative dynamical system (18).
Proof: To evaluate the stability of the system (3) when applying ru , we rewrite the tracking
error dynamic equations of (3) by substituting the control laws (16) and (17) into (12) and (14)
with noting (13) and (15):
*
*
( ) 0 0 00 0
( ) ( )
0 ( , )( ) 0 ( , )
( ) 0
( )
0 ( )
n qq n
n p vev dn p vv
n q v
qn p q
I g qe Iv
I k q vf t ηI g q ve η
I g q e
eI g q
(19)
Then, we choose the candidate Lyapunov function / 2J e e , and take derivative through (19):
*
*
( ) 0 0 00 0
( ) ( )
0 ( , )( ) 0 ( , )
( ) 0
( ) .
0 ( )
n q n
n p vev dn p v
n q v
q v
qn p q
I g q Iv
J e
I k q vf t ηI g q v η
I g q e
e e
eI g q
(20)
One may easily recognize that the last term on the right-hand side of equation (20) is equal to
zero. Thus, Eq. (20) can be rewritten in the reduced form as
$$\dot J = e^T\Big(f_e(t) + \big((\bar{\mathcal{L}} + \mathcal{C}) \otimes I_{2n-p}\big)\big(g(x)\, u^* + k(x)\, d\big)\Big). \tag{21}$$
On the other hand, if we also choose the Lyapunov candidate $J$ for the closed-loop dynamical
system (18), after taking the derivative we obtain the same result as (21). One can conclude that the
existence of a bounded $L_2$-gain optimal tracking controller that makes (21) negative, and thus stabilizes
the dynamical system (18), is sufficient to make the dynamical systems (3) stable.
Remark 2: Because the system internal dynamics $f_e(t)$ and the external disturbance $d$ in (18) are
derived from (3), which hold the facts in Properties 1 and Remark 1,
$f_e(t)$ is completely unknown and the disturbance $d \in L_2[0, \infty)$.
3. DESIGN OF OPTIMAL COOPERATIVE TRACKING CONTROL SCHEME WITH
DISTURBANCE REJECTION
In this section, motivated by the design of the optimal cooperative tracking control scheme
with disturbance rejection but only applied to multiple linear agents [14], we propose a novel
control scheme to apply to multiple nonlinear agents, of which the multi-NMR system in the
paper is one.
3.1. Bounded $L_2$-Gain Problem for Multi-NMR Systems
Consider nonlinear cooperative dynamics of each NMR, derived from (18), with the
measured outputs $y_i \in \mathbb{R}^q$
$$\begin{aligned}
\dot e_i &= f_{ei}(t) + (b_i + c_i)\big(g_i(x_i)\, u_i + k_i(x_i)\, d_i\big) - \sum_{j \in \mathcal{N}_i} a_{ij}\big(g_j(x_j)\, u_j + k_j(x_j)\, d_j\big) \\
y_i &= h_i(e_i)
\end{aligned} \tag{22}$$
where $h_i(e_i)$ are continuous smooth functions. Define the general disturbances $\omega_i = [d_i^T, d_{-i}^T]^T$
with $d_{-i} = \{d_j : j \in \mathcal{N}_i\}$, and performance outputs $z_i = [e_i^T(t), u_i^T(t), u_{-i}^T(t)]^T$, $u_{-i} = \{u_j : j \in \mathcal{N}_i\}$,
satisfying the following inequality of the bounded $L_2$-gain for all NMRs when disturbances
$\omega_i \neq 0$
$$\begin{aligned}
\int_0^\infty \|z_i\|^2 dt &= \int_0^\infty \Big(Q_{ii}(e_i) + u_i^T R_{ii} u_i + \sum_{j \in \mathcal{N}_i} u_j^T R_{ij} u_j\Big) dt \le \gamma^2 \int_0^\infty \|\omega_i\|^2 dt + \beta(e_i(0)) \\
&= \gamma^2 \int_0^\infty \Big(d_i^T T_{ii} d_i + \sum_{j \in \mathcal{N}_i} d_j^T T_{ij} d_j\Big) dt + \beta(e_i(0))
\end{aligned} \tag{23}$$
for some bounded functions $\beta$ such that $\beta(0) = 0$ [28], where $Q_{ii}(e_i) > 0$ for all $e_i \neq 0$ and
$Q_{ii}(0) = 0$, $R_{ii} > 0$, $R_{ij} > 0$, $T_{ii} > 0$ and $T_{ij} > 0$. $\gamma \ge \gamma^*$ is the prescribed disturbance
attenuation level, where $\gamma^*$ is the minimum gain of $\gamma$ for which the bounded $L_2$-gain condition
(23) is satisfied.
The objective in this section is to design an optimal cooperative tracking control scheme for
each NMR subject to unknown internal dynamics $f_{ei}$ and external disturbances $d_i$ and $d_j$ to
make all signals in (22) $L_2$-bounded.
Define the local infinite-horizon tracking performance function for each NMR
$$J_i(e_i(0), u_i, u_{-i}, d_i, d_{-i}) = \int_0^\infty \Big(Q_{ii}(e_i) + u_i^T R_{ii} u_i + \sum_{j \in \mathcal{N}_i} u_j^T R_{ij} u_j - \gamma^2 d_i^T T_{ii} d_i - \gamma^2 \sum_{j \in \mathcal{N}_i} d_j^T T_{ij} d_j\Big) dt. \tag{24}$$
Two-player zero-sum differential game theory [29] can be used to find solutions
of the bounded $L_2$-gain problem for the system (22) subject to (24)
$$V_i^*(e_i(0)) = \min_{u_i} \max_{d_i} J_i(e_i(0), u_i, u_{-i}, d_i, d_{-i}), \tag{25}$$
i.e., the saddle point $(u_i^*, d_i^*)$, where $u_i^*$ and $d_i^*$ are the optimal control law and the worst
disturbance law, respectively, holds
$$V_i^*(e_i(0)) = \min_{u_i} \max_{d_i} J_i(e_i(0), u_i, u_{-i}^*, d_i, d_{-i}^*) = \max_{d_i} \min_{u_i} J_i(e_i(0), u_i, u_{-i}^*, d_i, d_{-i}^*). \tag{26}$$
$V_i^*$ is known as the Nash equilibrium value of the multi-player game holding the following
constraints for all laws $u_i$ and $d_i$.
$$J_i(u_i^*, u_{-i}^*, d_i, d_{-i}^*) \le J_i(u_i^*, u_{-i}^*, d_i^*, d_{-i}^*) \le J_i(u_i, u_{-i}^*, d_i^*, d_{-i}^*). \tag{27}$$
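Following the ADP principle, for state feedback control laws $u_i$ and $d_i$, we define a local
cooperative value function for the $i$th NMR as [14]
$$V_i(e_i(t)) = \int_t^\infty \Big(Q_{ii}(e_i) + u_i^T R_{ii} u_i + \sum_{j \in \mathcal{N}_i} u_j^T R_{ij} u_j - \gamma^2 d_i^T T_{ii} d_i - \gamma^2 \sum_{j \in \mathcal{N}_i} d_j^T T_{ij} d_j\Big) d\tau \tag{28}$$
Using Leibniz's formula, a differential equivalent to (28) is given by the Hamiltonian
$$\begin{aligned}
H_i\Big(e_i, \frac{\partial V_i}{\partial e_i}, u_i, u_{-i}, d_i, d_{-i}\Big) ={}& V_{ei}^T\Big(f_{ei} + (b_i + c_i)\big(g_i(x_i)\, u_i + k_i(x_i)\, d_i\big) - \sum_{j \in \mathcal{N}_i} a_{ij}\big(g_j(x_j)\, u_j + k_j(x_j)\, d_j\big)\Big) \\
& + Q_{ii}(e_i) + u_i^T R_{ii} u_i + \sum_{j \in \mathcal{N}_i} u_j^T R_{ij} u_j - \gamma^2 d_i^T T_{ii} d_i - \gamma^2 \sum_{j \in \mathcal{N}_i} d_j^T T_{ij} d_j = 0
\end{aligned} \tag{29}$$
where $V_{ei} = \partial V_i / \partial e_i$. Using the stationary condition for (29), one obtains
$$\frac{\partial H_i}{\partial u_i} = 0 \;\Rightarrow\; u_i(e_i) = -\frac{1}{2}(b_i + c_i)\, R_{ii}^{-1} g_i^T(x_i)\, \frac{\partial V_i}{\partial e_i} \tag{30}$$
$$\frac{\partial H_i}{\partial d_i} = 0 \;\Rightarrow\; d_i(e_i) = \frac{1}{2\gamma^2}(b_i + c_i)\, T_{ii}^{-1} k_i^T(x_i)\, \frac{\partial V_i}{\partial e_i}, \tag{31}$$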
with boundary condition $V_i(0) = 0$. The following coupled HJI equations for the cooperative
tracking problem are obtained by substituting (30) and (31) into (29), with $V_{ei} = \partial V_i / \partial e_i$:
$$\begin{aligned}
& Q_{ii}(e_i) + V_{ei}^T f_{ei}^c + \frac{1}{4}(b_i + c_i)^2\, V_{ei}^T g_i(x_i) R_{ii}^{-1} g_i^T(x_i) V_{ei} \\
& + \frac{1}{4}\sum_{j \in \mathcal{N}_i} (b_j + c_j)^2\, V_{ej}^T g_j(x_j) R_{jj}^{-1} R_{ij} R_{jj}^{-1} g_j^T(x_j) V_{ej} \\
& - \frac{1}{4\gamma^2}\sum_{j \in \mathcal{N}_i} (b_j + c_j)^2\, V_{ej}^T k_j(x_j) T_{jj}^{-1} T_{ij} T_{jj}^{-1} k_j^T(x_j) V_{ej} \\
& - \frac{1}{4\gamma^2}(b_i + c_i)^2\, V_{ei}^T k_i(x_i) T_{ii}^{-1} k_i^T(x_i) V_{ei} = 0
\end{aligned} \tag{32}$$
where the closed-loop system corresponding to the $i$th NMR is defined as
$$\begin{aligned}
f_{ei}^c ={}& f_{ei}(t) - \frac{1}{2}(b_i + c_i)^2\, g_i(x_i) R_{ii}^{-1} g_i^T(x_i) V_{ei} + \frac{1}{2}\sum_{j \in \mathcal{N}_i} a_{ij}(b_j + c_j)\, g_j(x_j) R_{jj}^{-1} g_j^T(x_j) V_{ej} \\
& + \frac{1}{2\gamma^2}(b_i + c_i)^2\, k_i(x_i) T_{ii}^{-1} k_i^T(x_i) V_{ei} - \frac{1}{2\gamma^2}\sum_{j \in \mathcal{N}_i} a_{ij}(b_j + c_j)\, k_j(x_j) T_{jj}^{-1} k_j^T(x_j) V_{ej}.
\end{aligned} \tag{33}$$
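The stationary conditions (30) and (31) can be checked numerically. The Python sketch below evaluates only the part of the Hamiltonian (29) that depends on agent $i$'s own $u_i$ and $d_i$, using randomly generated matrices as hypothetical stand-ins for $g_i$, $k_i$ and $\partial V_i / \partial e_i$; the argument `bc` denotes $b_i + c_i$. It is an illustration of the saddle-point property, not the authors' implementation.

```python
import numpy as np

def hamiltonian_terms(u, d, grad_V, g, k, R, T, gamma, bc):
    """Terms of (29) that depend on agent i's own u_i and d_i."""
    drift = grad_V @ (bc * (g @ u + k @ d))
    return drift + u @ R @ u - gamma**2 * (d @ T @ d)

def optimal_laws(grad_V, g, k, R, T, gamma, bc):
    """Stationary points (30) and (31); bc stands for b_i + c_i."""
    u_star = -0.5 * bc * np.linalg.solve(R, g.T @ grad_V)
    d_star = 0.5 / gamma**2 * bc * np.linalg.solve(T, k.T @ grad_V)
    return u_star, d_star

rng = np.random.default_rng(0)
g = rng.normal(size=(5, 4))          # hypothetical input matrix g_i(x_i)
k = rng.normal(size=(5, 4))          # hypothetical disturbance matrix k_i(x_i)
R = np.diag([1.0, 2.0, 1.0, 2.0])    # R_ii > 0
T = np.diag([0.5, 1.5, 0.5, 1.5])    # T_ii > 0
grad_V = rng.normal(size=5)          # stand-in for dV_i/de_i
gamma, bc = 2.0, 1.5

u_star, d_star = optimal_laws(grad_V, g, k, R, T, gamma, bc)
H0 = hamiltonian_terms(u_star, d_star, grad_V, g, k, R, T, gamma, bc)
du = np.array([0.1, -0.2, 0.05, 0.0])
dd = np.array([-0.3, 0.05, 0.0, 0.1])
H_u = hamiltonian_terms(u_star + du, d_star, grad_V, g, k, R, T, gamma, bc)
H_d = hamiltonian_terms(u_star, d_star + dd, grad_V, g, k, R, T, gamma, bc)
```

Perturbing the control raises the Hamiltonian by `du @ R @ du`, while perturbing the disturbance lowers it by `gamma**2 * dd @ T @ dd`, which is the minimizing and maximizing behaviour expected of the two players.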
Let all optimal control laws and worst disturbance laws be functions of the given solution $V_i$,
i.e., $u_i^* = u_i(V_i)$, $u_{-i}^* = u_{-i}(V_{-i})$, $d_i^* = d_i(V_i)$ and $d_{-i}^* = d_{-i}(V_{-i})$, then the HJIs (32) become
$$H_i\Big(e_i, \frac{\partial V_i}{\partial e_i}, u_i^*, u_{-i}^*, d_i^*, d_{-i}^*\Big) = 0, \quad V_i(0) = 0. \tag{34}$$
Lemma 2: Assume that $V_i(e_i)$, $i = 1, \ldots, m$, is smooth, $V_i(e_i) > 0$, and $V_i(e_i)$ is a solution to
the coupled HJI equation (32). Let optimal control laws of neighborhoods be given. Then, for
every $u_i$ and $d_i$, the following condition holds
$$H_i\Big(e_i, \frac{\partial V_i}{\partial e_i}, u_i, u_{-i}^*, d_i, d_{-i}^*\Big) = (u_i - u_i^*)^T R_{ii} (u_i - u_i^*) - \gamma^2 (d_i - d_i^*)^T T_{ii} (d_i - d_i^*). \tag{35}$$
Proof: Complete the squares in (32) to obtain (35).
Lemma 3: Choose $\gamma \ge \gamma^*$. Assume that $V_i^*$, $i = 1, \ldots, m$, is smooth, $V_i^* > 0$, and $V_i^*$ is a
solution to the coupled HJI equation (32). Let optimal control laws of neighborhoods be given.
Then the equilibrium point of the closed-loop system
$$\dot e_i = f_{ei}(t) + (b_i + c_i)\, g_i(x_i)\, u_i^* - \sum_{j \in \mathcal{N}_i} a_{ij}\, g_j(x_j)\, u_j^* \tag{36}$$
is asymptotically stable with control inputs $u_i^* = u_i(V_i^*)$ given by (30) in terms of $V_i^*$. In
addition, in the presence of disturbances $d_i$, $u_i^*$ makes the bounded $L_2$-gain condition (23)
satisfied, where $Q_{ii} = h_i^T Q_i h_i$ with $Q_i$ a positive-definite matrix.
Proof: The proof is similar to the proof of Theorem 1 in [14] and is omitted here.
Lemma 4 (Solution to Multi-player Zero-sum Game [14]): Choose $\gamma \ge \gamma^*$. Assume that the
value of the game (33) is finite and optimal control laws of neighborhoods are available. Let
$V_i^*$, $i = 1, \ldots, m$, be smooth, $V_i^* > 0$, and be the solution to the coupled HJI equation (32), such that the
closed-loop system
$$\dot e_i = f_{ei}(t) + (b_i + c_i)\, g_i(x_i)\, u_i^* - \sum_{j \in \mathcal{N}_i} a_{ij}\, g_j(x_j)\, u_j^* + (b_i + c_i)\, k_i(x_i)\, d_i^* - \sum_{j \in \mathcal{N}_i} a_{ij}\, k_j(x_j)\, d_j^* \tag{37}$$
is asymptotically stable around its equilibrium point. Then, the Nash condition (27) is satisfied
for control and disturbance laws, $u_i^* = u_i(V_i^*)$ and $d_i^* = d_i(V_i^*)$, given by (30) and (31) in terms
of $V_i^*$. Further, there exists the value of the game, i.e., the solution $V_i^*(e_i(0))$, $i = 1, \ldots, m$, of the HJI
equation (32).
Proof: The proof is similar to the proof of Theorem 2 in [14] and is omitted here.
It is shown in [29] that the nonlinear bounded $L_2$-gain optimal control problem relies on
values of multi-player zero-sum games; in particular, as shown in Lemma 4, the values are the
solutions of the coupled HJI equations (32). However, the HJI equations are impossible to solve
analytically.
3.2. Optimal Cooperative Tracking Control Scheme with Disturbance Rejection and
Algorithms in Real Time
In this section, we design a control scheme and online algorithms to solve the HJI equations
(32) based on reinforcement learning techniques of [12, 14]. However, in contrast to most
existing schemes using ADP for learning the solutions of the HJI equations, the scheme in this
paper uses only one critic NN and does not need knowledge of system internal dynamics.
Moreover, the online algorithms synchronously update parameters in one iterative loop.
According to the Weierstrass higher order approximation Theorem [30], there exists an NN
such that the smooth value function $V_i^*$, $i = 1, \ldots, m$, is approximated by
$$V_i^*(e_i) = W_i^T \phi_i(e_i) + \varepsilon_i(e_i) \tag{38}$$
where $\phi_i(e_i) : \mathbb{R}^{2n-p} \to \mathbb{R}^{N_i}$ is the activation function vector of neurons in the hidden layer,
$\varepsilon_i(e_i)$ the NN approximation error, and $W_i \in \mathbb{R}^{N_i}$ the ideal weight vector.
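Properties 2 (Approximation [31]): $\phi_i(e_i)$, $i = 1, \ldots, m$, can be selected as a complete
independent basis set so that when $N_i \to \infty$, $\varepsilon_i(e_i) \to 0$ and $\varepsilon_{ei}(e_i) = \partial \varepsilon_i(e_i) / \partial e_i \to 0$, and
for fixed $N_i$, $\|\varepsilon_i(e_i)\| \le \varepsilon_{i\max}$, $\|\varepsilon_{ei}(e_i)\| \le \varepsilon_{ei\max}$, where $\varepsilon_{i\max}$ and $\varepsilon_{ei\max}$ are positive constants.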
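Substituting (30), (31) and (38) into (32), the NN-based coupled HJI equations are obtained
$$\begin{aligned}
& Q_{ii}(e_i) + W_i^T \phi_{ei} f_{ei}(t) - \frac{1}{4}(b_i + c_i)^2\, W_i^T \phi_{ei}(\bar g_i - \bar k_i)\phi_{ei}^T W_i + \frac{1}{2}\sum_{j \in \mathcal{N}_i} a_{ij}(b_j + c_j)\, W_i^T \phi_{ei}(\bar g_j - \bar k_j)\phi_{ej}^T W_j \\
& + \frac{1}{4}\sum_{j \in \mathcal{N}_i} (b_j + c_j)^2\, W_j^T \phi_{ej}(\bar g_{-i} - \bar k_{-i})\phi_{ej}^T W_j = \varepsilon_{H_i}
\end{aligned} \tag{39}$$
where $l \in \{i, j\}$, $\bar g_l = g_l R_{ll}^{-1} g_l^T$, $\bar k_l = \frac{1}{\gamma^2} k_l T_{ll}^{-1} k_l^T$, $\phi_{el} = \partial \phi_l(e_l) / \partial e_l$, $\varepsilon_{el} = \partial \varepsilon_l / \partial e_l$,
$\bar g_{-i} = g_j R_{jj}^{-1} R_{ij} R_{jj}^{-1} g_j^T$, $\bar k_{-i} = \frac{1}{\gamma^2} k_j T_{jj}^{-1} T_{ij} T_{jj}^{-1} k_j^T$. The residual errors $\varepsilon_{H_i}$,
caused by the function approximation errors, are
computed as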
* * 2 * *
1 * 1 *
2
1
( ) ( )( ) ( ) 2
4
1
( ) ( )
2
1
( ) .
4
( ) ( ) ( )
( ) ( )
( )
i
i
i i
i
H ei ei i i i i i i i i ei i i ei ei ij j j j jj
ij j j ei j j ej j j ej j jj ij j j jj ij jj j
j j ej i i ejj
ε ε f t b c g u k d b c ε g k ε ε a g u k d
a b c ε g k ε b c ε g R R u k T T d
b c ε g k ε
(40)
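Remark 3: According to Properties 1, for $l \in \{i, j, -i\}$, the functions $\bar g_l$ and $\bar k_l$ are positive
definite and bounded, e.g., $0 < \bar g_{i\min} \le \|\bar g_i\| \le \bar g_{i\max}$, where $\bar g_{i\min} = g_{i\min}^2 / \lambda_{\max}(R_{ii})$ and
$\bar g_{i\max} = g_{i\max}^2 / \lambda_{\min}(R_{ii})$, with $\lambda_{\min}$ and $\lambda_{\max}$ the smallest and largest eigenvalues,
respectively. Then, using Properties 2, $\varepsilon_{H_i}$ is bounded on a compact set. That is,
$\forall \varepsilon_{H_i\max} > 0$, $\exists N_i(\varepsilon_{H_i\max})$ : $\sup \|\varepsilon_{H_i}\| \le \varepsilon_{H_i\max}$. Moreover, if $N_i \to \infty$, $\varepsilon_{H_i}$ converges
uniformly to zero [30].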
The ideal weight vectors $W_i$ in (38) are unknown, thus $V_i(e_i)$ are approximated using the estimates $\hat W_i$:
$$\hat V_i(e_i) = \hat W_i^T \phi_i(e_i). \tag{41}$$
Then, the estimated control and disturbance laws become
$$\hat u_i(e_i) = -\frac{1}{2}(b_i + c_i)\, R_{ii}^{-1} g_i^T(x_i)\, \phi_{ei}^T \hat W_i, \tag{42}$$
$$\hat d_i(e_i) = \frac{1}{2\gamma^2}(b_i + c_i)\, T_{ii}^{-1} k_i^T(x_i)\, \phi_{ei}^T \hat W_i. \tag{43}$$
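To illustrate how (41) to (43) turn a single weight vector into both a control and a disturbance estimate, the following Python sketch uses a quadratic polynomial basis. The basis choice, the function names `phi`, `phi_jacobian` and `estimated_laws`, and the numerical values are assumptions made for illustration; the paper does not state which activation functions are used in this part.

```python
import numpy as np

def phi(e):
    """Quadratic basis: all products e_a * e_b with a <= b (assumed choice)."""
    idx = np.triu_indices(len(e))
    return np.outer(e, e)[idx]

def phi_jacobian(e):
    """d phi / d e, shape (N, len(e)), obtained with the product rule."""
    n = len(e)
    rows, cols = np.triu_indices(n)
    J = np.zeros((len(rows), n))
    for r, (a, b) in enumerate(zip(rows, cols)):
        J[r, a] += e[b]
        J[r, b] += e[a]
    return J

def estimated_laws(e, W_hat, g, k, R, T, gamma, bc):
    """Estimated control (42) and disturbance (43); bc stands for b_i + c_i."""
    grad_V = phi_jacobian(e).T @ W_hat            # phi_e^T W_hat
    u_hat = -0.5 * bc * np.linalg.solve(R, g.T @ grad_V)
    d_hat = 0.5 / gamma**2 * bc * np.linalg.solve(T, k.T @ grad_V)
    return u_hat, d_hat

e = np.array([1.0, 2.0])
W_hat = np.ones(3)                                 # V_hat = e1^2 + e1*e2 + e2^2
V_hat = W_hat @ phi(e)                             # value estimate (41)
u_hat, d_hat = estimated_laws(e, W_hat, np.eye(2), np.eye(2),
                              np.eye(2), np.eye(2), gamma=2.0, bc=1.0)
```

Because both laws are driven by the same gradient `phi_jacobian(e).T @ W_hat`, updating `W_hat` updates the control and the disturbance estimate at the same time, which is the point of using one NN per agent.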
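The approximate Hamiltonians are obtained by substituting (43), (42) and (41) into (29):
$$\begin{aligned}
\hat H_i(e_i, \hat W_i, \hat u_i, \hat u_{-i}, \hat d_i, \hat d_{-i}) ={}& Q_{ii}(e_i) + \hat W_i^T \phi_{ei} f_{ei}(t) - \frac{1}{4}(b_i + c_i)^2\, \hat W_i^T \phi_{ei}(\bar g_i - \bar k_i)\phi_{ei}^T \hat W_i \\
& + \frac{1}{2}\sum_{j \in \mathcal{N}_i} a_{ij}(b_j + c_j)\, \hat W_i^T \phi_{ei}(\bar g_j - \bar k_j)\phi_{ej}^T \hat W_j + \frac{1}{4}\sum_{j \in \mathcal{N}_i} (b_j + c_j)^2\, \hat W_j^T \phi_{ej}(\bar g_{-i} - \bar k_{-i})\phi_{ej}^T \hat W_j.
\end{aligned} \tag{44}$$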
It is desired that, besides tuning $\hat W_i$ to minimize the residual error functions of (44) such that
$\hat W_i \to W_i$, the assumption of identifying knowledge of internal dynamics should be removed. The
residual error functions are chosen as the square integral functions $E_i = \frac{1}{2} e_{H_i}^T e_{H_i}$, where
$$e_{H_i} = \int_{t-T}^{t} \hat H_i(e_i, \hat W_i, \hat u_i, \hat u_{-i}, \hat d_i, \hat d_{-i})\, d\tau \tag{45}$$
where $T > 0$ is a chosen time interval. Based on the normalized gradient descent scheme that is
modified from the Levenberg-Marquardt algorithm, we propose the following weight-tuning laws
( ) ( )2
2
( 1)
ˆ
( 1)
i i
i it it i t-T i t-T
i i
i
i i
i i
i i
α ζ
Ψ if e e e e
ζ ζ
W
α ζ
Ψ Ξ otherwise
ζ ζ
(46)
where ( )it ie e t , ( ) ( )i t T ie e t T , and i , i and i are defined in (47), (48) and (49),
respectively, with positive constants i and i .
21 1ˆ ˆ( ) ( ) ( )( ) ( )
2 2
( ) ( ) ( ) .
( )
( ) ( ) ( )
i
t t
i ei ei i i i i ei i ij j j j j ej j ei ijt T t T
i i i i i i
f b c g k W a b c g k W d e s d
e t e t T e t
(47)
2 2( )
1 1ˆ ˆ ˆ ˆ ˆ( ) ( ) ( ) .
4 4
( )
i
t
i i i ii i i i ei i i ei i j ej i i ej j
t T
j jj
W Q b c W g k W W g kc db W (48)
( ) ( ) ( ) ( ) .
2
( )
i
t
i
i i i ei i i i ij j j ej j j jjt T
b c g k e a b c g g e d (49)
Later, Theorem 1 shows that if the tuning laws (46) are used in online algorithms to learn
the solutions of (39), the approximated NN weights will converge and be stable.
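The normalized gradient descent idea behind the tuning laws can be sketched on a toy problem. The Python code below is a generic discrete-time normalized gradient step for a residual that is linear in the weights; it is not the tuning law (46) itself, and the regressor sequence, the learning rate `alpha`, and the target weights `W_true` are hypothetical. It also accumulates the matrix that appears in a persistence of excitation test on the normalized regressor.

```python
import numpy as np

def normalized_gradient_step(W_hat, zeta, residual, alpha):
    """One step W <- W - alpha * zeta * residual / (zeta^T zeta + 1)^2."""
    return W_hat - alpha * zeta * residual / (zeta @ zeta + 1.0) ** 2

W_true = np.array([1.0, -2.0])     # hypothetical ideal weights
W_hat = np.zeros(2)
alpha = 5.0
initial_error = np.linalg.norm(W_hat - W_true)
excitation = np.zeros((2, 2))      # running sum of zeta_bar zeta_bar^T

for step in range(100):
    zeta = 2.0 * np.array([np.cos(step), np.sin(step)])   # rotating regressor
    residual = zeta @ W_hat - zeta @ W_true               # linear-in-weights residual
    W_hat = normalized_gradient_step(W_hat, zeta, residual, alpha)
    zeta_bar = zeta / (zeta @ zeta + 1.0)
    excitation += np.outer(zeta_bar, zeta_bar)

final_error = np.linalg.norm(W_hat - W_true)
pe_level = np.linalg.eigvalsh(excitation).min()   # positive when directions vary
```

If the regressor were held in a single fixed direction, `pe_level` would be zero and the weight error orthogonal to that direction would never shrink, which is why probing noise is added in the online algorithm.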
Remark 4: Knowledge of system internal dynamics ( )eif t is relaxed for the weight-tuning laws
(46).
Remark 5: If $e_i(t) = e_j(t) = 0$, the values on the right side of the weight-tuning laws (46) are
zeros. Thus, $\hat W_i$ are not tuned any more. To guarantee that $\hat W_i$ converge to the true values, we
apply the Persistence of Excitation (PE) condition [32] through the following lemma.
Lemma 5: Let $u_i$ and $d_i$, $i = 1, \ldots, m$, be any given bounded stable laws so that the value function
(28) can be written as
$$V_i(e_i(t - T)) = \int_{t-T}^{t} \Big(Q_{ii}(e_i) + u_i^T R_{ii} u_i + \sum_{j \in \mathcal{N}_i} u_j^T R_{ij} u_j - \gamma^2 d_i^T T_{ii} d_i - \gamma^2 \sum_{j \in \mathcal{N}_i} d_j^T T_{ij} d_j\Big) d\tau + V_i(e_i(t)). \tag{50}$$
Using the NN (38) for (50),
2 2( )( )
ii i
t
T
ii i i ii i j ij j i ii i j ij j i i i Bj jt T
Q e u R u u R u d T d d T d dt W e e (51)
where $\varepsilon_{B_i}$ is the reconstruction error. If the PE condition is satisfied in the interval
$[t - T_p, t]$, $T_p > 0$,
$$\beta_{1i} I \le \int_{t-T_p}^{t} \bar\zeta_i(\tau)\, \bar\zeta_i^T(\tau)\, d\tau \le \beta_{2i} I \tag{52}$$
where $\beta_{1i}$, $\beta_{2i}$ are positive constants, $\bar\zeta_i = \zeta_i / (\zeta_i^T \zeta_i + 1)$ and $I$ is the identity matrix with the
appropriate dimension, then:
For $\varepsilon_{B_i} = 0$ (no reconstruction error), the NN weight approximation error converges to
zero exponentially fast;
For $\|\varepsilon_{B_i}\| \le \varepsilon_{i\max}$, the NN weight approximation error converges exponentially fast to a
residual set.
Proof: From (42) and (43), Eq. (47) can be written as
21 1ˆ ˆ ˆ ˆ( ) ( ) ( )( ) .
2 2
( )
i
p
t
T
i i i i ei ei i i i i ei i ij j j j j ej jj
t T
e W W f b c g k W a b c g k W d (53)
With noting that ˆi i iW W W ˆ( )i iW W and pT T , substituting (53) into (46), function
approximation error dynamics is obtained as
2
2
2
1 1ˆ ˆ ˆ ˆ( ) ( ) ( ) ( )
( 1) 4 2
1ˆ ˆ ˆ( ) ( ) ( ) .
4
(
)
i
p
i
t
i i
i ii i i ei ei i i i ei i i ei i ij j j i eiT j
i i t T
j j ej j j j j ej i i ej jj
W Q e W f b c W g k W a b c W
g k W b c W g k W d
(54)
Note that, from (44),
iH
e in (45, i m can be written as
2
2
1 1ˆ ˆ ˆ ˆ( ) ( ) ( ) ( ) ( )
4 2
1ˆ ˆ ˆ( ) ( ) ( ) .
4
(
)
i ip
i
t
H ii i i ei e i i i ei i i ei i ij j j ijt T
ei j j ej j j j j ej i i ej jj
e Q e W f t b c W g k W a b c W
g k W b c W g k W
(55)
Using (53) and (55), one obtains
2 2
0
ˆ( ) .( )
i
i i
t T
ii i i ii i j ij j i ii i j ij j i i i Hj j
Q e u R u u R u γ d T d γ d T d dt W Δ e e (56)
Notice that, using pT for (51), and then subtracting to (56),
.
i iH i i i B
e W e e (57)
On the other hand, comparing (55) with (54), we obtain:
2
.
1
i
i
i i H
i i
ζ
W α e
ζ ζ
(58)
Inserting (57) into (58), note that i i (47), iW becomes
i
T i
i i i i i i B
i
ζ
W α ζ ζ W α e
m
(59)
where $m_i = \zeta_i^T \zeta_i + 1$. This approximation error is the same as the approximation error in [33],
and the remainder of the proof follows the proof of Theorem 1 in [33].
Online Algorithms: Based on (46), we design online algorithms for optimal cooperative
tracking control (OCTC) (as shown in Algorithm 1), where all parameters of control and
disturbance laws are updated simultaneously in one iterative loop, in contrast to policy iteration
algorithms using three NNs for each agent in [14].
Algorithm 1. Online OCTC
Step 1: $\forall i = 1, \ldots, m$, $\forall j \in \mathcal{N}_i$: select $Q_{ii}$, $R_{ii}$, $T_{ii}$, $R_{ij}$, $T_{ij}$, $\gamma$, the activation function vector $\phi_i$
and the tuning constants of (46), choose the sampling interval $T$, and initialize stable values of $\hat W_i^{(0)}$ and $\hat W_j^{(0)}$. Compute
$\hat V_i^{(0)}$ (41), $\hat u_i^{(0)}$ (42) and $\hat d_i^{(0)}$ (43), and choose probing noise $\xi_i$ for the PE condition (52). Assign
$l = 0$, $l_{stop}$ (time to stop the algorithm), and $\delta$ (a small positive real number) for the convergence
criteria.
Step 2: $\forall i = 1, \ldots, m$: add the probing noise to $\hat u_i^{(l)}$ and $\hat d_i^{(l)}$ to excite the system:
$\hat u_i^{(l)} \leftarrow \hat u_i^{(l)} + \xi_i$ and $\hat d_i^{(l)} \leftarrow \hat d_i^{(l)} + \xi_i$; observe $e_i(t)$, $e_i(t - T)$ for $\zeta_i$ (47); update $\hat W_i^{(l+1)}$ (46);
compute $\hat V_i^{(l+1)}$ (41); update both $\hat u_i^{(l+1)}$ (42) and $\hat d_i^{(l+1)}$ (43) simultaneously.
Step 3: $\forall i = 1, \ldots, m$: if $\|\hat W_i^{(l+1)} - \hat W_i^{(l)}\| \le \delta$ then assign $\xi_i = 0$. If $l \ge l_{stop}$ and $\xi_i = 0$ then stop
the algorithm, else $l = l + 1$, go back to Step 2.
3.3. Stability and Convergence Analysis
Stability and convergence of the closed-loop systems (22) when the $i$th NMR performs the
online OCTC with the NN weight-tuning law (46), control law (42) and disturbance law (43) are
stated and proven by the following theorem.
Theorem 1: Let the cooperative dynamics of the multi-NMR systems be defined in (18),
which gives the $i$th dynamics in (22), for all $i = 1, \ldots, m$. Let the cooperative value function of
NMR $i$ be chosen as (28) and the coupled HJI equations as (32). Let the NN weight-tuning law
be defined in (46), the control law in (42) and the worst-case disturbance law in (43). Let $\bar\zeta_i$ satisfy
the PE condition (52). Assume that NMR $i$ performs the online OCTC and that the control law
and disturbance law of the neighborhoods in the previous steps were updated and stable with
$\gamma \ge \gamma^*$. Then, the online OCTC guarantees that
(Stabilization) The cooperative tracking errors $e_i$ of the closed-loop systems and the
NN approximation errors $\tilde W_i$ are uniformly ultimately bounded (UUB).
(Convergence) After a limited number of iterative steps, the value function, the control
law and the worst disturbance law converge synchronously to the approximately
optimal values, i.e., $\|V_i^* - \hat V_i\| \le \varepsilon_{v_i}$, $\|u_i^* - \hat u_i\| \le \varepsilon_{u_i}$ and $\|d_i^* - \hat d_i\| \le \varepsilon_{d_i}$ for small positive
constants $\varepsilon_{v_i}$, $\varepsilon_{u_i}$ and $\varepsilon_{d_i}$.
4. HARDWARE TESTBED AND RESULTS
4.1. Hardware Testbed with Omnidirectional Vision
Using the graph theory in Section 2, the communication of the multi-NMR system is
chosen (Fig. 2), where the virtual leader is indexed by 0. The information exchange between
NMR $i$ and its neighborhoods, including positions $q_i = [x_i, y_i, \theta_i]^T$, velocities $v_i = [\upsilon_i, \omega_i]^T$
and torques $\eta_i = [\eta_{il}, \eta_{ir}]^T$, is represented by arrows. It is desired that information of the virtual
leader is only available to NMR 1.
To verify the effectiveness of the proposed algorithm for practical applications, we
developed a hardware testbed that consists of three experimental NMRs, as shown in Fig. 1.
The geometric parameters of the NMRs are $r_1 = 0.05$ m, $b_1 = 0.5$ m, $l_1 = 0$, $r_2 = r_3 = 0.025$ m,
$b_2 = b_3 = 0.2$ m, and $l_2 = l_3 = 0$. The total mass parameters of the NMRs are $m_1 = 5$ kg and
$m_2 = m_3 = 0.5$ kg. Then, using these values, the parameters of $\bar M_i$ and $\bar B_i$ in (2) are obtained.
Figure 1. Experimental NMRs; (a): Rear view, (b): Front view.
Figure 2. Communication graph of multi-NMR system.
The hardware diagram of NMR 1, shown in Fig. 3, consists of three main parts: the mechanical part (a mechanical frame, DC motors with stall torques of 0.73 Nm, and digital quadrature encoders with 400 divisions), a control board (a PIC micro-controller, a power circuit, and an XBee wireless module), and an embedded computer with an Intel Atom D510 CPU at 1.66 GHz that executes the online OCTC. An omnidirectional vision system is constructed to identify the feedback states, namely the positions and linear velocities, of all NMRs.
Each neighbor of NMR 1 has only a mechanical frame and a control board. Communication among the robots is achieved using XBee radio transceivers. At each sampling instant, the neighbors send their encoder pulses to NMR 1 and receive their torque commands from it. If a packet is dropped, the previous data packet is used.
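The dropped-packet rule described above can be illustrated with a minimal sketch. The class name `LastValueHold` and the use of `None` to mark a dropped packet are assumptions made for illustration only; they are not taken from the testbed software.

```python
from typing import Optional, Sequence, Tuple


class LastValueHold:
    """Keep the most recent valid packet and reuse it when a packet is dropped."""

    def __init__(self, initial: Sequence[float]) -> None:
        self._last: Tuple[float, ...] = tuple(initial)
        self.dropped = 0  # number of dropped packets seen so far

    def update(self, packet: Optional[Sequence[float]]) -> Tuple[float, ...]:
        # In this sketch, None stands for a packet that did not arrive in time.
        if packet is None:
            self.dropped += 1
        else:
            self._last = tuple(packet)
        return self._last


# Example: encoder pulses (left, right) received from one neighbor.
hold = LastValueHold([0.0, 0.0])
first = hold.update([12.0, 15.0])   # packet received
second = hold.update(None)          # packet dropped, previous data reused
```

Holding the last value keeps the control loop running at a fixed rate; counting the drops makes it possible to detect a link that has failed for too long.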
On the embedded computer, software written in VC++ and running on the Windows platform performs image processing with OpenCV to identify the positions and linear velocities of all NMRs, reads the encoder pulses to compute the rotational velocities, handles communication, executes the online algorithm, and sends the torque signals to the micro-controllers, which drive the DC motors by pulse width modulation (PWM) at a frequency of 20 kHz. The upper bound of each torque $\tau_i$ is selected as 0.2 Nm. In addition, the software generates the reference trajectories of the positions and velocities. The embedded computer and the micro-controller communicate with each other via the RS232 protocol. Users can remotely start or stop the testbed through interfacing tools on a remote computer connected to the embedded computer over the wireless network. The data generated by the NMRs during movement are stored and plotted in Matlab.
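Two of the steps above, converting encoder pulses to a wheel speed and bounding the torque command, can be sketched as follows. The sketch assumes that the 400 encoder divisions correspond to 400 counts per wheel revolution (no gear ratio and no quadrature multiplication), which the text does not state; the function names are hypothetical.

```python
import math

COUNTS_PER_REV = 400      # assumed counts per wheel revolution
TORQUE_LIMIT_NM = 0.2     # torque upper bound reported for the testbed


def wheel_speed(delta_counts: int, dt: float,
                counts_per_rev: int = COUNTS_PER_REV) -> float:
    """Wheel angular velocity in rad/s from counts accumulated over dt seconds."""
    if dt <= 0.0:
        raise ValueError("dt must be positive")
    return 2.0 * math.pi * delta_counts / (counts_per_rev * dt)


def saturate_torque(tau: float, limit: float = TORQUE_LIMIT_NM) -> float:
    """Clamp a torque command to the interval [-limit, limit]."""
    return max(-limit, min(limit, tau))
```

For instance, 400 counts in one second correspond to one revolution per second, and a commanded torque of 0.5 Nm is clamped to 0.2 Nm.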
Figure 3. Structural diagram of NMR 1.
Figure 4. Local and global Cartesian coordinate systems.
Only NMR 1 is equipped with the omnidirectional vision system (OVS), which is shown in Fig. 3. The OVS consists of a camera and a hyperbolic mirror with optical center $C$ and bend radius $R$. The mirror is placed on a glass tube with a diameter of $D = 0.2$ m. The camera, with a resolution of $1280 \times 720$ pixels and a frame rate of 30 fps, is fixed to the top of the robot platform at the geometrical center. The vertical center axis of the glass tube passes through the focal point of the camera and the mirror center. The distance between the origin $O$ and the bottom point of the mirror $C_1$, measured along the vertical axis, is $H = 0.5$ m. The OVS can recognize the colored landmarks in any direction; therefore, no mechanism for rotating the camera about the pan, tilt and yaw axes is needed.
Consider the Cartesian coordinate system OXY fixed to the surface of NMR 1, where the origin O coincides with the geometrical center and OX is aligned with the vertical axis of symmetry. With the OVS and image processing, the center coordinates of any landmark placed around the robot are readily measured in the image space OXY in units of pixels. However, when the robot moves, identifying these coordinates in the world space OXY in units of length requires an identification procedure [25]. Here, we use an NN with radial basis functions (RBFs). The RBF network is trained offline using samples whose inputs are the landmark center coordinates measured in the image space (pixels) and whose desired outputs are the corresponding coordinates in the world space (meters). Using the approximation ability of the RBF network, the centers of any landmarks surrounding the robot can be recognized and transformed into world coordinates.
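The offline pixel-to-world identification can be illustrated by a small sketch using only the Python standard library. Everything specific here is an assumption made for illustration: the Gaussian basis, the 5 x 5 grid of centers, the width, the regularized least-squares training, and the synthetic "true" mapping `true_map`, which merely stands in for measured calibration samples.

```python
import math


def rbf_features(p, centers, width):
    """Gaussian RBF activations for a 2-D point p, followed by a bias term."""
    feats = [math.exp(-((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2) / (2.0 * width ** 2))
             for c in centers]
    feats.append(1.0)
    return feats


def solve(a, b):
    """Solve a x = b (b has several columns) by Gaussian elimination with pivoting."""
    n, cols = len(a), len(b[0])
    m = [ra[:] + rb[:] for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + cols):
                m[r][k] -= factor * m[col][k]
    x = [[0.0] * cols for _ in range(n)]
    for r in range(n - 1, -1, -1):
        for j in range(cols):
            s = m[r][n + j] - sum(m[r][k] * x[k][j] for k in range(r + 1, n))
            x[r][j] = s / m[r][r]
    return x


def train_rbf(pixels, world, centers, width, ridge=1e-8):
    """Offline training: regularized least squares for the output weights."""
    phi = [rbf_features(p, centers, width) for p in pixels]
    n = len(phi[0])
    gram = [[sum(row[i] * row[j] for row in phi) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [[sum(row[i] * w[d] for row, w in zip(phi, world)) for d in range(2)]
           for i in range(n)]
    return solve(gram, rhs)


def pixel_to_world(p, centers, width, weights):
    feats = rbf_features(p, centers, width)
    return tuple(sum(f * w[d] for f, w in zip(feats, weights)) for d in range(2))


def true_map(p):
    """Hypothetical radial distortion used only to generate synthetic samples."""
    s = 1.0 + 0.3 * (p[0] ** 2 + p[1] ** 2)
    return (2.0 * p[0] * s, 2.0 * p[1] * s)


WIDTH = 0.5
centers = [(-1.0 + 0.5 * i, -1.0 + 0.5 * j) for i in range(5) for j in range(5)]
pixels = [(-1.0 + 0.25 * i, -1.0 + 0.25 * j) for i in range(9) for j in range(9)]
world = [true_map(p) for p in pixels]          # normalized image coords -> meters
weights = train_rbf(pixels, world, centers, WIDTH)
rms_error = math.sqrt(sum((a - b) ** 2
                          for p, w in zip(pixels, world)
                          for a, b in zip(pixel_to_world(p, centers, WIDTH, weights), w))
                      / (2 * len(pixels)))
```

Once trained, `pixel_to_world` is a cheap function evaluation, which is why the identification can be done offline and reused at every frame.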
To determine the center position coordinates of NMR 1 in the horizontal plane, we choose the Cartesian coordinate system Oxy based on two differently colored landmarks (as shown in Fig. 4). Without loss of generality, the origin O coincides with the center of the first landmark (blue), and the axis Ox passes through the center of the other landmark (red). Suppose that the center coordinates of the two landmarks in OXY are $(X_1, Y_1)$ and $(X_2, Y_2)$; they can be transformed to Oxy to obtain the robot center position vector $[x_1, y_1, \theta_1]$ [25]:
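$$
\begin{aligned}
\theta_1 &= \arccos\frac{X_2 - X_1}{\sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}},\\
x_1 &= -\frac{Y_1 (Y_2 - Y_1) + X_1 (X_2 - X_1)}{\sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}},\\
y_1 &= \frac{Y_1 (X_2 - X_1) - X_1 (Y_2 - Y_1)}{\sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}}.
\end{aligned}
\tag{60}
$$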
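The landmark-based localization can be sketched in code. The signs used below are one self-consistent reading of the transformation: they correspond to an image frame whose Y axis is mirrored with respect to a right-handed robot frame, as can happen with a mirror-based camera. This convention is an assumption and should be checked against the actual setup. The sketch also uses `atan2` in place of the arccos expression so that the heading keeps its sign; both agree whenever $Y_2 \ge Y_1$. The function name is hypothetical.

```python
import math


def robot_pose_from_landmarks(blue, red):
    """Pose (x1, y1, theta1) of the robot in the landmark frame Oxy.

    blue, red: centers (X, Y) of the two landmarks measured in the
    robot-fixed frame OXY, already converted to meters.
    The origin of Oxy is the blue landmark; Ox passes through the red one.
    """
    dx, dy = red[0] - blue[0], red[1] - blue[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        raise ValueError("the two landmarks must be distinct")
    theta1 = math.atan2(dy, dx)   # equals arccos(dx / dist) when dy >= 0
    x1 = -(blue[1] * dy + blue[0] * dx) / dist
    y1 = (blue[1] * dx - blue[0] * dy) / dist
    return x1, y1, theta1


# Blue landmark 1 m behind the robot, red landmark 1 m ahead of it:
pose = robot_pose_from_landmarks((-1.0, 0.0), (1.0, 0.0))
```

In this example the robot sits on the Ox axis, 1 m from the blue landmark, with zero heading.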
As illustrated in Fig. 4, the coordinates of the neighbors, e.g., NMR 2, can also be determined in Oxy. Two colored markers are placed on the surface of NMR 2. The center of the green marker coincides with the center of the robot, and the center of the yellow marker is located on the longitudinal axis. The coordinates of the robot in Oxy can be identified by the following formulas:
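$$
\begin{aligned}
x_2 &= x_1 + l_{12}\cos(\psi_{12} + \theta_1),\\
y_2 &= y_1 + l_{12}\sin(\psi_{12} + \theta_1),\\
\theta_2 &= \theta_1 + \lambda_{12},
\end{aligned}
\tag{61}
$$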
where 1 2 1 2
12 2 2( ) ( )l X Y , 12 12 12 ,
1 2
2 2
12 1 2 2 1 2 2
2 2 2 2
argcos
( ) ( )
X X
X X Y Y
,
1
12 2 12argco /s X l .
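The composition of the pose of NMR 1 with the relative measurement of NMR 2 can be sketched as follows. The sketch reads the relations as $x_2 = x_1 + l_{12}\cos(\psi_{12} + \theta_1)$, $y_2 = y_1 + l_{12}\sin(\psi_{12} + \theta_1)$ and $\theta_2 = \theta_1 + \lambda_{12}$, and treats the distance $l_{12}$, the bearing $\psi_{12}$ and the relative heading $\lambda_{12}$ as already computed inputs; the function name is hypothetical.

```python
import math


def neighbor_pose(pose1, l12, psi12, lambda12):
    """Pose of NMR 2 in Oxy from the pose of NMR 1 and relative measurements.

    pose1:    (x1, y1, theta1) of NMR 1 in Oxy
    l12:      distance from the center of NMR 1 to the center of NMR 2
    psi12:    bearing of NMR 2 seen from NMR 1, in the frame of NMR 1
    lambda12: heading of NMR 2 relative to the heading of NMR 1
    """
    x1, y1, theta1 = pose1
    x2 = x1 + l12 * math.cos(psi12 + theta1)
    y2 = y1 + l12 * math.sin(psi12 + theta1)
    theta2 = theta1 + lambda12
    return x2, y2, theta2


# NMR 1 at (1, 2) heading along +y; NMR 2 is 1 m straight ahead of it.
pose2 = neighbor_pose((1.0, 2.0, math.pi / 2), 1.0, 0.0, 0.1)
```

Because the bearing is measured in the frame of NMR 1, adding $\theta_1$ rotates it into Oxy before the offset is applied.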
4.2. Simulation and Experimental Results
In this subsection, to evaluate the effectiveness of the proposed method, simulations of the OCTC algorithm with one NN and of the ACD algorithm [14] with three NNs, extended to the multi-NMR system, are first performed and compared. The experiment is then implemented on the basis of the simulation results. The parameter values of the NMRs in the simulation are set exactly equal to those in the experiment so that the NN weights obtained after convergence in the simulation can be used to initialize the NN weights in the experiment, which speeds up the learning process in practice.
The velocity vector that the virtual leader uses to generate the smooth reference postures is chosen as
$$
\mathbf{v}_0 = [v_0, \omega_0]^T = \left[\sqrt{(A_1\cos(\omega_1 t))^2 + (A_2\cos(\omega_2 t))^2},\ \frac{A_1\omega_1\sin(\omega_1 t)\,A_2\cos(\omega_2 t) - A_2\omega_2\sin(\omega_2 t)\,A_1\cos(\omega_1 t)}{(A_1\cos(\omega_1 t))^2 + (A_2\cos(\omega_2 t))^2}\right]^T,
\tag{62}
$$
where $\omega_1 = 0.04$ rad/s, $\omega_2 = 0.02$ rad/s, $A_1 = 0.022$ m/s, $A_2 = 0.02$ m/s. The distance vectors (3) between NMR $i$ and its neighbors $j$, $i, j = 1, 2, 3$, are set to $[0.5, 0, 0]$ for the pair (1, 2) and $[1.0, 0, 0]$ for the pair (3, 2).
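The reference velocities can be evaluated numerically with a short sketch. It assumes that the leader moves with Cartesian velocity components $A_1\cos(\omega_1 t)$ and $A_2\cos(\omega_2 t)$, so that the linear velocity is the speed along the path and the rotational velocity is the rate of change of the path heading; this interpretation is an assumption, and the function name is hypothetical.

```python
import math

A1, A2 = 0.022, 0.02   # m/s
W1, W2 = 0.04, 0.02    # rad/s


def leader_velocity(t):
    """Return (v0, omega0) of the virtual leader at time t in seconds."""
    xd = A1 * math.cos(W1 * t)   # assumed velocity component along x
    yd = A2 * math.cos(W2 * t)   # assumed velocity component along y
    speed_sq = xd * xd + yd * yd
    v0 = math.sqrt(speed_sq)
    omega0 = (A1 * W1 * math.sin(W1 * t) * yd
              - A2 * W2 * math.sin(W2 * t) * xd) / speed_sq
    return v0, omega0


v_start, w_start = leader_velocity(0.0)
```

With $\omega_1 = 2\omega_2$, the two cosines never vanish at the same time, so the denominator stays positive for all $t$.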
For both algorithms, the NN weights, with 15 elements for each NMR, are defined as $\hat{W}_i = [\hat{W}_i^1, \hat{W}_i^2, \ldots, \hat{W}_i^{15}]$, whose initial values are zero in OCTC but must be properly chosen for the three NNs in ACD. Note that the total number of NN weights of OCTC is 45, but that of
ACD is 135. The adaptive gains are selected as 25 and 0.01. The activation functions, evaluated at the error vector $e_i$, are chosen as
$$
[e_{xi}^2,\ e_{xi}e_{yi},\ e_{xi}e_{\theta i},\ e_{xi}e_{vi},\ e_{xi}e_{\omega i},\ e_{yi}^2,\ e_{yi}e_{\theta i},\ e_{yi}e_{vi},\ e_{yi}e_{\omega i},\ e_{\theta i}^2,\ e_{\theta i}e_{vi},\ e_{\theta i}e_{\omega i},\ e_{vi}^2,\ e_{vi}e_{\omega i},\ e_{\omega i}^2]^T.
$$
Select $Q_{ii}(e_i) = e_i^T Q_{1i} e_i$ with $Q_{1i} = I_{5\times 5}$, $R_{ii} = R_{ij} = T_{ii} = T_{ij} = 1$, and set the remaining design constant to 1. The PE condition is guaranteed by adding the probing noise $0.008\,\mathrm{rand}(t)\,e^{-t}$ to the control and disturbance inputs, where $\mathrm{rand}(t)$ is the function that generates random signals in the range $[-1, 1]$. The initial position of the leader is $q_0 = [-0.4, -0.6, 0.6]$. The initial positions and velocities of the NMRs are $q_1 = [0, 0, 0]$, $\mathbf{v}_1 = [0, 0]$, $q_2 = [0, -0.5, 0]$, $\mathbf{v}_2 = [0, 0]$, $q_3 = [-0.5, -0.5, 0]$, $\mathbf{v}_3 = [0, 0]$.
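The 15 activation functions are all the second-order monomials of the five error components. The sketch below generates them in the same order and evaluates a value estimate, assuming the usual linear-in-the-weights form (weights times activations), which is an assumption of this illustration; the function names are hypothetical.

```python
from itertools import combinations_with_replacement


def quadratic_basis(e):
    """All products e[i] * e[j] with i <= j, in lexicographic index order.

    For e = (e_x, e_y, e_theta, e_v, e_omega) this yields 15 terms:
    e_x^2, e_x e_y, ..., e_v e_omega, e_omega^2.
    """
    return [e[i] * e[j]
            for i, j in combinations_with_replacement(range(len(e)), 2)]


def approximate_value(weights, e):
    """Value estimate as the inner product of the weights and the basis."""
    basis = quadratic_basis(e)
    if len(weights) != len(basis):
        raise ValueError("expected %d weights" % len(basis))
    return sum(w * b for w, b in zip(weights, basis))


errors = [1.0, 2.0, 3.0, 4.0, 5.0]
basis = quadratic_basis(errors)

# Weights that select only the squared terms reproduce e^T I e.
diag_weights = [0.0] * 15
for index in (0, 5, 9, 12, 14):
    diag_weights[index] = 1.0
value = approximate_value(diag_weights, errors)
```

With three NMRs and one NN each, this basis gives the 45 weights counted for OCTC.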
The evolutions of the cooperative position tracking errors of the NMRs under both OCTC and ACD are shown in Fig. 5. After the parameters converge, all errors under both algorithms are approximately zero. In the early stages, however, the errors under OCTC decrease faster than those under ACD. The cooperative trajectories $x$, $y$ and $\theta$ under both algorithms are shown in Figs. 6, 7 and 8, respectively. NMR 1 tracks the leader while keeping the formation with its neighbors such that the tracking errors are as small as possible, i.e., $[x_1, y_1, \theta_1] \to [x_0, y_0, \theta_0]$ and $[x_3, y_3, \theta_3] - [x_1, y_1, \theta_1] \to [-0.5, 0, 0]$. NMR 3 keeps the formation with NMRs 1 and 2, i.e., $[x_2, y_2, \theta_2] - [x_3, y_3, \theta_3] \to [1, 0, 0]$. The results for NMRs 2 and 1 can be deduced similarly. These figures again show that the control performance under OCTC is better than that under ACD.
The cooperative performance of the linear and rotational velocities of the NMRs under both algorithms is shown in Figs. 9 and 10, respectively. After the parameters converge, the velocities approach approximately optimal values, i.e., $[v_{2,3}, \omega_{2,3}] \to [v_1, \omega_1] \to [v_0, \omega_0]$, in finite time. Again, the cooperative velocity performance of the NMRs under OCTC remains better than that under ACD, especially at the peaks of the linear velocity curves.
Now, the proposed control scheme is applied to the testbed. It is important to notice that the
converged NN weights in Algorithm OCTC after the simulation are used to initialize the NNs in
the experiment. The other learning parameters are chosen as those in the simulation.
Figure 5. Evolution of the formation errors of positions for NMRs 1, 2, 3 by Algorithms OCTC with one NN and ACD with three NNs.
Figure 6. Evolution of the cooperative positions for $x_1, x_2, x_3$ by Algorithms OCTC with one NN and ACD with three NNs.
Figure 7. Evolution of the cooperative positions for $y_1, y_2, y_3$ by Algorithms OCTC with one NN and ACD with three NNs.
Figure 8. Evolution of the cooperative positions for $\theta_1, \theta_2, \theta_3$ by Algorithms OCTC with one NN and ACD with three NNs.
Figure 9. Evolution of the cooperative linear velocities for $v_1, v_2, v_3$.
Figure 10. Evolution of the cooperative rotational velocities for $\omega_1, \omega_2, \omega_3$.
Figure 11. Experimental formation for $x_1, x_2, x_3$.
Figure 12. Experimental formation for $y_1, y_2, y_3$.
Figure 13. Experimental formation for $\theta_1, \theta_2, \theta_3$.
Figures 11, 12 and 13 show the positions $x_i$, $y_i$, $\theta_i$ for all $i = 1, 2, 3$. For the final stages of the experiment, Fig. 14 shows that NMR 1 tracks the desired virtual trajectory while maintaining the formation with its neighbors. Figs. 15(a) and 15(b) show the linear velocities $v_i$ and the rotational velocities $\omega_i$, respectively. The experimental results are consistent with the simulation.
Figure 14. Experimental formation.
Figure 15. Experimental velocities: (a) linear velocities, (b) rotational velocities.
5. CONCLUSION
This paper provides a new distributed optimal cooperative tracking control method with disturbance rejection for multiple mobile robots. The method avoids splitting the main controller into separate parts and replaces the ACD structure with three NNs by a structure with only one NN, for which novel NN weight-tuning laws are designed. Online optimal cooperative control algorithms, which do not require knowledge of the internal dynamics, are proposed to approximate the Nash equilibrium solutions of the HJI equations. The algorithms guarantee that the value functions and the control and disturbance laws converge simultaneously to the optimal values and that the cooperative tracking errors of the closed-loop systems and the approximation errors of the NN weights are uniformly ultimately bounded. Comparative simulations were carried out, and the experimental results on the testbed equipped with an omnidirectional vision system were consistent with the simulation. The experimental results suggest that the method is effective for practical control applications involving multiple nonholonomic mobile agents or autonomous vehicles that must track both positions and velocities.
REFERENCES
1. Sun D., Wang C., Shang W., and Feng G. - A synchronization approach to trajectory
tracking of multiple mobile robots while maintaining time varying formations, IEEE
Trans. Robot 25 (5) (2009) 1074–1086.
2. Loria A., Dasdemir J., and Jarquin N. A. - Leader–follower formation and tracking
control of mobile robots along straight paths, IEEE Trans. Contr. Syst. Technol. 24 (2)
(2016) 727–732.
3. Dong W. - Tracking control of multiple-wheeled mobile robots with limited information
of a desired trajectory, IEEE Trans. Robot 28 (1) (2012) 262–268.
4. Gu D. and Wang Z. - Leader–follower flocking: Algorithms and experiments, IEEE
Trans. Contr. Syst. Tech. 17 (5) (2009) 1211–1219.
5. Wang Z. and Gu D. - Cooperative target tracking control of multiple robots, IEEE Trans.
Ind. Electron. 59 (8) (2012) 3232–3240.
6. Yu X. and Liu L. - Distributed formation control of nonholonomic vehicles subject to
velocity constraints, IEEE Trans. Ind. Electron. 63 (2) (2016) 1289–1298.
7. Khoo S., Xie L., and Man Z. - Robust finite-time consensus tracking algorithm for
multirobot systems, IEEE/ASME Trans. Mechatr. 14 (2) (2009) 219–228.
8. Wang W., Huang J., Wen C., and Fan H. - Distributed adaptive control for consensus
tracking with application to formation control of nonholonomic mobile robots,
Automatica 50 (4) (2014) 1254–1263.
9. Peng Z., Yang S., Wen G., Rahmani A., and Yu Y. - Adaptive distributed formation
control for multiple nonholonomic wheeled mobile robots, Neurocomputing 173 (3)
(2016) 1485–1494
10. Dierks T. and Jagannathan S. - Neural network output feedback control of robot
formations, IEEE Trans. Syst., Man, and Cybern., B Cybern. 40 (2) (2010) 383–399.
11. Movric K. H. and Lewis F. L. - Cooperative optimal control for multiagent systems on
directed graph topologies, IEEE Trans. Autom. Contr. 59 (3) (2014) 769–774.
12. Vamvoudakis K. G., Lewis F. L., and Hudas G. R. - Multi-agent differential graphical
games: Online adaptive learning solution for synchronization with optimality, Automatica
48 (2012) 1598–1611.
13. Zhang H., Lewis F. L., and Das A. - Optimal design for synchronization of cooperative
systems: State feedback, observer and output feedback, IEEE Trans. Autom. Contr. 56 (8)
(2011) 1948–1952.
14. Jiao Q., Modares H., Xu S., Lewis F., and Vamvoudakis K. G. - Multiagent zero-sum
differential graphical games for disturbance rejection in distributed control, Automatica
69 (2016) 24–34.
15. Tatari F., Naghibi-Sistani M.-B., and Vamvoudakis K. G. - Distributed learning algorithm
for nonlinear differential graphical games, Transactions of the Institute of Measurement
and Control, first published (2015), doi: 10.1177/0142331215603791.
16. Cao W., Zhang J., and Ren W. - Leader–follower consensus of linear multi-agent
systems with unknown external disturbances, Systems & Control Letters 82 (2015) 64–70.
17. Wang J. and Xin M. - Distributed optimal cooperative tracking control of multiple
autonomous robots, Robotics and Autonomous Systems 60 (4) (2012) 572 – 583.
18. Dierks T., Brenner B., and Jagannathan S. - Neural network-based optimal control of
mobile robot formations with reduced information exchange, IEEE Trans. Contr. Syst.
Technol. 21 (4) (2013) 1407–1415.
19. Fierro R. and Lewis F. L. - Control of a nonholonomic mobile robot using neural
networks, IEEE Trans. Neur. Netw. 9 (4) (1998) 589–600.
20. Vamvoudakis K. G. and Lewis F. L. - Multi-player non-zero-sum games: online adaptive
learning solution of coupled Hamilton-Jacobi equations, Automatica 47 (8) (2011)
1556–1569.
21. Wu H.-N. and Luo B. - Neural network based online simultaneous policy update
algorithm for solving the HJI equation in nonlinear H∞ control, IEEE Trans. Neur. Netw.
and Learn. Syst. 23 (12) (2012) 1884–1895.
22. Zargarzadeh H., Dierks T., and Jagannathan S. - Optimal control of nonlinear continuous-
time systems in strict-feedback form, IEEE Trans. Neur. Netw. Learn. Syst. 26 (10)
(2015) 2535–2549.
23. Luy N. T. - Adaptive dynamic programming-based design of integrated neural network
structure for cooperative control of multiple MIMO nonlinear systems, Neurocomputing
(2016),
24. Khoshnam S., Alireza M. S., and Ahmadrez T. - Adaptive feedback linearizing control of
nonholonomic wheeled mobile robots in presence of parametric and nonparametric
uncertainties, Robotics and Computer Integrated Manufacturing 27 (1) (2011) 194–204.
25. Luy N. T. - Robust adaptive dynamic programming based online tracking control
algorithm for real wheeled mobile robot with omnidirectional vision system, Transactions
of the Institute of Measurement and Control (2016), doi: 10.1177/0142331215620267.
26. Chang Y. and Chen B. - A nonlinear adaptive H∞ tracking control design in robotic
systems via neural networks, IEEE Trans. Contr. Syst. Tech. 5 (1) (1997) 13–29.
27. Wang L., Wang X., and Hu X. - Connectivity maintenance and distributed tracking for
double-integrator agents with bounded potential functions, Int. J. Robust Nonlinear
Control 25 (4) (2015) 542–558.
28. Aliyu M. D. S. - Nonlinear H∞ control, Hamiltonian systems and Hamilton–Jacobi
equations, CRC Press, 2011.
29. Basar T. and Bernhard P. - H∞ Optimal Control and Related Minimax Design Problems:
A Dynamic Game Approach, 2nd ed. Boston, MA: Birkhäuser, 1995.
30. Abu-Khalaf M. and Lewis F. L. - Nearly optimal control laws for nonlinear systems with
saturating actuators using a neural network HJB approach, Automatica 41 (5) (2005) 779–791.
31. Hornik K., Stinchcombe M., and White H. - Universal approximation of an unknown
mapping and its derivatives using multilayer feedforward networks, Neural Networks 3
(5) (1990) 551–560.
32. Ioannou P. and Fidan B. - Adaptive control tutorial, Advances in design and control, PA:
SIAM, 2006.
33. Vamvoudakis K. G. and Lewis F. L. - Online actor-critic algorithm to solve the
continuous-time infinite horizon optimal control problem, Automatica 46 (5) (2010)
878–888.
34. Lewis F. L., Jagannathan S., and Yesildirek A. - Neural Network Control of Robot
Manipulators and Nonlinear Systems, Taylor and Francis, Philadelphia, PA, 1999.