An Adaptability of Head Motion as Computer Input Device



Takehiko Tomikawa, Toshiaki Yamanouchi, and Hiromitsu Nishimura
Dept. of Information Media, Kanagawa Institute of Technology, Atsugi, Japan
Email: {tomikawa, yama, nisimura}@ic.kanagawa-it.ac.jp

Abstract—This paper describes a human-interface scheme that uses the movements of body parts as an input device. The aim is to assist computer input for persons with hand disabilities and to build a system that is inexpensive and easy to implement. To this end, the authors propose combining the parameters "Euler angles" and "Translations" of body movement to drive an alternative cursor that performs mouse scanning. In other words, this is a trial to replace the pointing function of the mouse by using Translations from neck or waist movements in addition to the Pitch/Yaw/Roll of the face orientation. Similar ideas have been proposed in the past; however, reports on this combination of parameters, or on its practical feasibility, can hardly be found. Our experiments indicate that the method can function as mouse scanning to some extent, despite a simple system configuration built on current hardware and software techniques. Some problems remain for further consideration, such as operability experiments with handicapped subjects and a system configuration with a wireless link.

Index Terms—human interface, mouse substitution, Euler angles, Kinect sensor

I. INTRODUCTION

In the field of mobile phones and personal computers, so-called "touch sensor" input on the display screen is widely used. This method of scanning on the display device is an important input means for persons with healthy fingers. Input based on speech requires the registration of a huge dictionary, and it is effective only in situations where the user may speak aloud.
The touch sensor serves as a contact-type human interface, and speech input as a non-contact type. On the other hand, persons with hand disabilities have limited means of sending information. The concept of applying facial movements to an alternative mouse has existed for some time [1]. However, its recognition is not robust because of matching errors in the stereo vision used for three-dimensional processing [2]. Recently, an inexpensive alternative-mouse product has appeared that detects the line of sight without any attachment to the head [3]. Problems still remain with it, such as eye movements during visual fixation and a narrow recognizable view range. Our scheme shares facial behavior, non-contact operation, and non-wearing with the conventional methods; however, it is conceptually simpler because it extracts only the Euler angles and Translations of head movements.

Against this background, the authors have focused on how to support handicapped persons in performing computer input, and how to apply head movements to an alternative mouse without using the hands. The purpose and the proposal of the authors are as follows.

Purpose: To assist computer input for persons with hand disabilities, and to build a system that is easy to implement at low cost.
Proposal: To replace mouse functions by "Euler angles" and "Translations" as a hybrid scanning.

Fig. 1 shows the facial movements used in this proposal: Euler angles in (a) and Translations in (b).

Figure 1. Euler angles and translations: (a) Euler angles, (b) Translations

Manuscript received December 21, 2014; revised May 14, 2015.
Journal of Automation and Control Engineering Vol. 4, No. 2, April 2016. ©2016 Journal of Automation and Control Engineering. doi: 10.12720/joace.4.2.166-170

II. PREPARATIONS

Let an arbitrary point of the head be (x, y). Its movements can be expressed by the following matrix forms (1):

\[
\underbrace{\begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}}_{\text{rotation}},\qquad
\underbrace{\delta\begin{pmatrix}x\\ y\end{pmatrix}}_{\text{scale}},\qquad
\underbrace{\begin{pmatrix}x\\ y\end{pmatrix}+\begin{pmatrix}t_x\\ t_y\end{pmatrix}}_{\text{translation}}
\tag{1}
\]

where θ is the rotation angle, δ is the scale factor, and (t_x, t_y) is the Translation. The conversion formula in this paper is a similarity transformation based on the Euler angles (Pitch, Yaw, Roll) and the depth from the camera.

The recognition process then proceeds from head joint → face region → face parts (eyes, mouth), and leads to the "normal vector" of the face plane. The normal vector N of the rectangle ABCD can be expressed as in (2) by the line segments AB and AD, where "×" denotes the vector cross product.

\[
\mathbf{N} = \overrightarrow{AB} \times \overrightarrow{AD}
\tag{2}
\]

This paper concerns the use of Rotations and Translations of three-dimensional head movements as a mouse substitute. The following assumptions are set as preparation for our scheme.

a) The head can be rotated while the line of sight stays fixed. (Euler angles)
b) The eyes can be rotated without head movement. (contributes to expanding the field of view)
c) The visible region can be checked without eye or head movement. (visual field test)
d) The face can be moved vertically and horizontally without rotation. (Translations)

For a), the Euler angles in this paper are signed (positive or negative) three-dimensional rotation angles of the head or face, and they do not depend on the depth from the camera. The cursor movement is applied as a displacement proportional to the Euler angles, and the proportional constants are determined in consideration of the {field of view, scanning range} on the screen. Here, the scanning speed means the cursor displacement that corresponds to the angular velocity of the rotating face.
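As an illustration of (1) and (2), the following minimal Python sketch composes the three components of (1) into one similarity transform and derives Yaw and Pitch from the face normal of (2). The composition order (rotate, scale, then translate), the axis convention (x to the right, y up, z toward the camera), the sign conventions, and all function names are assumptions made for this example; they are not taken from the authors' implementation.

```python
import math


def similarity_transform(point, theta_deg, delta, t):
    """Rotate (x, y) by theta, scale by delta, then translate by t (assumed order)."""
    x, y = point
    c = math.cos(math.radians(theta_deg))
    s = math.sin(math.radians(theta_deg))
    xr, yr = c * x - s * y, s * x + c * y          # rotation component of (1)
    return (delta * xr + t[0], delta * yr + t[1])  # scale, then translation


def face_normal(a, b, d):
    """Normal N = AB x AD of the face rectangle ABCD, as in (2)."""
    ab = tuple(q - p for p, q in zip(a, b))
    ad = tuple(q - p for p, q in zip(a, d))
    return (ab[1] * ad[2] - ab[2] * ad[1],
            ab[2] * ad[0] - ab[0] * ad[2],
            ab[0] * ad[1] - ab[1] * ad[0])


def yaw_pitch_from_normal(n):
    """Yaw and Pitch in degrees; a frontal face is assumed to have N along +z."""
    nx, ny, nz = n
    yaw = math.degrees(math.atan2(nx, nz))
    pitch = math.degrees(math.atan2(ny, math.hypot(nx, nz)))
    return yaw, pitch
```

With these conventions a frontal face gives Yaw = Pitch = 0, and turning the rectangle about the vertical axis changes only Yaw. A real tracker reports its own angle conventions, so the signs here would need to be matched to the sensor in use.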
For b), an alternative cursor that uses the line of sight as a non-contact input was noted in the Introduction. It has the advantage of using eye movements without head rotation; however, problems remain in operational stability and operability.

For c), the check corresponds to a "visual field examination", which determines the range in which the presence or absence of a blinking light can be recognized. The field of view is the visible range of the display and does not mean the scanning range of the cursor. In the conversion from angle to displacement, the visible extent on the screen becomes roughly 2π × depth × (Euler angle) / 2π, that is, depth × (Euler angle) with the angle in radians, so it depends on both the Euler angle and the depth.

For d), the face Translations are signed (positive or negative) values that do not depend on the camera depth. For the alternative cursor, displacement due to neck and waist movement is also considered in this paper.

Using image data obtained through the camera, the system proceeds through {human recognition → body recognition → head recognition → face recognition} and results in the alternative mouse function. The steps within { } are recognition procedures for the human body, and they must run as algorithms that operate stably and in real time. Further, by specifying the parts of the face {eyes, nose, mouth}, the normal direction of the face can be found geometrically. As a whole, the authors concluded that the cursor displacement, in consideration of operability as a mouse substitute, should follow the Euler angles and Translations. Thus, we decided to apply the proposed scheme to an alternative mouse system by using appropriate existing recognition algorithms for the head and face parts. It was then experimentally verified that the proposal can be realized with a simple system configuration based on existing hardware and software tools. The prerequisites for the human interface in these experiments are as follows.

• In lying mode (for persons with hand disabilities)
• Without any attachment to the body (non-contact operation)
• Using facial, neck, or waist action (upper body as the recognition target)

Fig. 2 illustrates the positional relationship among the face, camera, and screen for two cases: sitting on a chair, with the upper body as the subject, and lying in bed. The display resolution for the screen projection is assumed to be 480×640 [pixels], and the distance between the camera and the subject 70 [cm]. These are the necessary conditions for the alternative cursor to move within the operating range of the display contents in the field of view.

Figure 2. Experimental layout: (a) sitting mode, (b) lying mode

III. ALTERNATIVE CURSOR

To assist persons with hand disabilities, the use of the {face, head, or foot} is conceivable. Here, we tried to use only the head as an alternative mouse, considering users lying in bed or sitting in a wheelchair. Translation movement was also used in addition to the Euler angles. As preparation, the initial system parameters must be set by trying out the motional operability. Then, evaluations by a number of subjects must be obtained through experimental trials. At this stage, we tried to verify whether the pointing operation by face rotation is reasonable while scanning the display screen with the moving cursor.

Typical Euler angles Pitch / Yaw / Roll are shown in Table I and Fig. 3 (a)-(c) as examples actually measured by the authors. The angles in the table and figures are signed values in [degree] from -90 to 90, with the normal frontal view as the reference. The vertical and horizontal axes of the figures indicate angle and elapsed time, respectively.
Throughout the experiments, the range of the rotation angle narrows in the order Yaw → Pitch → Roll. This is considered to be due to the joint structure of the human neck. The tendency is helpful as a first summary, although the dynamic range of each rotation angle cannot be generalized because of individual differences. Pitch and Yaw in Fig. 3 (a) and (b) show similar transition levels, while Yaw shows a larger swing in its operating range. Fig. 3 (c) shows that Yaw and Pitch are affected by the Roll operation.

TABLE I. MEAN VALUES OF PITCH/YAW/ROLL [degree]
----------------------------------------------------
Pitch   -35 to +37   (down to up)
Yaw     -44 to +42   (right to left)
Roll    -32 to +30   (right to left)

Figure 3. Euler angles: (a) Pitch, (b) Yaw, (c) Roll

In view of head rotations, it seems natural to use Pitch and Yaw for vertical and horizontal scanning, respectively. Assuming that the 6×6 matrix measures 18×18 [cm] and the distance between the camera and the subject is 70 [cm], the field of view in normal vision is (Pitch, Yaw) ≅ (15°, 15°). This is well within the measured ranges of Pitch and Yaw in Table I. Thus, the angular velocity must be adjusted through the proportional constant so that the cursor scan covers the range of movement in the field of view with recognizable contents. In this experiment, the amount of cursor movement [pixel] on the screen is made proportional to the Euler angles, and the proportional constants are chosen with operability as a human interface in mind. Regarding the field of view, let the Pitch be α and the Yaw be β, and let the distance between the subject and the camera be unchanged; then the relationships (3) and (4) are obtained.
\[ \text{vertical displacement} \propto \alpha \tag{3} \]
\[ \text{horizontal displacement} \propto \beta \tag{4} \]

To perform the click action of the mouse, several methods exist, such as {foot action, voice action, holding still for a certain period, blinking, opening/closing the eyes, raising/lowering the eyebrows, and so on}. This time, both the Roll operation and holding the cursor for a time were used experimentally. By converting the analog value of the Roll angle γ into a digital on/off function, tilting the head left/right corresponds to left/right clicking in mouse operation. That is, the left click and the right click are triggered by γ < −Th and γ > +Th, respectively, where the threshold Th = 8 this time. In addition, a click is issued only once while a Roll transition is occurring.

IV. SCANNING EXPERIMENTS

Here, we carried out scanning experiments that focus on the operation of cursor movement, without regard to the input of characters or symbols as a keyboard function. Various face recognition algorithms based on learning processes have been proposed in the past (for example [4]). In fact, face recognition is now built into recent digital cameras. Therefore, it is appropriate for our purpose to incorporate the best available system into our recognition system. Currently, the so-called "Kinect sensor" has been introduced together with software tools that identify the body joints from depth information [5]-[7]. Our system uses the "Face Tracking" product, a software tool that identifies the joints of the body and the Euler angles [8]. Further, face recognition programs are available as useful image-processing libraries [9]. The authors decided to use both library tools to extract the Euler angles and Translations of face movements.
This makes it possible to recognize the parts of the face from their geometrical locations, following joint extraction from the depth image. Quantitative data can then be obtained for the rotation angles related to the so-called normal direction of the face. The experimental hardware system consists of {Kinect for Windows, notebook computer}, and the software system consists of {human head detection / face tracking / face parts recognition or orientation, and mouse substitution}. The flows of the hardware system and the software system are shown in Fig. 4 (a) and (b).

First, we carried out an experiment to determine the operating range on the display screen in accordance with the rotation angles of the head, treated as parameters in a polar coordinate system. All attempts were made in the sitting mode on a chair, with the upper body as the camera subject. Specifically, the scanning stability under head rotation was examined on both the 3×3 and the 6×6 display matrices shown in Fig. 5 (a) and (b). Each matrix element is 103×77 [pixels] for the 6×6 matrix in the frame of 640×480 [pixels] (excluding the outer frame); this is not the size of the projected matrix.

Figure 5. Scanning matrix: (a) 3×3 matrix, (b) 6×6 matrix

To find the differences between rotation angles and Translations, we tried to chase an indicated matrix cell chosen at random. With the 3×3 matrix, scanning over the matrix was rather easy and smooth with both Euler angles and Translations. With the 6×6 matrix, the behavior was similar to that of the 3×3 matrix except for some instability in the scanning action during head rotation. In both cases, movement was limited in the up/down direction when using Translation (ty), while it was faster and smoother in the left/right direction (tx). Cursor tracking by head scanning was also carried out toward positions indicated by random numbers in the 6×6 matrix.
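The chase task described above can be illustrated with a small Python sketch that maps a cursor position to a cell of the 6×6 matrix and picks a random target cell. The cell size of 103×77 [pixels] and the 640×480 frame come from the text; the symmetric outer margin, the (row, column) indexing from the top-left corner, and all names are assumptions for illustration only.

```python
import random

FRAME_W, FRAME_H = 640, 480          # frame size from the text [pixels]
ROWS = COLS = 6                      # 6x6 scanning matrix
CELL_W, CELL_H = 103, 77             # cell size from the text [pixels]
# Assumption: the outer frame is split evenly on both sides.
MARGIN_X = (FRAME_W - COLS * CELL_W) // 2
MARGIN_Y = (FRAME_H - ROWS * CELL_H) // 2


def cell_at(x, y):
    """Return (row, col) of the cell under the cursor, or None in the outer frame."""
    dx, dy = x - MARGIN_X, y - MARGIN_Y
    if not (0 <= dx < COLS * CELL_W and 0 <= dy < ROWS * CELL_H):
        return None
    return int(dy // CELL_H), int(dx // CELL_W)


def random_target(rng):
    """Pick the next cell to chase, uniformly at random."""
    return rng.randrange(ROWS), rng.randrange(COLS)


def is_miss(target, x, y):
    """A trial counts as a miss when the cursor is not inside the target cell."""
    return cell_at(x, y) != target
```

A trial loop would call `random_target(random.Random())`, wait for the trial interval, and then record `is_miss` for the final cursor position; the miss rate per cell is the number of misses divided by the number of trials for that cell.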
The position error rates while chasing the indicated matrix points were 28 [%] with Euler angles and 26 to 91 [%] with Translations. These are mean miss-chasing rates for a time interval of 2 [sec] over 180 iterations by the authors. Depending on the matrix position, Translations show higher errors than Euler angles: the up/down shift is difficult, with 91 [%] errors, whereas the left/right shift gives 26 [%] errors. It is difficult to cover all matrix cells by Translations alone; however, they can assist the Euler angles to some extent. Thus, the authors rearranged (3) and (4) into the hybrid forms (3a) and (4a), where the variables indicate unitless proportional contributions.

\[ \text{vertical displacement} \propto (\alpha + t_y) \tag{3a} \]
\[ \text{horizontal displacement} \propto (\beta + t_x) \tag{4a} \]

In each case, the two terms on the right side reinforce each other (acceleration) when they have the same sign and offset each other (deceleration) when they have opposite signs, as in the horizontal displacement.

We carried out experiments with this arrangement, using random numbers in the same manner as in the previous trial. The tracking error rates are shown in Fig. 6 as the height at each point of the 6×6 matrix in the vertical and horizontal directions; they are the averages of 1200 (300×4) trials at a depth of 70 [cm]. The error rates are distributed over roughly 0 to 20 [%] in this case. The distribution shows higher error rates inside the matrix than at the sides or corners, which is common to all subjects. This is because, at the sides, the chasing direction is restricted: the target is an end point that the face turns toward, not a point that the cursor passes through. The lower error rates in the first row and the higher error rates in the second row were produced by the same person and seem to reflect individual differences.

Figure 6. Scanning error rate in 6×6 matrix chasing
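The hybrid mapping of (3a) and (4a) can be sketched in Python as follows, together with a Roll-threshold click detector of the kind described in Section III. The gain constant, the function names, and the assignment of the negative Roll side to the left button are assumptions for illustration; only the threshold Th = 8 and the one-click-per-tilt behavior are taken from the text.

```python
def hybrid_displacement(pitch, yaw, tx, ty, gain=1.0):
    """Return (horizontal, vertical) cursor displacement following (4a) and (3a).

    Inputs are treated as unitless contributions, as in the text.
    `gain` is a hypothetical proportional constant.
    """
    horizontal = gain * (yaw + tx)   # (4a): Yaw plus horizontal Translation
    vertical = gain * (pitch + ty)   # (3a): Pitch plus vertical Translation
    return horizontal, vertical


def click_events(roll_samples, th=8.0):
    """Turn a sequence of Roll angles into click events, one per tilt.

    A click fires only when the Roll angle newly crosses the threshold,
    so holding the head tilted does not repeat the click.
    Assumption: roll < -th is the left button, roll > +th the right button.
    """
    events = []
    previous = None
    for roll in roll_samples:
        if roll < -th:
            state = "left"
        elif roll > th:
            state = "right"
        else:
            state = None
        if state is not None and state != previous:
            events.append(state)
        previous = state
    return events
```

For example, a Yaw contribution of 10 combined with a Translation of 5 in the same direction gives a horizontal displacement of 15 at unit gain, while a Translation of −5 reduces it to 5, which is the acceleration and deceleration behavior noted for the hybrid form.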
In our experiments, the error rates were lower at a depth of 70 [cm] than at 60 [cm] or 80 [cm], as shown in Appendix Fig. (a) and (b). This is related to the compatibility between the matrix size and operability, or to the robustness of tracking facial movements. On the other hand, the higher rates at some matrix points in this figure could probably be reduced by appropriate training.

Figure 4. System flow (a), (b)

V. CONCLUSION

We have examined the applicability of facial operation to mouse scanning, assuming a user with hand disabilities. That is, we tried an alternative cursor based on the Euler angles Pitch / Yaw / Roll and on Translations. We then proposed a hybrid parameter scheme: Pitch and Translation for vertical scanning, Yaw and Translation for horizontal scanning, and Roll and a time interval for the click action. As a result, an inexpensive system becomes possible without a complex recognition system, by using currently available tools. We learned experimentally that pointing by face action can be realized at the resolution of a 6×6 matrix, although it is far behind keyboard scanning by fingers. Overall, quantitative evaluation was rather difficult because the operation depends on the human operator, as is inherent in a human interface. However, we obtained a measure showing that facial movements can function as alternative mouse scanning to some extent, although operability reviews by persons with disabilities are still insufficient. Issues such as {the wired connection of the Kinect sensor, the use of a web camera, etc.} are left as future work.

APPENDIX SCANNING ERROR RATES

(a) Average of 4 × 300 trials, depth = 60 [cm]
(b) Average of 4 × 300 trials, depth = 80 [cm]

REFERENCES

[1] Microsoft Corp., Patent: 004-362569, 2004.
[2] Y. Matsumoto. (2005). [Online]. Available:
[3] Eye Tribe. (2014). [Online]. Available:
[4] S.-T. Bow, Pattern Recognition, Marcel Dekker Inc.
[5] S. Tsukasa and N. Kaoru, Kinect for Windows SDK Programming Practice, Kogaku-sha.
[6] J. Webb and J. Ashley, Beginning Kinect Programming with the Microsoft Kinect SDK, Apress.
[7] D. Catuhe, Programming with the Kinect for Windows Software Development Kit, Microsoft Press.
[8] G. Borenstein, Making Things See, O'Reilly.
[9] D. L. Baggio et al., Mastering OpenCV with Practical Computer Vision Projects, Packt Publishing.

Takehiko Tomikawa was born in Japan in 1945. He is a Professor in the Department of Media Information, Kanagawa Institute of Technology. He holds MS and PhD degrees from Shizuoka University. His current research interests are welfare-assisting tools based on motion capture. He is a lifetime member of IEEE.

Toshiaki Yamanouchi was born in Japan in 1968. He is an assistant professor at Kanagawa Institute of Technology, Japan. He received BE and ME degrees from Waseda University in 1990 and 1992, respectively. He worked at the University as a research associate for three years and moved to Kanagawa Institute of Technology in 1997. His main research field is digital image processing. He is a member of IPSJ.

Hiromitsu Nishimura was born in Japan in 1972. He received his Dr. Eng. from Shinshu University, Japan, in 2000. He is a Lecturer in the Department of Information Media, Kanagawa Institute of Technology. His current interests are applications of image processing, including vision under non-visible light. He is a member of the IAPR.
