Computational vision, the discipline of teaching machines how to see, spans broad fields such as image restoration, image enhancement, automated visual inspection, robot vision, computer-based image understanding of general three-dimensional scenes, and visual perception and cognition. Computational vision also refers to the study of how to reconstruct, interpret, and understand a three-dimensional scene from its two-dimensional images, in terms of the properties of the structures present in the scene. It merges knowledge from computer science, electrical engineering, mathematics, physiology, biology, and cognitive science. The goal of computational vision is to make possible systems that can consistently interpret or describe the visual environment under almost any operating conditions, that is, to reproduce the performance of human visual perception. Future computational vision models will be developed in physics and in computer graphics. Both of these fields model how objects move and animate, and how light reflects off their surfaces, is scattered by the atmosphere, refracted through camera lenses, and projected onto a flat image plane. [1]

Computational vision in neuroscience

The current goal of computational vision in neuroscience is to understand the correlations between manipulated objects and actions, without any prior knowledge of the objects or models, through theoretical analysis of image sequences in real time. To achieve this goal, researchers in this field work on developing real-time modular systems for robotic vision and control that use a graphical user interface to allow chronological image configuration. They also work on improving dense disparity mapping, optical flow, segment relation graphs, semantic event chain creation, and stereo image acquisition and rectification. [2] [3] [4]

History

  • 1970s - High-level shape models (generalized cylinders, superquadrics, geons, volumetric abstractions), idealized images, simple textureless objects, blocks-world-like scenes. Salient contours map to surface discontinuities.
  • 1980s - Mid-level shape models (polyhedra, CAD models, low-level geometric invariants, three-dimensional or view-based two-dimensional geometric templates), more complex textureless objects with well-defined geometric structure. Salient contours map to polyhedral edges, image corners to polyhedral vertices.
  • 1990s - Low-level image-based appearance models (pixel-based templates, eigenspaces), most complex objects, full texture, restricted scenes; pixels in the image correspond to pixels in the model.
  • 2000s - Appearance-based abstractions of local neighborhoods (SIFT, affine-invariant patches, phase-based patches, shape contexts), most complex objects, robustness to noise, occlusion, articulation, and minor within-class variation; the appearance of the image is still very close to the appearance of the model.

[5]

Detection

Edge detection

Edge detection is important because the success of higher-level processing relies on good edges. Gray-level images contain a large amount of data, much of which is irrelevant, so the initial step is to reduce the data: the object is separated from the background, and the physically significant edges are identified. Edge information plays a key role in the selection of tokens. In motion, moving objects are detected by identifying time-varying edges and corners. Object recognition methods based on two-dimensional shape also use edge detection. There are three stages in edge detection: filtering, differentiation, and detection.

Edge filtering stage

During the filtering stage, the image passes through a filter to remove noise. Noise can be due to undesirable effects introduced by sampling, quantization, blurring and defocusing of the camera, and irregularities in the surface structure of the objects. The simplest filter is the mean filter, in which the gray level at each pixel is replaced by the average of the gray levels in a small neighborhood around the pixel. Through this process, the noise is averaged out.
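The mean filter is simple enough to state directly. The following is a minimal Python sketch (illustrative only; the function name and neighborhood size are assumptions, not taken from the cited sources):

    import numpy as np

    def mean_filter(image, size=3):
        # Replace each gray level by the average of the gray levels
        # in the size x size neighborhood around the pixel.
        pad = size // 2
        # Reflect-pad the borders so edge pixels also have a full neighborhood.
        padded = np.pad(image.astype(float), pad, mode="reflect")
        out = np.zeros(image.shape, dtype=float)
        for dy in range(-pad, pad + 1):
            for dx in range(-pad, pad + 1):
                out += padded[pad + dy : pad + dy + image.shape[0],
                              pad + dx : pad + dx + image.shape[1]]
        # Averaging over the neighborhood smooths out the noise.
        return out / (size * size)

Averaging trades noise suppression against blurring: a larger neighborhood removes more noise but also smears the very edges that the later stages try to find.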

Edge differentiation stage

The differentiation stage locates the points in the image where intensity changes are significant. In early work that skipped the filtering stage, the differentiation step was performed using a finite-difference approach, and detection proceeded by locating the highest points in the gradient of the intensity function using a threshold. In those approaches filtering was unimportant, because only synthetic images and industrial scenes with a controlled environment were considered. Since taking the mean is itself a filtering step, better data can be collected. The choice of threshold depends on the domain of the application and varies between images; automatic selection of the threshold remains a difficult problem.
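A minimal Python sketch of finite-difference differentiation with a fixed threshold (the names and the threshold value are illustrative assumptions, not from the cited sources):

    import numpy as np

    def gradient_magnitude(image):
        # Finite-difference approximation of the intensity gradient:
        # np.gradient uses central differences in the image interior.
        gy, gx = np.gradient(image.astype(float))
        return np.sqrt(gx ** 2 + gy ** 2)

    def strong_edges(image, threshold=30.0):
        # Keep the pixels whose gradient magnitude exceeds the threshold.
        # As noted above, a good threshold value is domain dependent.
        return gradient_magnitude(image) > threshold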

Edge detection stage

During the detection stage, the points where the intensity changes are significant are localized: the computer finds the highest points in the derivative output to locate the edge points.
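A simplified sketch of this stage in Python, marking pixels that are local maxima of the gradient magnitude along either image axis (full detectors compare along the gradient direction; the names here are illustrative):

    import numpy as np

    def edge_points(mag, threshold=30.0):
        # mag: gradient magnitude from the differentiation stage.
        m = mag[1:-1, 1:-1]
        # A pixel is kept if it is at least as large as both of its
        # horizontal neighbors or both of its vertical neighbors.
        horiz = (m >= mag[1:-1, :-2]) & (m >= mag[1:-1, 2:])
        vert = (m >= mag[:-2, 1:-1]) & (m >= mag[2:, 1:-1])
        edges = np.zeros(mag.shape, dtype=bool)
        edges[1:-1, 1:-1] = (m > threshold) & (horiz | vert)
        return edges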

Object detection

The issues in object detection arise from generality and from the number of objects. Regarding generality, the sensor sometimes cannot determine whether an object is two-dimensional or three-dimensional. Programmers therefore need to provide the model with informed initializations in the form of rough two-dimensional object locations and poses, obtained by a two-dimensional multi-view detector. This approach reduces computation and gives good estimation performance. Furthermore, the range of viewing conditions and the segmentation or categorization of biological parts are occasionally not well defined; indeed, the machine frequently does not count the correct number of objects.[6]

Motion detection

Optical flow estimation relies on local information in the image. At a given pixel, motion estimation is conditioned on the existence of a non-zero spatial gradient, so explicit information is available only inside non-homogeneous regions of the image. Even there the information is incomplete: a non-homogeneous region provides only a partial view of the underlying motion, namely the component along the spatial gradient at the pixel (the aperture problem). A method that detects the motion itself, rather than image change, has therefore not been fully established. Most studies of activity recognition in computer vision focus on sub-problems such as tracking and motion detection, and often lack a second layer that spans a variety of perceptual components and fuses and interprets their outputs; this makes it hard to develop a framework for high-level human activity recognition.[3]
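In the common notation (assumed here, not taken from the cited source), with I_x, I_y, I_t the partial derivatives of image intensity and (u, v) the flow at a pixel, the single available constraint and the recoverable "normal flow" component are:

    I_x u + I_y v + I_t = 0

    u_{\perp} = -\frac{I_t}{\sqrt{I_x^2 + I_y^2}}

One linear equation cannot determine the two unknowns (u, v), which is why the methods below bring in additional constraints.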

Schunck method for motion detection

The Schunck method is a simple method for computing optical flow that uses multiple constraints. Because the gray level at a single pixel gives only one constraint, the optical flow can lie anywhere on the straight line defined by the spatial and temporal derivatives. If a second constraint from a neighboring pixel is used, the optical flow can be determined by computing the intersection of the two lines represented by the constraints. In general it is better to employ multiple constraints; Schunck's method uses eight constraints obtained from the points around a 3 × 3 neighborhood, which yields the intersections of eight lines. If the measurements contain no noise, and all pixels in the 3 × 3 neighborhood belong to the same moving object, then in principle all eight straight lines intersect at a single point, which is the correct optical flow.
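A minimal Python sketch of this idea: each pixel in the neighborhood contributes one constraint line Ix*u + Iy*v = -It, and with noisy data the common intersection is replaced by a least-squares solution (names are illustrative; the center pixel is included here for simplicity, while the text counts the eight surrounding pixels):

    import numpy as np

    def flow_from_neighborhood(Ix, Iy, It, y, x):
        # Stack the constraint lines Ix*u + Iy*v = -It contributed by
        # the 3 x 3 neighborhood of pixel (y, x).
        win = (slice(y - 1, y + 2), slice(x - 1, x + 2))
        A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)  # 9 x 2
        b = -It[win].ravel()
        # With noise-free data from one moving object, all lines meet at
        # a single point; otherwise solve in the least-squares sense.
        (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
        return u, v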

SFM method for motion detection

The structure from motion (SFM) method in computational vision recovers the physical properties of the objects present in the scene, such as their three-dimensional structure and motion, given a series of two-dimensional projections. There are two classes of SFM methods: displacement methods and instantaneous methods.

Instantaneous method

In the instantaneous method, the optical flow is used to recover the three-dimensional motion and depth values.
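Under a standard perspective camera model with unit focal length (an assumption for illustration; the notation is not from the cited source), the instantaneous flow (u, v) at image point (x, y) relates the camera translation T = (T_x, T_y, T_z), rotation \omega = (\omega_x, \omega_y, \omega_z), and scene depth Z:

    u = \frac{T_z x - T_x}{Z} + \omega_x x y - \omega_y (1 + x^2) + \omega_z y

    v = \frac{T_z y - T_y}{Z} + \omega_x (1 + y^2) - \omega_y x y - \omega_z x

The rotational terms do not depend on depth, so only the translational part of the flow carries information about the three-dimensional structure.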

Methods

File:Deep Hierarchy vs Flat Hierarchy.png
The structures of deep hierarchy and flat hierarchy

Deep hierarchy

A deep hierarchy system processes the image stage by stage through a systematic pipeline. Deep hierarchies are beneficial for computational efficiency and generalization. The computer vision hierarchy is divided into three levels of vision: low, intermediate, and high.[7] Deep hierarchies build stages on top of each other, exploiting the ability to share features among more complex compositions. Sharing refers to the reuse of common computations, which brings computational efficiency. Moreover, reusing commonalities between object models places their representations in relation to other objects, leading to high generalization capability and lower storage demands. Although neuro-physiological evidence suggests that a number of levels are realized in the human visual system, the design and learning of deep hierarchical systems is known to be a very difficult task.[2]


Low-level vision

Low-level vision processes the image for feature extraction, such as edges, corners, and optical flow. The present implementation of the low-level processing module is a logical and conventional one, by means of the virtual image system, which is used as a tool for investigating low-level visual processes. This visual image system is currently implemented on a Unix workstation under X-Windows. The computational efficiency of the low-level processing tasks can be greatly improved if a computational model more adequate to the data structure, and to the operations that must be performed on it, is adopted.[2]

Intermediate-level vision

Intermediate-level vision recognizes objects and interprets the three-dimensional scene using the features obtained from low-level vision. [8]

High-level vision

High-level vision interprets the evolving information provided by intermediate-level vision and directs which intermediate- and low-level vision tasks should be performed. It may also include a conceptual description of a scene, such as activity, intention, and behavior. At this level, the volumetric representation constitutes the history of the visual process. It contains the information regarding the three-dimensional structure of the scene as obtained by the peripheral processing stages and then modified by perceptual and conceptual reasoning. Segmentation by regions and by edges further characterizes some parts of the volumetric representation. [2]

Flat hierarchy

Most existing computer vision systems use a flat hierarchy. However, such systems do not share computational resources and do not allow generalization across tasks; more importantly, the human visual system is not flat, so a flat design is not faithful to it when realized. In a flat hierarchy, simple feature-based descriptors are taken as input and processed directly by task-dependent algorithms.[7]

Fields

  • Image processing - focuses on image manipulation to enhance image quality, to restore an image, or to compress/decompress an image.
  • Pattern recognition - studies various techniques, such as statistical techniques, neural networks, and support vector machines, to recognize or classify different patterns. Pattern recognition techniques are widely used in computer vision.
  • Photogrammetry - is concerned with obtaining accurate and reliable measurements from images, focusing on accurate mensuration. Camera calibration and three-dimensional reconstruction are two areas of interest to both computer vision and photogrammetry researchers.


Applications

  • Robotics - localization (determining the robot's location automatically), obstacle avoidance, navigation and visual servoing, assembly (peg-in-hole, welding, painting), manipulation (PUMA robot manipulator), human-robot interaction (HRI): intelligent robotics to interact with and serve people
  • Medicine - classification and detection (lesion or cell classification and tumor detection), 2- and 3-dimensional segmentation, three-dimensional human organ reconstruction (MRI/ultrasound), vision-guided robotic surgery [9] [10]
  • Security - biometrics (iris, fingerprint, face recognition), surveillance (detecting certain suspicious activities or behaviors)
  • Transportation - autonomous vehicles, safety (driver vigilance monitoring)
  • Industrial automation - industrial inspection (defect detection and mensuration), assembly, barcode and package label reading, object sorting, document understanding (OCR)
  • Image/video databases
  • Human-computer interface - gaze estimation, facial expression recognition, head and hand gesture recognition

[11]

References

  1. Joo, Deokjin; Kwan, Ye-Seul; Song, Jongwoo; Pinho, Catarina; Hey, Jody; Won, Yong-Jin (25 October 2013). "Identification of Cichlid Fishes from Lake Malawi Using Computer Vision". PLOS ONE. 8 (10): e77686. doi:10.1371/journal.pone.0077686. PMID 24204918.
  2. Tresset, Patrick; Fol Leymarie, Frederic (2013). "Portrait drawing by Paul the robot". Computers & Graphics. 37 (5): 348–363. doi:10.1016/j.cag.2013.01.012.
  3. Florence, Germain; et al. "A Computer Vision Method for Motion Detection Using Cooperative Kalman Filters". 87 (11): 303–306.
  4. Mogol, B. A.; Gökmen, V. (28 November 2013). "Computer vision based analysis of foods - A non-destructive colour measurement tool to monitor quality and safety". Journal of the Science of Food and Agriculture. 94 (7): 1259–1263. doi:10.1002/jsfa.6500. PMID 24288215.
  5. Li, Cai Hua; Zhang, Jin Bo; Hu, Xiao Ping; Zhao, Guo Fu (2013). "Algorithm Research of Two-Dimensional Size Measurement on Parts Based on Machine Vision". Advanced Materials Research. 694–697: 1945–1948. doi:10.4028/www.scientific.net/AMR.694-697.1945.
  6. Zia, M. Zeeshan; Stark, M.; Schiele, B.; Schindler, K. (2013). "Detailed 3D Representations for Object Recognition and Modeling". IEEE Transactions on Pattern Analysis and Machine Intelligence. 35 (11): 2608–2623. doi:10.1109/TPAMI.2013.87. PMID 24051723.
  7. Kruger, N.; Janssen, P.; Kalkan, S.; Lappe, M.; Leonardis, A.; Piater, J.; Rodriguez-Sanchez, A. J.; Wiskott, L. (2013). "Deep Hierarchies in the Primate Visual Cortex: What Can We Learn for Computer Vision?". IEEE Transactions on Pattern Analysis and Machine Intelligence. 35 (8): 1847–1871. doi:10.1109/TPAMI.2012.272. PMID 23787340.
  8. Ryu, Jiwon; Choi, Jaesoon; Kim, Hee Chan (2013). "Endoscopic Vision-Based Tracking of Multiple Surgical Instruments During Robot-Assisted Surgery". Artificial Organs. 37 (1): 107–112. doi:10.1111/j.1525-1594.2012.01543.x. PMID 23043484.
  9. Leeds, D. D.; et al. (22 November 2013). "Comparing visual representations across human fMRI and computational vision". Journal of Vision. 13 (13): 25. doi:10.1167/13.13.25. PMC 3839261. PMID 24273227.
  10. Brunak, S.; et al. (2014). "Cancer panomics: computational methods and infrastructure for integrative analysis of cancer high-throughput "omics" data - session introduction". Pacific Symposium on Biocomputing. 19: 1–2. PMID 24297528.
  11. Perrenot, Cyril; Perez, Manuela; Tran, Nguyen; Jehl, Jean-Philippe; Felblinger, Jacques; Bresler, Laurent; Hubert, Jacques (5 April 2012). "The virtual reality simulator dV-Trainer® is a valid assessment tool for robotic surgical skills". Surgical Endoscopy. 26 (9): 2587–2593. doi:10.1007/s00464-012-2237-0. PMID 22476836.

Category:Computational Vision Category:Neurology Category:Neuroscience Category:Unsolved problems in neuroscience