$150.00
Certification

Industry-recognized certification enables you to add this credential to your resume upon completion of all courses.

Instructor

Dr. Stephen Huff

Dr. Stephen Huff is currently employed as a consulting scientist, advising DHS Science and Technology programs on machine learning technology. He earned his Ph.D. in Bioinformatics from the University of Houston. He also holds three master's degrees and a bachelor's degree: an M.S. in Joint Professional Military Education from Air Command and Staff College, an M.S. in Management Information Systems from Wright State University, an M.S. in Biology from the University of Houston at Clear Lake, and a B.S. in Microbiology with a Chemistry minor from the University of Texas at Arlington.

Instructor: Dr. Stephen Huff

An executive review of a hot technology

  • Gain a basic working knowledge of Python-based manipulation of the Keras, Microsoft Cognitive Toolkit, Theano and TensorFlow deep learning platforms.
  • Learn to compare and contrast similar implementations of practical, graph-based solutions in Keras using Microsoft Cognitive Toolkit, Theano and/or TensorFlow back-end systems.
  • The instructor holds a Ph.D. in Bioinformatics and works as a consulting scientist advising DHS Science and Technology programs on machine learning.

Duration: 3h 04m

Course Description

This course provides a detailed executive-level review of contemporary topics in graph modeling theory with specific focus on Deep Learning theoretical concepts and practical applications. The ideal student is a technology professional with a basic working knowledge of statistical methods. Upon completion of this review, the student should acquire improved ability to discriminate, differentiate and conceptualize appropriate implementations of application-specific (‘traditional’ or ‘rule-based’) methods versus deep learning methods of statistical analyses and data modeling. Additionally, the student should acquire improved general understanding of graph models as deep learning concepts with specific focus on state-of-the-art awareness of deep learning applications within the fields of character recognition, natural language processing and computer vision. Optionally, the provided code base will inform the interested student regarding basic implementation of these models in Keras using Python (targeting TensorFlow, Theano or Microsoft Cognitive Toolkit).

What am I going to get from this course?

  • Improved ability to discriminate, differentiate and conceptualize appropriate implementations of application-specific (“traditional” or “rule-based”) methods versus Deep Learning methods of statistical analyses and data modeling
  • Improved general understanding of Graph Models as Deep Learning concepts
  • Basic working knowledge of the Python-based manipulation of Keras, Microsoft Cognitive Toolkit, Theano and TensorFlow deep learning platforms
  • Basic ability to compare/contrast similar implementations of practical, graph-based solutions in Keras using Microsoft Cognitive Toolkit, Theano and/or TensorFlow back-end systems
     

Prerequisites and Target Audience

What will students need to know or do before starting this course?

  • Basic understanding of application-specific (“Traditional” or “Rule-Based”) methods for statistical analysis
  • Basic understanding of Deep Learning theory (Graph/Network Theory)
  • Basic general understanding of Information Technology
  • Optionally, basic working knowledge of Python programming.
     

Who should take this course? Who should not?

  • Information Technology professionals with a basic knowledge of Graph Theory and its practical applications

Curriculum

Module 1: Introduction, Review of Background Concepts and Technological Context

Lecture 1 Course Overview

This course provides a detailed executive-level review of contemporary topics in graph modeling theory with specific focus on Deep Learning theoretical concepts and practical applications. The ideal student is a technology professional with a basic working knowledge of statistical methods. Additionally, to better inform the interested student, the final lesson of this course presents samples in Python describing the essential implementation of basic model structures. To reduce space and improve clarity, this code targets a basic Keras environment – this inclusion is not meant as an endorsement of one system over another (all provide benefits); instead, at the time of this writing, Keras simply offers a popular, facile ‘frontend’ for managing TensorFlow, Microsoft Cognitive Toolkit and Theano deep learning systems, all from the same popular scripting language.

Lecture 2 About this Introduction

Drawing an analogy from the Cartesian coordinate system, which describes a point in space with three values, this introduction should orient the student within the chaotic space of a new and rapidly advancing technological domain by imparting three levels of insight. The information summarized within this introduction will provide historical background while also describing current state of the art. With these two points of reference in mind, then, the third point of reference, that of the course subject matter, should improve the student’s ability to anticipate future trends within this large and vastly influential corpus of technology. Of course, the same information will also prepare the student to derive superior understanding of the subject matter review that follows.

Lecture 3 Relevant Nomenclature & Statistical Inference and Statistical Models

Relevant Nomenclature. To aid the following discussion, several terms first require formal definition. Terms presented in italicized sans-serif font may be referenced in the glossary for quick reference and definition. For example, a parameter, generally speaking, is a condition that limits or defines performance of a function. A given system may be naturally or artificially constrained to limit the variability of its outcomes by restricting input to the numeric values of one or zero, a state that might be qualitatively described as ‘on’ or ‘off’ – these constraints represent the parameters of the system. Simply stated, within the brief (and painless) overview of statistical methods that follows, the ‘dependent variable’ (or ‘response variable’) is the ‘thing’ that the model serves to predict/classify, while the ‘independent variable’ (or ‘explanatory variable’) is the ‘thing’ that provides the means of prediction. For example, a model might predict rain according to fluctuations of humidity, which can be readily measured – here, rain is the dependent variable and humidity is the independent variable. Further, if the response variables trace, more or less, a straight line through the problem space (i.e., rain varies consistently with humidity – which is probably not the case), then the model is said to be ‘linear’. Otherwise, the model is ‘non-linear’. Finally, ‘scalar variables’ provide magnitude (quantity) only. A ‘non-scalar variable’, or ‘vector’, quantifies both magnitude and direction. Throughout the text, the term ‘system’ predominates. Again, this is a matter of convenience, since many other terms would suffice. In this context, then, the term ‘system’ references any phenomenon of interest. A system may represent the complex relationships of an entire rain forest ecosystem, but typical systems of commercial interest tend to be much simpler (comparatively speaking, since the Internet is arguably a new kind of forest).

Statistical Inference and Statistical Models. ‘Statistical inference’ is the process of producing (or inferring) information about a population using data derived from it. Typically, populations of interest tend to be large enough to prevent direct data extraction from each of their individual data points. This prohibition requires the use of data samples – that is, the process of directly extracting information from only a subset of the population. Given sufficient sample size, the resultant information may be extrapolated to the population at large. A ‘statistical model’ is, in essence, a theory developed to describe some functional aspect of population performance. Application of such models to sampled data supports the identification of properties (or features) that pertain to some relevant aspect of the overall population and/or its dynamics, whatever these may be. In turn, these properties/features should support some meaningful interpretation, description or manipulation (et cetera) of the underlying system (e.g., prediction of humidity expected at a specific hour of the day, given daily temperature fluctuations within the region).
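
To make these terms concrete, the following minimal sketch fits a simple linear statistical model of the rain-versus-humidity example described above; the humidity and rainfall numbers are entirely hypothetical and serve only to illustrate the dependent/independent distinction.

```python
# Minimal sketch with hypothetical data: humidity is the independent
# (explanatory) variable, rainfall is the dependent (response) variable.
import numpy as np

humidity = np.array([30.0, 45.0, 60.0, 75.0, 90.0])   # independent variable (%)
rainfall = np.array([0.0, 1.2, 2.5, 4.1, 5.8])        # dependent variable (mm)

# Fit a degree-1 polynomial (a straight line): rainfall ~ slope * humidity + intercept
slope, intercept = np.polyfit(humidity, rainfall, 1)

# Use the fitted parameters to predict rainfall at an unobserved humidity level
predicted = slope * 50.0 + intercept
print(f"slope={slope:.3f}, intercept={intercept:.3f}, "
      f"predicted rainfall at 50% humidity={predicted:.2f} mm")
```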

Lecture 4 Contrasts with Descriptive Statistics & Modeling Assumptions

Contrasts with Descriptive Statistics. In contrast to statistical inference, ‘descriptive statistics’ use similar methods as applied to similar datasets, albeit with an alternative desired outcome. As its label implies, descriptive statistics provide a means of quantitatively describing or summarizing the population’s various aspects (as opposed to inference, which additionally uses these results to provide some practical insight into the problem domain). Lacking a predictive requirement, descriptive systems do not rely on probability theory – rather, these tend to be nonparametric problems that provide ‘details’ useful for understanding the underlying phenomenon. As a practical example, a scientific experiment might use inferential statistics to draw its functional conclusions – e.g., efficacy of dosages appropriate for a patient by gender and age. The same experiment might also use descriptive statistics to provide the context of study applicability – e.g., statements that describe the target population of the study as being restricted to male patients more than sixty-five years old (“gender” and “age” being the descriptive statistics).

Modeling Assumptions. To eliminate uninteresting variabilities and biases within an analysis, modelers typically attempt to simplify the underlying system by discarding irrelevant data (e.g., an analysis attempting to predict a stock price might not benefit from input that describes lottery ticket sales). Another necessary aspect of these considerations must regard the logical assumptions used to qualify application of a given statistical model (or method), since these assumptions support simplification at the cost of limiting applicability. For example, where a model or method assumes a normal distribution of data (i.e., the ‘bell curve’) and is also dependent upon a random sample extracted from same (both common assumptions of many regression-based protocols), then any attempt to apply the model to a distribution that is not normal and/or not drawn from a random sample will produce invalid or biased results. In part due to these considerations, for example, fully parametric models typically do not produce satisfactory results when applied to samples drawn from human or economic populations (unless these samples are extremely large), since the assumptions of normality and randomness are often inappropriate.

Lecture 5 Application-Specific Methods (General Development Paradigm & Examples)

Application-Specific Methods. ‘Application-specific’ (‘hard-coded’, ‘rule-based’) methods of statistical analysis are common and useful solutions to many practical problems, especially among datasets and problem spaces that support parametric models. Unfortunately, these systems become less robust and useful as the size and complexity of the dataset increases, especially where these data derive from chaotic sources (e.g., real-time web traffic, economic trends, etc.). Because these models rely on simplifying assumptions, resultant solutions tend to be highly qualified as to their efficacy – valid results are applicable only where underlying assumptions prove true. Where these solutions suffice, however, computation of results is an efficient, useful prospect. Typically, the necessary development paradigm is also comparatively simple (One Problem -> One Solution/Model -> One Algorithm). Again, while combinatorial approaches may compensate for parametric limitations, the resultant aggregated solution is too often irrationally complex and prohibitively costly, especially when applied to contemporary ‘big data’ problems. As a means of illustrating the utility, and limitations, of application-specific models, this course briefly reviews common examples of these methods.

Lecture 6 Predictive Modeling and Machine Learning (Overview)

‘Predictive modeling’, or ‘predictive analytics’, overlaps with the machine learning domain to the point of synonymity. As with the application-specific methods previously described, machine learning techniques attempt to provide practical insight into complex systems via application of a simplified model of the system of interest. Largely due to advances in computational hardware and software, machine learning models now present viable solutions to many otherwise intractable analytical problems. Specifically, these ‘learning systems’ rather conveniently provide viable solutions to the complex ‘big data’ problems made available for investigation by increasingly capable data-storage (and processing) hardware. In lieu of application-specific, rule-based algorithms, these ‘artificially intelligent’ (AI) systems accomplish this goal by autonomously ‘learning’ the parametric relationships of the model using input from the dataset of interest. Typically, analysts use these AI solutions to predict or classify complex, multivariate phenomena that are not amenable to the solutions previously described.

Lecture 7 Predictive Modeling and Machine Learning (Examples) & Graph Theory and Neural Networks (Overview)

Not to be confused with the visual rendition of data that often takes the form of squiggly lines or blocks of colors, within the context of this discussion, ‘graphs’ are mathematical constructs used to model relationships between objects. As with the neural networks of biology (and your brain), graphs consist of two simple concepts – nodes and edges. Within a graph, nodes (also called ‘neurons’, ‘vertices’ or ‘points’) connect to one another via edges (also called ‘arcs’, ‘lines’ or ‘synapses’) in a dizzying array of potential patterns. Graphs may use a ‘directed’ or an ‘un-directed’ architecture to describe their arrangements of nodes and edges. Within a ‘directed graph’, information flows one way, especially when the graph contains multiple layers (i.e., nodes can only send or receive, exclusively, from a given edge). Conversely, an ‘undirected graph’ supports bi-directional transfer of information (i.e., nodes may both ‘send’ and ‘receive’ information from all of their edges). Within visual depictions of the graph, arrows indicate its ‘directedness’ (as opposed to un-directed graphs, which depict edges as straight lines). Without immediately confounding graph models with neural networks, artificial intelligence and deep learning (which follow in detail), a quick review of basic graph models as viable practical solutions will provide guiding context to the larger discussion that follows. Accordingly, this lecture briefly reviews examples of these tools in contemporary settings.

Lecture 8 Graph Theory and Neural Networks (Applications) & Graph Models and Deep Learning (Overview)

As implied by the label, ‘deep learning’ systems overcome the limitations of application-specific, hard-coded or rule-based algorithms by ‘learning’ trends or features within a dataset. These increasingly complex implementations achieve results via application of rationally designed ‘cascades’ of graphs – that is, multiple graph models working together within a layered hierarchy, wherein each successive layer receives its input from the preceding layer’s output. In these implementations, the top layer is the initial system input layer and the bottom layer provides the system’s final output. Deep learning networks have proven useful within the domains of prediction and classification, much like the application-specific methods discussed earlier. Their increasingly capable utility derives directly from the process of training (learning) used to ‘teach’ underlying data ‘features’ (or useful, diagnostic characteristics) – a process loosely labeled ‘automated feature extraction’. The ‘traditional’ converse of this process is ‘feature engineering’, which is, essentially, a fancy term for the process of manually developing an applicable model, as with the application-specific algorithms of the introduction. Feature engineering is the single costliest aspect of these traditional methods, both in terms of money and manpower, and its elimination is the primary advantage of artificially intelligent, machine-learning solutions.

This introduction briefly reviewed relevant application-specific algorithms and ‘traditional’ rule-based, often hard-coded, solutions for classifying or predicting values from complex, multivariate datasets. Specifically, discussion focused on the appropriate and, more importantly, inappropriate application of these models to large, diverse, sparsely populated datasets of the kind commonly encountered within the contemporary ‘internet of things’. Where these (often highly) constrained parametric models begin to fail, graph-based models become viable solutions. In turn, as with all things technological, these simple models expanded horizontally and vertically from simple Support Vector Machines through Artificial Neural Networks into the deep learning constructs popular today. Building on these concepts, then, this course will provide general insight into the deep learning domain while specifically focusing on the role of graph models within these systems.

Module 2: Describing Model Structure with Graphs

Lecture 9 Describing Model Structure with Graphs (Lesson Overview & Relevant Nomenclature) & Logical Representation of Graph Structure (Lists & Matrices)

Recall from the preceding introduction that graphs are mathematical constructs that model relationships between objects of interest. These graphs consist of ‘nodes’ (‘neurons’, ‘units’, or ‘points’) connected to one another via ‘edges’ (‘dendrites’, ‘synapses’, ‘arcs’ or ‘lines’). Information passes from one node to another via edges, and each node performs some operation on the information (or not, depending upon the nature of the graph) before passing it to the next. In this sense, graphs may be directed or undirected, wherein information moves along one-way paths (as indicated visually by arrows) or two-way paths (visualized as lines).
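
As a concrete illustration of the list and matrix encodings named in this lecture’s title, the short sketch below expresses the same small directed graph both ways; the three node names are assumptions chosen only for the example.

```python
# Minimal sketch: two logical encodings of the same small directed graph,
# A -> B, A -> C, B -> C (node names are illustrative only).
nodes = ["A", "B", "C"]

# Adjacency list: each node maps to the nodes its outgoing edges reach.
# Compact for sparse graphs.
adjacency_list = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
}

# Adjacency matrix: entry [i][j] is 1 if an edge runs from nodes[i] to nodes[j].
# Fast lookups for dense graphs, at the cost of O(n^2) memory.
adjacency_matrix = [
    [0, 1, 1],   # edges leaving A
    [0, 0, 1],   # edges leaving B
    [0, 0, 0],   # edges leaving C
]

# Both structures answer the same question: does A connect directly to C?
print("C" in adjacency_list["A"])                                  # True
print(adjacency_matrix[nodes.index("A")][nodes.index("C")] == 1)   # True
```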

Lecture 10 Describing Model Structure With Graphs (Implementation)

While list and matrix structures provide essential implementation of graph notation, they are far from the only means of deploying such systems. The student will benefit from a brief review of key concepts within this realm of study.

Lecture 11 Describing Model Structure With Graphs (Examples & Lesson Conclusion)

Analysis, development and implementation of graph models naturally require some method of visualizing their architecture, but this visualization should not be confused with the functional graph itself. Functional graphs are logical (or formulaic) constructs that perform the actual operations attributable to model functionality. Developers encode these logical constructs in a variety of ways, which generally evolved from list and/or matrix structures. The former optimizes for sparse graphs performing in limited memory environments, while the latter optimizes for dense graphs at the cost of additional memory requirements. A variety of popular open source software now implements these systems, largely negating the need for customized development of core functionality. Many contemporary applications simply build on these backend services to perform a wide and growing range of analyses with the models thus produced. As might be expected, this rapidly expanding technology has a long, diverse and ever-growing list of ‘real-world’ applications. Within biological research, graphical models impact the analysis of regulatory genetic networks, protein structure prediction, protein-protein interactions, protein-drug interactions, free energy calculations and countless others. Information and intelligence research benefits from applications to causal inference, information extraction, natural language processing, multimedia data processing, speech recognition, computer vision and more. This course reviews key examples of these concepts in the sections that follow.

Module 3: Deep Learning and Graphical Models

Lecture 12 Deep Learning and Graphical Models (Lesson Overview & Relevant Nomenclature & Deep Learning History)

Defined in simplest terms, ‘deep learning’ (DL) models are basically ‘souped-up’ graph models in the guise of a deeply layered neural network – hence, the ‘deep’ part of the label. These are machine learning constructs with a specific architecture (though the archetype [essential nature] of this architecture remains somewhat nebulously defined – in terms of a standard, at least). As a definitive aspect, these deep-learning systems rely upon a ‘cascade’ of nonlinear processing units (neural networks). As with most neural nets, the output of each layer provides input to subordinate layers, thus the ‘cascade’ of operational flow. As with most graph models, these solutions deploy to perform feature extraction and transformation of complex, multivariate datasets. Also, the training (or learning) process may be performed in a supervised (e.g., within classification roles) or unsupervised manner (e.g., within pattern analysis roles). Finally, these neural networks learn multiple levels of data representation, which are often – but not strictly – related to individual network layers within the system. The overall structure and the layer-wise architecture of deep-learning models present many computational challenges, primarily as regards the training process. Prior to the advent of multi-core CPUs, hardware/software accelerators and GPUs, this crucial aspect of deep-learning implementation presented an intractable hurdle that left many interesting problems unresolved. Indeed, optimization of training performance continues to challenge investigators as larger and more complex datasets become more accessible to graph model analyses. The student will review the most relevant examples of these systems in successive sections of this course, but a few examples bear minor introductory mention here, such as ‘Deep Belief Networks’ and ‘Deep Boltzmann Machines’. Both are popular theoretical designs of deep learning that have proven themselves effective against many large real-world problems.

Lecture 13 Deep Learning and Graphical Models (Essential Concepts & Understanding Deep Learning)

Deep learning (DL) systems differ from the neural networks (and other ‘basic’ machine learning methods previously discussed) according to the depth and breadth of the underlying architecture. Deep learning systems benefit from their large ‘Credit Assignment Path’ (CAP) depth. CAP is basically a stepwise count that denotes the chain of data transformations within a given system, progressing from input to output. While no hard standard defines the formal separation of ‘shallow’ from ‘deep’ neural networks, the de facto accepted standard describes deep systems as having a CAP depth greater than two. While a brief review will illuminate many benefits and advantages of these solutions as applied to complex, multivariate datasets, their most essential benefit relates to a reduction of human intervention (thus, costly man-hours of labor). Essentially, these systems circumvent the need for teams of engineers focused exclusively on traditional methods of development using application-specific models derived from manually engineered features.

Lecture 14 Deep Learning and Graphical Models (Operational Abstraction & ANN vs DNN & Lesson Conclusion)

Though neural network theory began to evolve almost immediately after the advent of affordable commercial computing, researchers first coined the term ‘deep learning’ in 1986. After the application of GPUs to DL training in 2009, the Google Brain team released TensorFlow in 2015, a free, open source version of its proprietary deep learning software. A deep learning system is a machine learning system implemented as a multilayer cascade of nonlinear processing units (graph models). Investigators typically use these models to perform feature extraction and transformation on large, complex, multivariate datasets that do not lend themselves well to ‘traditional’ application-specific solutions. Deep learning networks are, essentially, multi-layer feed-forward artificial neural networks. Like ANNs, these adaptable and highly capable models benefit primarily from the autonomous discovery (‘learning’) of dataset features. This affordable process obviates the need to conduct expensive manual feature engineering.
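
For the interested reader, the sketch below shows what such a multi-layer feed-forward network looks like in Keras. It is a minimal illustration assuming a TensorFlow-backed Keras installation; the layer sizes and 20-feature input are arbitrary assumptions rather than anything taken from the course code base.

```python
# Minimal sketch: a small feed-forward network whose stacked Dense layers give
# it a credit assignment path depth greater than two, i.e. a "deep" rather
# than "shallow" model. Layer sizes and input width are illustrative.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(64, activation="relu", input_shape=(20,)),  # hidden layer 1
    Dense(64, activation="relu"),                     # hidden layer 2
    Dense(1, activation="sigmoid"),                   # output layer (binary prediction)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```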

Module 4: Monte Carlo Methods

Lecture 15 Monte Carlo Methods (Lesson Overview & Relevant Nomenclature)

Overall, thus far, this course attempts to frame this steady progress from application-specific models to machine learning methods as a struggle to manage and analyze large, complex, multivariate and, most vitally, inherently chaotic systems. That being said, chaos, or randomness, is a common phenomenon that influences many things, both natural and artificial. For example, as the student completes this course – which is, ideally, ‘chock full’ of information – the learning experience is a struggle to parse signal (educational information) from noise (nonsensical interference, like a blaring horn or stereo, and extraneous data – even extraneous information).

Lecture 16 Monte Carlo Methods (Overview)

‘Monte Carlo Methods’ (MCM), as the label might imply, derive their name from the popular casino of the same name. The label originates in the Manhattan Project with one of its pivotal investigators, a scientist named Dr. Stanislaw Ulam (Los Alamos, 1946). While pondering the daunting task of modeling the performance of a complex nuclear chain-reaction, Dr. Ulam considered multiple possible solutions using differential equations, all of which quickly swelled to intractable proportions when considered from an exhaustive exploration of a vast problem space (e.g., certain problems expand exponentially in complexity as they grow larger – like the game of telegraph, wherein one person tells two others, who then tell two others, and so on, ad infinitum). Ultimately, perhaps given the proximity of Los Alamos to Reno, Dr. Ulam conceptualized another method for exploring an inherently large and complex series of inter-connected events. Simply put, he understood that he could 1) spend impossible quantities of time to derive a deterministic, ‘hard-coded’ formula/algorithm for describing these probabilities of interaction; OR 2) he could simply “game” the system (computers are very useful here) by running a simple, unrefined model repeatedly starting from randomly selected points, afterward performing a simple tabulation of the results to calculate resultant probabilities and further refine his model. While these randomized iterations of the model could not exhaustively explore the entire problem space, a sufficient number of trials (repetitions) efficiently defined model boundaries (or constraints or parameters – those useful features of the system).
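
The classic estimate of pi from random points offers a compact, self-contained illustration of this ‘run a simple model repeatedly from random points, then tabulate’ idea; the sketch below is purely illustrative and is not part of the course code base.

```python
# Minimal Monte Carlo illustration: instead of deriving a closed-form answer,
# repeatedly sample random points and tabulate the outcomes. Here the "system"
# is the unit quarter-circle and the tabulated ratio approximates pi; the same
# sample-and-count logic underlies far larger models.
import random

def estimate_pi(trials: int = 1_000_000) -> float:
    inside = 0
    for _ in range(trials):
        x, y = random.random(), random.random()   # random point in the unit square
        if x * x + y * y <= 1.0:                  # does it fall inside the quarter circle?
            inside += 1
    return 4.0 * inside / trials                  # tabulate results into an estimate

print(estimate_pi())   # approaches 3.14159... as the number of trials grows
```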

Lecture 17 Monte Carlo Methods (Examples & Lesson Conclusion)

Monte Carlo Methods may be applied to graphical models in a variety of ways – these methods also provide viable graphical solutions in their own right. Of principal importance to this subject matter, the Contrastive Divergence (CD) algorithm will become familiar to most deep learning professionals, since this method and its variations exemplify current best practices as regards a vital aspect of deep learning (DL) implementations. Specifically, the Markov methods described above, and derivatives such as CD, appear repeatedly within the realm of DL training tasks, since these algorithms all attempt to optimize the exploration of a network-like construct. The perceptive student will anticipate how these protocols might impact subsequent discussions.

Module 5: Approximate Inference and Expectation Maximization

Lecture 18 Approximate Inference and Expectation Maximization (Lesson Overview & Relevant Nomenclature & Overview- & Examples)

During the following sub-sections, keep in mind the overall goal of these protocols, which attempt to make predictions (or classifications) of complex, stochastic (noisy) phenomena. These methods simply apply probabilistic thinking a bit ‘deeper’ into the same essential problem to better accommodate randomness and, eventually, use it to advantage.

Lecture 19 Approximate Inference and Expectation Maximization (Iterative Method - Lesson Conclusion)

Increasingly complex Deep Learning (DL) constructs typically require increasingly robust methods for training (whether supervised or not). Computational efficiencies are vital. Besides their obvious utility as modeling systems in and of themselves, the methods reviewed within this section also impact the training process as executed against the overall DL model. Thus, these methods further provide a means for optimizing graph model training by reducing inference time while maximizing performance/efficacy of model refinement.

Module 6: Deep Generative Models

Lecture 20 Deep Generative Models (Lesson Overview - Examples [Discriminative Models])

While reviewing the following sections, the student will benefit from a brief description of the basic differences between these model classes. Discriminative models generally optimize for classification problems, while generative models provide classification methods combined with the capacity for generating novel data in imitation of some ‘real-world’ process. Discriminative models cannot generate this kind of data, since classification is not a generative process.

Lecture 21 Deep Generative Models (Overview [Generative Models] - Lesson Conclusion)

Both discriminative and generative models may perform classification tasks, while generative models are also capable of generating new data based on modeled features. As they become increasingly complex, these architectures also become increasingly demanding of computational resources, in turn requiring optimized training algorithms to produce viable ‘real-world’ solutions. The methods described above provide insight into these architectures. They also support efficient methods for training, testing and evaluating DL constructs, as exemplified by the basic operations of an autoencoder. Finally, Deep Generative Models are complex arrangements of simpler models (typically, RBMs and/or GANs) used as deep-learning (or deep-belief) systems.

Module 7: Applications to Character Recognition, Natural Language Processing and Computer Vision

Lecture 22 Applications to CR, NLP and CV (Lesson Overview - Examples [Character Recognition])

Although research into these complex tasks continues unabated, all of them intricately entwined with one another – or not, depending upon design requirements – the following summary review attempts to tell a story of progress. This story builds from the success of rule-based methods (and rudimentary graphical models like Support Vector Machines), which provided an automated means of processing hand-written zip codes long before Deep Learning emerged as a viable recourse. Passing through the surprisingly versatile field of natural language processing – which finds expression in scientific research ranging from biological laboratories through high-end data centers around the world – this story ends with a brief discussion of computer vision applications. This is, to be sure, a case of ‘last but not least’, since the future promises ‘true artificial intelligence’, which can see, hear, touch and – perhaps – smell and taste the world around it!

Lecture 23 Applications to CR, NLP and CV (Overview [NLP] & Relevant Nomenclature)

‘Natural Language Processing’ (NLP) is a general term for the application of computational resources to the analysis of human language. As with Character Recognition, previously reviewed, and Computer Vision, reviewed next, artificial decomposition and/or comprehension of natural human language presents a noisy, multivariate problem that does not readily lend itself to rule-based approaches. Despite the fact that all human languages are rule-based constructs, their combinations, which include all possible variations and exceptions, quickly exceed astronomical proportions – infinity threatens. (Imagine compiling a list of all possible words, phrases and sentences formable by valid rules found within the English language.) By now, this should sound familiar to the diligent student, since this is the same threshold that motivates the explosion of graphical model applications into so many contemporary endeavors. Within the context of NLP, common applications include speech recognition, natural language understanding, and natural language generation.

Historically, investigators enjoyed considerable success in the application of ‘traditional’, application-specific, rule-based methods to NLP problems – this review does not dismiss these profound innovations. Indeed, the application of complex IF-THEN-ELSE logic will readily mirror the essential rules of many human languages – to a point. Application of early graphical models of the size then amenable to contemporary computational capacities (e.g., decision trees and forests) produced key advances within the field of NLP, as well. Now that computational hardware has advanced to the limits of theoretical thinking, machine learning techniques have quickly revolutionized the field, advancing its scope almost weekly. These systems owe their performance to their roots in the stochastic models reviewed within this course. Rather than attempt to circumvent or bypass the prodigious volume of both noise and information produced within the global enterprise of human communications (not to mention the ‘communication’ systems apparent among populations of animals, plants, ecosystems, genetic networks, traffic flow, cosmic phenomena, etc.), the deep learning models reviewed below ‘embrace the horror’ of chaos, even using it to some advantage.

Lecture 24 Applications to CR, NLP and CV (Syntactical Analyses)

Like CR and Computer Vision (reviewed next), NLP operations inevitably hover around the concept of ‘context’, a term used frequently in this course, thus far, without formal definition. Within the realm of human communications, context is everything, as some say. Context is another way of describing the semantics of a communication, whether of a conversation, a textbook, a poem or even a picture (and much more). The identification and appropriate manipulation of contextual space represents another of those stochastic problem domains that provide fertile fields of opportunity for application of many deep learning methods. This is also, generally speaking, the domain of ‘recurrence’ and ‘convolution’ – both of which can provide this essential information, when properly deployed.

Lecture 25 Applications to CR, NLP and CV (Semantic Analysis - Recognition)

After discussion of the previous section, the student might wonder why syntactical methods returned again and again to the concept of context. As previously stated, within a purely syntactical (rule-based) world, context is everything. Indeed, many investigators expend significant resources to distinguish the semantics of a discourse or conversation based purely on analysis of syntactical phenomena. While the motivations for all this effort are probably many, the curious student might consider systems wherein the underlying language is utterly unknown. Simply by identifying a phenomenon as ‘communication’, the investigator assumes its contents somehow represent coherent, meaningful information (not simply a stream of noise). This, in turn, implies the presence of some system of rules, further implying some form of structure.

Lecture 26 Applications to CR, NLP and CV (Motion Analysis - Overview [CNN])

Reviewed in summary within previous sections of this course, ‘Convolutional Neural Networks’ (CNN) currently dominate the field of computer vision (as well as the field of ‘computer audition’, the artificial sense of hearing). As such, these constructs warrant a final detailed description to ensure the student possesses a firm grasp of their essentials prior to completion of this course. Visual (and auditory) information presents an obviously chaotic problem space. Like language, however, every rationally developed image (or video or sound recording) should contain an abundance of structure intermingled within the chaos – provided the scene is not simply a collection of randomized values produced as nonsense (and, even then, it will have structure by virtue of having been ‘recorded’).

Lecture 27 Applications to CR, NLP and CV (Convolutional Layers & Pooling Layers)

Convolutional layers generally act to reduce graph complexity by reducing the number of free parameters within the data space. For example, image size becomes practically irrelevant (as compared to fully connected layers, reviewed below) by tiling (convoluting, scanning, moving) regions of size 5 x 5, each with the same shared weights (again, reviewed below). These iteratively combined convolutional regions represent a series of receptive fields and filters that, more or less, ‘raster’ (pass across – sideways and down, in two-dimensional language) through the input image to scan for interesting, diagnostic features, which the CNN learns during training. ‘Pooling layers’ combine outputs of node clusters at one layer into a single node in the next layer. Two choices for pooling dominate within CNNs – ‘max pooling’ forwards the maximum value from each cluster, while ‘average pooling’ forwards the average cluster value, instead. Choice in this regard is not arbitrary, but depends upon the intended role of a given pooling layer, as well as overall design of the CNN.
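
As a minimal illustration of these two layer types (assuming a TensorFlow-backed Keras installation), the sketch below stacks a 5 x 5 convolutional layer and a max-pooling layer; the 28 x 28 single-channel input shape and filter count are assumptions chosen to match a typical character-recognition image.

```python
# Minimal sketch: a 5 x 5 convolutional layer whose shared weights "raster"
# across the input image, followed by a max-pooling layer that combines each
# 2 x 2 cluster of outputs into a single node.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, kernel_size=(5, 5), activation="relu", input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),   # max pooling; AveragePooling2D would forward cluster means instead
    Flatten(),
    Dense(10, activation="softmax"),  # fully connected output layer (reviewed below)
])
model.summary()
```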

Lecture 28 Applications to CR, NLP and CV (Fully Connected Layers)

This section completes an executive review of several highly innovative domains within the deep learning enterprise. Besides benefiting from the obvious factual information contained herein, this summary discussion of Character Recognition, Natural Language Processing and Computer Vision should improve the student’s technological orientation while polishing and refining a deeper understanding of origins and directions. If ‘true artificial intelligence’ is the eventual outcome of contemporary interests, then the final product must combine all these models, or their descendants, into something unified and coherent.

Lecture 29 Applications to CR, NLP and CV (Lesson Conclusion)

Module 8: Code Base

Lecture 30 Code Base (Lesson Overview & Keras Samples & Python)

Conveniently, Keras (at http://keras.io) includes Python-based access to three popular deep learning ‘back-end’ systems – the software that provides the functional implementation of Keras-directed models: TensorFlow, Microsoft Cognitive Toolkit and Theano. Even better, to support novice developers, this well-managed product presents a variety of fast, easy-to-learn tutorials ready for copy/paste into the Integrated Development Environment (IDE) of your choice – simply run them at your convenience. Even better than better, to support live examination of code (and underlying back-end software) functionality, Keras provides multiple real-world datasets, which the developer may access with trivial effort.
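
As a minimal illustration (assuming a TensorFlow-backed Keras installation), one of these bundled real-world datasets, the MNIST handwritten-digit collection, loads in a couple of lines; it downloads automatically on first use.

```python
# Minimal sketch: load one of the datasets bundled with Keras.
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, y_train.shape)   # (60000, 28, 28) training images and labels
print(x_test.shape, y_test.shape)     # (10000, 28, 28) held-out test set
```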

Lecture 31 Code Base (MLP for OCR)

Recall, Optical Character Recognition (OCR) is the algorithmic process of converting graphical renditions of letters and/or numerals into machine-interpretable equivalents – that is, the conversion of images of typed characters into logical (binary) representations of those characters. Also, recall that an MLP exemplifies construction of a basic “vanilla-flavored” neural network (or graph model).
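
A sketch in the spirit of this sample appears below; it assumes a TensorFlow-backed Keras installation, and its layer widths, dropout rates and epoch count are illustrative choices rather than the course’s exact script.

```python
# Minimal sketch: a plain multi-layer perceptron (MLP) applied to the MNIST
# digit images bundled with Keras.
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 784).astype("float32") / 255.0   # flatten 28x28 images
x_test = x_test.reshape(10000, 784).astype("float32") / 255.0
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

model = Sequential([
    Dense(512, activation="relu", input_shape=(784,)),
    Dropout(0.2),
    Dense(512, activation="relu"),
    Dropout(0.2),
    Dense(10, activation="softmax"),   # one output per digit class
])
model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=128, epochs=5, validation_data=(x_test, y_test))
```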

Lecture 32 Code Base (MLP for Topic Classification)

Recall that an MLP exemplifies construction of a basic “vanilla-flavored” neural network (or graph model). The previous code sample applied an MLP to an Optical Character Recognition problem. This script applies a similarly constructed MLP to a text-based classification task, which might provide a component within a discourse-analysis or cross-reference deployment.
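
The sketch below illustrates the same idea against the Reuters newswire dataset bundled with Keras; it assumes a TensorFlow-backed Keras installation, and the vocabulary size, layer width and epoch count are illustrative assumptions, not the course’s exact script.

```python
# Minimal sketch: a similarly constructed MLP applied to a text-based
# topic-classification task (Reuters newswires, bag-of-words input).
import numpy as np
from tensorflow.keras.datasets import reuters
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical

max_words = 1000
(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=max_words)
num_classes = int(np.max(y_train)) + 1

tokenizer = Tokenizer(num_words=max_words)
x_train = tokenizer.sequences_to_matrix(x_train, mode="binary")   # bag-of-words vectors
x_test = tokenizer.sequences_to_matrix(x_test, mode="binary")
y_train, y_test = to_categorical(y_train, num_classes), to_categorical(y_test, num_classes)

model = Sequential([
    Dense(512, activation="relu", input_shape=(max_words,)),
    Dropout(0.5),
    Dense(num_classes, activation="softmax"),   # one output per news topic
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=32, epochs=5, validation_split=0.1)
```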

Lecture 33 Code Base(RNN for LSTM)

Recall that ‘Long Short-Term Memory’ (LSTM) represents an implementation of a ‘Recurrent Neural Network’ (RNN), which provides temporal (or, as in this case, sequential) context to a model. Implemented against the IMDB movie-review dataset, this script performs Sentiment Analysis by analyzing text-based movie review input for semantic relationships. In this case, recurrence represents context by providing sequential, word-part-based ‘awareness’ within the model.
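
A minimal sketch of such an LSTM sentiment model appears below; it assumes a TensorFlow-backed Keras installation, and the vocabulary size, sequence length and layer sizes are illustrative assumptions rather than the course’s exact script.

```python
# Minimal sketch: an LSTM (recurrent) model applied to the IMDB movie-review
# dataset bundled with Keras for sentiment analysis.
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

max_features, maxlen = 20000, 80
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = pad_sequences(x_train, maxlen=maxlen)   # trim/pad reviews to a fixed length
x_test = pad_sequences(x_test, maxlen=maxlen)

model = Sequential([
    Embedding(max_features, 128),                     # learn word-part representations
    LSTM(128, dropout=0.2, recurrent_dropout=0.2),    # recurrence supplies sequential context
    Dense(1, activation="sigmoid"),                   # positive vs. negative sentiment
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=32, epochs=2, validation_data=(x_test, y_test))
```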

Lecture 34 Code Base (CNN for Text Classification)

The previous sample used a ‘Recurrent Neural Network’ (RNN) model to construct a ‘Long Short-Term Memory’ (LSTM) network used to perform ‘Sentiment Analysis’ against the IMDB movie-review dataset. This script performs a similar analysis, that of ‘Text Classification’, using a convolutional approach implemented as a simple 1D CNN.
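
The sketch below illustrates such a 1D convolutional text classifier; it assumes a TensorFlow-backed Keras installation, and the filter count, kernel size and sequence length are illustrative assumptions, not the course’s exact script.

```python
# Minimal sketch: a simple one-dimensional CNN applied to the same IMDB
# text-classification task.
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dropout, Conv1D, GlobalMaxPooling1D, Dense

max_features, maxlen = 5000, 400
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)

model = Sequential([
    Embedding(max_features, 50),
    Dropout(0.2),
    Conv1D(250, 3, padding="valid", activation="relu"),   # convolve over word sequences
    GlobalMaxPooling1D(),                                  # pool the strongest feature responses
    Dense(250, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=32, epochs=2, validation_data=(x_test, y_test))
```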

Lecture 35 Code Base (CNN for OCR)

A previous code sample performed this OCR task using a Multi-Layer Perceptron (MLP). Both perform the same analysis using vastly different approaches; however, the MLP is more efficient for most such tasks. This example, then, demonstrates a basic CNN, as applied to a real-world problem.
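
A minimal CNN sketch for the same MNIST OCR task appears below; it assumes a TensorFlow-backed Keras installation, and its layer sizes and epoch count are illustrative choices rather than the course’s exact script.

```python
# Minimal sketch: a small convolutional network applied to the same MNIST OCR
# task handled earlier by the MLP.
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0   # add channel dimension
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(128, activation="relu"),
    Dense(10, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=128, epochs=3, validation_data=(x_test, y_test))
```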

Lecture 36 VAE as a GAN

Recall, a primary benefit of a GAN solution is its ability to leverage its adversarial design to automate training of one network as applied against the other, and vice versa. These models have a single input/output set, which essentially represents the input/output graphic. The VAE analyzes its input to develop a ‘style’ (e.g., the basic brush-stroke style of Renoir or Picasso) and then applies this style to a target image – in other words, a properly trained VAE can turn your family photos into priceless classical masterpieces in a single pass using the discriminative-generative halves of its design.

Module 9: Course Conclusion

Lecture 37 Course Conclusion (A Closing Analogy)

Since the belabored student has invested so much time and interest in completing this course, its conclusion will not regurgitate all that has come before. Instead, it ends with yet another crude analogy. A bit like a neighborhood scavenger hunt, the deep learning enterprise is wide open and loaded with goodies! Easy scores abound! The successfully outbound student now stands at the ribbon with his or her knowledge-basket (or pocket book) in hand, ready to compete with countless others, all eager to bag the low-hanging fruit, as they say. Two attributes (features!) will essentially determine individual fortunes – experience and education. Ideally, the student’s embrace of this executive review will generate immediate rewards that lead to long-lasting benefits. In terms of the course, the student now possesses a revitalized ‘discriminative-generative cognitive model’, geared for analytical success. Think of its intended benefits as a bigger basket, better shoes, faster legs and a quicker eye! Now, then... on your marks! Get ready! Go!