
Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. CRFs fall into the sequence modeling family. Whereas a discrete classifier predicts a label for a single sample without considering "neighboring" samples, a CRF can take context into account; e.g., the linear-chain CRF, which is popular in natural language processing, predicts sequences of labels for sequences of input samples.

CRFs are a type of discriminative undirected probabilistic graphical model. They are used to encode known relationships between observations and construct consistent interpretations, and are often applied to the labeling or parsing of sequential data, such as natural language text or biological sequences, as well as in computer vision. Specifically, CRFs find applications in part-of-speech (POS) tagging, shallow parsing, named entity recognition, gene finding, and peptide critical functional region finding, among other tasks, as an alternative to the related hidden Markov models (HMMs). In computer vision, CRFs are often used for object recognition and image segmentation.


Description

Lafferty, McCallum and Pereira define a CRF on observations $\boldsymbol{X}$ and random variables $\boldsymbol{Y}$ as follows:

Let $G = (V, E)$ be a graph such that $\boldsymbol{Y} = (\boldsymbol{Y}_v)_{v \in V}$, so that $\boldsymbol{Y}$ is indexed by the vertices of $G$. Then $(\boldsymbol{X}, \boldsymbol{Y})$ is a conditional random field when the random variables $\boldsymbol{Y}_v$, conditioned on $\boldsymbol{X}$, obey the Markov property with respect to the graph:

$$p(\boldsymbol{Y}_v \mid \boldsymbol{X}, \boldsymbol{Y}_w, w \neq v) = p(\boldsymbol{Y}_v \mid \boldsymbol{X}, \boldsymbol{Y}_w, w \sim v),$$

where $w \sim v$ means that $w$ and $v$ are neighbors in $G$.

What this means is that a CRF is an undirected graphical model whose nodes can be divided into exactly two disjoint sets $\boldsymbol{X}$ and $\boldsymbol{Y}$, the observed and output variables, respectively; the conditional distribution $p(\boldsymbol{Y} \mid \boldsymbol{X})$ is then modeled.

Inference

For general graphs, the problem of exact inference in CRFs is intractable. The inference problem for a CRF is basically the same as for an MRF and the same arguments hold. However, there exist special cases for which exact inference is feasible:

  • If the graph is a chain or a tree, message passing algorithms yield exact solutions. The algorithms used in these cases are analogous to the forward-backward and Viterbi algorithms for the case of HMMs; a minimal decoding sketch follows this list.
  • If the CRF only contains pair-wise potentials and the energy is submodular, combinatorial min cut/max flow algorithms yield exact solutions.
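
As a concrete illustration of the chain case, the following is a minimal sketch of Viterbi-style exact decoding for a linear-chain CRF, assuming the per-position and transition log-potentials have already been computed; the array and function names are illustrative, not from any particular library.

```python
import numpy as np

def viterbi_decode(unary, pairwise):
    """Exact MAP inference for a linear-chain CRF.

    unary:    (T, K) per-position label scores (log-potentials)
    pairwise: (K, K) transition scores between adjacent labels
    Returns the highest-scoring label sequence as a list of length T.
    """
    T, K = unary.shape
    score = unary[0].copy()                # best score ending in each label at t = 0
    backptr = np.zeros((T, K), dtype=int)  # argmax predecessors for backtracking
    for t in range(1, T):
        # candidate[i, j]: best path ending in label i at t-1, then moving to j
        candidate = score[:, None] + pairwise + unary[t][None, :]
        backptr[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    path = [int(score.argmax())]           # best final label
    for t in range(T - 1, 0, -1):          # trace pointers back to the start
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Toy usage: 4 positions, 3 labels, random log-potentials
rng = np.random.default_rng(0)
print(viterbi_decode(rng.normal(size=(4, 3)), rng.normal(size=(3, 3))))
```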

If exact inference is impossible, several algorithms can be used to obtain approximate solutions. These include:

  • Loopy belief propagation
  • Alpha expansion
  • Mean field inference
  • Linear programming relaxations

Parameter learning

Learning the parameters $\theta$ is usually done by maximum likelihood learning for $p(Y_i \mid X_i; \theta)$. If all nodes have exponential family distributions and all nodes are observed during training, this optimization is convex. It can be solved, for example, with gradient descent algorithms or quasi-Newton methods such as the L-BFGS algorithm. On the other hand, if some variables are unobserved, the inference problem has to be solved for these variables first. Since exact inference is intractable in general graphs, approximations have to be used.
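
To make this concrete, here is a minimal sketch of the per-sequence objective for a linear-chain CRF, assuming fixed per-position and transition score arrays (illustrative names; in practice these scores are linear in $\theta$, which is what makes the fully observed problem convex):

```python
import numpy as np
from scipy.special import logsumexp

def crf_log_likelihood(unary, pairwise, labels):
    """Log-likelihood of one gold label sequence under a linear-chain CRF.

    unary:    (T, K) per-position label scores
    pairwise: (K, K) transition scores
    labels:   length-T list of gold label indices
    """
    T, K = unary.shape
    # Unnormalized score of the observed sequence
    gold = unary[0, labels[0]]
    for t in range(1, T):
        gold += pairwise[labels[t - 1], labels[t]] + unary[t, labels[t]]
    # Log partition function Z via the forward algorithm
    alpha = unary[0].copy()
    for t in range(1, T):
        alpha = unary[t] + logsumexp(alpha[:, None] + pairwise, axis=0)
    return gold - logsumexp(alpha)
```

Summing this quantity over the training set and handing its negation (plus a regularizer) to an off-the-shelf optimizer such as scipy.optimize.minimize(method="L-BFGS-B") matches the training procedure described above.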

Examples

In sequence modeling, the graph of interest is usually a chain graph. An input sequence of observed variables $X$ represents a sequence of observations, and $Y$ represents a hidden (or unknown) state variable that needs to be inferred given the observations. The $Y_i$ are structured to form a chain, with an edge between each $Y_{i-1}$ and $Y_i$. As well as having a simple interpretation of the $Y_i$ as "labels" for each element in the input sequence, this layout admits efficient algorithms for:

  • model training, learning the conditional distributions between the $Y_i$ and feature functions from some corpus of training data.
  • decoding, determining the probability of a given label sequence $Y$ given $X$.
  • inference, determining the most likely label sequence $Y$ given $X$.

The conditional dependency of each $Y_i$ on $X$ is defined through a fixed set of feature functions of the form $f(i, Y_{i-1}, Y_i, X)$, which can informally be thought of as measurements on the input sequence that partially determine the likelihood of each possible value for $Y_i$. The model assigns each feature a numerical weight and combines them to determine the probability of a certain value for $Y_i$.
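
Concretely, in the standard linear-chain parameterization these pieces are combined log-linearly: each feature function $f_k$ gets a weight $\lambda_k$, and

$$p(Y \mid X) = \frac{1}{Z(X)} \exp\!\left( \sum_{i} \sum_{k} \lambda_k\, f_k(i, Y_{i-1}, Y_i, X) \right),$$

where the normalizer $Z(X)$ sums the same exponentiated score over all candidate label sequences, so that $p(Y \mid X)$ is a proper conditional distribution.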

Linear-chain CRFs have many of the same applications as conceptually simpler hidden Markov models (HMMs), but relax certain assumptions about the input and output sequence distributions. An HMM can loosely be understood as a CRF with very specific feature functions that use constant probabilities to model state transitions and emissions. Conversely, a CRF can loosely be understood as a generalization of an HMM that makes the constant transition probabilities into arbitrary functions that vary across the positions in the sequence of hidden states, depending on the input sequence.

Notably, in contrast to HMMs, CRFs can contain any number of feature functions, the feature functions can inspect the entire input sequence $X$ at any point during inference, and the range of the feature functions need not have a probabilistic interpretation.
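
As an illustration of this correspondence, the sketch below encodes an HMM as linear-chain CRF features: one indicator feature per transition and per emission, with the corresponding log-probability as its fixed weight. The function and argument names are illustrative assumptions, not a standard API.

```python
import math

def hmm_as_crf_features(trans_prob, emit_prob):
    """Return (feature_function, weight) pairs replicating an HMM's scores.

    trans_prob: dict mapping (prev_label, label) -> probability
    emit_prob:  dict mapping (label, word) -> probability
    """
    features = []
    for (a, b), p in trans_prob.items():
        # Fires whenever label a is immediately followed by label b.
        features.append((lambda i, y_prev, y, x, a=a, b=b:
                         float(y_prev == a and y == b), math.log(p)))
    for (s, w), p in emit_prob.items():
        # Fires whenever word w is tagged with label s.
        features.append((lambda i, y_prev, y, x, s=s, w=w:
                         float(y == s and x[i] == w), math.log(p)))
    return features
```

Summing weight times feature value over all positions reproduces the HMM's joint log-score (up to the handling of initial-state probabilities), which is one way to see the generalization claim above.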





Variants

Higher-order CRFs and semi-Markov CRFs

CRFs can be extended into higher-order models by making each $Y_i$ dependent on a fixed number $o$ of previous variables $Y_{i-o}, \ldots, Y_{i-1}$. In conventional formulations of higher-order CRFs, training and inference are only practical for small values of $o$ (such as $o \leq 5$), since their computational cost increases exponentially with $o$.

However, a more recent advance has managed to ameliorate these issues by leveraging concepts and tools from the field of Bayesian nonparametrics. Specifically, the CRF-infinity approach constitutes a CRF-type model that is capable of learning infinitely long temporal dynamics in a scalable fashion. This is effected by introducing a novel potential function for CRFs that is based on the Sequence Memoizer (SM), a nonparametric Bayesian model for learning infinitely long dynamics in sequential observations. To render such a model computationally tractable, CRF-infinity employs a mean-field approximation of the postulated novel potential functions (which are driven by an SM). This allows for devising efficient approximate training and inference algorithms for the model without undermining its capability to capture and model temporal dependencies of arbitrary length.

There exists another generalization of CRFs, the semi-Markov conditional random field (semi-CRF), which models variable-length segmentations of the label sequence $Y$. This provides much of the power of higher-order CRFs to model long-range dependencies of the $Y_i$, at a reasonable computational cost.

Finally, large-margin models for structured prediction, such as the structured support vector machine, can be seen as providing an alternative training procedure to CRFs.

Latent-dynamic conditional random field

Latent-dynamic conditional random fields (LDCRF), or discriminative probabilistic latent variable models (DPLVM), are a type of CRF for sequence tagging tasks. They are latent variable models that are trained discriminatively.

In an LDCRF, as in any sequence tagging task, given a sequence of observations $\mathbf{x} = x_1, \ldots, x_n$, the main problem the model must solve is how to assign a sequence of labels $\mathbf{y} = y_1, \ldots, y_n$ from one finite set of labels $Y$. Instead of directly modeling $P(\mathbf{y} \mid \mathbf{x})$ as an ordinary linear-chain CRF would do, a set of latent variables $\mathbf{h}$ is "inserted" between $\mathbf{x}$ and $\mathbf{y}$ using the chain rule of probability:

$$P(\mathbf{y} \mid \mathbf{x}) = \sum_{\mathbf{h}} P(\mathbf{y} \mid \mathbf{h}, \mathbf{x})\, P(\mathbf{h} \mid \mathbf{x})$$

This allows capturing latent structure between the observations and labels. While LDCRFs can be trained using quasi-Newton methods, a specialized version of the perceptron algorithm called the latent-variable perceptron has been developed for them as well, based on Collins' structured perceptron algorithm. These models find applications in computer vision, specifically gesture recognition from video streams, as well as in shallow parsing.
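
As a toy illustration of this marginalization, the brute-force sketch below sums over every latent sequence explicitly; it is exponential in the sequence length and purely illustrative (real implementations exploit the chain structure with dynamic programming), and the callable names are assumptions.

```python
from itertools import product

def ldcrf_marginal(p_y_given_hx, p_h_given_x, latent_states, n):
    """P(y|x) = sum over latent sequences h of P(y|h,x) * P(h|x).

    p_y_given_hx: callable h -> P(y | h, x), for the fixed (x, y) of interest
    p_h_given_x:  callable h -> P(h | x)
    Enumerates all len(latent_states)**n latent sequences of length n.
    """
    return sum(p_y_given_hx(h) * p_h_given_x(h)
               for h in product(latent_states, repeat=n))
```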




Software

This is a partial list of software packages that implement generic CRF tools.

  • RNNSharp: CRFs based on recurrent neural networks (C#, .NET)
  • CRF-ADF: Linear-chain CRFs with fast online ADF training (C#, .NET)
  • CRFSharp: Linear-chain CRFs (C#, .NET)
  • GCO: CRFs with submodular energy functions (C++, Matlab)
  • DGM: General CRFs (C++)
  • GRMM: General CRFs (Java)
  • factorie: General CRFs (Scala)
  • CRFall: General CRFs (Matlab)
  • Sarawagi's CRF: Linear-chain CRFs (Java)
  • HCRF library: Hidden-state CRFs (C++, Matlab)
  • Accord.NET: Linear-chain CRFs, HCRFs and HMMs (C#, .NET)
  • Wapiti: Fast linear-chain CRFs (C)
  • CRFSuite: Fast restricted linear-chain CRFs (C)
  • CRF++: Linear-chain CRFs (C++)
  • FlexCRFs: First-order and second-order Markov CRFs (C++)
  • crf-chain1: First-order, linear-chain CRFs (Haskell)
  • imageCRF: CRFs for segmenting images and image volumes (C++)
  • MALLET: Linear-chain CRFs for sequence tagging (Java)
  • PyStruct: Structured learning in Python (Python)
  • Pycrfsuite: A Python binding for CRFSuite (Python)
  • Figaro: Probabilistic programming language capable of defining CRFs and other graphical models (Scala)
  • CRF: Modeling and computational tools for CRFs and other undirected graphical models (R)
  • OpenGM: Library for discrete factor graph models and distributive operations on these models (C++)
  • UPGMpp: Library for building, training, and performing inference with undirected graphical models (C++)
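
As a brief example of how one of these toolkits is typically driven, here is a minimal sketch using the python-crfsuite binding (Pycrfsuite above). The toy feature strings, hyperparameters, and file name are illustrative, and API details may vary across versions.

```python
import pycrfsuite

# Toy training data: each token's features are "name=value" strings.
X_train = [[["word=dogs", "pos=NNS"], ["word=bark", "pos=VBP"]]]
y_train = [["B-NP", "B-VP"]]

trainer = pycrfsuite.Trainer(verbose=False)
for xseq, yseq in zip(X_train, y_train):
    trainer.append(xseq, yseq)
trainer.set_params({"c1": 0.1, "c2": 0.01, "max_iterations": 50})
trainer.train("toy.crfsuite")  # writes the trained model to disk

tagger = pycrfsuite.Tagger()
tagger.open("toy.crfsuite")
print(tagger.tag([["word=dogs", "pos=NNS"], ["word=bark", "pos=VBP"]]))
```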

This is a partial list of software packages that implement CRF-related tools.

  • Conrad: CRF-based gene predictor (Java)
  • Stanford NER: Named entity recognizer (Java)
  • BANNER: Named entity recognizer (Java)



See also

  • Hammersley-Clifford theorem
  • Graphical model
  • Markov random field
  • Maximum entropy Markov model (MEMM)





Further reading

  • McCallum, A.: Efficiently inducing features of conditional random fields. In: Proc. 19th Conference on Uncertainty in Artificial Intelligence (2003).
  • Wallach, H. M.: Conditional random fields: An introduction. Technical report MS-CIS-04-21, University of Pennsylvania (2004).
  • Sutton, C., McCallum, A.: An Introduction to Conditional Random Fields for Relational Learning. In: Getoor, L., Taskar, B. (eds.): Introduction to Statistical Relational Learning. MIT Press (2006).
  • Klinger, R., Tomanek, K.: Classical Probabilistic Models and Conditional Random Fields. Algorithm Engineering Report TR07-2-013, Department of Computer Science, Dortmund University of Technology (December 2007). ISSN 1864-4503.

Source of the article: Wikipedia
