Natural Language Processing (NLP) and newer algorithms
Just found this interesting and possibly a way to build a more recent version of the Atom2Vec algorithm for Mathis' charge field.
Recently, NLP prediction improved when models were trained to predict a missing or next word. This proved quite effective for translating one language to another:
-------------
http://jalammar.github.io/illustrated-transformer/
https://github.com/rusty1s/pytorch_geometric
http://geometricdeeplearning.com/
https://github.com/jessevig/bertviz
https://github.com/openai/gpt-2
https://openai.com/resources/
https://towardsdatascience.com/deconstructing-bert-distilling-6-patterns-from-100-million-parameters-b49113672f77
https://github.com/groverpr/Machine-Learning/blob/master/notebooks/06_NLP_Fastai.ipynb
https://towardsdatascience.com/deep-learning-for-image-classification-why-its-challenging-where-we-ve-been-and-what-s-next-93b56948fcef
https://medium.com/@ODSC/best-deep-learning-research-of-2019-so-far-7bea0ed22e38
https://medium.com/huggingface/introducing-fastbert-a-simple-deep-learning-library-for-bert-models-89ff763ad384?source=collection_home---6------3-----------------------
https://www.pyimagesearch.com/2019/06/03/fine-tuning-with-keras-and-deep-learning/
Older model for the Periodic Table: https://github.com/kasimebrahim/atom2vec
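For a feel of what an updated Atom2Vec could look like, here is a minimal sketch of the core idea: treat elements as "words" and compounds as "contexts", then factorize the co-occurrence matrix to get element vectors. The tiny formula list is made up for illustration, not real training data:

```python
# Minimal Atom2Vec-style sketch: learn element "embeddings" from compound
# formulas by factorizing an element/environment co-occurrence matrix.
# The tiny formula list here is illustrative only.
import numpy as np

compounds = [("Na", "Cl"), ("K", "Cl"), ("Na", "Br"), ("K", "Br"),
             ("Mg", "O"), ("Ca", "O"), ("Mg", "S"), ("Ca", "S")]

elements = sorted({e for pair in compounds for e in pair})
idx = {e: i for i, e in enumerate(elements)}

# Co-occurrence matrix: rows are elements, columns are "environments"
# (here simply the partner element in a binary compound).
M = np.zeros((len(elements), len(elements)))
for a, b in compounds:
    M[idx[a], idx[b]] += 1
    M[idx[b], idx[a]] += 1

# SVD gives low-dimensional element vectors; chemically similar elements
# (Na/K, Mg/Ca) should end up close together.
U, S, Vt = np.linalg.svd(M)
vecs = U[:, :2] * S[:2]
for e in elements:
    print(e, np.round(vecs[idx[e]], 3))
```

On real data the "environments" would come from full formulas rather than binary pairs, but the similar-elements-get-similar-vectors effect is the same one the linked repo exploits.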
PyTorch Geometric (PyG) is a geometric deep learning extension library for PyTorch.
It provides various methods for deep learning on graphs and other irregular structures (also known as geometric deep learning), drawn from a variety of published papers. In addition, it offers an easy-to-use mini-batch loader for many small graphs and for single giant graphs, multi-GPU support, a large number of common benchmark datasets (with simple interfaces to create your own), and helpful transforms, both for learning on arbitrary graphs and on 3D meshes or point clouds.
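A minimal PyG sketch, assuming torch and torch_geometric are installed; the toy 4-node graph, features, and labels are invented for illustration:

```python
# Minimal PyTorch Geometric sketch: a two-layer GCN on a toy 4-node graph.
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

# Edges as a [2, num_edges] index tensor (directed; add both directions).
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
                           [1, 0, 2, 1, 3, 2]], dtype=torch.long)
x = torch.randn(4, 8)                 # 4 nodes, 8 features each
y = torch.tensor([0, 0, 1, 1])        # toy node labels
data = Data(x=x, edge_index=edge_index, y=y)

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(8, 16)
        self.conv2 = GCNConv(16, 2)

    def forward(self, data):
        h = F.relu(self.conv1(data.x, data.edge_index))
        return self.conv2(h, data.edge_index)

model = GCN()
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(100):
    opt.zero_grad()
    loss = F.cross_entropy(model(data), data.y)
    loss.backward()
    opt.step()
print(model(data).argmax(dim=1))      # predicted class per node
```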
-------------
In the last decade, Deep Learning approaches (e.g. Convolutional Neural Networks and Recurrent Neural Networks) have achieved unprecedented performance on a broad range of problems from a variety of fields (e.g. Computer Vision and Speech Recognition). Despite these results, research on DL techniques has so far focused mainly on data defined on Euclidean domains (i.e. grids). Nonetheless, in many fields, such as Biology, Physics, Network Science, Recommender Systems, and Computer Graphics, one may have to deal with data defined on non-Euclidean domains (i.e. graphs and manifolds). The adoption of Deep Learning in these fields lagged behind until very recently, primarily because the non-Euclidean nature of the data makes the definition of basic operations (such as convolution) rather elusive. Geometric Deep Learning deals with the extension of Deep Learning techniques to graph/manifold-structured data.
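To make the "convolution on graphs" point concrete, here is a from-scratch sketch of one propagation step using the common normalized-adjacency rule H' = D^{-1/2}(A + I)D^{-1/2}HW (one published recipe among several; the graph and weights below are random):

```python
# From-scratch sketch of one graph-convolution step:
# H' = D^{-1/2} (A + I) D^{-1/2} H W, on a toy 4-node path graph.
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                       # add self-loops
D_inv_sqrt = np.diag(A_hat.sum(1) ** -0.5)  # symmetric degree normalization
H = np.random.randn(4, 8)                   # node features
W = np.random.randn(8, 4)                   # learnable weights (random here)

H_next = D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W
print(H_next.shape)  # (4, 4): each node now mixes its neighbors' features
```

That neighbor-mixing step is the graph analogue of sliding a filter over a grid, which is exactly the operation the paragraph above calls "elusive" on non-Euclidean domains.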
Re: Natural Language Processing (NLP) and newer algorithms
https://thegradient.pub/why-we-released-grover/
https://medium.com/ai2-blog/counteracting-neural-disinformation-with-grover-6cf6690d463b
https://github.com/rowanz/grover
https://grover.allenai.org/
Grover: A State-of-the-Art Defense against Neural Fake News
Online disinformation, or fake news intended to deceive, has emerged as a major societal problem. Currently, fake news articles are written by humans, but recently-introduced AI technology based on Neural Networks might enable adversaries to generate fake news. Our goal is to reliably detect this “neural fake news” so that its harm can be minimized.
To study and detect neural fake news, we built a model named Grover. Our study presents a surprising result: the best way to detect neural fake news is to use a model that is also a generator. The generator is most familiar with its own habits, quirks, and traits, as well as those from similar AI models, especially those trained on similar data, i.e. publicly available news. Our model, Grover, is a generator that can easily spot its own generated fake news articles, as well as those generated by other AIs. In a challenging setting with limited access to neural fake news articles, Grover obtains over 92% accuracy at telling apart human-written from machine-written news. For more information, please read our publication as well as our blog post with additional experiments. For updates, also check out our project page.
Here, we demonstrate how Grover can generate a realistic-looking fake news article, and then detect that it was AI-generated.
To generate a fake news article with Grover, use the ‘Generate’ tab. Fill in some article pieces, and press ‘Generate’ next to the piece you would like to generate. Grover will generate that piece based on the data provided. For instance, if the domain is “nytimes.com”, clicking ‘Generate’ for the Article will generate a fake article body as if it were written for the New York Times.
To detect whether an article was written by Grover or a human, use the ‘Detect’ tab. Fill in the input field with article text, and click ‘Detect Fake News.’
Note that, even if Grover fails to detect a given piece as fake, our findings suggest that releasing many such articles taken together would be relatively easy to spot. Thus, if a source of Neural Fake News disseminates a large number of articles, Grover will be increasingly capable of spotting these articles as malicious.
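The core Grover insight is that a generator knows its own statistical habits. A crude cousin of that idea, sketched below, scores text with a public language model and flags suspiciously low perplexity. This is an illustrative stand-in, not Grover's actual detector, and it assumes the Hugging Face transformers package is installed:

```python
# Detection heuristic related to the Grover idea: machine-generated text
# often has lower perplexity under a language model than human prose.
# NOT Grover itself; assumes the Hugging Face transformers GPT-2 port.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2")
lm.eval()

def perplexity(text):
    ids = tok.encode(text, return_tensors="pt")
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss   # mean cross-entropy per token
    return torch.exp(loss).item()

# A decision threshold would have to be tuned on real labeled data.
print(perplexity("The quick brown fox jumps over the lazy dog."))
```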
----------
https://openai.com/blog/gpt-2-6-month-follow-up/
August 20, 2019
GPT-2: 6-Month Follow-Up
We’re releasing the 774 million parameter GPT-2 language model after the release of our small 124M model in February, staged release of our medium 355M model in May, and subsequent research with partners and the AI community into the model’s potential for misuse and societal benefit. We’re also releasing an open-source legal agreement to make it easier for organizations to initiate model-sharing partnerships with each other, and are publishing a technical report about our experience in coordinating with the wider AI research community on publication norms.
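For anyone who wants to poke at the released 774M model, here is a minimal sampling sketch. It assumes the Hugging Face transformers port, where that checkpoint is published as "gpt2-large" (any smaller checkpoint works the same way):

```python
# Sketch: sampling from the 774M GPT-2 via the Hugging Face transformers
# port ("gpt2-large"). Prompt and sampling settings are illustrative.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2-large")
model = GPT2LMHeadModel.from_pretrained("gpt2-large")

ids = tok.encode("The periodic table can be viewed as", return_tensors="pt")
out = model.generate(ids, max_length=60, do_sample=True, top_k=40,
                     pad_token_id=tok.eos_token_id)
print(tok.decode(out[0], skip_special_tokens=True))
```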
Key things we’ve learned
1. Coordination is difficult, but possible. To date, there hasn’t been a public release of a 1558M parameter language model, though multiple organizations have developed the systems to train them, or have publicly discussed how to train larger models. For example, teams from both NLP developer Hugging Face and the Allen Institute for Artificial Intelligence (AI2) with the University of Washington have explicitly adopted staged release approaches similar to ours. Since February, we’ve spoken with more than five groups who have replicated GPT-2.[1]
[1] Having these conversations is difficult, as it involves talking candidly about proprietary systems, and it’s unclear who to reach out to in specific organizations to discuss such models, or what the appropriate processes are for inter-org discussion about unreleased research.
2. Humans can be convinced by synthetic text. Research from our research partners Sarah Kreps and Miles McCain at Cornell, published in Foreign Affairs, says people find GPT-2 synthetic text samples almost as convincing (72% in one cohort judged the articles to be credible) as real articles from the New York Times (83%).[2] Additionally, research from AI2/UW has shown that news written by a system called “GROVER” can be more plausible than human-written propaganda. These research results make us generally more cautious about releasing language models.
[2] These samples were generated via a “human-in-the-loop” process meant to simulate contemporary disinformation operations, where a human generated samples and periodically selected some for exposure to people.
3. Detection isn’t simple. In practice, we expect detectors to need to detect a significant fraction of generations with very few false positives. Malicious actors may use a variety of sampling techniques (including rejection sampling) or fine-tune models to evade detection methods. A deployed system likely needs to be highly accurate (99.9%–99.99%) on a variety of generations. Our research suggests that current ML-based methods only achieve low to mid–90s accuracy, and that fine-tuning the language models decreases accuracy further. There are promising paths forward (see especially those advocated by the developers of “GROVER”) but it’s a genuinely difficult research problem. We believe that statistical detection of text needs to be supplemented with human judgment and metadata related to the text in order to effectively combat misuse of language models.
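The numbers in point 3 are easier to reason about at a concrete operating point. Below is a minimal sketch (scores and label distributions are synthetic) of choosing a detector threshold that keeps false positives very rare, then checking how much recall survives:

```python
# Sketch: pick a detector threshold for very few false positives, as the
# post argues deployed systems need. Scores and labels are synthetic.
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_true = np.r_[np.ones(1000), np.zeros(1000)]   # 1 = machine-written
scores = np.r_[rng.normal(0.8, 0.1, 1000),      # detector scores, positives
               rng.normal(0.4, 0.1, 1000)]      # detector scores, negatives

prec, rec, thr = precision_recall_curve(y_true, scores)
# Lowest threshold still yielding >= 99.9% precision, and the recall
# (fraction of generations actually caught) remaining at that point.
ok = prec[:-1] >= 0.999
if ok.any():
    i = np.argmax(ok)
    print(f"threshold={thr[i]:.3f} precision={prec[i]:.4f} recall={rec[i]:.3f}")
```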
Re: Natural Language Processing (NLP) and newer algorithms
Here's more info on the ImageNet competition. These algorithms could be used to map Miles' charge field and Nevyn's MBL renderer (current and future versions, including Jared's and Airman's contributions) to valid and invalid atomic structures. Basically, the algorithm could generate all possible molecules found in the real world, and possibly some still undiscovered(?), along with their real-world properties. Personally, I believe a room-temperature superconductor could be found via this approach. Call me crazy, but it is a matter of mining what is valid and invalid both in Miles' terms and in the terms of the current periodic table. An algorithm could possibly cross-walk these structures. What Miles' theory allows in terms of bonding prediction will likely be much richer, as his papers have shown:
https://machinelearningmastery.com/introduction-to-the-imagenet-large-scale-visual-recognition-challenge-ilsvrc/
https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
Re: Natural Language Processing (NLP) and newer algorithms
Keras is also available:
https://www.learnopencv.com/keras-tutorial-using-pre-trained-imagenet-models/
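The gist of that tutorial, sketched below with a pretrained ImageNet model. The image path is a placeholder, and the tutorial itself uses standalone Keras rather than tf.keras, so treat this as an approximation:

```python
# Sketch: classify an image with a Keras model pretrained on ImageNet.
# "elephant.jpg" is a placeholder path, not a file shipped with anything.
import numpy as np
from tensorflow.keras.applications.resnet50 import (ResNet50, preprocess_input,
                                                    decode_predictions)
from tensorflow.keras.preprocessing import image

model = ResNet50(weights="imagenet")          # downloads pretrained weights

img = image.load_img("elephant.jpg", target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])    # (class_id, name, probability)
```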
NLP's ImageNet moment has arrived (12 July 2018)
http://ruder.io/nlp-imagenet/
http://image-net.org/challenges/LSVRC/
....
So far, our argument for language modeling as a pretraining task has been purely conceptual. Pretraining a language model was first proposed in 2015 [26:1], but it remained unclear whether a single pretrained language model was useful for many tasks. In recent months, we finally obtained overwhelming empirical proof: Embeddings from Language Models (ELMo), Universal Language Model Fine-tuning (ULMFiT), and the OpenAI Transformer have empirically demonstrated how language modeling can be used for pretraining, as shown by the above figure from ULMFiT. All three methods employed pretrained language models to achieve state-of-the-art on a diverse range of tasks in Natural Language Processing, including text classification, question answering, natural language inference, coreference resolution, sequence labeling, and many others.
In many cases, such as with ELMo in the figure below, these improvements ranged from 10 to 20% over the state-of-the-art on widely studied benchmarks, all with the single core method of leveraging a pretrained language model. ELMo furthermore won the best paper award at NAACL-HLT 2018, one of the top conferences in the field. Finally, these models have been shown to be extremely sample-efficient, achieving good performance with only hundreds of examples, and are even able to perform zero-shot learning.
....
In light of this step change, it is very likely that in a year’s time NLP practitioners will download pretrained language models rather than pretrained word embeddings for use in their own models, similarly to how pre-trained ImageNet models are the starting point for most CV projects nowadays.
However, similar to word2vec, the task of language modeling naturally has its own limitations: It is only a proxy to true language understanding, and a single monolithic model is ill-equipped to capture the required information for certain downstream tasks. For instance, in order to answer questions about or follow the trajectory of characters in a story, a model needs to learn to perform anaphora or coreference resolution. In addition, language models can only capture what they have seen. Certain types of information, such as most common sense knowledge, are difficult to learn from text alone[34] and require incorporating external information.
One outstanding question is how to transfer the information from a pre-trained language model to a downstream task. The two main paradigms for this are whether to use the pre-trained language model as a fixed feature extractor and incorporate its representation as features into a randomly initialized model as used in ELMo, or whether to fine-tune the entire language model as done by ULMFiT. The latter fine-tuning approach is what is typically done in CV where either the top-most or several of the top layers are fine-tuned. While NLP models are typically more shallow and thus require different fine-tuning techniques than their vision counterparts, recent pretrained models are getting deeper. The next months will show the impact of each of the core components of transfer learning for NLP: an expressive language model encoder such as a deep BiLSTM or the Transformer, the amount and nature of the data used for pretraining, and the method used to fine-tune the pretrained model.
[Figure: The improvements ELMo achieved on a wide range of NLP tasks. Source: Matthew Peters]
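The feature-extraction vs. fine-tuning contrast described above fits in a few lines of PyTorch. A hedged sketch, using a BERT encoder via transformers as a stand-in for "a pretrained language model" (the learning rates are illustrative):

```python
# Sketch of the two transfer paradigms: (a) frozen feature extractor
# (ELMo-style) vs. (b) full fine-tuning (ULMFiT-style). BERT is used here
# only as a convenient pretrained encoder; rates are illustrative.
import torch
from transformers import BertModel

encoder = BertModel.from_pretrained("bert-base-uncased")
head = torch.nn.Linear(encoder.config.hidden_size, 2)  # new task classifier

# (a) Fixed feature extractor: freeze the pretrained weights and train
# only the randomly initialized task head.
for p in encoder.parameters():
    p.requires_grad = False
feature_opt = torch.optim.Adam(head.parameters(), lr=1e-3)

# (b) Fine-tune everything, typically with a much smaller learning rate
# for the pretrained layers than for the new head.
for p in encoder.parameters():
    p.requires_grad = True
finetune_opt = torch.optim.Adam([
    {"params": encoder.parameters(), "lr": 2e-5},
    {"params": head.parameters(), "lr": 1e-3},
])
```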
Re: Natural Language Processing (NLP) and newer algorithms
BagNet – Solving ImageNet with a Simple Bag-of-features Model
Posted on February 14, 2019 by Ran Reichman
Prior to 2012, most machine learning algorithms were statistical models which used hand-created features. The models were highly explainable and somewhat effective but failed to reach a high accuracy in many language and computer vision tasks. In 2012, AlexNet, a deep neural network model, won the 2012 ImageNet competition by a large margin, and ignited the deep learning revolution of the past 6 years.
Deep learning models have proven to be significantly more accurate than standard ML algorithms, presumably because of their ability to ‘intuitively’ understand a concept without receiving hand-created features that characterize it. Unfortunately, due to this ‘intuitive’ understanding, deep learning models suffer from an explainability problem. It’s difficult to understand how a deep learning algorithm reached its conclusion, and accordingly, why it made a mistake when it did.
https://openreview.net/pdf?id=SkfMWhAqYQ
BagNet, a new paper from University of Tübingen (Germany), sheds new light on the tradeoff between accuracy and explainability in machine learning. It presents a model which achieves state-of-the-art results on ImageNet for non-deep learning models, comparable to results achieved by VGG-16 and surpassing AlexNet. The result could provide new insights into the capabilities of non-deep learning algorithms, and set a higher standard for both ML algorithms and challenges.
https://www.lyrn.ai/2019/02/14/bagnet-imagenet-with-a-simple-bof-model/
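The BagNet recipe itself is simple enough to sketch: restrict the receptive field to small local patches, classify each patch independently, and average the class evidence over all patch locations. The stand-in model below is untrained and schematic, not the paper's actual architecture:

```python
# Schematic of the BagNet idea (not the paper's code): per-patch class
# evidence, spatially averaged into an image-level prediction.
import torch
import torch.nn as nn

num_classes, patch = 10, 9               # 9x9 patches, as in BagNet-9

patch_classifier = nn.Sequential(        # untrained stand-in local model
    nn.Conv2d(3, 64, kernel_size=patch), # receptive field = one patch
    nn.ReLU(),
    nn.Conv2d(64, num_classes, kernel_size=1),
)

img = torch.randn(1, 3, 224, 224)
patch_logits = patch_classifier(img)     # [1, C, H', W']: per-patch evidence
logits = patch_logits.mean(dim=(2, 3))   # average over all patch locations
print(logits.shape)                      # [1, 10] image-level prediction
```

Because each patch is judged in isolation and only the averages matter, you can read off exactly which patches drove a decision, which is the explainability angle the article highlights.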
Re: Natural Language Processing (NLP) and newer algorithms
BTW, Databricks has a pretty good intro presentation here on using GraphFrames and Keras/TensorFlow:
https://pages.databricks.com/rs/094-YMS-629/images/Keras%20MNIST%20CNN.html
On-Time Flight Performance with GraphFrames for Apache Spark
https://mbostock.github.io/d3/talk/20111116/airports.html
https://www.slideshare.net/databricks/introduction-to-neural-networks-122033415
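In the spirit of the Keras MNIST CNN notebook linked above, a compact runnable sketch (hyperparameters are illustrative, not necessarily the notebook's exact ones):

```python
# Minimal Keras CNN on MNIST, in the spirit of the linked notebook.
from tensorflow.keras import layers, models, datasets

(x_tr, y_tr), (x_te, y_te) = datasets.mnist.load_data()
x_tr = x_tr[..., None] / 255.0           # add channel dim, scale to [0, 1]
x_te = x_te[..., None] / 255.0

model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_tr, y_tr, epochs=1, batch_size=128,
          validation_data=(x_te, y_te))
```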
Why Graph?
The reason for using graph structures is that they offer a more intuitive approach to many classes of data problems: social networks, restaurant recommendations, or flight paths. It is easier to understand these data problems within the context of graph structures: vertices, edges, and properties. For example, flight data analysis is a classic graph problem (see the sketch after this list):
- airports are represented by vertices
- flights are represented by edges
- numerous properties associated with these flights, including but not limited to departure delays, plane type, and carrier, are attached to the edges
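Here is the sketch promised above, mapping that vertex/edge description to GraphFrames code. It assumes a Spark session with the graphframes package available; the three airports and flights are toy data:

```python
# Sketch of the flight-graph setup described above, using GraphFrames.
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("flights").getOrCreate()

# Vertices need an "id" column; edges need "src" and "dst" columns.
airports = spark.createDataFrame(
    [("SEA", "Seattle"), ("SFO", "San Francisco"), ("JFK", "New York")],
    ["id", "city"])
flights = spark.createDataFrame(
    [("SEA", "SFO", 12, "DL"), ("SFO", "JFK", -3, "UA"),
     ("JFK", "SEA", 45, "AA")],
    ["src", "dst", "delay", "carrier"])

g = GraphFrame(airports, flights)
g.edges.filter("delay > 0").show()   # delayed flights
g.inDegrees.show()                   # arrivals per airport
```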
https://dennyglee.com/2016/05/28/on-time-flight-performance-with-graphframes-for-apache-spark/
https://databricks.com/blog/2016/03/03/introducing-graphframes.html
https://databricks.com/try-databricks
https://databricks.com/mlflow
https://github.com/mlflow/mlflow/tree/master/mlflow/R/mlflow