Generate Fake News using GPT-2

Make sure the GPU is enabled (Edit -> Notebook settings -> Hardware accelerator: GPU).
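To confirm the runtime actually sees a GPU before starting, a quick check like the one below can be run first (a minimal sketch; tf.test.gpu_device_name() is a standard TensorFlow call).

In [ ]:
import tensorflow as tf

# Prints something like '/device:GPU:0'; an empty string means no GPU is attached.
print(tf.test.gpu_device_name())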

1 Preparation

In [1]:
!git clone https://github.com/nshepperd/gpt-2.git
Cloning into 'gpt-2'...
remote: Enumerating objects: 212, done.
remote: Total 212 (delta 0), reused 0 (delta 0), pack-reused 212
Receiving objects: 100% (212/212), 4.37 MiB | 14.07 MiB/s, done.
Resolving deltas: 100% (112/112), done.
In [6]:
cd /content/gpt-2/
/content/gpt-2
In [3]:
! pip install -r requirements.txt
Requirement already satisfied: fire>=0.1.3 in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 1)) (0.1.3)
Requirement already satisfied: regex==2017.4.5 in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 2)) (2017.4.5)
Requirement already satisfied: requests==2.21.0 in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 3)) (2.21.0)
Requirement already satisfied: tqdm==4.31.1 in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 4)) (4.31.1)
Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from fire>=0.1.3->-r requirements.txt (line 1)) (1.11.0)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests==2.21.0->-r requirements.txt (line 3)) (1.22)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests==2.21.0->-r requirements.txt (line 3)) (2019.3.9)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests==2.21.0->-r requirements.txt (line 3)) (3.0.4)
Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests==2.21.0->-r requirements.txt (line 3)) (2.6)
In [7]:
! python download_model.py 117M
Fetching checkpoint: 1.00kit [00:00, 887kit/s]                                                      
Fetching encoder.json: 1.04Mit [00:00, 40.7Mit/s]                                                   
Fetching hparams.json: 1.00kit [00:00, 1.01Mit/s]                                                   
Fetching model.ckpt.data-00000-of-00001: 498Mit [00:07, 67.7Mit/s]                                  
Fetching model.ckpt.index: 6.00kit [00:00, 5.18Mit/s]                                               
Fetching model.ckpt.meta: 472kit [00:00, 45.0Mit/s]                                                 
Fetching vocab.bpe: 457kit [00:00, 47.8Mit/s]                                                       
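As a quick sanity check, the downloaded checkpoint and vocabulary files should now sit under models/117M in the repo root (a minimal sketch, assuming download_model.py's default models/117M layout).

In [ ]:
import os

# Expect checkpoint, encoder.json, hparams.json, model.ckpt.* and vocab.bpe here.
print(sorted(os.listdir('models/117M')))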

2 Generate fake news

In [8]:
cd /content/gpt-2/src
/content/gpt-2/src
In [0]:
import json
import os
import numpy as np
import tensorflow as tf

import model, sample, encoder

raw_text = ''

def interact_model(
    model_name='117M',
    seed=None,
    nsamples=1,
    batch_size=1,
    length=None,
    temperature=1,
    top_k=0,
):
    """
    Interactively run the model
    :model_name=117M : String, which model to use
    :seed=None : Integer seed for random number generators, fix seed to reproduce
     results
    :nsamples=1 : Number of samples to return total
    :batch_size=1 : Number of batches (only affects speed/memory).  Must divide nsamples.
    :length=None : Number of tokens in generated text, if None (default), is
     determined by model hyperparameters
    :temperature=1 : Float value controlling randomness in Boltzmann
     distribution. Lower temperature results in less random completions. As the
     temperature approaches zero, the model will become deterministic and
     repetitive. Higher temperature results in more random completions.
    :top_k=0 : Integer value controlling diversity. 1 means only 1 word is
     considered for each step (token), resulting in deterministic completions,
     while 40 means 40 words are considered at each step. 0 (default) is a
     special setting meaning no restrictions. 40 generally is a good value.
    """
    global raw_text
    if batch_size is None:
        batch_size = 1
    assert nsamples % batch_size == 0

    enc = encoder.get_encoder(model_name)
    hparams = model.default_hparams()
    with open(os.path.join('models', model_name, 'hparams.json')) as f:
        hparams.override_from_dict(json.load(f))

    if length is None:
        length = hparams.n_ctx // 2
    elif length > hparams.n_ctx:
        raise ValueError("Can't get samples longer than window size: %s" % hparams.n_ctx)

    with tf.Session(graph=tf.Graph()) as sess:
        context = tf.placeholder(tf.int32, [batch_size, None])
        np.random.seed(seed)
        tf.set_random_seed(seed)
        output = sample.sample_sequence(
            hparams=hparams, length=length,
            context=context,
            batch_size=batch_size,
            temperature=temperature, top_k=top_k
        )

        saver = tf.train.Saver()
        ckpt = tf.train.latest_checkpoint(os.path.join('models', model_name))
        saver.restore(sess, ckpt)

        # while True:
        raw_text = input("Article title >>> ")
        while not raw_text:
            print('Title should not be empty!')
            raw_text = input("Article title >>> ")
        context_tokens = enc.encode(raw_text)
        generated = 0
        for _ in range(nsamples // batch_size):
            out = sess.run(output, feed_dict={
                context: [context_tokens for _ in range(batch_size)]
            })[:, len(context_tokens):]
            for i in range(batch_size):
                generated += 1
                text = enc.decode(out[i])
                print("=" * 40 + " SAMPLE " + str(generated) + " " + "=" * 40)
                with open('out.txt', 'w') as out_file:
                    out_file.write(text)
                print(text)
        print("=" * 80)
In [10]:
cd ..
/content/gpt-2

The checkpoint and encoder are loaded from models/117M relative to the repo root, which is why we changed back to /content/gpt-2 above. When prompted, enter the title of the fake news (e.g. Donald Trump was caught shoplifting from Abercrombie and Fitch on Hollywood Boulevard today.).

In [11]:
interact_model(top_k=40)
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /content/gpt-2/src/sample.py:51: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /content/gpt-2/src/sample.py:53: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.random.categorical instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/117M/model.ckpt
Article title >>> Donald Trump was caught shoplifting from Abercrombie and Fitch on Hollywood Boulevard today.
======================================== SAMPLE 1 ========================================


Trump's girlfriend, Lizzie Meyer, said she knew of Trump's purchase and it happened between the time the actress was on air and around 9:20 p.m. ET. A photographer with an AP photo booth spotted her, who went inside her home in Manhattan to check on her. "I see Trump walk up to her house," she said.

Hollywood real estate manager Tom Schaffer, who has worked with Trump multiple times, said on the show that Trump used "a great combination of good looking, a very pleasant personality and a good deal of success" and that "the other guy did just great."

Trump has a past in movies for making fun of women in public. In 2004, after the release of the 1995 Oscar-winning "Lincoln," Trump told fans that his previous efforts to "redefine the way he behaves around the world" failed, according to the Hollywood Reporter. (That was before Trump became president.) He told The Wall Street Journal that Trump's other previous films included "The One with the Great Hall of the Chief Engineer, 'I Wish I Was Your President,'" "One Flew Over the Cuckoo's Nest," "I Can Fly," "The Man in the High-Flying Chair" and "JFK's Last Ride." (He has also been known for appearing in Disney movies.)<|endoftext|>By Michael Pimentel and Robert Cattaneo

Washington (CNN) — The GOP has lost the Senate election.

But the party's top leader is still up for reelection next year. "Our best hope for 2018 is now that the Supreme Court decision has been overturned," said Senate Majority Leader Mitch McConnell. "If it wasn't, we could have won the Senate, now that the court case has been upheld, but it seems that the president, rather than some outside force, decided what is in the best interest of the country."

"And this is a political reality for most of the country as we know it," said McConnell. "Democrats believe it must be." For them, it's the end result of a long, long struggle.

The party's political and financial elite are divided over whether the Senate is a free-fall or a rigged election. At their most cynical, Republicans maintain a hold over the GOP in Senate races and hope to pick up one seat if Republicans lose by 5-to-1. Democrats have an advantage. Their political calculations have been made that they need a majority of just one Senate
================================================================================
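For reproducible or shorter samples, the other arguments documented in the docstring can be passed as well; the values below are just an example.

In [ ]:
# Fix the RNG seed and cap the sample at 200 tokens, keeping top_k=40.
interact_model(top_k=40, temperature=0.8, seed=42, length=200)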
In [12]:
! pip install names
Requirement already satisfied: names in /usr/local/lib/python3.6/dist-packages (0.3.0)
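The names package generates random person names; a quick look at what get_full_name() returns (output differs on every run).

In [ ]:
import names

# Returns a random "First Last" string; a gender can optionally be specified.
print(names.get_full_name())
print(names.get_full_name(gender='female'))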
In [0]:
import names
import datetime


title = '# ' + raw_text[:-1]        # drop the trailing period from the entered title

reporter1 = names.get_full_name()   # random byline names
reporter2 = names.get_full_name()

time = datetime.datetime.now()

# Read back the sample written by interact_model and cut it at its last complete line.
with open('out.txt', 'r') as in_file:
    text = in_file.read()
idx = text.rfind('\n')

full_text = (title + '  \n\n'
             + '**By ' + reporter1 + ' and ' + reporter2 + '**  \n'
             + str(time)[:10] + '  \n\n'
             + '**(FNN) -**' + text[1:idx])

with open('fakenews.md', 'w') as out_file:
    out_file.write(full_text)

The generated fake news is saved in fakenews.md.
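To preview the article directly in the notebook, the Markdown file can be rendered with IPython's display utilities (a minimal sketch; IPython.display ships with Colab).

In [ ]:
from IPython.display import Markdown, display

# Render the generated article with its title, byline and date formatted.
with open('fakenews.md') as f:
    display(Markdown(f.read()))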