Primed Toolkit
==============


.. image:: https://travis-ci.org/eyb1/primed.svg?branch=master
   :target: https://travis-ci.org/eyb1/primed
   :alt: Build Status

.. image:: https://coveralls.io/repos/github/eyb1/primed/badge.svg?branch=master
   :target: https://coveralls.io/github/eyb1/primed?branch=master
   :alt: Coverage Status

.. image:: https://readthedocs.org/projects/primed/badge/?version=latest
   :target: http://primed.readthedocs.io/?badge=latest
   :alt: Documentation Status

.. image:: https://requires.io/github/eyb1/primed/requirements.svg?branch=master
   :target: https://requires.io/github/eyb1/primed/requirements/?branch=master
   :alt: Requirements Status

.. image:: https://badge.fury.io/py/primed.svg
   :target: https://badge.fury.io/py/primed
   :alt: PyPI version

.. image:: https://img.shields.io/pypi/l/primed.svg
   :target: https://pypi.python.org/pypi/primed/
   :alt: PyPI license


NLP functions designed for fast execution, plus general utility functions added as required. The project is still under development, so use it at your own risk, although releases installed via ``pip`` are likely to be stable.

Installation
------------

Simply use ``pip``\ :

.. code-block:: bash

   pip install primed --upgrade

Then, import the module:

.. code-block:: python

   import primed as ptk

NLP Examples
------------

Text should ideally be cleaned first (i.e. stripped of punctuation) before it is passed to the functions below. You can use ``ptk.clean()`` for this.

Clean text
^^^^^^^^^^

Removes extra spaces and punctuation, and optionally lowercases the text. Be careful with this when parsing for names, countries, smileys, etc., since those depend on the capitalization and punctuation that cleaning discards.

.. code-block:: python

   ptk.clean('Ha, this   is fun! YUP!!!', lower=True)
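
As a rough illustration of the steps described above, the following standalone sketch strips ASCII punctuation, collapses whitespace, and optionally lowercases. The name ``clean_sketch`` is hypothetical, and this is not necessarily how ``ptk.clean()`` is implemented.

.. code-block:: python

   import string

   # Translation table that deletes every ASCII punctuation character.
   _PUNCT_TABLE = str.maketrans('', '', string.punctuation)


   def clean_sketch(text, lower=False):
       """Strip ASCII punctuation, collapse whitespace, optionally lowercase."""
       stripped = text.translate(_PUNCT_TABLE)
       # split() with no arguments drops leading, trailing, and repeated whitespace.
       collapsed = ' '.join(stripped.split())
       return collapsed.lower() if lower else collapsed


   print(clean_sketch('Ha, this   is fun! YUP!!!', lower=True))  # ha this is fun yup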

Case-insensitive replace
^^^^^^^^^^^^^^^^^^^^^^^^

Possibly the fastest method for a case-insensitive replace; it was tested against both an arrayed-string approach and ``re``.

.. code-block:: python

   ptk.ireplace('I want a hIPpo for my birthday', 'hippo', 'giraffe')
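
One regex-free way to do a case-insensitive replace is to search a lowercased copy of the text and splice the original around each hit. The sketch below shows that idea only; ``ireplace_sketch`` is a hypothetical name and the library's own implementation may differ.

.. code-block:: python

   def ireplace_sketch(text, old, new):
       """Replace every case-insensitive occurrence of `old` in `text` with `new`."""
       if not old:
           return text
       # Assumption: lowercasing does not change the string length (true for ASCII).
       haystack = text.lower()
       needle = old.lower()
       pieces = []
       start = 0
       while True:
           hit = haystack.find(needle, start)
           if hit == -1:
               break
           # Keep the untouched original text up to the hit, then the replacement.
           pieces.append(text[start:hit])
           pieces.append(new)
           start = hit + len(needle)
       pieces.append(text[start:])
       return ''.join(pieces)


   print(ireplace_sketch('I want a hIPpo for my birthday', 'hippo', 'giraffe'))
   # I want a giraffe for my birthday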

Get all (x to y) grams
^^^^^^^^^^^^^^^^^^^^^^

Returns a dictionary of the ngrams with counts. Possibly the fastest method when compared with ``itertools``\ , ``textblob``\ , ``sklearn``\ , and ``nltk``.

.. code-block:: python

   ptk.ngrams('I love cats meow like really really love cats', min_grams=1, max_grams=10)
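
For reference, counting every n-gram in a size range can be sketched with a sliding window over the words. The function name and the choice of space-joined strings as dictionary keys are assumptions made for this illustration; the key format returned by ``ptk.ngrams`` is not shown here.

.. code-block:: python

   from collections import Counter


   def ngrams_sketch(text, min_grams=1, max_grams=3):
       """Count each n-gram of whitespace-separated words, min_grams <= n <= max_grams."""
       words = text.split()
       counts = Counter()
       # n can never exceed the number of words in the text.
       for n in range(min_grams, min(max_grams, len(words)) + 1):
           for i in range(len(words) - n + 1):
               counts[' '.join(words[i:i + n])] += 1
       return dict(counts)


   counts = ngrams_sketch('I love cats meow like really really love cats', 1, 2)
   print(counts['love cats'])  # 2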

Create a comma-separated string using the Oxford comma
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Because this is the grammatically correct way...

.. code-block:: python

   ptk.oxfordize(['cats', 'kittens', 'quantum', 'simulation'])
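
The joining rule itself is small enough to sketch. The version below is a minimal illustration under the usual convention (no comma for fewer than three items); ``oxford_join`` and its ``conjunction`` parameter are hypothetical names, not part of the library.

.. code-block:: python

   def oxford_join(items, conjunction='and'):
       """Join items into a comma-separated string with a serial (Oxford) comma."""
       items = [str(item) for item in items]
       if len(items) < 3:
           # Zero, one, or two items never take a serial comma.
           return f' {conjunction} '.join(items)
       return ', '.join(items[:-1]) + f', {conjunction} ' + items[-1]


   print(oxford_join(['cats', 'kittens', 'quantum', 'simulation']))
   # cats, kittens, quantum, and simulation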

Capitalize i's
^^^^^^^^^^^^^^

Pretty elementary implementation, included just in case.

.. code-block:: python

   ptk.capi('i am british, and i also codify things.')
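
An elementary version of this can be written with a word-boundary regular expression. This is only a sketch with a hypothetical function name; it would also capitalize the ``i`` in strings such as ``i.e.``.

.. code-block:: python

   import re

   # A lowercase "i" that stands alone as a word.
   _LONE_I = re.compile(r'\bi\b')


   def capitalize_i(text):
       """Capitalize every standalone lowercase 'i'."""
       return _LONE_I.sub('I', text)


   print(capitalize_i('i am british, and i also codify things.'))
   # I am british, and I also codify things.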

Select the correct a / an
^^^^^^^^^^^^^^^^^^^^^^^^^

Uses the Carnegie Mellon University Pronouncing Dictionary (CMUdict), which is based on the DoD ARPAbet. Words missing from CMUdict currently get a naive fall-back; a better alternative would be to guess or learn from existing words in CMUdict.

.. code-block:: python

   ptk.a('university')
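
The pronunciation-based choice can be sketched as follows. The tiny dictionary holds a few illustrative entries written in CMUdict-style ARPAbet notation rather than the real data file, and the first-letter fall-back is an assumption about what "naive" could mean; the function name is hypothetical.

.. code-block:: python

   # Illustrative entries in CMUdict-style notation; a real implementation
   # would load the full dictionary instead.
   SAMPLE_PRONUNCIATIONS = {
       'university': ['Y', 'UW2', 'N', 'AH0', 'V', 'ER1', 'S', 'AH0', 'T', 'IY0'],
       'hour': ['AW1', 'ER0'],
       'apple': ['AE1', 'P', 'AH0', 'L'],
   }

   VOWEL_LETTERS = {'a', 'e', 'i', 'o', 'u'}


   def indefinite_article(word):
       """Return 'a' or 'an' based on the first phoneme of the word."""
       phones = SAMPLE_PRONUNCIATIONS.get(word.lower())
       if phones:
           # In CMUdict, vowel phonemes carry a trailing stress digit (0, 1, or 2).
           return 'an' if phones[0][-1].isdigit() else 'a'
       # Assumed naive fall-back for unknown words: look at the first letter only.
       return 'an' if word[:1].lower() in VOWEL_LETTERS else 'a'


   print(indefinite_article('university'))  # a
   print(indefinite_article('hour'))        # an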

Convert to snake text
^^^^^^^^^^^^^^^^^^^^^

Existing underscores are preserved.

.. code-block:: python

   ptk.snake('Hello  There! ')
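
A minimal sketch of one way to get snake text while keeping existing underscores: drop punctuation, lowercase, and join the remaining words with ``_``. The function name and the exact normalization rules are assumptions, and camelCase splitting is not handled.

.. code-block:: python

   import re


   def snake_sketch(text):
       """Lowercase the text and join its words with underscores."""
       # \w matches letters, digits, and underscores, so existing underscores survive.
       kept = re.sub(r'[^\w\s]', '', text)
       return '_'.join(kept.lower().split())


   print(snake_sketch('Hello  There! '))  # hello_there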

Convert to Wikipedia URI
^^^^^^^^^^^^^^^^^^^^^^^^

This is a naive implementation for now; it relies on Wikipedia redirects to resolve most capitalization issues in words after the first.

.. code-block:: python

   ptk.wiki_uri('DELTA-V Budget')
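
One possible reading of the naive approach is to sentence-case the title, replace spaces with underscores, and percent-encode the result. The sketch below assumes that reading; the function name, the ``lang`` parameter, and the exact output format are hypothetical and may not match ``ptk.wiki_uri``.

.. code-block:: python

   from urllib.parse import quote


   def wiki_uri_sketch(title, lang='en'):
       """Build a Wikipedia article URI from a free-form title."""
       # Join words with underscores, then upper-case only the first character.
       normalized = '_'.join(title.split()).capitalize()
       return f'https://{lang}.wikipedia.org/wiki/{quote(normalized)}'


   print(wiki_uri_sketch('DELTA-V Budget'))
   # https://en.wikipedia.org/wiki/Delta-v_budget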

For a given text, match all elements in a given list
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Highly optimized method that returns the first occurrence, or all occurrences, of every element of a list within a given text. You can also specify whether the match must be exact (i.e. equals). The wildcards ``?`` (exactly one word) and ``*`` (zero or more words, up to ``max_star``) are implemented. See the tests for more examples. Part of our Cythonized codebase at Primed.

.. code-block:: python

   ptk.match_elements('Hello, I am having a rather wonderful day today, and I enjoy coding.', ['will not match', 'next one will', 'having * wonderful day', 'rather * wonderful day'], only_first=False, exact_match=False, max_star=3)
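
To make the wildcard semantics concrete, here is a plain-Python sketch of word-level matching with ``?`` and ``*``. It is an unoptimized illustration, not the Cythonized implementation: the function names and the list-of-tuples return format are assumptions, and the ``only_first`` and ``exact_match`` options are left out.

.. code-block:: python

   def _match_at(words, i, pattern, max_star):
       """Return the end index if `pattern` matches `words` from index i, else None."""
       if not pattern:
           return i
       head, rest = pattern[0], pattern[1:]
       if head == '*':
           # Try skipping 0..max_star words, shortest skip first.
           for skip in range(max_star + 1):
               if i + skip > len(words):
                   break
               end = _match_at(words, i + skip, rest, max_star)
               if end is not None:
                   return end
           return None
       if i >= len(words):
           return None
       if head == '?' or head == words[i]:
           return _match_at(words, i + 1, rest, max_star)
       return None


   def match_elements_sketch(text, elements, max_star=3):
       """Return (element, matched words) pairs for every match in the text."""
       # Lowercase and drop punctuation before splitting into words.
       cleaned = ''.join(c for c in text.lower() if c.isalnum() or c.isspace())
       words = cleaned.split()
       found = []
       for element in elements:
           pattern = element.lower().split()
           if not pattern:
               continue
           for i in range(len(words)):
               end = _match_at(words, i, pattern, max_star)
               if end is not None:
                   found.append((element, ' '.join(words[i:end])))
       return found


   text = 'Hello, I am having a rather wonderful day today, and I enjoy coding.'
   print(match_elements_sketch(text, ['having * wonderful day', 'rather * wonderful day']))

In this sketch the first pattern matches ``having a rather wonderful day`` (``*`` spans two words) and the second matches ``rather wonderful day`` (``*`` spans zero words).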

Utility Examples
----------------

Colourful printing
^^^^^^^^^^^^^^^^^^

.. code-block:: python

   ptk.cprint('Text or object to be stringified', style='OK', bold=True, underline=True, newline=True)

Styles available:

.. code-block:: python

   'OK':    '\033[92m'
   'INFO':  '\033[94m'
   'WARN':  '\033[93m'
   'ERROR': '\033[91m'
   'FATAL': '\033[31m'
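
The style codes above are ANSI escape sequences, so the general mechanism can be sketched as wrapping the text between a colour prefix and a reset code. ``colourize`` is a hypothetical helper; the bold (``\033[1m``), underline (``\033[4m``), and reset (``\033[0m``) codes are standard ANSI values rather than something taken from the library.

.. code-block:: python

   STYLES = {
       'OK': '\033[92m',
       'INFO': '\033[94m',
       'WARN': '\033[93m',
       'ERROR': '\033[91m',
       'FATAL': '\033[31m',
   }
   BOLD = '\033[1m'
   UNDERLINE = '\033[4m'
   RESET = '\033[0m'


   def colourize(obj, style='INFO', bold=False, underline=False):
       """Return str(obj) wrapped in ANSI codes for the chosen style."""
       prefix = STYLES[style]
       if bold:
           prefix += BOLD
       if underline:
           prefix += UNDERLINE
       # Always reset at the end so later output is not affected.
       return f'{prefix}{obj}{RESET}'


   print(colourize('All tests passed', style='OK', bold=True))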

Notes
-----

``keeper``
^^^^^^^^^^^^^^

Removing characters with ``str.translate`` is quicker than using regular expressions (see https://stackoverflow.com/a/26517161/2178980).
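
If you want to check this on your own machine, a small harness along these lines compares the two approaches for punctuation removal. The function names and the sample text are made up for the illustration, and timings will vary by Python version and input.

.. code-block:: python

   import re
   import string
   import timeit

   # Both strippers remove the same set of ASCII punctuation characters.
   TABLE = str.maketrans('', '', string.punctuation)
   PATTERN = re.compile('[' + re.escape(string.punctuation) + ']')


   def strip_translate(text):
       return text.translate(TABLE)


   def strip_regex(text):
       return PATTERN.sub('', text)


   sample = 'Ha, this   is fun! YUP!!! ' * 1000
   # Sanity check: the two approaches must agree before timing them.
   assert strip_translate(sample) == strip_regex(sample)

   for fn in (strip_translate, strip_regex):
       seconds = timeit.timeit(lambda: fn(sample), number=200)
       print(f'{fn.__name__}: {seconds:.4f}s')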