Primed Toolkit
==============


.. image:: https://travis-ci.org/eyb1/primed.svg?branch=master
   :target: https://travis-ci.org/eyb1/primed
   :alt: Build Status

.. image:: https://coveralls.io/repos/github/eyb1/primed/badge.svg?branch=master
   :target: https://coveralls.io/github/eyb1/primed?branch=master
   :alt: Coverage Status

.. image:: https://readthedocs.org/projects/primed/badge/?version=latest
   :target: http://primed.readthedocs.io/?badge=latest
   :alt: Documentation Status

.. image:: https://requires.io/github/eyb1/primed/requirements.svg?branch=master
   :target: https://requires.io/github/eyb1/primed/requirements/?branch=master
   :alt: Requirements Status

.. image:: https://badge.fury.io/py/primed.svg
   :target: https://badge.fury.io/py/primed
   :alt: PyPI version

.. image:: https://img.shields.io/pypi/l/primed.svg
   :target: https://pypi.python.org/pypi/primed/
   :alt: PyPI license


General utility and NLP functions. Currently under development, so use at your own risk. Tests and thorough documentation will be added at some point (meaning probably never, but I will bear it in mind).

Installation
------------

Simply use ``pip``\ :

.. code-block:: bash

   pip install primed --upgrade

Then, import the module:

.. code-block:: python

   import primed as ptk

NLP Examples
------------

Text should ideally be cleaned first (i.e. free of punctuation). You can use ``ptk.clean()``.

Clean text
^^^^^^^^^^

Removes extra spaces, punctuation, and optionally lowers the text. Careful using this if parsing for names, countries, smileys etc.

.. code-block:: python

   ptk.clean('Ha, this   is fun! YUP!!!', lower=True)

Case-insensitive replace
^^^^^^^^^^^^^^^^^^^^^^^^

Possibly the fastest method for a case-insensitive replace, tested against both using an arrayed string and using ``re``.

.. code-block:: python

   ptk.ireplace('I want a hIPpo for my birthday', 'hippo', 'giraffe')

Get all (x to y) grams
^^^^^^^^^^^^^^^^^^^^^^

Returns a dictionary of the ngrams with counts. Possibly the fastest method when compared with ``itertools``\ , ``textblob``\ , ``sklearn``\ , and ``nltk``.

.. code-block:: python

   ptk.ngrams('I love cats meow like really really love cats', min_grams=1, max_grams=10)

Create a comma-separated string using the Oxford comma
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Because this is the grammatically correct way...

.. code-block:: python

   ptk.oxfordize(['cats', 'kittens', 'quantum', 'simulation'])

Capitalize i's
^^^^^^^^^^^^^^

Pretty elementary implementation, included just in case.

.. code-block:: python

   ptk.capi('i am british, and i also codify things.')

Correct a / an select
^^^^^^^^^^^^^^^^^^^^^

Uses the Carnegie Mellon University Pronouncing Dictionary (CMUdict), based on the DoD ARPAbet. Currently uses a naive fall-back; a better alternative would be to guess / learn using existing words in CMUdict.

.. code-block:: python

   ptk.a('university')

Convert to snake text
^^^^^^^^^^^^^^^^^^^^^

Existing underscores are preserved.

.. code-block:: python

   ptk.snake('Hello  There! ')

Convert to Wikipedia URI
^^^^^^^^^^^^^^^^^^^^^^^^

Naive implementation for now, hoping redirects will help with the majority of capitalization issues for words subsequent to the first.

.. code-block:: python

   ptk.wiki_uri('DELTA-V Budget')

Utility Examples
----------------

Colourful printing
^^^^^^^^^^^^^^^^^^

.. code-block:: python

   ptk.cprint('Text or object to be stringified', style='OK', bold=True, underline=True, newline=True)

Styles available:

.. code-block:: python

   'OK':    '\033[92m'
   'INFO':  '\033[94m'
   'WARN':  '\033[93m'
   'ERROR': '\033[91m'
   'FATAL': '\033[31m'

Notes
-----

``keeper``
^^^^^^^^^^^^^^

Using ``string.translate`` is quicker than using regular expressions (see https://stackoverflow.com/a/26517161/2178980).