Text translation is widely used across applications to convert text from one language to another so that it is easier to read and understand. In machine learning, it is treated as a Natural Language Processing (NLP) problem, and several open-source projects address it. There are also online services that detect the language of a text and translate it into another language with high accuracy.

In this tutorial, we will use the Google Cloud Translation API to detect the language of a text and translate it into other languages. With this API, we can do this in a few lines of code, and no computation is needed on the device. To use it, we need to enable the Cloud Translation API in a Google Cloud project, then set up a service account and download its credentials. For more help, see the following URL.

https://cloud.google.com/translate

We use the Python client library provided by Google for text translation, which can be installed with pip.

pip install --upgrade google-cloud-translate

After installation, we set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the credentials file, either in the terminal or with Python's os module.

# Windows (cmd.exe; no quotes, since they would become part of the value)
set GOOGLE_APPLICATION_CREDENTIALS=PATH_TO_CREDENTIALS

# Ubuntu
export GOOGLE_APPLICATION_CREDENTIALS="PATH_TO_CREDENTIALS"

Or, to set it from Python code before creating the client, we can use os.environ.

import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "PATH_TO_CREDENTIALS"
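
A missing or mistyped credentials path is easiest to catch before any client is created. The sketch below is a minimal check, assuming the variable holds the path to a key file. The name find_credentials is a hypothetical helper, not part of the Google library.

```python
import os
from pathlib import Path


def find_credentials(env=os.environ):
    """Return the credentials file as a Path, or None if it is unset or missing.

    `env` is any mapping of environment variables; it defaults to os.environ.
    This is a hypothetical helper for illustration only.
    """
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if not key_path:
        return None  # variable is not set (or is empty)
    path = Path(key_path).expanduser()
    return path if path.is_file() else None


if find_credentials() is None:
    print("GOOGLE_APPLICATION_CREDENTIALS is unset or does not point to a file")
```

Running this check at startup gives a readable message instead of an authentication error later on.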

Now we can start working with the Google translation library and perform different operations.

Initialize

First, we import the Python package and create a translation client.

from google.cloud import translate

project_id = "PROJECT_ID"  # replace with your Google Cloud project ID
parent = f"projects/{project_id}"
client = translate.TranslationServiceClient()
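
The parent string names the project that requests are made against. The v3 API also accepts a form that includes a location, such as projects/PROJECT_ID/locations/global. The sketch below builds either form. The name build_parent is a hypothetical helper introduced here for illustration.

```python
def build_parent(project_id, location=None):
    """Return the parent resource name, optionally including a location.

    Hypothetical helper: it only formats the string and makes no API call.
    """
    if not project_id:
        raise ValueError("project_id must not be empty")
    parent = f"projects/{project_id}"
    if location:
        parent += f"/locations/{location}"
    return parent


print(build_parent("PROJECT_ID"))            # projects/PROJECT_ID
print(build_parent("PROJECT_ID", "global"))  # projects/PROJECT_ID/locations/global
```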

Available Languages

We can list the available languages and their codes for use in our code.

response = client.get_supported_languages(parent=parent, display_language_code="en")
languages = response.languages

print("Total Languages:", len(languages))

# print first 5 languages
for language in languages[:5]:
    print(f"{language.language_code}: {language.display_name}")

Total Languages: 111
af: Afrikaans
sq: Albanian
am: Amharic
ar: Arabic
hy: Armenian
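
It is often handy to turn this list into a lookup table, so that a language code returned elsewhere can be shown by name. The sketch below assumes each entry exposes language_code and display_name, as in the loop above. The name language_names is a hypothetical helper, and the sample entries are stand-in objects rather than a real API response.

```python
from types import SimpleNamespace


def language_names(languages):
    """Map each language code to its display name."""
    return {lang.language_code: lang.display_name for lang in languages}


# Stand-ins with the same two attributes as the entries in response.languages.
sample_languages = [
    SimpleNamespace(language_code="af", display_name="Afrikaans"),
    SimpleNamespace(language_code="sq", display_name="Albanian"),
]

names = language_names(sample_languages)
print(names.get("sq", "unknown"))  # Albanian
print(names.get("xx", "unknown"))  # unknown
```

With a real response, the call would be language_names(response.languages).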

Detect Language

The service can also detect the language of a text. For a given sentence, the response contains a language code that identifies the language, along with a confidence score.

text = "现在几点"
response = client.detect_language(parent=parent, content=text)

# show response with confidence
for language in response.languages:
    print(language.language_code, ": Confidence:", language.confidence)

zh-CN : Confidence: 1.0

The output depends on the detection results. The response may contain several candidate languages, in which case we can select the one with the highest confidence.
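
Selecting the top candidate can be sketched as follows, assuming each detection exposes language_code and confidence as in the loop above. The name most_confident and its min_confidence threshold are hypothetical additions, and the sample values are hand-written stand-ins, not real API output.

```python
from types import SimpleNamespace


def most_confident(detections, min_confidence=0.0):
    """Return the language code with the highest confidence.

    Returns None when there are no detections or when the best one
    falls below `min_confidence`.
    """
    best = max(detections, key=lambda d: d.confidence, default=None)
    if best is None or best.confidence < min_confidence:
        return None
    return best.language_code


# Stand-ins shaped like the entries in response.languages.
sample_detections = [
    SimpleNamespace(language_code="ja", confidence=0.08),
    SimpleNamespace(language_code="zh-CN", confidence=0.92),
]

print(most_confident(sample_detections))                       # zh-CN
print(most_confident(sample_detections, min_confidence=0.95))  # None
```

With a real response, the call would be most_confident(response.languages).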

Text Translation

Now we pass sentences in several languages, translate them into our target language, and check how the service performs.

sentences = [
    "现在几点",  # Chinese
    "Je suis ravi de vous rencontrer",  # French
    "میں اس سے بہت دنوں سے جانتا ہوں",  # Urdu
    "Encantado de conocerte",  # Spanish
]

response = client.translate_text(
    contents=sentences, target_language_code="en", parent=parent,
)
for translation in response.translations:
    print(translation.translated_text)

what time is it now
Nice to meet you
I have known him for many days
Pleased to meet you
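
When no source language is given, each translation in the v3 response also carries a detected_language_code, so inputs and outputs can be shown side by side. The sketch below pairs them, assuming the translations come back in the same order as the input sentences. Because the API treats text as HTML by default, the result may contain entities such as &#39;, so the sketch unescapes them. The name pair_translations is a hypothetical helper, and the sample objects are hand-written stand-ins, not real API output.

```python
import html
from types import SimpleNamespace


def pair_translations(sources, translations):
    """Return (source, detected language code, translated text) tuples."""
    if len(sources) != len(translations):
        raise ValueError("expected one translation per source sentence")
    return [
        (src, t.detected_language_code, html.unescape(t.translated_text))
        for src, t in zip(sources, translations)
    ]


# Stand-ins shaped like the entries in response.translations.
sample_sources = ["Encantado de conocerte", "Je suis ravi de vous rencontrer"]
sample_translations = [
    SimpleNamespace(translated_text="Pleased to meet you",
                    detected_language_code="es"),
    SimpleNamespace(translated_text="I&#39;m delighted to meet you",
                    detected_language_code="fr"),
]

for src, code, text in pair_translations(sample_sources, sample_translations):
    print(f"[{code}] {src} -> {text}")
```

With a real response, the call would be pair_translations(sentences, response.translations).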

As we can see, each sentence in the list was translated into English, and the output reads well for every input. The API is easy to use and needs only a few lines of code. For more information, see the package documentation or Google's codelab.