Monday, December 24, 2018

Secure Language Translations - Part 1 of Many



Is the language industry secure? I am proud to have worked for the past three years with an extraordinary team at Protranslating to bring security to Language Service Providers (LSPs). "The language industry is big business". It is an industry that will undeniably continue to grow as the Fourth Industrial Revolution unfolds. And the bigger this industry gets, the more vulnerable it will be.

Are current Language Service Providers (LSPs) really secure? Are your translation vendors (linguists) offering secure translations? Is your company's sensitive, private, confidential and proprietary information being translated by secure processes, partners, employees and vendors?

Even if you run strong background checks and do business only with security-diligent vendors, your risk exposure might be far bigger than you imagine. The investment required to provide secure translations is not small.

Let us look at some security challenges in a non-comprehensive list:
  1. No company can completely control a personal device used by a vendor unless it incurs a significant cost in Mobile Device Management (MDM) software. Add to it the transaction costs of the vendor agreements needed to deploy the MDM on their own machines. Add the local regulations under which those vendors operate, and the MDM-related liability in terms of privacy (MDM controls the remote device from head to toe). Add the IT support you must put in place for vendors under MDM. This cost will go beyond the usual profit behind today's translation service prices. Can a company provide secure translations without incurring the high cost associated with MDM?
  2. Data kept on an end-user computer, even if encrypted under current best standards, can be stolen when the device is, for whatever reason, in the hands of a criminal. Will an LSP be able to provide secure translations if linguists are using their own devices?
  3. Devices break and need to be repaired, so unmanaged technicians' devices will end up with copies of your data at their disposal.
  4. Encrypting data is not as easy as some might think. Without proper key management the best encryption in the world is as good as plain text. You cannot trust end users to manage the encryption of your data, and therefore you cannot trust them to deliver secure translations from their own devices.
  5. Everybody backs up data without looking at retention needs. Retention must apply to daily work-related data and to backups as well, but most backup software is not prepared to cherry-pick what to delete. As a consequence, even when your data is deleted from operational systems, copies of it are still accessible from backups. Without a sound retention policy, secure translations cannot exist.
  6. Is deletion even enough? We know for sure it is not, because of data remanence, so who will take care of real data purging? If there is a chance of data remanence, any offered secure translation is just wishful thinking.
  7. Information about contacts and linguists is stored in company-owned systems, most of the time separated from each other even if they communicate via bridges that keep the information synced. The fact of the matter is that private information might be all over the place.
The last mile of translation (the linguist's device) is the most vulnerable and yet the most overlooked security issue in the millions of insecure translations performed today across servers and devices owned by LSPs and linguists.

Company-hosted translation services are also at serious risk. Last year we witnessed contracts and passwords made publicly available because of mishandling of confidential information.

Some might claim the usage of Secure Sockets Layer (SSL) as proof of security due diligence and due care when the protocol implementation is actually insecure. SSL, which in reality should no longer be used because it has been superseded by Transport Layer Security (TLS), only protects your data in transit, and data must be protected not only in transit but also at rest and in use. Some believe that just using SSL and having the green lock in the browser is enough. You need to look under the hood. Finding common vulnerabilities in the SSL/TLS setup of any website is actually quite simple: just run the SSL Labs Server Test to start right now if you wish. You will be surprised to see how many LSP portals and online translation management systems (TMS) out there do not have an "A+" rating. Of course the question is, why do some fail to protect the most basic entry point to your data? Security of data in transit should therefore not be taken for granted.

Encrypted storage is only secure if a great deal of key management procedure is in place to make sure the encryption keys do not fall into the hands of unintended recipients. It also requires serious backup and restore considerations before you can be completely sure that your data is not actually exposed. We must not forget the big leaks related to poor encryption and bad media destruction practices. Security of data at rest should not be taken for granted.

No less effort is required from the applications and users that are authorized to read in plain text any confidential data that is encrypted in transit or at rest. Application logs containing sensitive data are far more common than you would like, and applications that display internal logs containing sensitive data are not unheard of. Protecting data in use demands a strong security-first architecture, rarely practiced even in the most regulated environments. Most LSPs own or lease technology that exposes functionality via Application Programming Interfaces (APIs) through which intimate details could be exposed, as we have all seen in the news recently. Security of data in use should not be taken for granted.

You cannot expect your translator to be a security expert, but you can create secure environments. Unfortunately, to do that you will need at least one security expert on board, because it is not enough to just check for the existence of reports and certifications. The devil is in the details. Here is a start:
  1. Have comprehensive internal information security policies (ISP) that align with your business objectives
  2. Ask your LSP for their ISP and make sure they comply
  3. Ask your LSP for their internal service organization controls (SOC) and confirm the ISP is covered by such controls
  4. Ask your LSP for internal audits on their SOC including results showing material evidence of such audit being conducted. Confirm the controls comply with your expectations about security
  5. Ask your LSP for an external audit of their SOC by Certified Public Accountants (CPA) in the form of a SOC 2 Type II report. Confirm that the report complies with your expectations about security. For example, you might be concerned not just about the security principle but also about the availability, processing integrity, confidentiality and privacy principles. You want to ascertain that your LSP's third parties, like cloud providers, do have a SOC 2 Type II report and that it is periodically reviewed. You want to make sure there are SOC 2 Type II reports across the whole chain of providers. Do not settle for less, like SOC 1, SOC 2 Type I or SOC 3 reports. It is your right to know who handles your data and how it is handled
  6. Ask your LSP for an ISO 27001 certification. Since this certification does not come with a report then ask them for the following documents: Information security risk assessment process, Information security risk treatment process, Results of the information security risk assessment, Results of the information security risk treatment, Evidence of the monitoring and measurement of results, The documented internal audit process, Evidence of the audit programs and the audit results, Evidence of the nature of the non-conformities and any subsequent actions taken, Evidence of the results of any corrective actions taken
  7. Ask your LSP for their practices on privacy. You want to know in exactly how many systems private information is stored. Ask what they store in their TMS, CRM, ERP and marketing automation tool, and where these are hosted. Ask whether they retain data to limit the impact of ransomware and, if so, whether the retention policy applies to that retained data as well. Ask how users can opt out of the service and how they can be forgotten. Make sure your LSP is GDPR compliant, not because they have policies in place but because you actually audit them or you trust their external auditor as a recognized and authorized CPA firm.
  8. Ask your LSP for the names of individuals in charge of handling security and check their credentials
What about real practical examples of what a secure translation process should be like? What features should a secure TMS have across its whole infrastructure and architecture design, its software development life cycle (SDLC), its end user interface (UI) and its API? In Part 2 I start analyzing risk management frameworks and threat modeling through a real-life attack example. In further parts I will continue with the top measures you cannot live without as a start when it comes to protecting highly sensitive information.

Wednesday, December 19, 2018

Android Bluetooth Setup

  1. Turn on the Bluetooth device and make sure it is not connected to any other device
  2. Go to Settings on your Android
  3. Search Settings for (or select the option if found): Connected Devices
  4. If the device was previously connected, select "Previously Connected devices" and pick the Bluetooth device from the list
  5. If the device was not previously connected, select "Pair new device" and connect to it
  6. If the above does not work, restart your Android, turn off the Bluetooth device and start all over following this guide

Sunday, October 14, 2018

NodeJS cleanup: clean cache or delete node_modules

From time to time node_modules accumulates changes that either slow execution or even make the program misbehave. It is a good idea to clean the cache every now and then:
npm cache clean --force
Or perhaps better:
rm -fr node_modules && npm i
Here is one case where this procedure helped me. Note that this is not to say the issue below will always be related to a stale node_modules:
(node:25899) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1)
(node:25899) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
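The cleanup above can be wrapped in a small function. This is only a sketch: the name node_reset is made up for illustration, and preferring npm ci when a package-lock.json is present is my assumption about what you want (npm ci installs exactly what the lockfile says and removes node_modules itself, so the rm is merely harmless in that branch).

```shell
# Sketch: wipe node_modules and reinstall dependencies for one project.
# node_reset is a hypothetical helper name, not part of npm.
node_reset() {
  dir="${1:-.}"
  if [ ! -f "$dir/package.json" ]; then
    echo "node_reset: no package.json in $dir" >&2
    return 1
  fi
  rm -rf "$dir/node_modules"
  if [ -f "$dir/package-lock.json" ]; then
    # A lockfile exists: reproduce it exactly.
    (cd "$dir" && npm ci)
  else
    (cd "$dir" && npm install)
  fi
}
```

Call it as node_reset /path/to/project, or with no argument from inside the project directory.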

Sunday, September 02, 2018

If your linux terminal clipboard stops working

Manipulate the X selections with the xsel command. In this case, clear all of them:
alias clear-xsel='xsel -cp && xsel -cs && xsel -cb \
        && echo Primary, secondary and clipboard X selections were cleared'

Wednesday, July 18, 2018

Best practice is the enemy of common sense

I hear this "best practice" term so much that I can't resist asking "says who?" LOL.

When we look at any great discovery of humankind, the common denominator was actually "common sense", and in fact the so-called "best practice" of the time was actually holding back further discoveries.

With "best practice" you are guaranteed to become one more, to integrate more. With "common sense" you are guaranteed to become a different one, to differentiate more. And it is precisely there, in differentiation, where you get a competitive edge.

"Common practice" leads to a perfect-competition, stagnant, rigid and dogmatic environment. But "common sense" leads us to constant adaptation and survival, which are key to achieving competitive advantage (even though such advantage is really ephemeral, unless of course everybody looks at such advantage as a "common practice" ;-)

Decimus Iunius Iuvenalis (Juvenal) apparently wrote about 1900 years ago that "Rarus enim ferme sensus communis", meaning "Common sense is generally rare". This is bad news for business, because it is only "common sense" to expect that in the modern world of instant gratification, thinking is a hard thing to do. It is easier to go after "best practices", unfortunately.

I could go on forever, but I think Albert Einstein puts it briefly and beautifully, with "common sense": "Don't stop questioning"

Tuesday, June 05, 2018

Calendar can’t save the attachment '' to the Exchange Server - OWA, Mac Calendar and iPhone Calendar could be failing

Delete the calendar entry and start from scratch.

This has happened to me twice, and in both instances it was the prelude to a bigger issue: the Exchange/Domain password had expired, but no error was shown because Microsoft products cache credentials in such a way that some services might still work with the old password.

In fact I have seen that the Calendar stops working even from the Outlook Web Access (OWA): I was able to interact perfectly fine with emails but my Calendar was unable to show my events.

Unfortunately this time my OWA, iPhone and Mac calendars all failed on me for three days until I realized this. The first time it happened I did not identify the root cause, but this second time I recalled that an expired password was exactly the case.

Monday, May 28, 2018

NodeJS static code security analysis

Use the ESLint security plugin to find potential vulnerabilities in your nodejs code and the node security package (nsp) to find vulnerabilities in your dependencies. Here is the quickest way to get an idea of where you stand. Install eslint and the security plugin, and keep a minimal eslint-sec.json file somewhere locally (note that there is an issue I reported with one of the rules). Then, without messing with your project details, use the plugin to get a report of where your code is in terms of common possible vulnerabilities:
eslint --no-eslintrc -c /path/to/eslint-sec.json /path/to/project/source/code/dir/
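The eslint-sec.json used above is not reproduced in this post, so what follows is only a guess at what a minimal one could look like. The rule names come from eslint-plugin-security; which rules to enable and at what severity is my own choice, and the ESLINT_SEC_CONFIG variable is a hypothetical convenience, not something eslint reads.

```shell
# Sketch: write a minimal, illustrative eslint-sec.json for use with
# "eslint --no-eslintrc -c". Adjust the rule list to your own needs.
config="${ESLINT_SEC_CONFIG:-./eslint-sec.json}"
cat > "$config" <<'EOF'
{
  "parserOptions": { "ecmaVersion": 2018 },
  "env": { "node": true, "es6": true },
  "plugins": ["security"],
  "rules": {
    "security/detect-eval-with-expression": "error",
    "security/detect-child-process": "warn",
    "security/detect-non-literal-fs-filename": "warn",
    "security/detect-non-literal-require": "warn",
    "security/detect-unsafe-regex": "warn"
  }
}
EOF
echo "wrote $config"
```

Listing the rules explicitly keeps the file independent of whatever preset names the installed plugin version happens to ship.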
Then get a quick intro to nsp. It is up to you to automate both checks and include them in your pipeline. No kidding, do it!

Saturday, May 05, 2018

Run kubernetes on specific cluster or context

First get credentials:
gcloud container clusters get-credentials ${CLUSTER_NAME} --zone ${ZONE_NAME} --project ${PROJECT_NAME}
Then list the contexts:
kubectl config get-contexts
Then either switch to a context:
kubectl config use-context $CONTEXT_NAME
Or simply run each command using the --context flag. For example to list the pods in a specific cluster run:
kubectl --context $CONTEXT_NAME get pods
To avoid verbosity, create functions in ~/.profile:
kubetest() {
    kubectl --context="$TEST_CONTEXT_NAME" "$@"
}

kubeprod() {
    kubectl --context="$PROD_CONTEXT_NAME" "$@"
}
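A slightly more defensive variant of the functions above, as a sketch: the shared helper kubectx (a hypothetical name) refuses to call kubectl when the context variable is empty, so a missing export in ~/.profile shows up as an error instead of an empty --context value being passed along.

```shell
# Sketch: run kubectl against a named context, failing fast when the name is empty.
# kubectx is a made-up helper name; TEST_CONTEXT_NAME and PROD_CONTEXT_NAME
# are expected to be exported elsewhere, as in the functions above.
kubectx() {
  kube_ctx="$1"
  shift
  if [ -z "$kube_ctx" ]; then
    echo "kubectx: context name is empty, refusing to run kubectl $*" >&2
    return 1
  fi
  kubectl --context="$kube_ctx" "$@"
}

kubetest() { kubectx "$TEST_CONTEXT_NAME" "$@"; }
kubeprod() { kubectx "$PROD_CONTEXT_NAME" "$@"; }
```

Usage stays the same as before, for example kubetest get pods.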

Friday, April 06, 2018

Removing text blocks containing repetition with Unix or Linux Power Tools

Let us illustrate the issue with an example. In the translation industry a TMX file is an XML representation of a translation memory (TM). This format is useful for exchanging TMs. It contains translation units (tu nodes) with properties (prop nodes) and translation unit variants (tuv nodes), each holding a segment (seg node) with the text in the source or the target language. Many times the same segment is added again and again by the Computer Aided Translation (CAT) tool, and while that is useful to get more precise translations, it can become a burden if you try to process such a big TMX with an open source CAT tool like OmegaT. Since OmegaT is client side only, processing a big TMX is problematic. In that case you might want to trade some translation precision for the ability to use the free tool. These repetitions are mostly related to the addition of context around the specific segment (the x-context-pre and x-context-post prop types).

The question is then how to remove the whole "tu" nodes containing duplicated segments, leaving just one of them (again, we lose precision in the translation output, but it might be worth it because of the savings from using a free CAT tool).

The straightforward answer would be to export the TMX from the original tool using whatever options it provides to export less data, specifically ignoring context-specific translations. If that is not a possibility, we are left with building a tool to clean it up.

First we can get an idea of which segments are duplicated and how many times each:
cat input.tmx | grep '<seg>' \
| sort | uniq -c | sort -nr \
| grep -v '^ *1 ' > tmx-repetitions.txt
Then we can replace every repeated occurrence with a string like DUPLICATE_NODE_PLEASE_REMOVE:
cat input.tmx \
| awk '{if($0 ~ /<seg>/ && !seen[$0]++ || $0 !~ /<seg>/) print $0; \
else print "DUPLICATE_NODE_PLEASE_REMOVE"}' > input-with-marked-duplicates.tmx
Finally we can try removing the whole translation unit (tu) node with perl. The negative lookahead keeps a match from starting in one translation unit and ending in a later one, and \b keeps <tu from matching <tuv:
cat input-with-marked-duplicates.tmx \
| perl -0pe 's#<tu\b(?:(?!</tu>).)*?DUPLICATE.*?</tu>##gs'
But if the file is big enough this won't work as expected, probably because of how perl does multiline parsing in this particular command (in memory). This is the reason why I built the open source bash-multiline-replace project, which contains a simple bash script (multilineReplace.sh) that will eliminate full blocks from start to end patterns if they contain an inner pattern:
cat input-with-marked-duplicates.tmx \
| ./multilineReplace.sh '<tu ' 'DUPLICATE' '</tu>' 
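For comparison, the marking and removal steps can also be folded into a single streaming awk pass. This is only a sketch and not how multilineReplace.sh works: the function name dedupe_tu is made up, and it assumes the opening tu tag, each seg and the closing tu tag each sit on their own line, which may not hold for every TMX export.

```shell
# Sketch: print a TMX stream, dropping every <tu> block that contains a
# <seg> line already seen earlier. Only one block is buffered at a time.
dedupe_tu() {
  awk '
    /<tu[ >]/ { in_tu = 1; block = ""; dup = 0 }   # start buffering a unit
    in_tu {
      block = block $0 "\n"
      if ($0 ~ /<seg>/ && seen[$0]++) dup = 1      # repeated segment line
      if ($0 ~ /<\/tu>/) {
        if (!dup) printf "%s", block               # keep first occurrences only
        in_tu = 0
      }
      next
    }
    { print }                                      # everything outside <tu> blocks
  '
}
```

It reads standard input and writes standard output, so it would be used as dedupe_tu < input.tmx > output.tmx. Like the marking step above, it compares whole seg lines, so a unit is dropped as soon as any one of its segments has appeared before.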

Saturday, March 24, 2018

Pdf Bash Tools - Ghostscript - Watermarks, password protection, search, split, merge and beyond

There is so much PDF processing you can do from bash, including searching, splitting, merging, password protection and watermarking. Yup, for free. Check out and contribute to my pdf bash tools project.

Friday, March 16, 2018

Manage HP ProCurve Switches programmatically from *nix

Just released ProCurve Commander. Repeating yourself is not fun. This is true not only when it comes to managing multiple switches but also to auditing them. The same idea can be used to manage Cisco switches and, in general, any device that is accessible via SSH but not friendly to remote command invocation.

Thursday, March 15, 2018

Hardening HP ProCurve switches

Enable SSH:
telnet 
# config
(config)# crypto key generate ssh
(config)# ip ssh
(config)# show ip ssh
(config)# exit
# exit
> exit
Confirm ssh works and disable telnet:
ssh 
# config
(config)# no telnet
(config)# exit
# exit
> exit
Change default users and set complex passwords:
password operator user-name 
password manager user-name 
Identify the switch:
# config
(config)# hostname "My ProCurve Switch  "

Wednesday, March 14, 2018

Java Applets in Mac OS X

Your only option is Safari, just as your only option is Internet Explorer for Windows. If the applet is insecure it won't run but you can always add exceptions at your own risk. From Apple System Preferences click on Java | Security tab | Edit Site List | Add | Apply | OK | Restart Safari.

Parsing CSV from bash

In one word: csvkit. To install it, use Python's pip and make sure you export the bin path:
pip install --user csvkit
export PATH="$HOME/.local/bin:$PATH"
To extract for instance the second column from clients.csv:
cat clients.csv | csvcut -c 2
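Why not just use cut? A small sketch with made-up sample data shows the problem: cut splits on every comma, including those inside quoted fields, while a CSV-aware tool does not. The csvcut call only runs if csvkit is installed, and the function name naive_second_column is purely illustrative.

```shell
# Hypothetical sample data: the first field contains a comma inside quotes.
sample_csv="$(mktemp)"
printf '%s\n' 'name,address' '"Doe, Jane","1 Main St"' > "$sample_csv"

# Naive approach: split on every comma, quoted or not.
naive_second_column() {
  cut -d, -f2 "$1"
}

# The data row comes out as a fragment of the name, not the address.
naive_second_column "$sample_csv"

# CSV-aware approach, only attempted when csvkit is available.
if command -v csvcut >/dev/null 2>&1; then
  csvcut -c 2 "$sample_csv"
fi
```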
An alternative ... csvtool.

To install it in Ubuntu:
sudo apt-get install csvtool
To install it in OS X:
brew install opam
opam init
eval `opam config env`
opam install csvtool
csvtool --help
To extract the second column from sample.csv (csvtool numbers columns starting at 1):
cat sample.csv | ~/.opam/system/bin/csvtool col 2 -
Find more from:
~/.opam/system/bin/csvtool --help

Monday, February 26, 2018

pm2 error: unknown option `--auto-exit'

Kubernetes cluster panic!!! App down!!! Library changes cause this, especially when you do not pin specific package versions. Our Dockerfile had just the configuration below, which of course deploys the latest version of pm2:
...
RUN npm install pm2 -g
RUN pm2 update
...
CMD ["pm2-docker", "start", "--auto-exit", "process.yml"]
Not only was pm2-docker renamed to pm2-runtime (a symlink still exists), but the --auto-exit flag no longer exists either. Now you need to specify --no-auto-exit if needed.
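One way to avoid this kind of surprise is to never install a global package without an explicit version. A minimal sketch, assuming you wrap the install step in a helper (install_pinned is a made-up name, and the version to use is whichever one you actually tested, which this post does not record):

```shell
# Sketch: refuse to install a global npm package unless a version is given.
install_pinned() {
  pkg="$1"
  version="$2"
  if [ -z "$pkg" ] || [ -z "$version" ]; then
    echo "usage: install_pinned <package> <version>" >&2
    return 1
  fi
  npm install -g "${pkg}@${version}"
}
```

In a build script this would be called as install_pinned pm2 "$PM2_VERSION", where PM2_VERSION is a hypothetical variable holding the tested version. The same package@version form works directly in a Dockerfile RUN line.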

Friday, February 23, 2018

Gmail to EML Add-On Terms of Service

I have coded GMail to EML because I wanted to avoid multiple clicks to download a Gmail message. While I expect this Gmail add-on to help others, I cannot make myself liable for any losses. Therefore: GMAIL TO EML ADD-ON (THE SOFTWARE) IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY.

Monday, January 29, 2018

GCP VM not available via SSH - ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]

Getting this error after upgrading a VM:
$ gcloud compute ssh myvm -- -vvv
OpenSSH_***, LibreSSL *.*.*
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 21: Applying options for *
debug1: /etc/ssh/ssh_config line 56: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to *.*.*.* [*.*.*.*] port *.
debug1: connect to address *.*.*.* port *: Connection refused
ssh: connect to host *.*.*.* port *: Connection refused
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
Went ahead and activated the serial port access:
gcloud compute instances add-metadata myvm \
    --metadata=serial-port-enable=1
Accessed it from the Google Cloud Console via "Remote Access / Connect to Serial Console". Without doing anything else I was then able to connect to the VM. I can only assume that "Connect to Serial Console" restarts ssh.

Not using Selenium but got Error retrieving a new session from the selenium server - ECONNREFUSED

I have seen this issue in the past, I believe with Protractor, but today it was Nightwatch, which is configured to use just ChromeDriver without Selenium and yet was spitting out the below:
Error retrieving a new session from the selenium server Connection refused! Is selenium server started? { Error: connect ECONNREFUSED 127.0.0.1:9515 at Object._errnoException (util.js:1024:11) at _exceptionWithHostPort (util.js:1046:20) at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1182:14) code: 'ECONNREFUSED', errno: 'ECONNREFUSED', syscall: 'connect', address: '127.0.0.1', port: 9515 }
The issue was an outdated chromedriver version. Check your Chrome version and use the matching driver version from https://sites.google.com/a/chromium.org/chromedriver/downloads
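A quick way to spot such a mismatch is to compare major versions. This sketch rests on an assumption that postdates this post: as far as I know, from version 73 onward ChromeDriver releases carry the same major version as the Chrome release they support, whereas the older 2.x drivers need the compatibility notes on the downloads page instead. The function names are made up for illustration.

```shell
# Sketch: extract the major version from strings such as
# "Google Chrome 73.0.3683.75" or "ChromeDriver 73.0.3683.68 (...)".
major_version() {
  printf '%s\n' "$1" | sed -n 's/^[^0-9]*\([0-9][0-9]*\)\..*/\1/p'
}

# Succeeds only when both strings yield the same, non-empty major version.
versions_match() {
  [ -n "$(major_version "$1")" ] &&
    [ "$(major_version "$1")" = "$(major_version "$2")" ]
}
```

It could be used as versions_match "$(google-chrome --version)" "$(chromedriver --version)" || echo "driver mismatch", bearing in mind that the Chrome binary name differs between platforms.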
