Table of Contents: Introduction; Input Embedding; Positional Encoding (PE); The Encoder; Self-attention Mechanism; Multi-head attention mechanism; Feedforward Network; The Decoder; Masked Multi-head Attention; Multi-head Attention; Feedforward Network; Linear and Softmax Layer; Transformer Training; Conclusion

Introduction: The Transformer is currently one of the most popular architectures for NLP. We can periodically...

Continue reading...

# Python

## Python Profiling – Memory Profiling (Part 3, Final)

Table of Contents: memory_profiler; PySpy; DISassembling; Final Recommendations

memory_profiler: Similar to line_profiler, memory_profiler provides detailed memory usage measurements, with the aim of reducing memory consumption and improving application performance. ⚠️ Before you start using this tool, it is important to mention the impact on the execution...

Continue reading...

## Python Profiling – cProfile and line_profiler Tools (Part 2)

Table of Contents: cProfile; SnakeViz, for cProfile insights; Line-by-line Profiling; About @profile decorator; Other useful tools; gprof2dot; Pyinstrument; Conclusion; Appendix; Install with pip or conda

cProfile: The Python standard library includes the profile tool by default; it also includes cProfile, an optimized implementation of profile written in C....
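As a minimal sketch of how the standard-library profiler can be driven from code, the example below wraps a call in `cProfile.Profile` and formats the result with `pstats`. The names `slow_sum` and `profile_call` are hypothetical and exist only for illustration; they are not taken from the post.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload: sum of squares computed in a plain loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, formatted stats text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    # Collect the report in memory instead of printing straight to stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the top five entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile_call(slow_sum, 100_000)
print(report)
```

Sorting by cumulative time is one reasonable default for finding the call that dominates a run; other sort keys may suit other questions better.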

Continue reading...

## Python Profiling – Time Profiling (Part 1)

Table of Contents: Introduction; Time Profiling Tools; functools.wraps; timeit module; time / gtime; Conclusion

Introduction: The code we write often needs optimization, and profiling helps us find the problematic sections of code so that we invest the least amount of work in fixing the issue, while aiming for the goal of gaining...

Continue reading...

## Support Vector Machines (SVM) for Classification

This document presents the linear classification algorithm SVM. The concept builds on earlier ideas that established SVM as an algorithm with good generalization capacity, based on an optimization criterion that minimizes complexity; with which we have...

Continue reading...

## C# Sudoku Solver

(GitHub Repo: https://github.com/alulema/SudokuSolverNet) I was revisiting a couple of basic AI concepts, Depth First Search and Constraint Propagation, and I found a very good explanation by Professor Peter Norvig (Solving Every Sudoku Puzzle). I just want to add a couple of simple explanations for a better understanding of the concepts. Constraint...

Continue reading...

## TensorFlow High-Level Libraries: TF Estimator

TensorFlow has several high-level libraries that reduce the time spent building models entirely with core code. TF Estimator makes it simple to create models and to train, evaluate, predict with, and export them. TF Estimator provides 4 main functions on any kind of estimator: estimator.fit(), estimator.evaluate(), estimator.predict(), and estimator.export(). All predefined estimators are...

Continue reading...

## TensorFlow Way for Linear Regression

In my two previous posts, we saw how to perform Linear Regression using TensorFlow, but I used Linear Least Squares Regression and Cholesky Decomposition. Both of them use matrices to solve the regression, and TensorFlow isn’t a requirement for this; you can use more general packages like NumPy. One of...

Continue reading...

## Cholesky Decomposition for Linear Regression with TensorFlow

Although Linear Least Squares Regression is simple and precise, it can be inefficient when matrices get very large. Cholesky decomposition is another approach to solving the Linear Least Squares system efficiently, as it decomposes a matrix into a lower and an upper triangular matrix (L and Lᵀ). Finally, linear regression with Cholesky decomposition...
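The idea can be sketched without TensorFlow. The example below is an assumption-laden illustration in plain Python, not the post's implementation: it forms the normal equations (AᵀA)w = Aᵀy for a straight-line fit, factors AᵀA into a lower-triangular matrix and its transpose, and solves with forward and backward substitution. The function names `cholesky`, `solve_cholesky`, and `fit_line`, and the sample data, are hypothetical.

```python
import math


def cholesky(a):
    """Return lower-triangular L such that L times its transpose equals a.

    Assumes a is a symmetric positive-definite matrix given as nested lists.
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(a[i][i] - s)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower


def solve_cholesky(a, b):
    """Solve a x = b in two triangular steps: L y = b, then L^T x = y."""
    n = len(a)
    lower = cholesky(a)
    # Forward substitution with the lower-triangular factor.
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(lower[i][k] * y[k] for k in range(i))) / lower[i][i]
    # Backward substitution with the transposed (upper-triangular) factor.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(lower[k][i] * x[k] for k in range(i + 1, n))) / lower[i][i]
    return x


def fit_line(xs, ys):
    """Least-squares slope and intercept via the normal equations."""
    n = len(xs)
    # Design-matrix rows are [x_i, 1], so A^T A is a 2x2 matrix.
    ata = [[sum(x * x for x in xs), sum(xs)], [sum(xs), float(n)]]
    aty = [sum(x * y for x, y in zip(xs, ys)), sum(ys)]
    slope, intercept = solve_cholesky(ata, aty)
    return slope, intercept


# Hypothetical data lying exactly on one line.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

The two triangular solves replace an explicit matrix inverse, which is where the efficiency gain on larger systems would come from.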

Continue reading...

## Linear Least Squares Regression with TensorFlow

Linear Least Squares Regression is by far the most widely used regression method, and it suits most cases where the data behave linearly. A line is defined by the following equation: For all data points (x_i, y_i), we have to minimize the sum of the squared...

Continue reading...