Data scientists and data engineers often work closely together. Data scientists work on the analytics side of this partnership, creating data models and analytics studies. Once a data model or study is prepared by data scientists, data engineers deploy it by creating automated data pipelines that
Data engineers are often compared to librarians: they won’t do your research for you, but they will make sure the resources you need are properly cataloged and accessible. When a team at a company needs data, it is the data engineer’s job to make sure that data exists and is organized in a database or business intelligence tool.
Data engineers use programming languages like Python and SQL to work with data. To automate processes, they also often work with the command line interface, a tool for sending commands directly to a computer.
Python is a general-purpose programming language that has become one of the most common languages in the data world. Many data scientists and engineers who use Python work with pandas, a Python library that makes handling data easier and more efficient.
SQL (Structured Query Language) is a programming language designed specifically for working with databases. Programs written in SQL are called queries since they are often used to ask for information from databases. But SQL can also be used to create new tables of data or restructure existing tables.
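As a minimal sketch of what this looks like in practice, the example below runs SQL from Python using the standard library's sqlite3 module and a temporary in-memory database. The table name orders, its columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Open a temporary in-memory database (nothing is written to disk).
connection = sqlite3.connect(":memory:")

# SQL can create a new table...
connection.execute("CREATE TABLE orders (customer TEXT, amount REAL)")

# ...and fill it with rows (hypothetical sample data).
connection.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Ana", 20.0), ("Ben", 35.5), ("Ana", 12.5)],
)

# A query asks the database for information: total amount per customer.
rows = connection.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()

print(rows)  # [('Ana', 32.5), ('Ben', 35.5)]
connection.close()
```

The CREATE TABLE and INSERT statements change the database, while the SELECT statement only reads from it.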
Cloud deployment is the process by which data engineers move data onto specialized database servers accessible over the internet.
A comment is a piece of text within a program that is not executed. It can be used to provide additional information to aid in understanding the code.
The # character starts a comment, which continues until the end of the line.
    # Comment on a single line
    user = "JDoe"  # Comment after code
Python supports different types of arithmetic operations that can be performed on literal numbers, variables, or some combination. The primary arithmetic operators are:
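+ for addition, - for subtraction, * for multiplication, / for division, % for modulus (returns the remainder), and ** for exponentiation.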
    # Arithmetic operations
    result = 10 + 30
    result = 40 - 10
    result = 50 * 5
    result = 16 / 4
    result = 25 % 2
    result = 5 ** 3
The plus-equals operator += provides a convenient way to add a value to an existing variable and assign the new value back to the same variable. When the variable and the value are both strings, this operator performs string concatenation instead of addition.
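For mutable objects such as lists, the operation is performed in place, so any other variable that refers to the same object also reflects the change. Numbers and strings are immutable, so += creates a new value and rebinds the variable, leaving other variables unchanged.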
    # Plus-equals operator
    counter = 0
    counter += 10

    # This is equivalent to
    counter = 0
    counter = counter + 10

    # The operator will also perform string concatenation
    message = "Part 1 of message "
    message += "Part 2 of message"
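Whether other variables see the result of += depends on the type of the value being updated. The sketch below compares an integer, which is immutable, with a list, which is mutable; the variable names are illustrative only.

```python
# Integers are immutable: += rebinds the name to a new value.
counter = 10
alias = counter
counter += 5
print(counter, alias)  # 15 10 (alias keeps the original value)

# Lists are mutable: += extends the existing list in place.
items = [1, 2]
same_items = items
items += [3]
print(items, same_items)  # [1, 2, 3] [1, 2, 3] (both names see the change)
```

Strings behave like integers here: after message += "more", any other variable that held the old string still holds the old string.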
A variable is used to store data that will be used by the program. This data can be a number, a string, a Boolean, a list, or some other data type. Every variable has a name, which can consist of letters, numbers, and the underscore character _.
The equal sign = is used to assign a value to a variable. After the initial assignment is made, the value of a variable can be updated to new values as needed.
    # These are all valid variable names and assignments
    user_name = "codey"
    user_id = 100
    verified = False

    # A variable's value can be changed after assignment
    points = 100
    points = 120
A modulo calculation returns the remainder left after dividing the first number by the second. For example:
4 % 2 results in the value 0, because 4 is evenly divisible by 2, leaving no remainder.
7 % 3 returns 1, because 7 is not evenly divisible by 3, leaving a remainder of 1.
    # Modulo operations
    zero = 8 % 4
    nonzero = 12 % 5
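Two common uses of the modulo operator are sketched below: testing whether a number is even, and wrapping a value around a fixed range. The variable names and values are hypothetical examples.

```python
# A number is even when dividing it by 2 leaves no remainder.
number = 7
is_even = number % 2 == 0
print(is_even)  # False

# Modulo also wraps values around, such as hours on a 12-hour clock face.
hours_after_noon = 15
clock_hour = hours_after_noon % 12
print(clock_hour)  # 3
```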
An integer is a number that can be written without a fractional part (no decimal). An integer can be a positive number, a negative number or the number 0 so long as there is no decimal portion.
The value 0 represents an integer, but the same number written as 0.0 represents a floating point number.
    # Example integer numbers
    chairs = 4
    tables = 1
    broken_chairs = -2
    sofas = 0

    # Non-integer numbers
    lights = 2.5
    left_overs = 0.0
Python supports the joining (concatenation) of strings using the + operator. The + operator is also used for mathematical addition. If both operands of + are strings, concatenation is performed. If one operand is a string and the other is not, Python reports a TypeError. Multiple variables or literal strings can be joined together using the + operator.
    # String concatenation
    first = "Hello "
    second = "World"
    result = first + second
    long_result = first + second + "!"
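When a string needs to be joined with a number, one common approach is to convert the number with the built-in str() function first. The sketch below assumes a hypothetical user_id variable.

```python
user_id = 100

# "User " + user_id would raise a TypeError, because a string
# and an integer cannot be joined with +.
# Converting the number with str() first makes the concatenation valid.
label = "User " + str(user_id)

print(label)  # User 100
```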
The Python interpreter will report errors present in your code. For most error cases, the interpreter will display the line of code where the error was detected and place a caret character ^ under the portion of the code where the error was detected.
    if False ISNOTEQUAL True:
             ^
    SyntaxError: invalid syntax
A ZeroDivisionError is reported by the Python interpreter when it detects a division operation is being performed and the denominator (bottom number) is 0. In mathematics, dividing a number by zero has no defined value, so Python treats this as an error condition and will report a ZeroDivisionError and display the line of code where the division occurred. This can also happen if a variable is used as the denominator and its value has been set to or changed to 0.
    numerator = 100
    denominator = 0
    bad_results = numerator / denominator

    ZeroDivisionError: division by zero
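One simple way to avoid this error is to check the denominator before dividing. The sketch below is one possible approach, not the only one; using None to mark a missing result is an assumption made for this example.

```python
numerator = 100
denominator = 0

# Check the denominator before dividing to avoid a ZeroDivisionError.
if denominator == 0:
    result = None  # assumption: None marks "no valid result" in this sketch
else:
    result = numerator / denominator

print(result)  # None
```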
A string is a sequence of characters (letters, numbers, whitespace, or punctuation) enclosed by quotation marks. It can be enclosed using either the double quotation mark " or the single quotation mark '.
If a string has to be broken into multiple lines, the backslash character \ can be used to indicate that the string continues on the next line.
    user = "User Full Name"
    game = 'Monopoly'

    longer = "This string is broken up \
    over multiple lines"
A SyntaxError is reported by the Python interpreter when some portion of the code is not valid Python. Causes include misspelled keywords, missing or extra brackets or parentheses, incorrect operators, and missing or extra quotation marks.
    age = 7 + 5 = 4
    File "<stdin>", line 1
    SyntaxError: can't assign to operator
A NameError is reported by the Python interpreter when it detects a variable that is unknown. This can occur when a variable is used before it has been assigned a value, or when its name is spelled differently from how it was spelled where it was defined. The Python interpreter will display the line of code where the NameError was detected and indicate which name was not defined.
    misspelled_variable_name
    NameError: name 'misspelled_variable_name' is not defined
Python variables can be assigned different types of data. One supported data type is the floating point number. A floating point number is a value that contains a decimal portion. It can be used to represent numbers that have fractional quantities. For example, a = 3/5 cannot be represented as an integer, so the variable a is assigned a floating point value of 0.6.
    # Floating point numbers
    pi = 3.14159
    meal_cost = 12.99
    tip_percent = 0.20
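The short sketch below, which assumes Python 3, shows how division produces floating point values. The // operator (floor division) is not covered above and is included only for contrast.

```python
a = 3 / 5
print(a)        # 0.6
print(type(a))  # <class 'float'>

# The / operator always produces a float, even when the result is whole.
b = 4 / 2
# The // operator (floor division) discards the fractional part.
c = 7 // 2
print(b, c)     # 2.0 3
```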
The print() function is used to output text, numbers, or other printable information to the console.
It accepts any number of arguments and outputs each of them to the console, separated by a space. If no arguments are provided, the print() function outputs a blank line.
    print("Hello World!")
    print(100)

    pi = 3.14159
    print(pi)
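The behavior with several arguments and with no arguments can be sketched as follows. The variable names and values are hypothetical.

```python
name = "Codey"
points = 120

# Several arguments are printed on one line, separated by single spaces.
print("User:", name, "Points:", points)  # User: Codey Points: 120

# With no arguments, print() outputs only a blank line.
print()

print("Done")
```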