Picking up Python as a scientist



My PhD supervisor prophetically decreed back in 2013 that “it might do thee some good to learn Python” (not his exact words).

I heeded his advice and looked up some tutorials, but soon abandoned them since Python was not directly relevant to what I was doing. Now that Python is all the buzz, I feel a tinge of regret for not persisting. I do believe the ability to code is now essential for a scientist regardless of field, but especially in biology, where datasets keep growing in size. This Nature article <Programming: Pick up Python> lays out this argument in more detail, with some resources for you to begin picking up Python.

Exhibit A: Me picking up some Python...

I did pick up Python starting late last year, right after my thesis defense, while waiting for my work visa. Since Python is a popular language, plenty of resources and tutorials are out there, but I often found them geared more towards software developers. Scientists often do not need Python's full-blown capabilities, or to follow every style guideline, since we mostly write small scripts of fewer than 100 lines rather than large software projects.

So here I am listing some points and good practices I picked up while learning Python that are good to know, and mentioning some advanced features that you can skip in tutorials if you just want to pick up Python quickly. Some of the points are not Python-specific, but are good coding practices and/or a ‘philosophy’ of sorts (try googling Zen of Python, for example).

Learning points for scientists from coding:

  • Code documentation
    I often opened my old script files and took a few minutes to figure out what on earth I was doing. For scripts, code comments are often enough for documentation (here is a good commenting guide). But don’t overcomment, e.g.:
    # assign value to counter -> redundant comment
    counter = 0

    Furthermore, I think scientists can learn from the way software developers do their documentation, specifically as applied to our lab notebooks. One needs to deliberately devote time and effort to documentation. I will elaborate on lab notebooking in another blogpost.

  • Code review and refactoring
    This is related to the previous point. If you write readable and well-structured code, comments sometimes become unnecessary. Code has to be revisited and revised, not only for its logic, but also for its structure.
  • Naming files, variables, functions
    Related to code readability. A variable named user_id is self-explanatory; one named x is not.
  • Dynamic typing; object types: int, float, str
  • Conditional and loop statements
    Often these make up the bulk of the logic that you need from a script, so be sure to know them well.
  • Functions and abstractions
    View your script in a modular fashion: break it into steps and tuck each step into a function. This way of thinking is a powerful way to solve a big problem by breaking it down into smaller, manageable problems.
    # bash script
    # the inner workings of do_thing_a and do_thing_b are not shown (abstracted):
    # - the overall logical structure becomes clear
    # - a step is easy to comment out
    do_thing_a
    #do_thing_b # skip the whole step with a one-line comment instead of a block comment
  • File I/O (opening, reading, and writing file)
  • Python-specific:
    • Object types: list, tuple, set, dictionary
    • List comprehension
      Not essential, since it can always be replaced by a loop, but it is a powerful Python feature. It is more succinct and usually somewhat faster than the equivalent loop. On the other hand, it can be so terse that readability suffers.
    • Packages and modules: sys (especially sys.argv for command-line arguments), os, math, numpy, pandas, matplotlib
    • Pick a good text editor/IDE (I use vi, Jupyter notebook, and VSCode)
    • So far I haven’t needed: assertions, try/except (exception handling), classes, decorators
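
To make a few of the points in the list above concrete, here is a minimal sketch of the kind of small script I have in mind. It is only an illustration: the input format (a sample name followed by one or more numbers on each line), the script name summarise.py, and all the function names are made up for this example. It touches on functions as steps, file I/O, a dictionary, a list comprehension, and sys.argv.

```python
"""Average the measurements for each sample in a plain-text file.

Assumed (hypothetical) input format, one sample per line:
    sample_name value [value ...]
Lines starting with '#' and blank lines are ignored.
"""
import sys


def read_measurements(path):
    """Step 1: read the file into a dict of {sample_name: [values]}."""
    measurements = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comment lines
            sample, *raw_values = line.split()
            # List comprehension: convert every text field to a float
            values = [float(v) for v in raw_values]
            measurements.setdefault(sample, []).extend(values)
    return measurements


def mean_per_sample(measurements):
    """Step 2: reduce each list of values to its mean."""
    return {
        sample: sum(values) / len(values)
        for sample, values in measurements.items()
    }


def write_summary(means, path):
    """Step 3: write one tab-separated line per sample, sorted by name."""
    with open(path, "w") as handle:
        for sample in sorted(means):
            handle.write(f"{sample}\t{means[sample]:.2f}\n")


def main(argv):
    """Tie the steps together; argv is expected to look like sys.argv."""
    if len(argv) != 3:
        print("usage: python summarise.py INPUT OUTPUT")
        return
    measurements = read_measurements(argv[1])
    means = mean_per_sample(measurements)
    write_summary(means, argv[2])


if __name__ == "__main__":
    main(sys.argv)
```

Reading main from top to bottom gives the overall logic at a glance, and any single step can be skipped by commenting out one line, just like in the bash example above.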

Finally, here are some Python courses/tutorials I have tried:
