Distributions¶

Overview¶

The polykin.distributions module offers a number of classes to deal with analytical and experimental (data) distributions commonly encountered in polymerization kinetics.

Class	Required arguments	Optional arguments	Category
`Flory`	DPn	M0, name	analytical
`Poisson`	DPn	M0, name	analytical
`LogNormal`	DPn, PDI	M0, name	analytical
`SchulzZimm`	DPn, PDI	M0, name	analytical
`DataDistribution`	size_data, pdf_data	kind, M0, name, ...	data

All analytical distributions have the number-average degree of polymerization (DPn) as first required positional argument. Additionally, the LogNormal and SchulzZimm distributions take the polydispersity index (PDI) as second required positional argument. In the case of DataDistribution, the required arguments are the measured/simulated values of the distribution curve. The average molar-mass of the repeating units (M0) and the distribution name (name) are optional keyword arguments.

Analytical distributions¶

In [1]:

Copied!

# %pip install polykin
from polykin.distributions import Flory, LogNormal, Poisson, SchulzZimm
# %pip install polykin
from polykin.distributions import Flory, LogNormal, Poisson, SchulzZimm

To instantiate an analytical distribution, we call the respective class constructor with the desired argument values. Here are some examples.

In [2]:

Copied!





# Poisson distribution with DPn=100, and default M0=0.100 kg/mol and name=''
p = Poisson(100)

# Flory with DPn=120, and user-defined M0 and name                                
f = Flory(120, M0=0.065, name='polymer-F')

# Log-Normal with DPn=150, PDI=1.5, default M0 and user-defined name
g = LogNormal(150, 1.5, name='polymer-L')

# Schulz-Zimm with DPn=100, PDI=3, and user-defined M0 and name                      
s = SchulzZimm(100, 3.0, M0=0.80, name='polymer-Z') 
# Poisson distribution with DPn=100, and default M0=0.100 kg/mol and name=''
p = Poisson(100)

# Flory with DPn=120, and user-defined M0 and name                                
f = Flory(120, M0=0.065, name='polymer-F')

# Log-Normal with DPn=150, PDI=1.5, default M0 and user-defined name
g = LogNormal(150, 1.5, name='polymer-L')

# Schulz-Zimm with DPn=100, PDI=3, and user-defined M0 and name                      
s = SchulzZimm(100, 3.0, M0=0.80, name='polymer-Z') 

The most important properties of a distribution can be displayed by evaluating the object directly.

In [3]:

Copied!

f
f

Out[3]:

type: Flory
name: polymer-F
DPn:  120.0
DPw:  239.0
DPz:  358.5
PDI:  1.99
M0:   0.065 kg/mol
Mn:   7.800 kg/mol
Mw:   15.535 kg/mol
Mz:   23.302 kg/mol

or, equivalently, by using the print() function.

In [4]:

Copied!

print(g)
print(g)

type: LogNormal
name: polymer-L
DPn:  150.0
DPw:  225.0
DPz:  337.5
PDI:  1.50
M0:   0.100 kg/mol
Mn:   15.000 kg/mol
Mw:   22.500 kg/mol
Mz:   33.750 kg/mol

The independent properties of the distribution (i.e., the variables which are arguments of the respective class constructor) can be modified after instantiation, if desired.

In [5]:

Copied!





f.DPn = 100
f.M0 = 0.084
f.name = 'polymer-Y'
print(f)
f.DPn = 100
f.M0 = 0.084
f.name = 'polymer-Y'
print(f)

type: Flory
name: polymer-Y
DPn:  100.0
DPw:  199.0
DPz:  298.5
PDI:  1.99
M0:   0.084 kg/mol
Mn:   8.400 kg/mol
Mw:   16.716 kg/mol
Mz:   25.074 kg/mol

Any of these properties can be accessed by its name.

In [6]:

Copied!

f.Mw/f.Mn # this should equal PDI
f.Mw/f.Mn # this should equal PDI

Out[6]:

1.9899999999999995

Probability values¶

Probability density function (pdf) as well as cumulative distribution function (cfd) values can be obtained for any chain-length by using the methods pdf() and cdf(), respectively.

In [7]:

Copied!

# mass pdf for a single chain-length
f.pdf(120) 
# mass pdf for a single chain-length
f.pdf(120)

Out[7]:

0.003628853228028265

In [8]:

Copied!

# mass pdf for a list/array of chain-lengths
f.pdf([110, 120, 130]) 
# mass pdf for a list/array of chain-lengths
f.pdf([110, 120, 130])

Out[8]:

array([0.00367815, 0.00362885, 0.00355536])

In [9]:

Copied!

# number pdf for a list/array of chain molar masses
p.pdf([9.0, 10.0, 11.0], kind='number', sizeasmass=True) #
# number pdf for a list/array of chain molar masses
p.pdf([9.0, 10.0, 11.0], kind='number', sizeasmass=True) #

Out[9]:

array([0.25042941, 0.40061472, 0.23418417])

In [10]:

Copied!

# mass cdf at the first 3 characteristic DP averages
f.cdf([f.DPn, f.DPw, f.DPz], kind='mass')
# mass cdf at the first 3 characteristic DP averages
f.cdf([f.DPn, f.DPw, f.DPz], kind='mass')

Out[10]:

array([0.26793532, 0.59535432, 0.80159978])

The pdf and cdf values are, of course, internally consistent, as illustrated by the example below.

In [11]:

Copied!





import numpy as np

x = np.arange(1, p.DPn)
pdf = p.pdf(x, kind='number')
cdf = p.cdf(x[-1], kind='number')
print('sum(pdf):', np.sum(pdf))
print('cdf:     ', cdf)
import numpy as np

x = np.arange(1, p.DPn)
pdf = p.pdf(x, kind='number')
cdf = p.cdf(x[-1], kind='number')
print('sum(pdf):', np.sum(pdf))
print('cdf:     ', cdf)

sum(pdf): 0.486634197669204
cdf:      0.4866341976692096

Random samples¶

Random samples of chain-lengths — for use in stochastic simulations, etc. — can be generated using the method random().

In [12]:

Copied!

p.random() # single value
p.random() # single value

Out[12]:

In [13]:

Copied!

p.random(5) # vector of length 5
p.random(5) # vector of length 5

Out[13]:

array([ 98, 104,  94,  96,  87])

In [14]:

Copied!

g.random((3, 4)) # array of shape (3,4)
g.random((3, 4)) # array of shape (3,4)

Out[14]:

array([[ 99., 147., 150., 620.],
       [101., 102.,  76., 157.],
       [ 40.,  43., 152., 277.]])

The random values are, of course, consistent with the distribution properties, as illustrated below for DPw.

In [15]:

Copied!





x = f.random(10**5)
DPw_random = np.sum(x**2)/np.sum(x)
print("DPw random:", DPw_random)
print("DPw 'true':", f.DPw)
x = f.random(10**5)
DPw_random = np.sum(x**2)/np.sum(x)
print("DPw random:", DPw_random)
print("DPw 'true':", f.DPw)

DPw random: 199.40217668586013
DPw 'true': 198.9999999999998

Plots¶

The method plot() allows for a quick visualization of the corresponding number, mass or GPC probability density function (pdf) or cumulative distribution function (cdf). If no arguments are specified, the mass pdf is displayed.

In [16]:

Copied!

f.plot() # same as f.plot('mass')
f.plot() # same as f.plot('mass')

No description has been provided for this image

Many other types of plots can be generated. For example, we can draw a plot overlaying the number and mass pdf, with an x-axis based on molar mass rather than chain-length.

In [17]:

Copied!

f.plot(['number','mass'], sizeasmass=True)
f.plot(['number','mass'], sizeasmass=True)

If we select a GPC-type plot, the x-axis scale is automatically switched to 'log'.

In [18]:

Copied!

g.plot('gpc', sizeasmass=True)
g.plot('gpc', sizeasmass=True)

The plot() method tries to automatically adjust the range of the x-axis. The built-in algorithm works rather well for kind='number' and kind='mass', but less so for kind='gpc'. Whenever required, the range can be adjusted with the keyword argument xrange=(xmin, xmax).

In [19]:

Copied!

g.plot('gpc', sizeasmass=True, xrange=(1.0, 300.0))
g.plot('gpc', sizeasmass=True, xrange=(1.0, 300.0))

The cdf can also be displayed by setting the corresponding keyword cdf to the desired y-axis. If cdf=1, the cdf is displayed on the primary axis (replacing the pdf). If cdf=2, the cdf is displayed on the secondary axis, while the pdf remains on the primary axis.

In [20]:

Copied!

g.plot('gpc', sizeasmass=True, cdf=1) # cdf on primary y-axis (replaces pdf)
g.plot('gpc', sizeasmass=True, cdf=1) # cdf on primary y-axis (replaces pdf)

In [21]:

Copied!

g.plot('gpc', sizeasmass=True, cdf=2) # cdf on secondary y-axis
g.plot('gpc', sizeasmass=True, cdf=2) # cdf on secondary y-axis

If we want to save the plot or further manipulate it, we can set return_objects=True, which will give us handles to the figure and axes objects.

In [22]:

Copied!

# fig, ax = g.plot('gpc', sizeasmass=True, cdf=2, return_objects=True)
# fig.savefig(g.name, bbox_inches='tight') # save the plot to `polymer-L.png`
# fig, ax = g.plot('gpc', sizeasmass=True, cdf=2, return_objects=True)
# fig.savefig(g.name, bbox_inches='tight') # save the plot to `polymer-L.png`

Lastly, the function plotdists() can be used to overlay multiple distributions on the same plot. The optional keywords arguments are those of the plot() method.

In [23]:

Copied!

from polykin.distributions import plotdists
from polykin.distributions import plotdists

In [24]:

Copied!

_ = plotdists([f, g, s], kind='gpc', xrange=(1, 1e4))
_ = plotdists([f, g, s], kind='gpc', xrange=(1, 1e4))

Mixture distributions¶

Any of the abovementioned individual distributions can be combined in any proportion to obtain a resulting mixture distribution. This feature can be used, for instance, to compute the distribution of a polymer blend, as illustrated below.

First, let's define some sample polymer distributions.

In [25]:

Copied!





a = Poisson(10, name="A")
b = Flory(50, name="B")
c = LogNormal(1000, 2.0, name="C")
d = SchulzZimm(10**4, 2.0, name="D")
a = Poisson(10, name="A")
b = Flory(50, name="B")
c = LogNormal(1000, 2.0, name="C")
d = SchulzZimm(10**4, 2.0, name="D")

Now, we define the blend using common algebra language. The (positive) numerical factors are the mass parts used to prepare the blend.

In [26]:

Copied!

blend = 0.5*a + 1*b + 0.5*c + 0.2*d
blend = 0.5*a + 1*b + 0.5*c + 0.2*d

As illustrated above, we can the object directly or use the print() command to display the key values of the distribution and of its components.

In [27]:

Copied!

blend
blend

Out[27]:

type: MixtureDistribution
name: A+B+C+D
DPn:  31.2
DPw:  2320.2
DPz:  24295.4
PDI:  74.37
M0:   0.100 kg/mol
Mn:   3.120 kg/mol
Mw:   232.020 kg/mol
Mz:   2,429.542 kg/mol

 #   Weight   Distribution        DPn        DPw    PDI
-------------------------------------------------------
 1    0.500        Poisson   1.00e+01   1.09e+01   1.09
 2    1.000          Flory   5.00e+01   9.90e+01   1.98
 3    0.500      LogNormal   1.00e+03   2.00e+03   2.00
 4    0.200     SchulzZimm   1.00e+04   2.00e+04   2.00

In [28]:

Copied!

print(blend.components_table)
print(blend.components_table)

 #   Weight   Distribution        DPn        DPw    PDI
-------------------------------------------------------
 1    0.500        Poisson   1.00e+01   1.09e+01   1.09
 2    1.000          Flory   5.00e+01   9.90e+01   1.98
 3    0.500      LogNormal   1.00e+03   2.00e+03   2.00
 4    0.200     SchulzZimm   1.00e+04   2.00e+04   2.00

And the built-in plot() method is also available for mixtures.

In [29]:

Copied!

blend.plot('gpc', cdf=2)
blend.plot('gpc', cdf=2)

Data distributions¶

In [30]:

Copied!

from polykin.distributions import DataDistribution
from polykin.distributions import DataDistribution

Data distributions are constructed from numerical values of the distribution curve. A typical example, would be a molar mass distribution curve determined by size-exclusion chromatography (SEC). Since we do not have experimental data at hand, we'll use sample data provided with the package.

In [31]:

Copied!

from polykin.distributions import sample_mmd
from polykin.distributions import sample_mmd

In [32]:

Copied!

sampleX = DataDistribution(sample_mmd['size_data'], sample_mmd['pdf_data'],
                           kind=sample_mmd['kind'], name='sample-X')
sampleX.plot('gpc')
sampleX = DataDistribution(sample_mmd['size_data'], sample_mmd['pdf_data'],
                           kind=sample_mmd['kind'], name='sample-X')
sampleX.plot('gpc')

Most of the methods illustrated above for analytical and mixture distributions are also available for DataDistribution(s).

A data distribution can be numerically "deconvoluted" to find the combination of weights and individual distributions which best describe it using the method fit(). For example, let's try deconvoluting the above distribution as a combination of 4 distributions of type LogNormal.

In [33]:

Copied!

sampleX_fit = sampleX.fit(LogNormal, 4)
sampleX_fit = sampleX.fit(LogNormal, 4)

 #   Weight   Distribution        DPn        DPw    PDI
-------------------------------------------------------
 1    0.490      LogNormal   1.05e+02   5.17e+02   4.92
 2    0.251      LogNormal   1.06e+03   1.50e+03   1.41
 3    0.199      LogNormal   5.50e+03   7.69e+03   1.40
 4    0.060      LogNormal   9.75e+04   1.72e+05   1.76

In [34]:

Copied!

_ = plotdists([sampleX, sampleX_fit], 'gpc')
_ = plotdists([sampleX, sampleX_fit], 'gpc')

The fit is quite good, but it can be made even better. Go ahead, give it a try!